Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Spoof speech detection based on speaker features

  • Abstract: Current spoofed speech detection methods perform well on specific datasets but suffer from poor robustness and interpretability. Spoofed speech generation typically represents the speaker with a single feature and lacks fine-grained control over speaker characteristics, so the distributions of speaker features differ between bona fide and spoofed speech. Motivated by this observation, a spoofed speech detection method based on speaker features is proposed. Starting from a pre-trained speaker verification system, the method trains only a subset of the parameters and models the distributional differences in shallow frame-level speaker features between bona fide and spoofed speech in order to detect spoofing. It also mitigates a weakness of using speaker features directly for detection, namely the difficulty of handling spoofing algorithms that produce highly similar timbre, such as unit selection synthesis. On the ASVspoof 2019 LA test set, the proposed method achieves a 69.6% relative reduction in equal error rate compared with a baseline system that trains all parameters of the speaker verification system, and it remains robust in scenarios such as cross-channel conditions and silence removal.
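The abstract reports its result as a relative reduction in equal error rate (EER). As a minimal sketch of what that metric means, the Python below approximates EER from two lists of detection scores and computes a relative reduction. It assumes that a higher score means "more likely bona fide". The function names, the toy scores, and the example EER values are hypothetical illustrations, not taken from the paper or from the ASVspoof evaluation tooling.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: higher score = more likely bona fide.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction: (baseline - proposed) / baseline."""
    return (baseline_eer - proposed_eer) / baseline_eer


# Toy scores, made up for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
toy_eer = equal_error_rate(bonafide, spoof)

# Hypothetical EER values (in percent), NOT the paper's numbers: they only
# show that a 69.6% relative reduction is not a 69.6-point absolute drop.
toy_reduction = relative_reduction(10.0, 3.04)
```

Under this definition, a relative reduction of 69.6% means the proposed system's EER is about 30.4% of the baseline's EER, whatever the baseline value is.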

     
