Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Detection of replay spoof speech using global self-attentive Teager energy features

  • Abstract: This paper proposes an energy-based front-end feature extraction method to counter replay attacks on automatic speaker verification systems. The method provides variable resolution across the full frequency band in order to exploit the highly discriminative nonlinear information in the sub-band energies that separates replayed speech from genuine speech. First, a range of recording and playback devices is analyzed statistically using the F-ratio method. Then, guided by these statistics, a set of filters spanning the full frequency band is designed to capture the highly discriminative energy information. Finally, the Teager energy operator is used to compute the energy of each sub-band filtered signal, yielding the proposed global self-attentive Teager energy cepstral coefficients (GSTECC). To verify the effectiveness of the method, a Gaussian mixture model is used as the classifier in a series of experiments on the ASVspoof 2017 V2 and ASVspoof 2021 PA databases. The experimental results show that the proposed GSTECC feature detects replay attacks better than other advanced feature extraction methods.
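As a minimal illustration of the Teager energy step mentioned in the abstract, the sketch below applies the standard discrete Teager energy operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), to a single signal. It is not the paper's GSTECC pipeline: the F-ratio-driven filter bank and the cepstral stage are not specified in the abstract and are omitted here, and the function name `teager_energy` is a hypothetical choice made for this example.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration only. The output has two fewer
    samples than the input because the first and last samples lack a
    neighbor on one side.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("expected a 1-D signal with at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Sanity check on a pure sinusoid A*cos(w*n + phi): the operator returns
# the constant A^2 * sin(w)^2, so it tracks both amplitude and frequency.
n = np.arange(200)
amplitude, omega = 2.0, 0.3
signal = amplitude * np.cos(omega * n + 0.5)
psi = teager_energy(signal)
print(np.allclose(psi, amplitude ** 2 * np.sin(omega) ** 2))
```

In a sub-band setting, one would presumably apply this operator to the output of each filter in the bank before framing and averaging, but that arrangement is an assumption drawn from the abstract's wording rather than a detail it states.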

     
