Detection of replay spoof speech using global self-attentive Teager energy features
-
Abstract
This paper proposes an energy-based front-end feature extraction method to counter replay attacks on automatic speaker verification systems. The method applies variable resolution across the entire frequency band in order to fully exploit the highly discriminative nonlinear sub-band energy information that separates replayed speech from real speech. First, a statistical analysis of various recording and playback devices is carried out using the F-ratio method. Then, based on the statistical results, a set of filters covering the whole frequency band is designed to capture the highly discriminative energy information. Finally, the Teager energy operator is used to compute the energy of each sub-band filtered signal, yielding the proposed global self-attentive Teager energy cepstral coefficients (GSTECC). To verify the effectiveness of the proposed method, a Gaussian mixture model is used as the classifier, and a series of experiments is conducted on the ASVspoof 2017 V2 and ASVspoof 2021 PA databases. Experimental results show that the proposed GSTECC feature detects replay attacks better than other advanced feature extraction methods.
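The two signal-level ingredients named above, the F-ratio and the Teager energy operator, can be illustrated with a minimal sketch. It is not the GSTECC pipeline itself: the abstract does not specify the filter design, the attention mechanism, or the cepstral step, so none of these are reproduced. The function names `teager_energy` and `f_ratio` are hypothetical. The F-ratio shown is one common definition (variance of the class means divided by the mean within-class variance) and is assumed here rather than taken from the paper.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the first and last samples
    have no two-sided neighbourhood.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def f_ratio(class_values):
    """F-ratio of one sub-band feature across classes.

    Assumed definition: variance of the class means divided by the
    mean of the within-class variances. `class_values` holds one list
    of sub-band energies per class (e.g. real vs. replayed speech).
    """
    means = [sum(v) / len(v) for v in class_values]
    grand_mean = sum(means) / len(means)
    between = sum((m - grand_mean) ** 2 for m in means) / len(means)
    within = sum(
        sum((value - m) ** 2 for value in v) / len(v)
        for v, m in zip(class_values, means)
    ) / len(class_values)
    if within == 0.0:
        raise ValueError("within-class variance is zero; F-ratio undefined")
    return between / within


# For a pure tone A*cos(omega*n), the operator output is the constant
# A^2 * sin^2(omega), so it tracks both amplitude and frequency.
amp, omega = 2.0, 0.3
tone = [amp * math.cos(omega * n) for n in range(64)]
psi = teager_energy(tone)
expected_psi = amp ** 2 * math.sin(omega) ** 2

# Toy sub-band energies for two classes (illustrative numbers only).
toy_ratio = f_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Under this definition, a sub-band with a larger F-ratio separates the classes better, which is the kind of evidence that would justify giving that frequency region finer filter resolution.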
-
-