Abstract:
Current spoof speech detection methods perform well on specific datasets but generalize poorly and offer limited interpretability. Spoof speech is typically generated from a fixed speaker representation, which lacks fine-grained control over speaker characteristics and therefore leaves a distributional gap in speaker features between bonafide and spoofed speech. Based on this observation, a spoof speech detection method built on speaker features is proposed. The method trains only a subset of the parameters of a pre-trained speaker verification system, modeling the differences in the distribution of frame-level speaker features between bonafide and spoofed speech to perform detection. This design also alleviates the difficulties of using speaker features directly for detection, particularly against spoofing algorithms that produce highly similar speaker characteristics, such as unit-selection synthesis. On the ASVspoof 2019 LA test set, the proposed method reduces the equal error rate by 69.6% compared with a baseline that trains all parameters of the speaker verification system, while remaining robust under conditions such as cross-channel transmission and silence removal.
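As a rough illustration of the parameter-subset training idea summarized above, the following sketch (PyTorch, with a hypothetical stand-in `SpeakerEncoder` rather than the paper's actual speaker verification backbone) freezes most of a pre-trained encoder, leaves only a small subset of layers trainable, and adds a light head over pooled frame-level speaker features to score bonafide versus spoofed speech. It is an assumption-based sketch, not the authors' implementation.

```python
# Minimal sketch of parameter-subset fine-tuning for spoof detection.
# `SpeakerEncoder` is a hypothetical stand-in for a pre-trained
# speaker verification backbone; only its last layers and a small
# classification head are updated during training.
import torch
import torch.nn as nn


class SpeakerEncoder(nn.Module):
    """Stand-in for a pre-trained frame-level speaker feature extractor."""

    def __init__(self, n_mels: int = 80, dim: int = 256):
        super().__init__()
        self.frontend = nn.Conv1d(n_mels, dim, kernel_size=5, padding=2)
        self.blocks = nn.Sequential(
            nn.Conv1d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, 3, padding=1), nn.ReLU(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_mels, frames) -> frame-level speaker features (batch, dim, frames)
        return self.blocks(self.frontend(feats))


class SpoofDetector(nn.Module):
    def __init__(self, encoder: SpeakerEncoder, dim: int = 256):
        super().__init__()
        self.encoder = encoder
        # Freeze the pre-trained backbone, then re-enable only its last block,
        # so just a subset of parameters is trained.
        for p in self.encoder.parameters():
            p.requires_grad = False
        for p in self.encoder.blocks[-2:].parameters():
            p.requires_grad = True
        # Light head: pool frame-level statistics, then a bonafide/spoof logit.
        self.head = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        frames = self.encoder(feats)                                   # (B, dim, T)
        stats = torch.cat([frames.mean(-1), frames.std(-1)], dim=-1)   # distribution statistics
        return self.head(stats)                                        # (B, 2) logits


model = SpoofDetector(SpeakerEncoder())
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)  # updates only the small parameter subset
```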