Spectrum warping based on sub-glottal resonances for speaker-independent speech recognition
Abstract: To reduce the degradation that inter-speaker variation causes in speaker-independent speech recognition, this paper proposes normalizing each speaker's features during feature extraction with a speaker-specific perceptual warping factor. The warping factor is estimated from the second sub-glottal resonance, which arises from acoustic coupling between the sub-glottal system and the vocal tract. Compared with methods that use the third formant of the vocal tract as the reference frequency, the second sub-glottal resonance is largely independent of the speech content, so it filters out more of the linguistic information and better reflects the speaker's individual characteristics. The factor is used to normalize perceptual minimum variance distortionless response (PMVDR) coefficients, which are more robust to noise than traditional Mel-frequency cepstral coefficients (MFCC); the normalized coefficients serve as the recognition features, and classic hidden Markov models (HMMs) are used for acoustic model training and recognition. Experiments show that, compared with traditional Mel-frequency cepstral features and with spectrum warping based on the third formant, the proposed method lowers the word error rate by 4% and 3%, respectively, on clean speech and by 9% and 5%, respectively, on noisy speech, effectively improving the performance of speaker-independent recognition systems.
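The normalization idea summarized above can be illustrated with a minimal sketch. The abstract does not give the warping formula, so everything here is an assumption made for illustration: the warping factor is taken as the ratio of a hypothetical reference second sub-glottal resonance to the speaker's estimate, and it is applied as a simple linear rescaling of the frequency axis of a magnitude spectrum. The function names `warping_factor` and `warp_spectrum` and the example frequencies are hypothetical. The perceptual warping used inside PMVDR analysis is not reproduced.

```python
def warping_factor(sg2_speaker_hz, sg2_reference_hz):
    """Ratio-based warping factor (assumed form, not taken from the paper)."""
    if sg2_speaker_hz <= 0 or sg2_reference_hz <= 0:
        raise ValueError("resonance frequencies must be positive")
    return sg2_reference_hz / sg2_speaker_hz


def warp_spectrum(spectrum, alpha):
    """Linearly warp a spectrum sampled on uniformly spaced frequency bins.

    Output bin k reads the input at fractional bin k / alpha, using linear
    interpolation and clamping to the last bin, so a component at input
    bin b moves to roughly bin alpha * b.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(spectrum)
    warped = []
    for k in range(n):
        src = min(k / alpha, n - 1)   # source position, clamped to the top bin
        lo = int(src)                 # lower neighbouring bin
        hi = min(lo + 1, n - 1)       # upper neighbouring bin
        frac = src - lo               # interpolation weight for the upper bin
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


# Hypothetical example: speaker Sg2 at 1500 Hz, reference Sg2 at 1400 Hz.
alpha = warping_factor(1500.0, 1400.0)
spectrum = [0.0] * 32
spectrum[15] = 1.0                    # single spectral peak at bin 15
normalized = warp_spectrum(spectrum, alpha)
```

With these example values the factor is 14/15, so the peak at bin 15 is moved down to bin 14, which is the sense in which each speaker's spectrum is aligned to a common reference before features are computed.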