Sub-band power normalized perceptual linear predictive coefficients for robust automatic speech recognition
Abstract: To improve the recognition performance of perceptual linear predictive (PLP) coefficients in noisy environments, a feature called sub-band power normalized perceptual linear predictive (SPNPLP) coefficients, based on power bias subtraction, is presented. PLP captures the most useful information in speech and fits well with the assumptions used in hidden Markov models. Automatic speech recognition (ASR) systems using PLP achieve good recognition rates in quiet environments; in noisy environments, however, their performance drops dramatically. In this work, power bias subtraction is used to normalize the sub-band power of PLP and suppress background noise excitation, yielding SPNPLP, which increases the robustness of ASR against additive background noise. Recognition performance is evaluated on an isolated-word recognition task with a grammar of 501 items and on a large vocabulary continuous speech recognition (LVCSR) task. On these two tasks, the average absolute improvements in Chinese character recognition accuracy over standard PLP are 11.26% and 9.2%, respectively. The experimental results show that the proposed SPNPLP is consistently more robust to noise than PLP.