Maximum likelihood polynomial regression for robust speech recognition
Abstract: The linear assumption is the main drawback of maximum likelihood linear regression (MLLR). This paper applies polynomial regression to model adaptation and builds a nonlinear adaptation algorithm, maximum likelihood polynomial regression (MLPR), for robust speech recognition. Working in the log-spectral domain, the algorithm uses polynomials to approximate the nonlinear relationship between the training-environment and test-environment model means in each Mel subband. The polynomial coefficients are estimated from a small amount of adaptation data in the test environment with the expectation-maximization (EM) algorithm under the maximum likelihood (ML) criterion. Experimental results show that a second-order polynomial approximates this nonlinear environmental transformation of the model means well. In both the noise compensation and the speaker adaptation experiments, MLPR yields markedly lower word error rates than MLLR. The proposed algorithm largely overcomes the limitation of the linear assumption in linear model adaptation. It can reduce the effects of noise, speaker variation, and other factors on a speech recognition system at the same time, which makes it especially suitable for joint speaker and noise adaptation.
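The M-step of such a scheme can be sketched for a single Mel band. This is a minimal illustration, not the paper's implementation, and it rests on several assumptions. The observations and Gaussian means are taken to be log Mel-spectral values for that band. Covariances are diagonal. One polynomial is shared by all Gaussians in the band. The occupancies gamma_m(t) come from an E-step that is not shown. Under these assumptions, maximizing the EM auxiliary function over the coefficients a_k in mu_test = sum_k a_k * mu_train^k reduces to a weighted least-squares problem, with weights gamma_m(t) / sigma_m^2. The names `fit_band_polynomial` and `adapt_means` are hypothetical.

```python
from typing import List, Sequence


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: need more distinct, occupied means")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_band_polynomial(train_means: Sequence[float],
                        frames: Sequence[float],
                        posteriors: Sequence[Sequence[float]],
                        variances: Sequence[float],
                        order: int = 2) -> List[float]:
    """Hypothetical M-step for one Mel band (assumed log-spectral, diagonal cov).

    train_means[m]   : training-environment mean of Gaussian m in this band
    frames[t]        : adaptation observation in this band at frame t
    posteriors[t][m] : occupancy gamma_m(t) from the (not shown) E-step
    variances[m]     : variance of Gaussian m in this band
    Returns a[0..order] with mu_test = sum_k a[k] * mu_train**k.
    """
    n = order + 1
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for t, o in enumerate(frames):
        for m, mu in enumerate(train_means):
            g = posteriors[t][m]
            if g == 0.0:
                continue
            w = g / variances[m]                 # ML weight for this (frame, Gaussian) pair
            phi = [mu ** k for k in range(n)]    # polynomial basis of the training mean
            for i in range(n):
                b[i] += w * phi[i] * o
                for j in range(n):
                    A[i][j] += w * phi[i] * phi[j]
    return _solve(A, b)                          # weighted normal equations


def adapt_means(coeffs: Sequence[float], train_means: Sequence[float]) -> List[float]:
    """Map training-environment means to adapted (test-environment) means."""
    return [sum(a * mu ** k for k, a in enumerate(coeffs)) for mu in train_means]
```

As a usage example, suppose each adaptation frame is assigned entirely to one Gaussian and the frames scatter symmetrically around a quadratic function of the training means. In that case the fit recovers the quadratic's coefficients. Accumulating statistics per band keeps each solve down to an (order+1) x (order+1) system, so a second-order polynomial needs only three coefficients per band. That small parameter count is what makes estimation from little adaptation data plausible.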