A Speech Recognition Model Based on Joint Modeling of Frame-Based and Segmental Features
Abstract: This paper presents a speech recognition model based on joint modeling of frame-based and segmental features. The model uses segmental features that describe the contours of spectral parameters to explicitly model the correlation among successive speech frames at the segment scale. It also uses a segmental-feature-dependent non-stationary time series model, which captures the correlation between frame-based and segmental features and, through a parametric mean trajectory function, implicitly models the correlation among neighboring frames at the frame scale. A modified Viterbi segmentation algorithm that optimizes the joint statistical distance of frame-based and segmental features is given, together with a training algorithm with embedded EM iteration for estimating the model parameters. Recognition experiments on a speaker-independent isolated Mandarin finals database and a multi-speaker isolated Mandarin base-syllable database show that the model outperforms both the standard HMM and the trended HMM.
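To make the idea of scoring a segment at both scales concrete, the following is a minimal sketch, not the paper's formulation. It assumes a scalar feature, a polynomial mean trajectory (the form commonly used in trended HMMs), Gaussian residuals, and a single Gaussian over one scalar segmental feature. The abstract does not specify these forms, and it does not specify how the trajectory depends on the segmental feature, so that coupling is omitted here. All function and parameter names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a scalar Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mean_trajectory(coeffs, t):
    """Assumed polynomial mean at frame offset t: sum_p coeffs[p] * t**p."""
    return sum(c * t ** p for p, c in enumerate(coeffs))


def frame_loglik(frames, coeffs, var):
    """Frame-scale score: each frame is Gaussian around the mean trajectory."""
    return sum(
        gaussian_logpdf(o, mean_trajectory(coeffs, t), var)
        for t, o in enumerate(frames)
    )


def joint_score(frames, seg_feature, coeffs, var, seg_mean, seg_var):
    """Assumed joint score: segment-scale term plus frame-scale term."""
    seg_term = gaussian_logpdf(seg_feature, seg_mean, seg_var)
    return seg_term + frame_loglik(frames, coeffs, var)
```

Under these assumptions, a segment whose frames rise steadily scores higher with a first-order trajectory than with a constant mean, which is the sense in which a mean trajectory captures inter-frame correlation implicitly.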