Integrating non-context features in speech data classification and acoustic modeling

Abstract: This paper studies how non-context features, such as gender, accent (speaker group identity), speaking rate, and channel, affect the classification and modeling of speech data. It uses two approaches: one based on data clustering and one based on prior (pre-classification) knowledge. To model context and non-context features in a unified framework, the generalized feature decision tree scheme is adopted and extended with non-context features. This yields multiple high-resolution acoustic models, each dependent on particular non-context features. To solve the resulting model selection problem, a maximum likelihood model combination algorithm is proposed that dynamically combines these models into a test-speaker-dependent acoustic model. Experimental results on two data sets show an average relative error rate reduction of 8% to 10%. The results indicate that classifying and modeling speech data by non-context features improves the modeling capability of acoustic models. They also show that model combination effectively resolves the model selection problem introduced by unified modeling.
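The abstract does not state the exact form of the maximum likelihood model combination. The following is a minimal sketch built on one assumption: each non-context-dependent model is treated as a fixed density p_k. These densities are combined linearly with weights w_k, and the weights are chosen to maximize sum_t log sum_k w_k p_k(x_t) over the test speaker's frames, using standard EM re-estimation of mixture weights. The 1-D Gaussians stand in for full HMM acoustic models, and all function names (`ml_combination_weights`, `mixture_log_likelihood`, and so on) are hypothetical illustrations, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stand-in for one non-context-dependent acoustic model:
# a univariate Gaussian given as (mean, variance).
Model = Tuple[float, float]


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _safe_log(w: float) -> float:
    """log(w), with log(0) mapped to -inf so a zero weight stays zero."""
    return math.log(w) if w > 0.0 else float("-inf")


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def _weighted_logs(models: Sequence[Model], weights: Sequence[float],
                   x: float) -> List[float]:
    """log(w_k * p_k(x)) for every model k."""
    return [_safe_log(w) + gaussian_logpdf(x, m, v)
            for w, (m, v) in zip(weights, models)]


def mixture_log_likelihood(models: Sequence[Model], weights: Sequence[float],
                           frames: Sequence[float]) -> float:
    """sum_t log sum_k w_k p_k(x_t): the objective the weights maximize."""
    return sum(_logsumexp(_weighted_logs(models, weights, x)) for x in frames)


def ml_combination_weights(models: Sequence[Model], frames: Sequence[float],
                           iterations: int = 50) -> List[float]:
    """EM re-estimation of combination weights; the models stay fixed."""
    if not frames:
        raise ValueError("need at least one frame of test-speaker data")
    k = len(models)
    weights = [1.0 / k] * k  # start from a uniform combination
    for _ in range(iterations):
        totals = [0.0] * k
        for x in frames:
            # E-step: posterior probability of each model given frame x
            logs = _weighted_logs(models, weights, x)
            norm = _logsumexp(logs)
            for j in range(k):
                totals[j] += math.exp(logs[j] - norm)
        # M-step: new weight = average posterior over all frames
        weights = [t / len(frames) for t in totals]
    return weights


if __name__ == "__main__":
    # Two hypothetical class models, e.g. trained on different speaker groups.
    models = [(0.0, 1.0), (5.0, 1.0)]
    frames = [0.1, -0.2, 0.3, 0.0, 4.9]  # toy data from one test speaker
    print(ml_combination_weights(models, frames))  # roughly [0.8, 0.2]
```

Because the component models are held fixed and only the weights move, each EM iteration cannot decrease the test-speaker likelihood. This is one simple way a per-speaker model could be assembled from several pre-trained, feature-specific models without retraining them.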