A study on the application of duration information in continuous speech recognition
A study of duration in continuous speech recognition based on DDBHMM
Abstract: The duration distribution based hidden Markov model (DDBHMM) effectively addresses the shortcomings of the classical HMM. Building on DDBHMM, this paper studies in detail how duration information can be used effectively in continuous speech recognition. The method for estimating the duration distributions is introduced first, and the data files are then classified by speaking rate. Recognition experiments on these classes show that duration information helps most on slow speech, less on medium-rate speech, and least on fast speech. The authors argue that the main benefit of duration information is that it yields more accurate syllable and state segmentation points, which in turn improves recognition. Used effectively, duration information also makes the recognizer more robust to speaking rate. As a refinement, methods based on classified durations and on normalized durations are proposed, and both are found to improve recognition further. To exploit the correlation between durations, a duration Bigram method is also proposed and analyzed. Finally, the use of duration information in a post-processing step is examined; the results further indicate that a substantial performance gain is obtained only when duration information is used synchronously during recognition, on the basis of DDBHMM.
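The abstract does not say how the duration distributions are estimated, so the following is only a minimal sketch of one common approach: a smoothed per-unit histogram built from already-segmented training data. The function names, the `max_dur` cap, and the `floor` smoothing constant are hypothetical choices for illustration, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def estimate_duration_distributions(segments, max_dur=50, floor=1e-4):
    """Build a smoothed duration histogram for each unit (assumed scheme).

    segments: iterable of (unit, duration_in_frames) pairs obtained from
              some prior segmentation of the training data.
    Returns {unit: [P(d=1), P(d=2), ..., P(d=max_dur)]}.
    """
    counts = defaultdict(Counter)
    for unit, dur in segments:
        # Clamp durations into the supported range 1..max_dur.
        counts[unit][min(max(dur, 1), max_dur)] += 1

    dists = {}
    for unit, c in counts.items():
        total = sum(c.values())
        # Add a small floor so unseen durations keep non-zero probability.
        raw = [c[d] / total + floor for d in range(1, max_dur + 1)]
        norm = sum(raw)
        dists[unit] = [p / norm for p in raw]
    return dists


def duration_log_prob(dists, unit, dur):
    """Log-probability of a unit lasting `dur` frames under the histogram."""
    probs = dists[unit]
    d = min(max(dur, 1), len(probs))
    return math.log(probs[d - 1])


# Toy data: (unit, duration) pairs; the values are made up for illustration.
segments = [("a", 3), ("a", 3), ("a", 5), ("b", 10)]
dists = estimate_duration_distributions(segments)
```

In a decoder that uses duration synchronously, a score such as `duration_log_prob(dists, "a", 3)` would be added to the acoustic score whenever a hypothesis leaves the unit after three frames. How DDBHMM actually combines the two scores is not described in the abstract.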