Abstract:
This paper investigates the selection of acoustic modeling units for spontaneous Chinese Putonghua speech recognition. With three-state HMMs, the two most popular modeling units, namely extended initial/final (XIF) units and phoneme units, each have their own advantages and drawbacks. On the one hand, pronunciation variation is severe in spontaneous speech, which favors the coarse-grained XIF units because they can absorb a wide range of pronunciation variations. On the other hand, the three-state structure has limited discriminative ability for complex modeling units, which favors the fine-grained phoneme units. Based on theoretical results from experimental phonetics and on a duration analysis of XIF units, we propose an XIF model in which the nasal coda is separated from the final. Experiments on a spontaneous Chinese Putonghua speech recognition task show that the proposed method outperforms both XIF modeling and phoneme-based modeling, reducing the character error rate by 2.23% and 9.45%, respectively.
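
As an illustration only, the idea of separating the nasal coda can be sketched as a lexicon transformation: a final that ends in a nasal (-n or -ng) is split into its remaining body and a separate coda unit, while all other finals are left as single units. The function names, the pinyin-style unit symbols, and the splitting rule below are assumptions made for this sketch; the abstract does not specify the unit inventory actually used.

```python
from typing import List

# Hypothetical helper: split a pinyin-style final into [body, nasal coda].
# The symbols "n" and "ng" for the coda units are assumptions of this sketch.
def split_nasal_coda(final: str) -> List[str]:
    # Check "ng" first so that "ang" becomes ["a", "ng"], not ["an", "g"].
    if final.endswith("ng") and len(final) > 2:
        return [final[:-2], "ng"]
    if final.endswith("n") and len(final) > 1:
        return [final[:-1], "n"]
    # Finals without a nasal coda stay as one unit.
    return [final]

# Hypothetical helper: expand one syllable into its modeling units.
# An empty initial stands for a syllable with no initial consonant.
def units_for_syllable(initial: str, final: str) -> List[str]:
    units = [initial] if initial else []
    units.extend(split_nasal_coda(final))
    return units

if __name__ == "__main__":
    for initial, final in [("zh", "ang"), ("t", "ian"), ("h", "ai")]:
        print(initial + final, units_for_syllable(initial, final))
```

Under this assumed rule, a syllable such as "zhang" is modeled with three units (initial, final body, nasal coda) instead of two, so the long nasal final is no longer covered by a single three-state model.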