Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Uygur Dialect Recognition and Related Acoustic Analysis

Acoustic analysis and language recognition of Uygur

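  • Abstract: Motivated by the practical needs of speech applications such as speech recognition and voiceprint (speaker) recognition, this paper presents the first study of the acoustic characteristics and recognition of the Hotan dialect. First, Hotan dialect speech is selected and manually annotated at multiple levels; the formants, duration, and intensity of the vowels are analyzed statistically, and the overall vowel pattern of the Hotan dialect and the pronunciation characteristics of male and female speakers are described. Next, analysis of variance and non-parametric tests are applied to formant samples from three Uygur dialects; the results show significant differences among the three dialects in the formant distribution patterns of male vowels, female vowels, and all vowels combined. Finally, Uygur dialect recognition models based on GMM-UBM (Gaussian Mixture Model-Universal Background Model), DNN-UBM (Deep Neural Networks-Universal Background Model), and LSTM-UBM (Long Short Term Memory-Universal Background Model) are built, and comparative experiments examine how discriminative the dialect i-vectors are when the input features are Mel-frequency cepstral coefficients alone versus their combination with formant frequencies. The experiments show that combined features incorporating formant coefficients increase dialect discriminability, and that the LSTM-UBM model extracts more discriminative dialect i-vectors than GMM-UBM and DNN-UBM.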
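The abstract does not say which non-parametric test was applied to the formant samples. As a minimal sketch, assuming a Kruskal-Wallis test across three dialect groups, the statistic can be computed with the standard library alone. The function names and the F1 values below are hypothetical illustrations, not the authors' code or data.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom (3 groups)."""
    return math.exp(-h / 2)


# Hypothetical F1 values (Hz) for one vowel in three dialects; not data from the paper.
f1_by_dialect = {
    "dialect_a": [310, 325, 298, 340, 315],
    "dialect_b": [352, 361, 347, 370, 358],
    "dialect_c": [402, 395, 410, 388, 399],
}
h_stat = kruskal_wallis(list(f1_by_dialect.values()))
p = p_value_three_groups(h_stat)
print(f"H = {h_stat:.2f}, p = {p:.4f}")
```

The chi-square p-value is an asymptotic approximation, so with groups this small it is only indicative. A statistics package would normally also apply a tie correction.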

     

    Abstract: To meet the needs of speech recognition and speaker recognition in the Hotan area, we present the first acoustic analysis and modeling of the Hotan dialect. First, we select and annotate sentences of the Hotan dialect and conduct a statistical acoustic analysis of vowel formant frequency, duration, and intensity. On that basis, we describe the main vowel pattern of the Hotan dialect and the vowel pronunciation of male and female speakers. Then we compare the formant patterns of three Uygur dialects and find significant differences among them, and we build dialect recognition models based on GMM-UBM, DNN-UBM, and LSTM-UBM respectively. Finally, we compare the dialect discrimination of i-vectors extracted with MFCC features alone and with MFCC features combined with formant frequencies as input. The results show that the combined features yield a better recognition rate than a single feature.
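The abstract does not specify how MFCC and formant features are combined. One common approach, assumed here, is frame-level concatenation: each frame's formant frequencies are appended to its MFCC vector before i-vector extraction. The function name `combine_features` and all numbers below are hypothetical.

```python
def combine_features(mfcc_frames, formant_frames):
    """Append each frame's formant values to that frame's MFCC vector."""
    if len(mfcc_frames) != len(formant_frames):
        raise ValueError("MFCC and formant tracks must have the same number of frames")
    return [list(m) + list(f) for m, f in zip(mfcc_frames, formant_frames)]


# Hypothetical toy input: 3 frames with 4 MFCCs each (real front ends use more).
mfcc = [
    [12.1, -3.4, 0.8, 1.5],
    [11.7, -2.9, 1.1, 1.2],
    [12.4, -3.1, 0.6, 1.9],
]
# Hypothetical F1-F3 values in Hz for the same 3 frames.
formants = [
    [310.0, 2200.0, 2900.0],
    [325.0, 2150.0, 2950.0],
    [298.0, 2250.0, 2880.0],
]

combined = combine_features(mfcc, formants)
print(len(combined), len(combined[0]))  # frames, features per frame
```

Because formant values in Hz are far larger than typical cepstral coefficients, a real pipeline would likely normalize each feature dimension before modeling. That step is omitted from this sketch.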

     
