

Statistical analysis of prosodic parameters and emotion recognition of multilingual speech

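  • Chinese abstract: Changes in prosodic feature parameters are the main carrier of emotional information in speech signals. To study the feasibility of emotion recognition on multilingual speech samples using a small number of prosodic features, and thereby improve the robustness of emotion recognition systems to language information, seven typical emotional states were selected. For these states, the dynamic characteristics of prosodic parameters such as fundamental frequency, energy, and time were statistically analyzed in Chinese, English, and Japanese speech samples produced by the same speaker with specified sentence patterns. The statistics show that the variation patterns of the prosodic feature parameters of emotional speech samples are fairly consistent across languages. Building on this conclusion, a preliminary emotion recognition experiment was carried out on mixed multilingual samples using principal component analysis (PCA); the average error rate was 27.74% and the lowest recognition error rate was 11%. Basic prosodic parameters therefore allow a preliminary but effective recognition of several basic emotions regardless of language.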


    Abstract: Variations in prosodic parameters are regarded as a direct reflection of the emotional information in speech signals. To investigate the feasibility of emotion recognition based on basic prosodic parameters, and to improve the robustness of language-independent emotion recognition systems, a statistical analysis of the pitch, energy, and time parameters of multilingual emotional speech is presented. A corpus of emotional speech spoken by a single speaker in Chinese, English, and Japanese was collected. Principal Component Analysis (PCA) is used to recognize the emotional states in the multilingual speech. The mean recognition error rate is 27.74% and the lowest error rate is 11%. The statistical analysis shows that language does not obviously affect the pitch variation features of a given emotion. From the recognition results we conclude that basic emotional states in multilingual speech can be recognized from a few simple prosodic parameters.
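
    The abstract does not say how PCA is turned into a recognizer, so the following is only a minimal sketch under stated assumptions: prosodic feature vectors are standardized, projected onto the leading principal components, and each test sample is assigned to the emotion whose class mean is nearest in that subspace. The feature count, the number of components, the train/test split, and the data are all hypothetical; the data are synthetic stand-ins, and the printed error rate has no relation to the figures reported above.

```python
import numpy as np

def fit_pca(X, n_components):
    """Standardize X and return (mean, std, components) from an SVD-based PCA."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - mean) / std
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components]

def project(X, mean, std, components):
    """Map feature vectors into the principal-component subspace."""
    return ((X - mean) / std) @ components.T

def nearest_centroid_predict(train_scores, train_labels, test_scores):
    """Assign each test sample to the class with the closest mean (assumed rule)."""
    classes = np.unique(train_labels)
    centroids = np.stack(
        [train_scores[train_labels == c].mean(axis=0) for c in classes]
    )
    dists = np.linalg.norm(
        test_scores[:, None, :] - centroids[None, :, :], axis=2
    )
    return classes[dists.argmin(axis=1)]

# Synthetic stand-in data: 7 emotion classes, 6 hypothetical prosodic
# features (e.g. pitch mean/range, energy mean/range, duration, rate).
rng = np.random.default_rng(0)
n_emotions, n_per_class, n_features = 7, 30, 6
class_means = rng.normal(0.0, 2.0, size=(n_emotions, n_features))
X = np.vstack(
    [m + rng.normal(0.0, 1.0, size=(n_per_class, n_features)) for m in class_means]
)
y = np.repeat(np.arange(n_emotions), n_per_class)

# Hypothetical 70/30 train/test split.
idx = rng.permutation(len(y))
split = int(0.7 * len(y))
train, test = idx[:split], idx[split:]

mean, std, comps = fit_pca(X[train], n_components=3)
pred = nearest_centroid_predict(
    project(X[train], mean, std, comps),
    y[train],
    project(X[test], mean, std, comps),
)
error_rate = float(np.mean(pred != y[test]))
print(f"error rate on synthetic data: {error_rate:.2%}")
```

    Fitting the standardization and the principal directions on the training samples only keeps the test samples out of the model, which is the usual precaution when an error rate is reported.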


