Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Small sample size speech emotion recognition based on global features and weak metric learning

  • Abstract: Speech is a non-stationary time-frequency signal that is approximately stationary over short intervals, so most researchers extract emotional features frame by frame. Features extracted after framing are local, however, and cannot accurately reflect the dynamic characteristics of emotional speech, so a system built on local features alone is often not robust. To address this, utterance-level global features are first extracted from the unframed speech signal by multi-scale optimal wavelet packet decomposition; 384-dimensional utterance-level local features are then extracted after framing, and the feature dimensionality is reduced with the Fisher criterion. Finally, a weak metric fusion strategy (called weak metric learning in this work) is proposed to fuse the two kinds of utterance-level features, and an SVM (LIBSVM) performs the emotion classification. Experiments on the Berlin emotional speech database show that the method improves the recognition rate by 4.2% to 13.8% over using utterance-level local features alone, and that the recognition rate fluctuates less, especially when training samples are few.
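The abstract names Fisher-criterion dimensionality reduction but does not give its formula. As a rough illustration only, the sketch below ranks features by one common form of the Fisher score (between-class scatter divided by within-class scatter) and keeps the top k. The function names `fisher_scores` and `select_top_k`, the toy data, and the specific score definition are assumptions made for this example, not the authors' implementation.

```python
from collections import defaultdict


def fisher_scores(samples, labels):
    """Return one Fisher score per feature column (assumed definition).

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    n_features = len(samples[0])
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall_mean = sum(column) / len(column)
        between = 0.0  # between-class scatter of feature j
        within = 0.0   # within-class scatter of feature j
        for rows in by_class.values():
            values = [row[j] for row in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        if within > 0:
            scores.append(between / within)
        elif between > 0:
            # Classes differ and have no internal spread: perfectly separable.
            scores.append(float("inf"))
        else:
            # Constant feature: carries no class information.
            scores.append(0.0)
    return scores


def select_top_k(samples, labels, k):
    """Keep the k features with the highest Fisher score, in original order."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    reduced = [[row[j] for j in keep] for row in samples]
    return keep, reduced


# Toy utterance-level feature vectors (hypothetical values, 3 features each).
samples = [
    [1.0, 5.0, 0.2],
    [1.2, 1.0, 0.1],
    [3.0, 4.0, 0.3],
    [3.2, 2.0, 0.2],
]
labels = ["angry", "angry", "sad", "sad"]

scores = fisher_scores(samples, labels)
keep, reduced = select_top_k(samples, labels, k=2)
```

In this toy set the second feature has identical class means, so it receives a score of zero and is dropped. In a pipeline like the one described, the reduced local features would then be fused with the global features before SVM classification; the fusion step is not sketched here because the abstract does not specify it.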

     
