Speech emotion recognition using stacked generative and discriminative hybrid models
Abstract
Generative and discriminative models each have strengths and weaknesses in modeling the internal distribution of the data, optimizing classification results, and capturing the dynamic variation of emotion. This paper fuses the two kinds of models and proposes speech emotion recognition based on stacked hybrid generative and discriminative models. First, we reduce the utterance-level feature vectors from 63 to 12 dimensions with the Fisher discriminant; the reduced vectors are used by the stacked discriminative models. Then we use Sequential Forward Selection to select 8 frame-level features from the full set of 69, and propose two kinds of GMM multidimensional likelihoods (one with the same dimension as the feature vector, the other with the same dimension as the number of GMM mixtures) for the hybrid generative and discriminative models. Experimental results on the Berlin emotional speech database show that (1) the hybrid generative and discriminative models achieve significant improvements over WNN, GMM, HMM, or SVM used alone; (2) the stacked generative and discriminative hybrid models achieve a higher recognition rate than the stacked discriminative models; and (3) the GMM-MAP/SVM series hybrid model (with 13 GMM mixtures and GMM multidimensional likelihoods of the same dimension as the feature vector) is the best of the stacked generative and discriminative hybrid models, with a recognition rate of up to 85.1%.
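The abstract does not define the two GMM multidimensional likelihoods, so the following is only a minimal sketch of one plausible reading, assuming a diagonal-covariance GMM. The function names and the toy parameters are hypothetical and are not taken from the paper. One function returns a likelihood vector with one entry per mixture component, the other a vector with one entry per feature dimension; either vector could then be passed to a discriminative classifier such as an SVM. MAP adaptation and the SVM stage are not shown.

```python
import math


def gaussian_logpdf_1d(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def per_mixture_likelihoods(frames, weights, means, variances):
    """Assumed variant A: one averaged log-likelihood per mixture component.

    The output length equals the number of GMM mixtures.
    """
    n_mix = len(weights)
    totals = [0.0] * n_mix
    for x in frames:
        for k in range(n_mix):
            # Weighted log-likelihood of frame x under component k
            # (diagonal covariance, so dimensions are summed independently).
            ll = math.log(weights[k])
            for xd, m, v in zip(x, means[k], variances[k]):
                ll += gaussian_logpdf_1d(xd, m, v)
            totals[k] += ll
    return [t / len(frames) for t in totals]


def per_dimension_likelihoods(frames, weights, means, variances):
    """Assumed variant B: one averaged log-likelihood per feature dimension.

    With diagonal covariances, the marginal of each dimension is a 1-D
    mixture, so the output length equals the frame feature dimension.
    """
    n_dim = len(frames[0])
    totals = [0.0] * n_dim
    for x in frames:
        for d in range(n_dim):
            p = sum(
                w * math.exp(gaussian_logpdf_1d(x[d], means[k][d], variances[k][d]))
                for k, w in enumerate(weights)
            )
            totals[d] += math.log(p)
    return [t / len(frames) for t in totals]


# Toy usage: a 2-component GMM over 2-dimensional frames (illustrative values).
toy_weights = [0.5, 0.5]
toy_means = [[0.0, 0.0], [1.0, 1.0]]
toy_vars = [[1.0, 1.0], [1.0, 1.0]]
toy_frames = [[0.0, 0.0], [0.2, -0.1]]

vec_mix = per_mixture_likelihoods(toy_frames, toy_weights, toy_means, toy_vars)
vec_dim = per_dimension_likelihoods(toy_frames, toy_weights, toy_means, toy_vars)
```

Under this reading, a GMM with 13 mixtures trained on 8-dimensional frames would yield a 13-dimensional vector from the first function and an 8-dimensional vector from the second.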