Voice conversion using structured Gaussian mixture model in eigen space
Abstract
Under the condition of non-parallel corpora without joint training, a new voice conversion method in eigen space based on a structured Gaussian mixture model is proposed. For every speaker, the cepstrum feature parameters are extracted and then mapped to the eigen space formed by the eigenvectors of the scatter matrix of the cepstrum features; the speaker's Structured Gaussian Mixture Model in the Eigen Space (SGMM-ES) is then trained in that space. The source and target speakers' SGMM-ES are trained separately, and the spectrum transform function is then derived from them based on the Acoustic Universal Structure (AUS) principle. Experimental results show that the average correct recognition rate of the converted speech reaches 95.25% and the average spectral distortion is 1.25, which correspond to improvements of 0.8% and 7.3%, respectively, relative to the SGMM method. ABX and MOS evaluations indicate that the conversion performance is quite close to that of the traditional method under the parallel corpora condition. The results show that voice conversion in the eigen space based on the structured Gaussian mixture model is effective under the non-parallel corpora condition.
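The eigen space mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `eigen_space_projection`, the mean-centering step, the ordering of eigenvectors by decreasing eigenvalue, and the random 12-dimensional frames standing in for cepstrum features are all assumptions made for illustration. The abstract states only that features are mapped to the space formed by the eigenvectors of their scatter matrix.

```python
import numpy as np


def eigen_space_projection(features):
    """Project feature frames onto the eigenvectors of their scatter matrix.

    features: array of shape (n_frames, n_dims), one feature vector per row.
    Returns the projected frames, the eigenvector basis (one eigenvector
    per column), and the eigenvalues in decreasing order.
    """
    # Assumption: the scatter matrix is computed on mean-centered frames.
    mean = features.mean(axis=0)
    centered = features - mean

    # Scatter matrix (unnormalized covariance), shape (n_dims, n_dims).
    scatter = centered.T @ centered

    # eigh is suited to symmetric matrices; it returns eigenvalues in
    # ascending order, so reorder them to descending (an assumed convention).
    eigvals, eigvecs = np.linalg.eigh(scatter)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Coordinates of each frame in the eigen space.
    projected = centered @ eigvecs
    return projected, eigvecs, eigvals


# Hypothetical stand-in data: 200 frames of 12-dimensional features.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 12))
projected, basis, eigvals = eigen_space_projection(frames)
```

In this sketch the projected features are decorrelated, since their scatter matrix is diagonal. A per-speaker mixture model would then be trained on `projected` rather than on the raw frames; that training step and the AUS-based transform are not shown here.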