Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Voice conversion using structured Gaussian mixture model in eigen space

Abstract: For voice conversion with non-parallel corpora and without joint training, a method based on a structured Gaussian mixture model in the cepstral eigenspace is proposed. After a speaker's cepstral features are extracted, the eigenvectors of their scatter matrix are computed to construct a cepstral eigenspace, in which a Structured Gaussian Mixture Model in Eigen Space (SGMM-ES) is trained. The SGMM-ES models of the source and target speakers, trained independently of each other, are then matched and aligned according to the Acoustic Universal Structure (AUS) principle, which yields a short-time spectral conversion function defined in the cepstral eigenspace. Experimental results show that converted speech is recognized as the target speaker at an average rate of 95.25%, with an average spectral distortion of 1.25. Relative to the SGMM method in the original cepstral feature space, these are improvements of 0.8% and 7.3%, respectively. ABX and MOS evaluations show that the conversion performance is very close to that of conventional methods trained on parallel corpora. These results indicate that a structured Gaussian mixture model in the cepstral eigenspace is effective for voice conversion with non-parallel corpora.
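The abstract names the processing chain but not its equations, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. The eigenspace is built from the scatter matrix of mean-removed cepstra, as described. GMM training itself (for example by EM) is omitted, and diagonal-covariance component parameters are assumed to be already estimated in the eigenspace. The AUS "structure" is assumed to be the matrix of pairwise Bhattacharyya distances between mixture components, matched here by brute-force permutation search. The conversion step (source posteriors weighting a per-component "target mean plus standard-deviation-scaled offset") is a simplified stand-in for the paper's conversion function. All function names are hypothetical.

```python
import itertools

import numpy as np


def cepstral_eigenspace(cep):
    """Eigenspace of a (frames x dims) cepstral matrix from its scatter matrix.

    Returns the feature mean and the eigenvectors (as columns), sorted by
    decreasing eigenvalue.
    """
    mean = cep.mean(axis=0)
    centred = cep - mean
    scatter = centred.T @ centred                # symmetric (dims x dims)
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def project(cep, mean, basis):
    """Map cepstral frames (rows) into the eigenspace."""
    return (cep - mean) @ basis


def reconstruct(coeffs, mean, basis):
    """Map eigenspace frames back to cepstra (the basis is orthonormal)."""
    return coeffs @ basis.T + mean


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    var = 0.5 * (var1 + var2)
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var)
    cov_term = 0.5 * (np.sum(np.log(var))
                      - 0.5 * (np.sum(np.log(var1)) + np.sum(np.log(var2))))
    return mean_term + cov_term


def structure_matrix(means, variances):
    """Assumed AUS structure: pairwise distances between all components."""
    k = len(means)
    return np.array([[bhattacharyya_diag(means[i], variances[i],
                                         means[j], variances[j])
                      for j in range(k)] for i in range(k)])


def match_components(src_struct, tgt_struct):
    """Permutation p (source i -> target p[i]) whose structures agree best.

    Brute force over all permutations: only feasible for a few components.
    """
    k = src_struct.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(k)):
        p = list(perm)
        cost = np.sum((src_struct - tgt_struct[np.ix_(p, p)]) ** 2)
        if cost < best_cost:
            best_perm, best_cost = p, cost
    return best_perm


def convert(x, weights, src_means, src_vars, tgt_means, tgt_vars, perm):
    """Posterior-weighted per-component mapping of eigenspace frames x."""
    x = np.atleast_2d(x)
    # Log-likelihood of every frame under every diagonal source component.
    log_lik = (np.log(weights)
               - 0.5 * np.sum(np.log(2 * np.pi * src_vars), axis=1)
               - 0.5 * np.sum((x[:, None, :] - src_means) ** 2 / src_vars, axis=2))
    log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_lik)
    post /= post.sum(axis=1, keepdims=True)        # (frames x components)
    mu_y, var_y = tgt_means[perm], tgt_vars[perm]  # target aligned to source
    scale = np.sqrt(var_y / src_vars)              # per-dimension std ratio
    mapped = mu_y + scale * (x[:, None, :] - src_means)  # (frames x k x dims)
    return np.einsum("fk,fkd->fd", post, mapped)
```

The Bhattacharyya distance between two Gaussians does not change when both undergo the same invertible affine transform. A target model that differs from the source by a per-dimension scale and shift therefore has the same structure matrix up to a relabelling of components, which is what allows independently trained models to be aligned without parallel data. Brute-force search grows as k!, so it only suits toy model sizes.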

     
