Voice conversion based on isolated speaker models
Abstract: A voice conversion method based on fully independent speaker models is proposed; it requires neither joint training nor a parallel speech corpus. In training, a Structured Gaussian Mixture Model (SGMM) is trained for each speaker using only that speaker's own speech data. In conversion, the source and target speaker models are matched and their Gaussian components aligned using the Acoustic Universal Structure (AUS), and the corresponding conversion function is then derived. ABX and MOS experiments show that the method achieves conversion performance close to that of conventional joint training on parallel corpora, and that the target speaker recognition accuracy on the converted speech reaches 94.5%. The results indicate that the proposed method offers good conversion performance together with a lower training cost and good system extensibility.
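The pipeline outlined in the abstract (per-speaker models, structure-based alignment of Gaussian components, then a conversion function) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration rather than the paper's formulation: diagonal covariances, the Bhattacharyya distance as the structure measure, brute-force permutation search for the alignment, and a posterior-weighted mean/variance mapping as the conversion function. The function names and the toy one-dimensional models are hypothetical.

```python
import math
from itertools import permutations


def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    d = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        v = 0.5 * (va + vb)
        d += 0.125 * (a - b) ** 2 / v + 0.5 * math.log(v / math.sqrt(va * vb))
    return d


def structure(means, variances):
    """Pairwise distance matrix between the components of one speaker's model."""
    k = len(means)
    return [[bhattacharyya(means[i], variances[i], means[j], variances[j])
             for j in range(k)] for i in range(k)]


def align(src_struct, tgt_struct):
    """Match source component i to target component perm[i] by comparing
    the two distance structures (brute force, assumed small component count)."""
    k = len(src_struct)

    def cost(perm):
        return sum((src_struct[i][j] - tgt_struct[perm[i]][perm[j]]) ** 2
                   for i in range(k) for j in range(k))

    return min(permutations(range(k)), key=cost)


def posteriors(x, weights, means, variances):
    """Posterior probability of each source component given frame x."""
    logs = []
    for w, m, v in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, m, v):
            ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(ll)
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x, src, tgt, perm):
    """Assumed conversion form: posterior-weighted mean/variance mapping
    between aligned source and target components."""
    w, ms, vs = src
    _, mt, vt = tgt
    post = posteriors(x, w, ms, vs)
    y = [0.0] * len(x)
    for k, p in enumerate(post):
        t = perm[k]
        for d in range(len(x)):
            scale = math.sqrt(vt[t][d] / vs[k][d])
            y[d] += p * (mt[t][d] + scale * (x[d] - ms[k][d]))
    return y


# Toy models as (weights, means, variances), trained independently in principle.
src = ([0.3, 0.3, 0.4], [[0.0], [1.0], [4.0]], [[1.0], [0.5], [2.0]])
# Hypothetical target: the source mapped by y = 2x + 5, components shuffled.
tgt = ([0.4, 0.3, 0.3], [[13.0], [5.0], [7.0]], [[8.0], [4.0], [2.0]])

perm = align(structure(src[1], src[2]), structure(tgt[1], tgt[2]))
converted = convert([1.5], src, tgt, perm)
```

On this toy data the two distance structures agree only under the correct component pairing, because the Bhattacharyya distance is unchanged when both Gaussians undergo the same invertible affine map; the alignment therefore needs no parallel frames. The brute-force search is factorial in the number of components, so a realistic system would need a scalable matching procedure.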