Voice conversion using dynamic kernel features and Bayesian maximum a posteriori estimation
Abstract: When training utterances are scarce, voice conversion based on the Mixture of Probabilistic Linear Regressions (MPLR) model is prone to overfitting. To address this, we replace the source speaker's cepstral features with dynamic kernel features and estimate the conversion function parameters by Bayesian maximum a posteriori (MAP) estimation. First, a kernel function maps the source speaker's features to dynamic kernel features. Next, prior knowledge about the conversion function parameters is introduced. Finally, two estimation methods for these parameters are derived, each under a different assumption about the conversion error. In objective evaluations, the proposed method reduces the average cepstral distortion by 4.25% relative to the MPLR-based conversion method. In subjective evaluations, it scores higher than MPLR on both the similarity and the naturalness of the converted speech. These results indicate that the proposed method effectively alleviates overfitting in voice conversion.
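The pipeline summarized in the abstract (kernel mapping, a prior on the conversion parameters, MAP estimation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's formulation: the Gaussian kernel, the use of first-order differences as the "dynamic" part, the zero-mean isotropic Gaussian prior, the isotropic Gaussian error model, and the names `kernel_features`, `add_deltas`, and `map_estimate`. Under those assumptions the MAP estimate reduces to ridge regression on the kernel features, which shows how a prior restrains the parameters when data are scarce.

```python
import numpy as np

def kernel_features(X, centers, gamma=0.5):
    """Map frames X (n, d) to Gaussian-kernel similarities with centers (m, d).

    The Gaussian kernel and gamma are illustrative assumptions.
    """
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def add_deltas(K):
    """Append frame-to-frame differences (an assumed notion of 'dynamic')."""
    delta = np.diff(K, axis=0, prepend=K[:1])  # first row of delta is zero
    return np.hstack([K, delta])

def map_estimate(Phi, Y, prior_precision=1.0, noise_var=1.0):
    """MAP weights for Y ~ Phi @ W, assuming W ~ N(0, I / prior_precision)
    and isotropic Gaussian error with variance noise_var."""
    lam = prior_precision * noise_var
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ Y)

# Toy data standing in for time-aligned source and target frames.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(40, 3))
Y_tgt = np.tanh(X_src) + 0.05 * rng.normal(size=(40, 3))
centers = X_src[::4]  # 10 reference frames, chosen arbitrarily

Phi = add_deltas(kernel_features(X_src, centers))
W_map = map_estimate(Phi, Y_tgt, prior_precision=1.0)    # informative prior
W_weak = map_estimate(Phi, Y_tgt, prior_precision=1e-6)  # nearly no prior
Y_hat = Phi @ W_map  # converted frames under the toy model
```

Comparing `np.linalg.norm(W_map)` with `np.linalg.norm(W_weak)` shows the shrinkage effect of the prior: the stronger prior yields smaller weights, which is the mechanism by which a MAP estimate can limit overfitting on small training sets.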