Research on whispered speech vocal tract system conversion based on a universal background model and effective Gaussian components
Abstract: Existing fixed-value mapping methods (method_F) perform poorly in speaker-independent whispered speech conversion. To improve vocal tract system conversion from Chinese whispered speech to normal speech, a conversion method based on the universal background model (UBM) is proposed, which builds a vocal tract conversion model that is independent of the speaker. Because a UBM has a large number of mixture components, the errors introduced by the acoustic probability density model cannot be ignored. A method for selecting effective Gaussian mixture components, based on posterior-probability summation under minimum spectral distortion, is therefore developed to optimize the conversion of feature vectors. The proposed method (method_U) is analyzed and compared using a performance index (PI) defined on the Itakura-Saito spectral distortion measure. Experiments show that the average PI of method_U is better than that of method_F, and that method_U is markedly more stable than method_F across speakers and phonemes. Selecting effective Gaussian mixture components further improves the PI of method_U by 5.11%. Subjective listening tests also show that the proposed method improves the clarity and intelligibility of the converted speech.
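The abstract does not give the exact definition of the PI, only that it is built on the Itakura-Saito measure. As a rough illustration, the sketch below computes the standard discrete Itakura-Saito distortion between two power spectra sampled on the same frequency grid. The function name `itakura_saito`, its arguments, and the `eps` floor are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def itakura_saito(p_ref, p_test, eps=1e-12):
    """Standard discrete Itakura-Saito distortion between two power spectra.

    p_ref, p_test: non-negative power spectra on the same frequency grid.
    eps: small floor (an assumption of this sketch) that avoids division
         by zero and log(0).
    """
    p_ref = np.asarray(p_ref, dtype=float) + eps
    p_test = np.asarray(p_test, dtype=float) + eps
    ratio = p_ref / p_test
    # Mean over frequency bins of (ratio - log(ratio) - 1); zero only when
    # the two spectra are identical, and not symmetric in its arguments.
    return float(np.mean(ratio - np.log(ratio) - 1.0))

# Hypothetical spectra, for illustration only.
reference = [1.0, 1.0]
converted = [2.0, 2.0]
d_forward = itakura_saito(reference, converted)
d_backward = itakura_saito(converted, reference)
```

The measure is asymmetric, so which spectrum is treated as the reference matters when comparing a converted spectrum against a target.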