Whisper-to-normal speech conversion based on low-dimensional feature mapping
Abstract: This work converts whispered speech into normal speech. To model the relationship between whisper and the corresponding normal speech, and to study how reducing the dimensionality of speech features affects the conversion, the spectral envelopes of whisper and of normal speech are each encoded with a sparse auto-encoder to extract low-dimensional features. Two BP (back-propagation) networks are then trained in this low-dimensional space. One models the mapping from the whisper spectral-envelope features to those of the corresponding normal speech. The other models the relation between the whisper spectral-envelope features and the pitch (fundamental frequency) of the normal speech. In the conversion stage, the whisper spectral envelope is sparsely encoded to obtain its low-dimensional feature. The trained BP networks then estimate the low-dimensional normal-speech feature and the pitch from it. Sparse decoding of the estimated feature yields the normal-speech spectral envelope, and normal speech is reconstructed from this envelope and the estimated pitch. Experimental results show that the cepstral distance between the converted speech and the normal speech is 10% lower than that of the GMM (Gaussian mixture model) based method. Subjective listening tests also show better naturalness and intelligibility for the proposed method.
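The training and conversion pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify network architectures, so the sketch assumes the following. Each sparse auto-encoder has one sigmoid hidden layer, a linear output, and the common KL-divergence sparsity penalty. Each BP network has one tanh hidden layer and a linear output. Both are trained by plain full-batch gradient descent in NumPy. The class and function names (`SparseAutoencoder`, `BPNet`, `convert`), all layer sizes, learning rates, and sparsity settings are hypothetical. The data are synthetic stand-ins for time-aligned whisper/normal frames, with a z-scored F0 target.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseAutoencoder:
    """Hypothetical one-hidden-layer auto-encoder with a KL sparsity penalty."""

    def __init__(self, n_in, n_hidden, rho=0.1, beta=0.5, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1, self.b1 = rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
        self.W2, self.b2 = rng.normal(0.0, 0.1, (n_hidden, n_in)), np.zeros(n_in)
        self.rho, self.beta, self.lr = rho, beta, lr

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)   # low-dimensional code in (0, 1)

    def decode(self, H):
        return H @ self.W2 + self.b2            # linear output for real-valued envelopes

    def loss(self, X):
        H = self.encode(X)
        rho_hat = H.mean(axis=0)
        kl = np.sum(self.rho * np.log(self.rho / rho_hat)
                    + (1 - self.rho) * np.log((1 - self.rho) / (1 - rho_hat)))
        return 0.5 * np.mean(np.sum((self.decode(H) - X) ** 2, axis=1)) + self.beta * kl

    def fit(self, X, epochs=500):
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            err = self.decode(H) - X
            rho_hat = H.mean(axis=0)
            # Gradient of beta * KL(rho || rho_hat) w.r.t. each hidden activation
            d_kl = self.beta * (-self.rho / rho_hat + (1 - self.rho) / (1 - rho_hat))
            dZ = (err @ self.W2.T + d_kl) * H * (1 - H) / n
            self.W2 -= self.lr * H.T @ err / n
            self.b2 -= self.lr * err.mean(axis=0)
            self.W1 -= self.lr * X.T @ dZ
            self.b1 -= self.lr * dZ.sum(axis=0)
        return self

class BPNet:
    """Hypothetical BP network: tanh hidden layer, linear output, MSE loss."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1, self.b1 = rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
        self.W2, self.b2 = rng.normal(0.0, 0.1, (n_hidden, n_out)), np.zeros(n_out)
        self.lr = lr

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def mse(self, X, Y):
        return float(np.mean((self.predict(X) - Y) ** 2))

    def fit(self, X, Y, epochs=500):
        n = X.shape[0]
        for _ in range(epochs):
            H = np.tanh(X @ self.W1 + self.b1)
            err = H @ self.W2 + self.b2 - Y
            dZ = (err @ self.W2.T) * (1 - H ** 2) / n   # back-propagated error
            self.W2 -= self.lr * H.T @ err / n
            self.b2 -= self.lr * err.mean(axis=0)
            self.W1 -= self.lr * X.T @ dZ
            self.b1 -= self.lr * dZ.sum(axis=0)
        return self

def convert(whisper_env, ae_whisper, ae_normal, spec_net, f0_net):
    """Encode whisper, map codes with the BP networks, decode with the normal-speech decoder."""
    z_w = ae_whisper.encode(whisper_env)
    z_n = spec_net.predict(z_w)                 # estimated low-dim normal-speech code
    return ae_normal.decode(z_n), f0_net.predict(z_w)

# Synthetic stand-in data (hypothetical): 200 aligned frames, 32-dim envelopes
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 4))
whisper = np.tanh(0.5 * latent @ rng.normal(size=(4, 32)))
normal = np.tanh(0.5 * latent @ rng.normal(size=(4, 32)))
f0 = 0.3 * latent @ rng.normal(size=(4, 1))    # z-scored F0 target (assumed)

ae_w, ae_n = SparseAutoencoder(32, 8), SparseAutoencoder(32, 8, seed=1)
loss_before = ae_w.loss(whisper)
ae_w.fit(whisper)
ae_n.fit(normal)
loss_after = ae_w.loss(whisper)

z_w, z_n = ae_w.encode(whisper), ae_n.encode(normal)
spec_net = BPNet(8, 16, 8)
mse_before = spec_net.mse(z_w, z_n)
spec_net.fit(z_w, z_n)
mse_after = spec_net.mse(z_w, z_n)
f0_net = BPNet(8, 16, 1, seed=2).fit(z_w, f0)

env_hat, f0_hat = convert(whisper, ae_w, ae_n, spec_net, f0_net)
print(env_hat.shape, f0_hat.shape)   # (200, 32) (200, 1)
```

The sketch trains a separate auto-encoder for each speech type, so the normal-speech decoder is never trained on whisper. Both BP networks take the same whisper code as input, which mirrors the abstract's use of whisper low-dimensional features to predict both the normal-speech features and the pitch. In practice the frames would come from a vocoder's spectral-envelope analysis after time alignment of parallel utterances. That front end is outside the scope of this sketch.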