Bone-conducted speech enhancement using WaveNet fused with phase information
Abstract: Existing bone-conducted speech enhancement algorithms focus mainly on enhancing the speech magnitude spectrum, so a phase mismatch at waveform synthesis degrades speech quality. To address this problem, a WaveNet model that fuses phase information is proposed to generate the enhanced bone-conducted speech waveform. The method builds on a bandwidth-extension WaveNet and fuses the phase spectrum information of the bone-conducted speech with the enhanced speech magnitude spectrum as the model's conditional features; the enhanced waveform is generated from the fused features, so the phase information is used effectively. Simulation experiments compare two phase features, the group delay spectrum and the instantaneous frequency deviation spectrum. Subjective and objective results show that, whether fused by concatenation or by convolution, the bone-conducted phase information effectively complements the original magnitude-spectrum conditional features and improves speech enhancement. The best result is obtained by fusing the group delay spectrum feature by concatenation: compared with the original bone-conducted speech, the mean opinion score (MOS) improves by about 54.3%.
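To make the two phase features concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give formulas, so the sketch assumes the commonly used definitions: the group delay is the negative wrapped phase difference between adjacent frequency bins, and the instantaneous frequency deviation is the wrapped frame-to-frame phase difference minus the expected phase advance of each bin. The function names (`wrap`, `stft`, `phase_features`, `fuse_concat`), the FFT size, and the hop length are all illustrative assumptions, and the log magnitude of the input stands in for the enhanced magnitude spectrum that the method would actually use.

```python
import numpy as np


def wrap(x):
    """Wrap angles to the principal interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def stft(x, n_fft=512, hop=128):
    """Plain Hann-windowed STFT; returns a (frames, bins) complex array."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * win for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def phase_features(spec, n_fft=512, hop=128):
    """Return (group delay, instantaneous frequency deviation) spectra.

    Assumed definitions (not taken from the paper):
      GD  = -wrap(phase[k + 1] - phase[k])            along frequency
      IFD =  wrap(phase[l] - phase[l - 1] - 2*pi*k*hop/n_fft)  along time
    """
    phase = np.angle(spec)

    # Group delay: the last bin has no upper neighbour, so its value is 0.
    gd = -wrap(np.diff(phase, axis=1, append=phase[:, -1:]))

    # Expected phase advance per hop for the centre frequency of each bin.
    k = np.arange(spec.shape[1])
    expected = 2 * np.pi * k * hop / n_fft

    # IFD: the first frame has no predecessor, so it is left at 0.
    ifd = np.zeros_like(phase)
    ifd[1:] = wrap(np.diff(phase, axis=0) - expected)
    return gd, ifd


def fuse_concat(magnitude_feat, phase_feat):
    """Concatenation fusion: stack the two features along the feature axis."""
    return np.concatenate([magnitude_feat, phase_feat], axis=1)


# Usage on a synthetic tone (a stand-in for a bone-conducted recording).
n = np.arange(4096)
signal = np.cos(2 * np.pi * 32 * n / 512)
spec = stft(signal)
gd, ifd = phase_features(spec)
log_mag = np.log(np.abs(spec) + 1e-8)  # stand-in for the enhanced magnitude
condition = fuse_concat(log_mag, gd)   # per-frame conditional feature
```

In this sketch each frame of `condition` holds the magnitude feature followed by the group delay feature. A convolutional fusion would instead pass the two feature maps through learned convolution layers before combining them, which is outside the scope of this example.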