Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Monaural speech enhancement combining deep neural network and convex optimization

  • Abstract: The accuracy of noise estimation directly determines how well a speech enhancement algorithm performs. This paper proposes a time-frequency masking optimization algorithm for monaural speech enhancement that combines a deep neural network (DNN) with convex optimization. Its aims are to improve the noise suppression of current speech enhancement algorithms and to solve the underlying unconstrained optimization problem effectively.

The algorithm proceeds in four steps. First, the power spectrum of the noisy speech is extracted as the input feature of the DNN. Second, the in-band inter-channel correlation (ICC) factor between the noise and the noisy speech is used as the training target of the DNN. Third, the correlation factors predicted by the trained DNN are used to construct the objective function of the convex optimization. Finally, a new hybrid conjugate gradient method, combining the DNN with convex optimization, iteratively refines the initial mask, and the refined mask is used to synthesize the enhanced speech.

Simulation results cover different background noises at low signal-to-noise ratio (SNR). Compared with the conventional approach before this improvement, the new mask gives the enhanced speech better scores on four metrics: log spectral distance (LSD), perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and segmental SNR (segSNR). Overall speech quality improves and noise is effectively suppressed.
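The abstract uses the in-band correlation between the noise N and the noisy speech Y as the DNN training target, but it does not give the exact formula. Below is a minimal sketch of how such a target could be computed at training time, when the noise and noisy STFT coefficients are both known. It assumes the definition ρ_b = |Σ N·conj(Y)| / sqrt(Σ|N|² · Σ|Y|²) over the bins of band b; this formula and the band layout are assumptions, not taken from the paper.

If speech and noise are uncorrelated, ρ_b² approximates the noise-to-noisy power ratio, so 1 − ρ_b² behaves like a Wiener-style gain. That gain mapping is also an assumption. The helper names `band_icc`, `frame_targets` and `icc_to_gain` are hypothetical.

```python
import math

def band_icc(noise_bins, noisy_bins):
    """Hypothetical in-band correlation between noise N and noisy speech Y.

    Both arguments hold the complex STFT coefficients of one frequency band
    (equal length). Returns |sum N*conj(Y)| / sqrt(sum|N|^2 * sum|Y|^2),
    which lies in [0, 1] by the Cauchy-Schwarz inequality.
    """
    cross = sum(n * y.conjugate() for n, y in zip(noise_bins, noisy_bins))
    p_noise = sum(abs(n) ** 2 for n in noise_bins)
    p_noisy = sum(abs(y) ** 2 for y in noisy_bins)
    if p_noise == 0 or p_noisy == 0:
        return 0.0  # silent band: define the correlation as zero
    return abs(cross) / math.sqrt(p_noise * p_noisy)

def frame_targets(noise_frame, noisy_frame, bands):
    """Per-band ICC targets for one frame; bands is a list of (start, stop) bin ranges."""
    return [band_icc(noise_frame[a:b], noisy_frame[a:b]) for a, b in bands]

def icc_to_gain(rho):
    """Assumed mapping: with uncorrelated speech and noise, rho^2 ~ noise/noisy power,
    so 1 - rho^2 ~ speech/noisy power (a Wiener-like gain)."""
    return max(0.0, 1.0 - rho * rho)
```

Take noise bins [1, 1] and noisy bins [2, 0], so the speech is [1, −1] and orthogonal to the noise. Then ρ = √0.5 and the gain is 0.5, which matches the true speech-to-noisy power ratio 2/4. In the proposed method the DNN would learn to predict ρ per band from the noisy power spectrum alone; the network and its training are not shown here.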

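The abstract does not state the convex objective function or the update rule of its new hybrid conjugate gradient method. The sketch below is therefore a hypothetical stand-in that only illustrates the mechanics.

It assumes a per-frame objective f(m) = Σ_k (m_k − g_k)² + λ Σ_{k≥1} (m_k − m_{k−1})². Here g_k is an initial band gain (for example 1 − ρ̂_k² from DNN-predicted ICC values ρ̂_k), and λ ≥ 0 is an assumed smoothness weight across adjacent bands. Because f is a convex quadratic, the problem is unconstrained and the exact line search has a closed form.

For the conjugate coefficient β, the sketch swaps in a classical hybrid of the Polak-Ribière-Polyak and Fletcher-Reeves formulas, β = max(0, min(β_PRP, β_FR)); this is not the paper's new method. The names `objective_grad`, `hessian_vec`, `refine_mask` and `lam` are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective_grad(m, g, lam):
    """Gradient of the assumed objective
    f(m) = sum_k (m_k - g_k)^2 + lam * sum_{k>=1} (m_k - m_{k-1})^2."""
    grad = [2.0 * (mk - gk) for mk, gk in zip(m, g)]
    for k in range(1, len(m)):
        diff = 2.0 * lam * (m[k] - m[k - 1])
        grad[k] += diff
        grad[k - 1] -= diff
    return grad

def hessian_vec(d, lam):
    """Product A d with the constant Hessian A; f is quadratic, so A d = grad f(d) at g = 0."""
    return objective_grad(d, [0.0] * len(d), lam)

def refine_mask(g, lam=0.5, tol=1e-10, max_iter=200):
    """Refine an initial band-gain mask g with a hybrid conjugate gradient method."""
    m = list(g)  # start from the initial mask
    grad = objective_grad(m, g, lam)
    d = [-x for x in grad]  # first direction: steepest descent
    for _ in range(max_iter):
        gg = dot(grad, grad)
        if gg <= tol * tol:
            break  # gradient norm below tol: converged
        # Exact line search along d, closed form for a quadratic objective.
        alpha = -dot(grad, d) / dot(d, hessian_vec(d, lam))
        m = [mk + alpha * dk for mk, dk in zip(m, d)]
        new_grad = objective_grad(m, g, lam)
        beta_fr = dot(new_grad, new_grad) / gg
        beta_prp = dot(new_grad, [a - b for a, b in zip(new_grad, grad)]) / gg
        beta = max(0.0, min(beta_prp, beta_fr))  # hybrid PRP/FR coefficient
        d = [-a + beta * dk for a, dk in zip(new_grad, d)]
        grad = new_grad
    return m
```

For example, `refine_mask([0.9, 0.1, 0.8, 0.2, 0.7], lam=0.5)` returns a smoothed mask with the same sum as the input. At the minimizer, (I + λL)m = g, where L is the path-graph Laplacian. Each refined gain is therefore a weighted average of the initial gains, so the mask stays within [0, 1] without explicit clipping. Setting `lam=0` leaves the initial mask unchanged.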