Blind speech source separation via nonlinear time-frequency masking

徐舜; 陈绍荣; 刘郁林

doi:10.15949/j.cnki.0371-0025.2007.04.015


Abstract

For the underdetermined convolutive mixture model of speech signals, this paper proposes a blind separation method based on nonlinear time-frequency masking. The method exploits the approximate W-disjoint orthogonality (W-DO) of independent speech signals in the time-frequency domain. First, the multi-microphone observations are normalized in the time-frequency domain so that the representation of the mixture in each time-frequency slot no longer depends on frequency. Next, a dynamic clustering algorithm determines which source is active in each time-frequency slot. Finally, a nonlinear function of the deflection angle from the cluster center is used as the time-frequency mask, which completes the blind separation of the speech signals. The method overcomes the frequency permutation problem of classic frequency-domain blind separation algorithms and effectively suppresses the spatial direction diffusion of the separation matrix. Simulation results show better separation performance than the BLUES method, with the signal-to-noise ratio gain (SNRG) improved by 1.58 dB on average.
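The abstract names three steps (frequency-independent normalization, dynamic clustering, nonlinear masking by deflection angle) but gives no formulas. The sketch below strings them together in NumPy under explicitly assumed choices, none of which is taken from the paper: the normalization divides each sensor's phase relative to sensor 0 by 4·f·d_max/c; plain k-means with farthest-first initialization stands in for the dynamic clustering; the mask is exp(-(θ/σ)²) applied only to the closest cluster. The function names, `d_max`, `c = 343` m/s and `sigma = 0.3` are hypothetical. It works directly on STFT coefficients and omits the forward and inverse STFT.

```python
import numpy as np


def normalize_tf(X, freqs, d_max, c=343.0, eps=1e-12):
    """Map each TF slot's sensor vector to a frequency-independent unit vector.

    ASSUMED form, not taken from the paper: keep magnitudes, take each
    sensor's phase relative to sensor 0, divide it by 4*f*d_max/c so a pure
    propagation delay yields the same phase at every frequency, then scale
    the vector to unit norm. X: complex (M, F, T); freqs: (F,) Hz, all > 0.
    """
    rel_phase = np.angle(X * np.conj(X[0:1]))          # phase relative to sensor 0
    scale = 4.0 * freqs[None, :, None] * d_max / c     # frequency-dependent divisor
    Xbar = np.abs(X) * np.exp(1j * rel_phase / scale)
    return Xbar / (np.linalg.norm(Xbar, axis=0, keepdims=True) + eps)


def kmeans_complex(Z, K, n_iter=50):
    """Plain k-means on complex rows Z (N, M), standing in for 'dynamic clustering'."""
    centers = [Z[0]]
    for _ in range(1, K):                              # farthest-first initialization
        d2 = np.min([np.sum(np.abs(Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    C = np.array(centers)
    for _ in range(n_iter):
        dist = np.sum(np.abs(Z[:, None, :] - C[None, :, :]) ** 2, axis=2)  # (N, K)
        labels = np.argmin(dist, axis=1)
        for k in range(K):
            if np.any(labels == k):                    # keep old center if cluster empties
                C[k] = Z[labels == k].mean(axis=0)
    return C


def nonlinear_masks(Z, C, sigma=0.3):
    """Masks (K, N) from the deflection angle between each slot and each center.

    HYPOTHETICAL nonlinearity: under W-DO each slot goes only to its closest
    center, weighted by exp(-(theta / sigma)**2); all other masks are 0.
    """
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    cos = np.clip(np.abs(Z @ Cn.conj().T), 0.0, 1.0)  # |c^H z| for unit vectors
    theta = np.arccos(cos)                             # deflection angle, radians
    idx = np.arange(Z.shape[0])
    nearest = np.argmin(theta, axis=1)
    masks = np.zeros((C.shape[0], Z.shape[0]))
    masks[nearest, idx] = np.exp(-(theta[idx, nearest] / sigma) ** 2)
    return masks


def separate(X, freqs, d_max, K, c=343.0, sigma=0.3):
    """Return K masked copies of the sensor-0 spectrogram, shape (K, F, T)."""
    M, F, T = X.shape
    Z = normalize_tf(X, freqs, d_max, c).reshape(M, F * T).T  # one row per slot
    C = kmeans_complex(Z, K)                  # one clustering for ALL frequencies
    masks = nonlinear_masks(Z, C, sigma)
    return masks.reshape(K, F, T) * X[0][None]


# Synthetic check: 2 sensors, 2 anechoic sources with exactly disjoint TF support.
rng = np.random.default_rng(0)
F, T, d, c = 64, 40, 0.04, 343.0
freqs = np.linspace(100.0, 4000.0, F)          # below c / (2 d), so no phase wrapping
S = rng.standard_normal((2, F, T)) + 1j * rng.standard_normal((2, F, T))
owner = rng.random((F, T)) < 0.5               # True: slot belongs to source 0
S[0][~owner] = 0
S[1][owner] = 0
taus = np.array([0.5, -0.5]) * d / c           # sensor-1 delays relative to sensor 0
X = np.zeros((2, F, T), dtype=complex)
for k in range(2):
    X[0] += S[k]
    X[1] += S[k] * np.exp(-2j * np.pi * freqs[:, None] * taus[k])
Y = separate(X, freqs, d, K=2)
```

Because the normalized features do not depend on frequency in this sketch, all time-frequency slots are clustered at once. The only ordering ambiguity left is therefore a single global one across the outputs, rather than a separate permutation per frequency bin. On this exactly W-disjoint synthetic mixture, `Y` matches `S` up to that one global ordering. Reverberant recordings satisfy W-DO only approximately, which is where a soft, angle-dependent weighting would matter.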
