A generalization algorithm from binary time-frequency masking to ratio masking based on noise tracking
Abstract: Although ratio masking can achieve better speech separation than binary masking, existing speech separation systems usually take the Ideal Binary Mask (IBM) as the computational goal, because the Ideal Ratio Mask (IRM) is difficult to estimate directly. In this paper, we propose an algorithm that generalizes a binary mask to a ratio mask. Since the key issue in ratio mask estimation is noise power tracking, we first use an exponential distribution to model the conditional distribution of the noise power, given the mixture power and the binary mask as observations. We then use a Gaussian Markov Random Field (GMRF) to model the correlation of the noise estimates between adjacent units over several consecutive frames. Finally, we apply a Markov Chain Monte Carlo (MCMC) method to compute the minimum mean square error (MMSE) estimate of the noise power, from which the ratio mask is computed. Experiments show that the proposed algorithm significantly outperforms a conventional method based on binary mask estimation in terms of both SNR gain and PESQ (objective perceptual quality) scores.
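To make the two kinds of mask concrete, the following is a minimal sketch and not the proposed algorithm itself. It assumes that the mixture power in each time-frequency unit is approximately the sum of the speech power and the noise power, and that the ratio mask is defined as the speech-to-mixture power ratio (one common convention). The function names, the `lc_db` threshold and the `floor` parameter are hypothetical, introduced only for illustration. The noise estimate is simply passed in; the exponential model, the GMRF and the MCMC sampling that would produce it are not shown.

```python
def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Binary mask: 1 where the local SNR exceeds lc_db (in dB), else 0.

    Assumption: lc_db is a hypothetical local-criterion parameter.
    """
    threshold = 10 ** (lc_db / 10)
    return [1 if s > threshold * n else 0
            for s, n in zip(speech_power, noise_power)]


def ratio_mask_from_noise(mixture_power, noise_estimate, floor=0.0):
    """Ratio mask (Y - N_hat) / Y per unit, clipped to [floor, 1].

    Assumption: mixture power Y is roughly speech power plus noise power,
    so Y - N_hat approximates the speech power in the unit.
    """
    mask = []
    for y, n_hat in zip(mixture_power, noise_estimate):
        if y <= 0:
            # No energy in this unit: nothing to pass through.
            mask.append(floor)
            continue
        m = (y - n_hat) / y
        mask.append(min(1.0, max(floor, m)))
    return mask


# Toy values for three time-frequency units (illustrative only).
speech = [3.0, 0.0, 0.5]
noise = [1.0, 1.0, 1.5]
mixture = [s + n for s, n in zip(speech, noise)]

binary = ideal_binary_mask(speech, noise)
ratio = ratio_mask_from_noise(mixture, noise)
```

With an exact noise estimate, the ratio mask keeps a graded fraction of each unit (0.75 for the first unit here), whereas the binary mask can only keep or discard the whole unit. This is why the quality of the noise power estimate determines how well the ratio mask can be recovered.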