Convolutional recurrent network-based complex stereophonic acoustic echo cancellation with a two-stage approach
Abstract: We propose a two-stage convolutional recurrent network (CRN) with complex spectral input features for stereophonic acoustic echo cancellation (SAEC). The proposed algorithm addresses the non-unique solution problem of adaptive-filter-based SAEC without decorrelating the far-end stereo signals, and therefore preserves stereo sound quality and spatial perception. Echo cancellation is performed in two stages. In the first stage, a CRN estimates the echo signal from the microphone signal and the far-end reference signals. In the second stage, a CRN estimates the near-end speech from the microphone signal together with the echo estimated in the first stage. Compared with a single-stage CRN, using the estimated echo as a priori information improves the network's ability to discriminate between echo and near-end speech, which benefits near-end speech extraction. In addition, both the input features and the training targets of the network are complex spectra, which reduces the phase estimation error of the near-end speech and further improves performance. Experimental results show that the proposed two-stage complex-spectrum CRN achieves significantly better echo suppression during single-talk periods and better speech quality during double-talk periods than both traditional algorithms and a single-stage CRN.
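The two-stage data flow described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The 320-point frame, 160-sample hop, Hann window, the stacking of real and imaginary parts as separate input channels, and the helper names (`stft`, `to_channels`, `from_channels`, `two_stage_saec`) are all hypothetical choices. The CRNs themselves are not implemented. `stage1` and `stage2` stand for any trained models that map a stacked real/imaginary input tensor to the real and imaginary parts of a target spectrum, which is one way to realize complex spectral targets.

```python
import numpy as np

N_FFT = 320  # assumed frame length (20 ms at 16 kHz)
HOP = 160    # assumed hop size (50% overlap)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex STFT of a 1-D signal, shape (frames, n_fft // 2 + 1), periodic Hann window."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def to_channels(*specs):
    """Stack real and imaginary parts of each complex spectrum: (2 * len(specs), T, F)."""
    return np.stack([part for s in specs for part in (s.real, s.imag)], axis=0)


def from_channels(feat):
    """Turn a (2, T, F) real/imaginary output back into a complex (T, F) spectrum."""
    return feat[0] + 1j * feat[1]


def two_stage_saec(mic, ref_left, ref_right, stage1, stage2):
    """Hypothetical two-stage pipeline; stage1/stage2 stand in for trained CRNs."""
    Y = stft(mic)
    X_l, X_r = stft(ref_left), stft(ref_right)
    # Stage 1: microphone + left/right far-end references (6 channels) -> echo spectrum.
    echo_hat = from_channels(stage1(to_channels(Y, X_l, X_r)))
    # Stage 2: microphone + stage-1 echo estimate (4 channels) -> near-end spectrum.
    near_hat = from_channels(stage2(to_channels(Y, echo_hat)))
    return echo_hat, near_hat


# Placeholder "models" that only exercise the data flow; they are NOT CRNs.
def stage1(feat):
    return feat[2:4]  # pass the left reference spectrum through as the "echo"


def stage2(feat):
    return feat[0:2] - feat[2:4]  # microphone spectrum minus the echo estimate


rng = np.random.default_rng(0)
near = rng.standard_normal(16000)
ref_l = rng.standard_normal(16000)
ref_r = 0.5 * ref_l   # strongly correlated stereo references
mic = ref_l + near    # toy echo path: identity on the left channel
echo_hat, near_hat = two_stage_saec(mic, ref_l, ref_r, stage1, stage2)
print(near_hat.shape, np.allclose(near_hat, stft(near)))
```

The placeholders recover the near-end spectrum exactly only because the toy echo path is an identity on the left channel and the STFT is linear. In a real system both stages would be trained CRNs, and an inverse STFT would convert `near_hat` back to a waveform. Feeding the stage-1 echo estimate alongside the microphone spectrum is what lets stage 2 use the echo as prior information, as the abstract describes.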