Delay estimation using an encoder-temporal modeling structure for acoustic echo cancellation
Abstract: A delay estimation method based on an encoder-temporal modeling structure is proposed to estimate the delay of the microphone signal relative to the far-end signal in acoustic echo cancellation. The method takes the far-end signal and the microphone signal in the short-time Fourier transform domain as input features. An encoder composed of complex-valued convolutional neural networks extracts high-dimensional features that carry phase information, and the memory of a recurrent neural network is used to learn the delay relationship between the two input signals, so that a mapping from signals to delay is constructed. Simulation results show that, compared with WebRTC-DE and GCC-PHAT, the proposed method has the following advantages: (1) the number of parameters and the computational cost of the model are not affected by the delay length; (2) the convergence time and tracking time of delay estimation are effectively reduced; (3) under long reverberation and double-talk, the estimation error and its standard deviation are smaller and more stable. Experiments in which the proposed delay estimation module is cascaded with adaptive echo cancellation verify the effectiveness of the new method.
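For orientation, the GCC-PHAT baseline named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the textbook technique, not the configuration evaluated in the paper: the function name `gcc_phat_delay`, the `max_delay` search bound, the white-noise test signal, and the 37-sample delay are all assumptions chosen for the example.

```python
import numpy as np


def gcc_phat_delay(far_end, mic, max_delay):
    """Estimate by how many samples `mic` lags `far_end` (hypothetical helper)."""
    # Zero-pad to the combined length so the circular correlation does not wrap.
    n = len(far_end) + len(mic)
    far_spec = np.fft.rfft(far_end, n=n)
    mic_spec = np.fft.rfft(mic, n=n)

    # Cross-power spectrum; a pure delay d appears as the phase term exp(-j*w*d).
    cross = mic_spec * np.conj(far_spec)
    # PHAT weighting: discard magnitude and keep only the phase.
    cross /= np.abs(cross) + 1e-12

    corr = np.fft.irfft(cross, n=n)
    # Search non-negative lags only: the echo cannot precede the far-end signal.
    return int(np.argmax(corr[: max_delay + 1]))


# Synthetic check: the "microphone" is an attenuated, delayed, slightly noisy copy.
rng = np.random.default_rng(0)
far_end = rng.standard_normal(4096)
true_delay = 37  # samples, assumed for illustration
mic = 0.5 * np.concatenate([np.zeros(true_delay), far_end])[: len(far_end)]
mic += 0.01 * rng.standard_normal(len(mic))

estimated = gcc_phat_delay(far_end, mic, max_delay=200)
print(estimated)
```

Note that the length of the correlation search grows with `max_delay`, so the cost of this kind of estimator depends on the longest delay it must cover. Advantage (1) in the abstract is the claim that the proposed model avoids that dependence.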