Citation: LIU Yang, YANG Feiran, YANG Jun. Delay estimation using encoder-temporal modeling structure for acoustic echo cancellation[J]. ACTA ACUSTICA, 2023, 48(5): 1036-1044. DOI: 10.12395/0371-0025.2022045
A delay estimation method based on an encoder-temporal modeling structure is proposed to estimate the delay of the microphone signal relative to the far-end signal in acoustic echo cancellation. The method takes the short-time Fourier transform (STFT) representations of the far-end signal and the microphone signal as input features. An encoder composed of complex convolutional neural networks extracts high-dimensional features that retain phase information, and the memory capability of a recurrent neural network is used to learn the time-delay relationship between the two input signals. In this way the method constructs a mapping from the signals to the delay. Simulation results show that the proposed method has the following advantages over WebRTC-DE and GCC-PHAT: (1) the number of parameters and the computational complexity of the model are not affected by the delay; (2) the convergence time and tracking time of delay estimation are effectively reduced; (3) smaller and more stable estimation errors and standard deviations are achieved under long reverberation times and double-talk. Experiments in which an adaptive echo canceller is cascaded with the proposed delay estimation module verify the effectiveness of the new method.
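For orientation, the GCC-PHAT baseline named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the generalized cross-correlation with phase transform, not the authors' network and not necessarily the baseline configuration used in the paper. The function name `gcc_phat_delay`, the `max_delay_s` parameter, and the 16 kHz white-noise test signal are assumptions made for this example only.

```python
import numpy as np

def gcc_phat_delay(mic, far, fs, max_delay_s=None):
    """Estimate the delay (seconds) of `mic` relative to `far` with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means that
    `mic` lags behind `far`.
    """
    # Zero-pad so that the circular correlation behaves like a linear one.
    n = len(mic) + len(far)
    nfft = 1 << (n - 1).bit_length()

    mic_spec = np.fft.rfft(mic, nfft)
    far_spec = np.fft.rfft(far, nfft)

    # Cross-spectrum with PHAT weighting: discard magnitude, keep phase.
    cross = mic_spec * np.conj(far_spec)
    cross /= np.abs(cross) + 1e-12

    cc = np.fft.irfft(cross, nfft)

    # Restrict the search range (assumed parameter, defaults to +/- nfft/2).
    max_shift = nfft // 2
    if max_delay_s is not None:
        max_shift = max(1, min(int(fs * max_delay_s), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

# Usage: a synthetic far-end signal delayed by 320 samples (20 ms at 16 kHz).
fs = 16000
rng = np.random.default_rng(0)
far = rng.standard_normal(fs)
true_delay = 320
mic = np.concatenate((np.zeros(true_delay), far[:-true_delay]))
mic = mic + 0.01 * rng.standard_normal(fs)  # small additive noise
estimated = gcc_phat_delay(mic, far, fs, max_delay_s=0.5)
```

In this sketch the cross-correlation is computed over the full search range in one shot, so the cost grows with the maximum delay that must be covered. That is the dependence on delay that the abstract contrasts with the proposed model.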