Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

LIU Yang, YANG Feiran, YANG Jun. Delay estimation using encoder-temporal modeling structure for acoustic echo cancellation[J]. ACTA ACUSTICA, 2023, 48(5): 1036-1044. DOI: 10.12395/0371-0025.2022045

Delay estimation using encoder-temporal modeling structure for acoustic echo cancellation

More Information
  • PACS: 
    • 43.60  (Acoustic signal processing)
    • 43.72  (Speech processing and communication systems)
  • Received Date: July 21, 2022
  • Revised Date: November 21, 2022
  • Available Online: September 11, 2023
  • A delay estimation method based on an encoder-temporal modeling structure is proposed to estimate the delay of the microphone signal relative to the far-end signal in acoustic echo cancellation. The far-end signal and the microphone signal, both in the short-time Fourier transform domain, are used as input features. An encoder composed of complex convolutional neural networks extracts high-dimensional features that retain phase information, and the memory of a recurrent neural network is used to learn the time-delay relationship between the two input signals. The method thus constructs a mapping from the signals to the delay. Simulation results show that the proposed method has the following advantages over WebRTC-DE and GCC-PHAT: (1) the number of parameters and the computational complexity of the model are not affected by the delay; (2) the convergence time and tracking time of delay estimation are effectively reduced; (3) the estimation error and its standard deviation are smaller and more stable under long reverberation times and double-talk. Experiments in which adaptive echo cancellation is cascaded with the proposed delay estimation module verify the effectiveness of the new method.
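
For orientation, the GCC-PHAT baseline named in the abstract can be sketched in a few lines. This is a minimal, dependency-free illustration of the textbook generalized cross-correlation with phase transform, not the paper's proposed network and not the authors' implementation of the baseline. The function names `fft` and `gcc_phat_delay`, the `max_delay` search bound, and the single-block (non-streaming) processing are all assumptions made for the example; a practical version would use an optimized FFT library and smooth the cross-spectrum over frames.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (hypothetical helper); len(x) must be a power of two.

    The inverse transform is left unscaled, which does not affect peak picking.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def gcc_phat_delay(far_end, mic, max_delay):
    """Estimate by how many samples `mic` lags `far_end` (assumed name and API)."""
    # Zero-pad to a power of two so the circular correlation does not wrap around.
    n = 1
    while n < len(far_end) + len(mic):
        n *= 2
    X = fft(list(far_end) + [0.0] * (n - len(far_end)))
    Y = fft(list(mic) + [0.0] * (n - len(mic)))
    # Cross-spectrum with PHAT weighting: keep the phase, discard the magnitude.
    cross = [y * x.conjugate() for x, y in zip(X, Y)]
    phat = [c / (abs(c) + 1e-12) for c in cross]
    corr = fft(phat, inverse=True)
    # Search non-negative lags only: the echo cannot precede the far-end signal.
    return max(range(max_delay + 1), key=lambda lag: corr[lag].real)
```

Calling `gcc_phat_delay(far, mic, max_delay=100)` on a far-end noise sequence and a copy of it that has been attenuated and shifted by a fixed number of samples returns that shift. The example covers only a pure delay; reverberation and double-talk, the conditions under which the abstract reports that GCC-PHAT degrades, are not modeled.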
