Indexed by EI / SCOPUS / CSCD

Chinese Core Journal

SHI Wenhua, ZHANG Xiongwei, ZOU Xia, SUN Meng, LI Li. Time frequency masking based speech enhancement using deep encoder-decoder neural network[J]. ACTA ACUSTICA, 2020, 45(3): 299-307. DOI: 10.15949/j.cnki.0371-0025.2020.03.002

Time frequency masking based speech enhancement using deep encoder-decoder neural network

  • A speech enhancement method based on time-frequency mask estimation with a deep encoder-decoder neural network is presented. In this method, the time-frequency mask is estimated by a deep encoder-decoder neural network and combined with the amplitude spectrum of the noisy speech to learn the nonlinear mapping between noisy and target speech. The deep encoder-decoder neural network is built from convolutional and deconvolutional structures. In the encoder, the local receptive fields of the convolutions are used to model the typical structural features of noisy speech in the time-frequency domain, extracting speech features while suppressing background noise. In the decoder, the speech signal is reconstructed from the extracted features and the local details of the speech are recovered layer by layer. In addition, skip connections are introduced between corresponding encoder and decoder layers to reduce the loss of low-level detail caused by pooling and fully connected operations. Experiments use speech from the TIMIT database and noise from the NOISEX-92 database. The simulation results show that the proposed method effectively suppresses noise and recovers the detailed information of speech.
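The step "the estimated mask is combined with the amplitude spectrum of the noisy speech" can be illustrated with a small sketch. The abstract does not say which mask the network predicts, so this example assumes the widely used ideal ratio mask (IRM) with exponent `beta = 0.5` as a stand-in training target, and assumes the noisy phase is reused for reconstruction. In the method described above the mask would come from the encoder-decoder network rather than from clean speech; the function names `ideal_ratio_mask` and `apply_mask` are hypothetical, and spectra are plain nested lists (frames x frequency bins) so the sketch needs only the standard library.

```python
import cmath
from typing import List

Matrix = List[List[float]]  # frames x frequency bins


def ideal_ratio_mask(speech_mag: Matrix, noise_mag: Matrix, beta: float = 0.5) -> Matrix:
    """Assumed training target: IRM = (|S|^2 / (|S|^2 + |N|^2)) ** beta per T-F bin."""
    eps = 1e-12  # keeps bins where both speech and noise are zero from dividing by zero
    return [
        [(s * s / (s * s + n * n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]


def apply_mask(mask: Matrix, noisy_mag: Matrix, noisy_phase: Matrix) -> List[List[complex]]:
    """Scale the noisy magnitude by the mask (clipped to [0, 1]) and reuse the noisy phase.

    Reusing the noisy phase is an assumption of this sketch, not a detail from the abstract.
    """
    enhanced = []
    for m_row, y_row, p_row in zip(mask, noisy_mag, noisy_phase):
        enhanced.append([
            cmath.rect(min(max(m, 0.0), 1.0) * y, p)  # enhanced magnitude with noisy phase
            for m, y, p in zip(m_row, y_row, p_row)
        ])
    return enhanced
```

For example, a bin with speech magnitude 3 and noise magnitude 4 gets a mask of about 0.6, so a noisy magnitude of 5 there is scaled to 3. Clipping to [0, 1] matches the IRM's range; a network whose output layer uses a sigmoid produces values in the same range. The complex spectrum returned by `apply_mask` would then be passed to an inverse STFT to obtain the enhanced waveform.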