Short-time acoustic scene recognition method using multi-scale feature fusion

WANG Meng; ZHANG Pengyuan

doi:10.15949/j.cnki.0371-0025.2022.06.002

WANG Meng, ZHANG Pengyuan. Short-time acoustic scene recognition method using multi-scale feature fusion[J]. ACTA ACUSTICA, 2022, 47(6): 717-726. DOI: 10.15949/j.cnki.0371-0025.2022.06.002

Abstract

To address the poor recognition performance on the short-time acoustic scene recognition task, a method using multi-scale feature fusion is proposed. First, the method takes the sum and difference of the left and right channels of the stereo audio as input, and uses a long frame length during framing so that the extracted frame-level features contain enough audio information. The features are then fed frame by frame into a one-dimensional convolutional neural network that uses multi-scale feature fusion to make full use of the shallow, middle, and deep embeddings at different scales in the network. Finally, all frame-level soft labels are integrated to obtain the scene label of the audio. Experimental results show that the method reaches an accuracy of 79.02% on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 short-time audio scene dataset, which is the best performance reported on this dataset so far.
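The input preparation and the final label integration described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the frame and hop lengths, and the use of a plain average to integrate the frame-level soft labels are all assumptions made for illustration, and the one-dimensional convolutional network itself is omitted.

```python
import numpy as np


def sum_difference(stereo):
    """Turn a (2, n_samples) stereo signal into its sum and difference channels."""
    left, right = stereo[0], stereo[1]
    return np.stack([left + right, left - right])


def frame_signal(x, frame_len, hop_len):
    """Split a (channels, n_samples) signal into (n_frames, channels, frame_len).

    A trailing partial frame is dropped. frame_len and hop_len are in samples;
    the abstract only says the frame length is "long", so the values are
    left to the caller (assumption).
    """
    n_samples = x.shape[-1]
    if n_samples < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (n_samples - frame_len) // hop_len
    return np.stack(
        [x[:, i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]
    )


def integrate_soft_labels(frame_probs):
    """Combine (n_frames, n_classes) soft labels into one clip-level label.

    Averaging is an assumed integration rule; the abstract does not say
    how the frame-level soft labels are combined.
    """
    clip_probs = frame_probs.mean(axis=0)
    return int(np.argmax(clip_probs)), clip_probs
```

In use, each frame returned by `frame_signal` would be converted to frame-level features and passed through the network, and the per-frame class probabilities would then go to `integrate_soft_labels`.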
