The self-adaptation of acoustic encoder in end-to-end automatic speech recognition under diverse acoustic scenes

LIU Yukun; ZHENG Lin; LI Ta; ZHANG Pengyuan

doi:10.12395/0371-0025.2022114

LIU Yukun, ZHENG Lin, LI Ta, ZHANG Pengyuan. The self-adaptation of acoustic encoder in end-to-end automatic speech recognition under diverse acoustic scenes[J]. ACTA ACUSTICA, 2023, 48(6): 1260-1268. DOI: 10.12395/0371-0025.2022114


Abstract

This paper proposes a scene-adaptive acoustic encoder (SAE) for diverse speech scenes. The method learns how acoustic features differ across acoustic scenes and uses those differences to adaptively design an appropriate acoustic encoder for end-to-end speech recognition. Applying neural architecture search improves both the effectiveness of encoder design and the performance of the downstream recognition task. Experiments on three commonly used Chinese and English datasets, Aishell-1, HKUST and SWBD, show that the proposed SAE achieves an average relative character error rate reduction of 5% compared with the best human-designed encoders. The results indicate that the proposed method is effective for analyzing acoustic features in specific scenes and for the targeted design of high-performance acoustic encoders.
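
For readers unfamiliar with the metric, a relative character error rate (CER) reduction is measured as a fraction of the baseline CER, not as a difference in absolute percentage points. The following is a minimal sketch of that arithmetic. The function name and the numbers are hypothetical and chosen only for illustration; they are not results reported in the paper.

```python
def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Return the CER reduction as a fraction of the baseline CER."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical values for illustration only (not taken from the paper).
baseline = 10.0  # CER (%) of a human-designed encoder
adapted = 9.5    # CER (%) of a scene-adapted encoder

reduction = relative_cer_reduction(baseline, adapted)
print(f"relative CER reduction: {reduction:.1%}")
```

In this hypothetical case the absolute improvement is 0.5 percentage points, while the relative reduction is about 5% of the baseline.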
