Citation: QIU Zelin, YAO Dingding, LI Junfeng. Two-stage brain-controlled speech enhancement with integrated dual-view features[J]. ACTA ACUSTICA, 2025, 50(2): 362-372. DOI: 10.12395/0371-0025.2024269

Two-stage brain-controlled speech enhancement with integrated dual-view features

  • A two-stage brain-controlled speech enhancement method that integrates dual-view features is proposed. First, a speech separation algorithm splits the mixed speech signal into candidate streams. An end-to-end speech enhancement module incorporating dual-view features then performs auditory attention decoding, and, based on the decoding result, the separated stream corresponding to the attended speaker is selected for output. The enhancement module extracts from the electroencephalogram (EEG) signals both dynamic features related to variations in speech energy and static features associated with the attended speaker's vocal characteristics, enabling attention information to be matched against the mixed speech more effectively. Because the enhancement module is used solely for decoding, the proposed method effectively captures auditory attention while, compared with existing methods, reducing the adverse effect of EEG signals on output speech quality. Experimental results on the dataset of the 2024 Sparse Brain-Assisted Speech Enhancement Challenge show that the proposed method improves the scale-invariant signal-to-distortion ratio (SI-SDR) of the target speech by 18.08 dB, 6.44 dB more than existing methods. Moreover, the method maintains high output speech quality even with fewer EEG channels or EEG signals with lower signal-to-noise ratios. A minimal illustrative sketch of the pipeline is given below.
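The following sketch illustrates the two-stage structure described in the abstract: separate the mixture first, then use EEG-based auditory attention decoding only to select one of the separated streams. Everything beyond that structure is an assumption made for illustration: the function names, sample rates, the toy separator, the correlation-based decoder (the paper instead decodes via an end-to-end enhancement module), and the omission of the static speaker-feature view for brevity. The SI-SDR definition matches the metric quoted in the abstract.

```python
"""Illustrative sketch (not the authors' implementation) of a two-stage
brain-controlled speech enhancement pipeline: Stage 1 separates the mixture;
Stage 2 decodes auditory attention from EEG and selects a separated stream."""
import numpy as np

FS = 16000     # audio sample rate (assumed)
EEG_FS = 128   # EEG sample rate (assumed)

def separate_sources(mixture):
    """Stage 1 (placeholder): the paper uses a learned speech-separation
    model; here we fabricate two candidate streams for a runnable demo."""
    rng = np.random.default_rng(0)
    return [mixture + 0.01 * rng.standard_normal(mixture.shape),
            rng.standard_normal(mixture.shape)]

def envelope(x, fs, out_fs):
    """Dynamic view: a broadband energy envelope downsampled to the EEG rate,
    standing in for the energy-variation features described in the abstract."""
    hop = fs // out_fs
    n = (len(x) // hop) * hop
    return np.sqrt(np.mean(x[:n].reshape(-1, hop) ** 2, axis=1))

def decode_attention(eeg, candidates, fs, eeg_fs):
    """Stage 2 (assumed decoder): correlate an EEG-derived envelope estimate
    with each candidate's envelope and pick the best match. Note that, as in
    the paper, the decoder output is used only for selection; the returned
    index points at a *separated* stream, so EEG noise never reaches the audio."""
    eeg_env = eeg.mean(axis=0)  # crude stand-in for a learned EEG-to-envelope mapping
    scores = []
    for cand in candidates:
        env = envelope(cand, fs, eeg_fs)
        m = min(len(env), len(eeg_env))
        scores.append(np.corrcoef(env[:m], eeg_env[:m])[0, 1])
    return int(np.argmax(scores))

def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio, the metric quoted above."""
    ref = reference - reference.mean()
    est = estimate - estimate.mean()
    target = np.dot(est, ref) / np.dot(ref, ref) * ref
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.standard_normal(FS * 2)                 # 2 s of "attended" speech
    mixture = clean + 0.5 * rng.standard_normal(FS * 2)
    eeg = 0.3 * np.tile(envelope(clean, FS, EEG_FS), (16, 1))  # 16 synthetic channels
    streams = separate_sources(mixture)
    picked = decode_attention(eeg, streams, FS, EEG_FS)
    print(f"selected stream {picked}, SI-SDR = {si_sdr(streams[picked], clean):.2f} dB")
```

The design point the sketch tries to capture is the one the abstract emphasizes: because attention decoding only gates which separated stream is emitted, the quality of the output audio is bounded by the separator, not by the EEG pathway.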
