Two-stage brain-controlled speech enhancement with integrated dual-view features
Abstract: A two-stage brain-controlled speech enhancement method that integrates dual-view features is proposed. First, a speech separation algorithm separates the mixed speech; an end-to-end speech enhancement module that integrates dual-view features then performs auditory attention decoding, and the separated speech streams are selectively output according to the decoding result. The enhancement module extracts both dynamic features from the electroencephalogram (EEG) signals, which track variations such as speech energy, and static features related to the attended speaker's vocal characteristics, enabling the mixed speech to be better fused with the attention information carried by the EEG signals. Because the enhancement module is used only for decoding, the proposed method captures attention information effectively while, compared with existing methods, reducing the negative impact of the EEG signals on output speech quality. Experimental results on the "2024 Sparse Brain-Assisted Speech Enhancement Challenge" dataset show that the proposed method improves the signal-to-distortion ratio (SDR) of the target speech by 18.08 dB, which is 6.44 dB higher than existing methods, and that it maintains high output speech quality even when using fewer EEG channels or EEG signals with lower signal-to-noise ratios.