Psychoacoustical enhancement of speech based on multitaper spectrum

WU Hongwei (吴红卫); WU Zhenyang (吴镇扬); ZHAO Li (赵力)

doi:10.15949/j.cnki.0371-0025.2007.03.013

Abstract: The multitaper spectrum has a lower estimation variance than the traditional periodogram. The noise spectrum and the Noise to Noisy Signal Ratio (NNSR) are estimated from the multitaper spectrum of the noisy speech. A pre-enhanced speech signal, used only to compute the masking threshold of the human ear, is obtained by spectral amplitude subtraction whose gain is a function of the NNSR. The final enhanced speech is obtained by applying a psychoacoustical weighting rule, which incorporates the masking threshold, to the Fourier spectrum of the noisy signal. To account for the low variance of the multitaper spectrum, a modified formula for the masking offset is proposed; speech reconstructed with this modification shows some improvement in the objective measure MBSD (Modified Bark Spectral Distortion) over speech reconstructed without it. When the psychoacoustical weighting rule is further limited to a maximum value below one, the gains in segmental SNR and overall SNR grow with the input SNR (for inputs above 0 dB). Informal listening tests indicate that the reconstructed speech has little distortion and that the background noise is greatly reduced and free of musical noise.
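The variance claim in the first sentence of the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: the abstract does not say which tapers were used, so the sine tapers below are an assumption, and the function names, frame length, taper count, and frequency bin are all hypothetical choices made for illustration. The sketch estimates the power of white Gaussian noise at one frequency bin many times, once with a single rectangular window (the periodogram) and once as the average over several orthonormal tapers, then compares the spread of the two estimates.

```python
import cmath
import math
import random


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (an assumed taper family)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1))
             for t in range(n)] for j in range(k)]


def power_at_bin(frame, taper, m):
    """Squared magnitude of the DFT of the tapered frame at bin m."""
    n = len(frame)
    acc = sum(w * x * cmath.exp(-2j * math.pi * m * t / n)
              for t, (w, x) in enumerate(zip(taper, frame)))
    return abs(acc) ** 2


def multitaper_power(frame, tapers, m):
    """Average the single-taper estimates (eigenspectra) at bin m."""
    return sum(power_at_bin(frame, w, m) for w in tapers) / len(tapers)


def periodogram_power(frame, m):
    """Periodogram value at bin m, using a unit-energy rectangular window."""
    n = len(frame)
    rect = [1.0 / math.sqrt(n)] * n
    return power_at_bin(frame, rect, m)


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def compare_variance(n=64, k=5, m=16, trials=300, seed=0):
    """Return (periodogram variance, multitaper variance) over noise frames."""
    rng = random.Random(seed)
    tapers = sine_tapers(n, k)
    per, mt = [], []
    for _ in range(trials):
        frame = [rng.gauss(0.0, 1.0) for _ in range(n)]
        per.append(periodogram_power(frame, m))
        mt.append(multitaper_power(frame, tapers, m))
    return variance(per), variance(mt)


if __name__ == "__main__":
    var_per, var_mt = compare_variance()
    print("periodogram variance:", var_per)
    print("multitaper variance: ", var_mt)
```

Because the tapers are orthonormal, the individual estimates are roughly uncorrelated for white noise away from the band edges, so averaging k of them should cut the variance by a factor close to k. The price is a wider main lobe, that is, coarser frequency resolution.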
