A new feature extraction method for noisy speech recognition based on a masking model

Abstract: The recognition accuracy of conventional speech recognition systems degrades severely in noisy environments. Drawing on the auditory properties of the human ear, this paper presents a noise-robust speech feature extraction method based on the masking properties of the human auditory system. The method first computes the masking characteristics of the noisy speech and then derives Mel-frequency cepstral coefficients (MFCCs) from this masking model for recognition. The method is evaluated on a recognition task using the ten English digits (0 to 9) from the TIMIT database. Several types of noise from the NoiseX92 database are added to the original speech to simulate noisy speech at different signal-to-noise ratios (SNRs). At 0 dB SNR, the new features yield an average increase of 152% in recognition accuracy over classical MFCCs across three noise types. The experimental results show that the new features significantly improve recognition accuracy in a variety of noisy environments.
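The abstract does not specify the masking model, so the sketch below is only one hypothetical reading of "computing MFCCs from the masking model of noisy speech". It assumes a per-frame simultaneous-masking threshold built from the Bark scale (Zwicker-Terhardt approximation) and the Schroeder spreading function with an assumed 10 dB offset, and then passes that threshold, instead of the raw power spectrum, through a standard Mel filterbank, log, and DCT-II. It also shows how a noise signal can be scaled to a target SNR before being added to clean speech, as in the NoiseX92 experiments. All function names and parameter values (`offset_db`, 20 filters, 13 coefficients, 8 kHz, 256-sample frames) are illustrative assumptions, not the authors' settings.

```python
import math
import random

# Hypothetical sketch: the paper's actual masking model is not given in the abstract.


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) equals snr_db, then add it."""
    n = len(clean)
    p_clean = sum(x * x for x in clean) / n
    p_noise = sum(v * v for v in noise[:n]) / n
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * v for x, v in zip(clean, noise)]


def power_spectrum(frame):
    """Hamming-windowed power spectrum, bins 0..N/2 (naive DFT, fine for short frames)."""
    n = len(frame)
    x = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        spec.append(re * re + im * im)
    return spec


def bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def spreading_db(dz):
    """Schroeder spreading function in dB; dz = Bark(maskee) - Bark(masker)."""
    u = dz + 0.474
    return 15.81 + 7.5 * u - 17.5 * math.sqrt(1.0 + u * u)


def masking_threshold(spec, sample_rate, offset_db=10.0):
    """Per-bin threshold from the strongest single masker; offset_db is an assumed value."""
    n_fft = 2 * (len(spec) - 1)
    z = [bark(k * sample_rate / n_fft) for k in range(len(spec))]
    atten = 10 ** (-offset_db / 10)
    return [
        atten * max(p * 10 ** (spreading_db(zj - zi) / 10) for p, zi in zip(spec, z))
        for zj in z
    ]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters equally spaced on the Mel scale from 0 Hz to Nyquist."""
    n_fft = 2 * (n_bins - 1)
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    return [
        [max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid))) for f in freqs]
        for lo, mid, hi in zip(edges, edges[1:], edges[2:])
    ]


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n_coeffs)]


def masked_mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCC-style cepstrum computed from the masking threshold instead of the raw spectrum."""
    thr = masking_threshold(power_spectrum(frame), sample_rate)
    bank = mel_filterbank(n_filters, len(thr), sample_rate)
    log_e = [math.log(max(sum(w * t for w, t in zip(filt, thr)), 1e-12)) for filt in bank]
    return dct_ii(log_e, n_coeffs)


if __name__ == "__main__":
    sr = 8000  # assumed sampling rate
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(256)]  # one 32 ms frame
    noise = [rng.gauss(0.0, 1.0) for _ in range(256)]  # white-noise stand-in for NoiseX92
    noisy = add_noise_at_snr(clean, noise, snr_db=0.0)
    print([round(c, 2) for c in masked_mfcc(noisy, sr)])
```

The spreading step smooths the spectrum across neighbouring critical bands, which is one plausible reason a masking-based representation could be less sensitive to noise than the raw spectrum. The naive DFT keeps the example dependency-free; a practical front end would use an FFT and frame-level pre-emphasis.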