Entropy-based initial/final segmentation for Chinese whispered speech
Abstract: Initial/final (IF) segmentation of whispered speech is a pre-processing step for whispered speech recognition and for the reconstruction of normal speech from whispers. Because whispered speech is produced differently from normal speech, and whispered initials and finals are both unvoiced, segmentation methods designed for normal speech no longer apply. Based on an analysis of the articulatory and acoustic characteristics of Chinese whispered speech, and on the patterns of initial/final change observed in wideband spectrograms, a new segmentation method is proposed. Speech endpoints are detected with an entropy function, and the initial/final boundary is decided by combining the initial duration, the symmetric relative entropy, and the normalized spectral center of gravity. In a test on two sets of 380 Chinese monosyllabic whispered utterances at 2-10 dB SNR, the correct segmentation rate is 87.9% for the female data and 90.3% for the male data, which is higher than that of the frequency-domain method, the clustering method, and the spectral flatness method. The experiments show that the relative entropy method can serve as pre-processing for whispered speech recognition and conversion, and that it gives speech reconstructed from Chinese whispers a more natural quality.
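The three frame-level quantities named in the abstract (spectral entropy, symmetric relative entropy, and normalized spectral center of gravity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the definitions below are common textbook forms, and the function names, the 256-sample frame, the 128-sample hop, and the Hamming window are all assumptions chosen for illustration.

```python
import numpy as np


def frame_spectra(signal, frame_len=256, hop=128):
    """Split a signal into frames and return one normalized power spectrum
    per frame, so each row sums to 1 (frame_len and hop are assumed values)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Small floor keeps every bin strictly positive so the logs below are finite.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12
    return power / power.sum(axis=1, keepdims=True)


def spectral_entropy(p):
    """Shannon entropy of each frame's normalized spectrum (in nats)."""
    return -(p * np.log(p)).sum(axis=1)


def symmetric_relative_entropy(p, q):
    """Symmetric relative entropy D(p||q) + D(q||p) between two spectra."""
    return float(np.sum((p - q) * np.log(p / q)))


def normalized_spectral_centroid(p):
    """Spectral center of gravity per frame, scaled to the range [0, 1]."""
    k = np.arange(p.shape[1])
    return (p * k).sum(axis=1) / (p.shape[1] - 1)
```

In a sketch like this, a flat noise-like spectrum gives a higher entropy than a spectrum concentrated in a few bins, and the symmetric relative entropy between adjacent frames grows where the spectral shape changes. How the paper thresholds or combines these measures with the initial duration is not stated in the abstract and is left open here.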