Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

A non-linear frequency transform for speaker recognition

  • Abstract: Classical non-linear frequency scale transforms reflect the perceptual characteristics of the human auditory system (HAS), but they do not distinguish between the linguistic content and the speaker-specific traits carried by speech, so they represent speaker individuality inadequately. By analyzing how the short-time spectrum in different frequency sub-bands affects speaker recognition performance, and by applying least-squares polynomial curve fitting, this paper proposes a non-linear frequency scale transform and a feature detection algorithm. Experiments show that, under identical training and test conditions, the average error rate falls by about 70.5%, 60.8% and 70.5% relative to the classical Mel, Bark and ERB frequency scale transforms, respectively. This result indicates that the proposed transform effectively strengthens the speaker-specific traits in the short-time spectrum and can improve the performance of a speaker recognition system.
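
The abstract names least-squares polynomial curve fitting but gives no formulas, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that each frequency sub-band has a non-negative score describing how much it contributes to speaker recognition, accumulates those scores into a normalized curve, and fits a polynomial to that curve so that more informative bands occupy a larger share of the warped axis. The function names `fit_frequency_scale` and `warp`, the polynomial degree, and the band scores in the usage section are all hypothetical.

```python
import numpy as np


def fit_frequency_scale(band_centers_hz, band_scores, degree=3):
    """Fit a warping polynomial to cumulative per-band scores (assumed input).

    band_centers_hz: increasing sub-band centre frequencies in Hz.
    band_scores: hypothetical non-negative contribution of each sub-band.
    Returns the polynomial coefficients and the frequency used for scaling.
    """
    f = np.asarray(band_centers_hz, dtype=float)
    s = np.asarray(band_scores, dtype=float)
    if f.shape != s.shape or f.size <= degree:
        raise ValueError("need matching arrays with more points than the degree")

    # Cumulative share of the total score reached at each band, in (0, 1].
    cum = np.cumsum(s) / np.sum(s)

    # Scale frequency to (0, 1] so the least-squares problem is well conditioned.
    f_max = f[-1]
    coeffs = np.polyfit(f / f_max, cum, degree)
    return coeffs, f_max


def warp(freq_hz, coeffs, f_max):
    """Map frequencies in Hz onto the fitted scale."""
    return np.polyval(coeffs, np.asarray(freq_hz, dtype=float) / f_max)


if __name__ == "__main__":
    centers = np.arange(500.0, 4500.0, 500.0)  # 8 bands, 500 Hz to 4000 Hz
    # Placeholder scores for illustration only; they are not from the paper.
    scores = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 1.0])
    coeffs, f_max = fit_frequency_scale(centers, scores)
    print(warp(centers, coeffs, f_max))
```

With equal scores in every band the fitted scale reduces to a linear mapping, which makes a convenient sanity check. A low-degree polynomial is not guaranteed to be monotonic for arbitrary scores, so a real implementation would need to verify that before placing filter banks on the warped axis.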

     
