Fusion of Deep and Shallow Features and Models for Speaker Recognition
Abstract: To further improve the performance of speaker recognition systems, we propose two approaches: fusion of deep and shallow features, and I-Vector-based model fusion. The feature fusion approach exploits the complementarity between features at different levels, so that the fused deep and shallow features describe speaker information more comprehensively. The I-Vector-based model fusion approach fuses the I-Vectors extracted by different speaker recognition systems before computing distances, which combines the strengths of those systems at the level of the overall system structure. In tests on the CASIA North and South dialect corpus, with equal error rate (EER) as the metric, deep and shallow feature fusion reduces the EER by 54.8% relative to the baseline system, and I-Vector-based model fusion reduces it by 69.5% relative to the baseline. The experimental results show that fusing deep and shallow features and fusing models are both effective.
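As a rough illustration of the I-Vector model fusion idea in the abstract, the Python sketch below fuses the I-Vectors produced by two systems, scores a trial, and computes a relative EER reduction. The abstract does not state the fusion rule or the distance measure, so concatenation of length-normalized vectors and cosine similarity are assumptions made here. All function names and the vector dimensions are hypothetical, and the inputs are random placeholders, not data from the paper.

```python
import numpy as np


def length_normalize(v):
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)


def fuse_ivectors(ivec_a, ivec_b):
    """Fuse two I-Vectors of the same utterance from two systems.

    Assumption: each vector is length-normalized and the two are
    concatenated; the paper may use a different fusion rule.
    """
    return np.concatenate([length_normalize(ivec_a), length_normalize(ivec_b)])


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test vector."""
    return float(np.dot(enroll, test) / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def relative_eer_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Placeholder I-Vectors: system A (400-dim) and system B (300-dim).
rng = np.random.default_rng(0)
enroll_fused = fuse_ivectors(rng.standard_normal(400), rng.standard_normal(300))
test_fused = fuse_ivectors(rng.standard_normal(400), rng.standard_normal(300))

# A higher score means the two utterances are more likely from the same speaker.
score = cosine_score(enroll_fused, test_fused)
```

Length-normalizing before concatenation keeps either system from dominating the fused vector only because its I-Vectors have a larger norm. The reported 54.8% and 69.5% figures are relative reductions of the kind `relative_eer_reduction` computes, not absolute differences in EER.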