Optimization of acoustic modeling method with connectionist temporal classification criterion

WANG Zhichao; ZHANG Pengyuan; PAN Jielin; YAN Yonghong

doi:10.15949/j.cnki.0371-0025.2018.06.014

Optimization of acoustic modeling method with connectionist temporal classification criterion

Abstract

This paper studies and optimizes end-to-end acoustic modeling based on the connectionist temporal classification (CTC) criterion. We analyze how different acoustic features, modeling units, and neural network architectures affect the performance of CTC acoustic models. To address the modeling defect caused by sharing the blank symbol in the CTC model, we propose a modeling-unit-dependent non-shared blank method. We further introduce a model initialization method that incorporates association information between modeling units into the neural network to improve the performance of the CTC model. Experiments on the standard 300-hour English Switchboard dataset show that a CTC acoustic model combining non-shared blanks, time-delay neural networks, and the initialization method with association information between modeling units achieves an absolute 1.1% reduction in word error rate and a 3.3-fold speedup in training relative to the baseline system. These results demonstrate that the proposed optimizations are effective for end-to-end acoustic modeling.
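As background for the blank-sharing issue mentioned above, the following minimal Python sketch illustrates the standard CTC label-collapsing rule, in which one blank symbol is shared by all modeling units. The function name `ctc_collapse` and the `"_"` blank token are illustrative assumptions and do not come from the paper; the paper's non-shared blank variant is not reproduced here.

```python
from itertools import groupby

BLANK = "_"  # assumed token standing in for the single shared blank


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level CTC path to its label sequence.

    Standard CTC rule: merge consecutive repeated symbols first,
    then remove every blank.
    """
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]


if __name__ == "__main__":
    # A blank between two identical labels keeps them as separate outputs.
    print(ctc_collapse(list("__hh_e_ll_l_oo_")))
```

In this standard formulation every unit relies on the same blank output, so the blank carries no information about which unit precedes or follows it.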
