Abstract:
To improve the performance of automatic mispronunciation detection in computer-assisted language learning, a discriminative acoustic model training method is proposed. The method aims to maximize the F1-score of mispronunciation detection results on an annotated non-native speech database. The training objective function is formulated as a smoothed form of the F1-score by using the sigmoid function, and is optimized with extended Baum-Welch-like update equations derived by the weak-sense auxiliary function method. A strategy of simultaneously updating the acoustic models and the phone threshold parameters is proposed to ensure monotonic improvement of the objective function. Mispronunciation detection experiments show that the method increases the F1-score, precision, recall, and detection accuracy on both the training and evaluation data sets.
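
As a minimal sketch of how an F1-score can be made smooth with the sigmoid function, the snippet below replaces each hard accept/reject decision by a soft one. The abstract does not give the exact formulation, so everything here is an illustrative assumption: the function name `smoothed_f1`, the per-phone pronunciation `scores`, the convention that a phone is flagged as mispronounced when its score falls below its threshold, and the `slope` parameter controlling the sharpness of the sigmoid.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def smoothed_f1(scores, thresholds, labels, slope=1.0):
    """Differentiable surrogate of the F1-score (hypothetical formulation).

    scores     -- per-phone pronunciation scores (assumed: higher = better)
    thresholds -- per-phone decision thresholds
    labels     -- 1 if the phone is annotated as mispronounced, else 0
    slope      -- sigmoid sharpness; large values approach the hard F1-score
    """
    # Soft detection: close to 1 when the score is well below the threshold.
    soft_detect = [sigmoid(slope * (t - s)) for s, t in zip(scores, thresholds)]
    # Soft count of true positives (detected and annotated as mispronounced).
    soft_tp = sum(d * y for d, y in zip(soft_detect, labels))
    # F1 = 2 * TP / (number detected + number annotated as mispronounced).
    denom = sum(soft_detect) + sum(labels)
    return 2.0 * soft_tp / denom if denom > 0 else 0.0
```

Because the surrogate is a smooth function of both the scores and the thresholds, its value changes continuously when either the acoustic models or the phone thresholds are updated, which is what makes it usable as a training objective. As `slope` grows, the value approaches the ordinary F1-score computed from hard decisions.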