Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling
Abstract
For mispronunciation detection and diagnosis (MDD) tasks, expert-annotated data are scarce. To model pronunciation regularities efficiently from limited data and thereby aid MDD systems, an acoustic pronunciation model that integrates both acoustic and textual information is proposed. It models the mispronunciation generation process in a more theoretically complete way. Exploiting the acoustic correlation between different parts of this process, the model achieves aided modeling by sharing the acoustic encoder parameters with the phoneme recognition model and optimizing the two models jointly in a multi-task learning manner. Moreover, an acoustic confidence masking-prediction training approach is proposed to further strengthen the correlation between the two tasks and improve the efficiency of aided modeling. Experiments show that the acoustic pronunciation model can effectively model mispronunciation regularities. With its aid in phoneme recognition modeling, the MDD system improves mispronunciation detection, diagnosis, and phoneme recognition by 4.9%, 9.5%, and 14.0%, respectively. The acoustic confidence masking-prediction training approach improves the efficiency of aided modeling, and both the masking parameters and the multi-task learning parameters can affect the effectiveness of aided modeling.
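The abstract does not give the exact training objective or masking rule. The following is only a minimal sketch under stated assumptions. It assumes that (a) the joint objective is a weighted sum of the phoneme recognition loss and the acoustic pronunciation model loss, with a hypothetical weight `alpha` standing in for the "multi-task learning parameters", and (b) the masking step replaces canonical phonemes whose acoustic confidence reaches a hypothetical `threshold` (a stand-in for the "masking parameters") with a mask token, so the model must recover them from the shared acoustic encoding. The function names, the mask token, and the direction of the masking rule are all illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence

MASK = "<mask>"  # hypothetical mask token (assumption)


def confidence_mask(phonemes: Sequence[str], confidences: Sequence[float],
                    threshold: float, mask_token: str = MASK) -> List[str]:
    """Mask canonical phonemes whose acoustic confidence is >= threshold.

    Assumed rule for illustration: confidently recognized positions are hidden
    from the text input so they must be predicted from acoustic evidence.
    """
    if len(phonemes) != len(confidences):
        raise ValueError("phonemes and confidences must have equal length")
    return [mask_token if c >= threshold else p
            for p, c in zip(phonemes, confidences)]


def multitask_loss(loss_pr: float, loss_apm: float, alpha: float) -> float:
    """Weighted multi-task objective (assumed form).

    loss_pr:  phoneme recognition loss
    loss_apm: acoustic pronunciation model loss
    In a real model both terms back-propagate into the shared acoustic encoder.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * loss_pr + (1.0 - alpha) * loss_apm


if __name__ == "__main__":
    canonical = ["DH", "AH", "K", "AE", "T"]  # canonical phonemes for "the cat"
    conf = [0.95, 0.40, 0.88, 0.30, 0.97]     # made-up acoustic confidences
    print(confidence_mask(canonical, conf, threshold=0.8))
    print(multitask_loss(2.0, 1.0, alpha=0.5))
```

With the toy inputs above, the three confidently recognized phonemes are masked and the two low-confidence ones are kept. Raising `threshold` masks fewer positions, and moving `alpha` toward 1 shifts the shared encoder's training signal toward phoneme recognition. These are the kinds of knobs whose settings, per the abstract, can affect how well the aided modeling works.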