Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

CHENG Gaofeng, LU Haitian, GUO Yao, SHANG Zengqiang, LI Xuyuan. Enhancing speaker diversity through dual-optimizing loop between text-to-speech and speaker recognition[J]. ACTA ACUSTICA. DOI: 10.12395/0371-0025.2024028

Enhancing speaker diversity through dual-optimizing loop between text-to-speech and speaker recognition

  • Real-world speech data offer limited speaker diversity. To address this scarcity and to optimize multi-speaker speech synthesis and speaker recognition models in both directions using generated speech, the dual optimization loop (DOL) method is proposed. DOL combines the data generation capability of multi-speaker speech synthesis models with the discriminative ability of speaker recognition models: it generates and filters speech data to expand the speaker diversity of a limited, manually labeled speech dataset, and thereby optimizes both models in the loop. Experimental results on Aishell-1, Aishell-3, MagicData-READ and LibriTTS show that the proposed approach, when combined with conventional data augmentation techniques, effectively expands speaker diversity in speech data. This in turn markedly improves the generalization ability of multi-speaker speech synthesis models and the discriminative power of speaker recognition models.