Multilingual text-to-waveform with cross-speaker prosody transfer

SHANG Zengqiang; ZHANG Pengyuan; WANG Li

doi:10.12395/0371-0025.2022146

SHANG Zengqiang, ZHANG Pengyuan, WANG Li. Multilingual text-to-waveform with cross-speaker prosody transfer[J]. ACTA ACUSTICA, 2024, 49(1): 171-180. DOI: 10.12395/0371-0025.2022146

Abstract

In multilingual speech synthesis, multilingual recordings from a single speaker are scarce, which makes it difficult for one voice to support synthesis in several languages. Previous methods decouple only timbre and pronunciation, and only within the acoustic model. In contrast, this paper proposes an end-to-end multilingual speech synthesis method with cross-speaker prosody transfer. It uses a two-level hierarchical conditional variational auto-encoder (CVAE) to model the text-to-waveform generation process directly, decoupling timbre, pronunciation, and prosody. The method improves the prosody of cross-lingual synthesis by transferring the prosodic style of existing speakers of the target language. Experiments show that, for cross-lingual speech generation, the proposed model achieves mean opinion scores (MOS) of 3.91 for naturalness and 4.01 for similarity. Objective metrics also show that the method reduces the word error rate to 5.85%, lower than the baselines. In addition, prosody transfer and ablation experiments further confirm the effectiveness of the proposed method.
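
As a rough illustration of the regularization term that a two-level hierarchical conditional VAE typically optimizes, the sketch below computes the two KL divergences for diagonal Gaussian latents. It assumes a top-level latent z2 whose prior is conditioned on text c, and a bottom-level latent z1 whose prior is conditioned on a sampled z2 and c. This is a generic textbook formulation chosen for illustration, not the authors' exact model. The function names, the (means, log-variances) tuple layout, and the conditioning structure are hypothetical, and the neural networks that would produce these parameters are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical container: (means, log-variances) of a diagonal Gaussian.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gaussian(q: Gaussian, p: Gaussian) -> float:
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions."""
    mu_q, logvar_q = q
    mu_p, logvar_p = p
    if not (len(mu_q) == len(logvar_q) == len(mu_p) == len(logvar_p)):
        raise ValueError("all parameter vectors must have the same length")
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # Closed form: 0.5 * (log(var_p / var_q) + (var_q + (mu_q - mu_p)^2) / var_p - 1)
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def reparameterize(g: Gaussian, rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    mu, logvar = g
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def hierarchical_kl(top_post: Gaussian, top_prior: Gaussian,
                    bottom_post: Gaussian, bottom_prior: Gaussian) -> float:
    """KL part of an assumed two-level hierarchical CVAE objective.

    top_post    ~ q(z2 | x)        top_prior    ~ p(z2 | c)
    bottom_post ~ q(z1 | x, z2)    bottom_prior ~ p(z1 | z2, c)
    The bottom-level parameters are assumed to come from one sampled z2
    (e.g. via reparameterize), so the second term is a single-sample
    estimate of E_q(z2)[KL(q(z1 | x, z2) || p(z1 | z2, c))].
    """
    return kl_diag_gaussian(top_post, top_prior) + kl_diag_gaussian(bottom_post, bottom_prior)
```

In a full model of this assumed form, the training objective (the negative ELBO) would add this KL sum to a waveform reconstruction loss; keeping a separate prior per level is what lets each latent level be conditioned on different information.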
