Diverse style oriented many-to-many emotional voice conversion

ZHOU Jian; LUO Xiangyu; WANG Huabin; ZHENG Wenming; TAO Liang

doi:10.12395/0371-0025.2023192

Citation: ZHOU Jian, LUO Xiangyu, WANG Huabin, ZHENG Wenming, TAO Liang. Diverse style oriented many-to-many emotional voice conversion[J]. ACTA ACUSTICA, 2024, 49(6): 1297-1303. DOI: 10.12395/0371-0025.2023192

Abstract

Existing emotional voice conversion methods based on generative adversarial networks (GANs) separate emotion insufficiently and produce little diversity in emotional expression. To address these issues, this paper proposes a many-to-many emotional voice conversion method oriented toward diverse styles. The method builds on a GAN with a dual-generator structure and applies a consistency loss to the latent representations of the two generators, which keeps speech content and speaker characteristics consistent and thereby brings the emotion of the converted speech closer to the target emotion. In addition, an emotion mapping network and an emotion feature encoder supply the generators with diverse emotional representations within the same emotion category. Experimental results show that speech converted by the proposed method is closer to the target emotion and exhibits a richer variety of emotional styles.
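
The abstract does not give the form of the consistency loss, so the following is only a minimal sketch of one plausible reading: a mean absolute (L1) difference between the latent representations that the two generators produce for the same utterance. The choice of L1 distance, the function name `latent_consistency_loss`, and the toy latent values are all assumptions made for illustration and are not taken from the paper.

```python
def latent_consistency_loss(latent_a, latent_b):
    """Mean absolute difference between two equally shaped latent sequences.

    latent_a, latent_b: lists of frames, each frame a list of floats.
    The L1 form is an assumption; the paper may use a different distance.
    """
    if len(latent_a) != len(latent_b):
        raise ValueError("latents must have the same number of frames")
    total, count = 0.0, 0
    for frame_a, frame_b in zip(latent_a, latent_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same dimensionality")
        for x, y in zip(frame_a, frame_b):
            total += abs(x - y)
            count += 1
    # An empty latent contributes no penalty.
    return total / count if count else 0.0


# Hypothetical latents from the two generators for the same utterance
# (2 frames, 2 dimensions each).
h_first = [[0.5, -1.0], [2.0, 0.0]]
h_second = [[0.5, -0.5], [1.0, 0.0]]
loss = latent_consistency_loss(h_first, h_second)
```

Under this reading, minimizing the loss pushes both generators toward the same latent encoding of content and speaker identity, leaving the emotional style to be supplied separately by the mapping network or the feature encoder.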
