Diverse-style-oriented many-to-many emotional voice conversion
Abstract: Existing emotional voice conversion methods based on generative adversarial networks (GANs) still suffer from insufficient emotion separation and a lack of diversity in the converted emotional expression. To address these issues, this paper proposes a diverse-style-oriented many-to-many emotional voice conversion method. The method is built on a GAN with a dual-generator structure, in which a consistency loss applied to the intermediate latent codes of the two generators ensures that speech content and speaker characteristics remain consistent, thereby improving the similarity between the converted emotion and the target emotion. In addition, an emotion mapping network and an emotion feature encoder supply the generators with diversified emotional representations of the same emotion category. Experimental results show that speech converted by the proposed method is closer to the target emotion and exhibits a richer variety of emotional styles.
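The abstract names two mechanisms: a consistency loss between the two generators' latent codes, and a mapping network that turns random noise into varied style codes for one emotion category. The sketch below illustrates both ideas with numpy; the "networks" are fixed random linear maps and all names and dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MEL, D_LAT, D_NOISE, D_STYLE, N_EMO = 80, 64, 16, 16, 4

# Stand-in "networks": fixed random linear maps (illustrative only).
W_enc1 = rng.standard_normal((D_MEL, D_LAT)) * 0.1   # generator 1's encoder
W_enc2 = rng.standard_normal((D_MEL, D_LAT)) * 0.1   # generator 2's encoder
W_map = rng.standard_normal((N_EMO, D_NOISE, D_STYLE)) * 0.1  # one head per emotion

def latent_consistency_loss(mel):
    """L1 distance between the two generators' latent codes for the same
    utterance -- the kind of constraint used to keep speech content and
    speaker characteristics consistent across generators."""
    z1, z2 = mel @ W_enc1, mel @ W_enc2
    return np.abs(z1 - z2).mean()

def style_code(emotion_id, noise):
    """Mapping network: random noise -> a style code for one emotion
    category; different noise vectors yield diverse styles of the same
    emotion, which is the source of style diversity in the abstract."""
    return noise @ W_map[emotion_id]

mel = rng.standard_normal((120, D_MEL))          # fake 120-frame utterance
loss = latent_consistency_loss(mel)              # scalar >= 0

s_a = style_code(2, rng.standard_normal(D_NOISE))
s_b = style_code(2, rng.standard_normal(D_NOISE))  # same emotion, new noise
print(loss, s_a.shape, np.allclose(s_a, s_b))
```

In training, minimizing such a loss pushes both generators toward a shared content/speaker representation, while sampling fresh noise for the mapping network at inference time produces distinct renditions of the same target emotion.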