Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Music source separation method based on time and channel dual-dimensional sequential attention

  • Abstract: To address the insufficient specificity of instrument sound-source representations in music, an end-to-end time-domain music source separation method based on dual-dimensional sequential attention is proposed, which exploits structural characteristics that link instrument sources to the musical form of a song. First, because each instrument appears with marked regularity across the different sections of a song's form, the feature basis functions receive differentiated attention weights along two dimensions: time and feature channel. Second, a multi-resolution frequency factor is introduced into the loss function so that the difference between the separated sources and the ideal sources is measured in both the time domain and the frequency domain. Experimental results on the MUSDB18 dataset show that emphasizing both the time-domain musical-form structure and the discrete harmonic features of the sources further improves instrument separation. Compared with Demucs, currently the most advanced end-to-end time-domain music source separation method, the proposed method improves the signal-to-noise ratio (SNR) by 0.40 dB, with particularly notable results on drums and bass, whose SNR improves by 0.13 dB and 0.60 dB, respectively. Making full use of multi-dimensional prior knowledge, such as the semantic content and acoustic features of sound sources, can further increase the specificity of source representations and thereby improve their separability.
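The abstract does not describe the internals of the dual-dimensional sequential attention module, so the following is only a minimal sketch under stated assumptions. It swaps in a design in the spirit of CBAM (Convolutional Block Attention Module): features are stored as `x[c][t]` (C feature channels of basis-function coefficients by T frames); each channel is first scaled by a sigmoid gate computed from its time-averaged activation, and each frame is then scaled by a gate computed from average- and max-pooled statistics across channels. The channel-then-time order, the gating functions, and the parameter names `ch_w`, `ch_b`, `w_avg`, `w_max` and `bias` are all hypothetical.

```python
import math
from typing import List

# features[c][t]: C feature channels (basis-function coefficients) by T frames
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(x: Matrix, ch_w: List[float], ch_b: List[float]) -> Matrix:
    """Scale each channel by a gate computed from its time-averaged activation.
    ch_w and ch_b are hypothetical per-channel learned parameters."""
    out = []
    for c, row in enumerate(x):
        descriptor = sum(row) / len(row)  # average pooling over time
        gate = sigmoid(ch_w[c] * descriptor + ch_b[c])
        out.append([v * gate for v in row])
    return out


def time_attention(x: Matrix, w_avg: float, w_max: float, bias: float) -> Matrix:
    """Scale each frame by a gate computed from statistics pooled across channels.
    w_avg, w_max and bias are hypothetical learned scalars shared by all frames."""
    n_channels, n_frames = len(x), len(x[0])
    gates = []
    for t in range(n_frames):
        column = [x[c][t] for c in range(n_channels)]
        avg, peak = sum(column) / n_channels, max(column)
        gates.append(sigmoid(w_avg * avg + w_max * peak + bias))
    return [[x[c][t] * gates[t] for t in range(n_frames)] for c in range(n_channels)]


def dual_sequential_attention(x: Matrix, ch_w: List[float], ch_b: List[float],
                              w_avg: float, w_max: float, bias: float) -> Matrix:
    # Assumed order: channel attention first, then time attention on its output.
    return time_attention(channel_attention(x, ch_w, ch_b), w_avg, w_max, bias)


if __name__ == "__main__":
    features = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
    # All parameters zero: every gate is sigmoid(0) = 0.5, so values are scaled by 0.25.
    print(dual_sequential_attention(features, [0.0, 0.0], [0.0, 0.0], 0.0, 0.0, 0.0))
```

With all parameters at zero the example prints `[[0.25, 0.0, 0.5], [0.125, 0.125, 0.125]]`. Because every gate lies in (0, 1), this kind of module can only attenuate features; in a trained model the learned parameters decide which basis functions (channels) and which moments (frames, e.g. a particular section of the song) are kept.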

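The abstract states only that a multi-resolution frequency factor is added to the loss so that the error is measured in both the time and frequency domains; the exact formulation is not given. As an assumption, the sketch below swaps in a common multi-resolution STFT-style formulation: a waveform L1 term plus the mean L1 distance between Hann-windowed magnitude spectrograms at several (frame length, hop) resolutions, weighted by a hypothetical `freq_weight`. The resolutions are illustrative, and the naive DFT only keeps the example dependency-free; a real model would use an FFT on tensors.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def stft_magnitude(x: Sequence[float], frame: int, hop: int) -> List[List[float]]:
    """Hann-windowed magnitude spectrogram via a naive DFT (no padding).
    Returns one row of frame // 2 + 1 bin magnitudes per analysis frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    spec = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame)]
        spec.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame)))
            for k in range(frame // 2 + 1)
        ])
    return spec


def spectral_l1(est: Sequence[float], ref: Sequence[float], frame: int, hop: int) -> float:
    """Mean absolute difference between the two magnitude spectrograms."""
    diffs = [abs(p - q)
             for row_e, row_r in zip(stft_magnitude(est, frame, hop), stft_magnitude(ref, frame, hop))
             for p, q in zip(row_e, row_r)]
    if not diffs:
        raise ValueError("signal is shorter than the STFT frame")
    return sum(diffs) / len(diffs)


def multi_resolution_loss(est: Sequence[float], ref: Sequence[float],
                          resolutions: Sequence[Tuple[int, int]] = ((64, 16), (32, 8), (16, 4)),
                          freq_weight: float = 1.0) -> float:
    """Waveform L1 plus freq_weight times the spectral L1 averaged over resolutions.
    The (frame, hop) resolutions and freq_weight are hypothetical choices."""
    if not ref or len(est) != len(ref):
        raise ValueError("estimate and reference must be non-empty and equally long")
    time_term = sum(abs(p - q) for p, q in zip(est, ref)) / len(ref)
    freq_term = sum(spectral_l1(est, ref, f, h) for f, h in resolutions) / len(resolutions)
    return time_term + freq_weight * freq_term


if __name__ == "__main__":
    ref = [math.sin(2 * math.pi * 4 * n / 128) for n in range(128)]
    est = [0.8 * s for s in ref]  # a scaled-down estimate of the reference
    print(multi_resolution_loss(est, ref))
```

One plausible motivation for several resolutions: short frames localize transients such as drum onsets in time, while long frames separate closely spaced harmonics such as bass partials in frequency, so averaging over resolutions avoids committing to a single time-frequency trade-off.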
     
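The abstract reports its gains in dB but does not define how the metric is computed. Assuming the usual energy-ratio definition, SNR = 10·log10(Σ s² / Σ (s − ŝ)²), a gain of Δ dB multiplies the signal-to-error energy ratio by 10^(Δ/10). The sketch below computes both; `snr_db` and `energy_ratio_gain` are illustrative helper names, not part of any evaluation toolkit. MUSDB18 results are often reported with BSS Eval SDR, which decomposes the error differently, so this is only an approximation of whatever the paper measured.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """10 * log10(sum(s^2) / sum((s - s_hat)^2)); inf for a perfect estimate."""
    signal = sum(s * s for s in reference)
    noise = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal == 0.0:
        raise ValueError("reference has zero energy")
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def energy_ratio_gain(delta_db: float) -> float:
    """Factor by which the signal-to-error energy ratio grows for a gain of delta_db."""
    return 10.0 ** (delta_db / 10.0)


if __name__ == "__main__":
    # Gains reported in the abstract, relative to Demucs
    for name, gain in [("overall", 0.40), ("drums", 0.13), ("bass", 0.60)]:
        print(f"{name}: +{gain:.2f} dB -> x{energy_ratio_gain(gain):.3f} energy ratio")
```

Under this definition, +0.40 dB overall corresponds to a signal-to-error energy ratio about 9.6% higher, +0.13 dB for drums about 3.0% higher, and +0.60 dB for bass about 14.8% higher.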
