Lightweight speech separation based on dual-path attention and recurrent neural network
Abstract: A lightweight speech separation method based on a dual-path attention mechanism and a dual-path recurrent neural network is proposed. First, the method models speech signals with a selectable branch structure built on the dual-path attention mechanism and the dual-path recurrent network, which facilitates the extraction of deep feature information while reducing the number of model parameters. Second, a sub-band processing technique is introduced to reduce the computational cost. Experimental results on the LibriCSS dataset show that the proposed method achieves an average word error rate of 8.6% with only 0.15 MiB of parameters and a computational cost of 15.2 G/6s, which are, respectively, 3.3–391.3 and 1.1–3.2 times smaller than those of current mainstream methods. These results demonstrate that the proposed method effectively reduces model size and computational cost while achieving high speech separation performance.
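To make the "dual-path" structure mentioned in the abstract concrete, the sketch below illustrates the generic dual-path pattern used in DPRNN-style separators: the feature sequence is segmented into overlapping chunks, then processed along two axes, within each chunk (intra-chunk, short-range structure) and across chunks (inter-chunk, long-range structure). This is not the authors' implementation; `intra_fn` and `inter_fn` are placeholders for the paper's attention and recurrent branches, and the chunk size, hop, and identity processing are illustrative assumptions only.

```python
import numpy as np

def segment(x, chunk_len, hop):
    """Split a [T, F] feature sequence into overlapping chunks [N, chunk_len, F]."""
    T, F = x.shape
    n = max(1, int(np.ceil((T - chunk_len) / hop)) + 1)
    pad = (n - 1) * hop + chunk_len - T          # zero-pad so chunks tile evenly
    xp = np.pad(x, ((0, pad), (0, 0)))
    return np.stack([xp[i * hop:i * hop + chunk_len] for i in range(n)]), T

def overlap_add(chunks, hop, T):
    """Inverse of segment(): overlap-add the chunks and normalize by coverage."""
    n, K, F = chunks.shape
    y = np.zeros(((n - 1) * hop + K, F))
    w = np.zeros(((n - 1) * hop + K, 1))
    for i in range(n):
        y[i * hop:i * hop + K] += chunks[i]
        w[i * hop:i * hop + K] += 1.0
    return (y / w)[:T]

def dual_path_block(chunks, intra_fn, inter_fn):
    """Apply intra-chunk then inter-chunk processing (the dual-path pattern).

    In the paper's model these two stages would be the attention or
    recurrent branches; here they are arbitrary per-sequence functions.
    """
    h = np.stack([intra_fn(c) for c in chunks])   # [N, K, F]: within each chunk
    h = h.transpose(1, 0, 2)                      # [K, N, F]: across chunks
    h = np.stack([inter_fn(c) for c in h])
    return h.transpose(1, 0, 2)                   # back to [N, K, F]

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 8))                 # toy feature sequence [T=100, F=8]
chunks, T = segment(x, chunk_len=16, hop=8)       # 50% overlap
identity = lambda c: c                            # stand-in for attention/RNN branches
y = overlap_add(dual_path_block(chunks, identity, identity), hop=8, T=T)
print(np.allclose(x, y))  # identity processing round-trips the input
```

With 50% overlap, each chunk sees local context twice, while the transposed inter-chunk pass gives every position a path to distant chunks, which is what lets small per-chunk models cover long sequences cheaply.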