Loss function design for sound event localization and detection based on multi-task learning
Abstract: Track-wise multi-task learning is effective at recognizing overlapping sound sources in sound event localization and detection. However, when the number of predicted event classes is large, track-wise multi-task networks tend to produce sparse outputs, which leads to missed detections of sound events. To address this issue, this paper proposes an aggregated loss function that couples the activity of each sound event class with its Cartesian direction-of-arrival vector, reformulating the multi-task learning network as a single-task learning problem. Furthermore, to exploit the track-wise output format, a co-training strategy with auxiliary duplicated targets is introduced: events from active tracks are copied into inactive tracks to improve the system outputs. Experimental results on a large-scale synthetic test set with 170 event classes show that the proposed method significantly improves sound event detection performance, effectively reduces the missed-detection rate, and achieves clear improvements in localization and trajectory tracking accuracy. Experiments on data recorded in real acoustic scenes further confirm the effectiveness of the proposed method.
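The two ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact loss or target layout, so the function names, the 0/1 activity flag, the 0.5 decoding threshold, and the round-robin copy order are all assumptions made for illustration.

```python
import math


def couple_activity_doa(activity, doa):
    """Scale a unit DOA vector (x, y, z) by a 0/1 activity flag.

    The result is a single regression target that carries both
    "is the event active" and "where is it" (assumed formulation).
    """
    return tuple(activity * c for c in doa)


def decode_activity(vec, threshold=0.5):
    """Recover activity from the vector length (threshold is an assumed value)."""
    return math.sqrt(sum(c * c for c in vec)) > threshold


def fill_inactive_tracks(tracks):
    """Replace inactive (all-zero) tracks with copies of active ones.

    `tracks` holds the coupled target vectors of one class in one frame.
    If no track is active, the targets are returned unchanged.
    """
    active = [t for t in tracks if any(c != 0.0 for c in t)]
    if not active:
        return list(tracks)
    filled = []
    k = 0  # index of the next active track to copy (round-robin, assumed)
    for t in tracks:
        if any(c != 0.0 for c in t):
            filled.append(t)
        else:
            filled.append(active[k % len(active)])
            k += 1
    return filled


# One active source on track 0, two empty tracks.
targets = [couple_activity_doa(1, (0.0, 1.0, 0.0)),
           couple_activity_doa(0, (1.0, 0.0, 0.0)),
           couple_activity_doa(0, (1.0, 0.0, 0.0))]
dense_targets = fill_inactive_tracks(targets)
```

In this sketch, duplicating the active target into the empty tracks removes the all-zero targets for that class, which is one way to read the abstract's claim that the strategy counteracts sparse outputs.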