Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Urban sound event annotation and recognition based on salience judgment


Abstract: Urban sound environments contain diverse and highly mixed sound sources, which makes it difficult for traditional sound event detection methods to extract meaningful source information efficiently. To address this issue, this study proposes an urban sound event annotation and recognition method based on salience judgment. Field recordings were collected in public spaces across Dalian, China, and salient sound events were annotated in selected audio samples. To ensure label reliability, annotators' classification ability and inter-rater consistency were evaluated, yielding a dataset of salient sound events. A salient sound event detection model was then trained and validated. Finally, the model was tested on real-world audio in terms of salience recognition and duration estimation. The results show that, under a unified classification framework, annotators exhibited a high level of agreement in identifying salient sound events (Cohen's Kappa = 79.19%), confirming the consistency of salience judgments. The deep learning model achieved high cross-validation accuracy (91.3%), demonstrating strong capability in modeling salience. In generalization tests, the model accurately classified the major event types (Precision > 0.95) and provided reliable duration estimates for these events, enabling insights into urban spatial usage patterns.
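The inter-rater consistency figure reported above is Cohen's Kappa, which corrects raw agreement between two annotators for agreement expected by chance. The sketch below is a generic, stdlib-only illustration of the statistic, not the authors' evaluation code; the label names are hypothetical placeholders for sound event categories.

```python
from collections import Counter


def cohens_kappa(ann1: list[str], ann2: list[str]) -> float:
    """Cohen's Kappa for two annotators labeling the same samples.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each annotator's label frequencies.
    """
    assert len(ann1) == len(ann2) and ann1, "need paired, non-empty labels"
    n = len(ann1)
    # Observed agreement: fraction of samples where both annotators match.
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Chance agreement: sum over categories of the product of marginals.
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: two annotators disagree on one of four clips.
a1 = ["traffic", "birdsong", "speech", "traffic"]
a2 = ["traffic", "birdsong", "music", "traffic"]
print(round(cohens_kappa(a1, a2), 3))  # chance-corrected, below the raw 0.75 agreement
```

A value near the paper's 79.19% would indicate that annotators agree far more often than chance, which is what justifies pooling their salience labels into a single training dataset.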

