Urban sound event annotation and recognition based on salience judgment
Graphical Abstract
Abstract
Urban sound environments are characterized by diverse and highly mixed sound sources, which makes it difficult for traditional sound event detection methods to extract meaningful information efficiently. To address this issue, this study proposes an urban sound event annotation and recognition method based on salience judgment. Field recordings were conducted in public spaces across Dalian, China, and salient sound events were annotated in selected audio samples. To ensure label reliability, annotators’ classification ability and inter-rater consistency were evaluated, yielding a dataset of salient sound events. A model was then trained and validated for salient event detection. Finally, the model’s performance was tested on real-world audio in terms of salience recognition and duration estimation. The results show that, under a unified classification framework, annotators exhibited a high level of agreement in identifying salient sound events (Cohen’s Kappa = 79.19%), confirming the consistency of salience judgments. The deep learning model achieved high cross-validation accuracy (91.3%), demonstrating a strong capability to model salience. In generalization tests, the model accurately classified the major event types (Precision > 0.95) and provided reliable duration estimates for these events, enabling insights into urban spatial usage patterns.
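The reported inter-rater agreement uses Cohen’s Kappa, which compares the observed agreement between two annotators against the agreement expected by chance from their label frequencies. A minimal sketch of the computation (not the paper’s code; the labels and category names below are hypothetical examples):

```python
# Illustrative sketch: Cohen's kappa for two annotators labeling the
# salient sound event in the same set of audio clips.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of clips labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for four clips (category names are examples only).
a = ["traffic", "traffic", "voice", "voice"]
b = ["traffic", "voice", "voice", "voice"]
print(cohens_kappa(a, b))  # → 0.5
```

A kappa near 0.79, as reported, indicates substantial agreement well above chance, since a value of 0 means agreement no better than random labeling and 1 means perfect agreement.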