Urban sound event annotation and recognition based on salience judgment
Graphical Abstract
Abstract
Urban sound environments are characterized by diverse and highly mixed sound sources, which makes it difficult for traditional sound event detection methods to extract meaningful information efficiently. To address this issue, this study proposes an urban sound event annotation and recognition method based on salience judgment. Field recordings were conducted in public spaces across Dalian, China, and salient sound events were annotated in selected audio samples. To ensure label reliability, annotators’ classification ability and inter-rater consistency were evaluated, yielding a dataset of salient sound events. A model was then trained and validated for salient event detection. Finally, the model’s performance was tested on real-world audio in terms of salience recognition and duration estimation. The results show that, under a unified classification framework, annotators exhibited a high level of agreement in identifying salient sound events (Cohen’s Kappa = 79.19%), confirming the consistency of salience judgments. The deep learning model achieved high cross-validation accuracy (91.3%), demonstrating a strong capability to model salience. In generalization tests, the model accurately classified the major event types (Precision > 0.95) and provided reliable duration estimates for these events, enabling insights into urban spatial usage patterns.
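The reported inter-rater agreement uses Cohen’s Kappa, which compares the observed agreement between two annotators against the agreement expected by chance from their label frequencies. A minimal sketch of the computation (not the paper’s code; the labels and category names below are hypothetical examples):

```python
# Illustrative sketch: Cohen's kappa for two annotators labeling the
# salient sound event in the same set of audio clips.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of clips labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for four clips (category names are examples only).
a = ["traffic", "traffic", "voice", "voice"]
b = ["traffic", "voice", "voice", "voice"]
print(cohens_kappa(a, b))  # → 0.5
```

A kappa near 0.79, as reported, indicates substantial agreement well above chance, since a value of 0 means agreement no better than random labeling and 1 means perfect agreement.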