A rapid specific audio event recognition method using 2D-Haar acoustic feature super vectors
Abstract: To address the accuracy and speed of specific audio event recognition in large-scale audio processing tasks, this paper proposes a fast, generalized recognition method based on 2D-Haar acoustic feature super vectors and the AdaBoost algorithm. First, common acoustic features from multiple consecutive audio frames are assembled into an "acoustic feature image", from which Haar-like acoustic features with a dimensionality as high as several hundred thousand are extracted. Second, AdaBoost.MH or the faster Random AdaBoost feature selection algorithm is used to select highly representative Haar-like feature pattern combinations, which together form the 2D-Haar acoustic feature super vector. Finally, the commonalities and differences among the subclasses of a specific audio event are analyzed: the shared characteristics are extracted and the differences between subclasses are de-emphasized, so that training yields a single generalized audio event template. This template supports recognition across multiple subclasses and can accurately detect and locate the specific audio event in an audio stream. Experimental results show that, compared with common acoustic features such as MFCC, PLP, and LPCC, the 2D-Haar acoustic feature super vector improves recognition accuracy by about 5%, speeds up training by 7 to 20 times, and speeds up recognition by 5 to 10 times. Under the optimal parameter configuration found by grid search, the method achieves a precision of 93.38% and a recall of 95.03%, providing an accurate and fast approach to specific audio event recognition on large volumes of data.
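The first step of the pipeline, computing Haar-like features on an acoustic feature image, can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the image is laid out with one row per frame and one column per acoustic coefficient, rectangle sums are computed with an integral image (summed-area table) as is common for Haar-like features, and only a single two-rectangle pattern is shown. All function names are hypothetical, and the AdaBoost selection stage is not covered.

```python
from typing import List

Matrix = List[List[float]]


def integral_image(img: Matrix) -> Matrix:
    """Build a summed-area table padded with a leading zero row and column."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += img[r][c]
            # Sum of everything above plus the running sum of this row.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: Matrix, top: int, left: int, height: int, width: int) -> float:
    """Sum of the rectangle img[top:top+height][left:left+width] in O(1)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(
    ii: Matrix, top: int, left: int, height: int, width: int
) -> float:
    """Two-rectangle Haar-like feature: left half minus right half.

    The width is assumed to be even so that both halves have equal area.
    """
    half = width // 2
    left_part = rect_sum(ii, top, left, height, half)
    right_part = rect_sum(ii, top, left + half, height, half)
    return left_part - right_part


# Toy "acoustic feature image": 4 consecutive frames (rows), each with
# 4 coefficients (columns). Real inputs would hold MFCC-like values.
feature_image: Matrix = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
ii = integral_image(feature_image)
full_feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
```

Enumerating such patterns over every position and scale of the image is what would push the candidate feature count into the hundreds of thousands, which is why a selection stage such as AdaBoost is needed afterwards.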