High-Precision Retrieval of Large-Scale Audio Samples Based on a Double Hash Index
Abstract: When a real-time audio stream is searched for a large set of audio samples, retrieval speed directly determines how much of the stream can be processed in real time, provided accuracy is maintained. This paper proposes a retrieval method for large-scale audio samples based on a double hash index. The method applies self-similarity quantization to the audio features of the samples, builds two linear hash indexes keyed on the mean and the modulus of the segment vectors of the self-similarity sequence, searches the audio stream against these indexes, and finally verifies the candidate matches using the temporal and spatial information of the audio to obtain the retrieval results. Experiments show that the method retrieves all samples of a large set in a single pass. With 12-dimensional MFCC features, a sample duration of 16 s, and fewer than 3100 samples, retrieval accuracy exceeds 90% and retrieval speed exceeds 12000 times real time (xRT), peaking at 16000 xRT. The method thus maintains a high retrieval speed while effectively improving retrieval accuracy.
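The pipeline described in the abstract (index, search, then verify by temporal order) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so the following are assumptions: the self-similarity quantization step is omitted and raw feature frames are used directly; the "mean" and "modulus" of a segment are taken to be the component average and Euclidean norm of the segment's averaged frame; and the temporal check is a simple vote on the stream offset. All function names, the bucket width `step`, and the threshold `min_votes` are hypothetical.

```python
import math
import random
from collections import defaultdict


def segment_keys(frames, seg_len, hop, step):
    """Yield (position, mean_bucket, norm_bucket) for each segment of seg_len frames."""
    for pos in range(0, len(frames) - seg_len + 1, hop):
        seg = frames[pos:pos + seg_len]
        dim = len(seg[0])
        # Element-wise average of the frames in this segment (assumed "segment vector").
        avg = [sum(f[d] for f in seg) / seg_len for d in range(dim)]
        mean = sum(avg) / dim                       # assumed "mean" key
        norm = math.sqrt(sum(v * v for v in avg))   # assumed "modulus" key
        # Quantize both scalars into integer hash buckets of width `step`.
        yield pos, math.floor(mean / step), math.floor(norm / step)


def build_index(samples, seg_len, step):
    """Build the two hash indexes over all samples (non-overlapping segments)."""
    mean_index, norm_index = defaultdict(set), defaultdict(set)
    for sample_id, frames in samples.items():
        for pos, mb, nb in segment_keys(frames, seg_len, seg_len, step):
            mean_index[mb].add((sample_id, pos))
            norm_index[nb].add((sample_id, pos))
    return mean_index, norm_index


def search(stream, mean_index, norm_index, seg_len, step, min_votes):
    """Scan the stream once; return sorted (sample_id, stream_offset) detections."""
    votes = defaultdict(int)
    for pos, mb, nb in segment_keys(stream, seg_len, 1, step):
        # A candidate segment must appear in both indexes.
        candidates = mean_index.get(mb, set()) & norm_index.get(nb, set())
        for sample_id, seg_pos in candidates:
            # Segments of one true match all agree on the same stream offset.
            votes[(sample_id, pos - seg_pos)] += 1
    return sorted(key for key, n in votes.items() if n >= min_votes)


# Toy data: random 12-dimensional frames standing in for MFCC features.
rng = random.Random(0)


def noise(n):
    return [[rng.uniform(-1.0, 1.0) for _ in range(12)] for _ in range(n)]


samples = {"a": noise(8), "b": noise(8)}
stream = noise(5) + samples["a"] + noise(5)   # sample "a" starts at frame 5

mean_index, norm_index = build_index(samples, seg_len=2, step=0.05)
hits = search(stream, mean_index, norm_index, seg_len=2, step=0.05, min_votes=4)
print(hits)
```

In this toy run the detection `("a", 5)` is among the hits, because the four segments of sample "a" land in the same buckets in the stream and all vote for offset 5. Because every stream segment is looked up against an index covering all samples, the whole sample set is searched in one pass over the stream, which is the property the abstract emphasizes. A practical system would also have to tolerate noise that pushes a value across a bucket boundary, which this sketch does not handle.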