Chinese Audiovisual Bimodal Speech Database CAVSR1.0
Abstract: Audiovisual bimodal speech recognition has become one of the active topics in speech recognition internationally, and research on Chinese bimodal recognition has also begun. However, because visual information is very difficult to capture and process, few audiovisual speech databases have been built so far, and none exists for Chinese. Therefore, alongside our research on key techniques for audiovisual bimodal speech recognition, we analyzed the structure of comparable foreign databases and, taking the characteristics of Chinese speech into account, built CAVSR1.0, the first bimodal database of Chinese speech. It has the following features. Its corpus covers all Chinese initials and finals, and its scale (total data volume and number of syllables) exceeds that of comparable databases currently available internationally. The distribution of its corpus conforms to the actual distribution probability of Chinese initials and finals, so the regularities it reflects are representative of the language. It is bundled with an automatic syllable segmentation program and a program for labeling the main facial features, which makes the database highly extensible: it can be conveniently enlarged (number of subjects, speech data, repetitions) according to research requirements.