Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

ZHANG Siyu, XIE Lingyun, ZHAO Zhijun. Analysis of human-machine perception differences in speech forgery detection[J]. ACTA ACUSTICA, 2025, 50(6): 1652-1664. DOI: 10.12395/0371-0025.2024234
Analysis of human-machine perception differences in speech forgery detection

  • Building on how human listeners perceive speech naturalness, this paper examines listeners' ability to detect forged speech and how their judgments differ from those of machine detectors. A subjective discrimination experiment on Chinese synthetic speech compares the detection accuracy of human listeners and machines. The paper analyzes how objective features affect human and machine detection results, and further compares those results in terms of high-level features and the factors that influence naturalness. The experimental data indicate that timbre, emotion, and rhythm all help listeners discriminate between real and synthetic speech. Listeners identify natural signals more accurately than synthetic ones. Speaker gender and the choice of speech synthesis algorithm also lead to differences between human and machine detection results. Further analysis of objective acoustic characteristics shows that the wider the dynamic range of the zero-crossing rate and the larger its mean, the harder it is for listeners to make a judgment. Real audio with a wide dynamic range of the 75% spectral roll-off and a narrow dynamic range of the fundamental frequency is easier for both humans and machines to judge correctly. The more stable the logarithmic spectral flatness and the smaller its mean, the easier it is for both humans and machines to detect synthesized audio.
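
The objective features named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' extraction pipeline. The frame length, hop size, Hann window, and the reading of "dynamic range" as maximum minus minimum over frames are all assumptions made for illustration, and every function name is hypothetical. Fundamental frequency is omitted because it requires a pitch tracker.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def magnitude_spectrum(frames):
    """Magnitude of the real FFT of each Hann-windowed frame."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

def spectral_rolloff(frames, sr, percent=0.75):
    """Frequency below which `percent` of the summed magnitude lies."""
    mag = magnitude_spectrum(frames)
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    cum = np.cumsum(mag, axis=1)
    # First bin where the cumulative magnitude reaches the threshold.
    bin_idx = np.argmax(cum >= percent * cum[:, -1:], axis=1)
    return freqs[bin_idx]

def log_spectral_flatness(frames, eps=1e-12):
    """log(geometric mean / arithmetic mean) of the power spectrum."""
    power = magnitude_spectrum(frames) ** 2 + eps
    return np.mean(np.log(power), axis=1) - np.log(np.mean(power, axis=1))

def summarize(values):
    """Mean and dynamic range (assumed here to be max minus min)."""
    return {
        "mean": float(np.mean(values)),
        "dynamic_range": float(np.max(values) - np.min(values)),
    }

# Synthetic stand-ins for audio: a pure tone and white noise (not real data).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
noise = np.random.default_rng(0).standard_normal(sr)

for name, signal in [("tone", tone), ("noise", noise)]:
    frames = frame_signal(signal)
    print(name, "zcr", summarize(zero_crossing_rate(frames)))
    print(name, "rolloff75", summarize(spectral_rolloff(frames, sr)))
    print(name, "log_flatness", summarize(log_spectral_flatness(frames)))
```

Each per-frame feature is reduced to a mean and a dynamic range because those are the two statistics the abstract relates to detection difficulty. A different definition of dynamic range, for example a percentile spread, would change the numbers but not the structure of the sketch.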