Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

STATISTICAL DISTRIBUTION OF SPEECH SOUNDS

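  • Abstract (from the Chinese): In statistics of speech, the occurrence of words is found not to follow the Poisson distribution expected for random occurrence. The author proposes a new statistical distribution function to describe the law of word occurrence in language. Although a large vocabulary is available for free choice, actual usage is the result of the writer's deliberation. Word occurrence therefore has a random character but also bears traces of selection and processing. Such selection and processing closely resembles passing a random signal (for example, electrical noise) through a narrow-band filter, so the corresponding Rayleigh distribution was expected to fit better. Tests confirmed this: the cumulative distribution of words follows the Rayleigh cumulative distribution function (except that the variable takes only integer values, since word counts are integers). The distribution function so constructed, whose cumulative distribution is the same as the Rayleigh distribution but whose variable takes only integer values, is called the discrete Rayleigh distribution function. Auxiliary words, phonemes, letters and the like occur together with many different words, so by the central limit theorem (applied to discrete processes) their distribution should be random, that is, it should obey the Poisson distribution function. Tests confirm this as well. Three methods were used to test whether an observed distribution fits the hypothesized one (Poisson or discrete Rayleigh): inspection of the distribution graph, comparison of mean-square or standard deviations, and the χ² test. Of the three, graph inspection is the most intuitive, the χ² test is the most rigorous, and the standard-deviation comparison is simple yet quantitative. A simple procedure is to inspect the graph first and then make a quantitative comparison using the standard deviation.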

     

    Abstract: In speech sound statistics, word occurrence deviates from the Poisson distribution for a random discrete variable. A new statistical distribution function is proposed in this paper to account for the law of occurrence of spoken words. It is reasoned that although a large vocabulary exists from which words can be drawn arbitrarily, the speaker draws only after deliberation. Thus, unlike the random occurrence of simpler speech sounds, such as phonemes and letters, which obeys the Poisson distribution for discrete random processes, the occurrence of words is a random process modulated by the selection and processing of the speaker. This process closely resembles the narrow-band filtering of a random signal, and it was considered that something like the corresponding Rayleigh distribution function might fit better. Actual counting and goodness-of-fit tests confirmed this. The cumulative distribution function is exactly Rayleigh's except that the variable takes only integer values; the distribution function thus constructed is named the discrete Rayleigh distribution function. Phonemes, letters and auxiliary words may occur in conjunction with many words and share the properties of these words; therefore, by the reasoning of the central limit theorem (applied to discrete variables), they follow the Poisson distribution. Tests also show this. To test the significance of the difference between the observed distribution and the hypothetical distribution (Poisson or discrete Rayleigh), graph observation, variance or standard deviation comparison, and the χ² test were used. The last is strict and the first is visual, while the variance comparison is simple and also quantitative. In some cases the visual and variance methods show a clear preference while the χ² test is not quite definite. The simple way is to observe on the graph and then compare the variances.
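    A minimal Python sketch of the discrete Rayleigh distribution described in the abstract may help make the idea concrete. It is not the paper's own procedure. It assumes the cumulative function is F(n) = 1 - exp(-n²/(2σ²)) evaluated only at integers n ≥ 0, with probabilities taken as differences of consecutive cumulative values. The function names, the parameter `sigma`, and the synthetic counts in the demo are all hypothetical; the two comparison helpers are one plausible reading of the deviation comparison and χ² test mentioned in the abstract.

```python
import math


def discrete_rayleigh_cdf(n, sigma):
    """Rayleigh CDF evaluated at an integer n (assumed form, zero for n < 0)."""
    if n < 0:
        return 0.0
    return 1.0 - math.exp(-(n * n) / (2.0 * sigma * sigma))


def discrete_rayleigh_pmf(n, sigma):
    """P(N = n), taken as the difference of consecutive CDF values."""
    return discrete_rayleigh_cdf(n, sigma) - discrete_rayleigh_cdf(n - 1, sigma)


def poisson_pmf(n, lam):
    """Poisson probability of exactly n occurrences with mean lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def rms_deviation(observed, expected):
    """Root-mean-square difference between two equal-length frequency lists."""
    total = sum((o - e) ** 2 for o, e in zip(observed, expected))
    return math.sqrt(total / len(observed))


def chi_square(observed, expected):
    """Pearson chi-square statistic; cells with zero expected count are skipped."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Synthetic counts generated from the model itself, NOT data from the paper.
    sigma, total = 3.0, 1000
    ns = range(0, 15)
    counts = [total * discrete_rayleigh_pmf(n, sigma) for n in ns]

    # Poisson alternative with the same mean as the synthetic counts.
    mean = sum(n * c for n, c in zip(ns, counts)) / sum(counts)
    poisson_counts = [total * poisson_pmf(n, mean) for n in ns]

    print("RMS deviation vs Poisson:", rms_deviation(counts, poisson_counts))
    print("Chi-square vs Poisson:  ", chi_square(counts, poisson_counts))
```

    With real word counts in place of the synthetic ones, the same two helpers could be applied to both candidate distributions and the smaller deviation taken as the better fit, after a first look at the plotted distributions.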

     
