Abstract:
In speech sound statistics, the occurrence of words deviates from the Poisson distribution expected for a random discrete variable. A new statistical distribution function is proposed in this paper to account for the law of occurrence of spoken words. It is reasoned that, although a large vocabulary exists from which words could be drawn arbitrarily, the speaker draws a word only after deliberation. Thus, unlike the random occurrence of simpler speech sounds, such as phonemes and letters, which obeys the Poisson distribution for discrete random processes, the occurrence of words is a random process modulated by the selection and processing of the speaker. This process closely resembles the narrow-band filtering of a random signal, and it is considered that something like the corresponding Rayleigh distribution function might fit better. Actual counts and goodness-of-fit tests confirmed this. The cumulative distribution function is exactly Rayleigh's, except that the variable takes only integer values; the distribution function thus built up is named the discrete Rayleigh distribution function. Phonemes, letters, and auxiliary words may occur in conjunction with many words and share the properties of those words; therefore, by the reasoning of the central limit theorem (applied to discrete variables), they follow the Poisson distribution. Tests also show this. In the test of significance between the observed distribution and the hypothetical distribution (Poisson or discrete Rayleigh), graph observation, comparison of variance or standard deviation, and the χ² test were used. The last is strict and the first is visual, while the variance comparison is simple and also quantitative. In some cases the visual and variance methods show a clear preference where the χ² test is not quite definite. The simple way is to observe the graph and then compare the variances.
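
The construction and the comparisons described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's own procedure: it assumes the Rayleigh cumulative function F(k) = 1 - exp(-k²/(2σ²)) is evaluated at the integers k = 0, 1, 2, ... and that the probability of k is the jump F(k) - F(k-1), which gives zero probability to k = 0. The abstract does not state the support convention, and the function names and the parameter names sigma and lam are hypothetical.

```python
import math


def discrete_rayleigh_cdf(k, sigma):
    """Rayleigh CDF evaluated at an integer k (assumed convention)."""
    if k < 0:
        return 0.0
    return 1.0 - math.exp(-(k * k) / (2.0 * sigma * sigma))


def discrete_rayleigh_pmf(k, sigma):
    """P(X = k), taken as the jump of the CDF between k - 1 and k (assumption)."""
    return discrete_rayleigh_cdf(k, sigma) - discrete_rayleigh_cdf(k - 1, sigma)


def poisson_pmf(k, lam):
    """Poisson P(X = k), computed in log space to avoid overflow for large k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def mean_and_variance(pmf, k_max):
    """Mean and variance of a pmf truncated to 0..k_max."""
    mean = sum(k * pmf(k) for k in range(k_max + 1))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(k_max + 1))
    return mean, var


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over cells with E > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

To apply the variance comparison, one would compute the sample variance of the observed counts and compare it with the variance returned by mean_and_variance for each hypothetical distribution, for example mean_and_variance(lambda k: discrete_rayleigh_pmf(k, 3.0), 60). The Pearson statistic then supplies the stricter test on the same binned counts.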