mcdisp: dispersion analysis


mcdisp (-kX -pY -sZ -H) text.zvc text.spc

Monte Carlo based dispersion analysis.

input

    text.zvc: text in vector format with Zipf ranks

    text.spc: the corresponding frequency spectrum

options

    -kX: number of text chunks is set to X

    -pY: number of permutation runs is set to Y

    -sZ: seed for random generator is set to Z

    -H: input files do not have the standard header

output

    text.mcd: a list giving, for each word type:

          z: the Zipf rank z

          Frequency: f(i,N)

          Obs: observed dispersion d_i

          Exp: expected dispersion E[d_i] using the binomial model

          StDev: the corresponding standard deviation

          Z: the corresponding Z-score

          MCperc: proportion of simulation runs with dispersion <= d_i

    text.fik: list of word types and their frequencies for each text chunk
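
The Obs, Exp and MCperc columns can be illustrated with a minimal Python sketch. It is not the mcdisp source. It rests on assumptions that this page does not state: that the dispersion d_i is the number of text chunks in which word type i occurs, that the chunks are (roughly) equal in size, that under the binomial model each token falls into a given chunk with probability 1/K, and that a simulation run is a random permutation of the tokens. The function names are hypothetical, and the StDev and Z columns are not reproduced.

```python
import random
from collections import defaultdict


def dispersion(tokens, k):
    """Assumed definition: number of the k chunks in which each type occurs."""
    n = len(tokens)
    chunks_seen = defaultdict(set)
    for pos, word in enumerate(tokens):
        # Map the token position onto one of k roughly equal chunks.
        chunks_seen[word].add(pos * k // n)
    return {word: len(chunks) for word, chunks in chunks_seen.items()}


def expected_dispersion(f, k):
    """Expected number of occupied chunks for a type with frequency f,
    assuming each token lands in a given chunk with probability 1/k."""
    return k * (1.0 - (1.0 - 1.0 / k) ** f)


def mc_proportion(tokens, k, runs, seed):
    """Proportion of permutation runs whose dispersion is <= the observed one."""
    rng = random.Random(seed)          # seeded, so results are repeatable
    observed = dispersion(tokens, k)
    hits = dict.fromkeys(observed, 0)
    shuffled = list(tokens)
    for _ in range(runs):
        rng.shuffle(shuffled)          # one random permutation of the text
        simulated = dispersion(shuffled, k)
        for word, d_obs in observed.items():
            if simulated[word] <= d_obs:
                hits[word] += 1
    return {word: hits[word] / runs for word in observed}
```

In this sketch k, runs and seed play the roles of the -k, -p and -s options. A word that occurs only once always has dispersion 1, so its proportion is 1.0 in every run.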

technical details

The current implementation supports a maximum text length of 100000 word tokens, a maximum of 20000 word types, and a maximum of 100 text chunks.
