wfl2spc: construct frequency spectrum from word frequency list


wfl2spc -e -m text.txt

This program takes a word frequency list as input and writes two output files: a set of summary statistics and the frequency spectrum.

input

    text.wfl: a word frequency list with columns labeled Word and Frequency

options

    -e: input does not have .wfl extension

    -m: input does not have a header
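
As an illustration of the input format and the effect of -m, the following Python sketch reads a word frequency list into a dictionary. It is not part of wfl2spc. The function name read_wfl, its has_header parameter, and the assumption that the two columns are separated by whitespace are all hypothetical; the actual file layout may differ.

```python
import io

def read_wfl(lines, has_header=True):
    """Parse rows of the form 'word frequency' into a dict.

    Assumption: columns are whitespace-separated, word first, count second.
    """
    freqs = {}
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the "Word Frequency" header line
    for line in rows:
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        word, count = parts[0], int(parts[1])
        # Sum counts in case a word is listed more than once.
        freqs[word] = freqs.get(word, 0) + count
    return freqs

# A tiny in-memory list with a header, as in the default case.
sample = io.StringIO("Word Frequency\nthe 3\ncat 1\nsat 1\na 2\n")
wfl = read_wfl(sample, has_header=True)

# The same data without a header, corresponding to the -m option.
wfl_no_header = read_wfl(["the 3", "cat 1", "sat 1", "a 2"], has_header=False)
```

Both calls yield the same dictionary; passing has_header=False to a file that does have a header would fail on the non-numeric "Frequency" field.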

output

    text.sum: summary statistics:

          N: N (number of tokens)

          K: K (Yule's characteristic constant)

          D: D (Simpson's diversity index)

          V: V (number of types)

          V1: V(1,N) (number of hapax legomena)

          V2: V(2,N) (number of dis legomena)

          V3: V(3,N) (number of tris legomena)

          V4: V(4,N) (number of types with frequency 4)

          V5: V(5,N) (number of types with frequency 5)

          R: R (Guiraud's constant)

          W: W (Brunet's constant)

          S: S (Sichel's constant)

          H: H (Honore's constant)

          C: C (Herdan's constant)

          E: E (sample entropy)

          lM: mu (mean log frequency)

          lSt: sigma (standard deviation of log frequency)

          b: b (parameter of Sichel's model)

          c: c (parameter of Sichel's model)

          a1: alpha(1,N) (relative number of hapax legomena)

          Z: Z (parameter of extended Zipf's law)

          fa: not implemented (available in the input file)

          fthe: not implemented (available in the input file)

          sLmean: sample mean of lognormal model

          sLstdev: sample standard deviation of lognormal model

    text.spc: the frequency spectrum

          m: the frequency rank m

          Vm: V(m,N), the number of words with frequency m
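
To make the relation between the word frequency list, the spectrum, and a few of the summary statistics concrete, here is a minimal Python sketch. It is an illustration only, not the program's own code. The function names spectrum and summary are hypothetical, and the formulas for R, C, and K are the usual textbook definitions (Guiraud's R = V / sqrt(N), Herdan's C = log V / log N, Yule's K = 10^4 (sum of m^2 V(m,N) - N) / N^2); wfl2spc may compute or scale them differently.

```python
import math
from collections import Counter

def spectrum(freqs):
    """Map each frequency m to V(m,N), the number of types occurring m times."""
    return dict(sorted(Counter(freqs.values()).items()))

def summary(freqs):
    """Compute a few summary statistics from a word -> frequency dict."""
    spc = spectrum(freqs)
    n = sum(m * vm for m, vm in spc.items())            # N: number of tokens
    v = sum(spc.values())                               # V: number of types
    sum_m2 = sum(m * m * vm for m, vm in spc.items())   # sum of m^2 * V(m,N)
    stats = {
        "N": n,
        "V": v,
        "V1": spc.get(1, 0),                    # hapax legomena
        "V2": spc.get(2, 0),                    # dis legomena
        "R": v / math.sqrt(n),                  # Guiraud's R (assumed definition)
        "K": 1e4 * (sum_m2 - n) / (n * n),      # Yule's K (assumed definition)
    }
    if n > 1:
        stats["C"] = math.log(v) / math.log(n)  # Herdan's C (assumed definition)
    return stats

# Toy word frequency list: 4 types, 7 tokens.
freqs = {"the": 3, "cat": 1, "sat": 1, "a": 2}
spc = spectrum(freqs)   # {1: 2, 2: 1, 3: 1}
stats = summary(freqs)

# Print the spectrum in the two-column m / Vm layout described above.
print("m", "Vm")
for m, vm in spc.items():
    print(m, vm)
```

For the toy list, two types occur once, one occurs twice, and one occurs three times, so N = 1*2 + 2*1 + 3*1 = 7 and V = 4.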

technical details

The maximum number of different word types that can be accommodated is 40000.
