This program takes a word frequency list as input and outputs its frequency spectrum together with a file of summary statistics.
input
text.wfl: a word frequency list with columns labeled Word and Frequency
options
-e: input does not have .wfl extension
-m: input does not have a header
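A minimal Python sketch of reading such a list follows. It assumes whitespace-separated columns, words without internal spaces, and Word-then-Frequency order when there is no header (-m). The function name read_wfl and the choice to sum repeated words are hypothetical and are not taken from the program itself.

```python
from typing import Dict, Iterable


def read_wfl(lines: Iterable[str], has_header: bool = True) -> Dict[str, int]:
    """Parse a word frequency list into a {word: frequency} mapping.

    Assumes whitespace-separated columns and words without internal spaces.
    """
    rows = iter(lines)
    word_col, freq_col = 0, 1  # assumed column order when there is no header (-m)
    if has_header:
        header = next(rows, "").split()
        # Use the labeled columns if present, otherwise keep the default order.
        if "Word" in header and "Frequency" in header:
            word_col, freq_col = header.index("Word"), header.index("Frequency")
    freqs: Dict[str, int] = {}
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = fields[word_col]
        # Repeated words are summed (an assumption about the input's semantics).
        freqs[word] = freqs.get(word, 0) + int(fields[freq_col])
    return freqs
```

An open file object is an iterable of lines, so read_wfl(open("text.wfl")) works directly. Pass has_header=False to mimic the -m option.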
output
text.sum: summary statistics:
N: N (number of tokens)
K: K (Yule's characteristic constant)
D: D (Simpson's diversity index)
V: V (number of types)
V1: V(1,N) (number of hapax legomena)
V2: V(2,N) (number of dis legomena)
V3: V(3,N) (number of tris legomena)
V4: V(4,N) (number of types with frequency 4)
V5: V(5,N) (number of types with frequency 5)
R: R (Guiraud's constant)
W: W (Brunet's constant)
S: S (Sichel's constant)
H: H (Honore's constant)
C: C (Herdan's constant)
E: E (sample entropy)
lM: mu (mean log frequency)
lSt: sigma (standard deviation of log frequency)
b: b (parameter of Sichel's model)
c: c (parameter of Sichel's model)
a1: alpha(1,N) (relative number of hapax legomena)
Z: Z (parameter of extended Zipf's law)
fa: not implemented (available in the input file)
fthe: not implemented (available in the input file)
sLmean: sample mean of lognormal model
sLstdev: sample standard deviation of lognormal model
text.spc: the frequency spectrum
m: the frequency class m (a word frequency, not a rank)
Vm: V(m,N), the number of types with frequency m
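The sketch below shows, in Python, how the spectrum and the closed-form statistics can be derived from a list of word frequencies. It uses commonly cited textbook definitions and does not reproduce the program's own code. The natural-log base, Brunet's exponent a = 0.172, and the population (divide by V) standard deviation are assumptions. The model-based values (b, c, Z, sLmean, sLstdev) need curve fitting and are omitted. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def frequency_spectrum(freqs: Iterable[int]) -> Dict[int, int]:
    """Map each frequency class m to V(m,N), in ascending order of m."""
    return dict(sorted(Counter(f for f in freqs if f > 0).items()))


def summary(spectrum: Dict[int, int]) -> Dict[str, float]:
    """Textbook lexical statistics; log base and conventions are assumed."""
    N = sum(m * vm for m, vm in spectrum.items())  # number of tokens
    V = sum(spectrum.values())                      # number of types
    if N < 2:
        raise ValueError("need at least two tokens")
    V1, V2 = spectrum.get(1, 0), spectrum.get(2, 0)
    stats: Dict[str, float] = {"N": N, "V": V}
    for m in range(1, 6):
        stats[f"V{m}"] = spectrum.get(m, 0)          # V(m,N) for m = 1..5
    # Yule's K = 10^4 * (sum_m m^2 V(m,N) - N) / N^2
    stats["K"] = 1e4 * (sum(m * m * vm for m, vm in spectrum.items()) - N) / N ** 2
    # Simpson's D = sum_m V(m,N) (m/N) ((m-1)/(N-1))
    stats["D"] = sum(vm * (m / N) * ((m - 1) / (N - 1)) for m, vm in spectrum.items())
    stats["R"] = V / math.sqrt(N)                    # Guiraud's R
    stats["W"] = N ** (V ** -0.172)                  # Brunet's W, a = 0.172 assumed
    stats["S"] = V2 / V                              # Sichel's S
    # Honore's H; infinite when every type is a hapax legomenon (V1 == V)
    stats["H"] = 100 * math.log(N) / (1 - V1 / V) if V1 < V else math.inf
    stats["C"] = math.log(V) / math.log(N)           # Herdan's C
    # Sample entropy with natural logarithms (log base assumed)
    stats["E"] = -sum(vm * (m / N) * math.log(m / N) for m, vm in spectrum.items())
    stats["a1"] = V1 / V                             # alpha(1,N) = V(1,N)/V
    # Mean and population standard deviation of log frequency over types
    mu = sum(vm * math.log(m) for m, vm in spectrum.items()) / V
    var = sum(vm * (math.log(m) - mu) ** 2 for m, vm in spectrum.items()) / V
    stats["lM"], stats["lSt"] = mu, math.sqrt(var)
    return stats
```

For frequencies [3, 1, 1, 2] the spectrum is {1: 2, 2: 1, 3: 1}, so N = 7 and V = 4. Writing each (m, Vm) pair on its own line gives a text.spc-style listing.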
technical details
The maximum number of different word types that can be accommodated is 40000.