This program takes a text as input and produces a word frequency list, the frequency spectrum, and the observed developmental profile.
INPUT
text.txt: ASCII text file (SGML markup is ignored)
OPTIONS
-e: input text file does not have .txt extension
-s: short output (suppresses creation of text.zvc, text.zrk)
-n: do not attempt to remove SGML markup
-kZ: number of text chunks is set at Z (default: 20)
OUTPUT
text.obs: empirical developmental profile statistics
N: N (number of tokens)
K: K (Yule's characteristic constant)
D: D (Simpson's diversity index)
V: V (number of types)
V1: V(1,N) (number of hapax legomena)
V2: V(2,N) (number of dis legomena)
V3: V(3,N) (number of tris legomena)
V4: V(4,N) (number of types with frequency 4)
V5: V(5,N) (number of types with frequency 5)
R: R (Guiraud's constant)
W: W (Brunet's constant)
S: S (Sichel's constant)
H: H (Honoré's constant)
C: C (Herdan's constant)
E: E (sample entropy)
lM: hat{mu} (mean log frequency)
lSt: hat{sigma} (standard deviation of log frequency)
b: b (parameter of Sichel's model)
c: c (parameter of Sichel's model)
a1: alpha(1,N) (relative number of hapax legomena)
Z: Z (parameter of extended Zipf's law)
fa: frequency of first word with specified Zipf rank
fthe: frequency of second word with specified Zipf rank
sLmean: sample mean of lognormal model
sLstdev: sample standard deviation of lognormal model
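As a minimal sketch of how several of the constants listed above can be derived from a frequency spectrum, the Python function below uses the standard textbook definitions of Yule's K, Simpson's D, Guiraud's R, Herdan's C, Sichel's S and Honoré's H. The function name lexical_constants and the dictionary layout are illustrative assumptions; the program's own implementation may differ in scaling or in detail.

```python
import math

def lexical_constants(spectrum):
    """Compute a few constants from a frequency spectrum {m: V(m,N)}.

    Hypothetical helper based on the standard textbook definitions;
    it is not taken from the program's source. Assumes N > 1.
    """
    n = sum(m * vm for m, vm in spectrum.items())   # N: number of tokens
    v = sum(spectrum.values())                      # V: number of types
    v1 = spectrum.get(1, 0)                         # hapax legomena
    v2 = spectrum.get(2, 0)                         # dis legomena
    sum_m2 = sum(m * m * vm for m, vm in spectrum.items())
    return {
        "N": n,
        "V": v,
        "K": 1e4 * (sum_m2 - n) / (n * n),
        "D": sum(vm * (m / n) * ((m - 1) / (n - 1))
                 for m, vm in spectrum.items()),
        "R": v / math.sqrt(n),
        "C": math.log(v) / math.log(n),
        "S": v2 / v,
        # H is undefined (division by zero) when every type is a hapax
        "H": 100 * math.log(n) / (1 - v1 / v),
    }
```

For example, lexical_constants({1: 1, 2: 1, 3: 1}) describes a six-token text with three types, one each of frequency 1, 2 and 3.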
text.zvc: the text in Zipf-vector format
Word: the word tokens
z: the Zipf ranks of the corresponding word types
text.wfl: the word frequency list
Word: the word types omega_i
Frequency: the frequencies f(i,N) of these word types
text.spc: the frequency spectrum
m: the frequency rank m
Vm: V(m,N), the number of word types with frequency m
text.zrk: the Zipfian rank-frequency list
z: the Zipf rank z
fz: f(z,N), the frequency of the word with Zipf rank z
text.sum: summary statistics for complete text
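The relation between the word frequency list, the frequency spectrum and the Zipfian rank-frequency list described above can be illustrated with a short Python sketch. The function name frequency_tables, the whitespace tokenization in the example and the alphabetical tie-breaking are assumptions made for illustration; they are not taken from the program itself.

```python
from collections import Counter

def frequency_tables(tokens):
    """Return (wfl, spc, zrk) for a list of word tokens (hypothetical helper)."""
    # Word frequency list: word type -> f(i,N)
    wfl = Counter(tokens)
    # Frequency spectrum: (m, V(m,N)) pairs in ascending order of m
    spc = sorted(Counter(wfl.values()).items())
    # Zipf rank 1 goes to the most frequent type;
    # ties are broken alphabetically (an assumption)
    ranked = sorted(wfl.items(), key=lambda kv: (-kv[1], kv[0]))
    zrk = [(z, f) for z, (_, f) in enumerate(ranked, start=1)]
    return wfl, spc, zrk

tokens = "the cat and the dog and the bird".split()
wfl, spc, zrk = frequency_tables(tokens)
```

Here wfl plays the role of text.wfl, spc of text.spc and zrk of text.zrk; the sum of m * V(m,N) over spc equals the number of tokens N.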
TECHNICAL DETAILS
The maximum number of distinct word types is 40000; the maximum number of text chunks is 40.
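The following Python sketch illustrates one way an observed developmental profile could be computed: the text is cut into k chunks and the vocabulary size V(N) is recorded at the end of each chunk. It assumes chunks of (nearly) equal size in tokens, which is an assumption and not something stated above; the function name developmental_profile is likewise hypothetical. The default of 20 chunks and the limit of 40 mirror the values given in this document.

```python
def developmental_profile(tokens, k=20, max_chunks=40):
    """Return [(N, V(N))] at k measurement points (hypothetical helper).

    Assumes equally sized chunks; the program's own chunking may differ.
    """
    if not 1 <= k <= max_chunks:
        raise ValueError(f"k must be between 1 and {max_chunks}")
    n_total = len(tokens)
    seen = set()          # word types observed so far
    profile = []
    start = 0
    for j in range(1, k + 1):
        end = (j * n_total) // k        # token count at measurement point j
        seen.update(tokens[start:end])
        profile.append((end, len(seen)))
        start = end
    return profile
```

For instance, developmental_profile("a b a c b d".split(), k=3) records the vocabulary size after 2, 4 and 6 tokens.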