spectrum: construct frequency spectrum and profile data


spectrum (-kZ -e -s -n) text.txt

This program takes a text as input and produces a word frequency list, the frequency spectrum, and the observed developmental profile.
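As a rough illustration of the first two outputs, the sketch below builds a word frequency list and a frequency spectrum in Python. It is not the program's own code: the whitespace tokenization and the names word_frequency_list and frequency_spectrum are assumptions made for this example only.

```python
from collections import Counter

def word_frequency_list(tokens):
    """Map each word type to its token frequency f(i,N)."""
    return Counter(tokens)

def frequency_spectrum(freqs):
    """Map each frequency m to V(m,N), the number of types occurring m times."""
    return dict(sorted(Counter(freqs.values()).items()))

# Assumption: tokens are separated by whitespace; the real program may
# tokenize differently (for example, by stripping SGML markup first).
tokens = "the cat saw the dog and the dog saw a bird".split()
wfl = word_frequency_list(tokens)
spc = frequency_spectrum(wfl)

for m, vm in spc.items():
    print(m, vm)
```

The spectrum always satisfies two checks: the V(m,N) values sum to the number of types V, and the products m * V(m,N) sum to the number of tokens N.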

INPUT

   text.txt: ASCII text file (SGML markup is ignored)

OPTIONS

   -e: input text file does not have .txt extension

   -s: short output (suppresses creation of text.zvc, text.zrk)

   -n: do not attempt to remove SGML markup

   -kZ: number of text chunks is set at Z (default: 20)
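The effect of -kZ can be pictured with the following Python sketch, which records N, V and V(1,N) at Z measurement points. It assumes that the chunks are of equal size and that each measurement is cumulative (taken over all tokens up to the end of the chunk); the program itself may divide the text differently. The name developmental_profile is hypothetical.

```python
from collections import Counter

def developmental_profile(tokens, k=20):
    """Return one (N, V, V1) row per measurement point, k rows in total."""
    n_total = len(tokens)
    rows = []
    for j in range(1, k + 1):
        # Assumption: measurement point j lies after the first j/k of the text.
        n = (j * n_total) // k
        counts = Counter(tokens[:n])
        v = len(counts)                                   # number of types so far
        v1 = sum(1 for f in counts.values() if f == 1)    # hapax legomena so far
        rows.append((n, v, v1))
    return rows

tokens = "a b a c b a d a b c a e".split()
profile = developmental_profile(tokens, k=4)
for row in profile:
    print(*row)
```

Recounting the prefix at every point keeps the sketch short; a single pass with a running counter would be preferable for long texts.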

OUTPUT

   text.obs: empirical developmental profile statistics

         N: N (number of tokens)

         K: K (Yule's characteristic constant)

         D: D (Simpson's diversity index)

         V: V (number of types)

         V1: V(1,N) (number of hapax legomena)

         V2: V(2,N) (number of dis legomena)

         V3: V(3,N) (number of tris legomena)

         V4: V(4,N) (number of types with frequency 4)

         V5: V(5,N) (number of types with frequency 5)

         R: R (Guiraud's constant)

         W: W (Brunet's constant)

         S: S (Sichel's constant)

         H: H (Honoré's constant)

         C: C (Herdan's constant)

         E: E (sample entropy)

         lM: hat{mu} (mean log frequency)

         lSt: hat{sigma} (standard deviation of log frequency)

         b: b (parameter of Sichel's model)

         c: c (parameter of Sichel's model)

         a1: alpha(1,N) (relative number of hapax legomena)

         Z: Z (parameter of extended Zipf's law)

         fa: frequency of first word with specified Zipf rank

         fthe: frequency of second word with specified Zipf rank

         sLmean: sample mean of lognormal model

         sLstdev: sample standard deviation of lognormal model

   text.zvc: the text in Zipf-vector format

         Word: the word tokens

         z: the Zipf ranks of the corresponding word types

   text.wfl: the word frequency list

         Word: the word types omega_i

         Frequency: the frequencies f(i,N) of these word types

   text.spc: the frequency spectrum

         m: the frequency rank m

         Vm: V(m,N), the number of word types with frequency m

   text.zrk: the Zipfian rank-frequency list

         z: the Zipf rank z

         fz: f(z,N), the frequency of the word with Zipf rank z

   text.sum: summary statistics for complete text
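Several of the statistics listed under text.obs can be derived directly from the frequency spectrum. The Python sketch below uses the common textbook definitions of Yule's K, Simpson's D, Guiraud's R, Herdan's C and Sichel's S; whether the program applies exactly these scalings and logarithm bases is an assumption, and the function name lexical_constants is hypothetical.

```python
import math

def lexical_constants(spectrum):
    """Compute a few lexical constants from a spectrum {m: V(m,N)}.

    Requires N > 1, since D divides by N - 1 and C divides by log N.
    """
    n = sum(m * vm for m, vm in spectrum.items())   # tokens
    v = sum(spectrum.values())                      # types
    sum_m2 = sum(m * m * vm for m, vm in spectrum.items())
    return {
        "N": n,
        "V": v,
        # Yule's K = 10^4 * (sum of m^2 V(m,N) - N) / N^2
        "K": 1e4 * (sum_m2 - n) / n ** 2,
        # Simpson's D = sum of V(m,N) * (m/N) * ((m-1)/(N-1))
        "D": sum(vm * (m / n) * ((m - 1) / (n - 1)) for m, vm in spectrum.items()),
        # Guiraud's R = V / sqrt(N)
        "R": v / math.sqrt(n),
        # Herdan's C = log V / log N
        "C": math.log(v) / math.log(n),
        # Sichel's S = V(2,N) / V
        "S": spectrum.get(2, 0) / v,
    }

stats = lexical_constants({1: 4, 2: 2, 3: 1})
for name, value in stats.items():
    print(name, value)
```

The example spectrum describes a text of 11 tokens and 7 types, small enough to check each value by hand.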

TECHNICAL DETAILS

The maximum number of different word types is 40000; the maximum number of text chunks is 40.
