This program fits the Naranan-Balasubrahmanyan model and computes Good-Turing estimates.
input
text.spc: frequency spectrum
options
-h: display on-line help
-mW: number of ranks in fit is set to W (default: 15)
-H: input files do not have a header (default: with header)
-eR: use cost function C_{2}(r) with r=R
-v: include |V(N)-E[V(N)]| in cost function
-n: include |N-Nfit| in cost function
output
text_N.spc: observed and expected frequency spectrum
m: m (frequency)
Vm: V(m,N) (number of types with frequency m at sample size N)
SVm: V_{r}(m,N) (real-valued spectrum)
EVm: E[V(m,N)] (expected number of types with frequency m at sample size N)
mStar: m* (Good-Turing estimate using E[V(m,N)])
mStarRaw: m* (Good-Turing estimate using V_{r}(m,N))
StdevMstar: standard deviation of m*
text_N.sum: summary statistics and estimated parameters
N: N (number of tokens)
V(N): V(N) (observed number of types)
E[V(N)]: E[V(N)] (expected number of types)
V(m,N): V(m,N)
E[V(m,N)]: E[V(m,N)]
C: C (first parameter)
mu: mu (second parameter)
gamma: gamma (third parameter)
MSE: mean squared error
Nfit: sum_m m E[V(m,N)]
Nproxy: sum_{m=1}^{mmax} m V_{r}(m,N)
Vproxy: sum_{m=1}^{mmax} V_{r}(m,N)
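example
The sketch below illustrates how the mStar column and the Nfit, Nproxy and Vproxy sums relate to a frequency spectrum. It is not taken from the program itself. It assumes the standard Good-Turing form m* = (m+1) V(m+1,N) / V(m,N), applied to whichever spectrum is supplied (expected or real-valued). The function names and the toy spectrum are hypothetical.

```python
def good_turing(spectrum):
    """Return {m: m*} for a spectrum given as {m: V(m,N)}.

    Assumes the standard Good-Turing form m* = (m+1) * V(m+1,N) / V(m,N).
    Frequencies m for which V(m+1,N) is unavailable are skipped.
    """
    adjusted = {}
    for m, vm in spectrum.items():
        v_next = spectrum.get(m + 1)
        if vm > 0 and v_next is not None:
            adjusted[m] = (m + 1) * v_next / vm
    return adjusted


def token_sum(spectrum):
    """sum_m m * V(m,N): the form of Nfit and Nproxy."""
    return sum(m * vm for m, vm in spectrum.items())


def type_sum(spectrum):
    """sum_m V(m,N): the form of Vproxy."""
    return sum(spectrum.values())


# Hypothetical toy spectrum, not real data.
toy = {1: 120.0, 2: 40.0, 3: 20.0, 4: 10.0}
m_star = good_turing(toy)
n_total = token_sum(toy)
v_total = type_sum(toy)
```

Passing the EVm column gives values comparable to mStar, and passing the SVm column gives values comparable to mStarRaw, under the assumption stated above.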