spectfit: Naranan-Balasubrahmanyan smoother, Good-Turing estimation


spectfit (-h -mW -H -eR -n -v) text.spc

This program fits the Naranan-Balasubrahmanyan model and computes Good-Turing estimates.

input

    text.spc: frequency spectrum

options

    -h: display on-line help

    -mW: number of ranks in fit is set to W (default: 15)

    -H: input files do not have a header (default: with header)

    -eR: use cost function C_{2}(r) with r=R

    -v: include |V(N)-E[V(N)]| in cost function

    -n: include |N-Nfit| in cost function

output

    text_N.spc: observed and expected frequency spectrum

          m: m (frequency)

          Vm: V(m,N) (frequency at sample size N)

          SVm: V_{r}(m,N) (real-valued spectrum)

          EVm: E[V(m,N)] (expected frequency at sample size N)

          mStar: m* (Good-Turing estimate using E[V(m,N)])

          mStarRaw: m* (Good-Turing estimate using V_{r}(m,N))

          StdevMstar: standard deviation of m*
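
The mStar and mStarRaw columns can be illustrated with the textbook Good-Turing adjusted frequency, m* = (m+1) V(m+1,N) / V(m,N). The sketch below is not taken from spectfit itself: the function name good_turing and the example spectrum are hypothetical, and the program may treat edge cases differently. Passing E[V(m,N)] as the spectrum would correspond to mStar, and passing V_{r}(m,N) to mStarRaw.

```python
def good_turing(spectrum):
    """Return {m: m*} using m* = (m + 1) * V(m+1) / V(m).

    `spectrum` maps a frequency class m to its spectrum value
    (for example E[V(m,N)] or V_r(m,N)). Classes with no successor
    m+1, or with a zero count, are skipped rather than guessed.
    """
    estimates = {}
    for m, v_m in spectrum.items():
        v_next = spectrum.get(m + 1)
        if v_m > 0 and v_next is not None:
            estimates[m] = (m + 1) * v_next / v_m
    return estimates


# Hypothetical spectrum values, for illustration only.
example = {1: 100.0, 2: 40.0, 3: 20.0}
m_star = good_turing(example)
```

With these made-up values, m=1 is adjusted to 2 * 40 / 100 and m=2 to 3 * 20 / 40; m=3 has no successor class and is left out.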

    text_N.sum: summary statistics and estimated parameters

          N: N (number of tokens)

          V(N): V(N) (observed number of types)

          E[V(N)]: E[V(N)] (expected number of types)

          V(m,N): V(m,N)

          E[V(m,N)]: E[V(m,N)]

          C: C (first parameter)

          mu: mu (second parameter)

          gamma: gamma (third parameter)

          MSE: mean squared error

          Nfit: sum_m mE[V(m,N)]

          Nproxy: sum_{m=1}^{mmax} mV_{r}(m,N)

          Vproxy: sum_{m=1}^{mmax} V_{r}(m,N)
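
The three sums reported as Nfit, Nproxy and Vproxy can be written out directly from their definitions above. This is a minimal sketch with hypothetical helper names and made-up spectrum values; how spectfit chooses mmax is not stated here, so it is simply passed in as a parameter.

```python
def token_sum(spectrum, mmax=None):
    """sum_m m * V(m), optionally restricted to m <= mmax."""
    return sum(m * v for m, v in spectrum.items()
               if mmax is None or m <= mmax)


def type_sum(spectrum, mmax=None):
    """sum_m V(m), optionally restricted to m <= mmax."""
    return sum(v for m, v in spectrum.items()
               if mmax is None or m <= mmax)


# Hypothetical values standing in for E[V(m,N)] and V_r(m,N).
expected = {1: 50.0, 2: 20.0, 3: 10.0}
smoothed = {1: 48.0, 2: 22.0, 3: 9.0, 4: 5.0}

n_fit = token_sum(expected)             # Nfit: sum_m m E[V(m,N)]
n_proxy = token_sum(smoothed, mmax=3)   # Nproxy: sum_{m=1}^{mmax} m V_r(m,N)
v_proxy = type_sum(smoothed, mmax=3)    # Vproxy: sum_{m=1}^{mmax} V_r(m,N)
```

Comparing n_fit with the observed N is the quantity that the -n option adds to the cost function as |N-Nfit|.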
