lnreSgam: generalized inverse Gauss-Poisson (GIGP, gamma free)


lnreSgam (-h -mW -kX -KY -EZ -H -eR -sS -Nn -Vv -S) text.spc

This program fits the standard LNRE model based on the generalized inverse Gauss-Poisson distribution with gamma as a free parameter. Because there are no closed-form expressions for estimating the parameters of this full model, the program asks the user to choose between downhill simplex minimization and interactive, user-guided minimization. If downhill simplex minimization is selected, the program first calculates E[V(N)] and E[V(1,N)] for the simpler model with gamma = -0.5. The user can then either use the parameters of that simpler model as the starting point for minimization or specify a different starting point. By default, cost function C_1 is used; cost function C_{2}(r) can be selected instead with the -e option.
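The downhill simplex search described above can be illustrated with a minimal, generic sketch. This is not the program's own routine: the function name nelder_mead, its step and tolerance settings, and the quadratic stand-in cost are assumptions made for illustration. In lnreSgam the cost would be C_1 or C_{2}(r), evaluated at a candidate set of model parameters, and the starting point would be the gamma = -0.5 solution or a user-supplied point.

```python
def nelder_mead(cost, start, step=0.1, tol=1e-10, max_iter=2000):
    """Minimise cost(point) by downhill simplex; return (best_point, best_cost)."""
    n = len(start)
    # Initial simplex: the start point plus one point displaced along each axis.
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    values = [cost(p) for p in simplex]

    for _ in range(max_iter):
        # Order the vertices from best (lowest cost) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break

        worst = simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_reflected = cost(reflected)
        if f_reflected < values[0]:
            # Reflection is the new best: try expanding further.
            expanded = along(2.0)
            f_expanded = cost(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            # Contract toward the centroid; if that fails, shrink toward the best.
            contracted = along(-0.5)
            f_contracted = cost(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
                values = [values[0]] + [cost(p) for p in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


# Stand-in cost with a known minimum at (1, -2); NOT the GIGP cost function.
def toy_cost(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best_point, best_cost = nelder_mead(toy_cost, [0.0, 0.0])
```

Because the method only compares cost values and never needs derivatives, it suits a model whose expectations are available numerically but not in closed form. The result can depend on the starting point, which is presumably why the program lets the user override the default.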

input

    text.spc: frequency spectrum

options

    -h: display on-line help

    -mW: number of ranks in fit is set to W (default: 15)

    -kX: number of chunks for interpolation is set to X (default: 20)

    -KY: number of chunks for extrapolation is set to Y (default: 20)

    -EZ: extrapolation sample size is set to Z (default: 2N_0)

    -H: input files do not have a header (default: header is presupposed)

    -eR: use cost function C_{2}(r) with r=R

    -sS: calculate only the expected spectrum for S ranks (output in text_G.fsp)

    -Nn: force N to equal n (in case of a partial spectrum)

    -Vv: force V(N) to equal v (in case of a partial spectrum)

    -S: calculate Good-Turing estimates (output in text_G.str)

output

    text_G.spc: observed and expected frequency spectrum

          m: m (frequency)

          Vm: V(m,N) (frequency at sample size N)

          EVm: E[V(m,N)] (expected frequency at sample size N)

    text_G.fsp: expected frequency spectrum

          m: m (frequency)

          EVm: E[V(m,N)] (expected frequency at sample size N)

    text_G.sp2: expected frequency spectrum at sample size 2N

          m: m (frequency)

          EVm2N: E[V(m,2N)] (expected frequency at sample size 2N)

    text_G.ev2: vocabulary size statistics

          V: V(N) (observed vocabulary size at N)

          EV: E[V(N)] (expected vocabulary size at N)

          EV2N: E[V(2N)] (expected vocabulary size at 2N)

    text_G.int, text_G.ext: interpolation and extrapolation statistics

          N: N (number of tokens)

          E[V(N)]: E[V(N)] (expected number of types)

          Alpha1: E[alpha(1)] (E[V(1,N)]/E[V(N)])

          EV1-5: E[V(1-5,N)] (expected spectrum elements)

          GV: E[V(N+1)] - E[V(N)] (token-unit growth rate)

    text_G.sum: summary statistics and estimated parameters

          N: N (number of tokens)

          V(N): V(N) (observed number of types)

          E[V(N)]: E[V(N)] (expected number of types)

          V(1,N): V(1,N) (observed number of hapax legomena)

          E[V(1,N)]: E[V(1,N)] (expected number of hapax legomena)

          S: S (population number of types)

          b: b (parameter)

          c: c (parameter)

          Z: Z = 1/c (parameter)

          gamma: gamma (parameter)

    text_G.str: Good-Turing estimates based on the GIGP fit

          m: m (frequency)

          mstar: the corresponding Good-Turing estimates
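As an illustration of how the mstar column relates to the expected spectrum, the sketch below assumes the standard Good-Turing form m* = (m+1) E[V(m+1,N)] / E[V(m,N)], with the expectations taken from the fitted model. The manual does not state the exact formula the program uses, so this form, the function name good_turing and the example values are all assumptions.

```python
def good_turing(expected_spectrum):
    """Return (m, mstar) pairs from a list where expected_spectrum[m - 1] = E[V(m,N)].

    Assumes the standard form mstar = (m + 1) * E[V(m+1,N)] / E[V(m,N)].
    The last rank has no successor, so it yields no estimate.
    """
    estimates = []
    for m in range(1, len(expected_spectrum)):
        ev_m = expected_spectrum[m - 1]
        ev_next = expected_spectrum[m]
        estimates.append((m, (m + 1) * ev_next / ev_m))
    return estimates


# Hypothetical expected spectrum for m = 1, 2, 3 (illustrative values only).
pairs = good_turing([100.0, 40.0, 20.0])
```

Using model expectations rather than observed counts avoids the zero and erratic values that raw spectrum elements show at higher m.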

technical details

The Bessel function K_{v}(z) of real order v,

K_{v}(z) = \frac{\pi}{2} \frac{I_{-v}(z) - I_{v}(z)}{\sin(v \pi)},

is itself defined in terms of the simpler function

I_{v}(z) = \sum_{n=0}^{\infty} \frac{(z/2)^{v+2n}}{n! \, \Gamma(v+n+1)},

which is calculated up to the point where two successive terms of the sum differ by less than 1.0e-9. The downhill simplex minimization method is used for parameter estimation, using the subroutine amoeba of Press et al. (1988).
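The series evaluation can be sketched as follows. This is a minimal illustration rather than the program's code: the function names bessel_i and bessel_k and the max_terms safety limit are assumptions, and the sketch is restricted to z > 0 and non-integer v, where sin(v pi) is non-zero. The stopping rule follows the description above (two successive terms differing by less than 1.0e-9).

```python
import math


def bessel_i(v, z, tol=1.0e-9, max_terms=500):
    """Sum the series for I_v(z), z > 0, until two successive terms differ by < tol."""
    term = (z / 2.0) ** v / math.gamma(v + 1.0)  # n = 0 term
    total = term
    for n in range(1, max_terms):
        # Ratio of successive terms: term_n / term_{n-1} = (z/2)^2 / (n (v + n)).
        new_term = term * (z / 2.0) ** 2 / (n * (v + n))
        total += new_term
        if abs(new_term - term) < tol:
            break
        term = new_term
    return total


def bessel_k(v, z, tol=1.0e-9):
    """K_v(z) = (pi / 2) (I_{-v}(z) - I_v(z)) / sin(v pi), for non-integer v."""
    if float(v).is_integer():
        raise ValueError("this formula requires a non-integer order v")
    difference = bessel_i(-v, z, tol) - bessel_i(v, z, tol)
    return 0.5 * math.pi * difference / math.sin(v * math.pi)


# Check against the closed form K_{1/2}(z) = sqrt(pi / (2 z)) exp(-z).
approx = bessel_k(0.5, 1.0)
exact = math.sqrt(math.pi / 2.0) * math.exp(-1.0)
```

For large z the two I terms are nearly equal and their difference loses precision, so a sketch like this is only reliable for moderate arguments.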
