2. Kaplan-Meier Estimates


{cil, kme, ciu} = hazkpm(data{, alpha})
calculates Kaplan-Meier estimates and confidence bounds for the survival function
Let $ t_{(1)} < t_{(2)} <\ldots < t_{(m)}$ denote the distinct times at which an event was observed, $ d_{i}$ the number of events that occurred at time $ t_{(i)}$, and $ r_{i}$ the size of the risk set at time $ t_{(i)}$. The Kaplan-Meier estimate of the survival function, also called the product-limit estimate, is given by

$ \hat S(t) = \left\{ 
\begin{array}{ccc}
1, & \textrm{if} & t < t_{(1)}, \\  
\prod_{t_{(i)}\leq t}\left[1 - \frac{d_{i}}{r_{i}}\right], & \textrm{if} & 
t_{(1)} \leq t.
\end{array}
\right.$ (1)
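As an illustration of formula (1), the following is a minimal Python sketch (not XploRe code, and not the implementation of hazkpm). The function name `kaplan_meier` and the coding of the censoring indicator (1 for an observed event, 0 for a censored time) are assumptions made for this example only.

```python
def kaplan_meier(times, events):
    """Product-limit estimate at the distinct event times.

    times  : observed times (event or censoring time, whichever came first)
    events : 1 if an event was observed, 0 if censored (assumed coding)
    Returns (event_times, d, r, surv), one entry per distinct event time.
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    d, r, surv = [], [], []
    s = 1.0
    for tj in event_times:
        # d_j: number of events at tj; r_j: subjects still at risk at tj.
        dj = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        rj = sum(1 for t in times if t >= tj)
        s *= 1.0 - dj / rj
        d.append(dj)
        r.append(rj)
        surv.append(s)
    return event_times, d, r, surv


# Small made-up sample: censored observations at times 2 and 4.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
event_times, d, r, surv = kaplan_meier(times, events)
```

In this toy sample the censored observations never create a jump of their own; they only shrink the risk set seen by later event times.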

The Kaplan-Meier estimate $ \hat S(t)$ is a right-continuous step function with jumps at the event times. Censoring times affect the estimate only by reducing the risk set for the next event, thereby increasing the height of the next jump. In the presence of censoring, Greenwood (1926) suggested the following estimate for the variance of the Kaplan-Meier estimate:

$ \hat V(t) = \hat S(t)^{2}\sum_{t_{(i)}\leq t}\frac{d_{i}}{r_{i}(r_{i}-d_{i})}.$ (2)
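Formula (2) can be evaluated with a running sum over the event times. The Python sketch below is only an illustration under stated assumptions: the name `greenwood_variance` is hypothetical, the inputs are the per-event-time counts $ d_{i}$, $ r_{i}$ and the estimates $ \hat S(t_{(i)})$, and returning 0 when the last risk set is exhausted ($ r_{i} = d_{i}$, where the sum is undefined) is a convention chosen here, not something taken from hazkpm.

```python
def greenwood_variance(d, r, surv):
    """Greenwood variance estimate at each distinct event time.

    d, r : number of events and size of the risk set at each event time
    surv : Kaplan-Meier estimate at each event time
    """
    variances = []
    acc = 0.0  # running sum of d_i / (r_i * (r_i - d_i))
    for dj, rj, sj in zip(d, r, surv):
        if rj == dj:
            # Risk set exhausted: the summand is undefined and S = 0.
            # Reporting 0 is an assumption of this sketch.
            variances.append(0.0)
            continue
        acc += dj / (rj * (rj - dj))
        variances.append(sj ** 2 * acc)
    return variances


# Made-up counts for three event times.
d = [1, 1, 1]
r = [6, 5, 3]
surv = [5 / 6, 2 / 3, 4 / 9]
var = greenwood_variance(d, r, surv)
```

At the first event time no censoring has occurred yet, so the result reduces to the binomial variance $ \hat S(1-\hat S)/n$ with $ n=6$.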

The Kaplan-Meier estimate $ \hat S(t)$ is asymptotically normally distributed. This leads to the following pointwise confidence intervals for the survival function $ S(t)$:

$ \left[ \hat S(t) - z_{1-\alpha/2} \hat V(t)^{1/2}, \ \hat S(t) +
z_{1-\alpha/2} \hat V(t)^{1/2} \right],$ (3)

where $ (1-\alpha)$ is the coverage probability, $ z_{p}$ denotes the $ p\times 100$-th percentile of the standard normal distribution, and $ \hat V(t)$ is Greenwood's estimate of the variance of $ \hat S(t)$, given in formula (2). Note that Greenwood's estimate tends to slightly underestimate the true variance, so the true coverage probability of the confidence intervals may be somewhat smaller than stated. The quantlet hazkpm computes the Kaplan-Meier estimates and confidence bounds of the survival function using formulae (1) and (3). It requires the data to be organized in the form provided by hazdat. The syntax is given below:



  {cil,kme,ciu} = hazkpm(data {,alpha})

Input:
data
$ n \times (p+4)$ matrix, the sorted data matrix given by the output data of hazdat;
alpha
scalar, the specified error rate of the confidence interval, default option is $ 0.05$ (coverage probability of $ 0.95$).
Output:
cil
$ n \times 2$ matrix, the first column consists of the sorted $ t_i$, the second column contains the Greenwood lower confidence bounds at $ t_i$, defined in (3);
kme
$ n \times 2$ matrix, the first column consists of the sorted $ t_i$, the second column contains the Kaplan-Meier estimates at $ t_i$;
ciu
$ n \times 2$ matrix, the first column consists of the sorted $ t_i$, the second column contains the Greenwood upper confidence bounds at $ t_i$, defined in (3).
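The confidence bounds in (3) can be sketched in a few lines of Python. This is an illustration only, not the code of hazkpm: the function name `pointwise_ci` is hypothetical, and clipping the limits to $ [0, 1]$ is written here as a simple `max`/`min`, which may differ in detail from what the quantlet does.

```python
from statistics import NormalDist


def pointwise_ci(surv, var, alpha=0.05):
    """Normal-approximation limits S +/- z_{1-alpha/2} * sqrt(V), clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = [max(0.0, s - z * v ** 0.5) for s, v in zip(surv, var)]
    upper = [min(1.0, s + z * v ** 0.5) for s, v in zip(surv, var)]
    return lower, upper


# Made-up estimates and Greenwood variances at two event times.
lower, upper = pointwise_ci([5 / 6, 2 / 3], [5 / 216, 1 / 27])
```

With small samples the unclipped upper limit easily exceeds 1 near the start of follow-up, which is why some form of truncation is needed before plotting.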
By definition, the Kaplan-Meier estimate $ \hat S(t)$ is a right-continuous step function. The quantlet hazkpm supplies the coordinates $ ( t_{i}, \hat S(t_{i}))$ of the upper left corner of each step, as well as the coordinates of the pointwise confidence limits for $ S(t_{i})$, $ ( t_{i}, {\tt cil}(t_{i}))$ and $ ( t_{i}, {\tt ciu}(t_{i}))$. Note that the output of hazkpm provides one row for each observed time $ t_i$, censored or uncensored. In the case of ties, the rows are repeated.

The quantlet steps4plot provides support for plotting step functions. Given the coordinates of the upper left corners and the leftmost starting point, steps4plot adds the coordinates of the lower right corner points in the correct order. Optionally, a right endpoint may be specified. The output is a $ (2n+2)\times 2$ matrix of point coordinates. The step function may then be drawn into a graph by connecting consecutive output points with line segments. Syntax of steps4plot:



  {xyline}=steps4plot(xy {,xymin} {,xmax})


Input:
xy
$ n \times 2$ matrix, coordinates $ (x_{i}, y_{i})$ of the jump points of a right-continuous step function which jumps in $ x_{i}$ to value $ y_{i}$. The $ x_{i}$ (first column) are required to be sorted in ascending order.
xymin
$ 1 \times 2$ matrix, coordinates of the leftmost starting point of the plotted step function. Default is the first row of xy. If xymin[1,1] $ >$ xy[1,1], then the leftmost starting point is set to the first row of xy.
xmax
scalar, $ x$-coordinate of the rightmost endpoint.
Default: xmax = xy[n,1] + 0.01*(xy[n,1] - xy[1,1]), which adds 1% of the $ x$ range to the last jump point. If xmax $ <$ xy[n,1], then xmax is set to xy[n,1], the last jump point.
Output:
xyline
$ (2n+2)\times 2$ matrix, rows are coordinates of the starting point, the lower right and the upper left corner points, and the end point of a step function with jumps in $ x_{i}$ to value $ y_{i}$ (given in input xy). Connecting consecutive points with lines draws a plot of the step function.
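The construction described for steps4plot can be mimicked in Python as follows. This is a sketch based only on the description above, not on the quantlet's source: the name `steps_for_plot` is hypothetical, and the row order (start point, then for each jump the point reached just before the jump followed by the jump point itself, then the end point) is an assumption; the actual quantlet may order its output rows differently.

```python
def steps_for_plot(xy, xymin=None, xmax=None):
    """Return 2n+2 points that trace a right-continuous step function.

    xy    : list of (x, y) jump points, sorted by x
    xymin : optional (x, y) starting point; defaults to the first row of xy
    xmax  : optional x-coordinate of the right endpoint
    """
    x_first, x_last = xy[0][0], xy[-1][0]
    # Fall back to the first jump point if xymin is missing or lies to its right.
    if xymin is None or xymin[0] > x_first:
        xymin = xy[0]
    # Default right endpoint: last jump plus 1% of the x range.
    if xmax is None:
        xmax = x_last + 0.01 * (x_last - x_first)
    elif xmax < x_last:
        xmax = x_last

    points = [tuple(xymin)]
    y_prev = xymin[1]
    for x, y in xy:
        points.append((x, y_prev))  # end of the previous horizontal piece
        points.append((x, y))       # jump point: start of the next piece
        y_prev = y
    points.append((xmax, y_prev))
    return points


# Three made-up jump points, starting the curve at (0, 1).
xy = [(1.0, 0.8), (2.0, 0.5), (4.0, 0.2)]
line = steps_for_plot(xy, xymin=(0.0, 1.0))
```

Connecting consecutive entries of `line` alternates between horizontal and vertical segments, which is what makes the result look like a step function when drawn.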
Example 4. We illustrate the use of hazkpm and steps4plot by plotting a Kaplan-Meier estimate and Greenwood's confidence limits for simulated data. The data are provided in the file haz01.dat. They were obtained by generating $ n=20$ independent, uniformly distributed covariate values $ z_i=(z_{1 i}, z_{2 i})^T$, with $ z_{k i} \sim U[-0.5, 0.5]$, $ k=1, 2$, $ i=1,\ldots, n$; uniformly distributed censoring times, $ c_i \sim U[0, 4]$; and exponentially distributed survival times $ y_i\vert z_i \sim Exp\left(\lambda(z_i) \right)$, with $ \lambda(z) = \exp(z_1 + 2 z_2)$. The first column in haz01.dat contains the observed times, $ t_i = \min(c_i, y_i)$, the second column is the censoring indicator, and the third and fourth columns contain the covariate values. In this particular sample, three of the observations are censored, including the largest time, $ t_{20}$.

In this example, we display the confidence limits as step functions, although hazkpm provides only pointwise confidence intervals at the event points $ t_i$. Alternatively, readers may choose to draw vertical lines connecting the confidence limits $ \left( t_{i}, {\tt cil}(t_{i})\right)$ and $ \left( t_{i}, {\tt ciu}(t_{i})\right)$ to emphasize the pointwise nature of the confidence intervals.
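For readers who want to generate data of the same design, the simulation scheme described above can be sketched in Python. This does not reproduce haz01.dat: the seed, the function name `simulate`, the reading of $ Exp(\lambda)$ as an exponential distribution with rate $ \lambda$, and the indicator coding (1 for an observed event, 0 for a censored time) are all assumptions of this sketch.

```python
import math
import random


def simulate(n=20, seed=0):
    """Generate rows (t, delta, z1, z2) following the design of Example 4."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = rng.uniform(-0.5, 0.5)       # covariates, U[-0.5, 0.5]
        z2 = rng.uniform(-0.5, 0.5)
        c = rng.uniform(0.0, 4.0)         # censoring time, U[0, 4]
        lam = math.exp(z1 + 2.0 * z2)     # lambda(z) = exp(z1 + 2*z2)
        y = rng.expovariate(lam)          # survival time, rate lam (assumed)
        t = min(c, y)                     # observed time
        delta = 1 if y <= c else 0        # assumed coding: 1 = event observed
        rows.append((t, delta, z1, z2))
    return rows


rows = simulate()
```

The four entries of each row follow the column layout described for haz01.dat: observed time, censoring indicator, and the two covariates.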



  library("hazreg")
  dat = read("haz01.dat")
  t = dat[,1]                         ; observed times
  delta = dat[,2]                     ; censoring indicator
  z = dat[,3:4]                       ; covariates
  {data,ties} = hazdat(t, delta, z)   ; preparing data
  {cil,kme,ciu} = hazkpm(data)        ; compute kme and confidence limits

  setsize(600,400)                    ; initiating graph
  plot1 = createdisplay(1,1)          ; initiating graph
  n = rows(data)                      ; sample size
  pm = (#(1,n+2)'+ (0:n))|(#(2*n+2,3*n+3)'+ (0:n))
                                      ; points to be connected
  cn = matrix(2*n+2)                  ; color_num, controls colors
  ar = matrix(2*n+2)                  ; art, controls line types
  th = matrix(2*n+2)                  ; thick, controls line thickness

  cilline = steps4plot(cil)           ; points for step function plot
  setmaskl(cilline, pm, cn, ar, th)   ; lines control
  setmaskp(cilline, 4, 0, 8)          ; points control

  ciuline = steps4plot(ciu)           ; points for step function plot
  setmaskl(ciuline, pm, cn, ar, th)   ; lines control
  setmaskp(ciuline, 4, 0, 8)          ; points control

  kmeline = steps4plot(kme, 0~1)      ; points for step function plot
  setmaskl(kmeline, pm, cn, ar, 2*th) ; lines control
  setmaskp(kmeline, 4, 0, 8)          ; points control

  show(plot1, 1, 1, cilline, kmeline, ciuline)
  setgopt(plot1, 1, 1, "title","Kaplan-Meier Estimates")
  setgopt(plot1, 1, 1, "xlabel","Time")
  setgopt(plot1, 1, 1, "ylabel","Survival Function")
  setgopt(plot1, 1, 1, "ymajor",0.2)
  print (plot1,"hazkpmtest.ps")

haz04.xpl

Figure 1 displays the three estimated functions. The pointwise confidence limits are truncated to 0 or 1 when the asymptotic confidence intervals exceed these values. Each step in the Kaplan-Meier estimate corresponds to one event time. In our sample, the event times $ t_2$ and $ t_3$ are very close, and the two jumps merge into one on the plot.

Figure 1: Kaplan-Meier estimate (bold line) and pointwise confidence limits for the survival function. Estimates are based on the simulated data in haz01.dat.
\includegraphics[scale=0.6]{hazkpmtest.ps}

The Kaplan-Meier step function is plotted starting at the point $ (0, 1)$, while the step functions for the confidence limits start at the first event point, $ t_1 > 0$. This is achieved through the argument xymin in the steps4plot calls. In defining kmeline for the Kaplan-Meier step function, xymin is set to $ (0, 1)$, while this argument is omitted when defining cilline and ciuline for the confidence limits.

MD*TECH Method and Data Technologies
  http://www.mdtech.de  mdtech@mdtech.de