5.2 Kaplan-Meier Estimates
- {cil, kme, ciu} = hazkpm(data {, alpha})
  calculates Kaplan-Meier estimates and confidence bounds for the survival function
Let $t_{(1)} < t_{(2)} < \dots < t_{(k)}$ denote the distinct times at which an event was observed, $d_{i}$ the number of events that occurred at time $t_{(i)}$, and $r_{i}$ the size of the risk set at time $t_{(i)}$.
The Kaplan-Meier estimate for a survival function, also called the product-limit estimate, is given by
$$ \hat S(t) = \left\{
\begin{array}{lll}
1, & \textrm{if} & t < t_{(1)}, \\
\prod_{t_{(i)} \leq t} \left[ 1 - \frac{d_{i}}{r_{i}} \right], & \textrm{if} & t_{(1)} \leq t.
\end{array}
\right. \qquad (5.1) $$
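The product in (5.1) can be sketched in a few lines. The following is an illustrative Python version, not the hazkpm quantlet itself; the function name `kaplan_meier` and the toy data are assumptions made for this example. It returns one pair per distinct event time, whereas hazkpm returns one row per observed time.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_(i), S_hat(t_(i)))] at the distinct event times.

    times  : observed times (event or censoring)
    events : 1 if the time is an event, 0 if it is censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    estimate = []
    for t in sorted(deaths):
        r = sum(1 for u in times if u >= t)  # risk set size r_i at t_(i)
        d = deaths[t]                        # number of events d_i at t_(i)
        surv *= 1.0 - d / r                  # one factor of the product in (5.1)
        estimate.append((t, surv))
    return estimate

# Toy data (hypothetical): events at 1, 3, 4; censoring at 2 and 5.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
km = kaplan_meier(times, events)
```

Censored times never produce a factor of their own; they only shrink the risk set counted for later event times.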
The Kaplan-Meier estimate $\hat S(t)$ is a right-continuous step function with jumps at the event times. Censoring times affect the estimate only by reducing the risk set for the next event, thereby increasing the height of the next jump.
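As a small worked illustration with made-up numbers (not taken from the data used later in this section), suppose five subjects have observed times 1, 2, 3, 4, 5, with events at times 1 and 3 and a censoring at time 2. The risk set at time 3 then has size 3, whereas it would have size 4 if the second subject were still under observation at that time:

$$ \hat S(3) = \frac{4}{5} \left( 1 - \frac{1}{3} \right) = \frac{8}{15} \approx 0.533 \ \ \textrm{(censoring at 2)}, \qquad \hat S(3) = \frac{4}{5} \left( 1 - \frac{1}{4} \right) = \frac{3}{5} = 0.6 \ \ \textrm{(no censoring before 3)}. $$

The jump at time 3 is therefore about $0.267$ with the censored observation, compared with $0.2$ without it.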
In the presence of censoring, Greenwood (1926) suggested the following estimate for the variance of the Kaplan-Meier estimate:
$$ \hat V(t) = \hat S(t)^{2} \sum_{t_{(i)} \leq t} \frac{d_{i}}{r_{i} (r_{i} - d_{i})}. \qquad (5.2) $$
The Kaplan-Meier estimate $\hat S(t)$ is asymptotically normally distributed. This leads to the following pointwise confidence intervals for the survival function $S(t)$:
$$ \left[ \hat S(t) - z_{1-\alpha/2} \hat V(t)^{1/2}, \ \hat S(t) + z_{1-\alpha/2} \hat V(t)^{1/2} \right], \qquad (5.3) $$
where $1-\alpha$ is the coverage probability, $z_{1-\alpha/2}$ denotes the $(1-\alpha/2)$ quantile of the standard normal distribution, and $\hat V(t)$ is Greenwood's estimate of the variance of $\hat S(t)$, given in formula (5.2).
Note that Greenwood's estimate tends to slightly underestimate the true variance, so the true coverage probability of the confidence intervals may be somewhat smaller than stated.
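To make the interval construction concrete, here is a hedged Python sketch that combines the product-limit estimate with Greenwood's variance formula and the normal-approximation interval. It is not the hazkpm implementation. The function name `greenwood_bounds`, the default `alpha=0.05`, and the choice to set the variance to zero when the whole risk set fails are assumptions of this example. The bounds are truncated to the interval from 0 to 1.

```python
from statistics import NormalDist

def greenwood_bounds(d, r, alpha=0.05):
    """Pointwise bounds at each distinct event time.

    d[i], r[i] : number of events and risk set size at the i-th event time
    alpha      : error rate (0.05 is an assumed default for this sketch)
    Returns a list of (lower, estimate, upper) triples.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    surv, acc = 1.0, 0.0
    out = []
    for di, ri in zip(d, r):
        surv *= 1 - di / ri                  # Kaplan-Meier factor
        if ri > di:
            acc += di / (ri * (ri - di))     # Greenwood's running sum
            var = surv ** 2 * acc
        else:
            var = 0.0                        # assumption: estimate is 0, sum undefined
        half = z * var ** 0.5
        lower = max(0.0, surv - half)        # truncate to [0, 1]
        upper = min(1.0, surv + half)
        out.append((lower, surv, upper))
    return out

# Toy input (hypothetical): one event each at three times, risk sets 5, 3, 2.
bounds = greenwood_bounds(d=[1, 1, 1], r=[5, 3, 2])
```

With so few observations the intervals are very wide and hit the truncation limits, which illustrates why the normal approximation should be read with care in small samples.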
The quantlet hazkpm computes the Kaplan-Meier estimates and confidence bounds of the survival function using formulae (5.1) and (5.3). It requires the data to be organized in the form produced by hazdat. The syntax is given below:
{cil,kme,ciu} = hazkpm(data {,alpha})
Input:
- data: matrix, the sorted data matrix returned as output data by hazdat;
- alpha: scalar, the error rate $\alpha$ of the confidence interval (coverage probability $1-\alpha$); optional, a default value is used when it is omitted.
Output:
- cil: matrix, the first column consists of the sorted observed times, the second column contains the Greenwood lower confidence bounds at these times, defined in (5.3);
- kme: matrix, the first column consists of the sorted observed times, the second column contains the Kaplan-Meier estimates at these times;
- ciu: matrix, the first column consists of the sorted observed times, the second column contains the Greenwood upper confidence bounds at these times, defined in (5.3).
By definition, the Kaplan-Meier estimate $\hat S(t)$ is a right-continuous step function. The quantlet hazkpm supplies the coordinates of the upper left corner of each step, together with the coordinates of the lower and upper pointwise confidence limits at the same times.
Note that the output of hazkpm provides one row for each observed time, censored or uncensored. In the case of ties, the rows are repeated.
The quantlet steps4plot provides support for plotting step functions. Given the coordinates of the upper left corners and the leftmost starting point, steps4plot adds the coordinates of the lower right corner points in the correct order. Optionally, a right endpoint may be specified. The output is a matrix of point coordinates. The step function may then be drawn into a graph by connecting consecutive output points with line segments.
Syntax of steps4plot:
{xyline} = steps4plot(xy {,xymin} {,xmax})
Input:
- xy: matrix, coordinates $(x_i, y_i)$ of the jump points of a right-continuous step function which jumps at $x_i$ to the value $y_i$. The $x_i$ (first column) are required to be sorted in ascending order.
- xymin: matrix, coordinates of the leftmost starting point of the plotted step function. Default is the first row of xy. If xymin[1,1] lies to the right of xy[1,1], then the leftmost starting point is set to the first row of xy.
- xmax: scalar, $x$-coordinate of the rightmost endpoint. Default: xmax = xy[n,1] + 0.01*(xy[n,1] - xy[1,1]), which adds 1 % of the range to the last jump point. If xmax is smaller than xy[n,1], then xmax is set to xy[n,1], the last jump point.
Output:
- xyline: matrix, rows are the coordinates of the starting point, the lower right and the upper left corner points, and the end point of a step function with jumps at $x_i$ to the value $y_i$ (given in input xy). Connecting consecutive points with lines draws a plot of the step function.
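The construction described for steps4plot can be illustrated with a short Python sketch. This is not a port of the quantlet: the function name `steps_for_plot`, the use of tuples instead of matrices, and the exact ordering of the returned vertices are assumptions of this example. It follows the documented behaviour for the starting point, the default right endpoint (1 % of the range beyond the last jump), and the adjustment of an xmax that falls before the last jump.

```python
def steps_for_plot(xy, xymin=None, xmax=None):
    """Vertices of a right-continuous step function, ready to be joined by lines.

    xy    : list of (x, y) jump points, sorted by x in ascending order
    xymin : optional (x, y) leftmost starting point
    xmax  : optional x-coordinate of the rightmost endpoint
    """
    xy = [tuple(p) for p in xy]
    x_first, x_last = xy[0][0], xy[-1][0]

    # Fall back to the first jump point if no valid starting point is given.
    if xymin is None or xymin[0] > x_first:
        start, jumps = xy[0], xy[1:]
    else:
        start, jumps = tuple(xymin), xy

    # Default endpoint: add 1 % of the range; never end before the last jump.
    if xmax is None:
        xmax = x_last + 0.01 * (x_last - x_first)
    xmax = max(xmax, x_last)

    line = [start]
    level = start[1]
    for x, y in jumps:
        line.append((x, level))  # end of the previous step at the jump time
        line.append((x, y))      # start of the new step (the jump point itself)
        level = y
    line.append((xmax, level))   # carry the last level to the endpoint
    return line

# Toy jump points (hypothetical), started at (0, 1) like a survival curve.
km_points = [(1, 0.8), (3, 0.5), (4, 0.25)]
with_start = steps_for_plot(km_points, xymin=(0, 1.0))
without_start = steps_for_plot(km_points)
clamped = steps_for_plot(km_points, xmax=2)
```

Passing a starting point such as (0, 1) draws the initial flat segment of a survival curve, while omitting it starts the curve at the first jump point.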
Example 4. We illustrate the use of hazkpm and steps4plot by plotting a Kaplan-Meier estimate and Greenwood's confidence limits for simulated data. The data are provided in the file haz01.dat. They were obtained by generating independent, uniformly distributed covariate values, uniformly distributed censoring times, and exponentially distributed survival times. The first column in haz01.dat contains the observed times, the second column is the censoring indicator, and the third and fourth columns contain the covariate values. In this particular sample, three of the observations are censored, including the largest observed time.
In this example, we display the confidence limits as step functions, although hazkpm provides only pointwise confidence intervals at the event points. Alternatively, readers may choose to draw vertical lines connecting the lower and upper confidence limits at each time point to emphasize the pointwise nature of the confidence intervals.
library("hazreg")
dat=read("haz01.dat")
t = dat[,1] ; observed times
delta = dat[,2] ; censoring indicator
z = dat[,3:4] ; covariates
{data,ties} = hazdat(t,delta, z) ; preparing data
{cil,kme,ciu} = hazkpm(data)
; compute kme and confidence limits
setsize(600,400) ; initiating graph
plot1=createdisplay(1,1) ; initiating graph
n = rows(data) ; sample size
pm = (#(1,n+2)'+ (0:n))|(#(2*n+2,3*n+3)'+ (0:n))
; points to be connected
cn = matrix(2*n+2) ; color_num, controls colors
ar = matrix(2*n+2) ; art, controls line types
th = matrix(2*n+2) ; thick, controls line thickness
cilline = steps4plot(cil) ; points for step function plot
setmaskl(cilline, pm, cn, ar, th) ; lines control
setmaskp(cilline, 4, 0, 8) ; points control
ciuline = steps4plot(ciu) ; points for step function plot
setmaskl(ciuline, pm, cn, ar, th) ; lines control
setmaskp(ciuline, 4, 0, 8) ; points control
kmeline = steps4plot(kme, 0~1)
; points for step function plot
setmaskl(kmeline, pm, cn, ar, 2*th) ; lines control
setmaskp(kmeline, 4, 0, 8) ; points control
show(plot1, 1, 1, cilline, kmeline, ciuline)
setgopt(plot1, 1, 1, "title","Kaplan-Meier Estimates")
setgopt(plot1, 1, 1, "xlabel","Time")
setgopt(plot1, 1, 1, "ylabel","Survival Function")
setgopt(plot1, 1, 1, "ymajor",0.2)
print (plot1,"hazkpmtest.ps")
Figure 5.1 displays the three estimated functions. The pointwise confidence limits are truncated to 0 or 1 when the asymptotic confidence intervals exceed these values. Each step in the Kaplan-Meier estimate corresponds to one event time. In our sample, two of the event times are very close, so their jumps merge into one on the plot.
Figure 5.1: Kaplan-Meier estimate (bold line) and pointwise confidence limits for the survival function. Estimates are based on the simulated data in haz01.dat.
The Kaplan-Meier step function is plotted starting at the point $(0, 1)$, while the step functions for the confidence limits start at the first event point. This is achieved through the argument xymin in the steps4plot calls. In defining kmeline for the Kaplan-Meier step function, xymin is set to 0~1, while this argument is omitted when defining cilline and ciuline for the confidence limits.