1. Data Structure
- {data, ties} = hazdat(t, delta{, z})
- sorts the times t in ascending order, cosorts the censoring indicator delta and the covariates in z, and provides tie information
- nar = haznar(data)
- calculates the size of the risk set at each observed time point
- atrisk = hazrisk(data, i)
- determines which observations are at risk at time $t_i$

The quantlib hazreg provides methods for analyzing right-censored time-to-event data.
The observed data are triples $(t_i, \delta_i, z_i)$, $i = 1, \ldots, n$, where $t_i$ denotes the observed survival time of the $i$-th individual, $z_i$ denotes the $p$-dimensional covariate vector associated with the $i$-th individual, and $\delta_i$ is the censoring indicator.
Let $y_i$ denote the uncensored survival time, and $c_i$ the random censoring time.
The observed survival time of the $i$-th individual is then given by $t_i = \min(y_i, c_i)$.
The censoring indicator takes the value $\delta_i = 1$ when $y_i \le c_i$; in this case, the observed time, $t_i = y_i$, is called event time. Otherwise, $\delta_i = 0$, and the observed time is censored, $t_i = c_i$.
We assume that censoring is uninformative; that is, given the covariate values, the survival time and the censoring time are conditionally independent.
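As a minimal sketch of these definitions (written in Python with NumPy rather than XploRe, and using a hypothetical helper name `observe`), the observed times and censoring indicators can be derived from the uncensored survival times and the censoring times as follows. The input values are the ones used in Example 1 of this section.

```python
import numpy as np

def observe(y, c):
    """Hypothetical helper: turn latent times into right-censored data."""
    y = np.asarray(y)
    c = np.asarray(c)
    t = np.minimum(y, c)           # observed time: the smaller of y_i and c_i
    delta = (y <= c).astype(int)   # 1 = event observed, 0 = censored
    return t, delta

y = [2, 1, 3, 2, 4, 7, 1, 3, 2]    # uncensored survival times
c = [3, 1, 5, 6, 1, 6, 2, 4, 5]    # censoring times
t, delta = observe(y, c)
print(t)
print(delta)
```

Observations 5 and 6 are censored here, because their censoring times are smaller than their survival times.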
For many computations, information on the presence and location of ties is required.
We could locate the ties each time a method requires this information. However, in a typical session the same data set is studied for various purposes, so it is much more efficient to gather the tie information once and link it to the data set.
We address this problem by compiling most of the necessary data information into a matrix data, which is passed on as an argument to the various data analysis quantlets.
The quantlet hazdat sorts the right-censored data $(t_i, \delta_i, z_i)$, $i = 1, \ldots, n$, in ascending order with respect to time $t_i$, cosorts the censoring indicator and covariate values, evaluates ties, and organizes the data and tie information in the matrix data.
It has the following syntax:
{data,ties} = hazdat(t, delta {,z})
Input:
- t
- $n \times 1$ vector of survival times $t_i$,
- delta
- $n \times 1$ vector of censoring indicators $\delta_i$,
- z
- $n \times p$ matrix of covariate values, with rows $z_i^T$; default is an empty matrix.
Output:
- data
- $n \times (p+4)$ matrix of cosorted time-to-event data, with
  column 1: observed times $t_i$, sorted in ascending order,
  column 2: censoring indicator $\delta_i$, cosorted,
  column 3: original observation labels ($1, \ldots, n$), cosorted,
  column 4: number of tied observations in time $t_i$, cosorted,
  columns 5 through $p+4$: covariate values $z_i$, cosorted;
- ties
- scalar, indicator of ties, with ties=1 when ties in the $t_i$ are present, and ties=0 when there are no ties.
Example 1. This example illustrates the use of the quantlet hazdat.
The censoring times and the observed times are chosen to demonstrate the handling of ties (column 4 in data, and tie indicator ties=1).
There are no covariates.
Note that at the start of each session, the quantlib hazreg has to be loaded manually, with the command library("hazreg").
library("hazreg")
y = 2|1|3|2|4|7|1|3|2 ; uncensored event times
c = 3|1|5|6|1|6|2|4|5 ; censoring times
t = min(y~c,2) ; observed (censored) times
delta = (y<=c) ; censoring indicator
{data,ties} = hazdat(t,delta)
data
ties
The variables data and ties take the following values:
data =
1 0 5 3
1 1 7 3
1 1 2 3
2 1 4 3
2 1 9 3
2 1 1 3
3 1 8 2
3 1 3 2
6 0 6 1
ties =
1
The first column of data provides the observed times in ascending order. Column 3 gives the original order of the sample.
The elements of column 4 count how many observations (censored or uncensored) are tied at the corresponding times. In our data, three observations each are tied at time points $t = 1$ and $t = 2$.
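The layout of data described above can be reproduced outside XploRe. The following is a rough sketch in Python with NumPy; the function name `hazdat_sketch` is hypothetical, and the code is an illustration of the documented output format, not the implementation of the hazdat quantlet.

```python
import numpy as np

def hazdat_sketch(t, delta, z=None):
    """Illustrative stand-in for the documented hazdat output (assumption)."""
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = t.shape[0]
    order = np.argsort(t, kind="stable")       # sort by observed time
    t_sorted = t[order]
    # Count how many observations share each distinct observed time.
    _, inverse, counts = np.unique(
        t_sorted, return_inverse=True, return_counts=True
    )
    ntied = counts[inverse].astype(float)
    # Columns 1-4: time, censoring indicator, original label, tie count.
    cols = [t_sorted, delta[order], order + 1.0, ntied]
    if z is not None:
        z = np.asarray(z, dtype=float).reshape(n, -1)
        cols.extend(z[order].T)                # cosorted covariate columns
    data = np.column_stack(cols)
    ties = int(np.any(counts > 1))             # 1 if any tied times exist
    return data, ties

# Observed times and censoring indicators of Example 1.
t = [2, 1, 3, 2, 1, 6, 1, 3, 2]
delta = [1, 1, 1, 1, 0, 0, 1, 1, 1]
data, ties = hazdat_sketch(t, delta)
print(data)
print(ties)
```

Within a block of tied times, the row order produced by this sketch (a stable sort) may differ from the order in the hazdat output shown in Example 1; columns 1 and 4 and the tie indicator agree.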
Remark 1.1
Most of our hazard regression quantlets require an input variable data, which provides the time-to-event data and tie information in exactly the same format as the hazdat output variable data (first element in the output list).
Therefore, we recommend running the quantlet hazdat at the beginning of each session, or whenever a different set of covariates or a subset of time points is to be considered.
To simplify notation, we assume from now on that the observed times are sorted, $t_1 \le t_2 \le \cdots \le t_n$.
For many calculations we need to know which observations are in the risk set for any given event time.
The risk set at time $t_i$ is defined as $R(t_i) = \{ j : t_j \ge t_i \}$.
It consists of all observations that neither had an event nor were censored prior to time $t_i$, and thus are still at risk for an event.
The quantlet hazrisk determines the observations at risk at a given observed time point, $t_i$. The syntax is given below:
atrisk = hazrisk(data,i)
Input:
- data
- $n \times (p+4)$ matrix, the sorted data matrix given by the output data of hazdat;
- i
- scalar, the position of $t_i$ in the ordered list $t_1 \le \cdots \le t_n$.
Output:
- atrisk
- $n \times 1$ vector, with elements 0 or 1 that indicate whether observations are in the risk set at time $t_i$. atrisk[j] = 1 when $t_j \ge t_i$, and atrisk[j] = 0, otherwise.
Example 2. We illustrate the use of the quantlet hazrisk with the data set of Example 1.
Note that the first 6 lines of the XploRe code are identical.
In line 6, we call hazdat to organize the observations and the tie information into the matrix data, which is displayed as output of hazdat in Example 1. In line 7, data is passed as input argument to the quantlet hazrisk.
library("hazreg")
y = 2|1|3|2|4|7|1|3|2 ; uncensored event times
c = 3|1|5|6|1|6|2|4|5 ; censoring times
t = min(y~c,2) ; observed (censored) times
delta = (y<=c) ; censoring indicator
{data,ties} = hazdat(t,delta) ; organize data
atrisk = hazrisk(data,6) ; risk set at observation 6
atrisk
The variable atrisk takes the value $(0, 0, 0, 1, 1, 1, 1, 1, 1)$.
In this example, the times $t_4 = t_5 = t_6$ are tied. Therefore, the risk set at time $t_6$ includes all observations with index $j \ge 4$.
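The risk-set indicator of this example can be checked with a short sketch in Python with NumPy. The function name `hazrisk_sketch` is hypothetical, the position i is assumed to be 1-based as in the XploRe call, and the code only mirrors the documented rule (an observation is at risk when its time is at least the i-th sorted time); it is not the quantlet's implementation.

```python
import numpy as np

def hazrisk_sketch(data, i):
    """Illustrative risk-set indicator; i is a 1-based row position."""
    times = np.asarray(data, dtype=float)[:, 0]   # column 1: sorted times
    return (times >= times[i - 1]).astype(int)    # 1 = at risk, 0 = not

# Matrix data as displayed in Example 1.
data = np.array([
    [1, 0, 5, 3],
    [1, 1, 7, 3],
    [1, 1, 2, 3],
    [2, 1, 4, 3],
    [2, 1, 9, 3],
    [2, 1, 1, 3],
    [3, 1, 8, 2],
    [3, 1, 3, 2],
    [6, 0, 6, 1],
])
atrisk = hazrisk_sketch(data, 6)
print(atrisk)
```

Because the comparison uses "greater than or equal", rows 4 and 5, which are tied with row 6, are counted as at risk.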
The quantlet haznar returns the size of the risk set at each observed time $t_i$. Its syntax follows below:
nar = haznar(data)
Input:
- data
- $n \times (p+4)$ matrix, the sorted data matrix given by the output data of hazdat.
Output:
- nar
- $n \times 1$ vector, the number of observations at risk at each time point.
Example 3. The use of the quantlet haznar is illustrated with the same data set used in the previous two examples.
Again, the first 6 lines of code are identical to Example 1 and prepare the data. The input matrix data is obtained as part of the output of the hazdat call; data is displayed in Example 1.
library("hazreg")
y = 2|1|3|2|4|7|1|3|2 ; uncensored event times
c = 3|1|5|6|1|6|2|4|5 ; censoring times
t = min(y~c,2) ; observed (censored) times
delta = (y<=c) ; censoring indicator
{data,ties} = hazdat(t,delta)
nar = haznar(data) ; calculate the number at risk
nar
The output variable nar takes the value $(9, 9, 9, 6, 6, 6, 3, 3, 1)$.
The first three observations are tied, and, therefore, have identical risk sets.
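The number at risk can also be computed directly from the sorted times. The sketch below, in Python with NumPy, uses a hypothetical function name `haznar_sketch` and assumes that column 1 of data is sorted in ascending order, as in the hazdat output; it illustrates the definition of the risk set size and is not the quantlet's implementation.

```python
import numpy as np

def haznar_sketch(data):
    """Illustrative risk-set sizes from a data matrix sorted by time."""
    times = np.asarray(data, dtype=float)[:, 0]   # column 1: sorted times
    # For each time, count the strictly smaller times; the rest are at risk.
    n_before = np.searchsorted(times, times, side="left")
    return times.size - n_before

# Matrix data as displayed in Example 1.
data = np.array([
    [1, 0, 5, 3],
    [1, 1, 7, 3],
    [1, 1, 2, 3],
    [2, 1, 4, 3],
    [2, 1, 9, 3],
    [2, 1, 1, 3],
    [3, 1, 8, 2],
    [3, 1, 3, 2],
    [6, 0, 6, 1],
])
nar = haznar_sketch(data)
print(nar)
```

Tied observations receive the same count because `side="left"` returns the position of the first row of each tied block.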