13.1 Introduction
In a contingency table, the data are classified according to each of two characteristics.
The attributes of each characteristic are represented by the row and the column categories.
We denote by $n_{ij}$ the number of individuals with the $i$-th row and $j$-th column attributes.
The contingency table itself is the $(I \times J)$ matrix $N$ containing the elements $n_{ij}$.
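As a small illustration of how such a table arises from raw data, the following sketch cross-tabulates two lists of category labels. The function name `contingency_table` and the eye/hair data are hypothetical and chosen only for this example; they do not come from the text.

```python
from collections import Counter

def contingency_table(rows, cols):
    """Cross-tabulate two equally long sequences of category labels.

    Returns the sorted row levels, the sorted column levels and the
    table of counts n_ij as a list of lists.
    """
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    counts = Counter(zip(rows, cols))  # missing pairs count as 0
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

# Hypothetical data: two characteristics observed on six individuals.
eye = ["blue", "brown", "blue", "brown", "brown", "blue"]
hair = ["fair", "dark", "fair", "fair", "dark", "dark"]

row_levels, col_levels, N = contingency_table(eye, hair)
for label, counts_row in zip(row_levels, N):
    print(label, counts_row)
```

Each entry of `N` is the count of individuals sharing one row attribute and one column attribute, so the entries sum to the number of individuals.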
13.1.1 Singular Value Decomposition
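Total variation in the contingency table is measured by the departure from independence, more precisely by the $\chi^2$ statistic
$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - E_{ij})^2}{E_{ij}},$$
where $n_{ij}$, $i = 1, \dots, I$, $j = 1, \dots, J$, are the observed frequencies
and $E_{ij}$ is the estimated expected value in the cell $(i,j)$ under the assumption of independence:
$$E_{ij} = \frac{n_{i\bullet}\, n_{\bullet j}}{n}.$$
Here $n_{i\bullet} = \sum_{j} n_{ij}$ and $n_{\bullet j} = \sum_{i} n_{ij}$ are the row and column totals and $n = \sum_{i,j} n_{ij}$ is the grand total.
We define
$$M = N - E,$$
where $N = (n_{ij})$ and $E = (E_{ij})$.
The matrix $M$ contains the differences between the observed frequencies and the frequencies estimated under the assumption of independence.
The $\chi^2$ statistic, which measures the departure from independence, can be rewritten as
$$\chi^2 = n \, \mathrm{tr}(M^\top R M C),$$
where $R = \mathrm{diag}(1/n_{i\bullet})$ and $C = \mathrm{diag}(1/n_{\bullet j})$.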
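The equivalence of the two forms of the statistic can be checked numerically. The sketch below assumes the usual definitions $E_{ij} = n_{i\bullet} n_{\bullet j} / n$, $M = N - E$, $R = \mathrm{diag}(1/n_{i\bullet})$ and $C = \mathrm{diag}(1/n_{\bullet j})$; the table of counts is made up for illustration.

```python
import numpy as np

# Hypothetical 3 x 3 table of observed counts n_ij.
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n = N.sum()                 # grand total
row_tot = N.sum(axis=1)     # row totals n_i.
col_tot = N.sum(axis=0)     # column totals n_.j

# Expected counts under independence and the matrix of differences.
E = np.outer(row_tot, col_tot) / n
M = N - E

# Chi-square as a cell-by-cell sum.
chi2_sum = ((N - E) ** 2 / E).sum()

# Chi-square in trace form, n * tr(M' R M C).
R = np.diag(1.0 / row_tot)
C = np.diag(1.0 / col_tot)
chi2_trace = n * np.trace(M.T @ R @ M @ C)

print(chi2_sum, chi2_trace)
```

Both expressions agree up to floating-point error. The rows and columns of `M` each sum to zero, which is why its rank is at most $\min(I, J) - 1$.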
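The CA itself consists of finding the singular value decomposition (SVD) of the matrix $M$ in the chi-square norm, which amounts to the ordinary SVD of $R^{1/2} M C^{1/2}$. In this way,
we obtain approximations of the matrix $M$ by matrices of lower rank:
$$M = \sqrt{\lambda_1}\, r_1 c_1^\top + \sqrt{\lambda_2}\, r_2 c_2^\top + \dots + \sqrt{\lambda_u}\, r_u c_u^\top,$$
where $\sqrt{\lambda_1}\, r_1 c_1^\top$ is the matrix of rank one closest to $M$ in the chi-square norm,
$\sqrt{\lambda_1}\, r_1 c_1^\top + \sqrt{\lambda_2}\, r_2 c_2^\top$ is the matrix of rank two closest to $M$ in the chi-square norm, and so on.
The $\lambda_k$'s are the eigenvalues of $M^\top R M C$ in decreasing order and $r_k^\top R\, r_k = c_k^\top C\, c_k = 1$.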
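A minimal numerical sketch of this decomposition follows. It assumes that the chi-square norm of a matrix $A$ is $\mathrm{tr}(A^\top R A C)$, with $R = \mathrm{diag}(1/n_{i\bullet})$ and $C = \mathrm{diag}(1/n_{\bullet j})$, and that the factors are obtained from the ordinary SVD of $R^{1/2} M C^{1/2}$. The table and the helper names `approx` and `chi2_norm_sq` are hypothetical.

```python
import numpy as np

# Hypothetical 3 x 3 table of observed counts.
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n = N.sum()
row_tot = N.sum(axis=1)
col_tot = N.sum(axis=0)
E = np.outer(row_tot, col_tot) / n
M = N - E
R = np.diag(1.0 / row_tot)
C = np.diag(1.0 / col_tot)

# Ordinary SVD of the scaled matrix S = R^{1/2} M C^{1/2}.
S = M / np.sqrt(np.outer(row_tot, col_tot))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
lam = sv ** 2                              # squared singular values

# Back-transform the singular vectors: columns are r_k and c_k.
r = U * np.sqrt(row_tot)[:, None]          # r_k = R^{-1/2} u_k
c = Vt.T * np.sqrt(col_tot)[:, None]       # c_k = C^{-1/2} v_k

def approx(q):
    """Sum of the first q rank-one terms sqrt(lam_k) r_k c_k'."""
    return (r[:, :q] * sv[:q]) @ c[:, :q].T

def chi2_norm_sq(A):
    """Squared chi-square norm tr(A' R A C)."""
    return np.trace(A.T @ R @ A @ C)

# Eigenvalues of M' R M C, sorted in decreasing order.
eigs = np.sort(np.linalg.eigvals(M.T @ R @ M @ C).real)[::-1]

print(lam, eigs)
print(chi2_norm_sq(M - approx(1)), lam[1:].sum())
```

The squared singular values coincide with the eigenvalues of $M^\top R M C$, and the squared chi-square distance between $M$ and its rank-$q$ approximation equals the sum of the discarded eigenvalues.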
13.1.2 Coordinates of Factors
The $(I \times 1)$ vector $r_k$, $k = 1, \dots, u$, defines the coordinates of the rows corresponding to the $k$-th factor.
Similarly, the $(J \times 1)$ vector $c_k$ defines the coordinates of the columns corresponding to the $k$-th factor.
A set of $u$ coordinates for row (resp. column) items, where $u = \min(I, J) - 1$, is hierarchically constructed via singular value decomposition.
Thus the construction is similar to PCA, but with a different matrix norm in order to take into account the specific frequency nature of the data.
For the sake of simplicity, the vector of first row coordinates is called the first factor (as is the vector of first column coordinates), and so on up to the $u$-th factor.
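The extraction of the factors can be sketched as a single function. This is only an illustration under stated assumptions: the coordinates are taken as the back-transformed singular vectors of $R^{1/2} M C^{1/2}$ with $R = \mathrm{diag}(1/n_{i\bullet})$ and $C = \mathrm{diag}(1/n_{\bullet j})$, all row and column totals are positive, and the name `ca_factors` and the table are hypothetical.

```python
import numpy as np

def ca_factors(N):
    """Return eigenvalues and row/column coordinates for u = min(I, J) - 1 factors.

    Column k of the returned r (resp. c) holds the row (resp. column)
    coordinates on the (k+1)-th factor.
    """
    N = np.asarray(N, dtype=float)
    n = N.sum()
    row_tot = N.sum(axis=1)
    col_tot = N.sum(axis=0)
    M = N - np.outer(row_tot, col_tot) / n           # departure from independence
    S = M / np.sqrt(np.outer(row_tot, col_tot))      # R^{1/2} M C^{1/2}
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    u = min(N.shape) - 1                             # number of factors kept
    r = (U * np.sqrt(row_tot)[:, None])[:, :u]
    c = (Vt.T * np.sqrt(col_tot)[:, None])[:, :u]
    return sv[:u] ** 2, r, c

# Hypothetical 4 x 3 table, so u = min(4, 3) - 1 = 2 factors.
N = np.array([[12, 5, 3],
              [6, 9, 5],
              [4, 8, 8],
              [2, 6, 12]])

lam, r, c = ca_factors(N)
first_row_factor = r[:, 0]   # coordinates of the 4 rows on the first factor
first_col_factor = c[:, 0]   # coordinates of the 3 columns on the first factor
print(lam)
print(first_row_factor)
print(first_col_factor)
```

The sign of each factor is not determined by the SVD, so the coordinates may be flipped jointly for rows and columns without changing the decomposition.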