In a contingency table, the data are classified according to each of two
characteristics. The attributes on each characteristic are represented by
the row and the column categories. We will denote by $n_{ij}$ the number of
individuals with the $i$-th row and $j$-th column attributes. The
contingency table itself is the $(I \times J)$ matrix containing the
elements $n_{ij}$.
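Total variation in the contingency table is measured by departure from
independence, i.e., more precisely, by the $\chi^2$ statistic

$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - E_{ij})^2}{E_{ij}},$$

where $E_{ij} = n_{i\cdot} n_{\cdot j} / n$ is the expected count in cell
$(i,j)$ under independence, $n_{i\cdot}$ and $n_{\cdot j}$ are the row and
column totals, and $n$ is the total number of individuals. The $\chi^2$
statistic, which measures the departure from independence, can be
rewritten as

$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} c_{ij}^2 = \mathrm{tr}(C C^\top), \qquad c_{ij} = \frac{n_{ij} - E_{ij}}{\sqrt{E_{ij}}},$$

where $C = (c_{ij})$.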
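The two forms of the statistic can be checked numerically. The following is a minimal sketch using NumPy; the 3 x 3 table of counts is hypothetical and chosen only for illustration, and the names `N`, `E`, and `C` are assumptions of this example rather than notation fixed by the text.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts, not taken from the text)
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n = N.sum()                                  # total number of individuals
row_totals = N.sum(axis=1, keepdims=True)    # row totals, shape (I, 1)
col_totals = N.sum(axis=0, keepdims=True)    # column totals, shape (1, J)

# Expected counts under independence: (row total * column total) / n
E = row_totals @ col_totals / n

# Chi-square statistic in its usual form
chi2 = ((N - E) ** 2 / E).sum()

# Same statistic as the trace of C C^T, with C the standardized residuals
C = (N - E) / np.sqrt(E)
chi2_from_C = np.trace(C @ C.T)

print(chi2, chi2_from_C)
```

Both printed values agree up to floating-point error, since the trace of $C C^\top$ is the sum of the squared entries of $C$.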
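The CA itself consists of finding the singular value decomposition (SVD)
of the matrix $C$ of standardized residuals, $C = \Gamma \Lambda \Delta^\top$,
where $\Lambda = \mathrm{diag}(\lambda_1^{1/2}, \lambda_2^{1/2}, \ldots)$
contains the singular values in decreasing order and $\gamma_k$ and
$\delta_k$ denote the $k$-th columns of $\Gamma$ and $\Delta$. In this way,
we obtain approximations of the matrix $C$ by matrices of lower rank:

$$C_K = \sum_{k=1}^{K} \lambda_k^{1/2} \gamma_k \delta_k^\top.$$

The vector $r_k$, a rescaled version of the left singular vector
$\gamma_k$, defines the coordinates of the rows corresponding to the
$k$-th factor. Similarly, the vector $s_k$, a rescaled version of the
right singular vector $\delta_k$, defines the coordinates of columns
corresponding to the $k$-th factor.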
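A minimal NumPy sketch of the SVD step follows. It assumes that the decomposed matrix is the matrix of standardized residuals (observed minus expected counts, divided by the square root of the expected counts). The table of counts is hypothetical, the helper `low_rank` is introduced here for illustration only, and the scaling of the coordinates (singular vectors multiplied by the singular values) is one common convention, not necessarily the one intended by the text.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts, not taken from the text)
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n = N.sum()
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n   # expected counts under independence
C = (N - E) / np.sqrt(E)                         # standardized residuals

# Thin SVD: C = U @ diag(s) @ Vt, singular values s in decreasing order
U, s, Vt = np.linalg.svd(C, full_matrices=False)

def low_rank(K):
    """Rank-K approximation of C built from the first K singular triplets."""
    return (U[:, :K] * s[:K]) @ Vt[:K, :]

# Assumed scaling: column k holds the coordinates on factor k + 1
row_coords = U * s       # one row per row category
col_coords = Vt.T * s    # one row per column category

print(np.round(s, 4))
print(np.round(row_coords[:, :2], 4))
print(np.round(col_coords[:, :2], 4))
```

The squared singular values sum to the $\chi^2$ statistic, and the last singular value is numerically zero because the standardized residuals of a table have rank at most $\min(I, J) - 1$.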
A set of $K$ coordinates for row (resp. column) items, where $K$ is at
most the rank of the decomposed matrix, is hierarchically constructed via
singular value decomposition. Thus the construction is similar to PCA,
however with a different matrix norm in order to take into account the
specific frequency nature of the data.
For the sake of simplicity, the vector of first row coordinates is
called the first factor (as well as the vector of the first coordinates
for columns), and so on up to the $K$-th factor.
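The hierarchical nature of the factors can be illustrated by looking at how much of the total variation each successive factor accounts for. The sketch below assumes, as is common in correspondence analysis, that the decomposed matrix holds the standardized residuals, so that the squared singular values add up to the chi-square statistic. The table of counts and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts, not taken from the text)
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n = N.sum()
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n   # expected counts under independence
C = (N - E) / np.sqrt(E)                         # standardized residuals

# Singular values only, returned in decreasing order
s = np.linalg.svd(C, compute_uv=False)

contribution = s ** 2                        # part of chi-square carried by each factor
share = contribution / contribution.sum()    # proportion per factor
cumulative = np.cumsum(share)                # proportion carried by the first K factors

for k, (p, c) in enumerate(zip(share, cumulative), start=1):
    print(f"factor {k}: share = {p:.4f}, cumulative = {c:.4f}")
```

Because the singular values are ordered, the first factor always carries the largest share, the second the next largest, and so on, which is what makes a low-dimensional display of the first few factors meaningful.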