| Group: | Cluster analysis |
| See also: | tree agglom |
| Function: | kmeans | |
| Description: | performs cluster analysis, i.e. computes a partition of n row points into K clusters. |
| Usage: | cm = kmeans (x, b, it {, w {, m}}) | |
| Input: | ||
| x | n x p data matrix | |
| b | n x 1 matrix: initial partition (for example, randomly generated cluster numbers 1,2,...,K) | |
| it | maximum number of iterations | |
| w | optional, p x 1 matrix with the weights of column points | |
| m | optional, n x 1 matrix of weights (masses) of row points | |
| Output: | ||
| cm.g | n x 1 matrix containing the final partition, which gives a minimum sum of within-cluster variances | |
| cm.c | K x p matrix of means (centroids) of the K clusters | |
| cm.v | K x p matrix of within-cluster variances divided by the weight (mass) of clusters | |
| cm.s | K x 1 matrix of the weight (mass) of clusters | |
The partition b contains non-negative numbers
and for every cluster k (k=1,2,...,K) there must exist at least
one row point i (i=1,2,...,n) with b(i)=k.
; set the seed of the random generator
randomize(0)
; generate some data
x = normal(100, 4)
; generate first cluster
x1 = x - #(2,1,3,0)'
; generate second cluster
x2 = x + #(1,1,3,1)'
; generate third cluster
x3 = x + #(0,0,1,5)'
; make a data set with 3 clusters
x = x1|x2|x3
; generate a random partition with 3 clusters
b = ceil(uniform(rows(x)).*3)
; apply k-means clustering to the data
{g, c, v, s} = kmeans(x, b, 100)
; show the start partition and the final partition
b~g
shows as result the start partition and the final partition of the data in 3 clusters:

Contents of _tmp
[  1,]  1  2
[  2,]  3  2
[  3,]  1  2
[  4,]  3  2
[  5,]  3  2
[  6,]  2  2
[  7,]  3  2
[  8,]  2  2
[  9,]  2  2
[ 10,]  3  2
[ 11,]  1  2
[ 12,]  2  2
[ 13,]  2  2
[ 14,]  3  2
[ 15,]  2  2
...
[286,]  2  1
[287,]  1  1
[288,]  1  1
[289,]  3  1
[290,]  2  1
[291,]  2  1
[292,]  3  1
[293,]  2  1
[294,]  3  1
[295,]  3  1
[296,]  3  1
[297,]  1  1
[298,]  2  1
[299,]  1  1
[300,]  2  1
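
To make the meaning of the four outputs more concrete, here is a minimal pure-Python sketch of a k-means iteration that starts from an initial partition, as the quantlet does. It is not the XploRe implementation. The function name kmeans_sketch, the default weights of 1, the use of w only inside the squared distance, and the reading of v as the mass-weighted sum of squared deviations per column divided by the cluster mass are all assumptions made for illustration. The check at the top mirrors the requirement that every cluster 1,2,...,K contains at least one row point.

```python
def kmeans_sketch(x, b, it, w=None, m=None):
    """Illustrative k-means from an initial partition (hypothetical helper)."""
    n, p = len(x), len(x[0])
    w = [1.0] * p if w is None else list(w)  # column weights, assumed default 1
    m = [1.0] * n if m is None else list(m)  # row masses, assumed default 1
    K = max(b)
    # Every cluster 1..K must contain at least one row point.
    if set(b) != set(range(1, K + 1)):
        raise ValueError("every cluster 1..K needs at least one row point")
    g = list(b)

    def centroids(part):
        # Mass-weighted mean of each cluster and the total mass per cluster.
        s = [0.0] * K
        c = [[0.0] * p for _ in range(K)]
        for i in range(n):
            k = part[i] - 1
            s[k] += m[i]
            for j in range(p):
                c[k][j] += m[i] * x[i][j]
        for k in range(K):
            if s[k] > 0:
                c[k] = [cj / s[k] for cj in c[k]]
        return c, s

    def dist(row, centre):
        # Squared Euclidean distance with column weights (an assumption).
        return sum(w[j] * (row[j] - centre[j]) ** 2 for j in range(p))

    for _ in range(it):
        c, s = centroids(g)
        # A cluster that has lost all its points is skipped and stays empty.
        alive = [k for k in range(K) if s[k] > 0]
        new_g = [min(alive, key=lambda k: dist(x[i], c[k])) + 1
                 for i in range(n)]
        if new_g == g:  # partition is stable, stop early
            break
        g = new_g

    c, s = centroids(g)
    # Per-column within-cluster variance, divided by the cluster mass.
    v = [[0.0] * p for _ in range(K)]
    for i in range(n):
        k = g[i] - 1
        for j in range(p):
            v[k][j] += m[i] * (x[i][j] - c[k][j]) ** 2
    for k in range(K):
        if s[k] > 0:
            v[k] = [vj / s[k] for vj in v[k]]
    return g, c, v, s


# Two well-separated groups, started from a deliberately mixed partition.
x = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
b = [1, 2, 1, 2]
g, c, v, s = kmeans_sketch(x, b, 100)
print(g, c, v, s)
```

As in the XploRe example above, the final cluster numbers depend on the start partition, so only the grouping of the row points is meaningful, not the labels themselves.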