| Library: | xplore |
| See also: | sort cumsum paf diff |
| Quantlet: | discrete | |
| Description: | Reduces a matrix to its distinct rows and returns the number of replications of each row in the original dataset. If an optional second matrix y is given, the rows of y are summed over each group of identical x-rows. |
| Usage: | {xr,yr} = discrete(x{,y}) | |
| Input: | ||
| x | n x p matrix, the data matrix to reduce; in regression, usually the design matrix. The matrix may be numeric or string; if it is a string matrix, y cannot be given. | |
| y | optional, n x q matrix, in regression usually the observations of the dependent variable. Not possible for string matrix x. | |
| Output: | ||
| xr | m x p matrix, reduced data matrix (sorted). | |
| yr | m x 1 vector or m x (q+1) matrix. The first column contains the number of replications. If y was given, the other q columns of yr contain the sums of the y-rows that share the same x-row. | |
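The reduction described above can be illustrated outside XploRe. The following is a minimal Python sketch, not the quantlet's actual implementation: the function name `discrete` is reused only for readability, rows are assumed to be plain lists, and the lexicographic ordering of xr is an assumption about what "sorted" means here.

```python
from collections import defaultdict

def discrete(x, y=None):
    """Sketch of the reduction: distinct rows of x, replication counts, summed y rows.

    Assumption: x and y are lists of equal-length rows; this is an
    illustration, not XploRe's implementation.
    """
    q = len(y[0]) if y else 0
    counts = defaultdict(int)                # replications per distinct x-row
    sums = defaultdict(lambda: [0.0] * q)    # column sums of y per distinct x-row
    for i, row in enumerate(x):
        key = tuple(row)
        counts[key] += 1
        for j in range(q):
            sums[key][j] += y[i][j]
    keys = sorted(counts)                    # assumed order: lexicographic
    xr = [list(k) for k in keys]
    yr = [[counts[k]] + sums[k] for k in keys]
    return xr, yr

# Small made-up data set: the row [1, 0] occurs three times.
x = [[1, 0], [0, 1], [1, 0], [1, 0]]
y = [[2.0], [5.0], [3.0], [4.0]]
xr, yr = discrete(x, y)

# Weighted mean of the first column of xr, using the counts as weights,
# compared with the plain mean of the first column of x.
r = [row[0] for row in yr]
weighted_mean = sum(ri * xi[0] for ri, xi in zip(r, xr)) / sum(r)
plain_mean = sum(row[0] for row in x) / len(x)
```

As in the quantlet's output, the first column of `yr` holds the replication counts and the remaining columns hold the summed y values; the two means computed at the end agree.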
library("xplore")
n=100
b=1|2
x=ceil(normal(n,rows(b)))
y=x*b + normal(n)
; --------------------------------------
; data reduction
; --------------------------------------
{xr,yr}=discrete(x,y)
r =yr[,1]
yr=yr[,2]
rows(r)
; --------------------------------------
; descriptive statistics of x
; --------------------------------------
meanxr = sum(r.*xr)/sum(r)
varxr = sum(r.*(xr-meanxr)^2)/(sum(r)-1)
mean(x)'~meanxr'
var(x)'~varxr'
; --------------------------------------
; linear regression
; --------------------------------------
b=inv(x'*x)*x'*y
br=inv(xr'*diag(r)*xr)*xr'*yr
b~br
The matrices x and y, each with 100 rows, are reduced to xr (the distinct rows of x) and yr (the sums of y over identical rows of x); r gives the number of replications of each row. The mean and variance of x coincide with the weighted mean and variance of xr, with weights r. The least-squares coefficients b from regressing y on x coincide with br, computed from the reduced data as inv(xr'*diag(r)*xr)*xr'*yr.
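Why the two regression results agree can be seen from a short derivation. The notation is introduced here for illustration only: $x_i$ and $y_i$ are the original rows, $\tilde{x}_j$ are the distinct rows (the rows of xr), $r_j$ are their replication counts, and $\tilde{y}_j$ is the sum of all $y_i$ with $x_i = \tilde{x}_j$ (the rows of yr after the count column is removed).

```latex
\begin{align*}
X^\top X &= \sum_{i=1}^{n} x_i x_i^\top
          = \sum_{j=1}^{m} r_j\, \tilde{x}_j \tilde{x}_j^\top
          = X_r^\top \operatorname{diag}(r)\, X_r, \\
X^\top y &= \sum_{i=1}^{n} x_i y_i
          = \sum_{j=1}^{m} \tilde{x}_j \tilde{y}_j
          = X_r^\top y_r, \\
b &= (X^\top X)^{-1} X^\top y
   = \bigl(X_r^\top \operatorname{diag}(r)\, X_r\bigr)^{-1} X_r^\top y_r .
\end{align*}
```

Because yr already contains sums rather than means, the weights r appear only in the cross-product matrix and not in the term xr'*yr.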