In XploRe, data can be stored in matrices (n x p) or in multidimensional arrays.
Here, we concentrate on data matrices.
Small data matrices can be created directly from the command
line or within an XploRe quantlet. Large data matrices are
typically read from data files.
The following subsections provide a short introduction to matrix and data handling. Consult the Read and Write Tutorial to learn more about loading data files into XploRe. More details on data and matrix manipulation can be found in the Matrix Handling Tutorial.
Small data matrices can be entered directly at the command
line or within an XploRe program. All of the following XploRe
code is available in the quantlet desc01.xpl.
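As a first example, consider the 4 x 3 data matrix

   1   2    3.4
   5   6    7.8
   9   0    1.44
   8   7   10.432

We enter it column by column. Either of the commands

  col1=#(1,5,9,8)
  col1=1|5|9|8

creates the column vector col1: #(...) lists the elements of a column vector, and the operator | stacks its operands vertically. Typing

  col1

at the command line results in

  Contents of col1
  [1,]        1
  [2,]        5
  [3,]        9
  [4,]        8

in the output window. In the same way as for col1, we build the second and third columns: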
  col2=#(2.0,6.0,0.0,7.0)
  col3=#(3.4,7.8,1.44,10.432)

and place all three vectors side by side by means of the horizontal concatenation operator ~:

  mat=col1~col2~col3

When we check the contents of mat, we see
  Contents of mat
  [1,]        1        2      3.4
  [2,]        5        6      7.8
  [3,]        9        0     1.44
  [4,]        8        7   10.432

Note that we could have created mat in a single step:

  mat= #(1,5,9,8) ~ #(2.0,6.0,0.0,7.0) ~ #(3.4,7.8,1.44,10.432)

Let us also remark that XploRe does not distinguish between integer and float values. Therefore, the first two columns of mat, entered as 1, 5, 9, 8 and as 2.0, 6.0, 0.0, 7.0, are displayed in the same format.
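As a point of comparison for readers who know a general-purpose language, the short Python sketch below imitates #(...), | and ~ on matrices stored as lists of rows. The helpers col, vcat and hcat are hypothetical stand-ins for these XploRe constructs, not part of XploRe; the sketch only shows how the three columns combine into the 4 x 3 matrix mat.

```python
# Python analogue of XploRe's column-vector and concatenation syntax.
# Hypothetical helpers for illustration; a matrix is a list of rows.

def col(*values):
    """Build an n x 1 column vector, like #(v1, ..., vn) in XploRe."""
    return [[v] for v in values]

def vcat(a, b):
    """Stack b below a (append rows), like a|b in XploRe."""
    if a and b and len(a[0]) != len(b[0]):
        raise ValueError("column counts differ")
    return [row[:] for row in a] + [row[:] for row in b]

def hcat(a, b):
    """Place b to the right of a (append columns), like a~b in XploRe."""
    if len(a) != len(b):
        raise ValueError("row counts differ")
    return [ra + rb for ra, rb in zip(a, b)]

# Two ways to build the same column, as with #(1,5,9,8) and 1|5|9|8.
col1 = col(1, 5, 9, 8)
col1_alt = vcat(vcat(vcat(col(1), col(5)), col(9)), col(8))

col2 = col(2.0, 6.0, 0.0, 7.0)
col3 = col(3.4, 7.8, 1.44, 10.432)
mat = hcat(hcat(col1, col2), col3)   # like mat=col1~col2~col3

for row in mat:
    print(row)
```

One difference is worth noting: Python keeps 1 and 2.0 as distinct types (they still compare equal numerically), whereas XploRe, as remarked above, does not distinguish integers from floats.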
It is also possible to create text matrices. For example,

  textmat= #("aa","c") ~ #("b","d2")

creates the 2 x 2 text matrix textmat, whose first row contains "aa" and "b" and whose second row contains "c" and "d2".
Large data sets are usually stored in data files. XploRe can read data from ASCII files containing numeric data, text data, or both. In the following we use two data sets: cps85 and uscomp2 (see Data Sets).
The file cps85.dat contains a subsample of the 1985 U.S. Current Population Survey. The file contains only numeric data. We assign columns 1 (years of education), 2 (=1 if living in the South), 5 (=1 if female), 8 (years of labor market experience), 10 (=1 if working in a union job), 11 (natural logarithm of average hourly earnings), and 12 (age in years) to the XploRe variable earn:
  earn=read("cps85")
  earn=earn[,1|2|5|8|10|11|12]
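The column list 1|2|5|8|10|11|12 refers to 1-based positions. The following Python sketch, which uses a hypothetical helper select_cols and a made-up 2 x 12 matrix in place of the real cps85 data, performs the same kind of selection and makes the shift to Python's 0-based indexing explicit.

```python
def select_cols(m, idx):
    """Return the columns of m at the 1-based positions in idx, in that order.

    Mimics XploRe's m[, i1|i2|...]; hypothetical helper, not an XploRe function.
    """
    return [[row[j - 1] for j in idx] for row in m]   # j - 1: 1-based -> 0-based

# Made-up 2 x 12 matrix: entry 100*r + c sits in row r, column c (both 1-based).
toy = [[100 * r + c for c in range(1, 13)] for r in range(1, 3)]

keep = [1, 2, 5, 8, 10, 11, 12]   # the column list used for earn
earn_like = select_cols(toy, keep)
print(earn_like)
```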
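The file uscomp2.dat contains both text and numeric data. We read it with readm, whose result has two components: text holds the text columns and double holds the numeric columns. We assign the second text column to branch and the second and fourth numeric columns to salpro:

  uscomp=readm("uscomp2")
  branch=uscomp.text[,2]
  salpro=uscomp.double[,2|4]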
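To picture what the components text and double hold, the Python sketch below splits a few whitespace-separated records into a text part and a numeric part. The records are invented, and the rule used here (a column is numeric if every entry parses as a float, otherwise it is text) is an assumption for illustration, not a description of how readm classifies columns.

```python
def is_number(field):
    """True if the field parses as a float."""
    try:
        float(field)
        return True
    except ValueError:
        return False

def split_mixed(lines):
    """Split whitespace-separated records into (text, double) matrices.

    Assumption for illustration: a column is numeric if every entry in it
    parses as a float; otherwise the whole column is kept as text.
    """
    rows = [line.split() for line in lines]
    ncol = len(rows[0])
    numeric = [all(is_number(r[j]) for r in rows) for j in range(ncol)]
    text = [[r[j] for j in range(ncol) if not numeric[j]] for r in rows]
    double = [[float(r[j]) for j in range(ncol) if numeric[j]] for r in rows]
    return text, double

# Invented records: a name, four numeric fields, and a branch label.
records = [
    "A 1.5 2.5 3.5 4.5 Energy",
    "B 5.0 6.0 7.0 8.0 Retail",
]
text, double = split_mixed(records)

branch = [row[2 - 1] for row in text]                   # like uscomp.text[,2]
salpro = [[row[2 - 1], row[4 - 1]] for row in double]   # like uscomp.double[,2|4]
print(branch)
print(salpro)
```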
The first step in data analysis is to determine the dimensions
of the data. This is done with the function dim, which we now
apply to the data matrices mat, earn, branch, and salpro
specified in Subsections 1.1 and 1.2. The code for this section
is available in the quantlet desc02.xpl.
  dim(mat)
  dim(earn)
  dim(branch)
  dim(salpro)

yields

  Contents of dim
  [1,]        4
  [2,]        3
  Contents of dim
  [1,]      534
  [2,]        7
  Contents of dim
  [1,]       79
  Contents of dim
  [1,]       79
  [2,]        2

This tells us that mat is a 4 x 3 matrix, earn is 534 x 7, branch is a 79 x 1 vector, and salpro is 79 x 2. If we are only interested in the number of rows or columns, we can use the functions rows and cols:
  rows(earn)
  cols(earn)

gives

  Contents of rows
  [1,]      534
  Contents of cols
  [1,]        7
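The bookkeeping behind dim, rows and cols can be sketched in Python for matrices stored as lists of rows. The helpers below only borrow the XploRe names and are not XploRe functions; reporting a single-column matrix as a vector whose dim is just its length is an assumption read off the dim(branch) output, not a documented rule.

```python
def rows(m):
    """Number of rows, like rows() in XploRe."""
    return len(m)

def cols(m):
    """Number of columns (0 for an empty matrix), like cols() in XploRe."""
    return len(m[0]) if m else 0

def dim(m):
    """Dimensions as a list; a single-column matrix reports only its length
    (assumed behavior, matching the dim(branch) output)."""
    n, p = rows(m), cols(m)
    return [n] if p == 1 else [n, p]

mat = [[1, 2, 3.4], [5, 6, 7.8], [9, 0, 1.44], [8, 7, 10.432]]
vec = [[x] for x in range(79)]   # stand-in for a 79 x 1 vector such as branch

print(dim(mat))   # [4, 3]
print(dim(vec))   # [79]
```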
To extract elements or submatrices of a matrix, we can use the subarray operator []. For example, the following three lines extract the first row, the second column, and the (4,3) element (fourth row, third column) of mat:
  mat[1,]
  mat[,2]
  mat[4,3]

This operator can also extract several rows and columns at once. The statement mat[1:3,1|3] extracts the elements of mat that lie in rows 1 to 3 and in columns 1 and 3. The operator : specifies a range of consecutive integers, so 1:3 stands for 1, 2, 3, while 1|3 stacks the two column indices 1 and 3 into a vector.
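These indexing rules can be sketched in Python, where lists are 0-based. The hypothetical helpers rng (standing in for a:b) and sub (standing in for m[r,c]) below take XploRe-style 1-based indices and convert them internally; they illustrate the semantics described above and are not XploRe functions.

```python
def rng(a, b):
    """Consecutive integers a, a+1, ..., b (inclusive), like a:b in XploRe."""
    return list(range(a, b + 1))

def sub(m, r=None, c=None):
    """Extract rows r and columns c (1-based index lists; None means all),
    mimicking XploRe's m[r, c]. Always returns a list of rows."""
    r = rng(1, len(m)) if r is None else r
    c = rng(1, len(m[0])) if c is None else c
    return [[m[i - 1][j - 1] for j in c] for i in r]

mat = [[1, 2, 3.4], [5, 6, 7.8], [9, 0, 1.44], [8, 7, 10.432]]

first_row = sub(mat, [1])               # mat[1,]
second_col = sub(mat, None, [2])        # mat[,2]
elem = sub(mat, [4], [3])[0][0]         # mat[4,3], unwrapped from a 1 x 1 result
block = sub(mat, rng(1, 3), [1, 3])     # mat[1:3,1|3]
print(block)
```

Returning a list of rows in every case keeps the result shape predictable, much as XploRe returns a matrix even for a single element.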
MD*TECH Method and Data Technologies, http://www.mdtech.de, mdtech@mdtech.de