Prerequisites
Newton's binomium refers to the identity

(a + b)^n = a^n + binom(n,1) a^(n-1) b + binom(n,2) a^(n-2) b^2 + binom(n,3) a^(n-3) b^3 + ··· + binom(n,n-1) a b^(n-1) + b^n.

Here, binom(n,k) stands for the binomial coefficient n!/(k! (n-k)!) (`n choose k'), and n is a positive integer. The identity holds for complex numbers a and b, but is also valid for more general arithmetic systems, as Algebra Interactive illustrates.
Special cases occur for small values of the exponent:

(a + b)^2 = a^2 + 2ab + b^2 and (a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3.
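As a quick numerical check, the identity can be verified with Python's standard library; `binomial_expansion` below is our own helper name, not part of the text.

```python
from math import comb  # comb(n, k) is the binomial coefficient n!/(k!(n-k)!)

def binomial_expansion(a, b, n):
    # Sum of binom(n,k) * a^(n-k) * b^k for k = 0, ..., n.
    return sum(comb(n, k) * a**(n - k) * b**k for k in range(n + 1))

# The expansion agrees with computing (a + b)^n directly:
print(binomial_expansion(3, 5, 4), (3 + 5)**4)  # both 4096
```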
Given an equivalence relation R on a set A, the set A can be partitioned into subsets, called equivalence classes, as follows. The elements a and b belong to the same equivalence class if and only if (a, b) ∈ R. The equivalence class containing an element a consists precisely of all elements of A equivalent with a.
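As an illustration, the sketch below partitions a small finite set into equivalence classes; congruence modulo 3 is our own example relation, and `equivalence_classes` is a hypothetical helper name.

```python
def equivalence_classes(A, related):
    # Group the elements of A: a and b share a class iff related(a, b).
    classes = []
    for a in A:
        for cls in classes:
            if related(a, cls[0]):
                cls.append(a)
                break
        else:
            classes.append([a])
    return classes

# Congruence modulo 3 on {0, ..., 8} yields three classes.
parts = equivalence_classes(range(9), lambda a, b: (a - b) % 3 == 0)
print(parts)  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```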
An equivalence relation is a relation R on a set A which is reflexive, symmetric, and transitive. If (a, b) ∈ R, then we say that a and b are equivalent (with respect to R).
A relation R on a set X is called reflexive if (x, x) ∈ R for all x ∈ X.
A relation R on a set X is called symmetric if (x, y) ∈ R implies (y, x) ∈ R.
A relation R on a set X is called transitive if (x, y) ∈ R and (y, z) ∈ R imply (x, z) ∈ R.
A relation on a set V is a subset of V × V. To indicate that the pair (a, b) belongs to this subset, it is common practice to write that a is related to b, or to replace the words `is related to' by some symbol like ~ or by an abbreviation of the relation being discussed.
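Representing a relation as a set of pairs, the three defining properties can be checked directly; the helper names and the same-parity example relation below are our own.

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, w) in R
               for (x, y) in R for (z, w) in R if y == z)

X = {1, 2, 3}
R = {(x, y) for x in X for y in X if (x - y) % 2 == 0}  # same parity

print(is_reflexive(R, X), is_symmetric(R), is_transitive(R))  # True True True
```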
A map f : A -> B is bijective if it is injective and surjective. This is equivalent to saying that f has an inverse, i.e., there exists a map g, say, such that f(g(b)) = b for all b ∈ B and g(f(a)) = a for all a ∈ A. If f is a bijection, then its inverse is unique and often denoted by f^(-1).
If f : A -> B is a map with codomain B and g : B -> C is a map with domain B, then the composition of f and g is the map gf : A -> C given by (gf)(a) = g(f(a)). The composition gf is also denoted by g o f.
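In code, the composition gf corresponds to applying f first and then g; a minimal sketch with example maps of our own choosing:

```python
def compose(g, f):
    # Returns the map a -> g(f(a)), written gf or g o f in the text.
    return lambda a: g(f(a))

double = lambda x: 2 * x
inc = lambda x: x + 1

gf = compose(inc, double)   # (gf)(a) = 2a + 1
print(gf(5))  # 11
```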
If n is a positive integer and f : A -> A is a map whose domain and codomain coincide, then f^n : A -> A is the map obtained by composing f exactly n times with itself. By definition, f^0 denotes the identity map. For nonnegative integers m and n the equalities f^(m+n) = f^m f^n and (f^m)^n = f^(mn) hold.
If f : A -> A is a bijection with inverse f^(-1), and if n is a positive integer, then f^(-n) is defined as (f^(-1))^n. For a bijection f : A -> A the rules f^(m+n) = f^m f^n and (f^m)^n = f^(mn) hold for all integers m and n.
A function f : A -> R (where in calculus A is usually some open subset of R) is differentiable at a point a ∈ A if the limit

lim_{h -> 0} (f(a + h) - f(a))/h

exists. In this case the limit is called the derivative of f at a and is usually denoted by f'(a). A function f : A -> R is differentiable if it is differentiable at every point of A. In that case the derivative f' is the function assigning to each point a of A the value f'(a).
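The limit can be approximated numerically by choosing a small h; the sketch below uses a symmetric difference quotient, a common refinement of the defining quotient rather than the definition itself.

```python
def derivative(f, a, h=1e-6):
    # Symmetric difference quotient approximating f'(a).
    return (f(a + h) - f(a - h)) / (2 * h)

# For f(x) = x^3 the derivative at 2 is 3 * 2^2 = 12.
print(derivative(lambda x: x**3, 2.0))  # approximately 12.0
```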
A map f : A -> B is often referred to as a function
if B is a set of numbers, for instance the reals. The distinction
between maps and functions is not very strict. The term function is more
common in calculus.
Given a map f : A -> B and a subset C of A, the image of C under f is the following subset of B:

f(C) = {f(a) | a ∈ C}.

The image of an element a ∈ A is the element f(a).
A map f : A -> B is called injective if the following holds for all x and y in A: if x ≠ y then f(x) ≠ f(y). In words: distinct elements of A are mapped to distinct elements of B. An equivalent formulation: for all x and y in A, the equality f(x) = f(y) implies x = y.
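For a map on a finite set, injectivity can be tested by comparing the number of distinct images with the size of the domain; `is_injective` is our own helper name.

```python
def is_injective(f, A):
    # f is injective on A iff distinct elements have distinct images.
    images = [f(a) for a in A]
    return len(set(images)) == len(images)

A = range(-3, 4)
print(is_injective(lambda x: x**3, A))  # True: cubing is injective
print(is_injective(lambda x: x**2, A))  # False: (-2)^2 == 2^2
```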
If the map f : A -> B is bijective, then there exists a unique map g : B -> A such that f(g(b)) = b for all b ∈ B and g(f(a)) = a for all a ∈ A. The map g is the inverse of f and is usually denoted by f^(-1).
A map f : A -> B (`the map f from A to B') assigns to each element a of A a unique element f(a) of B. The element f(a) is the image of a under f. We often simply write f instead of f : A -> B. The set A is the domain of the map, the set B the codomain.
Given a map f : A -> B and a subset C of B, the preimage of C under the map f is the following subset of A:

f^(-1)(C) = {a ∈ A | f(a) ∈ C}.

Note that the notation f^(-1) does not refer to the inverse of f (f need not even have an inverse). The reason for the common notation for two different notions is that they are closely related for bijections. If f is a bijection, with inverse denoted by g rather than by f^(-1) for the sake of clarity, then for b ∈ B we have {g(b)} = f^(-1)({b}). In fact, for every subset C of B the image g(C) equals the preimage f^(-1)(C).
The preimage of an element b ∈ B is the preimage of {b}. By abuse of notation this preimage is also often written as f^(-1)(b) rather than f^(-1)({b}).
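For finite sets the preimage is a simple filter over the domain; note that the sketch below never inverts f, matching the remark that f need not have an inverse.

```python
def preimage(f, A, C):
    # The preimage of C under f: {a in A | f(a) in C}.
    return {a for a in A if f(a) in C}

A = range(-3, 4)
# Squaring is not a bijection, yet the preimage is well defined:
print(preimage(lambda x: x**2, A, {1, 4}))  # {-2, -1, 1, 2}
```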
A map f : A -> B is called surjective (or onto) if for every b ∈ B there exists an element a ∈ A such that f(a) = b. Equivalently, in terms of the image of the map, f is surjective if and only if f(A) = B.
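Using the characterization f(A) = B, surjectivity of a map between finite sets can be checked as follows (`is_surjective` is our own helper name):

```python
def is_surjective(f, A, B):
    # f is surjective iff its image f(A) equals the codomain B.
    return {f(a) for a in A} == set(B)

A = range(4)
print(is_surjective(lambda x: x % 2, A, {0, 1}))     # True
print(is_surjective(lambda x: x % 2, A, {0, 1, 2}))  # False: 2 has no preimage
```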
A cycle in a graph is a path whose begin and end point coincide.
A graph is called connected if any two vertices are connected by a path.
A graph (without loops) is a set X together with a collection E of subsets of X of size 2. Members of X are called vertices, members of E are called edges.
Graphs are drawn by representing the vertices by points (or small circles) and the edges by lines (or curves) connecting the two vertices of which they consist.
A path in a graph is a sequence of edges such that two consecutive edges have exactly one vertex in common; moreover, the first edge has a vertex (not appearing in the second edge) called the begin point, and the last edge has a vertex (not appearing in the one-but-last edge; if there is just one edge: the non-begin point) called the end point.
One can think of a path as a walk from begin point to end point along edges.
A tree is a connected graph without cycles. A rooted tree is a tree with a distinguished vertex, the root.
If a and b are vertices of a tree, then there is a unique path from a to b: since the tree is connected, there is a path from a to b; since there are no cycles, the path is unique.
Induction (or mathematical induction) can often be used to prove a property P(n) that is to hold for every natural number n. A proof by induction consists of two steps.
- A proof of P(0).
- A proof that, for all n, if P(n) holds then the property P(n + 1) also holds.
Together these two steps provide a proof that the property P(n) holds for all natural numbers.
There are several variations of `proof by induction'. For instance, if a property Q(n) is to hold for n = 1, 2, ..., then the first step of the proof by induction consists of course of proving Q(1). Another variation is obtained by replacing the second step above by a proof that for all n the property P(n + 1) is implied by P(0), P(1), ..., P(n).
The characteristic polynomial of a square matrix A is the following polynomial in the indeterminate X:

det(XI - A).

Here, I is the identity matrix and det stands for determinant. If A is an n by n matrix, then the characteristic polynomial has degree n.
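For a concrete matrix the coefficients of det(XI - A) can be obtained with NumPy's `np.poly`, which lists them from highest to lowest degree; the example matrix is our own.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of det(XI - A), highest degree first.
coeffs = np.poly(A)
print(coeffs)  # [ 1. -4.  3.], i.e. X^2 - 4X + 3
```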
The determinant is a number computed from a square matrix. If A is an n by n matrix with entries A_{i,j}, then the determinant det(A) of A is defined by the expression

det(A) = Σ_g sgn(g) A_{1,g(1)} A_{2,g(2)} ··· A_{n,g(n)},

where g runs through all the permutations of the set {1, 2, ..., n}, and where sgn denotes the sign of a permutation. The notion of a permutation is explained in Chapter 5, as well as the notion of sign.
The computation of determinants is facilitated by a range of practical methods
that are derived from the definition.
In fact, the reader may well have computed
determinants without having been aware of the formal definition, since
the definition involves permutations, a notion that is not always discussed
in elementary linear algebra.
For a 2 by 2 matrix [[a, b], [c, d]], the determinant is ad - bc.
An important property of the determinant is its multiplicativity. If A
and B are n by n matrices, then
det(AB) = det(A) · det(B).
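The defining sum over permutations translates directly into code (practical only for small n, since there are n! terms); `itertools` supplies the permutations, and the sign is computed here by counting inversions, one standard characterization of the sign of a permutation.

```python
from itertools import permutations

def sign(g):
    # Sign of a permutation g of (0, ..., n-1): +1 or -1 by inversion count.
    inversions = sum(1 for i in range(len(g))
                       for j in range(i + 1, len(g)) if g[i] > g[j])
    return -1 if inversions % 2 else 1

def det(A):
    # Sum over all permutations g of sgn(g) * A[0][g(0)] * ... * A[n-1][g(n-1)].
    n = len(A)
    total = 0
    for g in permutations(range(n)):
        term = sign(g)
        for i in range(n):
            term *= A[i][g[i]]
        total += term
    return total

# For a 2 by 2 matrix [[a, b], [c, d]] this reproduces ad - bc:
print(det([[1, 2], [3, 4]]))  # -2
```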
A square matrix A is called invertible if there exists a matrix B such that AB = BA = I, where I is the identity matrix of the same size. If the square matrix A is invertible, then the matrix B such that AB = BA = I is unique and has the same size as A. It is called the inverse of A and is usually denoted by A^(-1).
A useful criterion for invertibility is that A is invertible if and only if det(A) ≠ 0, where det stands for the determinant.
The diagonal of a square matrix A is the set of entries A_{i,i}. A diagonal matrix is a square matrix all of whose entries off the diagonal are 0.
The identity matrix of size n by n is the n by n matrix whose off-diagonal entries are all 0 and whose diagonal entries are all 1. The matrix is often denoted by I_n or simply by I.
A matrix is a rectangular array of numbers. Each entry (i.e., element occurring in the matrix) of the matrix belongs to a (unique) row and column. If A is a matrix, then the entry in the i-th row and j-th column is usually denoted by A_{i,j}, or by another symbol with subindices i and j. If A has m rows and n columns, then A is said to be an m by n matrix, or to have size m by n.
For typographical reasons we sometimes write a matrix as a `list of lists'. For example, [[1, 2, 3], [4, 5, 6]] is a 2 by 3 matrix with rows [1, 2, 3] and [4, 5, 6].
The set of m by n matrices whose entries are in the set Q, R, etc., is often denoted by M_{m,n}(Q), etc. If m = n, the notation M_m(Q), etc., is used.
If A is an m by n matrix and B is an n by p matrix, then the product AB is the m by p matrix C whose (r, s)-th entry is

A_{r,1} B_{1,s} + A_{r,2} B_{2,s} + ··· + A_{r,n} B_{n,s},

where A has entries A_{i,j} (for i ranging from 1 to m and j ranging from 1 to n) and B has entries B_{i,j} (for i ranging from 1 to n and j ranging from 1 to p). So, the entry C_{r,s} is computed from the r-th row of A and the s-th column of B.
Note that the product AB is only defined if the number of columns of A equals the number of rows of B. In particular, if AB exists, then the product of B and A (in that order) need not exist. Moreover, if both AB and BA exist, they need not be equal.
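The entry formula above translates directly into code; a minimal sketch without any library, using the `list of lists' convention for matrices:

```python
def mat_mul(A, B):
    # A is m by n, B is n by p; the product AB is m by p.
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    # Entry (r, s) is the sum of A[r][k] * B[k][s] over k.
    return [[sum(A[r][k] * B[k][s] for k in range(n)) for s in range(p)]
            for r in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]                   # 2 by 3
B = [[1, 0], [0, 1], [1, 1]]      # 3 by 2
print(mat_mul(A, B))  # [[4, 5], [10, 11]]
```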
A minimal polynomial of a square matrix A is a nonzero polynomial f of minimal degree such that f(A) = 0 (the zero matrix). Here, f(A), i.e., substitution of A in the polynomial f, is to be interpreted as follows. If

f = a_0 + a_1 X + ··· + a_n X^n,

then

f(A) = a_0 I + a_1 A + ··· + a_n A^n

(again a square matrix of the same size as A), where I is the identity matrix.
The Cayley-Hamilton theorem from linear algebra states that the characteristic polynomial f_A of the square matrix A has the property f_A(A) = 0. Using division with remainder, as explained in Chapter 3, it follows that minimal polynomials exist and that they divide the characteristic polynomial.
Usually, one speaks of the minimal polynomial if the leading coefficient is 1.
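The Cayley-Hamilton theorem can be verified numerically for a concrete matrix of our own choosing: substitute A into its characteristic polynomial and check that the zero matrix results.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

coeffs = np.poly(A)  # characteristic polynomial of A, highest degree first
n = A.shape[0]

# f_A(A) = A^n + c_1 A^(n-1) + ... + c_n I, built term by term.
fA_of_A = sum(c * np.linalg.matrix_power(A, n - k)
              for k, c in enumerate(coeffs))
print(np.allclose(fA_of_A, 0))  # True: f_A(A) is the zero matrix
```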
A square matrix is a matrix of size n by n, i.e., a matrix where the number of rows is equal to the number of columns.
If A is an m by n matrix, then the transpose A^T of A is the n by m matrix whose (r, s)-th entry is the (s, r)-th entry of A.
N stands for the set of natural numbers, i.e., the set {0, 1, 2, 3, 4, ...}.
Z stands for the set of integers, i.e., the set {..., -3, -2, -1, 0, 1, 2, 3, 4, ...}.
Q stands for the set of rational numbers, i.e., the set of numbers of the form a/b, with a and b integers and with b nonzero.
R stands for the set of real numbers.
C stands for the set of complex numbers. Complex numbers are of the form a + bi, where a and b are real numbers and where i^2 = -1.
Given the quadratic expression x^2 + bx + c, with b and c complex numbers say, the term completing the square refers to rewriting the expression as

(x + b/2)^2 - b^2/4 + c.

The terms x^2 and bx have been absorbed in a square.
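The identity follows by expanding: (x + b/2)^2 = x^2 + bx + b^2/4, so subtracting b^2/4 and adding c recovers the original expression. A quick numerical check with sample values of our own:

```python
def original(x, b, c):
    return x**2 + b*x + c

def completed(x, b, c):
    # (x + b/2)^2 - b^2/4 + c
    return (x + b/2)**2 - b**2/4 + c

# The two expressions agree for any inputs, e.g.:
print(original(1.5, 4.0, 7.0), completed(1.5, 4.0, 7.0))  # 15.25 15.25
```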
The Cartesian product of two sets A and B is the set A × B consisting precisely of all the ordered pairs (a, b) with a ∈ A and b ∈ B. The Cartesian product A_1 × A_2 × ··· × A_n of the sets A_1, A_2, ..., A_n is defined analogously. It consists of the ordered n-tuples (a_1, a_2, ..., a_n) of elements a_1 ∈ A_1, ..., a_n ∈ A_n.
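Python's `itertools.product` implements exactly this construction; the example sets are our own.

```python
from itertools import product

A = {1, 2}
B = {'x', 'y'}

# A x B: all ordered pairs (a, b) with a in A and b in B.
pairs = set(product(A, B))
print(sorted(pairs))  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]

# The n-fold product works the same way:
triples = set(product([0, 1], repeat=3))
print(len(triples))  # 8 ordered 3-tuples
```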
The complement A\B of the sets A and B is the set

A\B = {x | x belongs to A but not to B}.
The intersection of two sets V and W is defined as

V ∩ W = {x | x ∈ V and x ∈ W}.

The intersection of more than two sets is defined similarly.
A set is just a collection of things, usually called elements. If a is an element of the set A, we denote this by a ∈ A. The empty set is denoted by ø.
A set A is a subset of the set B if every element of A is also an element of B. If A is a subset of B, this is denoted by A ⊆ B.
If A is a subset of B, we also sometimes say that B is a superset of A. Notation: B ⊇ A.
The union of two sets V and W is defined as

V ∪ W = {x | x ∈ V or x ∈ W}.

The union of more than two sets is defined similarly.
If there exists a finite set of vectors that span a vector space V, then V is said to be finite-dimensional. The vector spaces we consider are finite-dimensional. A subset v_1, ..., v_n is a basis of V if every vector in V can be expressed in a unique way as a linear combination of these vectors.
An equivalent way of saying that v_1, ..., v_n is a basis is that
- v_1, ..., v_n span V,
- the only solution to the equation a_1 · v_1 + a_2 · v_2 + ··· + a_n · v_n = 0 in the unknowns a_1, ..., a_n is a_1 = ··· = a_n = 0. (The vectors v_1, ..., v_n are independent.)
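For n candidate vectors in R^n, spanning and independence both amount to invertibility of the matrix having the vectors as columns, which can be tested via a nonzero determinant; the sketch below uses NumPy and a hypothetical helper name.

```python
import numpy as np

def is_basis(vectors):
    # n vectors in R^n form a basis iff the matrix with them as
    # columns is invertible, i.e., has nonzero determinant.
    M = np.column_stack(vectors)
    return M.shape[0] == M.shape[1] and abs(np.linalg.det(M)) > 1e-12

print(is_basis([np.array([1.0, 0.0]), np.array([1.0, 1.0])]))  # True
print(is_basis([np.array([1.0, 2.0]), np.array([2.0, 4.0])]))  # False: dependent
```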
A linear combination of the vectors v_1, ..., v_m is a vector that can be written in the form

a_1 · v_1 + a_2 · v_2 + ··· + a_m · v_m,

for some scalars a_1, ..., a_m.
The dimension of a finite-dimensional vector space is the
number of elements in a basis of the vector space. This number is
independent of the basis chosen.
If V is a vector space and U and W are subspaces of V, then V is the direct sum of U and W if every vector v in V can be written in a unique way as v = u + w for some u in U and some w in W. If V is the direct sum of U and W, then this is often denoted by V = U ⊕ W.
Let V and W be vector spaces over the field K. A linear map (or linear transformation) f : V -> W is a map satisfying

f(a_1 v_1 + a_2 v_2) = a_1 f(v_1) + a_2 f(v_2)

for all vectors v_1, v_2 in V and for all scalars a_1, a_2 in K.
The kernel of the linear map f is the linear subspace (of V)

{v ∈ V | f(v) = 0}.

The image of f is the linear subspace f(V) of W.
Suppose f : V -> W is a linear map between finite-dimensional vector spaces, v_1, ..., v_n is a basis of V, and w_1, ..., w_m is a basis of W. Then the matrix of the linear map f with respect to these bases is the m by n matrix whose k-th column consists of the coordinates of f(v_k) with respect to the basis of W.
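With the standard bases of R^n and R^m, the coordinates of f(v_k) are just the entries of f applied to the k-th standard basis vector, so the matrix can be assembled column by column; the rotation map below is our own example.

```python
import numpy as np

def matrix_of(f, n):
    # Columns are the images of the standard basis vectors e_1, ..., e_n.
    return np.column_stack([f(e) for e in np.eye(n)])

# A sample linear map R^2 -> R^2: rotation of the plane by 90 degrees.
rot = lambda v: np.array([-v[1], v[0]])
M = matrix_of(rot, 2)
print(M)  # [[ 0. -1.], [ 1.  0.]]

# Applying the matrix agrees with applying the map:
print(np.allclose(M @ np.array([1.0, 2.0]), rot(np.array([1.0, 2.0]))))  # True
```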
Given a subset of a vector space V, its span is the set of all linear
combinations of vectors from the subset. The span is a subspace
of V, i.e., itself a vector space with the operations
from V.
A K-vector space consists of a set V with two operations: addition (+) and scalar multiplication (* or · or no symbol at all) with scalars from a field K. (The notion of a field is discussed extensively in Chapter 7, but for now think of the rational numbers Q, the real numbers R, or the complex numbers C.) The elements of the set V are called vectors. The two operations are required to satisfy the following axioms (for all vectors in V and all scalars; vectors are denoted in boldface).
- v + w = w + v (commutativity);
- (v + w) + u = v + (w + u) (associativity);
- there exists a (unique) zero vector 0 such that v + 0 = v;
- for every vector v there exists a (unique) vector -v (the inverse of v) such that v + (-v) = 0;
- a · (v + w) = a · v + a · w (distributivity);
- (a + b) · v = a · v + b · v (distributivity);
- a(b · v) = (ab) · v;
- 1 · v = v.
These axioms formalize that computations in a vector space behave as expected.
Examples of vector spaces: