※ In order to facilitate visualization and understanding, the field where vectors and matrices are defined is limited to real numbers.
Prerequisites
To fully understand this post, we recommend that you have knowledge of the following topics.
- Vector projection
Background Knowledge
Vector Projection
We learned about vector projection in high school mathematics.
In order for the concept of projection to be valid, we need two vectors.
The following figure shows the relationship between vector a and vector b.
Figure 1. The component of one vector in the direction of another vector can be expressed using the cosine value.
For example, if we express vector →b using vector →a as shown in Figure 1, the component of →b in the direction of →a can be expressed as follows using the angle between the two vectors:
comp→a→b=|→b|cosθMeanwhile, the dot product of two vectors can be expressed as follows:
→a⋅→b=|→a||→b|cosθWe can also see that the value of |→b|cosθ can be expressed as follows:
|→b|cosθ=→a⋅→b|→a|=comp→a→bOn the other hand, comp→a→b is a scalar value. This can be seen from the fact that both |→b| and cosθ are scalar values.
If we want to obtain the vector value of the component of →b in the direction of →a, what should we do?
We can simply multiply comp→a→b by the unit vector in the direction of →a.
If we denote the vector projection of →b onto →a as proj→a→b, then its value is as follows. proj→a→b=comp→a→b⋅→a|→a|=|→b|cosθ⋅→a|→a|
On the other hand, it can be seen from equation (3) that:
⇒proj→a→b=→a⋅→b|→a|⋅→a|→a| =→a⋅→b|→a|2→a=→a⋅→b→a⋅→a→a
Figure 2. For one of two vectors, the component in the direction of another vector can be expressed using cosine values.
Gram-Schmidt Process
Consciousness of the Goal of the Gram-Schmidt Process
The Gram-Schmidt process essentially performs the following task when given linearly independent vectors:
“Let’s appropriately transform them to an orthogonal basis.”
This can be described using the following figure:
Figure 3. The Gram-Schmidt process is a process of transforming the two given vectors in the left figure to two orthogonal vectors as shown in the right figure.
Obtaining an orthogonal vector set has many advantages, but the most important is that compared to a non-orthogonal basis, obtaining an orthogonal basis has its own benefits.
Figure 4. Left: Vector space generated using a non-orthogonal basis. Right: Vector space generated using an orthogonal basis.
Let’s say two linearly independent vectors {v1,v2} are a basis for a 2-dimensional real space. In this case, the two vectors do not need to be orthogonal to each other.
Then any vector v∈R2 in the 2-dimensional real space can be expressed as a linear combination of v1 and v2. In other words, v can be represented using v1 and v2 as follows:
v=x1v1+x2v2 where x1,x2∈RHowever, if we assume that two vectors {w1,w2} form an orthonormal basis in a 2-dimensional real space, any arbitrary vector v∈R2 can be expressed as follows.
v=(v⋅w1)w1+(v⋅w2)w2With this, we can easily obtain coefficients to represent v as a linear combination of v1 and v2.
The process of Gram-Schmidt
Figure 4. Animation representing the process of Gram-Schmidt orthonormalization
Source: Wikipedia, Gram-Schmidt process
The process of Gram-Schmidt can be expressed mathematically as follows.
Given linearly independent vectors a1,⋯,ak in a vector space V, we can construct a orthonormal basis as follows.
u1=a1 u2=a2−proju1(a2) u3=a3−proju1(a3)−proju2(a3) ⋮ uk=ak−k−1∑j=1projuj(ak)The set of vectors generated in this way, {u1,u2,⋯,uk}, is orthogonal, and we can normalize each vector by dividing it by its norm.
That is,
qi=ui||ui||defines an orthonormal basis {q1,q2,⋯,qk} for the vector space V.
Although it may look a bit complicated in mathematical notation, the essence of the equation is as follows.
Figure 5. What the equation of Gram-Schmidt process tells you
To construct vectors that are orthogonal to each other, it is necessary to subtract each component that they occupy.
Once again, if you look at the picture, it is the same as Figure 6 below.
Figure 6. Basic principle of the Gram-Schmidt process
The Gram-Schmidt process allows us to iteratively find a new orthonormal basis vector for a given set of vectors.
QR decomposition
QR decomposition is a process of decomposing a matrix using orthonormal basis vectors found by the Gram-Schmidt process.
Let the orthonormal basis vectors obtained through the Gram-Schmidt process be denoted by q1,⋯,qn, and let the matrix that collects them be denoted by Q. Then the following holds:
A=QR [|| |a1a2⋯an|| |]=[|| |q1q2⋯qn|| |][a1⋅q1a2⋅q1⋯an⋅q1a1⋅q2a2⋅q2⋯an⋅q2⋮⋮⋱⋮a1⋅qna2⋅qn⋯an⋅qn]If we consider a1⋅q2, the value is 0 since q2 has removed all components of a1 and q1.
For the same reason, if i<j, then ai⋅qj=0. This is because qj has removed all components of ai where i<j.
Therefore, the following equation holds:
=[|| |q1q2⋯qn|| |][a1⋅q1a2⋅q1⋯an⋅q10a2⋅q2⋯an⋅q2⋮⋮⋱⋮00⋯an⋅qn]This is called QR decomposition.
Example Problem
It can be difficult to understand the explanation of QR decomposition in words alone, so let’s practice the Gram-Schmidt normalization process and QR decomposition through the example below.
Problem: Perform QR decomposition of the matrix A below.
A=[110101011]Let a1, a2, and a3 be the column vectors of matrix A, as follows:
a1=[110],a2=[101],a3=[011]For convenience, we will denote each column vector of A as a1=(1,1,0), etc.
To perform QR decomposition, let’s apply the Gram-Schmidt process to the three vectors.
Let’s denote the vectors obtained by orthogonalization but not normalization as u1, u2, etc., and the normalized orthonormal vectors as e1, e2, etc.
First, consider a1:
a1=(1,1,0)According to the Gram-Schmidt process, we can use the first vector as it is.
u1=a1=(1,1,0)Now let’s calculate u2. u2 is the vector obtained by subtracting the component of a2 in the direction of u1 from a2.
u2=a2−proju1a2 =(1,0,1)−(u1⋅a2u1⋅u1)u1 =(1,0,1)−1⋅1+1⋅0+0⋅112+12+02(1,1,0) =(12,−12,1)Let’s also calculate u3. u3 is the vector obtained by subtracting the component of a3 in the direction of u1 and u2 from a3.
u3=a3−proju1a3−proju2a3 =(0,1,1)−(u1⋅a3u1⋅u1)u1−(u1⋅a3u2⋅u2)u2 =(0,1,1)−(0+1+012+12+02)(1,1,0)−((−1/2+1)1/4+1/4+1)(12,−12,1) (0,1,1)−12(1,1,0)−13(12,−12,1) =(−23,23,23)To summarize, the vectors u1, u2, u3 are as follows:
u1=(1,1,0),u2=(12,−12,1),u3=(−23,23,23)Normalizing the above three vectors, we get:
e1=u1‖u1‖=1√2(1,1,0)=(1√2,1√2,0) e2=u2‖u2‖=√23(12,−12,1)=(1√6,−1√6,2√6) e3=u3‖u3‖=1√3⋅(2/3)2(−23,23,23)=(−1√3,1√3,1√3)Therefore, we can perform QR decomposition as follows, considering e1, e2, and e3 as corresponding to q1, q2, and q3 in A=QR.
A=QR=[1/√21/√6−1/√31/√2−1/√61/√302/√61/√3][2/√21/√21/√203/√61/√6002/√3]Here, Q is an orthonormal matrix and R is an upper triangular matrix.