# The MVP Matrix

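$\text{Model} * \text{View} * \text{Projection}$ is the first lesson in rendering objects in Computer Graphics (rendering here means the process that makes a 3D scene visible on a 2D screen). These three matrices transform a 3D object from object space to, in the end, a UV plane. The name gives the order in which they are applied; with column vectors, as glm uses, the product is written Projection * View * Model * point, so the Model matrix acts first.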

The Model matrix is easy to understand: it is simply translation, scale, and rotation. The View matrix and the Camera (projection) matrix are less obvious, although you can get each of them for free with a single call to glm::lookAt() and glm::perspective().
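As a rough illustration of "translation, scale and rotation", here is a minimal plain-Python sketch that composes a Model matrix and applies it to a point. The helper names (`mat_mul`, `transform`, `translation`, `scaling`, `rotation_z`) are made up for this example and are not glm functions; the sketch assumes column vectors, so the rightmost matrix acts first.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, p):
    """Apply a 4x4 matrix to a 3D point (w = 1) and return the 3D result."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return out[:3]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def rotation_z(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Model = T * R * S: the point is scaled, then rotated, then translated.
model = mat_mul(translation(5, 0, 0), mat_mul(rotation_z(90), scaling(2, 2, 2)))

# (1, 0, 0) -> scaled to (2, 0, 0) -> rotated to (0, 2, 0) -> moved to (5, 2, 0),
# up to floating-point rounding.
moved = transform(model, (1.0, 0.0, 0.0))
```

Swapping the order of the three factors generally gives a different result, which is why the composition order matters.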

# How does the View Matrix work?

The view matrix is also known as the extrinsic matrix in Computer Vision, where people use it to find where the camera is.

The engines don’t move the ship at all. The ship stays where it is and the engines move the universe around it.
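To make the quote concrete, here is a minimal sketch of how a view matrix could be built by hand. It assumes a right-handed convention in which the camera looks down its own negative z axis. The names `look_at` and `to_view` are illustrative helpers written for this example, not library calls. Each row of the matrix is one camera axis, and the last column is that axis dotted with the reverse translation.

```python
import math

def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [a[0] / n, a[1] / n, a[2] / n]

def look_at(eye, target, up):
    """Build a right-handed view matrix (camera looks down its own -z axis)."""
    d = normalize(sub(eye, target))   # direction axis: points from the target back to the eye
    r = normalize(cross(up, d))       # right axis
    u = cross(d, r)                   # true up axis, already unit length
    # Rotation rows combined with the reverse translation -eye.
    return [[r[0], r[1], r[2], -dot(r, eye)],
            [u[0], u[1], u[2], -dot(u, eye)],
            [d[0], d[1], d[2], -dot(d, eye)],
            [0.0, 0.0, 0.0, 1.0]]

def to_view(m, p):
    """Apply the view matrix to a world-space point (w = 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(3)]

# Camera moved 3 units to the left, still looking down -z.
view = look_at(eye=[-3.0, 0.0, 0.0], target=[-3.0, 0.0, -1.0], up=[0.0, 1.0, 0.0])
origin_seen = to_view(view, [0.0, 0.0, 0.0])   # the world origin, seen from the camera
eye_seen = to_view(view, [-3.0, 0.0, 0.0])     # the camera's own position
```

In this sketch the world origin ends up 3 units to the right of the camera, and the camera's own position maps to the origin: the ship stays put and the universe moves.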

This simply means that the view matrix does nothing but remap everything so that the origin moves from $(0,0,0)$ to the centre of the camera. In linear algebra terms it is a change of basis, and glm::lookAt() can generate the view matrix for you.

In the beginning the camera sits at $(0,0,0)$ and its look-at target is also $(0,0,0)$. Its up vector is $(0,1,0)$. Since a camera that looks at its own position has no defined viewing direction, let's assume the direction is $(0,0,-1)$. Now imagine the universe is a huge cube that surrounds us.

- If we want to move the camera to the left by $(-3, 0, 0)$, we can translate the cube by $(3,0,0)$.
- If we want to rotate the camera to the left by 30 degrees, we can rotate the cube by 30 degrees to the right.

So the naive (and inefficient) implementation is simply the inverse translation followed by the inverse rotation. For the rotation part there is a simpler way, based on the Gram-Schmidt process. The essence is, again, projection: to convert coordinates from one xyz coordinate system to our new coordinate system, we project onto the new axes with dot products.

The complete view matrix is:

$$M = \begin{bmatrix} R_x & R_y & R_z & 0 \\ U_x & U_y & U_z & 0 \\ D_x & D_y & D_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -T_x \\ 0 & 1 & 0 & -T_y \\ 0 & 0 & 1 & -T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$R$, $U$, $D$ are the axes of the new coordinate basis (the camera's right, up, and direction vectors) and $T$ is the camera position. The principle is simple: first reverse-translate the point, then project it onto the new coordinate system. In shorter form: $M = [R \mid t]$.

# The perspective projection

Perspective projection, on the other hand, is a way to project 3D scenes onto a 2D plane the way human eyes and cameras do, which means an object farther from us looks smaller than an object closer to us. It sounds natural, but how does the computer implement it? That's where the camera matrix comes in.

## Camera matrix

To project objects the way our eyes see them, we need a formula that makes farther objects smaller.
Given two points $[x_1, y_1, z_1]$ and $[x_2, y_2, z_2]$, they project to the same position if $x_1 / z_1 = x_2 / z_2$ and $y_1 / z_1 = y_2 / z_2$. The projection maps $[x, y, z]$ to $[d\frac{x}{z}, d\frac{y}{z}]$, where $d$ is the distance from the camera to the image plane.
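The rule above fits in a few lines. This is only a sketch: `project` is a hypothetical helper, and it uses the unsigned form $d\,x/z$ with the point in front of the camera at positive $z$.

```python
def project(point, d):
    """Project a camera-space point onto the image plane at distance d."""
    x, y, z = point
    if z == 0:
        raise ValueError("a point with z == 0 cannot be projected")
    return (d * x / z, d * y / z)

near = project((1.0, 2.0, 4.0), d=1.0)    # (0.25, 0.5)
far = project((2.0, 4.0, 8.0), d=1.0)     # same x/z and y/z, so also (0.25, 0.5)
small = project((1.0, 2.0, 8.0), d=1.0)   # same offset but twice as far: (0.125, 0.25)
```

The first two points lie on the same ray through the camera and land on the same spot, while the third shows an equally sized offset shrinking with distance.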

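Since no linear transform (a $3 \times 3$ matrix) can perform this division by $z$, we have to use homogeneous coordinates. Assuming the camera looks down the negative $z$ axis, so that the image plane sits at $z = -d$ (which is why the signs below are flipped relative to $d\frac{x}{z}$):

$$\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & -1/d & 0 \end{bmatrix} \begin{bmatrix} x\\ y\\ z\\ 1 \end{bmatrix}= \begin{bmatrix} x\\ y\\ z\\ -z/d \end{bmatrix}$$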

Since these are homogeneous coordinates, we rescale the vector so that its last element becomes 1, then keep only the first two components:

$$\begin{bmatrix} x\\ y\\ z\\ -z/d \end{bmatrix} \rightarrow \begin{bmatrix} -d\frac{x}{z}\\ -d\frac{y}{z}\\ -d\\ 1 \end{bmatrix} \rightarrow \begin{bmatrix} -d\frac{x}{z}\\ -d\frac{y}{z} \end{bmatrix}$$

Because homogeneous coordinates are only defined up to scale, we can multiply the whole projection matrix by $-d$ (replacing each $1$ on the diagonal with $-d$ and the $-1/d$ entry with $1$) and reach the same goal.

$$\begin{bmatrix} -d & 0 & 0 & 0\\ 0 & -d & 0 & 0\\ 0 & 0 & -d & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x\\ y\\ z\\ 1 \end{bmatrix} \rightarrow \begin{bmatrix} -dx\\ -dy\\ -dz\\ z \end{bmatrix} \rightarrow \begin{bmatrix} -d\frac{x}{z}\\ -d\frac{y}{z} \end{bmatrix}$$
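The same computation can be checked numerically. The sketch below builds the scaled matrix, multiplies it with a homogeneous point, and performs the divide by the last component. `projection_matrix` and `project_homogeneous` are names invented for this example, and the point is assumed to be in front of a camera that looks down $-z$.

```python
def projection_matrix(d):
    """Scaled projection matrix with -d on the diagonal (image plane at z = -d)."""
    return [[-d, 0, 0, 0],
            [0, -d, 0, 0],
            [0, 0, -d, 0],
            [0, 0, 1, 0]]

def project_homogeneous(m, point):
    x, y, z = point
    v = [x, y, z, 1.0]
    hx, hy, hz, w = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    # Perspective divide: rescale so the last element becomes 1.
    return (hx / w, hy / w, hz / w)

# d = 2 and a point at z = -4: the homogeneous result is (-2, -6, 8, -4).
px, py, pz = project_homogeneous(projection_matrix(2.0), (1.0, 3.0, -4.0))
# After the divide: (0.5, 1.5, -2.0), i.e. (-d*x/z, -d*y/z, -d).
```

The third component always comes out as $-d$, which confirms that every projected point lands on the image plane.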

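Finally, the camera matrix looks like this:

$$\begin{bmatrix} -fs_x & 0 & x_c\\ 0 & -fs_y & y_c\\ 0 & 0 & 1 \end{bmatrix}$$

It is a little more complex than what we derived. Here $f$ is usually read as the focal length (playing the role of $d$), $s_x$ and $s_y$ as the scale from image-plane units to pixels, and $(x_c, y_c)$ as the image centre. The general idea stays the same: after dividing by $z$, a point lands at $(-fs_x\frac{x}{z} + x_c, -fs_y\frac{y}{z} + y_c)$.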
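As a quick sanity check, the camera matrix can be applied to a camera-space point and divided by the last component to get pixel coordinates. This sketch reads the symbols as focal length, pixel scale, and image centre, which is an assumption. All numbers are hypothetical, and `camera_matrix` and `to_pixel` are illustrative helpers.

```python
def camera_matrix(f, s_x, s_y, x_c, y_c):
    """3x3 camera matrix in the form shown above."""
    return [[-f * s_x, 0.0, x_c],
            [0.0, -f * s_y, y_c],
            [0.0, 0.0, 1.0]]

def to_pixel(k, point):
    """Multiply by the camera matrix, then divide by the last component (z)."""
    h = [sum(k[i][j] * point[j] for j in range(3)) for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])

# Hypothetical values: f = 2, 100 pixels per unit, image centre at (320, 240).
k = camera_matrix(f=2.0, s_x=100.0, s_y=100.0, x_c=320.0, y_c=240.0)

# Point in front of a camera looking down -z.
# u = -200 * 1.0 / -4 + 320 = 370, v = -200 * 0.5 / -4 + 240 = 265
u, v = to_pixel(k, (1.0, 0.5, -4.0))
```

Compared with the plain $-d\frac{x}{z}$ projection, the only additions are the per-axis pixel scale and the shift to the image centre.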