The Developer’s Cry

Yet another blog by a hobbyist programmer

Setting up the OpenGL Projection Matrix

Last time we saw that modern OpenGL requires you to do the matrix math yourself. Moving, rotating, and scaling an object in 3D space are all done by matrix multiplication. The camera works in exactly the same way, but to get anything displayed on-screen we first need to properly initialize the projection matrix. Next we can set up the camera (the view matrix), and finally transform our objects (the model matrix). As we saw last time, this all comes together in the vertex shader as:

gl_Position = projection * view * model * input;
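To see what that line does, here is a minimal C++ sketch of matrix-times-vector and matrix-times-matrix for 4x4 matrices. The matrices are stored column-major, which is the layout OpenGL and glm use. The type aliases Mat4 and Vec4 and the helpers mul, translate, and scale are my own hypothetical names for illustration; in practice glm does all of this for you.

```cpp
#include <array>

// 4x4 matrix stored column-major, like OpenGL and glm:
// element (row, col) lives at index col * 4 + row.
using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Matrix times column vector: result[row] = sum over col of m(row, col) * v[col].
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{0.0f, 0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}

// Matrix product a * b, both column-major, result column-major.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // all zeros
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

// Hypothetical helper: translation matrix (the offset sits in the last column).
Mat4 translate(float x, float y, float z) {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    m[12] = x;
    m[13] = y;
    m[14] = z;
    return m;
}

// Hypothetical helper: uniform scaling matrix.
Mat4 scale(float s) {
    Mat4 m{};
    m[0] = m[5] = m[10] = s;
    m[15] = 1.0f;
    return m;
}
```

The vector sits on the right, so the matrix nearest to it is applied first. In the shader line above, the model matrix therefore acts first, then the view matrix, then the projection. Take p = (1, 0, 0, 1). The product mul(mul(translate(1, 0, 0), scale(2)), p) first scales p to (2, 0, 0) and then moves it to (3, 0, 0). Swapping the two matrices gives (4, 0, 0) instead.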

First off, what is a projection? A projection distorts a view so that it will fit onto the screen. Think of a movie projector: a light source illuminates a single frame of film, and a lens spreads the beam so that the image appears on a much larger movie screen. In other words, the projection transforms the frame to fit the screen. The frame coordinates are transformed to screen coordinates, but OpenGL does not work with screen coordinates directly; it uses normalized device coordinates. Normalized device coordinates run from -1 to 1 on every axis, regardless of both screen resolution and aspect ratio. (Strictly speaking, the projection matrix produces clip coordinates, and normalized device coordinates appear after the division by w that comes up with perspective projection.) In a sense, the projection contracts the image into this range, and OpenGL stretches it out again over the viewport so that the final result is a normal image on the screen.
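The stretching step at the end is the viewport transform, which the GPU performs for you. The sketch below shows how a point in normalized device coordinates lands in window coordinates. It assumes the viewport was set with glViewport(x0, y0, width, height) and that the default depth range of 0 to 1 is in effect. ndcToWindow is a hypothetical helper, not an OpenGL call.

```cpp
#include <array>

// Maps a point from normalized device coordinates (-1..1 on each axis)
// to window coordinates, as the viewport set by
// glViewport(x0, y0, width, height) does.
// Assumes the default depth range glDepthRange(0, 1).
std::array<float, 3> ndcToWindow(float xNdc, float yNdc, float zNdc,
                                 float x0, float y0,
                                 float width, float height) {
    const float xw = (xNdc + 1.0f) * 0.5f * width + x0;
    const float yw = (yNdc + 1.0f) * 0.5f * height + y0;
    const float zw = (zNdc + 1.0f) * 0.5f;  // -1..1 becomes 0..1
    return {xw, yw, zw};
}
```

For an 800x600 viewport, the NDC point (0, 0) lands at pixel (400, 300), the center. The x and y axes are scaled by different amounts here, so the square -1..1 range gets stretched unevenly. That is why the projection has to compensate for the aspect ratio up front.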

So to render an image, we must map eye space (the result of the model and view transforms) to normalized device coordinates. Let’s first examine how that works for an orthographic projection. An orthographic projection ignores perspective, and is typically used for 2D graphics and isometric views.

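Consider any point (x, y, z) in our viewing volume. Naturally x lies between left and right, y lies between bottom and top, and z lies between near and far. There is one catch: in OpenGL eye space the camera looks down the negative z-axis. Near and far are distances in front of the camera, so it is -z that lies between them. So we can write:

  left <= x <= right
bottom <= y <= top
  near <= -z <= far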

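Let’s start with the first inequality. Writing l for left and r for right, we can rearrange it step by step (subtract l, divide by r - l, multiply by 2, subtract 1) so that x will fall between -1 and 1.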

 0 <= x - l <= r - l
 0 <= (x - l) / (r - l) <= 1
 0 <= 2 ((x - l) / (r - l)) <= 2
-1 <= 2 ((x - l) / (r - l)) - 1 <= 1

The middle part rewrites to:

2 x / (r - l) - ((r + l) / (r - l))
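For anyone checking that step: put the -1 over the common denominator r - l, and the two l terms combine into a single -(r + l):

```latex
\frac{2(x - l)}{r - l} - 1
  = \frac{2x - 2l - (r - l)}{r - l}
  = \frac{2x - (r + l)}{r - l}
  = \frac{2}{r - l}\,x - \frac{r + l}{r - l}
```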

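In other words, the x scaling factor is 2 / (r - l), and the x offset is -(r + l) / (r - l). You can write this as a transformation matrix. It is written for column vectors, with the offset in the last column; OpenGL stores such a matrix column by column, which is what column-major means: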

w = r - l

2.0 / w    0    0    -(r + l) / w
0          1    0     0
0          0    1     0
0          0    0     1

If you write this out for all three axes x, y, and z, you get the full orthographic projection matrix (same notation):

w = r - l
h = t - b
d = f - n

2.0 / w    0          0          -(r + l) / w
0          2.0 / h    0          -(t + b) / h
0          0         -2.0 / d    -(f + n) / d
0          0          0          1

For 2D I like having a math-like coordinate system with the origin at the bottom left and the positive Y-axis pointing up, meaning that left and bottom will be zero. Many old-time programmers are used to having the origin at the top left with the positive Y-axis pointing down, as this corresponded to how display framebuffer memory was laid out. That was in the days before OpenGL, however, when computer graphics was more about plotting pixels than transforming vertices.
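Here is the orthographic matrix as a C++ sketch. The 16 values are stored column-major, the way glUniformMatrix4fv() expects them with transpose set to GL_FALSE, so the offsets land in elements 12, 13, and 14. The names Mat4, ortho, and transformPoint are hypothetical; glm::ortho() is the real-world equivalent. The two 2D setups described above differ only in the order of bottom and top.

```cpp
#include <array>

// Column-major 4x4 matrix: element (row, col) is stored at index col * 4 + row.
using Mat4 = std::array<float, 16>;

// Orthographic projection matrix from the text (same matrix as glOrtho()).
Mat4 ortho(float l, float r, float b, float t, float n, float f) {
    const float w = r - l;
    const float h = t - b;
    const float d = f - n;
    Mat4 m{};                  // start from all zeros
    m[0]  = 2.0f / w;          // row 0, col 0: x scale
    m[5]  = 2.0f / h;          // row 1, col 1: y scale
    m[10] = -2.0f / d;         // row 2, col 2: z scale (flips z)
    m[12] = -(r + l) / w;      // row 0, col 3: x offset
    m[13] = -(t + b) / h;      // row 1, col 3: y offset
    m[14] = -(f + n) / d;      // row 2, col 3: z offset
    m[15] = 1.0f;              // row 3, col 3
    return m;
}

// Applies m to the point (x, y, z, 1) and returns the resulting x, y, z.
// For an orthographic matrix w stays 1, so no divide is needed.
std::array<float, 3> transformPoint(const Mat4& m, float x, float y, float z) {
    std::array<float, 3> out{};
    for (int row = 0; row < 3; ++row)
        out[row] = m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row];
    return out;
}
```

With ortho(0, 800, 0, 600, -1, 1), the pixel (0, 0) maps to (-1, -1), the bottom left of the viewport. Swapping bottom and top, as in ortho(0, 800, 600, 0, -1, 1), maps (0, 0) to (-1, 1), the top left. That gives the old-school Y-down layout without any other changes.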

Things get a bit more complicated for 3D perspective projections, but the principle is the same. The projection acts like a virtual lens, distorting the image. A perspective projection creates the illusion of depth by showing distant objects smaller than objects that are close. This illusion is created by the viewing frustum, a truncated pyramid (trapezoid-shaped when seen from the side) that distorts space. Let me be clear about this: it actually pushes objects out of shape to create the illusion of depth on a flat display screen.

The well-known (but deprecated) gluPerspective() function takes a vertical viewing angle fovy, an aspect ratio, and the near and far distances. It uses simple trigonometry to find the values for left, right, bottom, and top:

radians = deg2rad(fovy) = fovy * Pi / 180.0
p = tan(radians * 0.5) * near
l = -p * aspect_ratio
r = p * aspect_ratio
b = -p
t = p
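The same steps as a C++ sketch. FrustumBounds and boundsFromFovy are hypothetical names. The function only reproduces the bounds computation described above, not the rest of gluPerspective().

```cpp
#include <cmath>

// Symmetric frustum bounds at the near plane.
struct FrustumBounds {
    float left, right, bottom, top;
};

FrustumBounds boundsFromFovy(float fovyDegrees, float aspectRatio, float zNear) {
    const float pi = 3.14159265358979f;
    const float radians = fovyDegrees * pi / 180.0f;
    // Half the vertical angle spans from the center of the near plane to its
    // top edge, so tan(half angle) = top / zNear.
    const float p = std::tan(radians * 0.5f) * zNear;
    return {-p * aspectRatio, p * aspectRatio, -p, p};
}
```

Take fovy = 90 degrees, an aspect ratio of 2, and the near plane at distance 1. The near plane then runs from -2 to 2 horizontally and from -1 to 1 vertically, because tan(45 degrees) = 1.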

Note how this conveniently puts the origin of the coordinate system in the center of the XY-plane of the viewing volume: the frustum is symmetric, with left = -right and bottom = -top. We will plug these values into our own version of glFrustum() to construct the perspective matrix, as follows.

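Perspective distortion is created by dividing by the distance along z: the farther away a point is, the larger the divisor, so distant objects come out smaller. A matrix multiplication alone cannot divide, so the bottom row of the matrix copies -z into the w coordinate. After the vertex shader, the GPU divides x, y, and z by w (the perspective divide). This crunches objects, creating the illusion of depth. When you work out the math, this results in the perspective projection matrix (same notation as before):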

w = r - l
h = t - b
d = f - n

2.0 * n / w    0              (r + l) / w    0
0              2.0 * n / h    (t + b) / h    0
0              0             -(f + n) / d   -2.0 * f * n / d
0              0             -1              0
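As a C++ sketch, here is that matrix together with the perspective divide the GPU performs afterwards. Mat4, frustum, and projectToNdc are hypothetical names; glm::frustum() is the glm equivalent of the matrix. The matrix is stored column-major, so element (row, col) sits at index col * 4 + row.

```cpp
#include <array>

// Column-major 4x4 matrix: element (row, col) at index col * 4 + row.
using Mat4 = std::array<float, 16>;

// Perspective projection matrix from the text (same matrix as glFrustum()).
Mat4 frustum(float l, float r, float b, float t, float n, float f) {
    const float w = r - l;
    const float h = t - b;
    const float d = f - n;
    Mat4 m{};                    // all zeros, including (3,3)
    m[0]  = 2.0f * n / w;        // row 0, col 0
    m[5]  = 2.0f * n / h;        // row 1, col 1
    m[8]  = (r + l) / w;         // row 0, col 2
    m[9]  = (t + b) / h;         // row 1, col 2
    m[10] = -(f + n) / d;        // row 2, col 2
    m[11] = -1.0f;               // row 3, col 2: copies -z into w
    m[14] = -2.0f * f * n / d;   // row 2, col 3
    return m;
}

// Multiplies m by the eye-space point (x, y, z, 1) to get clip coordinates,
// then divides by w to get normalized device coordinates.
std::array<float, 3> projectToNdc(const Mat4& m, float x, float y, float z) {
    std::array<float, 4> clip{};
    for (int row = 0; row < 4; ++row)
        clip[row] = m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row];
    return {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]};
}
```

Use frustum(-1, 1, -1, 1, 1, 10). The point (1, 0, -1) on the near plane lands at the right edge, x = 1. The point (1, 0, -2), twice as far away, lands at x = 0.5. The near plane itself maps to z = -1 and the far plane to z = 1.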

Once upon a time there was a convenient gluLookAt() function for setting up the camera. You pass it the eye position, a center position to aim the camera at, and an up vector. The up vector tells whether the camera is held upright or rolled onto its side.

To get the camera matrix, we will first calculate three vectors:

vec3 f = normalize(center - eye);
vec3 s = normalize(cross(f, up));
vec3 u = cross(s, f);

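Remember that the cross product yields a vector perpendicular to both of the given vectors. So what we did here is define the axes of the (possibly rotated) camera space: s points to the right, u points up, and f points forward, toward the center. The up vector you pass in does not have to be exactly perpendicular to f. That is why s is normalized and u is recomputed from s and f.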
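Here are those three vectors as a small, self-contained C++ sketch. The vector helpers are written out so nothing depends on a library. Vec3, CameraAxes, cameraAxes, and the helper functions are hypothetical names; glm provides normalize, cross, and dot for real.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Cross product: a vector perpendicular to both a and b.
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// The three camera axes from the text.
struct CameraAxes {
    Vec3 s;  // side: points to the camera's right
    Vec3 u;  // up, exactly perpendicular to f
    Vec3 f;  // forward: from the eye toward the center
};

CameraAxes cameraAxes(Vec3 eye, Vec3 center, Vec3 up) {
    const Vec3 f = normalize(center - eye);
    // up need not be perpendicular to f, so cross(f, up) is generally not
    // unit length; normalize it.
    const Vec3 s = normalize(cross(f, up));
    // s and f are perpendicular unit vectors, so this is already unit length.
    const Vec3 u = cross(s, f);
    return {s, u, f};
}
```

Take a camera at (0, 0, 5) looking at the origin with up = (0, 1, 0). This yields s = (1, 0, 0), u = (0, 1, 0), and f = (0, 0, -1), the default OpenGL orientation looking down the negative z-axis.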

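This results in the following view matrix. Remember that for unit vectors the dot product is the cosine of the angle between them (the adjacent side divided by the hypotenuse). When only one of the two is a unit vector, the dot product gives the length of the other projected onto it. The upper-left 3x3 part, built from s, u, and -f, holds the orientation of the camera. The dot products in the last column hold its position: they measure the eye position along each camera axis and negate it, which moves the eye to the origin.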

 s0    s1    s2   -dot(s, eye)
 u0    u1    u2   -dot(u, eye)
-f0   -f1   -f2    dot(f, eye)
 0     0     0     1
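Putting it together, here is a C++ sketch of the view matrix. It is stored column-major (element (row, col) at index col * 4 + row), so each of s, u, and -f is spread across a row of the array. The helper names are hypothetical; glm::lookAt() is what you would use in practice.

```cpp
#include <array>
#include <cmath>

// Column-major 4x4 matrix: element (row, col) at index col * 4 + row.
using Mat4 = std::array<float, 16>;

struct Vec3 {
    float x, y, z;
};

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// View matrix as in the text (the matrix gluLookAt() produced).
Mat4 lookAt(Vec3 eye, Vec3 center, Vec3 up) {
    const Vec3 f = normalize(center - eye);
    const Vec3 s = normalize(cross(f, up));
    const Vec3 u = cross(s, f);
    Mat4 m{};
    // Rows 0..2 of the rotation part are s, u, and -f.
    m[0] = s.x;  m[4] = s.y;  m[8]  = s.z;
    m[1] = u.x;  m[5] = u.y;  m[9]  = u.z;
    m[2] = -f.x; m[6] = -f.y; m[10] = -f.z;
    // Last column: the eye position measured along each camera axis, negated.
    m[12] = -dot(s, eye);
    m[13] = -dot(u, eye);
    m[14] = dot(f, eye);
    m[15] = 1.0f;
    return m;
}

// Applies m to the point (p, 1) and returns the transformed x, y, z.
std::array<float, 3> transformPoint(const Mat4& m, Vec3 p) {
    std::array<float, 3> out{};
    for (int row = 0; row < 3; ++row)
        out[row] = m[row] * p.x + m[4 + row] * p.y + m[8 + row] * p.z + m[12 + row];
    return out;
}
```

A handy sanity check for any view matrix has two parts. The eye itself must come out at the origin. The center point must come out straight ahead on the negative z-axis, at a distance |center - eye| from the camera.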

I have tried to explain how you get the projection and view matrices. Of course, you can ignore all this and just use the glm library; that’s what it’s for. However, it’s good to know how it actually works. My math skills aren’t as strong as they used to be, but in principle all of this can be done with high school math. It certainly took me a while to get all the pluses and minuses right. What can be confusing is that OpenGL conventionally uses a right-handed coordinate system for world and eye space. The projection matrix then flips the z coordinate (note the minus signs in its z rows), so normalized device coordinates are left-handed. That is why it is said that OpenGL uses a left-handed system internally. Anyway, more thorough derivations can be found at SongHo and Scratchapixel, both excellent sites that teach professional-grade computer graphics. It’s complex material, though, and I would rate them at university level. If you want to learn more, those are the places to look.