Lecture: 3D transforms and projection. Hierarchical Modelling

This lecture examines the mathematics of the vertex transform stage of the pipeline, and introduces some new OpenGL functions that work with transforms.

Some of this lecture will be review material if you have studied matrices in a maths course previously. I've provided links to other pages that explain the same concepts in more detail, in case this is very new to you.

Matrix algebra

Linear algebra

Consider the problem:

Take a vector <a, b>.  Make a new vector <2a + 3b + 1, -a - 4b - 2>.

A C function could easily be written that performs this computation, but what if you need to do the same again for different values of multipliers and additions?

This is the kind of maths needed to transform vertices in video hardware. In order to simplify working with this kind of maths, programmers use matrices to completely describe an operation such as the one above.

Here's the corresponding matrix for the problem above:

[  2  3  1 ]
[ -1 -4 -2 ]
[  0  0  1 ]

To apply this transform to a vector <a, b>, we write the vector vertically to the right of the matrix (adding a 1 as the third element so it has the same number of rows as the matrix) and perform matrix multiplication:

[  2  3  1 ] [ a ]   [  2a +  3b +  1*1 ]
[ -1 -4 -2 ] [ b ] = [ -1a + -4b + -2*1 ]
[  0  0  1 ] [ 1 ]   [  0a +  0b +  1*1 ]
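As a sketch of why this is convenient, here is a small C function (the names mat3_apply and example are made up for illustration) that multiplies any 3x3 matrix, stored row by row as printed above, by the homogeneous vector [a, b, 1]. Changing the multipliers and offsets now means changing data rather than code.

```c
/* Multiply a 3x3 matrix, stored row by row exactly as printed above,
   by the homogeneous 2D vector [a, b, 1]; out receives all three rows. */
static void mat3_apply(const double m[3][3], double a, double b, double out[3])
{
    const double in[3] = { a, b, 1.0 };   /* the extra 1 picks up the offsets */
    for (int row = 0; row < 3; ++row) {
        out[row] = 0.0;
        for (int col = 0; col < 3; ++col)
            out[row] += m[row][col] * in[col];
    }
}

/* The matrix for <2a + 3b + 1, -a - 4b - 2> from the problem above. */
static const double example[3][3] = {
    {  2,  3,  1 },
    { -1, -4, -2 },
    {  0,  0,  1 },
};
```

For <a, b> = <1, 2>, calling mat3_apply(example, 1, 2, r) should fill r with 9, -11 and 1, i.e. 2*1 + 3*2 + 1 and -1*1 - 4*2 - 2, with the trailing 1 preserved by the bottom row.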

2D transformation

Only a 2x2 matrix is needed to transform a 2D vector:

[ a c ] [ x ]
[ b d ] [ y ]

However, with a 2x2 matrix the result depends only on the input vector: there is no way to add a constant offset (a translation), and we need offsets so that every kind of transformation can be handled the same way. So, usually a 3x3 matrix is used with a 2D vector (again, with a 1 filling out the extra value in the vector):

[ a d g ] [ x ]
[ b e h ] [ y ]
[ c f i ] [ 1 ]

Notice that g and h both get multiplied by 1, and so can be used as constant offsets for x and y.

3D transformation

Similarly, a 3D vector could be transformed with a 3x3 matrix, but the need for a constant offset means we use a 4x4 matrix:

[ a e i m ] [ x ]
[ b f j n ] [ y ]
[ c g k o ] [ z ]
[ d h l p ] [ 1 ]

In this case, m, n and o give the constant offset.
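The lettering above runs down the columns (a, b, c, d is the first column), which is the column-major order OpenGL uses when it passes matrices to and from your program. A minimal C sketch, assuming the 16 values are stored in exactly that alphabetical order, so m, n and o land at indices 12, 13 and 14 (mat4 and mat4_apply are illustrative names):

```c
/* A 4x4 matrix stored as 16 values in column-major order, i.e. the
   letters a..p from the layout above in alphabetical order:
   [0..3] = a b c d (first column), ..., [12..15] = m n o p. */
typedef double mat4[16];

/* out = M * [x, y, z, w]. Element (row, col) lives at m[col * 4 + row],
   so for w = 1 the entries m, n, o (indices 12..14) are added as offsets. */
static void mat4_apply(const mat4 m, const double in[4], double out[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0;
        for (int col = 0; col < 4; ++col)
            out[row] += m[col * 4 + row] * in[col];
    }
}
```

For instance, with m, n and o set to 5, 6 and 7 and everything else set to the identity, [1, 2, 3, 1] becomes [6, 8, 10, 1].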

Homogeneous coordinates

A vector written with an extra final component, like the 1 used above, is in homogeneous coordinates. When the final value is not 1 (and not 0), the entire vector is divided by the final value so that it becomes 1. The final component in both 2D and 3D homogeneous coordinates is often denoted w:

[ x ]   [ x / w ]
[ y ]   [ y / w ]
[ z ] = [ z / w ]
[ w ]   [   1   ]

The w coordinate is used to perform perspective division required by perspective projection transforms.

Note that setting w to 0 gives a vector of infinite length: it describes a point infinitely far away, i.e. a pure direction. This has very practical benefits. For example, remember that a directional light in OpenGL is specified with w = 0. Because w is 0, the translation part of the matrix is multiplied by zero and ignored, effectively removing any parallax from the vector.
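A small C sketch of both ideas, assuming vectors are stored as four doubles [x, y, z, w] (homogenize and translate_vec are hypothetical helper names): the first divides through by w, the second applies the translation matrix, where each offset gets multiplied by w and so disappears for a direction with w = 0.

```c
/* Bring a homogeneous 3D vector [x, y, z, w] back to the w = 1 form by
   dividing every component by w. Returns 0 (and leaves out untouched)
   when w is 0, because such a vector is a direction, not a position. */
static int homogenize(const double in[4], double out[4])
{
    if (in[3] == 0.0)
        return 0;
    for (int i = 0; i < 4; ++i)
        out[i] = in[i] / in[3];
    return 1;
}

/* Apply the translation matrix for (tx, ty, tz) to [x, y, z, w]:
   the offsets are scaled by w, so they vanish when w = 0. */
static void translate_vec(const double in[4], double tx, double ty, double tz,
                          double out[4])
{
    out[0] = in[0] + tx * in[3];
    out[1] = in[1] + ty * in[3];
    out[2] = in[2] + tz * in[3];
    out[3] = in[3];
}
```

So [2, 4, 6, 2] homogenizes to [1, 2, 3, 1], and translating the direction [0, 0, 1, 0] by (5, 5, 5) leaves it unchanged, while the point [0, 0, 1, 1] moves to [5, 5, 6, 1].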

Common 3D transforms

OpenGL provides functions (that you've already used) for constructing useful transform matrices.

Identity

The identity matrix is always:

[ 1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]

The result of multiplying this matrix by a vector is the input vector without any change. In OpenGL this matrix can be set with:

glLoadIdentity();

Translate

The translation matrix is:

[ 1 0 0 x ]
[ 0 1 0 y ]
[ 0 0 1 z ]
[ 0 0 0 1 ]

As described above, the input vector is translated by the offset amounts given in the translation matrix (if the vector's w coordinate is 1). In OpenGL a translation matrix can be constructed with:

glTranslatef(x, y, z);

Rotate

There are several useful rotation transforms that can be expressed with matrices. The simplest just rotate around the X, Y or Z axis. For example, a rotation around Z is:

[  c -s  0  0 ]
[  s  c  0  0 ]
[  0  0  1  0 ]  c = cos(angle)
[  0  0  0  1 ]  s = sin(angle)

OpenGL allows for a rotation around an arbitrary axis (see the glRotate man page for the matrix; it's pretty gnarly). The angle is given in degrees:

glRotatef(angle, x, y, z);

Scale

This transform scales along the X, Y and Z axes:

[ x 0 0 0 ]
[ 0 y 0 0 ]
[ 0 0 z 0 ]
[ 0 0 0 1 ]

In OpenGL:

glScalef(x, y, z);

Orthographic

The orthographic projection transform is:

[ 2/(r-l)  0         0        -(r+l)/(r-l) ]
[ 0        2/(t-b)   0        -(t+b)/(t-b) ]
[ 0        0        -2/(f-n)  -(f+n)/(f-n) ]
[ 0        0         0              1      ]

for the OpenGL function:

glOrtho(l, r, b, t, n, f);

Note that the matrix is just a scale combined with a translation.
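A C sketch of the orthographic matrix, following the glOrtho man page (note that the three offsets are negated: -(r+l)/(r-l), -(t+b)/(t-b) and -(f+n)/(f-n)). ortho_matrix and apply_point are illustrative names, and the matrix is stored column-major. Because the OpenGL camera looks down the negative z axis, the near plane is at z = -n and the far plane at z = -f, so the corners of the view box should land on -1 and +1.

```c
/* Column-major 4x4 orthographic matrix with the same parameters as
   glOrtho(l, r, b, t, n, f). Element (row, col) is at m[col * 4 + row]. */
static void ortho_matrix(double m[16], double l, double r, double b,
                         double t, double n, double f)
{
    for (int i = 0; i < 16; ++i)
        m[i] = 0.0;
    m[0]  =  2.0 / (r - l);          /* x scale */
    m[5]  =  2.0 / (t - b);          /* y scale */
    m[10] = -2.0 / (f - n);          /* z scale, which also flips z */
    m[12] = -(r + l) / (r - l);      /* x offset */
    m[13] = -(t + b) / (t - b);      /* y offset */
    m[14] = -(f + n) / (f - n);      /* z offset */
    m[15] =  1.0;
}

/* Apply a column-major matrix to the point (x, y, z, 1); w stays 1 here. */
static void apply_point(const double m[16], const double p[3], double out[3])
{
    for (int row = 0; row < 3; ++row)
        out[row] = m[row] * p[0] + m[4 + row] * p[1] + m[8 + row] * p[2] + m[12 + row];
}
```

With arguments (-2, 6, 0, 4, 1, 9), the corner (-2, 0, -1) should map to (-1, -1, -1) and the corner (6, 4, -9) to (1, 1, 1).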

Perspective

The gluPerspective(fovy, aspect, near, far) function constructs this matrix:

[ f/aspect  0             0                       0           ]
[    0      f             0                       0           ]
[    0      0   (far+near)/(near-far)   2*far*near/(near-far) ]
[    0      0             -1                      0           ]

f = cot(fovy/2) = 1/tan(fovy/2)   (fovy is the vertical field of view, in degrees)

Note the unusual (compared to the other transforms) bottom row [ 0 0 -1 0 ]: it sets the output w to -z, so the perspective divide by w makes more distant points appear smaller.

OpenGL also has the glFrustum function for creating perspective transforms, but this is less intuitive to use than gluPerspective.

LoadMatrix

OpenGL allows an arbitrary matrix to be loaded with glLoadMatrix. This lets you create transforms with shear and skew, or more unusual perspective projections. It can also be useful if you want to perform all your matrix operations in software.
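glLoadMatrixf (and glLoadMatrixd for doubles) expects 16 values in column-major order, which is easy to get wrong when typing a matrix in row by row. A small C sketch that builds a shear adding k * y to x; shear_x_by_y is an illustrative name, and the glLoadMatrixd call is only shown as a comment because it needs a current OpenGL context.

```c
/* Build a shear that adds k * y to x, stored column-major (the order that
   glLoadMatrixf/glLoadMatrixd expect). Row 0 of the matrix is [1 k 0 0],
   so k is element (row 0, col 1), which lives at index 1 * 4 + 0 = 4. */
static void shear_x_by_y(double m[16], double k)
{
    for (int i = 0; i < 16; ++i)
        m[i] = (i % 5 == 0) ? 1.0 : 0.0;   /* identity: indices 0, 5, 10, 15 */
    m[4] = k;
    /* With a current OpenGL context you could then call: glLoadMatrixd(m); */
}
```

With k = 2, the point (1, 3, 0) shears to x = 1 + 2*3 = 7, with y and z unchanged.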

Combining transforms

Consider a translation of a vector by <tx, ty, tz> followed by a scale of <sx, sy, sz>:

[ 1 0 0 tx ] [ x ]   [ x + tx ]
[ 0 1 0 ty ] [ y ]   [ y + ty ]
[ 0 0 1 tz ] [ z ] = [ z + tz ]
[ 0 0 0 1  ] [ 1 ]   [   1    ]

[ sx 0 0 0 ] [ x + tx ]   [ sx * (x + tx) ]
[ 0 sy 0 0 ] [ y + ty ]   [ sy * (y + ty) ]
[ 0 0 sz 0 ] [ z + tz ] = [ sz * (z + tz) ]
[ 0 0  0 1 ] [   1    ]   [      1        ]

We can perform these two transforms in just one "step" by multiplying the two transform matrices. For a review of matrix multiplication, see http://www.euclideanspace.com/maths/algebra/matrix/arithmetic/index.htm

Let's call the translation matrix T, the scale matrix S, and the input vector v. Then previously we computed:

S (T v) = v'

This is equivalent to:

(S T) v = v'

Written out in full:

[ sx 0 0 0 ] [ 1 0 0 tx ] [ x ]
[ 0 sy 0 0 ] [ 0 1 0 ty ] [ y ]
[ 0 0 sz 0 ] [ 0 0 1 tz ] [ z ]
[ 0 0  0 1 ] [ 0 0 0 1  ] [ 1 ]

  [ sx 0 0 sx * tx ] [ x ]
  [ 0 sy 0 sy * ty ] [ y ]
= [ 0 0 sz sz * tz ] [ z ]
  [ 0 0  0    1    ] [ 1 ]

  [ sx * (x + tx) ]
  [ sy * (y + ty) ]
= [ sz * (z + tz) ]
  [       1       ]
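A quick C check of this equivalence, as a sketch using column-major storage (mat4_mul, mat4_apply, translation and scaling are hypothetical helper names): transforming v by T and then by S gives the same result as building S T once and applying it to v. Multiplying the matrices first pays off when the same combined transform is applied to many vertices.

```c
/* Column-major 4x4 helpers: element (row, col) is at m[col * 4 + row]. */

/* out = a * b; out must not alias a or b. */
static void mat4_mul(const double a[16], const double b[16], double out[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
}

/* out = m * v */
static void mat4_apply(const double m[16], const double v[4], double out[4])
{
    for (int row = 0; row < 4; ++row)
        out[row] = m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3];
}

static void translation(double m[16], double tx, double ty, double tz)
{
    for (int i = 0; i < 16; ++i)
        m[i] = (i % 5 == 0) ? 1.0 : 0.0;
    m[12] = tx; m[13] = ty; m[14] = tz;
}

static void scaling(double m[16], double sx, double sy, double sz)
{
    for (int i = 0; i < 16; ++i)
        m[i] = 0.0;
    m[0] = sx; m[5] = sy; m[10] = sz; m[15] = 1.0;
}
```

With t = <1, 2, 3>, s = <2, 3, 4> and v = <1, 2, 3>, both routes should give <4, 12, 24, 1>, and the translation column of S T holds <2, 6, 12>, i.e. sx * tx, sy * ty, sz * tz.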

When you call functions like glScale, glTranslate and so on, OpenGL doesn't replace the current matrix; it postmultiplies it by your transform matrix (if C is the current matrix and M is your transform, the new current matrix is C M). There is also the glMultMatrix function for postmultiplying the current matrix by an arbitrary transform matrix.

The order of transforms

In lecture 3 we saw that the order of transformations in OpenGL affects the outcome. It's easy to see why this is the case when looking at the actual transform matrices.

For example, consider reversing the order of the scale and translate from the previous example. This time we will scale the vector then translate it. Where previously we computed S T v, this time we want to find T S v:

[ 1 0 0 tx ] [ sx 0 0 0 ] [ x ]
[ 0 1 0 ty ] [ 0 sy 0 0 ] [ y ]
[ 0 0 1 tz ] [ 0 0 sz 0 ] [ z ]
[ 0 0 0  1 ] [ 0 0  0 1 ] [ 1 ]

  [ sx 0 0 tx ] [ x ]
  [ 0 sy 0 ty ] [ y ]
= [ 0 0 sz tz ] [ z ]
  [ 0 0 0   1 ] [ 1 ]

  [ sx * x + tx ]
  [ sy * y + ty ]
= [ sz * z + tz ]
  [      1      ]

The order of matrix multiplication has resulted in a different answer, in which the translation is not affected by the scale (because the scale was applied to the vector first, before the translation).
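The same comparison in C, as a sketch with hypothetical helper names: build T S and S T from one translation and one scale and compare them. The scale entries agree; only the translation column differs.

```c
/* out = a * b for column-major 4x4 matrices; out must not alias a or b.
   Element (row, col) is stored at m[col * 4 + row]. */
static void mat4_mul(const double a[16], const double b[16], double out[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
}

/* Build T and S, then store both products so they can be compared. */
static void both_orders(double tx, double ty, double tz,
                        double sx, double sy, double sz,
                        double ts[16], double st[16])
{
    const double T[16] = { 1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  tx, ty, tz, 1 };
    const double S[16] = { sx, 0, 0, 0,  0, sy, 0, 0,  0, 0, sz, 0,  0, 0, 0, 1 };
    mat4_mul(T, S, ts);   /* T S: scale the vector first, then translate */
    mat4_mul(S, T, st);   /* S T: translate the vector first, then scale */
}
```

With t = <1, 2, 3> and s = <2, 3, 4>, T S keeps the translation <1, 2, 3>, while S T scales it to <2, 6, 12>.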

OpenGL always postmultiplies the matrix. So T S in OpenGL is:

glTranslatef(tx, ty, tz);
glScalef(sx, sy, sz);

Whereas S T is:

glScalef(sx, sy, sz);
glTranslatef(tx, ty, tz);
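To see why the call order matches the matrix order, here is a C sketch that emulates the current matrix with hypothetical functions (emu_load_identity, emu_translate, emu_scale and emu_mult_matrix are stand-ins, not OpenGL calls). Each call postmultiplies, so the first call ends up leftmost in the product and the last call is the first to act on the vertex.

```c
#include <string.h>

/* A hypothetical stand-in for OpenGL's current matrix, column-major. */
static double ctm[16];

static void emu_load_identity(void)
{
    for (int i = 0; i < 16; ++i)
        ctm[i] = (i % 5 == 0) ? 1.0 : 0.0;
}

/* Postmultiply: ctm = ctm * m, the way glMultMatrix (and the
   glTranslate/glScale family) update the current matrix. */
static void emu_mult_matrix(const double m[16])
{
    double out[16];
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k)
                sum += ctm[k * 4 + row] * m[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    memcpy(ctm, out, sizeof out);
}

static void emu_translate(double tx, double ty, double tz)
{
    const double t[16] = { 1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  tx, ty, tz, 1 };
    emu_mult_matrix(t);
}

static void emu_scale(double sx, double sy, double sz)
{
    const double s[16] = { sx, 0, 0, 0,  0, sy, 0, 0,  0, 0, sz, 0,  0, 0, 0, 1 };
    emu_mult_matrix(s);
}
```

Calling emu_translate then emu_scale leaves T S in ctm (translation column <tx, ty, tz>); swapping the two calls gives S T (translation column <sx * tx, sy * ty, sz * tz>).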

OpenGL allows "getting", i.e. reading, many state machine values, including the current value of the modelview matrix. This CTM.c program shows how, and prints out the matrix value for a combined transformation example.
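glGetFloatv(GL_MODELVIEW_MATRIX, m) fills an array of 16 floats in column-major order, so printing it naively row by row shows the matrix transposed. A C sketch of a printer that puts the translation in the right-hand column, as in the matrices above; format_row and print_matrix are illustrative names, and the glGetFloatv call itself needs a current OpenGL context, so it is left out.

```c
#include <stdio.h>

/* Format one row of a column-major 4x4 matrix, such as the 16 floats that
   glGetFloatv(GL_MODELVIEW_MATRIX, m) fills in. Row r is made of elements
   m[r], m[4 + r], m[8 + r], m[12 + r]. Returns snprintf's result. */
static int format_row(const float m[16], int row, char *buf, size_t size)
{
    return snprintf(buf, size, "[ %6.2f %6.2f %6.2f %6.2f ]\n",
                    m[row], m[4 + row], m[8 + row], m[12 + row]);
}

/* Print the whole matrix in the layout used throughout these notes. */
static void print_matrix(const float m[16])
{
    char line[64];
    for (int row = 0; row < 4; ++row) {
        format_row(m, row, line, sizeof line);
        fputs(line, stdout);
    }
}
```

For a matrix that only translates by x = 5, the first printed row reads "[   1.00   0.00   0.00   5.00 ]".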

Hierarchical Modelling

Hierarchical modelling is used to model hierarchical objects, e.g. humanoids and robots. These objects consist of (rigid) parts connected at joints. Rendering them involves traversing the hierarchy, during which transformations need to be combined, and at times isolated.

OpenGL's modelview matrix stack can be used to directly support hierarchical model rendering, by pushing and popping the matrix stack. In essence, pushing the matrix stack duplicates the top of the matrix stack, and thus creates a transformation state which can later be restored by popping the matrix stack an appropriate number of times.

We have seen that glLoadIdentity can be used to reset the matrix to the identity matrix. Often it's useful to be able to reset the matrix not back to the identity transform, but just to some earlier transform.

For example, consider the following sequence for drawing 4 wheels of a car:

Translate to car's position
Scale to car's scale
Translate to first wheel position
Rotate by first wheel amount
Draw wheel
Return to car's position/scale
Translate to second wheel position
Rotate by second wheel amount
Draw wheel
Return to car's position/scale
...

One way to "return to car's position/scale" would be to call glLoadIdentity and then replay all the transforms used to get to that point. That would be difficult and time-consuming to code, though.

OpenGL provides a stack of matrices that allow a transform matrix to be saved at any time and then restored.

glPushMatrix() creates a copy of the current transform matrix and pushes it onto the top of the matrix stack.

glPopMatrix() removes the matrix from the top of the matrix stack and uses it to replace the current transform matrix.
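OpenGL actually treats the top of the stack as the current matrix, which behaves the same as the description above. A C sketch of that model, assuming a fixed depth of 32 (MatrixStack and the stack_ functions are hypothetical, and they return 0 where OpenGL would record an overflow or underflow error and ignore the call):

```c
#include <string.h>

#define STACK_DEPTH 32   /* assumed depth, matching OpenGL's minimum guarantee */

/* Hypothetical matrix stack: the top entry is always the current matrix,
   so the stack is never empty. Matrices are 16 doubles, column-major. */
typedef struct {
    double m[STACK_DEPTH][16];
    int top;                          /* index of the current matrix */
} MatrixStack;

static void stack_init(MatrixStack *s)
{
    s->top = 0;
    for (int i = 0; i < 16; ++i)
        s->m[0][i] = (i % 5 == 0) ? 1.0 : 0.0;   /* start with the identity */
}

/* Like glPushMatrix: duplicate the top. Returns 0 on overflow. */
static int stack_push(MatrixStack *s)
{
    if (s->top + 1 >= STACK_DEPTH)
        return 0;
    memcpy(s->m[s->top + 1], s->m[s->top], sizeof s->m[0]);
    s->top++;
    return 1;
}

/* Like glPopMatrix: discard the top, so the saved copy underneath becomes
   current again. Returns 0 on underflow (only one matrix left). */
static int stack_pop(MatrixStack *s)
{
    if (s->top == 0)
        return 0;
    s->top--;
    return 1;
}

static double *stack_current(MatrixStack *s) { return s->m[s->top]; }
```

Changing the current matrix after stack_push and then calling stack_pop brings back the matrix as it was at the push; popping with only one matrix left is refused.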

So, the above example would be written:

Translate to car's position
Scale to car's scale

glPushMatrix();
    Translate to first wheel position
    Rotate by first wheel amount
    Draw wheel
glPopMatrix();

glPushMatrix();
    Translate to second wheel position
    Rotate by second wheel amount
    Draw wheel
glPopMatrix();

... (draw next wheel)

If we were drawing several cars, we could surround the whole block in a pair of glPushMatrix/glPopMatrix so that each car could have its own translation and scale:

glPushMatrix();
    Translate to car's position
    Scale to car's scale

    glPushMatrix();
        Translate to first wheel position
        Rotate by first wheel amount
        Draw wheel
    glPopMatrix();

    glPushMatrix();
        Translate to second wheel position
        Rotate by second wheel amount
        Draw wheel
    glPopMatrix();
    ... (draw next wheel)
glPopMatrix();

... (draw next car)
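A C sketch that runs the car pseudocode against a tiny emulated matrix stack (every function here is a hypothetical stand-in for the corresponding OpenGL call, with no overflow or underflow checks). Instead of drawing, draw_wheel records where the wheel's origin ends up. The wheel rotation is left out because a rotation about the wheel's own origin doesn't move that origin.

```c
#include <string.h>

/* Hypothetical stand-ins for the OpenGL calls: a column-major matrix
   stack whose top entry is the current matrix (no bounds checks). */
static double stack[32][16];
static int top = 0;

static void load_identity(void) { for (int i = 0; i < 16; ++i) stack[top][i] = (i % 5 == 0); }
static void push(void) { memcpy(stack[top + 1], stack[top], sizeof stack[0]); ++top; }
static void pop(void)  { --top; }

/* Postmultiply the current matrix by m, as glTranslate/glScale do. */
static void mult(const double m[16])
{
    double out[16];
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k)
                sum += stack[top][k * 4 + row] * m[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    memcpy(stack[top], out, sizeof out);
}

static void translate(double x, double y, double z)
{
    const double t[16] = { 1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  x, y, z, 1 };
    mult(t);
}

static void scale(double s)
{
    const double m[16] = { s, 0, 0, 0,  0, s, 0, 0,  0, 0, s, 0,  0, 0, 0, 1 };
    mult(m);
}

/* "Draw wheel": record where the wheel's origin lands, which is the
   translation column of the current matrix. */
static double wheel_pos[8][3];
static int wheel_count = 0;

static void draw_wheel(void)
{
    for (int i = 0; i < 3; ++i)
        wheel_pos[wheel_count][i] = stack[top][12 + i];
    ++wheel_count;
}

/* Wheel offsets in the car's own (model) coordinates; made-up values. */
static const double wheel_offset[4][3] = {
    { -1, 0, 2 }, { 1, 0, 2 }, { -1, 0, -2 }, { 1, 0, -2 }
};

static void draw_car(double x, double y, double z, double car_scale)
{
    push();                          /* isolate this car's transforms */
    translate(x, y, z);              /* car's position */
    scale(car_scale);                /* car's scale */
    for (int w = 0; w < 4; ++w) {
        push();                      /* remember car position/scale */
        translate(wheel_offset[w][0], wheel_offset[w][1], wheel_offset[w][2]);
        draw_wheel();
        pop();                       /* return to car position/scale */
    }
    pop();                           /* return to the transform before this car */
}
```

After load_identity(), draw_car(10, 0, 0, 2) should place its first wheel at (8, 0, 4): the offset (-1, 0, 2) is scaled by 2 and then moved to the car's position. Each car's pushes and pops are balanced, so the stack is back at its starting matrix afterwards.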

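OpenGL guarantees that the modelview matrix stack can hold at least 32 matrices, so you can nest at least 32 levels of glPushMatrix; the actual limit for your implementation can be queried with GL_MAX_MODELVIEW_STACK_DEPTH. Pushing past the limit causes a stack overflow (GL_STACK_OVERFLOW).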

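If you call glPopMatrix when the stack holds only one matrix (that is, you have popped more times than you pushed), you will cause a stack underflow. OpenGL ignores the offending call and records a GL_STACK_UNDERFLOW error, so everything drawn afterwards will probably use the wrong transform. It's difficult to find the cause of these errors, because by default OpenGL doesn't report errors like this: it records them silently until you ask.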

You can call glGetError() to get the code of an error recorded since the previous call. If the result is 0 (GL_NO_ERROR), no error has occurred since the last time glGetError was called:

GLenum error = glGetError();
if (error != GL_NO_ERROR)
    printf("OpenGL error %u\n", (unsigned)error);   /* the code must be passed to printf */

If you do get an error, you'll need to know the meaning of the error code. You could look this up in the header file or specification, but thankfully GLU also provides a function that will return a string for a given error code:

GLenum error = glGetError();
if (error != GL_NO_ERROR)
    printf("OpenGL error: %s\n", (const char *)gluErrorString(error));   /* gluErrorString returns const GLubyte * */

You should insert this code at the end of your display function to ensure any errors get caught and printed at least once per frame.

When you do get an error, it can be helpful to copy this code into earlier parts of display to narrow down exactly where the error is occurring.