Why does OpenGL use 4D matrices for everything?

Question

What I managed to figure out is that the upper-left 3x3 block is used for rotation and scale and the 3 entries in the last column are used for position, but what is the bottom row used for?
Is it only for clipping-related things?

pmw1234 · Answer

So,
A lot of folks would tell you that a 4x4 matrix is used so you can get a translation component rolled into your linear transformation, meaning that with a 3x3 matrix you can only compute rotations, scales, and shears (and some would argue that a shear is really another type of rotation). These same people might also go into great detail about affine transformations and homogeneous coordinate systems. And they would be right.
But that's not the reason.
All this might leave the astute observer thinking: couldn't we use a 3x3 matrix to get the linear transformation out of the way, then just use vector math to add the translation on at the end, saving lots of special math, hardware, and precious resources?
Well, this is perfectly plausible, and a little extra thinking would leave us with the idea of adding a 4th column/row to our matrix, giving a 3x4 (or 4x3) matrix. Now a translation fits right in, and this would create a pretty world where you can do all the operations needed to move and deform objects all day long, saving lots of math and resources. And you would be right. You can in fact do this very thing when computing your Model and View matrices. Handling direction vectors (as opposed to points) would require a little extra trickery, but we could manage it.
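As a rough illustration of the translation argument, here is a minimal sketch in plain Python. The helper names `mat_vec` and `translation` are made up for this example, and it assumes the column-vector convention where the offset sits in the last column. It shows how the fourth component separates points (w = 1) from directions (w = 0) without any special-case code.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Build a 4x4 matrix that translates by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],  # bottom row: passes w through unchanged
    ]


T = translation(5.0, 0.0, -2.0)
point = [1.0, 2.0, 3.0, 1.0]      # w = 1: a position, so the offset is added
direction = [1.0, 2.0, 3.0, 0.0]  # w = 0: a direction, so the offset is ignored

moved_point = mat_vec(T, point)
moved_direction = mat_vec(T, direction)

print(moved_point)      # [6.0, 2.0, 1.0, 1.0]
print(moved_direction)  # [1.0, 2.0, 3.0, 0.0]
```

With a 3x4 matrix the same result is possible, but the caller has to remember which inputs are points and which are directions and treat them differently.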
So now, on this theoretical video card (since all of this has to be implemented in hardware), we have gotten our world's models, meshes, blobs, whatnots, and whathaveyous all into place and are ready to project them from our 3D world onto our 2D screen. And we think: this isn't a problem either, since a projection is really just a non-linear scaling operation that squeezes things around so the stuff we want to see falls inside our clipping volume.
A little bit of algebra later...
We have mountains of triangles, all of which are ready to have their attributes computed and their interpolations... well, interpolated. We then use standard linear interpolation, check our work very carefully, and what we get on our video monitor looks like some kind of Star Trek space anomaly got hold of it.
This is where the problem comes up: it turns out that you can't just interpolate values directly after performing a perspective projection. Instead you must use a special form of interpolation called... wait for it... perspective-correct interpolation. And the solution to perspective-correct interpolation leads us to the correct form of the projection matrix, which has 4 rows and 4 columns. That matrix, after being applied to a 4-component position vector, leaves a nonzero value in the w component of the result. This value is specific to each vertex and depends on its depth in front of the camera. And, at long last, the video card hardware can use it to interpolate all the values for our triangles, including colors, texture coordinates, normal vectors, etc. When that is all done, the hardware can perform the "perspective divide", resulting in correctly projected triangles along with correctly interpolated attributes.
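To make the difference concrete, here is a small sketch in plain Python of the usual textbook formulation: interpolate attribute/w and 1/w linearly in screen space, then divide. The function names and the sample numbers are made up for illustration, and this is not a description of how any particular GPU implements it.

```python
def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return (1.0 - t) * a + t * b


def naive_interpolate(u0, u1, t):
    """Interpolate the attribute directly in screen space (wrong under perspective)."""
    return lerp(u0, u1, t)


def perspective_correct_interpolate(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w linearly in screen space, then divide."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w


# An edge whose first vertex is close to the camera (w = 1) and whose second
# vertex is four times farther away (w = 4). The texture coordinate u runs 0..1.
u0, w0 = 0.0, 1.0
u1, w1 = 1.0, 4.0
t = 0.5  # halfway along the edge as seen on screen

naive = naive_interpolate(u0, u1, t)
correct = perspective_correct_interpolate(u0, w0, u1, w1, t)

print(naive)    # 0.5
print(correct)  # 0.2
```

The screen-space midpoint of this edge only covers the first fifth of the texture, because the near part of the edge takes up most of the screen. The naive version ignores that, which is where the warped textures come from.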
And that is the real reason everyone everywhere uses 4x4 matrices in hardware, and why they are littered throughout APIs like OpenGL. Hardware vendors could probably hack their way around some of the math related to translations and transformations, but when it comes to projection, you really can't get away from the 4x4 matrix (though you can sure optimize the heck out of it).
The math for deriving the correct form of a projection isn't too terribly complicated, but the full explanation is fairly long, so I will defer to a good book for that. The best explanation I have seen for deriving a projection matrix that allows perspective-correct interpolation is chapter 6, section 6.2 of the Lengyel book, "Foundations of Game Engine Development".

joojaa · Answer

If we put homogeneous coordinates on the back burner for a second, there is also a second reason:

Mathematical completeness. A 3 by 4 or 4 by 3 matrix is not invertible (although that does not mean you couldn't calculate an inverse by other means, just not by standard matrix algebra). The inverse is critical for hierarchy calculations.
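As a sketch of what a hierarchy calculation can look like, the plain-Python example below uses a generic Gauss-Jordan inverse to recover a child's local transform from its world transform. The function names and the sample parent and child matrices are made up for illustration. Nothing here treats the matrices specially: a square 4x4 simply goes through the standard algorithm.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


# Parent: uniform scale by 2, then move 10 units along x.
parent_world = [
    [2.0, 0.0, 0.0, 10.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Child: offset by (1, 2, 3) relative to its parent.
child_local = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]

child_world = mat_mul(parent_world, child_local)
# Re-parenting and similar operations need the opposite direction:
recovered_local = mat_mul(invert(parent_world), child_world)
```

Here `recovered_local` matches `child_local` up to floating-point rounding, which is the round trip that scene-graph operations such as re-parenting rely on.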

Now you could posit that the last row is always the same and be done with it. But having worked for a company whose main software did just this, I would say that it is a mistake. Sure, the matrices are now invertible, but they are not general. I can no longer safely transpose my matrices, for example. The truth is that I lose out on a lot of mathematical knowledge I could otherwise use because this one part is missing. In the end this was a common source of errors, anger, and confusion among clients.

Homogeneous coordinates are just one possible application of this mathematical completeness. They turn out to be very useful if you want to model perspective, but that is far from the only benefit.

user1118321 · Answer

The bottom row allows you to create perspective foreshortening. That is, it makes parallel lines that recede into the distance appear to converge. When the matrix is arranged this way, we call it a perspective projection matrix.

There are other ways to arrange a projection matrix where that foreshortening doesn't happen, for example an orthographic projection. This graphic shows some different projections that are possible by changing the values in the last row of a 4x4 matrix.

Mathematically, the difference between a perspective projection and an orthographic projection is in the last row. A perspective matrix has a nonzero entry in the z position of that row, which copies the depth into w so that the later divide by w divides by depth. An orthographic matrix has a 0 there, so w stays 1 and the divide changes nothing.
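A small sketch in plain Python may help show the effect of that last row. The matrices below follow the symmetric forms documented for `gluPerspective` and `glOrtho`, simplified to an aspect ratio of 1. The helper names and the `focal` parameter (standing in for the cotangent of half the field of view) are assumptions made for this example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def perspective(focal, near, far):
    """Symmetric perspective matrix with aspect ratio 1 (OpenGL convention)."""
    return [
        [focal, 0.0, 0.0, 0.0],
        [0.0, focal, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # bottom row: w becomes -z (distance in front of the eye)
    ]


def orthographic(half_width, half_height, near, far):
    """Symmetric orthographic matrix (OpenGL convention)."""
    return [
        [1.0 / half_width, 0.0, 0.0, 0.0],
        [0.0, 1.0 / half_height, 0.0, 0.0],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],  # bottom row: w stays 1
    ]


def to_ndc(clip):
    """Perspective divide: clip space to normalized device coordinates."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]


# Two points with the same x, one twice as far from the camera as the other.
near_point = [1.0, 0.0, -2.0, 1.0]
far_point = [1.0, 0.0, -4.0, 1.0]

P = perspective(1.0, 1.0, 10.0)
O = orthographic(2.0, 2.0, 1.0, 10.0)

persp_near_x = to_ndc(mat_vec(P, near_point))[0]
persp_far_x = to_ndc(mat_vec(P, far_point))[0]
ortho_near_x = to_ndc(mat_vec(O, near_point))[0]
ortho_far_x = to_ndc(mat_vec(O, far_point))[0]

print(persp_near_x, persp_far_x)  # 0.5 0.25
print(ortho_near_x, ortho_far_x)  # 0.5 0.5
```

Under the perspective matrix the farther point lands closer to the center of the screen, which is the foreshortening. Under the orthographic matrix both points land at the same x.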
