Rasterization in One Weekend

Welcome to the mini-series Rasterization in One Weekend!

Having lately gotten tired of (well, not really) chasing instructions and signals inside GPUs, I set out to do a small project. As I recently got into writing another SW-based rasterizer, one that uses modern techniques efficiently rather than pretending it’s ’65, I wanted to add some basic, beginner-friendly code, information and documents here that you can use to make your very first rasterizer. Along the way, you may realize that the fundamentals of the math behind these very powerful machines can be understood without too much difficulty after all.

Here, we’ll closely follow a method called “Triangle Scan Conversion using 2D Homogeneous Coordinates” (from one of my all-time favorite papers). Note that we strictly restrict ourselves to rasterizing triangles; different algorithms can be used for other primitive types such as points and lines.

Remember also that, although what lies below can provide good hints about how GPUs actually work, it’s all immensely simplified. The path from ‘1 static object with 12 triangles’ to ‘100,000 animated objects with 10 million triangles’ on-screen is a long and tedious one. And unlike its rival, ray tracing, rasterization is not the kind of algorithm whose parallelism would embarrass you. That’s why actual implementations employ many, many smart, difficult and expensive-to-implement algorithms that took very smart people years of R&D, sweat and tears to develop, in order to provide the required rendering capabilities. With that said, I hope that by the end I’ll at least have helped you scratch the surface a little bit! So dive in:

All code can be found here.


Why Clip Transformed Vertices in Clip-Space?

As a noob in rendering pipelines and computer graphics in general, one thing that puzzled me a lot, and that I had simply accepted as-is without truly grasping it, was that in a 3D rendering pipeline, transformed vertices are clipped in yet another space, oddly named “clip-space”, before perspective division is done. Why the hell?!

As you may know, GPUs invoke a user-defined program, called a vertex shader, for each vertex of a given primitive so that the primitive is transformed from whatever space it was defined or modeled in to clip-space. The result of each invocation is a 4-element vector in homogeneous coordinates, which is used during the clip test: only those primitives (more precisely their vertices, but anyway) that survive the clip test are passed down the pipeline, and the rest are clipped, i.e. removed from the pipeline. GPUs do clipping not only to save performance and bandwidth on primitives that would otherwise waste compute because they lie outside the viewport, but also to guarantee that rasterizers, poor guys already dealing with lots of corner cases, work on vertices within a well-defined distance from the eye/camera/viewpoint, bounded by the viewport, rather than on arbitrarily positioned vertices (think of numerical precision). That is clipping in a nutshell; however, it might not be immediately obvious to you, as it was not to me, why we invent yet another confusing space just to apply clipping.
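
To make this a bit more concrete, here is a minimal Python sketch of a toy “vertex shader” and a per-vertex clip test. It assumes an OpenGL-style perspective matrix (right-handed eye space looking down -z, clip volume -w ≤ x, y, z ≤ w) and that the model-view transform has already been applied, so input positions are in eye space. The names `perspective`, `vertex_shader` and `inside_clip_volume` are hypothetical helpers, not a real API, and real hardware tests whole primitives rather than lone vertices.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style (gluPerspective-like) projection matrix, row-major.
    Assumes a right-handed eye space with the camera looking down -z."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # w_clip = -z_eye: points behind the eye get w < 0
    ]

def mat_vec(m, v):
    """4x4 matrix times a 4-element column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def vertex_shader(proj, eye_pos):
    """Toy 'vertex shader': eye-space xyz -> 4-element clip-space vector."""
    x, y, z = eye_pos
    return mat_vec(proj, (x, y, z, 1.0))

def inside_clip_volume(v):
    """Per-vertex clip test, OpenGL convention: -w <= x, y, z <= w."""
    x, y, z, w = v
    return all(-w <= c <= w for c in (x, y, z))

proj = perspective(90.0, 1.0, 1.0, 10.0)
in_front = vertex_shader(proj, (0.5, 0.5, -5.0))  # 5 units in front of the eye
behind = vertex_shader(proj, (0.5, 0.5, 5.0))     # 5 units behind the eye
print(in_front[3], inside_clip_volume(in_front))  # 5.0 True
print(behind[3], inside_clip_volume(behind))      # -5.0 False
```

Note that -w ≤ w can only hold when w ≥ 0, so a vertex behind the eye (w < 0 here) fails the test before any division happens.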

If you think about it, there are 3 candidate locations in the whole 3D pipeline where we could handle the clipping:

  1. Before the perspective/orthographic/what-have-you projection, using the planes of the view frustum
  2. In 4D clip-space, before perspective division
  3. In 3D space after perspective division, i.e. working in NDC

The first doesn’t seem very suitable to me, as the calculations would be tied to how the camera in a 3D application is modeled (Is it perspective? Is it orthographic?). How about you?

The second? Hmm, let’s set it aside for a second.

The third looks like a good candidate: we are already done projecting vertices and dividing by w to get the good old feeling of three-dimensional perspective, and we’d be able to work within the view-volume cuboid, handling degenerate cases like NaNs/Infs nice and easy. Let’s see what happens to vertices behind the eye/camera/viewpoint:


What is going on in this picture is this: the point behind the origin, Q2, which has a negative w value, is simply transformed to projection-space by the vertex shader and then perspective-divided, and it ends up projected to a point in front of the viewer as if it were visible, which is wrong. What we want is to detect such points with negative w values before applying perspective division. Unless we do something to preserve the fact that this point had w < 0 and was behind the camera, we lose that information: after perspective division, it will have been projected to a point in front of the eye.
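
A tiny sketch of why the sign of w gets lost: dividing a homogeneous vector v and its negation -v by their own w yields exactly the same 3D point. The clip-space values below are made up for illustration, and `perspective_divide` and `inside_clip_volume` are hypothetical helpers assuming the OpenGL clip-volume convention.

```python
def perspective_divide(v):
    """Clip-space (x, y, z, w) -> NDC (x/w, y/w, z/w)."""
    x, y, z, w = v
    return (x / w, y / w, z / w)

def inside_clip_volume(v):
    """OpenGL-style clip test: -w <= x, y, z <= w (only satisfiable if w >= 0)."""
    x, y, z, w = v
    return all(-w <= c <= w for c in (x, y, z))

q1 = (0.5, 0.5, 3.0, 4.0)   # hypothetical visible point, w > 0
q2 = tuple(-c for c in q1)  # same point negated: w < 0, i.e. "behind" the eye

print(perspective_divide(q1))                          # (0.125, 0.125, 0.75)
print(perspective_divide(q2))                          # (0.125, 0.125, 0.75)
print(inside_clip_volume(q1), inside_clip_volume(q2))  # True False
```

In NDC the two points are indistinguishable, whereas the test done in clip-space, before the division, still tells them apart.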

Applying clipping in 4D clip-space not only simplifies the inside/outside tests against the clip planes a lot, it also lets us clip primitives behind the camera rather easily. And if you think it’s rare for vertices to be behind the camera in a 3D application, think again!
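
Here is a minimal sketch of what clipping primitives behind the camera can look like: a single Sutherland-Hodgman pass that clips a polygon against the near plane directly in 4D clip-space. It assumes the OpenGL convention, where the near plane is z = -w, so a vertex’s signed distance to it is d = z + w; `lerp4` and `clip_near` are hypothetical names. Because clip-space is still linear (nothing has been divided by w yet), the intersection point is found by plain linear interpolation of all four components.

```python
def lerp4(a, b, t):
    """Linear interpolation between two clip-space vertices."""
    return tuple(a[i] + t * (b[i] - a[i]) for i in range(4))

def clip_near(poly):
    """One Sutherland-Hodgman pass against the near plane z = -w (OpenGL-style).
    Keeps the part of the polygon where d = z + w >= 0."""
    out = []
    n = len(poly)
    for i in range(n):
        cur, nxt = poly[i], poly[(i + 1) % n]
        d_cur = cur[2] + cur[3]  # signed distance of current vertex
        d_nxt = nxt[2] + nxt[3]  # signed distance of next vertex
        if d_cur >= 0:           # current vertex is on the visible side: keep it
            out.append(cur)
        if d_cur * d_nxt < 0:    # edge strictly crosses the plane: add intersection
            t = d_cur / (d_cur - d_nxt)
            out.append(lerp4(cur, nxt, t))
    return out

tri = [
    (0.0, 0.0, 0.5, 1.0),    # in front of the near plane
    (1.0, 0.0, 0.5, 1.0),    # in front of the near plane
    (0.0, 1.0, -3.0, -1.0),  # hypothetical vertex behind the eye (w < 0)
]
clipped = clip_near(tri)
print(len(clipped))  # 4: the triangle became a quad
```

For vertices produced by an OpenGL-style perspective matrix, a point behind the eye always ends up with z + w < 0, so this one plane already gets rid of it; a full clipper would repeat the same pass for the remaining planes.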

Now, even though what’s actually done in GPU HW can be waaaay different from what’s outlined here in terms of actual computations or implementations, it’s not that far from the truth and from what you’d see in the wild.

Please note that, for the sake of clarity, I over-simplified a lot and skipped questions such as: What happens to partially visible primitives? What about guard-band clipping? Does the HW really check for intersection with each half-plane? Really?! Maybe these will become an excuse for me to dust off my writing.