Rasterization in One Weekend, Part II

Code for this part can be found here.

Go 3D!

In Part I: Hello, Triangle!, we rendered a two-dimensional triangle defined directly in normalized device coordinates, the [-1, 1] space. As simple as it gets, but already perfect for projecting to our two-dimensional screen. Before we add a third dimension to our virtual world, let’s have a quick refresher on the various spaces and coordinate systems used in computer graphics, again referencing the awesome resource that is learnopengl.com:

Local-space or object-space is the space in which objects created in 3D software such as Autodesk Maya or Blender are usually modeled. It’s simply an intermediary space that allows you to decouple an object’s position/scaling/rotation from its final location in your virtual world/camera/screen. That’s what we’ll do in this part, too: we’re gonna use the vertex data of a single cube in object-space and apply different transformations to render multiple cubes. Here are our pseudo vertex and index buffers:

        // Set up vertices & indices to draw an indexed cube
        glm::vec3 vertices[] =
        {
            { 1.0f, -1.0f, -1.0f },
            { 1.0f, -1.0f, 1.0f },
            { -1.0f, -1.0f, 1.0f },
            { -1.0f, -1.0f, -1.0f },
            { 1.0f, 1.0f, -1.0f },
            { 1.0f, 1.0f, 1.0f },
            { -1.0f, 1.0f, 1.0f },
            { -1.0f, 1.0f, -1.0f },
        };

        uint32_t indices[] =
        {
            1,3,0, 7,5,4, 4,1,0, 5,2,1, 2,7,3, 0,7,4, 1,2,3, 7,6,5, 4,5,1, 5,6,2, 2,6,7, 0,3,7
        };

If the idea of using an index buffer is new to you: it’s simply a look-up table that maps indices to actual vertices. If you look at a geometric shape like a cube, you can see that many vertices are shared by multiple triangles. So the idea is simply that, instead of storing the same vertices over and over again, we build a buffer of unique vertices and look up the real vertices at run-time via the index buffer while rendering. Index buffers are fundamental for saving buffer space and accelerating the drawing of complex objects in 3D. Note how, this time, we have real z coordinates.
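To put a rough number on the savings, here is a back-of-the-envelope sketch (function names are made up for illustration; sizes assume 32-bit floats and 32-bit indices): drawing the cube’s 12 triangles without indexing would store 36 vertices, while the indexed version stores just the 8 unique vertices plus 36 indices.

```cpp
#include <cstddef>
#include <cstdint>

// Non-indexed: 12 triangles * 3 vertices, each a vec3 of floats
size_t NonIndexedCubeBytes()
{
    return 36 * 3 * sizeof(float);
}

// Indexed: 8 unique vec3 vertices plus 36 uint32_t indices
size_t IndexedCubeBytes()
{
    return 8 * 3 * sizeof(float) + 36 * sizeof(uint32_t);
}
```

For a cube the win is already almost 2x, and it only grows with denser meshes where each vertex is shared by more triangles.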

Okay, we have our vertex & index buffers set up, let’s apply some random rotation and translations on them to render multiple cubes in 3D so that we can test our implementation and have something more than a dull triangle to show off:

        // Let's draw multiple objects
        std::vector<glm::mat4> objects;

        // Construct a scene of a few cubes randomly positioned
        void InitializeSceneObjects(std::vector<glm::mat4>& objects)
        {
            const glm::mat4 identity(1.f);

            glm::mat4 M0 = glm::translate(identity, glm::vec3(0, 0, 2.f));
            M0 = glm::rotate(M0, glm::radians(45.f), glm::vec3(0, 1, 0));

            glm::mat4 M1 = glm::translate(identity, glm::vec3(-3.75f, 0, 0));
            M1 = glm::rotate(M1, glm::radians(30.f), glm::vec3(1, 0, 0));

            glm::mat4 M2 = glm::translate(identity, glm::vec3(3.75f, 0, 0));
            M2 = glm::rotate(M2, glm::radians(60.f), glm::vec3(0, 1, 0));

            glm::mat4 M3 = glm::translate(identity, glm::vec3(0, 0, -2.f));
            M3 = glm::rotate(M3, glm::radians(90.f), glm::vec3(0, 0, 1));

            // Add the model matrix of each cube to the scene
            objects.push_back(M0);
            objects.push_back(M1);
            objects.push_back(M2);
            objects.push_back(M3);
        }

What we add to the array of objects here is basically the model matrix of each cube that we want to render. What does a model matrix do? It simply transforms an object defined in object-space into the 3D world-space. By applying a different model matrix to the same vertex buffer that we defined above in object-space, we will be able to render multiple cubes simultaneously.
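To see the idea in isolation, here is a tiny glm-free sketch (struct and function names are made up for illustration): the translation part of a model matrix applied to one homogeneous object-space cube corner with w = 1, which moves it into world-space.

```cpp
struct Vec4 { float x, y, z, w; };

// Equivalent to multiplying by a 4x4 translation matrix: for a position
// (w = 1) the offset is applied; for a direction (w = 0) it would not be.
Vec4 TranslatePoint(const Vec4& v, float tx, float ty, float tz)
{
    return { v.x + tx * v.w, v.y + ty * v.w, v.z + tz * v.w, v.w };
}
```

For example, the corner (1, -1, -1) of our object-space cube, translated by (0, 0, 2) as in M0 above (before its rotation), lands at (1, -1, 1) in world-space.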

Now that we have positioned our 3D objects in our virtual world, let’s look next at how to build the view and projection matrices.

        // Build view & projection matrices (right-handed system)
        float nearPlane = 0.1f;
        float farPlane = 100.f;
        glm::vec3 eye(0, 3.75, 6.5);
        glm::vec3 lookat(0, 0, 0);
        glm::vec3 up(0, 1, 0);

        glm::mat4 view = glm::lookAt(eye, lookat, up);
        glm::mat4 proj = glm::perspective(glm::radians(60.f), (float)g_scWidth / (float)g_scHeight, nearPlane, farPlane);

View (or camera) and projection matrices will allow us to view our virtual world in 3D similar to how we see the real world: in perspective. By applying the view and projection matrices to each vertex, we will transform them from world-space (remember, they were just moved from object-space to world-space by the model matrix!) to screen-space.

So in essence, the purpose of all these transformations is to project objects onto the image-plane where w=1. Let’s now get into the details of how to do that with the inclusion of a third dimension.

Unlike Part I, where we had a single triangle consisting of three vertices to render, this time around we’re gonna render multiple cubes consisting of 12 triangles each, so two more nested loops come in:

        // Loop over objects in the scene
        for (size_t n = 0; n < objects.size(); n++)
            // Loop over triangles in a given object and rasterize them one by one
            for (uint32_t idx = 0; idx < ARR_SIZE(indices) / 3; idx++)

Now is the time to put those vertex & index buffers we’ve allocated above to use by fetching vertices of each triangle that we’re trying to render:

            // We're gonna fetch object-space vertices from the vertex buffer indexed by the values in index buffer
            // and pass them directly to each VS invocation
            const glm::vec3& v0 = vertices[indices[idx * 3]];
            const glm::vec3& v1 = vertices[indices[idx * 3 + 1]];
            const glm::vec3& v2 = vertices[indices[idx * 3 + 2]];

If you are already familiar with the concept of shaders, you must know what we should do next: Invoke a vertex shader to have each vertex (that we’ve just fetched) transformed to clip-space.

            // Invoke function for each vertex of the triangle to transform them from object-space to clip-space (-w, w)
            glm::vec4 v0Clip = VS(v0, objects[n], view, proj);
            glm::vec4 v1Clip = VS(v1, objects[n], view, proj);
            glm::vec4 v2Clip = VS(v2, objects[n], view, proj);

Why do we do that, you may ask if you don’t know about shaders. We already hinted at why: To apply all the necessary transformations that move objects from object-space to clip-space, which is simply what we’ll do in VS() call:

glm::vec4 VS(const glm::vec3& pos, const glm::mat4& M, const glm::mat4& V, const glm::mat4& P)
{
    return (P * V * M * glm::vec4(pos, 1.0f));
}

Notice how we pass the same model, view and projection matrices to each VS invocation but the position is different for each vertex of triangle, which is how we form a triangle after all — three distinct points.

Now, we say that result of each VS invocation will be transformed vertices in clip-space but what does it mean really? Put simply, clip-space is yet another space in our convenience toolbox where clipping is done in the 3D pipeline.

One important thing to realize is that, by this point, what we have are 4D homogeneous coordinates (x, y, z, w) in clip-space, where the w coordinate is usually non-zero. Unlike NDC-space, which we define to be valid in the range [-1, 1], clip-space homogeneous coordinates are contained in [-w, w].

If you are familiar with other types of rasterization algorithms, this is generally the point at which perspective-division takes place. Perspective-division is simply dividing the (x, y, z) values in clip-space by their w coordinate so that we end up in [-1, 1] NDC-space. That operation is actually what gives us the perspective look.

If you read the 2D homogeneous rasterization paper that I linked to in the introduction, you know that the rasterization algorithm we utilize here differs in that it does not require the perspective-division we just talked about. Why not just do this division and be done with it? The reason is that division by w complicates the pipeline: it requires explicit clipping of triangles against the view frustum, because there may be triangles whose vertices have w=0, in which case we’d end up with a division-by-zero. Besides the fact that division is slower than addition or multiplication, more so on heavily pipelined architectures such as GPUs, this is why we try to avoid it. We’ll now see how we can get away without it and still apply perspective to our scene.
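For reference, here is a small glm-free sketch of the classic perspective division we are avoiding (struct and function names are made up for illustration). Note how a vertex with w == 0 would blow up, which is exactly why pipelines that divide must clip first:

```cpp
struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Map clip-space (x, y, z, w) to NDC by dividing through by w.
// Returns false when the division is impossible (vertex on the camera plane).
bool PerspectiveDivide(const Vec4& clip, Vec3& ndc)
{
    if (clip.w == 0.0f)
        return false; // would divide by zero; the triangle must be clipped instead

    ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };
    return true;
}
```

After a successful divide, visible geometry ends up with x, y, z in [-1, 1], and everything outside that box can be discarded.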

Similar to Part I, we apply viewport transformation to projected vertices, albeit still in clip-space, in order to set up the base vertex matrix:

            // Apply viewport transformation
            glm::vec4 v0Homogen = TO_RASTER(v0Clip);
            glm::vec4 v1Homogen = TO_RASTER(v1Clip);
            glm::vec4 v2Homogen = TO_RASTER(v2Clip);

            // Base vertex matrix (notice that glm is itself column-major)
            glm::mat3 M =
            {
                { v0Homogen.x, v1Homogen.x, v2Homogen.x },
                { v0Homogen.y, v1Homogen.y, v2Homogen.y },
                { v0Homogen.w, v1Homogen.w, v2Homogen.w },
            };

The TO_RASTER macro contains some modifications, so let’s have a look at it:

#define TO_RASTER(v) glm::vec4((g_scWidth * (v.x + v.w) / 2), (g_scHeight * (v.w - v.y) / 2), v.z, v.w)

Notice that we still haven’t applied any perspective division, which means that v here is in the range [-w, w]. Do you see why it’s called 2D homogeneous rasterization and not 3D? Look at the vertex matrix: there is no mention of the z coordinate. A key realization here is that, under normal perspective transformations (regardless of what type of rasterization algorithm you use), clip-space w values will be equal to view-space z, i.e. the distance from our camera.

The property that clip-space w equals view-space z is what allows us to interpolate linear functions across triangles in screen-space, as we did with edge functions in Part I, but also any other values defined in vertices in 3D.
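One way to convince yourself the deferred division is legal: applying the viewport mapping while still homogeneous and dividing by w later gives exactly the same pixel coordinate as dividing first and then mapping NDC to the screen. A small sketch of the x component (the width value and function names are made up for illustration):

```cpp
const float width = 1280.f; // arbitrary screen width for the example

// TO_RASTER-style: viewport transform in homogeneous coordinates, divide later
float RasterThenDivide(float xClip, float w)
{
    float xHomogen = width * (xClip + w) / 2.f; // x component of TO_RASTER
    return xHomogen / w;                        // deferred perspective division
}

// Classic: perspective division first, then map NDC [-1, 1] to [0, width]
float DivideThenRaster(float xClip, float w)
{
    float xNdc = xClip / w;
    return width * (xNdc + 1.f) / 2.f;
}
```

Both paths reduce to width * (x/w + 1) / 2, which is why the rasterizer is free to postpone the division for as long as it likes.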

The determinant of vertex matrix happens to also encode additional information such as orientation of a triangle (i.e. back-facing or front-facing) so we’ll put it to some use:

            // If det(M) == 0.0f, we'd perform division by 0 when calculating the inverse matrix,
            // whereas (det(M) > 0) implies a back-facing triangle
            float det = glm::determinant(M);
            if (det >= 0.0f)
                continue;

            // Compute the inverse of vertex matrix to use it for setting up edge & constant functions
            M = inverse(M);
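Why does the determinant encode orientation? With w = 1 at all three vertices, the determinant of the vertex matrix is proportional to the triangle’s signed screen-space area, so swapping two vertices (reversing the winding) flips its sign. A quick glm-free check with made-up coordinates:

```cpp
// Determinant of the 3x3 vertex matrix whose rows are (x_i, y_i, w_i), with w_i = 1,
// expanded along the third column
float VertexMatrixDet(float x0, float y0, float x1, float y1, float x2, float y2)
{
    return x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1);
}
```

Which sign corresponds to front-facing depends on your winding convention and on the y-flip in the viewport transform; in this rasterizer, det(M) > 0 marks the back-facing case that we skip.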

After that, we calculate the edge functions as in Part I, in addition to the constant function, which is new here:

            // Set up edge functions based on the vertex matrix
            glm::vec3 E0 = M[0];
            glm::vec3 E1 = M[1];
            glm::vec3 E2 = M[2];

            // Calculate constant function to interpolate 1/w
            glm::vec3 C = M * glm::vec3(1, 1, 1);

We know what the edge functions do already: they help us determine which pixels are inside the triangles that we’re projecting to the image-plane. But what does the constant function do? It’s something we make up, actually, in order to be able to interpolate 1/w values across triangles (don’t worry if it doesn’t make sense right now; we’ll detail how and why we use this constant function in Part III!).
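To make the constant function a bit less mysterious, here is a stand-alone sketch (plain C++, no glm; all coordinates are made up for illustration). Each vertex contributes the row (x·w, y·w, w) to the vertex matrix, and solving M·C = (1, 1, 1) yields a screen-space linear function C.x·x + C.y·y + C.z that evaluates to exactly 1/w at each projected vertex:

```cpp
#include <cmath>

// 3x3 inverse via the adjugate (row-major); returns false for singular matrices
static bool Invert3x3(const double m[3][3], double inv[3][3])
{
    const double det =
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
        m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
        m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    if (std::fabs(det) < 1e-12)
        return false;

    inv[0][0] = (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det;
    inv[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / det;
    inv[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det;
    inv[1][0] = (m[1][2] * m[2][0] - m[1][0] * m[2][2]) / det;
    inv[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det;
    inv[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / det;
    inv[2][0] = (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / det;
    inv[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / det;
    inv[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det;
    return true;
}

// Given screen-space positions (x_i, y_i) and their w_i, verify that
// C = M^-1 * (1, 1, 1) interpolates 1/w at all three vertices
bool ConstantFunctionHitsOneOverW(const double x[3], const double y[3], const double w[3])
{
    double M[3][3], inv[3][3];
    for (int i = 0; i < 3; i++)
    {
        M[i][0] = x[i] * w[i]; // homogeneous X
        M[i][1] = y[i] * w[i]; // homogeneous Y
        M[i][2] = w[i];        // homogeneous W
    }
    if (!Invert3x3(M, inv))
        return false;

    const double C[3] = { inv[0][0] + inv[0][1] + inv[0][2],
                          inv[1][0] + inv[1][1] + inv[1][2],
                          inv[2][0] + inv[2][1] + inv[2][2] };

    for (int i = 0; i < 3; i++)
        if (std::fabs(C[0] * x[i] + C[1] * y[i] + C[2] - 1.0 / w[i]) > 1e-9)
            return false;
    return true;
}
```

Because M·C = (1, 1, 1) means (x_i·w_i)·C.x + (y_i·w_i)·C.y + w_i·C.z = 1 for every row, dividing through by w_i gives x_i·C.x + y_i·C.y + C.z = 1/w_i, which is exactly the interpolation property the rasterizer relies on.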

We’re almost there, but with the inclusion of a third dimension in our world, we have another problem to solve, called the visibility problem or hidden surface removal. The problem is quite simple: given a set of objects, determine which object is visible at each pixel, i.e. closest to the camera.
Various solutions to this problem exist, and they incur different trade-offs in performance characteristics and implementation convenience. If you have done any work with 3D APIs such as OpenGL or Direct3D, you already know the concept of a depth buffer. That is the mechanism by which we’ll resolve our visibility problem, albeit in a slightly different way in this part.

As in Part I, we start by scanning over the pixels again to see which ones are covered:

                // Start rasterizing by looping over pixels to output a per-pixel color
                for (auto y = 0; y < g_scHeight; y++)
                {
                    for (auto x = 0; x < g_scWidth; x++)
                    {
                        // Sample location at the center of each pixel
                        glm::vec3 sample = { x + 0.5f, y + 0.5f, 1.0f };

                        // Evaluate edge functions at current fragment
                        bool inside0 = EvaluateEdgeFunction(E0, sample);
                        bool inside1 = EvaluateEdgeFunction(E1, sample);
                        bool inside2 = EvaluateEdgeFunction(E2, sample);

                        // If the sample is "inside" all three half-spaces bounded by the three edges of the triangle, it's 'on' the triangle
                        if (inside0 && inside1 && inside2)
                        {
                            // Interpolate 1/w at current fragment
                            float oneOverW = (C.x * sample.x) + (C.y * sample.y) + C.z;

                            // Perform depth test with interpolated 1/w value
                            // Notice that the greater the 1/w value, the closer the object is to our virtual camera,
                            // hence the "less_than_equal" comparison is actually oneOverW >= depthBuffer[sample] and not oneOverW <= depthBuffer[sample] here
                            if (oneOverW >= depthBuffer[x + y * g_scWidth])
                            {
                                // Depth test passed; update depth buffer value
                                depthBuffer[x + y * g_scWidth] = oneOverW;

                                // Write new color at this fragment
                                frameBuffer[x + y * g_scWidth] = colors[indices[3 * idx] % 6];
                            }
                        }
                    }
                }

Again, we evaluate the edge functions at every fragment and check whether the given fragment is inside the triangle. If so, we interpolate the 1/w value at that visible fragment using the constant function we just set up above. To resolve the visibility issue, we then perform what’s called the depth test: a comparison of the current depth value of the pixel with the incoming depth value, in order to determine which is closer to the camera. Once we know the relation between them, we can simply update our depth buffer and write the new fragment color. Notice that we use a depth buffer of interpolated 1/w values and not z, as is usually done in OpenGL or Direct3D. Why? Because we already have this data available, thanks to the fact that we can interpolate any function defined across triangles. In Part III, we’ll see how we can actually use z, among other things. Also see how, due to the fact that we interpolate 1/w, a less_than_equal depth test corresponds to >=.
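The reversed comparison is easy to sanity-check with a tiny glm-free sketch (the function name and values are made up for illustration): a fragment at w = 2 is closer than one at w = 5, and since 1/2 > 1/5, "closer" wins under >= against a buffer cleared to 0.

```cpp
// With a 1/w buffer cleared to 0.0, a fragment passes when its interpolated
// 1/w is greater than or equal to the stored value: LESS_EQUAL on z becomes >= on 1/w
bool DepthTestOneOverW(float oneOverWIncoming, float oneOverWStored)
{
    return oneOverWIncoming >= oneOverWStored;
}
```

This is also why the clear value below is 0.0 rather than the far-plane value you would use for a conventional z-buffer: 1/w tends toward 0 as objects recede toward infinity.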

As we did with frameBuffer, we allocate a depth buffer of the same resolution outside the loop over objects:

    // Allocate and clear the depth buffer (not z-buffer but 1/w-buffer in this part!) to 0.0
    std::vector<float> depthBuffer(g_scWidth * g_scHeight, 0.0);

Note that this time, we also improved our edge function a bit so as not to process pixels on shared edges twice, because that leads not only to wasted computation but also to wrong results (think of transparent objects).

If everything went well, we should be welcomed by a bunch of rendered cubes:

See you in Part III: Go Wild! where we will improve our visuals quite a bit by handling vertex attributes, as well as their perspective-correct interpolation.
