Code for this part can be found here.

## Go 3D!

In Part I: Hello, Triangle!, we rendered a two-dimensional triangle defined directly in normalized device coordinates (NDC) space [-1, 1]. As simple as it gets, it was already perfect for projecting to our two-dimensional screen. Before we get into adding a third dimension to our virtual world, let’s have a quick refresher on the various spaces and coordinate systems used in computer graphics, again referencing that awesome resource, learnopengl.com:

*Local-space* or *object-space* is where objects created in 3D software such as Autodesk Maya or Blender are usually modeled. It’s simply an intermediary space that allows you to decouple objects’ position/scale/rotation from their final location in your virtual world/camera/screen. That’s what we’ll do in this part, too: we’re gonna use the vertex data of a single cube in object-space and apply different transformations to render multiple cubes. Here are our vertex and index buffers:

```cpp
// Setup vertices & indices to draw an indexed cube
glm::vec3 vertices[] =
{
    {  1.0f, -1.0f, -1.0f },
    {  1.0f, -1.0f,  1.0f },
    { -1.0f, -1.0f,  1.0f },
    { -1.0f, -1.0f, -1.0f },
    {  1.0f,  1.0f, -1.0f },
    {  1.0f,  1.0f,  1.0f },
    { -1.0f,  1.0f,  1.0f },
    { -1.0f,  1.0f, -1.0f },
};

uint32_t indices[] =
{
    1,3,0, 7,5,4, 4,1,0, 5,2,1, 2,7,3, 0,7,4,
    1,2,3, 7,6,5, 4,5,1, 5,6,2, 2,6,7, 0,3,7
};
```

If the idea of using an index buffer is new to you: it’s simply a look-up table that maps indices to actual vertices. For instance, if you look at a geometric shape like a cube, you can see that many vertices are shared by multiple triangles. So the idea here is simply that, instead of storing the same vertices over and over again, we build a buffer of unique vertices and look up the real vertices at run-time via the index buffer while rendering the objects. Index buffers are fundamental for saving buffer space and accelerating the drawing of complex objects in 3D. Note how this time, we have real *z* coordinates.
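To make the savings concrete, here’s a quick back-of-the-envelope sketch (assuming 3 floats per position, as in our vertex buffer above): a cube drawn as 12 independent triangles needs 36 vertices, while the indexed version stores only 8 unique positions plus 36 small indices.

```cpp
#include <cstddef>
#include <cstdint>

// Memory footprint of a cube drawn as 12 triangles (36 corners),
// with and without an index buffer. Positions are 3 floats each.
constexpr size_t kVertexSize = 3 * sizeof(float);     // 12 bytes per position
constexpr size_t kNonIndexed = 36 * kVertexSize;      // every corner duplicated: 432 bytes
constexpr size_t kIndexed    = 8 * kVertexSize        // 8 unique vertices: 96 bytes
                             + 36 * sizeof(uint32_t); // + 36 indices: 144 bytes, 240 bytes total
```

With real meshes that also carry normals and texture coordinates per vertex, the per-vertex size grows and the gap widens further in favor of indexing.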

Okay, now that we have our vertex & index buffers set up, let’s apply some random rotations and translations to render multiple cubes in 3D, so that we can test our implementation and have something more than a dull triangle to show off:

```cpp
// Let's draw multiple objects
std::vector<glm::mat4> objects;
InitializeSceneObjects(objects);
```

```cpp
void InitializeSceneObjects(std::vector<glm::mat4>& objects)
{
    // Construct a scene of a few cubes randomly positioned
    const glm::mat4 identity(1.f);

    glm::mat4 M0 = glm::translate(identity, glm::vec3(0, 0, 2.f));
    M0 = glm::rotate(M0, glm::radians(45.f), glm::vec3(0, 1, 0));
    objects.push_back(M0);

    glm::mat4 M1 = glm::translate(identity, glm::vec3(-3.75f, 0, 0));
    M1 = glm::rotate(M1, glm::radians(30.f), glm::vec3(1, 0, 0));
    objects.push_back(M1);

    glm::mat4 M2 = glm::translate(identity, glm::vec3(3.75f, 0, 0));
    M2 = glm::rotate(M2, glm::radians(60.f), glm::vec3(0, 1, 0));
    objects.push_back(M2);

    glm::mat4 M3 = glm::translate(identity, glm::vec3(0, 0, -2.f));
    M3 = glm::rotate(M3, glm::radians(90.f), glm::vec3(0, 0, 1));
    objects.push_back(M3);
}
```

What we add to the array of *objects* here is basically the *model* matrix of each cube that we want to render. What does a model matrix do? It simply transforms objects defined in object-space into the 3D *world-space*. By applying a different model matrix to the **same** vertex buffer that we defined above in object-space, we will be able to have multiple cubes rendered simultaneously.
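As a tiny illustration of what a model matrix does (plain floats, no glm, and translation only; real model matrices like M0 above also encode rotation and scale), here’s how the cube corner (1, -1, -1) from our vertex buffer moves into world-space under M0’s translation of (0, 0, 2):

```cpp
// A stripped-down model transform: translate an object-space vertex into world-space.
// (Real model matrices also encode rotation/scale; this sketch keeps translation only.)
struct Vec3 { float x, y, z; };

Vec3 ModelToWorld(const Vec3& objectPos, const Vec3& translation)
{
    return { objectPos.x + translation.x,
             objectPos.y + translation.y,
             objectPos.z + translation.z };
}

// The cube corner (1, -1, -1) moved by (0, 0, 2) lands at (1, -1, 1) in world-space
const Vec3 corner = ModelToWorld({ 1.f, -1.f, -1.f }, { 0.f, 0.f, 2.f });
```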

Now that we’ve positioned our 3D objects in our virtual world, let’s look next at how to build the *view* and *projection* matrices.

```cpp
// Build view & projection matrices (right-handed system)
float nearPlane = 0.1f;
float farPlane = 100.f;

glm::vec3 eye(0, 3.75, 6.5);
glm::vec3 lookat(0, 0, 0);
glm::vec3 up(0, 1, 0);

glm::mat4 view = glm::lookAt(eye, lookat, up);
glm::mat4 proj = glm::perspective(glm::radians(60.f), (float)g_scWidth / (float)g_scHeight, nearPlane, farPlane);
```

The *view* (or camera) and *projection* matrices will allow us to view our virtual world in 3D similar to how we see the real world: in *perspective*. By applying the view and projection matrices to each vertex, we will transform them from world-space (remember, they were just moved from object-space to world-space by the model matrix!) to screen-space.

So in essence, the purpose of all these transformations is to project objects onto the image-plane where *w=1*. Let’s get into the details of how to do that, now with the third dimension included.

Unlike Part I, where we had a single triangle consisting of three vertices to render, we’re gonna render multiple cubes of 12 triangles each this time around, so two more nested loops come in:

```cpp
// Loop over objects in the scene
for (size_t n = 0; n < objects.size(); n++)
{
    // Loop over triangles in a given object and rasterize them one by one
    for (uint32_t idx = 0; idx < ARR_SIZE(indices) / 3; idx++)
    {
```

Now is the time to put those vertex & index buffers we’ve allocated above to use by fetching vertices of each triangle that we’re trying to render:

```cpp
// We're gonna fetch object-space vertices from the vertex buffer indexed by the values in index buffer
// and pass them directly to each VS invocation
const glm::vec3& v0 = vertices[indices[idx * 3]];
const glm::vec3& v1 = vertices[indices[idx * 3 + 1]];
const glm::vec3& v2 = vertices[indices[idx * 3 + 2]];
```

If you are already familiar with the concept of shaders, you must know what we should do next: Invoke a vertex shader to have each vertex (that we’ve just fetched) transformed to *clip-space*.

```cpp
// Invoke function for each vertex of the triangle to transform them from object-space to clip-space (-w, w)
glm::vec4 v0Clip = VS(v0, objects[n], view, proj);
glm::vec4 v1Clip = VS(v1, objects[n], view, proj);
glm::vec4 v2Clip = VS(v2, objects[n], view, proj);
```

Why do we do that, you may ask, if you don’t know about shaders. We already hinted at *why*: to apply all the necessary transformations that move objects from object-space to clip-space, which is simply what we do in the VS() call:

```cpp
glm::vec4 VS(const glm::vec3& pos, const glm::mat4& M, const glm::mat4& V, const glm::mat4& P)
{
    return (P * V * M * glm::vec4(pos, 1.0f));
}
```

Notice how we pass the same *model*, *view* and *projection* matrices to each VS invocation, but the position is different for each vertex of the triangle, which is how we form a triangle after all: three distinct points.

Now, we say that the result of each VS invocation will be transformed vertices in clip-space, but what does that really mean? Put simply, clip-space is yet another space in our convenience toolbox, the one where clipping is done in the 3D pipeline.

One important thing to realize is that, by this point, what we have are 3D (*x, y, z, w*) *homogeneous coordinates* in clip-space, where the *w* coordinate is usually non-zero. Unlike NDC-space, which we define to be valid in the range [-1, 1], clip-space homogeneous coordinates are contained in [*-w, w*].

If you are familiar with other types of rasterization algorithms, this is the point at which perspective-division generally takes place. Perspective-division is simply dividing the (*x, y, z*) values in clip-space by their *w* coordinate so that we end up in [-1, 1] NDC-space. That operation is actually what gives us the perspective look.
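For reference, perspective-division itself is a one-liner; here is a sketch with plain floats (no glm) of taking a clip-space vertex to NDC, the step our homogeneous rasterizer will deliberately skip:

```cpp
struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Divide clip-space (x, y, z) by w to land in [-1, 1] NDC-space
Vec3 PerspectiveDivide(const Vec4& clip)
{
    return { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };
}

// A clip-space vertex (2, -1, 4, 4) maps to NDC (0.5, -0.25, 1)
const Vec3 ndc = PerspectiveDivide({ 2.f, -1.f, 4.f, 4.f });
```

Note that the function above blows up exactly when *w = 0*, which is one reason the text below argues for postponing this division.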

If you read the 2D homogeneous rasterization paper that I linked to in the introduction, you must know that the rasterization algorithm we utilize here differs in that it does **not** require the perspective-division we just talked about. Why? Why not just do this division and be done? The reason is that division by *w* complicates the pipeline: it requires explicit clipping of triangles against the view frustum, because, as it happens, there may be triangles whose vertices have *w=0*, in which case we end up with a division-by-zero. Besides the fact that a division operation is slower than addition or multiplication (more so on heavily pipelined architectures such as GPUs), this is the reason we try to avoid it. We’ll now see how we can get away without it and still apply perspective to our scene.

Similar to Part I, we apply viewport transformation to projected vertices, albeit still in clip-space, in order to set up the base vertex matrix:

```cpp
// Apply viewport transformation
glm::vec4 v0Homogen = TO_RASTER(v0Clip);
glm::vec4 v1Homogen = TO_RASTER(v1Clip);
glm::vec4 v2Homogen = TO_RASTER(v2Clip);
```

```cpp
// Base vertex matrix
glm::mat3 M =
{
    // Notice that glm is itself column-major
    { v0Homogen.x, v1Homogen.x, v2Homogen.x },
    { v0Homogen.y, v1Homogen.y, v2Homogen.y },
    { v0Homogen.w, v1Homogen.w, v2Homogen.w },
};
```

The *TO_RASTER* macro contains some modifications, so let’s have a look at it:

```cpp
#define TO_RASTER(v) glm::vec4((g_scWidth * (v.x + v.w) / 2), (g_scHeight * (v.w - v.y) / 2), v.z, v.w)
```

Notice that we still haven’t applied any perspective division, which means that *v* here is in the range [*-w, w*]. Do you see why it’s called 2D homogeneous rasterization and not 3D? Look at the vertex matrix: there is no mention of the *z* coordinate. A key realization here is that clip-space *w* values, under normal perspective transformations (regardless of what type of rasterization algorithm you use), will be equal to view-space *z*, or *distance* from our camera.

The property that clip-space *w* equals view-space *z* is what allows us to interpolate linear functions across triangles in *screen-space*, as we did with edge functions in Part I, but also any other values defined at the vertices in 3D.
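To see where this property comes from, recall that under a standard right-handed perspective projection (glm’s default convention; a different convention would change the signs), the row of the projection matrix that produces clip-space *w* is (0, 0, -1, 0). A minimal sketch:

```cpp
// Under a right-handed perspective projection, the matrix row that produces
// clip-space w is (0, 0, -1, 0), so w_clip = -z_view. Since the camera looks
// down -z, visible points have negative view-space z, making w_clip the
// positive distance along the view direction.
float ClipW(float xView, float yView, float zView)
{
    return 0.f * xView + 0.f * yView + (-1.f) * zView + 0.f * 1.f;
}

// A point 5 units in front of the camera sits at view-space z = -5 ...
const float w = ClipW(0.f, 0.f, -5.f); // ... and gets clip-space w = 5
```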

The determinant of the vertex matrix also happens to encode additional information, such as the orientation of a triangle (i.e. back-facing or front-facing), so we’ll put it to some use:

```cpp
// If det(M) == 0.0f, we'd perform division by 0 when calculating the inverse matrix,
// whereas (det(M) > 0) implies a back-facing triangle
float det = glm::determinant(M);
if (det >= 0.0f)
    continue;

// Compute the inverse of vertex matrix to use it for setting up edge & constant functions
M = inverse(M);
```

After that, we calculate the edge functions as in Part I, in addition to the *constant function*, which is new here:

```cpp
// Set up edge functions based on the vertex matrix
glm::vec3 E0 = M[0];
glm::vec3 E1 = M[1];
glm::vec3 E2 = M[2];

// Calculate constant function to interpolate 1/w
glm::vec3 C = M * glm::vec3(1, 1, 1);
```

We already know what the edge functions do: they help us determine which pixels are inside the triangles that we’re projecting onto the image-plane. But what does a constant function do? It’s actually something we make up in order to be able to interpolate *1/w* values across triangles (don’t worry if it doesn’t make sense right now; we’ll detail how and why we use this constant function in Part III!).

We’re almost there, but with the inclusion of a third dimension in our world, we have another problem to solve, called the *visibility* problem or hidden surface removal. What this problem is about is quite simple: given a set of objects, determine which objects are closer to the camera.

Various solutions to this problem exist; they incur different trade-offs and offer different performance characteristics and implementation convenience. If you’ve done any work with 3D APIs such as OpenGL or Direct3D, you already know the concept of a depth buffer. That is the mechanism by which we’ll resolve our visibility problem, albeit in a slightly different way in this part.

As in Part I, we start by scanning pixels again and checking which are covered or not:

```cpp
// Start rasterizing by looping over pixels to output a per-pixel color
for (auto y = 0; y < g_scHeight; y++)
{
    for (auto x = 0; x < g_scWidth; x++)
    {
        // Sample location at the center of each pixel
        glm::vec3 sample = { x + 0.5f, y + 0.5f, 1.0f };

        // Evaluate edge functions at current fragment
        bool inside0 = EvaluateEdgeFunction(E0, sample);
        bool inside1 = EvaluateEdgeFunction(E1, sample);
        bool inside2 = EvaluateEdgeFunction(E2, sample);

        // If sample is "inside" of all three half-spaces bounded by the three edges of the triangle, it's 'on' the triangle
        if (inside0 && inside1 && inside2)
        {
            // Interpolate 1/w at current fragment
            float oneOverW = (C.x * sample.x) + (C.y * sample.y) + C.z;

            // Perform depth test with interpolated 1/w value
            // Notice that the greater the 1/w value, the closer the object is to our virtual camera,
            // hence the "less_than_equal" comparison is actually oneOverW >= depthBuffer[sample] and not oneOverW <= depthBuffer[sample] here
            if (oneOverW >= depthBuffer[x + y * g_scWidth])
            {
                // Depth test passed; update depth buffer value
                depthBuffer[x + y * g_scWidth] = oneOverW;

                // Write new color at this fragment
                frameBuffer[x + y * g_scWidth] = colors[indices[3 * idx] % 6];
            }
        }
    }
}
```

Again, we evaluate the edge functions at every fragment and check whether the fragment is visible. If so, we interpolate the *1/w* value at that **visible** fragment using the constant function that we just set up above. To resolve the visibility issue, we perform what’s called the *depth test*: basically a comparison of the *current* depth value of the pixel with the incoming depth value, in order to determine which is closer to the camera. Once we know the relation between them, we can simply update our depth buffer and write the new fragment color at this fragment. Notice that we use a depth buffer of interpolated *1/w* values and not *z*, as is usually done in OpenGL or Direct3D. Why? Because we already have this data available, thanks to the fact that we can interpolate any function defined across triangles. In Part III, we’ll see how we can actually use *z*, among other things. Also see how, due to the fact that we interpolate *1/w*, the depth test less_than_equal corresponds to **>=**.
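The inverted comparison can be convinced of with two concrete fragments: one at distance 2 from the camera and one at distance 5. Their *1/w* values are 0.5 and 0.2, so the *greater* 1/w belongs to the *closer* fragment:

```cpp
// Depth comparison on 1/w: less_than_equal on distance becomes >= on 1/w
bool DepthTestPasses(float incomingOneOverW, float storedOneOverW)
{
    return incomingOneOverW >= storedOneOverW;
}

// Fragment at distance 2 (1/w = 0.5) vs. stored fragment at distance 5 (1/w = 0.2)
const bool nearWinsOverFar = DepthTestPasses(1.f / 2.f, 1.f / 5.f); // closer fragment passes
const bool farLosesToNear  = DepthTestPasses(1.f / 5.f, 1.f / 2.f); // farther fragment is rejected
```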

As we did with the *frameBuffer*, we allocate a depth buffer of the same resolution outside the loop over objects:

```cpp
// Allocate and clear the depth buffer (not z-buffer but 1/w-buffer in this part!) to 0.0
std::vector<float> depthBuffer(g_scWidth * g_scHeight, 0.0);
```

Note that this time, we improved our edge function a bit so that it doesn’t process pixels on shared edges twice, because that leads not only to wasted computation but also to wrong results (think of transparent objects).
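The exact rule in the accompanying code may differ; here is one possible sketch of such a tie-breaking (top-left style) rule, with a plain Vec3 standing in for glm::vec3, so that a pixel falling exactly on a shared edge is claimed by only one of the two triangles:

```cpp
struct Vec3 { float x, y, z; };

// Evaluate edge function E at a sample point, breaking ties on shared edges
bool EvaluateEdgeFunction(const Vec3& E, const Vec3& sample)
{
    // Interpolate edge function at given sample
    float result = (E.x * sample.x) + (E.y * sample.y) + E.z;

    // Strictly inside or outside the half-space: no ambiguity
    if (result > 0.0f) return true;
    if (result < 0.0f) return false;

    // Sample exactly on the edge: apply a consistent tie-breaking rule
    if (E.x > 0.0f) return true;
    if (E.x < 0.0f) return false;
    return E.y >= 0.0f;
}

// Edge x = 0, with the inside half-space being x > 0: E = (1, 0, 0)
const bool inside  = EvaluateEdgeFunction({ 1.f, 0.f, 0.f }, {  2.f, 3.f, 1.f });
const bool onEdge  = EvaluateEdgeFunction({ 1.f, 0.f, 0.f }, {  0.f, 3.f, 1.f }); // tie: claimed
const bool outside = EvaluateEdgeFunction({ 1.f, 0.f, 0.f }, { -2.f, 3.f, 1.f });
```

The mirrored triangle sharing that edge has the opposite edge coefficients, so its tie-break evaluates to false and the pixel is rasterized exactly once.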

If everything went well, we should be greeted by a bunch of rendered cubes:

See you in **Part III: Go Wild!**, where we will improve our visuals quite a bit by handling vertex attributes, as well as their perspective-correct interpolation.
