Challenges in porting Ray Tracing in One Weekend to the GPU
Lately I’ve been learning OpenGL and to get better at coding shaders I went through the Ray Tracing in One Weekend book and ported it to GLSL (OpenGL Shading Language). The book is great by the way and I would recommend this challenge to anyone who is interested in learning about ray tracing or shader programming. The book implements the ray tracer in C++ on the CPU and I’ve ported it to GLSL to run on the GPU. While making the port I ran into some interesting challenges due to the differences in CPU and GPU architectures so I thought I’d write a post about them and how I solved them.
I got the idea of following this book and porting it to GLSL from Anton’s OpenGL 4 Tutorials, which are a great resource for learning OpenGL. Here is the final image from the book, rendered by my ray tracer on the GPU:
Intro to GLSL
GLSL is OpenGL’s flavor of shader language, the kind of language used to program the GPU. GLSL is designed to look similar to C, but there are some fundamental differences: all functions are inlined, there are no pointers, and there are no dynamically sized arrays. The defining feature of shader languages is that many instances of your program run in parallel on the GPU. To implement my ray tracer I mostly used a fragment shader, which is invoked once per pixel to determine that pixel’s color. You read the pixel coordinates from gl_FragCoord.x and gl_FragCoord.y and write the color of the pixel to frag_color (the output variable can be named however you like). This is very fast because the GPU can compute the colors of the individual pixels in parallel. For example, this shader program draws a gradient to the screen:
out vec4 frag_color;

void main() {
    frag_color = vec4(
        gl_FragCoord.x / 1920, // Red
        gl_FragCoord.y / 1080, // Green
        0.0,                   // Blue
        1.0                    // Alpha
    );
}
When implementing a ray tracer you simulate a ray going into the scene from every pixel that bounces around the scene and collects light upon hitting objects. So in the simplest case every invocation of the fragment shader simulates one ray. Eventually you will actually cast multiple rays per pixel and average them to get a cleaner image.
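Later in the post I switch to progressive rendering, where each invocation simulates just a single ray, but a non-progressive multi-sample version of the per-pixel function might look roughly like this (a sketch only: Camera, World, Ray and ray_color are introduced in the following sections, while get_ray() and the samples_per_pixel field are hypothetical helpers for building a jittered camera ray through the current pixel):

vec3 pixel_color(Camera cam, World world) {
    vec3 color = vec3(0.0);
    for (int s = 0; s < cam.samples_per_pixel; s++) {
        // get_ray() is assumed to return a slightly jittered camera ray
        // through the pixel at gl_FragCoord.xy (hypothetical helper).
        Ray r = get_ray(cam, gl_FragCoord.xy);
        color += ray_color(cam, r, world);
    }
    return color / float(cam.samples_per_pixel);
}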
No interfaces
The book uses C++ class inheritance to create a Material class that is then subclassed by different material types like shiny metal and translucent glass. Materials have a scatter() function that takes a ray and either returns a new scattered ray or signals that the ray has been absorbed by the material. Each material type has its own implementation of this function, and other parts of the code can just deal with Material objects and call scatter() without having to care about the concrete material type. In Go you would use interfaces to solve this and in Rust you’d use traits, but in GLSL, just like in C, we don’t have interfaces or the like.
I’ve worked around this by stuffing the different material properties into one big struct; some of them remain unused depending on the type of material:
struct Material {
    uint type;
    vec3 albedo;
    float fuzz;
    float refraction_index;
};

const uint MAT_LAMBERTIAN = 0;
const uint MAT_METAL = 1;
const uint MAT_DIELECTRIC = 2;
Then I have a scatter function that calls the correct implementation depending on the type:
bool scatter(Ray r, HitRecord rec, out vec3 attenuation, out Ray scattered) {
    switch (rec.mat.type) {
    case MAT_LAMBERTIAN:
        return scatter_lambertian(r, rec, attenuation, scattered);
    case MAT_METAL:
        return scatter_metal(r, rec, attenuation, scattered);
    case MAT_DIELECTRIC:
        return scatter_dielectric(r, rec, attenuation, scattered);
    default:
        return false;
    }
}
I’ve also added constructors for the different types so the caller only has to pass the fields that are used by that material type:
Material mat_metal(vec3 albedo, float fuzz) {
    Material mat;
    mat.type = MAT_METAL;
    mat.albedo = albedo;
    mat.fuzz = fuzz;
    return mat;
}
This approach works well and can also be used to create an abstract geometry aka “hittable” class. My ray tracer currently only supports spheres so I haven’t applied this approach to geometry yet.
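For illustration, the geometry version of the trick could look something like this (a hypothetical sketch: hit_sphere() and the extra fields are assumptions, since spheres are the only shape in my code right now):

struct Hittable {
    uint type;
    vec3 center;  // used by spheres
    float radius; // used by spheres
    Material mat;
};

const uint HIT_SPHERE = 0u;

bool hit(Hittable h, Ray r, Interval t_range, out HitRecord rec) {
    switch (h.type) {
    case HIT_SPHERE:
        return hit_sphere(h, r, t_range, rec);
    // new geometry types would get their own case here
    default:
        return false;
    }
}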
No randomness
Ray tracers commonly make use of randomness to simulate light rays. For example, rays hitting a diffuse object are scattered in a random direction, and when a ray hits a glass sphere it is determined probabilistically whether it refracts or reflects. But the GLSL language does not provide a source of randomness. There is the built-in noise() function, but it isn’t actually implemented by any vendor’s GPU drivers, so we cannot use it in practice. I found a Stack Overflow question with a bunch of solutions, including this pseudo random number generator that you also find in other places on the internet:
float rand(vec2 co) {
    return fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453);
}
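A quick way to eyeball its quality is to use the value directly as a grayscale pixel color:

out vec4 frag_color;

void main() {
    float n = rand(gl_FragCoord.xy);
    frag_color = vec4(vec3(n), 1.0);
}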
When visualizing this function you can see obvious patterns, which means it is not a very good random number generator:
Further down the list of answers I found the solution that I adapted for my ray tracer. It uses an integer-based pseudo random number generator and then interprets the bit representation of that integer as an IEEE floating point number, with some bit tricks to make sure it falls in the [0, 1) range:
// A single iteration of Bob Jenkins' One-At-A-Time hashing algorithm.
uint hash( uint x ) {
    x += ( x << 10u );
    x ^= ( x >>  6u );
    x += ( x <<  3u );
    x ^= ( x >> 11u );
    x += ( x << 15u );
    return x;
}
// Construct a float with half-open range [0:1] using low 23 bits.
// All zeroes yields 0.0, all ones yields the next smallest representable value below 1.0.
float floatConstruct( uint m ) {
    const uint ieeeMantissa = 0x007FFFFFu; // binary32 mantissa bitmask
    const uint ieeeOne      = 0x3F800000u; // 1.0 in IEEE binary32

    m &= ieeeMantissa;              // Keep only mantissa bits (fractional part)
    m |= ieeeOne;                   // Add fractional part to 1.0

    float f = uintBitsToFloat( m ); // Range [1:2]
    return f - 1.0;                 // Range [0:1]
}
By keeping a global seed variable you can generate a sequence of random numbers by repeatedly calling random_float():
uint seed;

float random_float() {
    // Returns a random real in [0,1).
    seed = hash(seed);
    return floatConstruct(seed);
}
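The snippets below also use range-based helpers like random_float(min, max) and random_vec3(min, max), mirroring the book’s random_double(min, max) and vec3::random(min, max), as well as a PI constant. Built on top of random_float(), they can look like this (my sketch, not code from the book):

const float PI = 3.14159265358979;

float random_float(float min, float max) {
    // Returns a random real in [min, max).
    return min + (max - min) * random_float();
}

vec3 random_vec3(float min, float max) {
    // Returns a vector with each component in [min, max).
    return vec3(random_float(min, max),
                random_float(min, max),
                random_float(min, max));
}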
I seed the random number generator with the frame number and the pixel coordinate so each invocation of the shader program gets a different random sequence:
seed = frame_number ^ hash(floatBitsToUint(gl_FragCoord.x) ^ hash(floatBitsToUint(gl_FragCoord.y)));
When visualizing this function you can see that it looks much more random:
Random unit vectors
The book suggests using a rejection method for generating random 3D unit vectors: you pick a random value between -1 and 1 for each of the X, Y and Z components, and if that vector is inside the unit sphere you scale it up to unit length, while if it is outside you discard it and try again. This is the code from the book, ported to GLSL:
vec3 random_unit_vector() {
    while (true) {
        vec3 p = random_vec3(-1.0, 1.0);
        float lensq = dot(p, p);
        if (1e-20 < lensq && lensq <= 1.0) {
            return p / sqrt(lensq);
        }
    }
}
For some reason this would crash my ray tracer frequently. My theory is that in rare cases the function got stuck in the loop: maybe by chance it kept generating vectors outside the unit sphere, or maybe the random number generator has a flaw where it keeps producing large values for certain seeds. In any case, loops on the GPU are good to avoid when you can because they slow things down. There is a better method for generating random unit vectors that doesn’t require a loop, and my ray tracer stopped crashing when I switched to it:
vec3 random_unit_vector() {
    float theta = random_float(0.0, 2.0 * PI);
    float u = random_float(-1.0, 1.0);
    float r = sqrt(1.0 - u * u);
    return vec3(r * cos(theta), r * sin(theta), u);
}
The book shows the rejection method because it is easier to understand, but the non-loop method performs better on the GPU. It works because picking the z coordinate uniformly in [-1, 1] and an angle uniformly in [0, 2π) yields a uniform distribution on the sphere (Archimedes’ hat-box theorem). There is a similar non-loop method for picking random points in the unit disk, which the book uses for the camera’s defocus blur:
vec3 random_in_unit_disk() {
    float theta = random_float(0.0, 2.0 * PI);
    float r = sqrt(random_float(0.0, 1.0));
    return vec3(r * cos(theta), r * sin(theta), 0.0);
}
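To connect this back to the materials section, here is roughly how random_unit_vector() gets used by scatter_lambertian() (a sketch following the book’s logic; it assumes Ray is a struct with origin and direction fields, in that order, and that HitRecord has p, normal and mat fields):

bool scatter_lambertian(Ray r, HitRecord rec, out vec3 attenuation, out Ray scattered) {
    vec3 scatter_direction = rec.normal + random_unit_vector();

    // Catch the degenerate case where the random vector nearly cancels the normal.
    if (dot(scatter_direction, scatter_direction) < 1e-16) {
        scatter_direction = rec.normal;
    }

    scattered = Ray(rec.p, scatter_direction);
    attenuation = rec.mat.albedo;
    return true;
}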
No recursion
The ray tracer in the book uses recursion to bounce the ray around the scene. Note that the ray_color function calls itself:
color ray_color(const ray& r, int depth, const hittable& world) const {
    // If we've exceeded the ray bounce limit, no more light is gathered.
    if (depth <= 0)
        return color(0,0,0);

    hit_record rec;

    if (world.hit(r, interval(0.001, infinity), rec)) {
        ray scattered;
        color attenuation;
        if (rec.mat->scatter(r, rec, attenuation, scattered))
            return attenuation * ray_color(scattered, depth-1, world);
        return color(0,0,0);
    }

    // Return sky color
    vec3 unit_direction = unit_vector(r.direction());
    auto a = 0.5*(unit_direction.y() + 1.0);
    return (1.0-a)*color(1.0, 1.0, 1.0) + a*color(0.5, 0.7, 1.0);
}
The ray_color function is initially passed the ray from the camera eye into the scene. When the ray hits an object, the ray color is attenuated with the object’s color and ray_color is called again with the scattered ray. This continues recursively until the ray bounce limit is reached, the ray is absorbed by an object, or the ray does not hit any object and the sky color is returned.
Remember from earlier that in GLSL all functions are inlined? This is also the reason why recursion is not allowed in GLSL. Fortunately this function is fairly easy to rewrite without recursion. If you rearrange the conditions in the function so that the return attenuation * ray_color(scattered, depth-1, world) line comes at the very end, you can see that it is almost tail recursive. A tail recursive function can easily be rewritten as a loop by updating the function parameters of the recursive call in place and looping back to the start of the function. The only operation that happens after the recursive call, and that stops the function from being truly tail recursive, is the multiplication by attenuation. Operations that happen after the recursive call are executed in reverse order as the call stack unwinds, but luckily with multiplication the order doesn’t matter. Rewriting this function in GLSL without recursion we get:
vec3 ray_color(Camera cam, Ray r, World world) {
    HitRecord rec;
    int depth = cam.max_depth;
    vec3 color = vec3(1.0, 1.0, 1.0);

    while (hit_world(world, r, Interval(0.001, INFINITY), rec)) {
        // If we've exceeded the ray bounce limit, no more light is gathered.
        if (depth <= 0) {
            return vec3(0.0, 0.0, 0.0);
        }

        Ray scattered;
        vec3 attenuation;
        if (scatter(r, rec, attenuation, scattered)) {
            r = scattered;
            depth--;
            color *= attenuation;
        } else {
            return vec3(0.0, 0.0, 0.0);
        }
    }

    vec3 unit_direction = normalize(r.direction);
    float a = 0.5 * (unit_direction.y + 1.0);
    color *= mix(vec3(1.0, 1.0, 1.0), vec3(0.5, 0.7, 1.0), a);
    return color;
}
Progressive rendering
When adding lots of objects to the scene my ray tracer gets laggy, to the point where it slows down my entire desktop shell, and so far I have only partially solved this problem. I added progressive rendering, so only one ray is simulated per shader invocation and the image is constructed by averaging the newly rendered frame with the accumulation of the previous frames. The image becomes more and more refined with each frame of the render loop. This requires some additional setup on the OpenGL side: I create two framebuffer objects (FBOs) to render to, so I can feed the contents of one framebuffer as input to the shader program when rendering the next frame. This is commonly called ping-pong buffering:
- Draw to FBO1 texture with ray tracing shader
- Draw FBO1 texture to screen with a simple “copy” shader
- Draw to FBO2 with ray tracing shader and pass FBO1 texture as input
- Draw FBO2 texture to screen with “copy” shader
- Repeat, swapping FBO1 and FBO2
The ray tracing shader uses the frame number to update the average pixel color incrementally; this is the standard running-average formula new_average = old_average + (sample - old_average) / n:
// Mix the current pixel color with the previous frame's pixel color.
vec4 current_color = vec4(pixel_color(cam, world), 0.1);
vec4 previous_color = texture(previous_texture, gl_FragCoord.xy / vec2(image_width, image_height));
frag_color = previous_color + (current_color - previous_color) / float(frame_number);
Non-linear post-processing effects like the gamma correction described in the book have to be applied after the averaging. Otherwise you will get a different result because averaging square roots is not the same as taking the square root of the average. I simply apply gamma correction in the “copy” shader that copies the FBO contents to the screen:
vec3 linear_to_gamma(vec3 color) {
    return sqrt(color);
}

void main() {
    frag_color = vec4(linear_to_gamma(texture(u_texture, v_texcoord).rgb), 1.0);
}
Simulating only one ray per shader invocation fixed my performance issues for a while, but eventually, as the scene got even more complex, the performance issues returned. I tried a bunch of fixes, like artificially limiting the frame rate and rendering only parts of the image per frame, but these approaches didn’t help. It seems that even if you pause between frames, sending off a single frame that takes a long time keeps the GPU busy for so long that desktop rendering stalls. I would be curious what other approaches there are to solve this problem. Maybe there is some other way of splitting the work across multiple frames, or maybe compute shaders could do the work independently without disrupting desktop responsiveness.
So this was my adventure of porting Ray Tracing in One Weekend to GLSL. All in all this was really fun and I was surprised how many interesting and varied computer science challenges came up in the process. My final code is available on GitHub for everyone to explore.