GPU Path Tracing
3rd Year Computer Science Project
Michael Savage
2014
Contents

1 Introduction
1.1 Motivation
1.2 Overview of ray tracing
1.3 Overview of path tracing
1.4 Accomplishments

2 Ray Tracing in Depth
2.1 Setting up the camera
2.1.1 Calculating the camera’s coordinate system
2.1.2 Building the focal plane
2.2 Tracing rays
2.3 Shapes
2.3.1 Planes
2.3.2 Spheres
2.4 Anti-aliasing

3 Ray Tracing in CUDA
3.1 A brief overview of CUDA
3.2 Implementing a ray tracer in CUDA
3.3 Limitations of CUDA

4 Path Tracing in Depth
4.1 The rendering equation
4.1.1 BRDFs and BTDFs
4.2 Monte Carlo integration
4.3 Monte Carlo path tracing
4.3.1 Reflectance models

5 Path Tracing in Rust
5.1 A brief overview of Rust
5.2 Implementing a path tracer in Rust
5.2.1 Materials
5.2.2 Sampling
5.2.3 Shapes
5.2.4 Worlds
5.2.5 Piecing everything together

6 Conclusion
6.1 Performance and evaluation
6.1.1 Linearity with number of threads
6.1.2 Noise reduction with number of samples
6.1.3 Adjusting the recursion limit
6.1.4 Tweaking the light parameters
6.2 Future work
6.2.1 Russian Roulette path termination
6.2.2 Materials
6.2.3 Acceleration structures
6.2.4 Do away with pixels
6.2.5 Better integration methods
6.3 Final thoughts

References
Chapter 1
Introduction
1.1
Motivation
Physically based renderers are pieces of software which aim to produce photorealistic images by using physically accurate models of how light and materials in the
real world work. They have historically been too computationally intensive to
use in real-time applications, instead being applied in offline rendering situations
where time is not so much of an issue. Examples of such problems could be movie
making or architecture, where waiting an hour per frame in intensive scenes is
acceptable. Even in situations like this, however, the software involved tends
to be based on hybrid methods, using both typical rasterisation and physically
based methods. An example of such a renderer would be Pixar’s Renderman1 .
Things are starting to look up though, as physically based rendering methods
tend to be embarrassingly parallel (as you will see later in this paper) and with
computers based on, or at least with components based on, massively parallel2
architectures becoming increasingly commonplace, it’s more and more feasible
to use purely physically based renderers.
The most commonly found massively parallel architecture today is in graphics
processing units (GPUs). As the name implies, they are specialised hardware which
excel at the kind of number crunching graphics workloads require and have
hundreds of floating point units. Up until recently, they have been limited to
running functions exposed through the graphics library (OpenGL3 /DirectX4 )
and running simple programs called shaders, which are restricted to operating
on pixels or vertices. While general-purpose programming was possible, it wasn’t
1 http://renderman.pixar.com/view/renderman
2 Computing
using a large number of processors/machines
3 http://www.opengl.org/
4 http://en.wikipedia.org/wiki/DirectX
the intended use of the hardware and programs required a lot of shoehorning to
fit into the graphics library’s model.
Today we are seeing the rise of GPGPU – general-purpose computing on graphics
processing units – with the major GPU manufacturers, NVIDIA and AMD,
providing frameworks for running less restricted code on their hardware. AMD
and Apple are pushing OpenCL5 , a general-purpose, open source framework
for parallel computation, while NVIDIA are pushing their proprietary GPU
framework, CUDA6. The latter, being more specialised, provides a lower-level
interface, and since most GPGPU applications are written with performance in
mind, it has seen greater success than OpenCL.
We are even beginning to see real-time global illumination work its way into
modern renderers, such as Unity7, CryENGINE8 and Octane/Brigade9. The
first two are primarily designed as game engines, with Unity using the Enlighten10 library and CryENGINE using a technique called Light Propagation
Volumes (Anton Kaplanyan 2009), which to my knowledge does not handle certain physical effects such as refraction and caustics. Octane is designed to be a
real-time, physically accurate renderer and uses path tracing. I have chosen to
investigate methods similar to Octane’s as path tracing is a natural extension
of ray tracing, which I’m already familiar with, and because accurate images are
more interesting than “good enough” ones.
1.2
Overview of ray tracing
Ray tracing is a method for rendering scenes by simulating how light rays bounce
from light sources, to objects in the scene, to the viewer’s eyes.
There are many advantages to using ray tracing over rasterisation. Ray tracing
is conceptually simple and easy to understand. It can simulate many optical
effects that rasterisation is unable to handle, such as reflection, refraction and
caustics, or lens effects like depth of field. The major disadvantage of ray tracing
is that it’s too slow to be used in real-time applications.
Let’s start with a few definitions:
ray: A line extending to infinity from a point in one direction. Can be expressed
mathematically as: R(t) = p + td, where t ≥ 0, p is the point it originates
from and d is the direction it extends in.
focal plane: A plane placed between the focal point of the camera and the
scene, which the image gets drawn onto
5 https://www.khronos.org/opencl/
6 http://www.nvidia.co.uk/object/cuda_home_new.html
7 http://unity3d.com/
8 http://www.crytek.com/home
9 http://render.otoy.com/
10 http://www.geomerics.com/enlighten/
irradiance: Irradiance is the flux per unit area incident on a surface. Or more
simply, the intensity of the incoming light.
Ray tracing works by firing rays for each pixel from the focal point, through
the pixel’s corresponding location on the focal plane and into the scene. Upon
intersection with an object, the algorithm traces additional rays towards the
lights in the scene to determine if they are visible from the intersection point.
This is done because ray tracing makes the assumption that only visible lights
contribute to the irradiance at a point. Finally, the algorithm may decide to
shoot even more rays back into the scene depending on the material properties at
the intersection point. For example, if the object is a mirror, it will cast another
ray back into the scene in the direction of the reflection, from the intersection
point. Similarly, if the object is not opaque, we should cast a refraction ray into
the object.
Figure: A diagram explaining ray tracing, taken from Wikipedia11.
As ray tracing assumes that only visible lights make any lighting contribution,
ray tracing can be said to only account for direct illumination. Picture an
outdoor scene with a wall in an open area, and only the sun providing
illuminance. We can say from experience of being in the real world that it’s
unlikely the area shadowed by the wall would be pitch black. To account for
this, ray tracers commonly enforce a constant ambient term for lighting in the
11 http://en.wikipedia.org/wiki/File:Ray_trace_diagram.svg
scene. This might produce acceptable results in my contrived example, but it’s
clearly not physically accurate and is sidestepping the actual problem.
The mathematics behind ray tracing and its implementation are covered in
more detail in chapter 2.
1.3
Overview of path tracing
As mentioned above, ray tracing only considers direct illumination in a scene.
Even with the addition of an ambient term, this can produce incorrect results in
simple scenes. For example:
Figure 1.1: A scene that ray tracing performs poorly on.
Above is an image depicting a room with a wall separating both sides, except
for a gap at the top. The entire top wall of the room is a mirror and a light
source is positioned to one side. In the real world, the entire room would be lit
as the reflection of the light in the mirror would provide illuminance to parts of
the room that can’t see the light directly (shaded in the above image). In a ray
traced scene however, only the non-shaded area, where the light is directly
visible, would receive illumination from the light source.
Conversely, path tracing is an algorithm that tries to render all scenes correctly
by evaluating the global illumination in a scene.
Path tracing drops the assumption that direct lighting is the only source of
illumination in a scene, and instead adopts the key idea that there should be no
difference between light emitted from a light source and light reflected from a
surface or scattered by participating media.
Upon impact with any surface, including diffuse materials, we spawn secondary
rays and cast them back into the scene. If these rays encounter a light source,
we can transfer illuminance down the path to the eye. This gives us the effect
that every surface the path interacts with affects the throughput, and therefore
the total irradiance.
Since the tree of all possible light paths is both infinitely wide and infinitely
deep, we need to intelligently sample paths to get an efficient implementation. I
explain the mathematics of Monte Carlo integration and its application to path
tracing in chapter 4.
1.4
Accomplishments
In my time working on this project, I built a ray tracing renderer in CUDA and
outlined a path tracing renderer which would also be implementable in CUDA. I
have elaborated on the implementation and its limitations later in chapter 3.
Due to the limitations of CUDA, I decided it would be a more efficient use of
my time to implement that path tracing algorithm in CPU code. To keep in the
spirit of this project, I paid special attention to which language features would
and, more importantly, would not be available when writing GPU code. I have
gone into more detail on both the path tracing algorithm and my
implementation in chapters 3 and 4.
Chapter 2
Ray Tracing in Depth
In this chapter, I will expand on my earlier explanation of ray tracing to include
the mathematics and details relevant to implementing a ray tracing renderer.
2.1
Setting up the camera
To begin, we need a camera. Specifically, given the focal point/eye, the focal
length, the field of view1 (FOV), the aspect ratio2 of the output image, the view
direction or a point to look at, and an up vector, we can compute the
coordinates of pixels on the focal plane and the camera’s coordinate system.
2.1.1
Calculating the camera’s coordinate system
Suppose we are given the world’s up direction – which need not be the up
direction of the camera – and the direction we want the camera to point in. If
we are given a point to look at instead of a view direction, we can calculate the
view direction by subtracting the focal point from the point we want to look at.
From here on I will denote the camera’s forward, left and up vectors with cf , cl
and cu respectively, and the world up vector with u. I will also assume that cf
and u are normalised3 .
1 The FOV of the camera is the angle between the lines from the focal point to the top/bottom
of the focal plane. I have drawn an example in Figure 2.1.
2 The aspect ratio is the ratio between its width and height. Again see Figure 2.1.
3 A vector is normalised when it has length 1. A vector can therefore be normalised by
dividing it by its length.
We can compute the camera’s left and up vectors as follows:
cl = (u × cf) / |u × cf|
cu = cf × cl
Note cu is automatically of length one, as cf and cl are orthogonal unit vectors.
The calculation of cl degenerates and gives a zero-length result when u = ±cf,
i.e. when the camera is looking directly up or down.
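To make this concrete, here is a minimal sketch of the basis construction in Rust. The Vec3 type and its helper methods are written out inline for the example and are not the renderer’s own maths module; only the two cross products mirror the equations above.

#[derive(Clone, Copy)]
struct Vec3 { x: f64, y: f64, z: f64 }

impl Vec3 {
    fn cross(self, o: Vec3) -> Vec3 {
        Vec3 {
            x: self.y * o.z - self.z * o.y,
            y: self.z * o.x - self.x * o.z,
            z: self.x * o.y - self.y * o.x,
        }
    }
    fn length(self) -> f64 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }
    fn scaled(self, s: f64) -> Vec3 {
        Vec3 { x: self.x * s, y: self.y * s, z: self.z * s }
    }
    fn normalised(self) -> Vec3 { self.scaled(1.0 / self.length()) }
}

// Build (forward, left, up) from a normalised view direction cf and the world
// up vector u. Returns None in the degenerate case where they are parallel.
fn camera_basis(cf: Vec3, u: Vec3) -> Option<(Vec3, Vec3, Vec3)> {
    let cl = u.cross(cf);
    if cl.length() < 1e-9 {
        return None; // camera is looking straight up or down
    }
    let cl = cl.normalised();
    let cu = cf.cross(cl); // already unit length, as cf and cl are orthogonal
    Some((cf, cl, cu))
}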
2.1.2
Building the focal plane
The inputs we need to build the focal plane are: the coordinate system we just
calculated, the vertical FOV and aspect ratio of the camera and the focal length.
Figure 2.1: A diagram of the camera parameters. The focal point is at the origin.
In the above diagram, the camera’s vertical FOV is denoted by θ and the focal
length is denoted by l.
To begin, we need to compute the actual dimensions of the focal plane. This
can be done with some trigonometry. Let r denote the camera’s aspect ratio, fh
be half the height of the focal plane and fw be half its width. Using half
lengths makes the extents calculations a little simpler and we won’t be using the
full lengths.
We can then compute the dimensions as follows:
fh = l tan(θ/2)
fw = r fh
Additionally, we need the extents of the focal plane. Let ftl , ftr and fbr be the
points at the top left, top right and bottom right corners of the focal plane
respectively. We can find the top left corner with:
ftl = o + l cf + fh cu + fw cl
where o is the focal point of the camera. The other corners are analogous and
omitted.
Given the three corners, it’s easy to linearly interpolate between them to find
the centre coordinates of each pixel on the focal plane.
2.2
Tracing rays
The next job of a ray tracer is to trace rays. We begin by firing rays from the
focal point (o) into the scene through their corresponding point/pixel on the
focal plane. The pseudocode for this looks like:
for x in [0, width) {
    for y in [0, height) {
        let p = CentreOfPixel( x, y )
        let ray = Normalise( p - o )
        pixels[ x, y ] = Irradiance( o, ray )
    }
}
The equation for finding the world coordinates of a pixel given its x and y
coordinates looks like:

let tx = x / width, ty = y / height
px,y = ftl (1 − tx) + ftr tx + ty (fbr − ftr)
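As a small illustration, the interpolation above could be written in Rust as follows. The bare (f64, f64, f64) tuples and helper functions are stand-ins for a proper vector type, introduced just for this sketch.

type V3 = (f64, f64, f64);

fn add(a: V3, b: V3) -> V3 { (a.0 + b.0, a.1 + b.1, a.2 + b.2) }
fn sub(a: V3, b: V3) -> V3 { (a.0 - b.0, a.1 - b.1, a.2 - b.2) }
fn scaled(a: V3, s: f64) -> V3 { (a.0 * s, a.1 * s, a.2 * s) }

// World-space position of pixel (x, y) on the focal plane, following the
// equation above. Adding 0.5 to x and y before dividing would give the exact
// centre of the pixel rather than its corner.
fn centre_of_pixel(x: u32, y: u32, width: u32, height: u32,
                   ftl: V3, ftr: V3, fbr: V3) -> V3 {
    let tx = x as f64 / width as f64;
    let ty = y as f64 / height as f64;
    add(add(scaled(ftl, 1.0 - tx), scaled(ftr, tx)),
        scaled(sub(fbr, ftr), ty))
}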
After we have spawned the rays, we need to intersect them against objects in
the scene. The implementation of Irradiance would look like:
Irradiance( start, dir ) {
    // hit is an object containing information
    // about the intersection
    let hit = Intersect( start, dir )
    if hit == null {
        return black
    }

    let ret = black

    for l in lights {
        let shadow_ray = Normalise( l.pos - hit.pos )
        let offset = hit.pos + shadow_ray * EPSILON
        if !DoesIntersect( offset, l.pos - hit.pos ) {
            ret += l.colour * DotProduct( shadow_ray, hit.normal )
        }
    }

    if object hit is reflective {
        let r = Reflect( -dir, hit.normal )
        ret += Irradiance( hit.pos + r * EPSILON, r )
    }

    // also spawn a secondary ray for refractive materials

    return ret
}
There are a few parts of the above code that I haven’t yet explained.
The first is the EPSILON term when casting secondary rays. This is required
because of floating point imprecision: the computed intersection point can end
up slightly on the wrong side of the surface it hit, so when the outgoing shadow
or reflection ray is fired from this point, it immediately intersects with the
same object, which is not what we want. EPSILON can be bounded using
formal analysis of floating point arithmetic (Matt Pharr 2010, 112) if you are
implementing a serious renderer, but for our purposes it’s fine to just define it
as a small value; it won’t produce any noticeable effect on the resulting
image.
The second thing that needs explaining is the scaling by DotProduct(...)
when adding the light source’s contribution. This is a result of Lambert’s cosine
law, which says the intensity of light reflected off a diffuse surface is proportional
to the cosine of the angle between the incident light and the surface normal. I
have given a diagram (Figure 2.2) which should make it clear what is happening.
As an aside, this is also one of the reasons why the Earth is hotter at the
equator than the poles. It’s fitting that Lambert was also an astronomer.
Figure 2.2: Lambert’s cosine law. Light incident at a shallow angle is spread
over a larger area than light arriving perpendicular to the surface and therefore
the irradiance is reduced.
Next, the implementation of Reflect. This is a standard result in graphics so I
won’t be including a full derivation, instead I will just give the equation.
Suppose we are given a surface normal n and the direction an incoming ray is
coming from4 , v. We then compute the reflected ray direction r as follows:
r = −v + 2 (n · v) n
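A direct transcription of this formula into Rust might look like the sketch below, again with a bare tuple type standing in for the renderer’s Vec3.

type V3 = (f64, f64, f64);

fn dot(a: V3, b: V3) -> f64 { a.0 * b.0 + a.1 * b.1 + a.2 * b.2 }
fn scaled(a: V3, s: f64) -> V3 { (a.0 * s, a.1 * s, a.2 * s) }
fn sub(a: V3, b: V3) -> V3 { (a.0 - b.0, a.1 - b.1, a.2 - b.2) }

// r = -v + 2(n . v)n, where v points from the surface towards the ray origin
// and n is the surface normal. Both are assumed to be normalised.
fn reflect(v: V3, n: V3) -> V3 {
    sub(scaled(n, 2.0 * dot(n, v)), v)
}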
Lastly, the implementation of Intersect. A naive implementation iterates over
every object in the scene and intersects the ray with each of them, keeping track
of the nearest intersection. This can be improved by introducing an object
hierarchy of some kind, for example a BSP5 /k-d tree6 , or bounding volume
hierarchy7 .
The implementation of DoesIntersect takes the start point of the ray and a
non-normalised ray direction and returns a boolean indicating whether there
was an intersection along the length of the ray. It can also call separate
predicate intersection tests on primitives, as these can often be made faster than
methods that actually need to compute the intersection point.
This requires us to have intersection routines for each kind of primitive we
expect to see in the scene, which is what I will cover, amongst other things, in
this next section.
4 Note this is the negative of the direction of the ray. This is simply convention and makes
certain calculations a little more intuitive.
5 http://en.wikipedia.org/wiki/Binary_space_partitioning
6 http://en.wikipedia.org/wiki/K-d_tree
7 http://en.wikipedia.org/wiki/Bounding_volume_hierarchy
2.3
Shapes
For the sake of simplicity, I only implemented planes and spheres. With proper
abstraction, it should be easy to add more shapes to the renderer.
2.3.1
Planes
We will parameterise an infinite plane by its normal and perpendicular distance
from the origin. Or more formally, a point X lies on the plane with normal n
and perpendicular distance d when:
X · n = d
Also see Figure 2.3 for a diagram.
Figure 2.3: A plane and its parameters.
2.3.1.1
Intersection
Given a ray parameterised by R(t) = p + td (recall the definition from Section
1.2), we can compute the value of t at the intersection point between it and a
plane with the following:

(p + td) · n = d
p · n + t (d · n) = d
t = (d − p · n) / (d · n)

We then check whether t ≥ 0. We also need to handle the case where the ray is
parallel to the plane and doesn’t lie along it, i.e. d · n = 0 and p · n ≠ d.
Interestingly enough, we can choose to handle the degenerate case by not
handling it, in that a division by zero produces +∞. Note that the possibility of
it producing −∞ is ruled out by the t ≥ 0 test. This works because the
Intersect routine only keeps track of the nearest intersection.
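A sketch of this test in Rust; the tuple type and dot product are defined for the example, and the plane distance is named dist here to avoid the clash with the ray direction d.

type V3 = (f64, f64, f64);

fn dot(a: V3, b: V3) -> f64 { a.0 * b.0 + a.1 * b.1 + a.2 * b.2 }

// Smallest non-negative t at which the ray p + t*d crosses the plane
// x . n = dist, or None if there is no such t.
fn intersect_plane(p: V3, d: V3, n: V3, dist: f64) -> Option<f64> {
    let t = (dist - dot(p, n)) / dot(d, n);
    // A parallel ray gives an infinite or NaN t; reject those explicitly here
    // instead of relying on the nearest-intersection bookkeeping.
    if t.is_finite() && t >= 0.0 { Some(t) } else { None }
}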
2.3.1.2
Surface normal
The surface normal at any point on a plane is trivially n.
2.3.1.3
UV coordinates
Any two dimensional surface can have every position on it represented by a pair
of coordinates. These are conventionally called u and v, hence UV coordinates.
For a plane, the logical UV parameterisation is to introduce orthogonal u and v
axes in the plane, like x and y coordinates in 2D cartesian space. Therefore, to
compute the UV coordinates we should transform the plane so it lies parallel to
the XY plane. Then we take the x and y coordinates of our point as u and v
respectively.
One way to think about this is to picture the transformation as rotating the
plane so its normal becomes k8 . Therefore, we need a method to construct a
transformation matrix that represents the rotation required to line up a pair of
direction vectors, i.e. given a pair of unit vectors n and n′, we should construct
a matrix M that satisfies M n = n′. I will also denote the angle between them
as θ, although this is never explicitly computed.
The steps involved in finding this matrix are:
1. Take the cross product n × n′ and normalise it to obtain the axis of rotation,
a.
2. Compute n · n′ = cos θ.
3. Construct the matrix corresponding to a rotation of θ about a.
I have omitted the maths involved in building a rotation matrix for an arbitrary
axis as they are a standard result in computer graphics. Nevertheless, observe
that it only involves sin θ and cos θ and never just θ, so we can use the identity
sin2 θ + cos2 θ = 1 as an optimisation.
The first step degenerates when the supplied vectors are parallel. We should
explicitly handle two cases:
• n = n′: Return the identity matrix.
• n = −n′: Return the negative identity matrix.
8 i, j and k form the basis of the 3D cartesian coordinate system, being unit vectors in the
direction of the x, y and z axes respectively.
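One possible sketch of this construction in Rust, using the Rodrigues rotation formula; the Mat3 and vector helpers are defined inline for the example rather than taken from the renderer’s maths module.

type V3 = (f64, f64, f64);
type Mat3 = [[f64; 3]; 3];

fn dot(a: V3, b: V3) -> f64 { a.0 * b.0 + a.1 * b.1 + a.2 * b.2 }
fn cross(a: V3, b: V3) -> V3 {
    (a.1 * b.2 - a.2 * b.1, a.2 * b.0 - a.0 * b.2, a.0 * b.1 - a.1 * b.0)
}

// Rotation matrix M with M n = n2, for unit vectors n and n2, built from the
// Rodrigues formula R = cos(t) I + sin(t) [a]x + (1 - cos(t)) a a^T.
fn rotation_between(n: V3, n2: V3) -> Mat3 {
    let c = dot(n, n2); // cos(theta)
    if c > 1.0 - 1e-9 {
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    }
    if c < -1.0 + 1e-9 {
        return [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]];
    }
    let axis = cross(n, n2);
    let s = dot(axis, axis).sqrt(); // sin(theta), always positive here
    let (ax, ay, az) = (axis.0 / s, axis.1 / s, axis.2 / s);
    let k = 1.0 - c;
    [
        [c + ax * ax * k,      ax * ay * k - az * s, ax * az * k + ay * s],
        [ay * ax * k + az * s, c + ay * ay * k,      ay * az * k - ax * s],
        [az * ax * k - ay * s, az * ay * k + ax * s, c + az * az * k],
    ]
}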
2.3.2
Spheres
A sphere is parameterised by its centre c and its radius r. Again more formally,
a point X lies on the surface of the sphere if it satisfies:
|X − c| = r
2.3.2.1
Intersection
When intersecting with solid objects such as spheres, we treat the inside as
hollow, i.e. if a ray starts inside the object, we still consider the intersection
point to lie on the surface and not where t = 0. This makes it possible to
implement specular transmission9 .
Following the derivation from (Christer Ericson 2004, 177): if we substitute the
ray equation for X, square both sides, use the identity |u|² = u · u and let
m = p − c, we can derive a quadratic equation in t:

(m + td) · (m + td) = r²
(d · d) t² + 2 (m · d) t + (m · m) = r²
t² + 2 (m · d) t + (m · m) − r² = 0

where the last step uses the fact that d is normalised, so d · d = 1.
Let D be the discriminant D = b² − c, where b = m · d and c = (m · m) − r².
There are two cases we need to handle:

• D < 0: The quadratic has no real solutions, i.e. the ray missed the sphere.
• D ≥ 0: The roots of the equation are given by t = −b ± √(b² − c)10. We
want the smaller t that is still positive. If both roots are negative, the ray
started outside the sphere and was pointing away from it.
9 The refraction of light through a non-opaque surface.
10 As the quadratic is of the form x² + 2bx + c = 0, we can simplify the quadratic formula a little.
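A sketch of this test in Rust, assuming the ray direction has already been normalised so the t² coefficient is one, as in the derivation above; the tuple type and helpers are again just for the example.

type V3 = (f64, f64, f64);

fn dot(a: V3, b: V3) -> f64 { a.0 * b.0 + a.1 * b.1 + a.2 * b.2 }
fn sub(a: V3, b: V3) -> V3 { (a.0 - b.0, a.1 - b.1, a.2 - b.2) }

// Smallest non-negative t at which the ray p + t*d (d normalised) hits the
// sphere with centre c and radius r, or None if it misses.
fn intersect_sphere(p: V3, d: V3, c: V3, r: f64) -> Option<f64> {
    let m = sub(p, c);
    let b = dot(m, d);
    let discriminant = b * b - (dot(m, m) - r * r);
    if discriminant < 0.0 {
        return None; // the ray misses the sphere entirely
    }
    let root = discriminant.sqrt();
    let t0 = -b - root;
    let t1 = -b + root;
    if t0 >= 0.0 {
        Some(t0) // the ray starts outside and enters the sphere
    } else if t1 >= 0.0 {
        Some(t1) // the ray starts inside the sphere
    } else {
        None // the sphere is entirely behind the ray
    }
}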
2.3.2.2
Surface normal
If we make the assumption that the supplied point, X, lies on the surface, we
can skip the sqrt involved in computing the length of the vector by instead
taking the length to be r.
n = (X − c) / r

2.3.2.3
UV coordinates
A possible UV mapping for a sphere would be to convert cartesian coordinates
to spherical coordinates11 denoted by the tuple (r, θ, φ), but without r as it’s
unnecessary and with θ and φ scaled to lie in the interval [0, 1]. Again this is a
standard transformation so I won’t be including it here.
2.4
Anti-aliasing
If we implemented a ray tracer as described, we would end up with hard edges
between objects, shadows, reflections of objects and shadows, and so on. This is
ugly. In practice we apply a technique called anti-aliasing to reduce the artifacts
around edges.
Figure 2.4: A before and after image showing the effect of anti-aliasing in my
path tracing renderer (Chapter 5).
Anti-aliasing can be performed in many ways, with techniques such as fast
approximate anti-aliasing 12 (FXAA) and multisample anti-aliasing 13 (MSAA)
being the common approaches to anti-aliasing rasterised graphics.
With a physically based renderer, we can get more accurate anti-aliasing by
applying small random adjustments to the ray when we cast it from the camera
and into the scene. We repeat this process as many times as necessary and take
the mean value across all the samples. Again this is not the only way of
reducing aliasing, and is probably not even the best as discussed in Section 6.2.4.
11 http://en.wikipedia.org/wiki/Spherical_coordinate_system
12 http://en.wikipedia.org/wiki/Fast_approximate_anti-aliasing
13 http://en.wikipedia.org/wiki/Multisample_anti-aliasing
Chapter 3
Ray Tracing in CUDA
In this chapter, I will briefly explain the architecture behind CUDA and talk a
bit about the implementation of the ray tracer I wrote in CUDA. I will finish by
discussing some of the limitations I encountered when writing my
implementation, which is my justification for switching to CPU code for the
remainder of my project.
3.1
A brief overview of CUDA
CUDA is a massively parallel architecture that allows a subset of C/C++ code
to be run on a GPU. Since modern GPUs have many hundreds of processors,
they lend themselves well to executing algorithms which benefit from
parallelism without requiring much, or any, data to be shared between them.
In ray tracing, rays fired from the camera don’t rely on information from any
other rays which are also being cast into the scene. This gives us an obvious
and very clean separation of work between threads, in which we assign each
thread to have complete ownership of one or more pixels. This model can scale
to over a million processors depending on the output image’s resolution.
CUDA also supports enough language features to make implementing a ray
tracer possible. To be specific, the only feature we really need is a region of
device memory that all the threads can write colour data to, which is provided by
cudaMalloc. Having recursion – available since CUDA 3.1 – also makes
implementing the algorithms a little easier, but isn’t required as the algorithms
can be reworked to use tail recursive, and therefore non-recursive, calls without
much effort.
NVIDIA also provide a library for generating pseudo-random numbers in device
code called cuRAND1 . Pseudo-random numbers are necessary for sampling
1 https://developer.nvidia.com/curand
methods described in Chapter 4, and not having to implement our own
pseudo-random number source is a plus.
3.2
Implementing a ray tracer in CUDA
The first problem to overcome was how to deliver work to each thread. My
original idea was to have a pair of queues for each thread, with threads reading
work from one and adding work for the next frame to the other. This model
would have potentially come in handy when I moved onto writing a path tracing
renderer as it would allow me to spawn many secondary rays from each
intersection. Sadly, CUDA doesn’t allow dynamic allocations to be made in
GPU code which raised questions like “what should the maximum length of
each queue be?”. This would most likely have required a lot of trial and error to
answer optimally.
While searching for a replacement model for delivering work, I realised there is
no need to queue arbitrary amounts of work and we only need to spawn one
secondary ray for each intersection. Some examples of an algorithm like this can
be found in smallpt2 and (Dietger van Antwerpen 2010).
I started by using Sean Barrett’s stb_image_write.h3 for writing images, which
required me to copy pixel data from the GPU. Moving data across the main
memory/GPU divide is expensive, but as it only happened once it didn’t really
matter.
I wanted the renderer to be making samples continuously as long as it was
executing, constantly displaying its progress at regular intervals. I felt it would
be more interesting and would help me diagnose errors in my code. In theory
CUDA makes this easy to do by allowing you to write to a framebuffer object
(FBO) and then treating that as a texture to be painted on a quad across the
entire viewport with a single call. However, CUDA doesn’t allow you to work with an
FBO while it is bound in the graphics context, which means I would need to
write some kind of synchronisation code to avoid race conditions.
I wanted to avoid assigning too many pixels that are difficult to render to the
same thread. Depending on the exact implementation, this could have either
slowed the renderer down, with many idle threads waiting for the last busy
thread to complete its work, or led to a situation where some of the threads
were lagging behind the others, which may have led to artifacts being present
in the output image for longer than necessary.
I thought the most common cause of overworking a thread would be if I
assigned pixels to threads in stripes or blocks. If only a small, localised part of
the scene was difficult to render, this would have caused a small number of
threads to have the majority of the work.
2 http://www.kevinbeason.com/smallpt/
– specifically the forward.cpp modification.
3 http://nothings.org/stb/stb_image_write.h
To get around this, I decided to assign every Nth pixel to each thread. See
Figure 3.1 for an example of this mapping scheme with 25 threads.
Figure 3.1: Each pixel has been coloured according to which thread it was
assigned to. There are 25 separate threads.
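To sketch the mapping itself (written in Rust here rather than CUDA, purely for illustration): thread n out of N owns every pixel whose linear index is congruent to n modulo N.

// Linear indices of the pixels owned by thread `n` out of `num_threads`,
// using the interleaved mapping shown in Figure 3.1.
fn pixels_for_thread(n: usize, num_threads: usize,
                     width: usize, height: usize) -> Vec<usize> {
    (0..width * height).filter(|&i| i % num_threads == n).collect()
}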
The CUDA code for initialising the ray tracer and spawning the kernel threads
looks like:
sphere scene[] = { /* ... */ };
size_t spheres = sizeof( scene ) / sizeof( scene[ 0 ] );

sphere* d_scene;
job** d_work;

cudaMalloc( ( void** ) &d_work, sizeof( job* ) * N );
cudaMalloc( ( void** ) &d_scene, sizeof( scene ) );
cudaMemcpy( d_scene, &scene, sizeof( scene ),
    cudaMemcpyHostToDevice );

// initialise N work queues on the GPU
init_work<<< 1, 1 >>>( d_work, N );
// wait for init_work to terminate
cudaDeviceSynchronize();

// (omitted) initialise FBO
// set d_image to point to the FBO data

// spawn N worker threads
worker<<< 1, N >>>( d_work, d_scene, spheres, d_image );
cudaDeviceSynchronize();

// (omitted) destroy FBO

cudaFree( d_work );
cudaFree( d_scene );
3.3
Limitations of CUDA
In the end, I decided to stop using CUDA because I felt like I was wasting a lot
of time Googling to resolve API struggles in order to perform what I felt were
basic tasks.
Since I am more familiar with C than C++, and I was unsure how much of the
C++ object system CUDA actually supports, I wrote my CUDA code in more
of a C style. This led to code like:
__device__ vec vec_add( const vec u, const vec v ) {
    return ( vec ) {
        u.x + v.x,
        u.y + v.y,
        u.z + v.z,
    };
}
This is not ideal as operator overloading leads to neater code, but it wasn’t too
much of a problem in practice.
Another problem that I am aware of is that CUDA code is very hard to debug. It
requires some non-obvious switches4 on the compiler command line to enable
usage of printf, which made things a bit easier but figuring this out felt like
time that could be better spent.
The CUDA compiler also requires another command line switch5 before it will
allow you to supply more than one source file containing kernel code, which I
found confusing.
Another problem that stalled progress for a few days near the start of my
project was that my code would fail and give no indication of why. After some
searching and generous sprinkling of debugging code, I managed to coerce an
“Out of memory” error out of CUDA, which was only being thrown when I ran
my (non-allocating) kernel above a certain number of threads. After a few days
of not very enthusiastic debugging, I narrowed the problem down to the
following:
__global__ void worker( Queue* const queues, ... ) {
    Queue q = queues[ threadIdx.x ];
}
The C specification says that struct/fixed size array assignments are performed
by creating a new copy of the data. In my code, queues was a very large array.
I had incorrectly assumed when writing this that the compiler would have
skipped this copy since I was only ever reading from that queue. Instead this
was causing my GPU to run out of memory when I ran it.
The fix was to replace the second line with Queue& q = queues[ threadIdx.x ];.
4 -gencode arch=compute_30,code=\"sm_30,compute_30\"
5 -rdc=true
Careful readers will have noticed that I said “in theory” before saying CUDA
makes it easy to write to FBOs to use as textures. I have provided a screenshot
of the output of the final version of my CUDA renderer (Figure 3.2) to
elaborate on what I mean.
Figure 3.2: The straw that broke the camel’s back. My attempt at rendering
from FBOs in the CUDA implementation was unsuccessful.
Chapter 4
Path Tracing in Depth
In this chapter, I formalise the concept of rendering by introducing the
rendering equation. Next, I will introduce Monte Carlo integration as a method
for numerically solving integrals we can’t integrate directly. Finally, I shall
introduce Monte Carlo path tracing as one possible method for finding a
numerical solution to the rendering equation.
4.1
The rendering equation
The rendering equation was introduced as a way of mathematically expressing
irradiance at a point in space. It is based on the idea of the conservation of
energy in that light reflected off a point is equal to the light incident at a point,
minus some that is absorbed by the material.
Lo(x, ωo) = Le(x, ωo) + ∫S² Li(x, ωi) f(x, ωo, ωi) |cos θ| dωi
Where the meaning of the terms is as follows:

x: The point we are considering.
ωo: The direction of outgoing light we are considering from x.
Lo(x, ωo): The total light output at x in the direction of ωo.
Le(x, ωo): The emittance at x in the direction of ωo.
∫S² . . . dωi: The integral over all incident directions (ωi) on the 3D unit sphere.
f(x, ωo, ωi): The evaluation of the BRDF/BTDF at x for the given outgoing and
incident directions. These terms are used to model the physical response
of light to a surface and are explained in the following section.
|cos θ|: Recall Lambert’s cosine law from Section 2.2 and Figure 2.2. θ is the
angle between the surface normal and ωi. We take the absolute value of
the cosine so light arriving at the back of the surface also reflects/transmits
correctly.
4.1.1
BRDFs and BTDFs
The bidirectional reflectance distribution function (BRDF) of a material is a
function that describes how it reflects incoming light. The bidirectional
transmission distribution function (BTDF) describes how a surface transmits1
incoming light. They both behave similarly and when referring to a function
that could be one or the other, I am going to say BxDF.
These functions can be approximations to real materials, or they can be derived
from real world measurements. Some examples of the latter kind would be the
measured BRDFs from Cornell University’s graphics lab2 , or from the
Mitsubishi Electric Research Laboratories BRDF database3 .
It is worth mentioning that physically based BxDFs satisfy two properties:
• Reciprocity: if you swap the incident and outgoing light directions the
result is unchanged, i.e. for any BxDF f, f(x, ωi, ωo) = f(x, ωo, ωi) for all
ωi and ωo.
• Conservation of energy: the total energy of light reflected or transmitted
is less than or equal to the total energy of incident light, i.e. where θ is the
angle between ωi and the surface normal:

∫S² f(x, ωi, ωo) cos θ dωi ≤ 1
While perfectly diffuse and perfectly specular materials do not exist in the real
world, they are a good base to start from. Perfectly diffuse materials reflect all
incoming light evenly across the hemisphere aligned with the surface normal.
Perfectly specular materials reflect light according to perfect specular reflection
(recall Reflect from section 2.2). I shall provide implementations of BRDFs for
these materials later.
4.2
Monte Carlo integration
Monte Carlo integration is a numerical method for evaluating integrals that
would otherwise be difficult or impossible to solve exactly.
1 Recall
that specular transmission means refraction.
2 http://www.graphics.cornell.edu/online/measurements/reflectance/
3 http://www.merl.com/brdf/
Suppose we want to evaluate the integral ∫ab f(x) dx. I shall base the derivation
of the Monte Carlo estimator on (Matt Pharr 2010, 642). Start by assuming
we are able to draw samples Xi from a distribution with probability density
function (PDF) p(x). The possible values of the distribution should be bounded
and the PDF should be non-zero within those bounds, i.e. for some a and b:

x ∉ [a, b] ⇒ p(x) = 0
x ∈ [a, b] ⇒ p(x) > 0
Next, denote the Monte Carlo estimator given N samples as:

FN = (1/N) Σi=1..N f(Xi) / p(Xi)
We can show that the expected value of the estimator, E[FN], is equal to the
integral. Recall that the expected value of a function g of a random variable X
with PDF p is defined as:

E[g(X)] = ∫ g(x) p(x) dx
Next we show, using some standard results from probability, that the expected
value of the estimator is in fact equal to the integral:

E[FN] = E[ (1/N) Σi=1..N f(Xi)/p(Xi) ]
      = (1/N) Σi=1..N E[ f(Xi)/p(Xi) ]
      = (1/N) Σi=1..N ∫ab ( f(x)/p(x) ) p(x) dx
      = (1/N) Σi=1..N ∫ab f(x) dx
      = ∫ab f(x) dx
Although I am not going to show it here, the error in the Monte Carlo estimator
decreases at a rate of O(1/√N). For a proof of this, see (Eric Veach 1997, 39).
This means to decrease the error to one half of what it was, we need to take
four times as many samples, and so on.
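As a tiny worked example (not taken from the report), the sketch below estimates ∫01 x² dx = 1/3 with the estimator above, using the uniform PDF p(x) = 1 on [0, 1). The samples are supplied by the caller, so the sketch needs no random number crate; in a real renderer they would of course come from an RNG.

// Monte Carlo estimate of the integral of f over [0, 1), given samples drawn
// from the uniform distribution on [0, 1), i.e. p(x) = 1.
fn estimate<F: Fn(f64) -> f64>(f: F, samples: &[f64]) -> f64 {
    let p = 1.0; // uniform PDF on [0, 1)
    let sum: f64 = samples.iter().map(|&x| f(x) / p).sum();
    sum / samples.len() as f64
}

fn main() {
    // Deterministic stand-in for uniform random samples, good enough for a demo.
    let samples: Vec<f64> =
        (0..10_000).map(|i| (i as f64 + 0.5) / 10_000.0).collect();
    println!("estimate = {}, exact = {}", estimate(|x| x * x, &samples), 1.0 / 3.0);
}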
4.3
Monte Carlo path tracing
I have now covered everything required to outline an algorithm for Monte Carlo
path tracing. We will reuse the code for initialising the camera and focal plane
as described in chapter 2. We do, however, need to modify the Irradiance
function and the code for spawning initial rays. The code for the former now
looks like:
Irradiance( start, dir, depth ) {
    // don't loop forever
    if depth > 5 {
        return black
    }

    // hit is an object containing information
    // about the intersection
    let hit = Intersect( start, dir )
    if hit == null {
        return black
    }

    // specular reflection is now handled by the BRDF
    let ( outgoing, reflectance, pdf ) =
        SampleBxDF( hit.bxdf, hit.normal, -dir )

    let emittance = black
    if hit.light != null {
        emittance = Emittance( hit.light, hit.normal, -dir )
    }

    // no need for explicit light sampling
    return emittance +
        Irradiance( hit.pos + outgoing * EPSILON, outgoing, depth + 1 )
        * reflectance
        * DotProduct( hit.normal, outgoing )
        / pdf
}
The most interesting code lies in SampleBxDF. This function takes the surface
BxDF at the intersection point, the surface normal and the direction of outgoing
light – towards the point we are computing the irradiance for. It then returns a
tuple containing an outgoing ray to consider next, the evaluation of the BxDF for
that ray, and the probability density of choosing that outgoing ray.
Observe that the result of Irradiance looks similar to the Monte Carlo
estimator and the rendering equation.
Also note that Irradiance can be transformed into tail recursive code, and
therefore into iterative code. This makes it easily translatable to GPU code.
We require a maximum recursion depth and a corresponding depth parameter,
otherwise the Irradiance function would loop forever. Using a hard depth
limit like this does introduce error into the renderer, and I discuss how to
mitigate it in section 6.2.1.
I also mentioned that we need to change the code that spawns the initial rays
from the camera. We now need to keep track of the sum of evaluations of the
Monte Carlo estimator, and the number of samples made for each pixel. In
pseudocode:
let N = 0
while true {
    N += 1
    for x in [0, width) {
        for y in [0, height) {
            let p = CentreOfPixel( x, y )
            let ray = Normalise( p - o )
            totals[ x, y ] += Irradiance( o, ray, 0 )
            pixels[ x, y ] = totals[ x, y ] / N
        }
    }
}
4.3.1
Reflectance models
Here I will present the implementation of BRDFs for perfectly diffuse, or
Lambertian4 , material and perfectly specular material.
4.3.1.1
Lambertian reflection
The Lambertian reflectance model says that all incident light is scattered evenly.
Another way of putting this, is that the BRDF for a Lambertian material is
constant.
The naive implementation would be to return a uniform random direction from
the unit hemisphere5 around the surface normal. If, however, we recall
Lambert’s cosine law and the scaling by a cosine term in the rendering equation,
we can do better.
4 http://en.wikipedia.org/wiki/Lambertian_reflectance
5 The set of directions {(x, y, z) | x² + y² + z² = 1, z ≥ 0}.
Light incident at shallow angles will add very little to the total irradiance. We
can’t choose to ignore these contributions entirely as that would introduce error
into the resulting image. Instead, we can only consider such directions less
often, and the division by the PDF in the Monte Carlo estimator will ensure
that the expected value is unchanged. Specifically, we can sample the unit
hemisphere about the surface normal, weighted by the cosine of the outgoing
ray and the surface normal.
Instead of directly sampling the unit hemisphere, we will use an algorithm
called Malley’s method (Matt Pharr 2010, 668).
First, we uniformly sample the unit disk6 to obtain a point p. This is easiest to
do by sampling the disk in polar coordinates7 , then converting to cartesian.
Given a pair of uniform random samples a and b from the interval [0, 1), we can
compute p by doing:
let r = √a, θ = 2πb
p = (r cos θ, r sin θ)
Next, we project p upwards until it reaches the surface of the hemisphere
(Figure 4.1). We call this second point p′, which can be computed as follows:

p′ = ( px, py, √(1 − (px² + py²)) )
Figure 4.1: Malley’s method.
Since the unit hemisphere is aligned with the z axis, we need to transform the
hemisphere and our sample to be aligned with the surface normal. The
construction of the rotation matrix to do this is the same as in Section 2.3.1.3,
so I won’t include the derivation again here.
6 The set of points {(x, y) | x² + y² ≤ 1}.
7 http://en.wikipedia.org/wiki/Polar_coordinate_system
We can take the transformed p′ to be our ωi, so the last step required is to
evaluate the PDF, p(ωi). I won’t include the derivation here, but recalling again
from section 2.3.1.3 that k is the unit vector in the direction of the z axis, and
denoting the surface normal as n, we can evaluate it as follows:

p(ωi) = (1/π) (ωi · n)
      = (1/π) (p′ · k)
      = p′z / π
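Putting Malley’s method and this PDF together, a sketch in Rust might look like the following. The two uniform samples are passed in as arguments, and the returned direction lies in the hemisphere around the z axis; rotating it to sit around the surface normal is the separate step described above.

use std::f64::consts::PI;

// Cosine-weighted sample of the unit hemisphere about the z axis, built by
// uniformly sampling the unit disk and projecting upwards (Malley's method).
// a and b are uniform random samples from [0, 1). Returns the direction and
// its probability density cos(theta) / pi.
fn sample_cos_hemisphere(a: f64, b: f64) -> ((f64, f64, f64), f64) {
    let r = a.sqrt();
    let theta = 2.0 * PI * b;
    let (x, y) = (r * theta.cos(), r * theta.sin());
    let z = (1.0 - (x * x + y * y)).max(0.0).sqrt();
    let pdf = z / PI; // cos(theta) is just the z component here
    ((x, y, z), pdf)
}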
4.3.1.2
Specular reflection
The BRDF for perfect specular reflection is considerably simpler. We take ωi to
be the reflection – see Reflect in section 2.2 – about the surface normal, and
the probability density to be one.
One thing to note is that we don’t want Lambert’s cosine law to apply here, as
we don’t want reflective surfaces to get dimmer if we look at them from shallow
angles. In order to keep the Irradiance function simple, we divide the
reflectance returned by the specular BRDF by cos θ.
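A corresponding sketch for the perfect mirror, following the same (direction, reflectance, pdf) convention as SampleBxDF above. The tuple vector type is defined for the example, and the reflectance is a single grey value here rather than an RGB colour.

type V3 = (f64, f64, f64);

fn dot(a: V3, b: V3) -> f64 { a.0 * b.0 + a.1 * b.1 + a.2 * b.2 }
fn scaled(a: V3, s: f64) -> V3 { (a.0 * s, a.1 * s, a.2 * s) }
fn sub(a: V3, b: V3) -> V3 { (a.0 - b.0, a.1 - b.1, a.2 - b.2) }

// Perfect specular reflection: the only possible direction is the mirror
// reflection of wo about the normal, chosen with probability density one.
// Dividing the reflectance by cos(theta) cancels the cosine term that the
// integrator applies later.
fn sample_specular(normal: V3, wo: V3, reflectance: f64) -> (V3, f64, f64) {
    let wi = sub(scaled(normal, 2.0 * dot(normal, wo)), wo);
    let cos_theta = dot(normal, wi).abs();
    (wi, reflectance / cos_theta, 1.0)
}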
Chapter 5
Path Tracing in Rust
As mentioned previously, I decided to switch to CPU code for my path tracing
implementation. The language I chose to use for this was Rust1 . In this chapter
I will briefly describe why I think Rust was a good choice, and then discuss
details of my implementation.
From here on I will refer to objects in a scene as entities, to prevent confusion
between them and objects in the OOP sense.
5.1
A brief overview of Rust
Rust is a systems language being developed by Mozilla2 . It is currently still
considered unstable, with backwards incompatible changes happening fairly
often. This isn’t such a big problem as recent versions have been reliable enough
that I don’t feel the need to use nightly builds.
Rust offers pattern matching and a type system that sits between C and Haskell,
which makes it pleasant to program in. It also has a CSP-like3 concurrency
model built in to the language. Having shared memory is still possible, but you
have to sacrifice some of the compiler’s safety guarantees in code that uses it.
It is worth mentioning that these abstractions can also be mapped onto features
available in GPU code.
1 http://www.rust-lang.org/
2 http://www.mozilla.org/
3 Rust
favours channels and message passing over shared memory.
5.2
Implementing a path tracer in Rust
I will begin with a table describing the file system layout of my project. I have
made a list of notable files and folders and their purposes in Table 5.1. The
design is largely inspired by the renderer described in (Matt Pharr 2010).

Table 5.1: An overview of the modular separation of my renderer.

Path              Purpose
lights            Defines the Light trait. Provides an implementation of area lights.
materials         Defines the Material and BxDF traits. Provides implementations of matte, specular and checkerboard materials. Also provides Lambertian and specular BRDFs.
maths             Implementation of vectors, rotation transformations, RGB colours, degrees/radians abstractions and generic interpolation.
maths/sampling    Sampling routines for the triangular distribution (section 5.2.2), unit disk and unit hemisphere.
shapes            Defines the Shape trait. Provides sphere and plane implementations.
worlds            Defines the World trait. Worlds are objects containing a set of entities and expose methods for intersecting rays with the scene. Provides implementations of a naive world and the union of two worlds.
entity.rs         Combines area lights, materials and shapes to represent real objects in the scene.
main.rs           Spawns the rendering threads and drawing loop. Contains the Monte Carlo path tracing implementation.
5.2.1
Materials
In addition to implementing matte and specular materials, I thought it would
be interesting to add a checkerboard material. You can see an example of what
it looks like on the cover of this report.
I wanted the checkerboard to be more flexible than just allowing two different
colours, so I made it possible to use any pair of materials for the checks. This is
what made it possible to have alternating matte and mirrored checks like on the
cover.
The interesting section of code for the checkerboard material looks like:
impl Material for Checkerboard {
    fn get_bxdf( &self, u : f64, v : f64 ) -> ~BxDF {
        // ten checks for every unit distance
        let u10 = ( u * 10.0 ).floor() as i64;
        let v10 = ( v * 10.0 ).floor() as i64;

        // compute UV coordinates within the check
        let uc = u * 10.0 - u10 as f64;
        let vc = v * 10.0 - v10 as f64;

        // odd and even parity use separate materials
        if ( u10 + v10 ) % 2 == 0 {
            return self.mat1.get_bxdf( uc, vc );
        }
        else {
            return self.mat2.get_bxdf( uc, vc );
        }
    }
}
I also thought my implementation of the Lambertian BRDF was interesting and
it shows off Rust’s type inference and support for tuples.
impl BxDF for Lambertian {
    fn sample_f( &self, normal : Vec3, _ : Vec3 )
        -> ( Vec3, RGB, f64 ) {
        let rot = Rotation::between( Vec3::k(), normal );
        let sample = hemisphere::sample_cos();
        let out = rot.apply( sample );
        let pdf = sample.z * Float::frac_1_pi();

        return ( out, self.reflectance, pdf );
    }
}
Rotation::between gives the rotation matrix that represents the rotation from
one normal vector to another (section 2.3.1.3). sample is a random direction in
the unit hemisphere aligned with the z axis, chosen by Malley’s method (section
4.3.1.1). pdf holds the probability density for the chosen direction.
5.2.2
Sampling
The only thing I haven’t already explained in the sampling directory is the
triangular distribution4 sampling code. This module provides a function that
4 http://en.wikipedia.org/wiki/Triangular_distribution
returns a value in the interval (−1, 1) distributed by a symmetric triangular
PDF with a mean of zero.
This module is used to generate random pixel offsets for anti-aliasing and in my
code I have called it tent.
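The report does not show the construction tent uses, but the remark in section 6.2.4 about a pair of sqrts per primary ray suggests an inverse-CDF sampler along these lines; treat this as an illustrative sketch rather than the actual module.

// Symmetric triangular ("tent") sample on (-1, 1) with mean zero, obtained by
// inverting the triangular CDF. u is a uniform random sample from [0, 1).
fn tent_sample(u: f64) -> f64 {
    let r = 2.0 * u;
    if r < 1.0 {
        r.sqrt() - 1.0
    } else {
        1.0 - (2.0 - r).sqrt()
    }
}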
5.2.3
Shapes
Objects that implement the Shape trait are required to provide three methods,
all corresponding to operations defined in Section 2.3. The interface is defined
as:
pub trait Shape {
    // return smallest positive t, or None if no intersection
    fn intersection( &self, start : Vec3, dir : Vec3 )
        -> Option< f64 >;

    // return the surface normal at point
    fn normal( &self, point : Vec3 ) -> Vec3;

    // return the UV coordinates of a point
    fn uv( &self, point : Vec3 ) -> ( f64, f64 );
}
Rust does not have a null value. Instead, it uses the Option type to annotate a
value with the fact that it might not exist. This is just like Maybe from Haskell.
The intersection method can therefore either return None to represent no
intersection, or Some( t ) where t corresponds to the t in the ray equation.
5.2.4
Worlds
A World is a representation of a scene, containing information about the entities
and their emissive properties within it. It also exposes a method for generating
intersections with entities in the scene, which returns an object containing
information about the intersection. Specifically, the Intersection object
contains the value of t for the intersection, the entity the ray collided with, and
the result of evaluating the ray equation, i.e. the intersection point.
I wrote two implementations of World in my renderer. One called SimpleWorld
which naively intersects rays with every entity in the scene, and UnionWorld
which takes a pair of worlds and intersects rays with both of them.
The latter is more interesting, so here is what the UnionWorld implementation
looks like:
impl World for UnionWorld {
    fn intersection< 'a >( &'a self, start : Vec3, dir : Vec3 )
        -> Option< Intersection< 'a > > {
        let oi1 = self.w1.intersection( start, dir );
        let oi2 = self.w2.intersection( start, dir );

        match ( oi1, oi2 ) {
            ( Some( i1 ), Some( i2 ) ) => {
                if i1.t < i2.t {
                    return Some( i1 );
                }
                return Some( i2 );
            }
            ( x, None ) => x,
            ( None, x ) => x
        }
    }
}
In Rust, 'a refers to a lifetime variable 5 . The Rust compiler uses lifetime
variables to prove that references which are in scope will never point to recycled
areas of memory. I’m using them here because Intersection contains a
pointer to the entity it intersected with, and Rust requires the annotations so it
can infer that the entity will never be deallocated before all Intersection
objects become unreachable.
5.2.5
Piecing everything together
All of the above comes together in main.rs. I will start by presenting the code
for spawning the rendering threads:
for n in range( 0, THREADS ) {
    // (omitted) increment Arcs
    spawn( proc() {
        // (omitted) pull data out of Arcs
        sampler( eye, pixels[ n ], centres[ n ],
            world, up_pixel, left_pixel );
    } );
}
The above code uses several variables which are defined elsewhere. They are as
follows:
5 http://static.rust-lang.org/doc/master/guide-lifetimes.html
eye: The camera’s focal point.
pixels: Maps threads to the pixel indices they should be drawing.
centres: For each pixel in pixels, the corresponding element in centres contains the centre of that pixel.
world: An instance of World.
up_pixel, left_pixel: These are vectors holding the distance between vertically and horizontally adjacent pixels on the focal plane. We use them for
anti-aliasing.
Since I am sharing data between threads, the Rust compiler requires that each
thread increments an atomic reference counter (Arc6 ) at the start, and
decrements it when the thread terminates. This is so Rust knows when it can
free those objects without resorting to a garbage collector.
The purpose of the sampler function is to take irradiance samples for each pixel
it controls and update the output image data accordingly.
fn sampler( eye : Vec3, pixels : &[ uint ], centres : &[ Vec3 ],
        world : &World, up_pixel : Vec3, left_pixel : Vec3 ) {
    let mut samples = 0;

    loop {
        samples += 1;

        for i in range( 0, pixels.len() ) {
            let dx = left_pixel * tent::sample();
            let dy = up_pixel * tent::sample();
            let ray =
                ( centres[ i ] + dx + dy - eye ).normalised();
            let color = irradiance( world, eye, ray, 0 );

            unsafe {
                let p = pixels[ i ];
                image[ p ] =
                    ( samples, image[ p ].val1() + color );
            }
        }

        if samples % 10 == 0 {
            println!( "{}", samples );
        }
    }
}
6 http://static.rust-lang.org/doc/0.10/sync/struct.Arc.html
Recall that tent::sample() is sampling a triangular distribution. It is also
worth mentioning that Rust implicitly creates separate random number
generators for each thread7 , so generating samples is thread-safe and isn’t
slowed down by any synchronisation code.
In the above code, image is a shared array of tuples holding the number of
samples taken for each pixel and the sum of said samples. Rust mandates that
writing to shared variables is placed inside an unsafe block, which relaxes
restrictions the compiler places on your code. Inside this block I also use the
val1() method on pairs, which extracts the second element of the pair, i.e. the
accumulated colour.
All that remains is the implementation of irradiance:
fn irradiance( world : &World, start : Vec3, dir : Vec3,
        depth : uint ) -> RGB {
    if depth > 5 {
        return RGB::black();
    }

    let ois = world.intersection( start, dir );

    return ois.map_or( RGB::black(), | is | {
        let normal = is.other.shape.normal( is.pos );
        let ( u, v ) = is.other.shape.uv( is.pos );
        let bxdf = is.other.material.get_bxdf( u, v );

        let ( outgoing, reflectance, pdf ) =
            bxdf.sample_f( normal, -dir );

        let emittance = /* (omitted) */ ;

        let throughput = abs( normal.dot( outgoing ) ) / pdf;

        return emittance + (
            reflectance * irradiance( world,
                is.pos, outgoing, depth + 1 )
        ).scale( throughput );
    } );
}
In Rust, map_or is a method on Option values that behaves like maybe in
Haskell. It takes a default value and a function, where the default is returned
immediately if the Option value is None. Otherwise, the function is applied to
the value inside the Some and the result is returned.
I omitted the right hand side of emittance because nested map_ors are noisy to
read and it’s unnecessarily complex for a code fragment.
7 http://static.rust-lang.org/doc/0.10/rand/fn.random.html
Note that outgoing rays still need to be offset by a small epsilon term. In my
code I moved this inside the intersection method on World.
Chapter 6
Conclusion
In this chapter I will evaluate the performance of my renderer and show the
effects of adjusting various parameters. I will also discuss briefly how I feel this
project is related to my course. Finally, I will talk about the ideas I would like
to have followed through on if I had more time.
6.1
Performance and evaluation
Even though it is easy to predict how my renderer will perform, it is still helpful
to provide empirical data so we can analyse it more formally.
6.1.1
Linearity with number of threads
Since there are no dependencies between threads, we would expect the time to
perform a fixed number of samples per pixel to be inversely proportional to the
number of running threads. As my desktop has four CPU threads, I have made
measurements for one, two, four and eight threads with 50 samples per pixel.
Beyond four threads, we would expect to see performance plateau, and then
drop off as context switching starts to dominate.
Table 6.1: Measuring how varying the number of threads affects the time
required to take 50 samples per pixel. We can see that performance scales more
or less linearly until I run out of hardware threads.

Threads    Time     Time × threads
1          73.0s    73.0
2          37.4s    74.8
4          20.2s    80.8
8          20.0s    160

6.1.2
Noise reduction with number of samples
When you watch the renderer running, there is a very obvious correlation
between how many samples the renderer takes and the output image quality.
Images with a low number of samples per pixel will have high variance, which
manifests itself as noise. As a demonstration of this, I have taken screenshots of
the renderer after 10, 100 and 500 samples per pixel (Figures 6.1, 6.2 and 6.3
respectively). Additionally, the image on the cover was taken after 25,000
samples per pixel. The signature graininess of path tracing renderers is very
visible in the first two images.
Figure 6.1: A screenshot taken after 10 samples per pixel. We can barely make
out what the scene is supposed to be.
Figure 6.2: A screenshot taken after 100 samples per pixel. We have a much
better idea of what the scene looks like than with 10 samples.
Figure 6.3: A screenshot taken after 500 samples per pixel. The graininess is
still noticeable, but it is clearly an improvement over 100 samples.
6.1.3
Adjusting the recursion limit
If we decrease the maximum recursion depth of my renderer, we can expect to
see a performance increase in terms of number of samples made, however, the
image quality will decline as paths struggle to find a light source. We will also
introduce error into specular reflections between objects as a result of paths
being truncated prematurely.
I have saved the output image after 200 samples per pixel and with depth limits
of five, three and two (Figures 6.4, 6.5 and 6.6 respectively). I have also measured the time
it took to finish the renders. We would expect the time taken to be proportional
to the depth limit.
Table 6.2: The time is not quite increasing linearly with depth. I expect it
is slightly better than linear because anti-aliasing makes spawning
the initial rays more expensive.

Depth    Time    Time ÷ depth
2        46s     23
3        59s     20
5        85s     17
Figure 6.4: Taken with a depth limit of five. We can see the spheres reflecting
each other many times.
Figure 6.5: Depth limit three. The image is darker as paths struggle to find a
light source and we have lost some of the reflections.
Figure 6.6: Depth limit two. The image is darker still and the spheres’ reflections
of each other and in the floor are now just black circles. We have also almost
entirely lost the colour bleeding on the back wall.
6.1.4
Tweaking the light parameters
Adjusting the size of the light will affect the convergence rate of the path
tracing algorithm as it will change the likelihood that an individual path
reaches the light source. If we increase the size of the light and decrease its
emittance so its overall power1 does not change, we can expect the image to be
more or less the same but converge more quickly.
1 It’s
surprisingly difficult to find an exact formula for power, but it is proportional to the
surface area of the light.
For Figure 6.7, I have rendered a scene where the light’s radius has been
doubled and its emittance has been quartered. This render has 500 samples per
pixel so it is comparable with Figure 6.3.
Figure 6.7: The same number of samples as Figure 6.3, but with a bigger light.
Notice how the image is less grainy, but otherwise very similar.
It can be seen from the above analysis in this section that my renderer performs
as expected when we tweak various parameters, which makes me feel confident
that my implementation is correct.
6.2
Future work
My renderer is neither perfect nor complete. If I had more time to work on it, I
would like to have implemented the following, roughly in order of priority.
6.2.1
Russian Roulette path termination
Recall how I mentioned in section 4.3 that the hard recursion limit was
introducing error into the renderer. We can avoid this error by using a
technique known as Russian roulette. Before recursing, we randomly truncate
paths with probability based on their throughput, and scale the throughput of
surviving paths accordingly. While this does increase the variance of the
estimator, it allows us to stop spending time on paths that don’t contribute
much. Therefore, in theory we can make more samples in a fixed period of time.
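A sketch of how such a termination test might slot into the irradiance function above. The survival probability used here, the path throughput clamped to [0.05, 1], is an illustrative choice rather than something taken from the report.

// Russian roulette: randomly terminate the path, and scale the throughput of
// the survivors so that the estimator stays unbiased. u is a uniform random
// sample from [0, 1).
fn russian_roulette(throughput: f64, u: f64) -> Option<f64> {
    let survive_probability = throughput.clamp(0.05, 1.0);
    if u >= survive_probability {
        None // terminate the path
    } else {
        Some(throughput / survive_probability) // boost the survivor
    }
}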
6.2.2
Materials
Only having perfectly diffuse and specular surfaces is rather limiting. My
renderer would be much more flexible if it supported more reflection models and
could load images from disk to use as textures. It would also be useful to have
methods for combining surface properties to form reflection models that lie
between what has been explicitly implemented.
6.2.3
Acceleration structures
The naive SimpleWorld implementation doesn’t scale to large scenes that
contain many entities and lights. As mentioned in section 2.2, we can improve
on this by splitting the scene up.
I would have liked to have implemented a decoder for the Quake 3 BSP2 format.
Quake 3 levels are distributed with a pre-built BSP tree as part of the file
format, and implementing ray intersections against a BSP tree is simple
(Christer Ericson 2004, 376).
6.2.4
Do away with pixels
The current implementation of anti-aliasing requires a pair of sqrts for every
primary ray. Additionally, if a ray is offset to lie at the half-way point between
two pixels, it should be able to contribute to both equally and not just to the
pixel it was cast for.
Both of these problems are solvable by sampling the focal plane continuously
and combining all nearby samples to compute final pixel colours. This would
improve the rate of convergence of my renderer, and would also make it easy to
decouple the reconstruction filter from the ray tracing implementation.
6.2.5
Better integration methods
There has been a lot of research on speeding up the path tracing algorithm
without introducing error. These methods include:
Bidirectional path tracing: In addition to shooting rays from the camera,
we also shoot rays originating from the light sources in the scene. We
then join the vertices of both paths to form many more that definitely
contribute light to the sample. See (Eric Veach 1997, 297).
Metropolis Light Transport3: This is the application of the Metropolis-Hastings algorithm4 to path tracing. Roughly, it works by applying small
adjustments to promising paths in the hope that it leads to more paths
with high throughput.
6.3
Final thoughts
I feel like I learned a lot whilst working on my project, in both computer
graphics algorithms and new languages. I believe that my renderer is a good
first step towards writing a more complete renderer. In addition, the design
decisions I made will make it easy to extend and study in the future. I also feel
I was successful in leaving open the option of converting my renderer back to
GPU code.
2 http://www.mralligator.com/q3/
4 http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
Rust was an interesting language to learn because of its non-standard approach
to memory management and powerful type system. I hope to find more projects
that I can use it in.
My project draws material from both the Computer Graphics and Computer
Architecture courses, and builds on the things I learned in those courses. In the
Computer Graphics course, we learned about the ray tracing algorithm and
reflectance models, both of which I have applied and expanded on in my project.
The Computer Architecture course covers GPGPU programming, which sparked
my original idea for this project.
In this report, with the guidance of Dr Joe Pitt-Francis, I have described
algorithms for producing photorealistic images and provided an example
implementation of such a renderer. In addition, I have laid the groundwork for
myself to continue expanding my knowledge in this field.
References

Anton Kaplanyan. 2009. “Light Propagation Volumes in CryEngine 3.”
http://www.crytek.com/download/Light_Propagation_Volumes.pdf.

Christer Ericson. 2004. “Real-Time Collision Detection.” Morgan Kaufmann.

Dietger van Antwerpen. 2010. “Unbiased Physically Based Rendering on the GPU.”
http://repository.tudelft.nl/view/ir/uuid%3A4a5be464-dc52-4bd0-9ede-faefdaff8be6/.

Eric Veach. 1997. “Robust Monte Carlo Methods For Light Transport Simulation.”
http://window.stanford.edu/papers/veach_thesis/thesis.pdf.

Matt Pharr, Greg Humphreys. 2010. “Physically Based Rendering; Second Edition.”
Morgan Kaufmann.