The Developer’s Cry

Yet another blog by a hobbyist programmer

The Real Slim Shady: A Basic OpenGL Shader

One of the most fun things you can do with a computer is playing games. For the uninitiated, it is kind of mind-boggling what technology goes into a game, even a simple one. One of the most basic things game code must do is put graphics onto the screen. Historically, graphics programming has been difficult: it involved talking directly to the hardware and knowing how video hardware works. Unfortunately, modern computing has not made this any easier. Video hardware certainly is more capable now and it lets you do amazing things, but simply getting an image onto the screen at all is still pretty hard.

Since I’m currently on a rather old computer, I’m going to stick with OpenGL version 3.3. One advantage of using an older version is that it will run on most systems. Programming for the latest and greatest Vulkan certainly is cool, but since we’re not exactly making Quake 9, OpenGL 3.3 is a fair choice for what we want to accomplish. Creating an OpenGL context [with core profile] is platform-specific business. In SDL you would do this, just before creating the window:

// use OpenGL core version 3.3
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 3);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);
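
After setting these attributes, create the window with the SDL_WINDOW_OPENGL flag, followed by the context itself. A minimal sketch (the window title and size here are made up for illustration):

SDL_Window *window = SDL_CreateWindow("shader demo",
    SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
    800, 600, SDL_WINDOW_OPENGL);
SDL_GLContext context = SDL_GL_CreateContext(window);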

GPU hardware

There is a lot that can be said about how modern graphics hardware works. In short, the GPU is a massively parallel co-processor with hundreds, even thousands of cores. These cores all execute the same code in lock-step, so you can’t use them to run different threads like you would on a regular CPU. They can be used, however, for parallel computing and data processing. GPUs specialize in two things: 3D math and pushing pixels. Therefore shader programs typically consist of two or more shader modules that do exactly that: a vertex shader calculates 3D transformations, while a fragment shader computes the color of a fragment. That fragment ultimately ends up as a lit pixel on screen.

You may also have a geometry shader, which generates new geometry on the fly. This is great, for example, for rendering waving grass. OpenGL 4 adds two more shader stages especially for tessellation: enhancing detail in geometry.

The GPU uses its own instruction set, and therefore has its own C-like programming language: the OpenGL Shading Language (GLSL). GLSL dates all the way back to 2004, but do mind that each version of GLSL has small differences. Also, GLSL on OpenGL ES (for smartphones, tablets, the Raspberry Pi and the like) differs a little in the details.

Now, although you can do computation on a GPU, it is first and foremost a data processor. By this I mean that you have a data set, consisting of many 3D vertices, and push it through the GPU to produce an image. So on every frame of the game, we upload the vertex data into the GPU’s memory and tell it to render. Models that are static can be cached, naturally, but it is up to you how to handle this.

Vertex Array And Buffer Objects

OpenGL works with objects, so first create a Vertex Array Object (VAO). These objects are not to be confused with “models”; think of a VAO as a descriptor for memory layout, really. The VAO is coupled to a Vertex Buffer Object (VBO), which is a block of memory containing all vertex data: 3D world coordinates, texture coordinates, vertex color. You are free to choose the layout; again, the VBO is just a raw block of memory, and the layout is described by the VAO. Finally, the VAO may also reference an Index Buffer Object (IBO) containing index data for indexed drawing.

GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
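
If you want to draw with indices via glDrawElements(), this is also the place to generate an IBO. A minimal sketch (note that our square example below does not use one):

GLuint ibo;
glGenBuffers(1, &ibo);
// the element buffer binding is recorded in the currently bound VAO
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);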

Since the VAO only describes a layout, you may well render the entire scene using only a single VAO and a single VBO. This is certainly an efficient way to do it.

For example, I could create a buffer in which each vertex consists of a 3D position (x, y, z), texture coordinates (u, v) and an RGBA color: nine floats per vertex. We can put these floats all together in a single buffer:

// here a single triangle strip to make a 2D square
constexpr GLfloat f = 0.5f;
static const GLfloat data[] = {
    // 0:vertex        1:texcoord   2:color
    -f, -f,  0,        0, 0,        1, 1, 1, 1,
     f, -f,  0,        1, 0,        1, 1, 1, 1,
    -f,  f,  0,        0, 1,        1, 1, 1, 1,
     f,  f,  0,        1, 1,        1, 1, 1, 1
};

glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_STATIC_DRAW);

Note that in this example I use per-vertex coloring. You can easily leave the colors out of the vertex data and use a single uniform variable instead, if you wish to do so.

This is just a very static example, so we pass the hint GL_STATIC_DRAW. For data that changes between frames (as is often the case in games), it’s better to pass GL_DYNAMIC_DRAW.
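
Updating such a dynamic buffer simply means uploading it again. A sketch, reusing the vbo and data from above:

// upload the current vertex data again, e.g. once per frame
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_DYNAMIC_DRAW);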

Now, we were talking about shaders. We want the world coordinates to be fed into the vertex shader, while the texture coordinates and color should go into the fragment shader. You can’t do it exactly like that, though. Instead, you feed all data into the shader program, where it arrives in the vertex shader. The vertex shader may then pass it on to the fragment shader. It works this way because of the graphics pipeline: data flows through the shader stages in order.

How do we tie vertex data to shader variables? This is done by location. You can either hardcode the location number in the shader with the layout(location) syntax, or you can call glGetAttribLocation(). That location must be programmed into the VAO for it to work. So the VAO describes what is in the memory block, and it also describes where that data should go as input for the shader.

glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 9 * sizeof(GLfloat), (GLvoid*)0);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 9 * sizeof(GLfloat), (GLvoid*)(3 * sizeof(GLfloat)));
glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 4, GL_FLOAT, GL_FALSE, 9 * sizeof(GLfloat), (GLvoid*)(5 * sizeof(GLfloat)));

Note the numbers 0, 1, 2: these are the locations in this code. The numbers 3, 2, 4 are the number of GLfloats each attribute consists of. The stride is nine floats in this case, and the last argument is the byte offset of the attribute within a vertex.
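
If you prefer not to hardcode the location numbers, you can query them by name instead. A sketch, assuming program_id is the linked shader program created further below:

// query the attribute location rather than hardcoding it
GLint loc = glGetAttribLocation(program_id, "in_vertex");
if (loc >= 0) {
    glEnableVertexAttribArray((GLuint)loc);
    glVertexAttribPointer((GLuint)loc, 3, GL_FLOAT, GL_FALSE, 9 * sizeof(GLfloat), (GLvoid*)0);
}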

Finally, unbind the VAO to “save” it, and bind it again whenever you use it. OpenGL will now know that this VAO and VBO belong together.

glBindVertexArray(0);

The data in the VBO may be updated as you wish. Bind the VBO and issue a call to glBufferData() to upload the data into video RAM. Note that such a transfer is typically done via DMA, without the CPU having to copy the data byte by byte. The data can finally be drawn (visualized) by calling glDrawArrays():

// buflen div 9 ... because x,y,z + u,v + rgba
glDrawArrays(GL_TRIANGLE_STRIP, 0, buflen / 9);

For optimal performance, you should minimize the number of draw calls. The trick is to put as many 3D objects into a single VBO as you can, and draw all of them using a single call to glDrawArrays().

Now we are drawing something, but the screen remains black. After all the work we did, this is really only half the story. In order to get things displayed, we need to implement the vertex and fragment shaders.

The Vertex Shader

We are finally getting to write shader code.

The purpose of the vertex shader is to calculate the “device” position of a vertex. The output is in normalized device coordinates. For example, you have a point in 3D space in world coordinates. This point is transformed by the camera view. Next, a projection is applied so that the point may be rendered in the right position on screen.

OpenGL 3 deprecated all the nice matrix transformation functions, so now we have to do a lot of that work ourselves. The transform itself is rather simple (see the code snippet), but you have to set up the matrices yourself. Exactly how to do that is not explained here.

The GLSL source code is put into the C/C++ program as a string constant. You may also choose to load it into a char buffer from a file.

const char *vertex_glsl = "#version 330 core\n  \
                                                \
layout(location = 0) in vec3 in_vertex;         \
layout(location = 1) in vec2 in_texcoord;       \
layout(location = 2) in vec4 in_color;          \
out vec2 tex_coord;                             \
out vec4 color;                                 \
                                                \
uniform mat4 projection = mat4(1.0);            \
uniform mat4 view = mat4(1.0);                  \
uniform mat4 model = mat4(1.0);                 \
                                                \
void main() {                                   \
    gl_Position = projection * view * model * vec4(in_vertex, 1.0);   \
    tex_coord = in_texcoord;                    \
    color = in_color;                           \
}\n                                             \
";

We explicitly request GLSL version 3.30 core. There are many versions of OpenGL and GLSL, and each is a little different. By sticking with an older version, it is safe to say that this code will run fine on older hardware. The version directive ends with an explicit newline character.

Next, there are some in variables, which are inputs to the shader. The layout(location = #) syntax assigns a location number to each attribute. We pass the same location number in the call to glVertexAttribPointer(), which basically ties that part of the vertex buffer object (VBO) memory to the input variable. See also the section above, about vertex array and buffer objects.

An out variable is output of the shader. A vertex shader only works with positions and has little use for colors. Therefore we pass the color, as well as the texture coordinates, on to the next stage (the fragment shader) by assigning to out variables.

Next we have some uniform matrices. A uniform is a variable that is a shared constant across all GPU cores. A mat4 matrix can be set with a call to glUniformMatrix4fv(), but only after the shader program has been made current with glUseProgram() (read below).
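
A sketch of setting one of these matrices from the C/C++ side, assuming program_id is the linked shader program from the section below:

// upload an identity matrix into the 'model' uniform
static const GLfloat identity[16] = {
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1
};
glUseProgram(program_id);
glUniformMatrix4fv(glGetUniformLocation(program_id, "model"), 1, GL_FALSE, identity);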

Note that gl_Position is a built-in output variable. Unlike in C, the main() function does not return a value.

The Fragment Shader

The fragment shader calculates fragment colors. Think of a fragment as a single pixel of an image. Like the vertex shader, the fragment shader is a small piece of GLSL code.

const char *fragment_glsl = "#version 330 core\n  \
                                                  \
in vec2 tex_coord;                                \
in vec4 color;                                    \
out vec4 result;                                  \
                                                  \
uniform sampler2D tex_unit0;                      \
                                                  \
void main() {                                     \
    result = texture(tex_unit0, tex_coord) * color;  \
}\n                                               \
";

The fragment shader takes texture coordinates and color as input. These are passed in from the previous stage, the vertex shader. The output is an RGBA color value.

A texture unit is used so that we can do texture mapping. Set the variable tex_unit0 to 0, meaning the first texture unit. Note the use of the texture() function, and take note of how easy it would be to do multi-texturing.
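
A sketch of wiring this up on the C/C++ side; texture_id stands in for a texture object you have created and uploaded elsewhere:

// bind a texture to texture unit 0, and point the sampler at unit 0
glUseProgram(program_id);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture_id);
glUniform1i(glGetUniformLocation(program_id, "tex_unit0"), 0);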

Oddly enough, the fragment shader has no built-in output variable. Previous versions of GLSL had gl_FragColor, but it is deprecated in GLSL 3.30. Instead, you declare your own out variable; when there is only one, it is automatically bound to output location 0: the fragment’s color.

The Shader Program

GLSL code is compiled and linked just like regular C code, but unlike regular C code there is no command-line compiler. Instead, we call an OpenGL function that compiles the code for us. The convenience of this is that it will compile and run on any kind of hardware that supports this version of OpenGL, without requiring additional software to be installed.

The shaders can be compiled once you have an OpenGL context. To be clear, the “compiler” must be called from your C/C++ program:

GLuint vertex_id = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vertex_id, 1, &vertex_glsl, nullptr);
glCompileShader(vertex_id);

Well, that was easy. Do the same thing for the fragment shader; just replace “vertex” with “fragment”. Always check whether the compile succeeded:

GLint status = GL_FALSE;
glGetShaderiv(vertex_id, GL_COMPILE_STATUS, &status);

// fetch the info log; len receives its length, excluding the terminator
char buf[256];
GLsizei len = 0;
glGetShaderInfoLog(vertex_id, sizeof(buf), &len, buf);
buf[len] = 0;

if (status != GL_TRUE) {
    error("failed to compile vertex shader:\n%s", buf);
}

OK, we succeeded in compiling the vertex and fragment shader. This does not make a shader program yet, however. We must link the two shader objects together to form the shader program:

GLuint program_id = glCreateProgram();
glAttachShader(program_id, vertex_id);
glAttachShader(program_id, fragment_id);
glLinkProgram(program_id);

Not too difficult. Always check whether linking succeeded:

GLint status = GL_FALSE;
glGetProgramiv(program_id, GL_LINK_STATUS, &status);

// fetch the info log; buflen receives its length, excluding the terminator
char buf[256];
GLsizei buflen = 0;
glGetProgramInfoLog(program_id, sizeof(buf), &buflen, buf);
buf[buflen] = 0;

if (status != GL_TRUE) {
    error("failed to link shader program:\n%s", buf);
}

For C programmers: it is not a problem that the vertex and fragment shaders both define a function main(). They are separate stages, but both are present in the linked shader program.

Note that compiled shader objects can be combined into programs; you may reuse the same vertex shader object and link it against different fragment shaders to create new shader programs.
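
Once you are done creating programs, the individual shader objects are no longer needed. A sketch of the cleanup:

// mark the shader objects for deletion; the program keeps its own copy
glDetachShader(program_id, vertex_id);
glDetachShader(program_id, fragment_id);
glDeleteShader(vertex_id);
glDeleteShader(fragment_id);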

Now that we have a shader program, we can use it when rendering the scene:

glUseProgram(program_id);

When rendering a scene you may switch between shader programs to do multi-pass rendering for special effects like bloom, neon glow, depth of field. Naturally, the number of program switches while rendering a single frame should be minimized for the sake of efficiency.
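
Putting it all together, the body of the render loop might look like this sketch, using the vao, program_id and SDL window created earlier:

glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

glUseProgram(program_id);
glBindVertexArray(vao);
// four vertices: the 2D square from the triangle strip above
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
glBindVertexArray(0);

SDL_GL_SwapWindow(window);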

Closing words

Programming the GPU with shaders is quite cool. Due to the sheer number of cores, there is tremendous power at your hands. At the same time, it can be awkward, especially at first. I mean, take the fragment shader, for example. How are you going to put color into a picture when you don’t know exactly where you are in that picture?

One big omission in our fragment shader is lighting. Even though lighting was very simple to do in OpenGL 2, it is hard in OpenGL 3. You are expected to implement it yourself in the fragment shader.

OpenGL is not really for hobbyists. All of this stuff is so complicated that I would nearly discourage using shaders at all. You may well be better off sticking to SDL2 surfaces or a fully-fledged game engine like Unity, Unreal Engine or whatever. I’m a freak and I wanted to do shaders anyway. It gives great satisfaction once you succeed, but I have to say it has gotten to the point where it takes a disproportionate amount of effort to get there.