Cubemaps are a handy way to draw 3D backgrounds. They provide a way to sample a texture based on a 3D vector.
Imagine a camera in a box. What the camera sees depends only on the direction the camera is looking.
Since a cubemap is built from a cube, six textures are needed to fully describe it, one for each face of the cube.
This is all a cubemap is. Cubemaps can get really fancy by creating reflections or refractions, but we'll get into that later.
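Cubemaps are sampled using a 3D vector. In OpenGL's shading language, the same texture() function used for 2D textures does the job. It takes a cubemap sampler (samplerCube) and a 3D direction vector instead of a 2D sampler and a 2D coordinate.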
So we're done, right? We just use the camera's facing direction to sample the cubemap?
What went wrong? The whole screen comes out as one flat color, because we're only sampling one color from the cubemap. The color returned by the texture function depends only on the 3D vector passed into it. Even though the fragment shader runs once for every pixel drawn to the screen, each run uses the same 3D vector, so the texture function always returns the same color.
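To see why, here is a rough Python sketch of the first step of a cubemap lookup: picking which face a direction points at. The name `cubemap_face` and the tie-breaking order are assumptions made for illustration, not what any particular graphics driver does, and the position within the face is left out.

```python
def cubemap_face(direction):
    """Return the name of the cube face that a direction vector points at.

    The face is chosen by the component with the largest magnitude.
    Where exactly on that face the lookup lands (not computed here)
    depends on the other two components.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"


# One fixed camera direction reused for every pixel: every lookup
# lands on the same face, so the result never varies across the screen.
camera_direction = (0.0, 0.0, 1.0)
faces = {cubemap_face(camera_direction) for _pixel in range(4)}
```

Feeding in a different direction per pixel is what makes the result vary across the screen.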
What's the solution? Create a unique 3D vector for each pixel being drawn to the screen. The vector starts at the camera's position and points to the spot on the projection screen where that pixel sits. This vector is called the camera's view direction.
Many interesting effects in shaders depend on the direction the camera is looking. Effects like reflection and refraction depend on this view direction.
The view direction is usually given to us shader developers by the game engine. For example, in Unity's ShaderLab, it can be passed as a parameter (viewDir) to a surface shader's custom lighting function. But what if we aren't in a game engine? Maybe we're making a demo, or just want to explore writing shaders without having to wrangle the overhead of an entire game engine? We'll have to define the camera's facing direction ourselves.
So how do we define a vector that represents the camera's facing direction? We could use literal vector values, but that's not easy to understand. Which way is the camera facing if the vector is (0.34, 0.17, -0.58)? A little right, a tiny bit up, and mostly backwards? An easier idea is to use rotations. How much left-and-right is the camera looking, and how much up-and-down? 30 degrees to the left and 15 degrees up is much easier to understand.
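Converting the two rotations into a 3D vector takes one equation: (sin(a) cos(b), sin(b), cos(a) cos(b)), where a is the left-and-right rotation and b is the up-and-down rotation.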
Okay. That's how we create a facing direction for the camera. How do we create a unique view-direction vector for each pixel on the screen? Sounds impossible, but it's very similar to the equation above. Without getting too heavy into the math, the approach is to convert the screen's x-direction into a vector in world space and convert the screen's y-direction into another vector in world space. Remember the screen exists as a square inside our fictional world. The screen has its own x and y coordinate system used to define where a pixel is on the screen, but these points on the screen can also be described as positions in this made-up world. The concept is "For some pixel on the screen, where is this pixel located in the 3D world we created?"
One more equation and a cross product will give us the two vectors we need. A cross product returns a vector perpendicular to both of its inputs, which makes it a handy way to build the axes of one coordinate system inside another.
Now we have all we need to create a view direction for each pixel! The final equation will give us a view-direction for each pixel on the screen.
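As a rough Python sketch of that last step, here is one way to combine the three vectors. It rests on assumptions that are not stated in the text: screen coordinates are centered and run from -1 to 1, the screen sits one unit in front of the camera, and the names `normalize` and `view_direction` are made up for illustration.

```python
import math


def normalize(v):
    """Scale a 3D vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def view_direction(forward, i_hat, j_hat, screen_x, screen_y):
    """Direction from the camera through one point on the screen.

    Assumes the screen is centered one unit along `forward`, with
    `i_hat` and `j_hat` as its x and y axes in world space, and
    screen_x / screen_y in the range -1..1.
    """
    return normalize(tuple(
        f + screen_x * i + screen_y * j
        for f, i, j in zip(forward, i_hat, j_hat)
    ))


# An unrotated camera looking down positive z.
forward = (0.0, 0.0, 1.0)
i_hat = (1.0, 0.0, 0.0)
j_hat = (0.0, 1.0, 0.0)

center = view_direction(forward, i_hat, j_hat, 0.0, 0.0)  # straight ahead
corner = view_direction(forward, i_hat, j_hat, 1.0, 1.0)  # top-right corner
```

The center of the screen gives back the facing direction itself, and every other pixel tilts away from it, which is exactly the per-pixel variation the cubemap lookup needs.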
First step. Let's define the camera's orientation in terms of rotation: a rotation about the y-axis and a rotation about the x-axis, assuming that the camera, by default, faces down the positive z-axis.
Assuming no rotation at all, what values should be returned for the camera? (0, 0, 1). Easy.
Now let's say the camera is rotated 90 degrees so it's facing down the positive x-axis instead. What values should be returned for the camera? (1, 0, 0).
Finally, let's reconcile these values. How do we produce both values using only a single rotation value? Well, the y-component will never change since we're handling rotating up-and-down later. The result is a straightforward (sin(a), 0, cos(a)). Cool.
Now let's rotate up and down. If we don't rotate at all, the result should be (0, 0, 1). If we rotate 90 degrees, the forward direction is no longer positive z, but positive y. We need to reconcile a rotation of 0 becoming (0, 0, 1) and a rotation of 90 becoming (0, 1, 0). This expression does that: (0, sin(b), cos(b)).
Finally, let's combine these two equations: (sin(a) cos(b), sin(b), cos(a) cos(b)).
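A quick Python sketch can check the combined equation against the cases worked out above. The function name `facing_direction` and the choice to take angles in degrees are assumptions made for this example.

```python
import math


def facing_direction(yaw_degrees, pitch_degrees):
    """Camera facing direction from two rotations.

    yaw_degrees is the rotation about the y-axis (a in the text),
    pitch_degrees is the rotation about the x-axis (b in the text).
    With both at zero the camera faces down the positive z-axis.
    """
    a = math.radians(yaw_degrees)
    b = math.radians(pitch_degrees)
    return (
        math.sin(a) * math.cos(b),
        math.sin(b),
        math.cos(a) * math.cos(b),
    )


no_rotation = facing_direction(0, 0)     # approximately (0, 0, 1)
quarter_turn = facing_direction(90, 0)   # approximately (1, 0, 0)
straight_up = facing_direction(0, 90)    # approximately (0, 1, 0)
```

The results are only approximately equal to the clean values because of floating-point rounding in sin and cos. The vector always has length 1, so it needs no normalizing.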
Okay, we have the camera's facing direction, now we just need to transform positions on the screen into world coordinates. The best idea seems to be defining i and j unit vectors in the world that point in the same directions as the screen's x and y axes, given how the screen is oriented in the world.
Let's start with the j-hat vector, the up vector. When there's no rotation, this vector should be (0, 1, 0). When the camera is rotated a full 90 degrees up, the vector should be (0, 0, -1). These values come from visualizing the plane actually being rotated in the world. Again, let's reconcile the values. The result will be (cos(b + 90) sin(a), sin(b + 90), cos(b + 90) cos(a)). The up vector is always the facing direction with an additional 90 degrees added to the up-down rotation amount, so it's an easy copy of the equation above.
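The j-hat equation can be checked the same way in a short Python sketch. The name `up_vector` and the degree-based parameters are assumptions made for this example.

```python
import math


def up_vector(yaw_degrees, pitch_degrees):
    """Screen up direction (j-hat) in world space.

    Same form as the facing-direction equation, with 90 degrees
    added to the up-down rotation b.
    """
    a = math.radians(yaw_degrees)
    b = math.radians(pitch_degrees + 90.0)
    return (
        math.cos(b) * math.sin(a),
        math.sin(b),
        math.cos(b) * math.cos(a),
    )


level = up_vector(0, 0)        # approximately (0, 1, 0)
looking_up = up_vector(0, 90)  # approximately (0, 0, -1)
```

Both results match the values reasoned out from picturing the rotated screen, up to floating-point rounding.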
The i-hat vector is easily calculated via a cross product. Camera-orientation cross j-hat will produce the i-hat. Done and done.
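One detail worth checking in code is the order of the operands, because swapping them flips the result. The Python sketch below uses the standard cross product formula with an unrotated camera. The helper name `cross` is an assumption for this example, and which of the two results counts as the screen's "right" depends on the handedness convention of your world, which the text does not pin down.

```python
def cross(u, v):
    """Standard 3D cross product of u and v."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


# Unrotated camera: facing positive z, with j-hat pointing up.
forward = (0.0, 0.0, 1.0)
j_hat = (0.0, 1.0, 0.0)

forward_cross_up = cross(forward, j_hat)  # points down negative x
up_cross_forward = cross(j_hat, forward)  # points down positive x
```

If the image comes out mirrored left-to-right, swapping the operand order (or negating i-hat) is the first thing to try.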