What is Parallel Programming

(Published August 2018)

In a typical program, you have a problem being solved one step at a time. The steps are executed sequetially one after the other until the problem's solved or the user is done using it. You typically break this big problem down into smaller, different tasks to make the writing process easier.

For example, software to load a webpage could be broken down into the steps - ask a server for a webpage, wait and recieve the data, draw the webpage.

In Parallel Programming, the program performs a ton of steps (hundredss or thousands) at the same time! The difference is each step must be (nearly) exactly the same. This means parallel programming is not suited for many problems, but it's very powerful for specific problems.

The figure below shows a rough overview of an example parallel program. Some large input data is broken up into many little chunks. Each chunk is run through the exact same parallel program at the same time. After every chunk is finished, the data is reconstructed.

Diagram of a Parallel Program

For example, let's run through a program to blur an image, using a 6 by 6 pixel image. A simple blur algorithm is to examine the 9 pixels around the target pixel and average them out to produce a resulting pixel value.

16 by 16 pixel image 16 by 16 pixel image blurred

In regular programming, blurring a 6x6 image would take 36 cycles:

for (int col = 0; col < width; ++col) {
    for (int row = 0; row < height; ++height) {
        blurPixel(col, row);

In a parallel program, the image's pixels would be grouped up and processed at the same time (parallel). How many groups depends on how many processors we have available to use. If we have 3, we can make 3 groups of 12 pixels. 6 processors means 6 groups of 6 pixels. And optimally 36 processors would allow 36 groups of 1 pixel.

Each group would perform the blur function at the same time, therefore the number of cycles the program will take to run is the number of pixels per group - 12 cycles for 3 processors, 6 cycles for 6 processors, and 1 cycle for 36 processors! This is a huge improvement in efficiency!

Concurrent Programming?

Concurrent programming is very similar - a program has multiple threads that run at roughly the same time. Concurrent programs split the problem into multiple different tasks. Each thread performs one task usually with different input data.

Diagram of a Concurrent Program

Where is Parallel Programming used

Where is it supported?


General Purpose computing on Graphics Processing Units

Programs that run on GPUs are called "Shaders" (because they were originally only used to color and shade images for display).

Parallel programs that don't output display are called Compute Shaders.

Here are some libraries that open up GPUs for compute shaders.


Workflows for Writing a Program

  1. Define the structure of the input data (floats, ints, etc.)
  2. Write the program from the scope of a single step, a single iteration.
  3. Build and launch the program from the CPU using a more typical programming paradigm (C++ and Python are common)

Example (fake) Compute Shader

Returnning to our example of blurring an image, here's the more common code again.

for (int col = 0; col < width; ++col) {
    for (int row = 0; row < height; ++height) {
        blurPixel(col, row);

And here's how it may look in an OpenCL shader.

#version 430
uniform image2D destTex;
uniform image2D inImage;
layout (local_size_x = 6, local_size_y = 6) in;

void main() {
    ivec2 storePos = ivec2(gl_GlobalInvocationID.xy);
    vec3 color = tex2D(inImage, gl_GlobalInvocationID.xy);
    // grab nearby pixels
    // average them to produced a blurred value
    imageStore(destTex, storePos, vec4(color, 0.0));

There's some shader jargon that's not the purpose of this tutorial, the most interesting part is what's in the main() function. First we get which pixel we're currently calculating for, then we grab the input pixel color and blur it, finally we save the blurred value.

Notice that we don't describe any sort of loop or which pixel(s) will be worked on next, that data is handled by the GPUs and we're given the data from a series of functions and variables provided by the API (like gl_GlobalInvocationID).

Potential Uses

Parallel programming is best suited for cases where you have a LARGE amount of input data that can be broken up into equal chunks that all need to be processed the same exact way.


I suggest searching for terms like Vulkan, OpenCL, or CUDA and look for tutorials, APIs, and example shader programs to get started!

© Bitzawolf 2019