PNG

We're dealing with binary data, so if you have any questions or want a quick refresher, go check out the handbook entry on bitwise and binary data.

What is a PNG?

PNG is a license-free image compression format. It's awesome since its data is packed fairly tight and in a lossless format (no data is actually lost during compression, unlike JPEG images). It consists of these segments of data:

  1. Standard Header
  2. IHDR Chunk
  3. Other Chunks
  4. IEND Chunk

AND, it's byte-data! We're no longer in the realm of text-based files! How exciting!

The Header is the same for all PNG files, and it is used to assert that the file being read is a PNG file. If the file is meant to be a PNG file, but fell to the dark side somehow (got corrupted), and the header is different, then according to PNG specs, any PNG-reader must stop reading and give up! We'll get into reading and determining a correct header later on.

Next up are Chunks. Chunks are segments of information describing something about the image. There are 2 main kinds of Chunks: Critical and Ancillary chunks. Critical chunks contain vital information about the image (as the name rightly suggests), and if these chunks cannot be read properly or some other issue arises from them, then the whole image is considered compromised. Ancillary chunks, on the other hand, can be safely ignored without any issue. Again, we'll get into Critical and Ancillary chunks soon.

3 particularly important critical chunks are the IHDR, IDAT, and IEND chunks. The IHDR chunk must always be the first chunk in the PNG file, arriving right after the header. It describes the most important details about the image: its dimensions, channel bit-depth, color type, compression method, filter method, and interlace method, in that order.

The IDAT chunk is the ImageDATa. Its data is the compressed and filtered image data in the format specified by the image's color-type. This is the chunk with all of the sample information.

The IEND chunk simply signifies the end of the PNG image and must be the last chunk!

PNG Chunks

First up is the header.

The header is a set sequence of 8 bytes, each doing something actually important! Here's a step-through of each byte, what its hexadecimal value is, what character it represents (per the standard ASCII table), and why it matters.

Hex Number   Character     Rationale
0x89         [non-ASCII]   Non-ASCII: prevents interpretation as a text doc.
0x50         P
0x4E         N             "PNG" spelled out in ASCII.
0x47         G
0x0D         CR            Goes with the next byte, making 'CRLF' (carriage return and line feed), the DOS line ending.
0x0A         LF            Detects DOS-Unix line-ending conversion and stops text-readers now.
0x1A         Ctrl-Z        Really, stop reading this as a text document, DOS (Windows).
0x0A         LF            A lone LF (Unix line ending). SERIOUSLY, THIS IS NOT A TEXT DOCUMENT!
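
So a reader's very first job is to compare those 8 bytes against the expected values and bail out if they don't match. Here's a minimal sketch of that check in Java (the method and constant names are just mine for illustration):

import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

// Sketch: the 8 signature bytes every PNG file must start with.
static final byte[] PNG_SIGNATURE = {
    (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
};

// Read the first 8 bytes of the stream and refuse to go on if they're wrong.
static void checkSignature(DataInputStream in) throws IOException {
    byte[] header = new byte[8];
    in.readFully(header);
    if (!Arrays.equals(header, PNG_SIGNATURE)) {
        throw new IOException("Not a PNG file (bad signature); giving up as the spec demands");
    }
}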

Chunks

Now on to the actually useful part of PNG, Chunks! Gettin' Chunky!

Chunks consist of some number of bytes, and in order to know when one chunk ends and another begins we need to know some information about the chunk. The first 8 bytes of a chunk tell us this information; you can think of it as a header for chunks themselves! (A quick sketch of reading a chunk this way follows the list below.)

  1. 4 bytes: Chunk length in bytes
  2. 4 bytes: Chunk type (its name, must be all ASCII and follow a specific naming convention)
  3. <length> bytes: chunk data. How to interpret these bytes depends on the chunk type.
  4. 4 bytes: CRC expected value.
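
To make that layout concrete, here is a minimal sketch of reading a single chunk with a DataInputStream (the method name and printout are mine, not part of the spec):

import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch: read one chunk's header, data, and CRC from a stream positioned
// right at the start of a chunk.
static void readOneChunk(DataInputStream in) throws IOException {
    int length = in.readInt();            // 4 bytes: length of the data field only
    byte[] typeBytes = new byte[4];
    in.readFully(typeBytes);              // 4 bytes: chunk type, e.g. "IHDR"
    String type = new String(typeBytes, StandardCharsets.US_ASCII);

    byte[] data = new byte[length];       // <length> bytes: the chunk's payload
    in.readFully(data);

    int expectedCrc = in.readInt();       // 4 bytes: CRC computed over the type and data bytes

    System.out.printf("chunk %s, %d data bytes, CRC 0x%08X%n", type, length, expectedCrc);
}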

Chunks also have a particular naming scheme. The name is made up of 4 characters, each 1 byte. Whether a character is upper- or lowercase determines something about the chunk itself. A character is uppercase if its value is below 97 decimal (0x61 hexadecimal). The table below lists what each of the 4 letters controls, and a small code sketch follows it.

Letter   Uppercase (below 97)                          Lowercase (97 or above)
First    Critical!                                     Ancillary
Second   Public (PNG-defined chunk)                    Private (chunk not defined/recognized by official PNG)
Third    Reserved by PNG; should always be uppercase   ---
Fourth   Unsafe to copy                                Safe to copy
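
Since "uppercase" for ASCII letters just means bit 5 (value 0x20) of the byte is clear, checking these properties is a couple of bit tests. A rough sketch (the method name and printout are mine):

// Sketch: decode the four property bits of a chunk type such as "bKGD".
// For ASCII letters, bit 5 (0x20) is set exactly when the letter is lowercase,
// i.e. when its value is 97 or above.
static void describeChunkType(String type) {
    boolean ancillary      = (type.charAt(0) & 0x20) != 0; // lowercase first letter
    boolean privateChunk   = (type.charAt(1) & 0x20) != 0; // lowercase second letter
    boolean reservedBitSet = (type.charAt(2) & 0x20) != 0; // should currently always be false
    boolean safeToCopy     = (type.charAt(3) & 0x20) != 0; // lowercase fourth letter

    System.out.println(type + ": "
        + (ancillary ? "ancillary" : "critical") + ", "
        + (privateChunk ? "private" : "public") + ", "
        + (safeToCopy ? "safe to copy" : "unsafe to copy")
        + (reservedBitSet ? " (reserved bit set?!)" : ""));
}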

Public vs. private chunks make practically no difference to a decoder; a private chunk merely indicates that it is a special chunk meaningful to some more specific application.

Safe to copy means that the particular chunk can be written to a modified PNG image even if the chunk is unrecognized by the program. Unsafe to copy means the chunk depends on the image data, so after any changes to critical chunks (which include the image data) this particular chunk may not be blindly copied to the modified file; it may only be rewritten if the program understands the chunk and how to properly rewrite it for the new, modified data.

As mentioned before, there are critical chunks and ancillary chunks. To sharpen the definition: critical chunks are the ones a decoder must understand to display the image, and the standard defines the four listed below. For ancillary chunks, there are a few defined by the specification, but they can really be anything, including user-defined chunks. Since they can be ignored, it really doesn't matter!

Critical Chunks

IHDR 0x49484452
Image HeaDeR. Much like the header, this chunk is very standardized and must ALWAYS be the first chunk! Its length is always 13 bytes, with the following data: image width (4 bytes), height (4 bytes), channel bit-depth (1), color type (1), compression method (1), filter method (1), interlace method (1).
PLTE 0x504C5445
PaLeTtE. Contains between 1 and 256 palette entries, each of which is a 3-byte series in the form RGB, so that the red byte has a value between 0 and 255 corresponding to no red through full red respectively. This chunk will always have a length divisible by 3, since each palette entry must have 3 bands. This chunk must appear for color type 3, may appear for types 2 and 6, and must not appear for 0 and 4. For type 3, this chunk MUST precede the first IDAT chunk, and only one PLTE is allowed! The palette effectively defines an array of colors, each composed of 3 bands (RGB), where index 0 refers to the first color defined in this palette chunk. In color types 2 and 6, this chunk is simply a suggestion for displays that cannot do truecolor; it lists up to 256 colors to which the truecolor image can be "quantized". The palette always uses 8 bits per sample, regardless of the image bit depth. (A small sketch of parsing this palette follows this list.)
IDAT 0x49444154
Image DATa. The heart and soul of the image information. This chunk is complicated enough, for both writing and reading, that it gets its own entire section: see Image Data below.
IEND 0x49454E44
Has a length of zero, no data field, and marks the end of the PNG datastream.
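
As a concrete example, here's a rough sketch of turning a PLTE chunk's data into an indexable array of colors (the method name and array layout are just mine):

// Sketch: turn PLTE data into an indexable array of RGB colors.
// data is the chunk's payload; its length must be a multiple of 3.
static int[][] parsePalette(byte[] data) {
    if (data.length % 3 != 0) {
        throw new IllegalArgumentException("PLTE length must be divisible by 3");
    }
    int[][] palette = new int[data.length / 3][3];
    for (int i = 0; i < palette.length; i++) {
        palette[i][0] = data[3 * i]     & 0xFF; // red,   0..255
        palette[i][1] = data[3 * i + 1] & 0xFF; // green, 0..255
        palette[i][2] = data[3 * i + 2] & 0xFF; // blue,  0..255
    }
    return palette;
}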

Ancillary Chunks

These chunks can be safely ignored if a decoder can't understand them, or if they're corrupt (a failed CRC).

bKGD 0x624B4744
BacKGrounD color. Specifies a default background color to present the image against. Decoders are not required to honor this chunk. In indexed-color (type 3), this chunk contains 1 byte of info representing a palette index. In greyscale, with and without alpha (types 0 and 4), this chunk has 2 bytes representing a short defining the level of grey to be used; its value will be between 0 and 2^(bitdepth) - 1. For truecolor, with and without alpha (types 2 and 6), there are 6 bytes, or more appropriately 3 shorts, each defining a Red, Green, and Blue band between 0 and 2^(bitdepth) - 1. This chunk must come before IDAT, but after PLTE if it exists.
cHRM 0x6348524D
A complex method for physically defining color and light so the image looks very similar on any monitor.
tRNS 0x74524E53
Defines a particular color to be treated as transparent wherever it is found in the image. This gives an image either-or transparency (not translucency; that's what greyscale-alpha and RGBA are for). The chunk must precede IDAT and come after PLTE.
For color type 3 (indexed), this chunk contains a series of one-byte alpha values corresponding to entries in the PLTE chunk. This way, each palette entry is given a transparency value too. If there are fewer entries here than in PLTE, the remaining alpha values stay at 255 (fully opaque). There cannot be more entries here than in PLTE.
For color type 0 (greyscale), a value stored as a short (2 bytes) represents the grey level that should be transparent. The value is within the same boundaries as any other greyscale value (0 to 2^channelBitDepth - 1). If the bit-depth is less than 16, the least-significant bits are used (in other words, it doesn't matter; just read it as a short and it will always be fine).
For color type 2 (RGB), tRNS contains an RGB grouping of shorts in the order red, green, blue (each taking up 2 bytes, 6 bytes total). Pixels of that exact color should be handled as fully transparent (alpha = 0.0); all others keep full alpha.
tRNS is forbidden for types 4 and 6 (greyscale/RGB with alpha) because they already contain alpha. (A sketch of reading this chunk per color type follows.)
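
To tie the three tRNS layouts together, here is a hedged sketch of reading the chunk's data for each color type (the method name and printouts are mine):

// Sketch: interpret tRNS data depending on the image's color type.
static void readTransparency(int colorType, byte[] data) {
    switch (colorType) {
        case 3: // indexed: one alpha byte per palette entry (may be fewer than PLTE has)
            for (int i = 0; i < data.length; i++) {
                int alpha = data[i] & 0xFF;   // entries not listed stay fully opaque (255)
                System.out.println("palette[" + i + "] alpha = " + alpha);
            }
            break;
        case 0: { // greyscale: one 2-byte value marking the transparent grey level
            int grey = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
            System.out.println("transparent grey value = " + grey);
            break;
        }
        case 2: { // truecolor: three 2-byte values marking the transparent RGB color
            int r = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
            int g = ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
            int b = ((data[4] & 0xFF) << 8) | (data[5] & 0xFF);
            System.out.println("transparent color = " + r + ", " + g + ", " + b);
            break;
        }
        default: // types 4 and 6 already carry alpha; tRNS is not allowed there
            throw new IllegalArgumentException("tRNS not allowed for color type " + colorType);
    }
}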

Image Data

All PNG image data comes compressed using a standard algorithm called "DEFLATE". Roughly speaking, it finds repeated runs and sequences of values and, instead of listing them all out, says something like "the next 7 values will be '4'," so space is saved. Undoing it, or even doing it, is quite complex, and Java provides a class to do it for us (InflaterInputStream).
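
In Java the decompression side really is only a few lines: wrap the IDAT data in an InflaterInputStream and read it back out. A sketch, assuming the IDAT payloads have already been glued together into idatBytes:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.InflaterInputStream;

// Sketch: inflate (decompress) the concatenated IDAT data back into
// filtered scanline bytes.
static byte[] inflate(byte[] idatBytes) throws IOException {
    InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(idatBytes));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int read;
    while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
    }
    return out.toByteArray();
}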

Color data can either be greyscale or RGB. Color ranges from 0 == none to most-intense (depends on sample-depth).

PNG is conceptually a very long 1D array, with pixels appearing left-to-right and scanlines appearing top-to-bottom, unless interlacing is being used, in which case the data is in a different (yet predictable) order.

Three types of pixels are supported/defined {with allowed channel bit-depths in braces}:

Indexed {1, 2, 4, 8}
A single pixel is actually an unsigned index into the supplied palette. The bit-depth determines the maximum number of palette entries: 2^bit-depth, so a bit-depth of 8 (the maximum allowed for indexed images) allows at most 256 palette entries. See the PLTE chunk for info on the palette.
Greyscale {1, 2, 4, 8, 16}
A single pixel from 0 == black to max == white. A bit depth of 1 means each bit is a pixel, so each pixel is either pure black or pure white.
Greyscale with Alpha {8, 16}
First comes the grey channel, then the alpha channel, for each pixel. So a bit depth of 8 means one byte for the grey level, then a byte for the alpha level.
Truecolor {8, 16} w/ alpha {8, 16}
3 channels in the order: Red, Green, Blue. The bit-depth specifies the size of each channel, not the actual sample/pixel size. So a bit-depth of 8 means each channel has its own byte, and each pixel/sample actually has a size of 24 bits (8 * 3). Alpha is tacked on last as a fourth channel if it's expected.

Similarly, here are PNG's color types (by their numeric value) with the number of channels in brackets:

  0. Greyscale [1]
  2. Truecolor (RGB) [3]
  3. Indexed [1] //Each pixel is a single palette index
  4. Greyscale with Alpha [2]
  6. Truecolor with Alpha (RGBA) [4]

Thus, PNG gives the channel bit-depth and each color type has its own set number of channels, so you can determine the number of bits per pixel: pixelBitDepth = channelBitDepth * numOfChannelsInColorType.

Pixels smaller than a byte are packed into a byte anyway, so there are multiple pixels per byte. The pixels still go sequentially from left to right, just within the byte, starting at the high-order bits. So for a pixel bit-depth of 4 (meaning greyscale [color type 0] with a bit-depth of 4), the 0th pixel is in the first (high) 4 bits of a byte, then the 1st pixel is in the second 4 bits of that byte.
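
Here's a rough sketch of both ideas, the per-color-type channel counts and pulling a sub-byte pixel out of a packed greyscale scanline (the method names are mine):

// Sketch: channel counts per color type: 0 -> 1, 2 -> 3, 3 -> 1, 4 -> 2, 6 -> 4.
// pixelBitDepth = channelBitDepth * channelsFor(colorType).
static int channelsFor(int colorType) {
    switch (colorType) {
        case 0: case 3: return 1;
        case 4:         return 2;
        case 2:         return 3;
        case 6:         return 4;
        default: throw new IllegalArgumentException("bad color type " + colorType);
    }
}

// Sketch: for a greyscale image with bitDepth < 8, pull the n-th pixel out of a
// packed scanline. The leftmost pixel sits in the high-order bits of each byte.
static int packedPixel(byte[] scanline, int bitDepth, int n) {
    int pixelsPerByte = 8 / bitDepth;
    int b = scanline[n / pixelsPerByte] & 0xFF;
    int shift = 8 - bitDepth * (n % pixelsPerByte + 1);
    return (b >> shift) & ((1 << bitDepth) - 1);
}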

This is why the bit-depths are restricted, so that packaging is always efficient and there are no wasted spaces, yet everything is clean and happy and in full-bytes.

In the very unfortunate case that the bit-depth is less than a byte and the scanline width in pixels times the bit-depth is not a multiple of 8 (basically, there would be wasted space within the last byte of the scanline), the remaining bits in that byte are unspecified and don't matter. This forces all scanlines to start on a fresh byte, because doing otherwise would be very messy (remember, filter-type bytes precede every scanline).

Finally, a filter-type byte is added to the beginning of each scanline.

Alpha represents transparency (novel!!) where the maximum value (dependent on bit-depth: 2^bitdepth - 1) represents full opacity (solid) and 0 is fully transparent. The bit-depth of the alpha channel is the same as the bit-depth of the other channels in the data. To have alpha without a segment of alpha data for each pixel, a certain pixel color (or index for indexed data) can be set to be transparent. This is achieved through the tRNS ancillary chunk.


PNG allows something called Adam7 interlacing, which is where pixel data is scattered all over the place in an organized way:

     1 6 4 6 2 6 4 6
     7 7 7 7 7 7 7 7
     5 6 5 6 5 6 5 6
     7 7 7 7 7 7 7 7
     3 6 4 6 3 6 4 6
     7 7 7 7 7 7 7 7
     5 6 5 6 5 6 5 6
     7 7 7 7 7 7 7 7
  

Interlacing consists of 7 distinct passes as shown above. The algorithm can be seen as groupings of 8x8 pixels, so for pictures larger than 8 pixels the pattern repeats itself, starting from a new grid each time. If the picture is smaller than 8 pixels in a dimension, or its size is not evenly divisible by 8, then some passes may be empty or shorter than others! If a pass is empty, then it has no scanlines, so there will not be any filter bytes for it either! Expect this!!

For a 16 x 16 image, with coordinates (0, 0) at the top-left, here's how an interlace would run:

         0   1   2   3   4   5   6   7    8   9  10  11  12  13  14  15
     0   1   6   4   6   2   6   4   6 |  1   6   4   6   2   6   4   6
     1   7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
     2   5   6   5   6   5   6   5   6 |  5   6   5   6   5   6   5   6
     3   7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
     4   3   6   4   6   3   6   4   6 |  3   6   4   6   3   6   4   6
     5   7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
     6   5   6   5   6   5   6   5   6 |  5   6   5   6   5   6   5   6
     7   7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
         ------------------------------+-------------------------------
     8   1   6   4   6   2   6   4   6 |  1   6   4   6   2   6   4   6
     9   7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
     10  5   6   5   6   5   6   5   6 |  5   6   5   6   5   6   5   6
     11  7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
     12  3   6   4   6   3   6   4   6 |  3   6   4   6   3   6   4   6
     13  7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
     14  5   6   5   6   5   6   5   6 |  5   6   5   6   5   6   5   6
     15  7   7   7   7   7   7   7   7 |  7   7   7   7   7   7   7   7
  

Which means the data would be laid out like so (in pixels):

    Pass 1
  filterType, 1, 1, filterType, 1, 1

    Pass 2
  filterType, 2, 2, filterType, 2, 2

    Pass 3
  filterType, 3, 3, 3, 3, filterType, 3, 3, 3, 3

    Pass 4
  filterType, 4, 4, 4, 4, filterType, 4, 4, 4, 4, filterType, 4, 4, 4, 4, filterType, 4, 4, 4, 4

    Pass 5
  etc.
  

So the interlaced data still flows left->right, top->down, but is made up of distinct passes. This allows progressive image-displaying since you can read all the data for pass 1, unfilter and then undo interlacing and then display those pixels.

For an algorithm to unfilter interlaced data (since the data was filtered as though each pass was its own picture):

/* source is the source data, already inflated (decompressed).
*  startingWidth is the horizontal pixel number (1-based indexing) the pass starts at. Pass 1 = 1, Pass 2 = 5, Pass 3 = 1, Pass 4 = 3
*  strideWidth is the number of pixels horizontally until the next pixel in the pass. Pass 1 = 8, Pass 2 = 8, Pass 3 = 4, Pass 4 = 4
*  startingSL is the vertical pixel number (1-based) the pass starts at. Pass 1 = 1, Pass 2 = 1, Pass 3 = 5, Pass 4 = 1
*  strideSL is the number of pixels vertically until the next scanline in the pass. Pass 1 = 8, Pass 2 = 8, Pass 3 = 8, Pass 4 = 4
*/

byte[] undoInterlaceStepFilter(PNGImage image, ByteArrayInputStream source, int startingWidth, int strideWidth, int startingSL, int strideSL)
{
    int lengthSL = (image.width + strideWidth - startingWidth) / strideWidth; // Length of this pass's scanlines, in pixels.
    int numSLs = (image.height + strideSL - startingSL) / strideSL;           // How many scanlines are in this pass.
    if (lengthSL == 0 || numSLs == 0) return new byte[0];                     // Empty pass: no scanlines, so no filter bytes either.
    int bytesPerSL = image.getBytesPerPixel() * lengthSL;                     // Length of this pass's scanlines, in bytes.
    byte[] output = new byte[bytesPerSL * numSLs];
    byte[] scanLinePrevious = new byte[bytesPerSL];                           // All zeros for the first scanline of the pass.
    byte[] scanLineCurrent = new byte[bytesPerSL];
    for (int i = 0; i < numSLs; ++i)
    {
        byte filterType = (byte) source.read();                               // Each scanline starts with its filter-type byte.
        source.read(scanLineCurrent, 0, bytesPerSL);                          // Read a full scanline's worth of filtered bytes.
        unfilter(filterType, scanLinePrevious, scanLineCurrent, image.getBytesPerPixel()); // scanLineCurrent is now raw.
        System.arraycopy(scanLineCurrent, 0, output, i * bytesPerSL, bytesPerSL);          // Append the raw scanline to the output.
        System.arraycopy(scanLineCurrent, 0, scanLinePrevious, 0, bytesPerSL);             // It becomes the "previous" line for the next loop.
    }
    return output;
}

PNG only has 1 filter method, but that method (value 0, the one named in IHDR) defines 5 different filter types, which can differ for each scanline:

  0. None
  1. Sub
  2. Up
  3. Average
  4. Paeth

It's very important to note that when creating the filtering (encoding), you have to go in reverse order (right-to-left, bottom-up) so that the previous raw data (Raw[x - bytesPerPixel]) hasn't been filtered yet! Or you could go forwards and keep a copy of the raw data (wasteful).

Thus, when undoing filtering you go in the normal direction (left-right, top-down). This is because the data at the top is filtered against nothing (0s), so you can turn that back into raw data, so the next scanline/data can use that now raw data to undo more filtering!

Also, when dealing with data that doesn't use a whole byte or more for a single channel (like color type 0 and channel bit-depth of 2, so bytes per pixel is only 1/4) we round up to 1 byte. This is purely to simplify filtering so that it's done byte-against-byte and not bits against bits (unhappy programming!).

When it comes to doing the arithmetic in filtering, you have to apply modulus 256 so that the values always stay within 1 byte and the data stays unsigned. For example:

Signed:   4 - 8 = -4     00000100 - 00001000 = 11111100
Unsigned: 5 - 7 = 254    00000101 - 00000111 = 11111110

Now undoing that (unfiltering Sub):

Signed:   -4 + 8 = 4             11111100 + 00001000 = 00000100
Unsigned: (254 + 7) % 256 = 5    (11111110 + 00000111) % 100000000 = 00000101

Nifty! When creating the filter, just keep it unsigned and save only the last byte. When you're unfiltering, apply mod-256 to force the values to stay within that byte and properly "wrap back around."
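
In Java, where byte is signed, the usual trick is to widen to int and mask the result back down with & 0xFF (which is the same as mod 256 here). A tiny sketch of the Sub example above:

// Sketch: the same Sub example, using Java ints to fake unsigned bytes.
static void subFilterDemo() {
    int raw = 5, priorRaw = 7;

    int filtered   = (raw - priorRaw) & 0xFF;      // 5 - 7 wraps around to 254 (only the low byte is kept)
    int unfiltered = (filtered + priorRaw) & 0xFF; // (254 + 7) % 256 == 5, back where we started

    System.out.println(filtered + " -> " + unfiltered); // prints "254 -> 5"
}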

None
No filtering is done; the data is raw.
Sub
Difference between each byte and the value of the corresponding byte of the prior pixel.
Filtered[i] = Raw[i] - Raw[i - bytesPerPixel]; or to reverse it:
Raw[i] = (Filtered[i] + Raw[i - bytesPerPixel]) % 256;.
Sub is a kind of difficult filter since it needs to access the same scanline as the one you're editing, so you need to be sure to reference the same array (but the correct distance behind) or if you're at the beginning you need to use 0 until you've moved ahead one pixel's worth of bytes/data.
Up
Difference between each byte and the corresponding byte of the pixel above. For a pixel size of 3 bytes, for example, the first byte in the pixel would be paired with the first byte of the pixel above, and so on.
Filtered[i] = Raw[i] - RawScanlineAbove[i]; or to reverse it:
Raw[i] = (Filtered[i] + RawScanlineAbove[i]) % 256;.
Average
Uses the average of the above and left pixels.
Filtered[i] = Raw[i] - ((Raw[i - bytesPerPixel] + RawScanlineAbove[i]) / 2);
Raw[i] = (Filtered[i] + ((Raw[i - bytesPerPixel] + RawScanlineAbove[i]) / 2)) % 256;
Paeth
Computes a simple linear estimate from the 3 neighboring bytes (left, up, and upper-left), then chooses whichever of those neighbors is closest to the estimate.
Filtered[i] = Raw[i] - PaethPredictor(Raw[i - bpp], RawSLA[i], RawSLA[i - bpp]);
Raw[i] = (Filtered[i] + PaethPredictor(Raw[i - bpp], RawSLA[i], RawSLA[i - bpp])) % 256;
// left, up, and upLeft are the unsigned (0-255) values of the corresponding bytes
// to the left, above, and above-left; pass 0 where a neighbor doesn't exist.
int paethPredictor(int left, int up, int upLeft)
{
  int estimate = left + up - upLeft;
  int diffLeft = Math.abs(estimate - left);
  int diffUp = Math.abs(estimate - up);
  int diffUpLeft = Math.abs(estimate - upLeft);

  if (diffLeft <= diffUp && diffLeft <= diffUpLeft)
    return left;
  else if (diffUp <= diffUpLeft)
    return up;
  else
    return upLeft;
}
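
Putting the five options together, here is a hedged sketch of what the unfilter routine called by the interlacing code above might look like; previous is the already-unfiltered scanline above (all zeros for a pass's first scanline), and bytesPerPixel is the rounded-up pixel size in bytes:

// Sketch: undo one scanline's filtering in place. 'current' holds the filtered
// bytes on entry and the raw bytes on exit.
static void unfilter(int filterType, byte[] previous, byte[] current, int bytesPerPixel) {
    for (int i = 0; i < current.length; i++) {
        int x = current[i] & 0xFF;                                            // filtered byte
        int a = i >= bytesPerPixel ? current[i - bytesPerPixel] & 0xFF : 0;   // raw byte to the left
        int b = previous[i] & 0xFF;                                           // raw byte above
        int c = i >= bytesPerPixel ? previous[i - bytesPerPixel] & 0xFF : 0;  // raw byte up and to the left

        int raw;
        switch (filterType) {
            case 0: raw = x; break;                            // None
            case 1: raw = x + a; break;                        // Sub
            case 2: raw = x + b; break;                        // Up
            case 3: raw = x + (a + b) / 2; break;              // Average
            case 4: raw = x + paethPredictor(a, b, c); break;  // Paeth
            default: throw new IllegalArgumentException("bad filter type " + filterType);
        }
        current[i] = (byte) raw; // casting keeps only the low byte (mod 256)
    }
}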

That's the meat of a PNG file! Congratulations!

Example PNG File

That was a lot of abstract ideas to take in at once, so let's look at a simple PNG file and its byte-contents.

This is actually an 8 by 8 pixel image, but it's been scaled up enormously in this picture so you can easily see each pixel's color. Note that the colors with alpha have a black-grey checkered pattern in the background so you know they're transparent. Here's the original 8 by 8 pixel image. Feel free to download it and open it in a hex editor program.

Here's the complete file written in hexadecimal:

Not the easiest thing to read, but let's remember! First up should be the fixed, standard header, which should be the bytes: 89 50 4E 47 0D 0A 1A 0A. Indeed it is!

Now we begin the chunks. The first chunk needs to be the IHDR chunk. Remember that any chunk first has 4 bytes indicating its length, 4 more bytes for its name/type, <length> bytes of data, and finally 4 more bytes for the CRC value.

If we look at the next 4 bytes after the header, we have 00 00 00 0D, or the length of this chunk: 13 bytes. Next up is its name, 49 48 44 52. In ASCII this turns into IHDR! We now read the next 13 bytes of data and get ready to interpret it: 00 00 00 08 00 00 00 08 08 06 00 00 00.

Since this is the IHDR chunk, let's check what the PNG rules say for us to interpret this as.

First 4 bytes, width:
00 00 00 08, or 8 pixels wide
Next 4 bytes, height:
00 00 00 08, or 8 pixels tall
Next byte, channel bit-depth:
08, or 8 bits per channel
Next byte, color type:
06, color type 6, which is Truecolor with Alpha. This uses 4 channels per pixel. Which means we now know each pixel needs 4 bytes.
Next byte, compression method:
00, this is the only acceptable compression method.
Next byte, Filter method:
00, also the only acceptable filter type.
Final byte, Interlace Method:
00, meaning this image is not interlaced.

The final 4 bytes of this chunk are for the CRC, which is a whole other topic I don't cover.
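
Here's a rough sketch of pulling those IHDR fields out of the 13 data bytes, assuming they're already sitting in a byte array (the method name is mine):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: interpret the 13 bytes of IHDR data read above.
static void parseIHDR(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int width       = in.readInt();           // 00 00 00 08 -> 8 pixels wide
    int height      = in.readInt();           // 00 00 00 08 -> 8 pixels tall
    int bitDepth    = in.readUnsignedByte();  // 08 -> 8 bits per channel
    int colorType   = in.readUnsignedByte();  // 06 -> truecolor with alpha, 4 channels
    int compression = in.readUnsignedByte();  // 00 is the only defined method
    int filter      = in.readUnsignedByte();  // 00 is the only defined method
    int interlace   = in.readUnsignedByte();  // 00 -> not interlaced, 01 -> Adam7

    System.out.printf("%dx%d, depth %d, color type %d%n", width, height, bitDepth, colorType);
}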


That was a big first step. Now we can zoom through the rest of this file, since most of the remaining chunks are actually ancillary. This PNG file was made in GIMP, which added a bunch of extra information to the file like a comment, when it was made, background information, and more. Let's explore!

Here's the same file in hexadecimal, but now color-coded for each chunk.

The first green chunk is the header. The Following red chunk is the IHDR chunk we just read. Meaning the first blue chunk is our first "this could be anything" chunk. Let's see what it is.

It's 1 byte long, and its name is 0x73524742, meaning sRGB. Taking a look at the PNG naming scheme, we see this is an ancillary chunk because its first character is lowercase, but it is defined by PNG (public) because its 2nd character is uppercase.

sRGB is a fancy method for standardizing colors amongst multiple monitors and printers, at least back in 1996 with CRT (Cathode Ray Tube) monitors. Anyway, I'll just ignore it.

Next is a green-colored 6-byte chunk named 0x624B4744 == bKGD. Another ancillary chunk. This one defines a color to use for the background. Its data depends on the color type, and in our case it uses 6 bytes, 2 each for red, green, and blue. This color should be used as the background behind transparent pixels.

Now the red colored 9-byte chunk named 0x70485973 == pHYs. This is useful for matching pixel-size to a real-world size like printable centimeters or inches. Again, ignore.

The 2nd blue-colored chunk is 7 bytes, named 0x74494D45 == tIME. It records the last time the image data was modified. The first 2 bytes are the year, 0x07DD = 2013 decimal, then one byte each for the month, day, hour, minute, and second.

The final boring chunk is green and 12 bytes long, named 0x69545874 == iTXt. There may be multiple text chunks in a single file; each chunk's data starts with a keyword stating what the text is about. Some standardized keywords are Title, Author, or, in this case, Comment.


Now to the real meat of the image: the IDAT chunk. It's a whopping 0x8C, or 140 decimal, bytes long! This is the only chunk requiring real work to get at the raw data, because the raw image data is filtered, compressed, and then saved. The filtering step helps DEFLATE compress the data even more.

Knowing that, we now need to decompress the entire data portion of the IDAT chunk. Again, I don't cover the DEFLATE algorithm itself, so here's the decompressed data:

IDAT's data is filtered in separate "scanlines". A scanline is one width's worth of pixels; in this case, the top row of 8 pixels. Each scanline is preceded by 1 byte saying how that scanline was filtered. So what we need is the first 8 pixels' worth of data, and there are 4 bytes per pixel, which works out to 32 bytes, plus one for the filter type, for a total of 33 bytes: 01 00 00 00 FF 2D 2D 2D 00 32 32 32 00 29 29 29 00 1A 1A 1A 00 1C 1C 1C 00 17 17 17 00 2A 2A 2A 00.

The first scanline was filtered with type '1', meaning "Sub" or subtraction. This means each pixel had the previous pixel subtracted from it, so we need to add the previous pixel back to get things straight again (in rough pseudocode):

// Pretend Pixel is a class with constructor Pixel(Red, Green, Blue, Alpha).
pixelFiltered = new Pixel(0x00, 0x00, 0x00, 0xFF);

// Now we add the pixel before this one to these values, but wait!
// There is no pixel before this very first pixel, so we add 0 to each value.
pixelRaw = pixelFiltered + new Pixel(0, 0, 0, 0);
  0x00 + 0x00 = 0x00
  0x00 + 0x00 = 0x00
  0x00 + 0x00 = 0x00
  0xFF + 0x00 = 0xFF
// We now have our first raw pixel!
// Let's try this with the 2nd pixel:

pixelFiltered = new Pixel(0x2D, 0x2D, 0x2D, 0x00);

// lastRawPixel is the one we just solved, 00 00 00 FF
pixelRaw = pixelFiltered + lastRawPixel;
  0x2D + 0x00 = 0x2D
  0x2D + 0x00 = 0x2D
  0x2D + 0x00 = 0x2D
  0x00 + 0xFF = 0xFF

// Wonderful! Let's continue this trend with the 3rd pixel:

pixelFiltered = new Pixel(0x32, 0x32, 0x32, 0x00);
pixelRaw = pixelFiltered + lastRawPixel;
// lastRawPixel is now 2D 2D 2D FF
  0x32 + 0x2D = 0x5F
  0x32 + 0x2D = 0x5F
  0x32 + 0x2D = 0x5F
  0x00 + 0xFF = 0xFF

// I'll finish up the rest of this scanline without writing everything out...

We're now left with this completely raw, unfiltered, uncompressed, color data:

00 00 00 FF, 2D 2D 2D FF, 5F 5F 5F FF, 88 88 88 FF, A2 A2 A2 FF, BE BE BE FF, D5 D5 D5 FF, FF FF FF FF

Let's see how this compares to the original image. By this data, the colors for the top scanline should start in black and gradually get whiter until pure-white at the end.

Looks like we got it right! Feels like magic doesn't it?


Give it a try with the next scanline: Filter type is 0x01 also, and the filtered data: FF 00 00 FF 00 00 00 F1 00 00 00 DB 00 00 00 DB 00 00 00 D9 00 00 00 DB 00 00 00 DA 00 00 00 DB

You should end up with: FF 00 00 FF, FF 00 00 F0, FF 00 00 CB, FF 00 00 A6, FF 00 00 7F, FF 00 00 5A, FF 00 00 34, FF 00 00 0F


Congratulations! You should now be able to parse and read PNG image data! After this mega IDAT chunk comes the IEND chunk. It has a length of 0 bytes and is trivial to read, so I'll skip it.

Our PNG example is done here! I didn't walk through any other filter types, even though the 3rd scanline uses number 4, Paeth, but how to undo that is covered in the filter section. It does get more complicated, but certainly not overwhelming. Read the next section for a complete PNG decoder, which does cover all filter types.

Example PNG Decoder

All of the theory and reasoning is covered in the sections above. This is an example PNG reader I used in a game engine I wrote, to grab the image data and write it directly to the graphics card. Feel free to copy and reuse this as much as you want, or even better: read through it to learn how PNG images are decoded.

PNGImage.java
A complete class in Java for reading PNG files.