Learn With Me: Julia - Bitwise Operators (#4)

This articles covers bitwise operators in Julia and uses them to implement a CRC32 algorithm to validate PNG data chunks.

In part 3, we learned about structs and binary I/O in Julia. We looked into how the PNG format stores metadata and how it is represented using chunks. Each chunk consisted of four components: length, type, data, and a four-byte CRC.

CRC stands for "Cyclic Redundancy Check," and these checks are commonly used in digital networks and storage devices to detect transmission or read errors in data. Sometimes one or more bits can get flipped during data transmission, and CRCs are one of the tools employed to detect this.

CRC is a so-called error detection algorithm. The simplest algorithm in this class is called the parity bit. The idea with the parity bit is that you count all the 1s in a binary string, and if the number of 1s is even, the parity bit is also even. With this technique, you can detect any odd-number bit flips but are basically out of luck when an even number of bits are flipped.

In this article, we'll look at implementing CRC32 using Julia bitwise operators. The Wikipedia page is a great starting point if you want to learn a bit more before jumping into the code.

Bitwise Operators in Julia

Like all other modern programming languages, Julia comes with bitwise operators. They apply to all primitive integer types. I don't think it's necessary for me to cover all of them here, so I'll focus on the ones we need for this post:

AND denoted in Julia by an ampersand & is true only if both operands are true.

a  b  | a & b
0  0  | 0
0  1  | 0
1  0  | 0
1  1  | 1

Bitshift is an operation that moves a sequence of bits either left or right. In Julia, the operator for this is >> or << depending on whether you want to shift the bits right or left. Shifting left by n bits has the effect of multiplying a binary number by \(2^n\). Likewise shifting right by n bits will do the inverse and divide by \(2^n\). When used, it's important to know that the first operand is always the bits operated on, and the second denotes the number of bits to shift. 2 << 5 means "shift the binary representation of 2 (10) left by 5 bits (100000).

XOR, also called exclusive or is an operator that's true if and only if the operands are different. Most languages use the ^ (caret) to symbolise XOR, Julia uses (which can be entered using LaTeX notation \xor) but also offers the xor function.

a  b  | a ⊻ b
-----------------
0  0  | 0
0  1  | 1
1  0  | 1
1  1  | 1

With the operators covered, let's now looks at implementing the CRC verification algorithm for our PNG chunks.

Bitwise CRC32

The simplest implementation of the CRC algorithm is operating bitwise. CRC is essentially an implementation of polynomial division where our data is the dividend, and the result of the division is the CRC for our data. Don't worry, we don't have to get super mathematical here - in binary polynomial division can easily be done using XOR.

For CRC32 the generator polynomial, or divisor, is $$x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$$ which can also be written in hex as 0x04C11DB7. In practice this is often reversed since it makes the algorithm easier to implement in software. The resulting constant is 0xEDB88320.

unsigned int crc32(unsigned char *message) {
    int i, j;
    unsigned int byte, crc, mask;
    
    i = 0;
    crc = 0xFFFFFFFF;
    while (message[i] != 0) {
    	byte = message[i];          // Get next byte.
        crc = crc ^ byte;
        for (j = 7; j >= 0; j--) {  // Do eight times.
            mask = -(crc & 1);
            crc = (crc >> 1) ^ (0xEDB88320 & mask);
        }
        i = i + 1;
    }
    return ~crc;
}
Basic bitwise CRC32 algorithm, taken from Hacker's Delight

The above implementation appears in one of my favorite algorithm books written by Henry S. Warren Jr., called Hacker's Delight. It features many other algorithms and techniques, and its most recent edition has a whole section on CRC32.

We can start by converting the C implementation from Hacker's Delight to Julia. Julia strings contain variable-length characters, so it is necessary to ensure they are converted to UInt8 first, since the algorithm can only operate on individual bytes.

function crc32(message::Vector{UInt8})
    crc = 0xFFFFFFFF
    for byte in message
        crc = crc ⊻ byte
        for _ in 1:8
            mask = -UInt32(crc & 1)
            crc = (crc >> 1) ⊻ (0xEDB88320 & mask)
        end
    end
    ~crc
end

We can trivially test this by picking any chunk from a PNG file and passing the type and data fields in as the message. The easiest one I found was the IEND chunk since it doesn't even have any data. This will always have the same CRC of 0xAE426082. Running the following snippet, we can see our code working:

message = "IEND"
crc = crc32(Vector{UInt8}(message))  # Returns a UInt32
print(string(crc, base=16)) # Prints: ae426082

Done! Or are we...?

Bytewise CRC32

For PNG, there's a reference implementation available in C that operates byte-wise instead of bitwise, allowing for a drastic speedup in CRC computation. We can adapt it to work in Julia as well.

The reference implementation uses a pre-built table to speed up performance. This table stores all possible combinations for a single byte resulting in a table of 256 elements. There are other implementations out there that extend this to 2 bytes or 65,536 elements.

function maketable()
    crc_table = Vector{UInt32}(undef, 256)

    for n in 0:255
        c = convert(UInt32, n)
        for _ in 1:8
           if c & 1 == 1
               c = 0xEDB88320 ⊻ (c >> 1)
           else
               c = c >> 1
           end
        end
        crc_table[n+1] = c
    end
    crc_table
end

Note that since Julia arrays are 1-indexed, we have to adapt the algorithm to take care of this.

Since the table is reusable (as long as we use the same generator polynomial), it's advisable to cache it somehow. One way of doing this is using a const variable in the global scope:

const crc_table = maketable()

The main CRC32 algorithm can now use this table to conveniently iterate over the input data byte-by-byte and save many compute cycles. Note that we had to + 1 the lookup here (as compared with the C version) to accommodate Julia's 1-indexed arrays.

function crc32(data::Vector{UInt8}, crc::UInt32)
    c = crc
    for byte in data 
        c = crc_table[((c ⊻ UInt8(byte)) & 0xff) + 1] ⊻ (c >> 8)
    end
    return c
end

With some simple multi-dispatch, we can even add a version that takes Julia strings:

crc32(data::AbstractString, crc::UInt32) = crc32(Vector{UInt8}(data), crc)

Finally, we can now apply this to our PNG chunk. The PNG spec mentions that the CRC is computed over the bytes from the type and data field and not the length field. Since we're still dealing with Vector{UInt8} we can use vcat to concatenate those two fields.

Additionally, the PNG spec demands that we initialize the CRC with 0xFFFFFFFF and XOR the result again afterward. We can implement a crc32 function for our PNGChunk type as follows:

crc32(c::PNGChunk) = crc32(vcat(c.type, c.data), 0xFFFFFFFF) ⊻ 0xFFFFFFFF

We can now validate the crc32 for each chunk against the CRC we read from the file. First, we need to fix the byte order and ensure we're comparing values of the same type. The CRC in our PNGChunk is represented as a Vector{UInt8} with the least significant byte coming first. We can reverse the order using reverse and turn the vector into a single UInt32 by using reinterpret again.

isvalid(c::PNGChunk) = crc32(c) == reinterpret(UInt32, reverse(c.crc))[1]

I've modified our Base.show function from last time to output the data a bit more clearly, and it now looks like this:

function Base.show(io::IO, c::PNGChunk)
    println(io, "Length: ", length(c))
    println(io, "Type: ",  type(c))
    println(io, "Data: ", datastr(c))
    println(io, "CRC: ", crc32(c), "\t", isvalid(c) ? "OK" : "INVALID")
    println(io, "-----")
end

That's it - we have successfully implemented a basic CRC32 algorithm in Julia.

Julia comes with a core package called CRC32c using the same algorithm but uses a different generator polynomial, so we, unfortunately, can't use it. There is, however an excellent third-party library called CRC.jl that you might want to check out.

💡
If you like articles like this one, please consider subscribing to my free newsletter where at least once a week I send out my latest work covering Julia, Python, Machine Learning, and other tech.

You can also follow me on Twitter.