Learn With Me: Julia - Structs and Binary I/O (#3)
Diagrams.net (formerly draw.io) is a fantastic website and tool that allows you to create rich diagrams. The service is entirely free and diagrams can be saved to your Google Drive, Dropbox, or downloaded to your computer. Additionally, diagrams.net allows you to export your diagrams to various formats such as SVG, JPG, and PNG.
Recently it was pointed out to me that you can actually load an exported PNG diagram back into the tool and edit it again. This got me thinking - how are they doing this? Surely they aren't using image recognition techniques to identify objects in the image?
You may wonder: What does all of this have to do with the title of this post? Let's talk about the PNG format a little.
PNG File Format
PNG stands for Portable Network Graphics. The format has existed in some form since the mid-nineties. Like all binary file formats, it follows a specification. From the specification, we can learn a lot about how the image and its metadata are represented on disk.
File Header
A PNG file starts with an 8-byte signature. This signature tells a decoder that all bytes that follow are to be interpreted based on the PNG spec. In hexadecimal representation the header is: 89 50 4e 47 0d 0a 1a 0a
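As a rough sketch of what checking the signature means in code, the helper below compares the first eight bytes of a byte vector against the signature quoted above. The name has_png_signature is made up for this illustration and is not part of any library.

```julia
# The 8-byte PNG signature: 89 50 4e 47 0d 0a 1a 0a
const PNG_SIGNATURE = UInt8[0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]

# Hypothetical helper: true if `bytes` starts with the PNG signature.
function has_png_signature(bytes::AbstractVector{UInt8})
    length(bytes) >= 8 && bytes[1:8] == PNG_SIGNATURE
end

png_like = vcat(PNG_SIGNATURE, UInt8[0x00, 0x00])  # signature plus extra bytes
not_png  = UInt8[0x47, 0x49, 0x46, 0x38]           # "GIF8": too short and wrong

has_png_signature(png_like)  # true
has_png_signature(not_png)   # false
```

Bytes two to four of the signature spell PNG in ASCII, which is why the format is recognizable when you open such a file in a hex viewer.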
Chunks
The remainder of the PNG format follows a very simple structure. Data is represented in chunks. Each chunk starts with 4 bytes describing the length of the chunk data, followed by 4 bytes for the chunk type. Then come length bytes of chunk data and finally 4 more bytes for a CRC (cyclic redundancy check). The CRC is computed over the chunk type and chunk data, but not over the length field.
Length | Chunk type | Chunk data | CRC |
---|---|---|---|
4 bytes | 4 bytes | Length bytes | 4 bytes |
The file specification mentions that while the length is represented using 4 bytes (32 bits), the maximum length of chunk data is actually 2^31 - 1 bytes.
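To make the CRC field a little less abstract, here is a minimal sketch of the CRC-32 variant PNG uses (the same polynomial as zlib and gzip), written bit by bit for readability rather than speed. The function names crc32 and chunk_crc are my own. Note that the CRC32c module in Julia's standard library implements a different polynomial and cannot be used for PNG.

```julia
# Bitwise CRC-32 (reflected polynomial 0xedb88320), as used by PNG.
function crc32(bytes::AbstractVector{UInt8})
    crc = 0xffffffff
    for b in bytes
        crc ⊻= b
        for _ in 1:8
            # Shift right; apply the polynomial when the dropped bit was set
            crc = isodd(crc) ? (crc >> 1) ⊻ 0xedb88320 : crc >> 1
        end
    end
    return crc ⊻ 0xffffffff
end

# Hypothetical helper: the chunk CRC covers type and data, not the length.
chunk_crc(type::Vector{UInt8}, data::Vector{UInt8}) = crc32(vcat(type, data))

# IEND has no data, so its CRC is the same in every PNG file
chunk_crc(Vector{UInt8}("IEND"), UInt8[])  # 0xae426082
```

A real decoder would typically use a table-driven implementation, but the result is the same.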
The chunk types are more interesting as there are plenty of them. I won't go into much detail here and instead only cover the relevant bits for this post. I encourage you to go read the specification for yourself to understand the nifty encoding techniques used here.
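Since the chunk type is represented by 4 bytes, it can be written as 4 ASCII characters (the specification restricts the bytes to ASCII letters). Chunk types are split into critical and ancillary chunks - a decoder must understand all critical chunks but can safely ignore the ancillary chunks. The case of the first letter tells the two apart: uppercase means critical, lowercase means ancillary.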
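The specification encodes the critical/ancillary distinction in the case of the first letter of the chunk type: uppercase is critical, lowercase is ancillary. A small sketch of that check, with helper names that are purely illustrative:

```julia
# Uppercase first letter => critical chunk, lowercase => ancillary chunk
is_critical(chunk_type::AbstractString) = isuppercase(first(chunk_type))
is_ancillary(chunk_type::AbstractString) = !is_critical(chunk_type)

types = ["IHDR", "tEXt", "IDAT", "tIME", "IEND"]
critical = filter(is_critical, types)    # ["IHDR", "IDAT", "IEND"]
ancillary = filter(is_ancillary, types)  # ["tEXt", "tIME"]
```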
The critical chunks are as follows:
- IHDR must be the first chunk in the file. It contains, in this order, the image width, height, bit depth, color type, compression method, filter method and interlace method.
- PLTE contains information about the color palette used.
- IDAT contains the actual image. There can be multiple IDAT chunks, which is what allows PNG to be a streamable format in which the first smaller IDAT chunk allows a pre-render of the full image before all data is received.
- IEND marks the end of the file.
A selection of ancillary chunks:
- tIME stores the time the image was last changed.
- tEXt stores key-value metadata. The text is encoded in ISO 8859-1. The key must be between 1 and 79 characters long and is terminated by a null character. The remainder of chunk data is the value.
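The tEXt layout described above can be sketched in a few lines. This is only an illustration under the stated rules (keyword of 1 to 79 bytes, a null separator, ISO 8859-1 text); parse_text_chunk is a hypothetical name and the sample bytes are made up.

```julia
# Hypothetical helper: split tEXt chunk data into keyword and text.
function parse_text_chunk(data::AbstractVector{UInt8})
    sep = findfirst(==(0x00), data)
    sep === nothing && throw(ArgumentError("tEXt data has no null separator"))
    1 <= sep - 1 <= 79 || throw(ArgumentError("keyword must be 1 to 79 bytes long"))
    # ISO 8859-1 maps every byte to the Unicode code point of the same value,
    # so converting byte by byte with Char decodes it correctly.
    keyword = String(Char.(data[1:sep-1]))
    text = String(Char.(data[sep+1:end]))
    return keyword, text
end

# "Title", a null byte, then "Caf" followed by 0xe9 (é in ISO 8859-1)
data = vcat(Vector{UInt8}("Title"), 0x00, Vector{UInt8}("Caf"), 0xe9)
keyword, text = parse_text_chunk(data)   # ("Title", "Café")
```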
Exporting a diagram from diagrams.net
Before we get started with writing some Julia code, let's first export a PNG file from diagrams.net.
This is fairly straightforward: head over to diagrams.net, click together a diagram, hit File > Export and choose PNG. Make sure to keep the "Include a copy of my diagram" checkbox checked.
File IO in Julia
With everything prepared we can start looking into I/O. We're not going to do anything advanced here so we'll just look at the basics.
Interacting with files, regardless of the language, always follows the same pattern:
- Open a file for reading/writing
- Read/write
- Close the file descriptor
Julia is no exception to this. We can use Base.open to open a file. This will give us an IOStream instance which in turn wraps the OS file descriptor. We can either do it in a block, in which case the file will be closed automatically at the end of the block, or we call open/close separately.
open("myfile.txt", "r") do io
    # ...
end;
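To compare the two styles side by side, here is a small sketch that first writes a throwaway file with mktemp (so it runs without any existing file) and then reads it back both ways. The variable names are only for illustration.

```julia
# Create a throwaway file so the example is self-contained
path, tmp = mktemp()
write(tmp, "hello")
close(tmp)

# Style 1: do-block, the file is closed when the block exits
contents = open(path, "r") do io
    read(io, String)
end

# Style 2: explicit open/close, with try/finally so close always runs
io = open(path, "r")
first_byte = try
    read(io, UInt8)
finally
    close(io)
end

rm(path)  # clean up the temporary file
```

The do-block form is usually the safer default because the file cannot stay open by accident when an error is thrown.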
Furthermore, there are multiple ways to read data from a file.
We'll need two of them: read and readbytes!. Both take an IOStream (the result of the open call) as the first argument. read takes a primitive type as a second argument telling it to read a single value of that type from the IO and return it. For example, read(io, UInt32) will read the 4 bytes it takes to represent a UInt32.
readbytes! requires a vector-like object to be passed as its second argument. It will read as many bytes as the vector can hold as long as there's data to read.
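A quick way to try both functions without a file is an in-memory IOBuffer, which supports the same read and readbytes! calls. That substitution is my assumption for the sake of a runnable sketch. The bytes below mimic the start of a chunk: a big-endian length of 13, the type IHDR, and one stray byte.

```julia
io = IOBuffer(UInt8[0x00, 0x00, 0x00, 0x0d, 0x49, 0x48, 0x44, 0x52, 0xff])

# read pulls one value of the given type. The bytes are taken in file order,
# so a big-endian value needs ntoh to become a usable host integer.
len = ntoh(read(io, UInt32))       # 13

# readbytes! fills the vector and returns how many bytes it actually read
buf = Vector{UInt8}(undef, 4)
n = readbytes!(io, buf)            # 4, buf now holds the bytes of "IHDR"

# Asking for more than is left yields a short count instead of an error
rest = Vector{UInt8}(undef, 4)
m = readbytes!(io, rest)           # 1
```

The short count in the last call is worth remembering: readbytes! does not complain when a file ends early, so the caller has to check the return value.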
Reading in the PNG file
Let's put what we've just learned together. Here's the plan:
- Open the PNG file
- Check for the file header (remember those 8 bytes mentioned above?)
- Read in PNG chunks by first consuming the length, the type, the data based on the length field and finally the CRC.
We can represent PNG chunks using a struct with named fields for each of the elements. The easiest way to represent a sequence of bytes is using a Vector{UInt8}. Here's the struct I came up with:
struct PNGChunk
    length::UInt32
    type::Vector{UInt8}
    data::Vector{UInt8}
    crc::Vector{UInt8}
end
It's also useful to declare a constant for holding the PNG header:
const PNG_HEADER = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]
Let's now open the PNG file and read in the first 8 bytes for the header:
io = open("Diagram.png", "r")
header = Vector{UInt8}(undef, 8)
readbytes!(io, header)
readbytes! takes an IOStream handle and a variable that it will try to fill. You can pass it an additional integer to indicate the number of bytes to read but it defaults to the length of the second argument which we've declared as a vector of UInt8s with 8 elements.
By simply comparing header with PNG_HEADER we can determine whether we're dealing with a valid PNG file:
if header ≠ PNG_HEADER
    throw(ArgumentError("File is not a PNG"))
end
Assuming our file is valid we can now attempt to read in all the chunks in the file. It's easiest to do this iteratively with a loop and consume the file until we hit EOF. Luckily Julia provides an eof function that takes an IOStream and returns whether or not we've reached the end of the file.
chunks = PNGChunk[]  # collects every chunk we read
while !eof(io)
    # All multi-byte integers in a PNG are stored big-endian (network order)
    len = ntoh(read(io, UInt32))
    ctype = Vector{UInt8}(undef, 4)
    readbytes!(io, ctype)
    data = Vector{UInt8}(undef, len)
    readbytes!(io, data)
    crc = Vector{UInt8}(undef, 4)
    readbytes!(io, crc)
    push!(chunks, PNGChunk(len, ctype, data, crc))
end
close(io)
I'm calling ntoh ("network to host") to get the length represented properly. This is because my system (Intel-based MacBook Pro) is a little-endian system (meaning the least significant byte comes first) but PNG represents all data in big-endian (network byte order), requiring us to reorder bytes. On a big-endian host ntoh simply returns its argument unchanged.
The loop will continue to consume bytes for the chunk type, data, and the CRC and construct a PNGChunk that will then be pushed into a vector.
Note: The above code will work for a valid PNG file. There's no error checking at all so if one of the fields is corrupted or the file ends prematurely this will throw an error and fail.
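For comparison, here is one way the missing error checks could look. This is a sketch rather than the code used in this post: RawChunk, read_chunk and read_chunks are hypothetical names, the signature is assumed to have been consumed already, and the CRC is stored but not verified. It works on any IO, so the example feeds it an in-memory IOBuffer holding a lone IEND chunk.

```julia
struct RawChunk
    type::String
    data::Vector{UInt8}
    crc::UInt32
end

function read_chunk(io::IO)
    len = ntoh(read(io, UInt32))   # throws EOFError if fewer than 4 bytes remain
    len <= typemax(Int32) || error("chunk length $len exceeds the PNG limit")
    type = read(io, 4)             # read(io, n) returns at most n bytes
    data = read(io, Int(len))
    # A short read means the file ended in the middle of a chunk
    (length(type) == 4 && length(data) == len) || throw(EOFError())
    crc = ntoh(read(io, UInt32))
    return RawChunk(String(type), data, crc)
end

function read_chunks(io::IO)
    chunks = RawChunk[]
    while !eof(io)
        push!(chunks, read_chunk(io))
    end
    return chunks
end

# A complete IEND chunk: length 0, type "IEND", no data, CRC ae 42 60 82
iend = UInt8[0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4e, 0x44, 0xae, 0x42, 0x60, 0x82]
chunks = read_chunks(IOBuffer(iend))       # one chunk of type "IEND"

# Cutting the input short now raises an EOFError instead of returning garbage
truncated_failed = try
    read_chunks(IOBuffer(iend[1:6]))
    false
catch e
    e isa EOFError
end
```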
Displaying chunks
Now that we're done reading the file we should take a look at its contents. For this, we can add a bunch of helper functions.
We essentially want to run something like:
for chunk in chunks
    print(chunk)
end
but executing this will result in a lot of gibberish being displayed. To tell Julia how to display a PNGChunk we need to implement Base.show for our type. Base.show takes an IO object and an instance of a type. You can compare this with __repr__ in Python. An implementation that will display the length and the type of a chunk might look as follows:
function Base.show(io::IO, c::PNGChunk)
    println(io, length(c), "\t", type(c))
end
Where in other languages you declare methods on classes, in Julia you simply declare a function that operates on a type. To make the implementation of Base.show work we need to define length and type:
# Extend Base.length with a new method rather than shadowing it
Base.length(c::PNGChunk) = c.length
type(c::PNGChunk) = String(Char.(c.type))
While we could simply access chunk.length directly it's common practice to consider struct fields "private" and write functions to access them. This way you get a layer of abstraction and can easily change the layout of structs without breaking code all over the place.
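To see how a Base.show method gets picked up by the rest of the language, here is a toy example that is independent of the PNG code. Point2D is a made-up type used only for this illustration.

```julia
# Hypothetical toy type, only here to illustrate Base.show
struct Point2D
    x::Int
    y::Int
end

# print, string and repr all end up calling this method for a Point2D
Base.show(io::IO, p::Point2D) = print(io, "Point2D(x=", p.x, ", y=", p.y, ")")

p = Point2D(3, 4)
repr(p)            # "Point2D(x=3, y=4)"
string(p)          # same text
sprint(show, p)    # same text, written into a temporary buffer
"value: $p"        # interpolation uses it too
```

Defining a single two-argument show method is enough for the REPL, print and string interpolation to display the type sensibly.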
To deconstruct what's going on in the second line let's start by looking at c.type. We declared the type to be a Vector{UInt8} and we consumed 4 bytes while reading the PNG file. The first thing we want to do is convert each item in the vector to its ASCII character representation. Julia provides the Char data type to represent 32-bit characters. Simply calling Char(c.type) won't give us the desired result: Char converts a single code point, not a whole vector, so Julia raises a MethodError.
Instead, we can iterate over the items in the vector and convert each item to a Char. This could be written using a comprehension like [Char(ch) for ch in c.type] which is rather lengthy but standard if you're coming from Python. Julia conveniently offers dot syntax (also called broadcasting) which can be applied to any function. By writing Char.(c.type) we're essentially expressing "apply the Char function to each element in c.type".
Finally, we wanted to obtain the string representation of those characters, and by passing a Vector{Char} to the String constructor we can convert it into a string.
More tenured Julia developers would probably write all of the above simply as collect(Char, c.type) |> join, but we're going to ignore this for now.
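The variants mentioned above can be tried side by side on the four bytes of a chunk type. The last line is an extra option not discussed in the text: it hands a copy of the raw bytes to String, which interprets them as UTF-8 and gives the same result for plain ASCII.

```julia
bytes = UInt8[0x49, 0x48, 0x44, 0x52]           # "IHDR" in ASCII

via_comprehension = String([Char(x) for x in bytes])
via_broadcast     = String(Char.(bytes))
via_collect       = collect(Char, bytes) |> join
# String(::Vector{UInt8}) takes ownership of the vector, hence the copy
via_utf8          = String(copy(bytes))
```

All four produce "IHDR". They only differ for bytes above 0x7f, where the Char-based versions treat each byte as its own character while the last one expects valid UTF-8.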
Ok, back to displaying the chunk. With Base.show and our two functions out of the way we can loop over the chunks and see what's inside our file:
13 IHDR
970 tEXt
3379 IDAT
0 IEND
So that's cool - we've got three chunks with data. IHDR contains height, width, color depth and some other metadata about the file and IDAT contains the actual image. This leaves tEXt which could contain anything.
Extracting information from IHDR
Let's see if we can make sense of the data in the IHDR chunk. First we're going to modify our Base.show implementation to also display the data field when we recognize the chunk type.
function Base.show(io::IO, c::PNGChunk)
    println(io, length(c), "\t", type(c), "\t", datastr(c))
end
The specification tells us that there are 13 bytes reserved for the IHDR data field and how many bytes are reserved for different properties.
The IHDR chunk must appear FIRST. It contains:
Width: 4 bytes
Height: 4 bytes
Bit depth: 1 byte
Color type: 1 byte
Compression method: 1 byte
Filter method: 1 byte
Interlace method: 1 byte
The multi-byte fields will require endian conversion. Since we have already read in all data we need to reinterpret the data from our Vector{UInt8}. That's exactly the name of a Julia function that helps with reinterpreting data into another type:
ntoh(reinterpret(UInt32, c.data[1:4])[1])
This will take the first four bytes of chunk data (the width) and reinterpret them into a UInt32. The wrapping ntoh will make sure to convert from big-endian to host byte order. We can repeat this for the height field and then read all the individual bytes.
function datastr(c::PNGChunk)
    if type(c) == "IHDR"
        # Per the layout above, width comes first, then height
        width = ntoh(reinterpret(UInt32, c.data[1:4])[1])
        height = ntoh(reinterpret(UInt32, c.data[5:8])[1])
        depth, ct, cm, fm, im = c.data[9:13]
        return "h=$height, w=$width, d=$depth, color type=$ct, compression method=$cm, filter method=$fm, interlace method=$im"
    end
    ""
end
For my diagram I get the following output:
h=201, w=121, d=8, color type=6, compression method=0, filter method=0, interlace method=0
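An alternative to reinterpret is to wrap the chunk data in an IOBuffer and read the fields in order, which keeps the code close to the field list in the specification. The sketch below is not the code used in this post; parse_ihdr is a hypothetical name and the sample bytes describe a made-up 640 by 480 image.

```julia
# Hypothetical helper: decode the 13-byte IHDR payload into a NamedTuple.
function parse_ihdr(data::Vector{UInt8})
    length(data) == 13 || throw(ArgumentError("IHDR data must be 13 bytes"))
    io = IOBuffer(data)
    width = ntoh(read(io, UInt32))     # bytes 1-4, big-endian
    height = ntoh(read(io, UInt32))    # bytes 5-8, big-endian
    depth, color, compression, filter_method, interlace = read(io, 5)
    return (width = Int(width), height = Int(height), depth = depth,
            color = color, compression = compression,
            filter_method = filter_method, interlace = interlace)
end

# 640 = 0x00000280, 480 = 0x000001e0, 8 bits per sample, color type 6
sample = UInt8[0x00, 0x00, 0x02, 0x80,
               0x00, 0x00, 0x01, 0xe0,
               0x08, 0x06, 0x00, 0x00, 0x00]
hdr = parse_ihdr(sample)
hdr.width    # 640
hdr.height   # 480
```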
Obtaining the original diagram from tEXt
Finally, let's peek inside the tEXt chunk. We can first extend our datastr(c::PNGChunk) function to also have a branch to catch the tEXt type and simply print the contents of the data field:
mxfile %3Cmxfile%20host%3D%22app.diagrams.net%22%20modified%3D%222021-05-24T09%3A22%3A42.489Z%22%20agent%3D%225.0%20
(Macintosh%3B%20Intel%20Mac%20OS%20X%2010_15_7)....
That's a bunch of gibberish. Consulting the specification tells us that the data field for tEXt consists of a key and value pair separated by a null byte. That should be easy to parse:
key, value = split(String(Char.(c.data)), '\0', limit=2)
But that's only half the equation. It looks like the value part may be URL encoded and so we need to find a way to decode it. I couldn't find this functionality in the standard library and so I ended up installing URLParser.jl which implements unescape.
(@v1.6) pkg> add URLParser
Putting everything together we can complete our datastr function by adding tEXt handling:
    elseif type(c) == "tEXt"
        # Split at the first null byte only: keyword before, text after
        key, value = split(String(Char.(c.data)), '\0', limit=2)
        value = unescape(value)
        return "$key, $value"
    end
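If you'd rather not add a dependency, percent-decoding is simple enough to sketch by hand. The function below is a stand-in written for this illustration, not the unescape function used above. It assumes every % is followed by two valid hex digits and throws otherwise.

```julia
# Hypothetical stand-in for a URL-decoding function, using only Base.
function percent_decode(s::AbstractString)
    bytes = codeunits(s)
    out = UInt8[]
    i = 1
    while i <= length(bytes)
        if bytes[i] == UInt8('%') && i + 2 <= length(bytes)
            # The two characters after '%' are the hex value of one byte
            hex = String(UInt8[bytes[i+1], bytes[i+2]])
            push!(out, parse(UInt8, hex, base = 16))
            i += 3
        else
            push!(out, bytes[i])
            i += 1
        end
    end
    return String(out)   # decoded bytes are interpreted as UTF-8
end

decoded = percent_decode("%3Cmxfile%20host%3D%22app.diagrams.net%22")
# decoded is: <mxfile host="app.diagrams.net"
```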
And so the final output is:
13 IHDR h=201, w=121, d=8, color type=6, compression method=0, filter method=0, interlace method=0
970 tEXt mxfile, <mxfile host="app.diagrams.net" modified="2021-05-24T09:22:42.489Z" agent="5.0 (Maci
ntosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36" et
ag="MSmUq0enpJxQ3pDGyP_L" version="14.3.0"><diagram id="py8BCTe_me7SkJGnhe6H" name="Page-1">zZRNb4MwDEB/DcdJ
EDbaHruOdYdNm9TDdo2IC5kCRsF89dcvjFCKWKvtUGmXijw7dfxwcPxN2mw1z5MXFKAc5orG8R8cxpZLZn470FqwcHsQayl65I1gJw9g4ZBW
SgHFJJEQFcl8CiPMMohowrjWWE/T9qimVXMewwzsIq7m9F0KSiz1gtUYeAIZJ7b0ki36QMqHZNtJkXCB9QnyQ8ffaETqn9JmA6pzN3jp9z2e
iR4PpiGj32wItquqWN+tgjC+3fPXj0Ks4cbv/6XiqrQN28NSOxgAYYTYJWpKMMaMq3Ck9xrLTEBXxjWrMecZMTfQM/ATiFr7dnlJaFBCqbLR
vmZX6GxvFhVY6gguNDTMCNcx0IU8dnwDZnIBUyDdmn0aFCdZTc/B7QzFx7xRs3mwpv9g3ZtZX6f8ILN4Jn9U23mqE0mwy/m3gdrct580VqAJ
mssi543bDcy102qvqzdc1/pk+IeJTk7mPnCv5IrNXL0ppOLfmfK965kyy/E78R07+dj64Rc=</diagram></mxfile>
3379 IDAT
0 IEND
The mystery of how diagrams.net embeds the diagram is solved. It's URL-encoded XML embedded into a tEXt chunk inside the PNG file (now that's a fun sentence to say!).
The full code can be found at https://github.com/halfdan/geekmonkey/tree/main/julia/lwm-03
Summary
In this article, we've covered a lot of different concepts in Julia. If you struggled to keep up, don't worry: I'll go over all the concepts mentioned here in more detail in future posts. My approach to learning is often guided by the projects I want to do and so I often jump in at the deep end. As a result, this article introduced concepts rather rapidly without spending too much time on the mechanics.
It's always fascinating when you think about how many things we take for granted in tech without thinking about the underlying mechanics. I was definitely surprised by how easy it was to extract some metadata from a binary format like PNG. I've used PNG files for decades without ever thinking about their inner structure. Clearly, we've only scratched the surface and haven't looked at the IDAT chunk containing all the image information, but we'll get there with time.