Zig on the Web: 2

Compiling Zig to WASM

TCombinator published on January 04, 2024


Zig is a modern alternative to C: a new systems programming language that provides native C interop with better type safety. Learn how to build Zig applications for the Web and make them future-proof.

Recap

In the previous post about Zig we covered the basics of the language, what WASM is, and the project we will be building throughout this series. The project is a simple multi-channel (2 channels, to be specific) WAV audio visualizer: you give the application a WAV file and it displays the waveforms like this:

(For those confused: the above image shows only a single channel.)

WAV file format

Let's again take a look at the WAV file format.

The data inside a WAV file is represented like this

Source: https://www.codeproject.com/KB/audio-video/Circular_Buffers

A quick look tells us that after the first 44 bytes (the header) we have the actual audio data. The size of this data, in bytes, is stored in the Subchunk2Size field of the header.
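To make the layout concrete, here is a minimal C sketch that pulls a few fields out of the 44-byte header. It assumes the canonical header layout described above, with nothing extra between the header fields and the audio data. The struct and function names (WavInfo, read_wav_info and the two helpers) are made up for illustration and are not part of the project's code.

```c
#include <stdint.h>

/* Hypothetical container for the header fields this series cares about. */
typedef struct {
    uint16_t audio_format;    /* offset 20: 1 means PCM */
    uint16_t num_channels;    /* offset 22: 1 = mono, 2 = stereo */
    uint32_t sample_rate;     /* offset 24: e.g. 44100 or 48000 */
    uint16_t bits_per_sample; /* offset 34: 16 in our case */
    uint32_t subchunk2_size;  /* offset 40: size of the audio data in bytes */
} WavInfo;

/* WAV stores multi-byte numbers in little-endian order (lowest byte first). */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumes `data` points to at least 44 bytes of a canonical WAV header. */
WavInfo read_wav_info(const uint8_t *data)
{
    WavInfo info;
    info.audio_format    = read_u16_le(data + 20);
    info.num_channels    = read_u16_le(data + 22);
    info.sample_rate     = read_u32_le(data + 24);
    info.bits_per_sample = read_u16_le(data + 34);
    info.subchunk2_size  = read_u32_le(data + 40);
    return info;
}
```

Calling read_wav_info on the first 44 bytes of a file gives you everything needed to decide whether the file matches the variant we support (PCM, 2 channels, 16 bits).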

A WAV file can exist in multiple variants: with 1 channel or 2 channels, with a sampling rate of 44.1 kHz or 48 kHz, and so on. For the sake of simplicity we will focus on the PCM format, which is a well-known standard for WAV files. This means the AudioFormat field of the header will have the value 1 (as given in the standard). The audio itself will have a 16-bit depth, which means an individual sample can take one of 2¹⁶ = 65536 values: with the lowest value at 0, the highest is 65535.

Why does a 16-bit depth imply 65536 values?

If a data type is represented as a sequence of n bits, the total number of values it can uniquely represent is 2 (each bit is 0 or 1) raised to the power n, i.e. 2ⁿ. This is why ASCII, which is a 7-bit encoding, can store up to 2⁷ = 128 unique symbols.

Or we can split the range in half, i.e. lowest = -32768 and highest = +32767. This is what most audio systems use, as negative values represent downward displacement of the particles of the medium. If we only used positive values we would lose the information about this direction. Mathematically it doesn't make a difference, but this mimics the physical world.
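The arithmetic above can be checked with a tiny C sketch. Both helper names are invented for this example: value_count computes how many values fit in n bits, and to_signed_sample shows how the same 16-bit pattern is read as a signed sample.

```c
#include <stdint.h>

/* Number of distinct values an n-bit type can hold: 2 to the power n.
   Only valid for bits < 32 in this sketch. */
uint32_t value_count(unsigned bits)
{
    return (uint32_t)1 << bits;
}

/* Interpret a raw 16-bit pattern as a signed sample in -32768..32767. */
int32_t to_signed_sample(uint16_t raw)
{
    /* Patterns 0..32767 stay as they are; 32768..65535 map to -32768..-1. */
    return raw < 32768 ? (int32_t)raw : (int32_t)raw - 65536;
}
```

Note that both readings use the same 65536 bit patterns; only the meaning assigned to the upper half changes.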

This is all I will cover as far as the multimedia side of things goes. For a more in-depth look at WAV, go here

Coding the parser

From the file format above we can see that the data inside a WAV file is stored in a fixed order. A program that analyzes input which follows such rules is called a parser.

In our case the parser itself is surprisingly easy. Remember, at the end of the day it's all just a contiguous sequence of bytes; if we know the format of the WAV file we can simply interpret those bytes. The pseudocode for this parser would look like:

Verify() <- WAV // verify if the file is a valid WAV

readNumberOfSamples() <- WAV // read the total number of samples from the header

readLeftAudio() <- // read the audio data for the left channel

readRightAudio() <- // read the audio data for the right channel
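The article does not spell out what Verify() checks, so the following C sketch is only one plausible reading: it confirms the four marker strings of the canonical header and that the file is 16-bit PCM. The function name verify_wav and the exact set of checks are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the buffer looks like a canonical 16-bit PCM WAV file,
   0 otherwise. Assumes the 44-byte header layout shown earlier. */
int verify_wav(const uint8_t *data, size_t len)
{
    if (len < 44) return 0;                            /* too short for the header */
    if (memcmp(data, "RIFF", 4) != 0) return 0;        /* ChunkID, bytes 0..3 */
    if (memcmp(data + 8, "WAVE", 4) != 0) return 0;    /* Format, bytes 8..11 */
    if (memcmp(data + 12, "fmt ", 4) != 0) return 0;   /* Subchunk1ID, bytes 12..15 */
    if (memcmp(data + 36, "data", 4) != 0) return 0;   /* Subchunk2ID, bytes 36..39 */

    unsigned audio_format    = data[20] | (data[21] << 8); /* 1 means PCM */
    unsigned bits_per_sample = data[34] | (data[35] << 8);
    return audio_format == 1 && bits_per_sample == 16;
}
```

Running a check like this first means the later steps can trust the fixed offsets (40 to 43 for Subchunk2Size, 44 for the start of the audio data).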

readNumberOfSamples in C would look like :

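//C
#include <stdint.h>

// Reads Subchunk2Size (bytes 40..43, lowest byte first) and
// converts it from a size in bytes to a number of samples.
int readNumberOfSamples(const uint8_t *data)
{
    // unsigned, so that shifting a byte of 128 or more by 24 is well defined
    uint32_t a = data[40];
    uint32_t b = data[41];
    uint32_t c = data[42];
    uint32_t d = data[43];

    uint32_t sampleSize = a | b << 8 | c << 16 | d << 24;

    // each sample is 4 bytes (2 bytes left + 2 bytes right)
    int numberOfSamples = (int)(sampleSize / 4);

    return numberOfSamples;
}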

The Subchunk2Size field of the header tells us how big the audio data is. Without this number our program wouldn't know when to stop reading.

Metadata

Another reason we need Subchunk2Size is that it allows us to put metadata in the file. Say you wanted to add the name of the artist who wrote the song: you cannot put that information inside the header, right? The WAV header standard doesn't have an artist name field, so where do we put it? We put it after the end of the audio data. Subchunk2Size tells us how many bytes are reserved for the actual audio data, after which we can put our metadata.

The total size of the file will be based on : Header + audioData + metaData

Subchunk2Size is stored split across 4 bytes (lowest byte first), so we shift each byte back into position and OR them together. This gives us the total size of the audio data in bytes, not the number of samples. To get the number of samples we divide by 4. But why 4? Because for the PCM format in our case each sample is 4 bytes (2 bytes left + 2 bytes right).
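As a rough illustration of the Header + audioData + metaData split, the C sketch below computes where trailing metadata would begin and how large it is. It assumes the layout described in this section (a 44-byte header followed directly by the audio data). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Subchunk2Size: bytes 40..43 of the header, lowest byte first. */
uint32_t audio_data_size(const uint8_t *data)
{
    return (uint32_t)data[40] | ((uint32_t)data[41] << 8) |
           ((uint32_t)data[42] << 16) | ((uint32_t)data[43] << 24);
}

/* Offset of the first byte after the audio data, which is where any
   trailing metadata would start under the layout assumed here. */
size_t metadata_offset(const uint8_t *data)
{
    return 44 + (size_t)audio_data_size(data);
}

/* Number of metadata bytes in a file of `file_len` bytes (0 if none). */
size_t metadata_size(const uint8_t *data, size_t file_len)
{
    size_t offset = metadata_offset(data);
    return file_len > offset ? file_len - offset : 0;
}
```

For example, a 60-byte file whose Subchunk2Size is 8 holds 2 samples (8 / 4) in bytes 44 to 51 and 8 bytes of metadata starting at offset 52.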

Let's convert the above C code to Zig :

// Zig

export fn returnSamples(data: [*]const u8) i32 {
    // Subchunk2Size lives in bytes 40..43, lowest byte first
    const val1: i32 = @intCast(data[40]);
    const val2: i32 = @intCast(data[41]);
    const val3: i32 = @intCast(data[42]);
    const val4: i32 = @intCast(data[43]);
    const subchunk2d: i32 = val1 | val2 << 8 | val3 << 16 | val4 << 24;
    // each sample is 4 bytes (2 bytes left + 2 bytes right)
    const samples: i32 = @divFloor(subchunk2d, 4);
    return samples;
}

The above code in Zig looks really similar to the C code (well of course, Zig was built as a better C, so it resembles it in a lot of ways).

data: [*]const u8 is a pointer to the raw bytes of the file (unsigned, so bytes of 128 and above are not read as negative numbers)
@intCast is a built-in function which converts between integer types. We have a u8 and want an i32
For division we won't be using the / operator, as on signed integers it's ambiguous: you don't know if the developer wanted to truncate or floor the result. Making us state it helps write explicit code. Here we will be using @divFloor (the @ prefix is a naming convention in Zig for built-in functions)
The export keyword is necessary when targeting WASM; we will cover this later.

Seems chill so far, right? Let's move on to finding the audio samples now. Ready? 1 2 3

// for simplicity I decided to use a static array of size 65536, feel free to make it a dynamic array.
// It lives at file scope so the returned pointer stays valid after the function returns.
var values = [_]i32{0} ** 65536;

export fn returnLeftSection(data: [*]const u8, data_len: usize) [*]i32 {
    var len: usize = 44; // the audio data starts right after the 44-byte header
    var array_val: usize = 0;
    const val1: i32 = @intCast(data[40]);
    const val2: i32 = @intCast(data[41]);
    const val3: i32 = @intCast(data[42]);
    const val4: i32 = @intCast(data[43]);
    const subchunk2d: i32 = val1 | val2 << 8 | val3 << 16 | val4 << 24;
    const samples: usize = @intCast(@divFloor(subchunk2d, 4));
    // stop at the end of the audio data, the end of the buffer, or when values is full
    while (len + 2 <= 44 + samples * 4 and len + 2 <= data_len and array_val < values.len) {
        const byte1: u16 = data[len];
        const byte2: u16 = data[len + 1];
        // join the two bytes (lowest first) and reinterpret the bits as a signed 16-bit sample
        const leftValue: i16 = @bitCast(byte1 | byte2 << 8);
        values[array_val] = leftValue;
        array_val += 1;
        len += 4;
    }
    return &values;
}

Let's break down the code.

We find the number of samples the same way we did above.
We create an array to store the values.
We iterate through the audio section's data and extract the left channel's data from each sample. (Notice that we multiply samples by 4 in the loop condition: each sample is 4 bytes and the offset len counts bytes.)
- The left channel's data is spread across 2 bytes, so we read both and combine them to get a single left channel sample

The code for the right channel is almost the same, except that we increment the offset by 2 beforehand so that the loop reads the right channel's value inside each sample instead of the left's.

...
    len += 2; // skip the 2 left-channel bytes so every read lands on the right channel
    // the while loop that follows is the same as in returnLeftSection
...
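For comparison, here is how both channels could be split in a single pass. This is a C sketch and not the article's Zig code. It assumes `audio` points at the first byte after the 44-byte header, and the names sample_at and split_channels are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine two bytes (lowest first) into one signed 16-bit sample. */
static int16_t sample_at(const uint8_t *p)
{
    int32_t value = p[0] | (p[1] << 8);   /* 0..65535 */
    if (value >= 32768) value -= 65536;   /* upper half maps to negatives */
    return (int16_t)value;
}

/* Splits interleaved 16-bit stereo data into separate left and right
   arrays. Returns the number of samples written to each array. */
size_t split_channels(const uint8_t *audio, size_t audio_len,
                      int16_t *left, int16_t *right, size_t capacity)
{
    size_t count = audio_len / 4;          /* 4 bytes per stereo sample */
    if (count > capacity) count = capacity;

    for (size_t i = 0; i < count; i++) {
        left[i]  = sample_at(audio + i * 4);      /* bytes 0..1 of the sample */
        right[i] = sample_at(audio + i * 4 + 2);  /* bytes 2..3 of the sample */
    }
    return count;
}
```

The + 2 offset for the right channel plays the same role as the len += 2 in the snippet above. Passing a capacity keeps the function from writing past the end of the output arrays.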

Wrapping Up

Great job on making it this far!

In this chapter we dove into the WAV format and how we can build a parser for it in Zig. In the next post we will compile what we built so far to the WASM target and bootstrap it with some HTML and CSS.

Helpful Links:

Entire Code
More on WASM
