Zig on the Web: 2
Compiling Zig to WASM
Zig is a genuinely modern alternative to C. It is a new systems programming language that provides native C interop with better type safety. Learn how to build Zig applications for the Web and make them future-proof.
Recap
In the previous post about Zig we covered the basics of the language, what WASM is, and what project we will be building throughout this series. The project is a simple multi-channel (2, to be specific) WAV audio visualizer. You give the application a WAV file and it will display the waveforms like:
(The above image represents only a single channel for those confused)
WAV file format
Let's again take a look at the WAV file format.
The data inside a WAV file is represented like this
Source: https://www.codeproject.com/KB/audio-video/Circular_Buffers
A quick look tells us that after the first 44 bytes (the header) we have the actual audio data. The size of this data, in bytes, is stored in the Subchunk2Size field of the header.
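To make the layout concrete, here is a minimal C sketch (C to mirror the snippets later in this post) that pulls a few fields out of the header. It assumes the canonical 44-byte layout with all fields stored little-endian; WavInfo, read_wav_info and the two read helpers are names made up for this illustration, not part of any library.

```c
#include <stdint.h>

/* Read a little-endian 16-bit field starting at byte offset `off`. */
uint16_t read_u16_le(const uint8_t *data, uint32_t off) {
    return (uint16_t)(data[off] | (data[off + 1] << 8));
}

/* Read a little-endian 32-bit field starting at byte offset `off`. */
uint32_t read_u32_le(const uint8_t *data, uint32_t off) {
    return (uint32_t)data[off]
         | ((uint32_t)data[off + 1] << 8)
         | ((uint32_t)data[off + 2] << 16)
         | ((uint32_t)data[off + 3] << 24);
}

/* Hypothetical summary of the header fields this series cares about. */
typedef struct {
    uint16_t audio_format;    /* offset 20: 1 means PCM */
    uint16_t num_channels;    /* offset 22 */
    uint32_t sample_rate;     /* offset 24, in Hz */
    uint16_t bits_per_sample; /* offset 34 */
    uint32_t subchunk2_size;  /* offset 40: size of the audio data in bytes */
} WavInfo;

/* Assumes `data` points at 44 or more readable bytes in the canonical layout. */
WavInfo read_wav_info(const uint8_t *data) {
    WavInfo info;
    info.audio_format    = read_u16_le(data, 20);
    info.num_channels    = read_u16_le(data, 22);
    info.sample_rate     = read_u32_le(data, 24);
    info.bits_per_sample = read_u16_le(data, 34);
    info.subchunk2_size  = read_u32_le(data, 40);
    return info;
}
```

For the kind of file this series targets we would expect audio_format = 1, num_channels = 2 and bits_per_sample = 16.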
A WAV file can exist in multiple variants: with 1 channel or 2 channels, with a sampling rate of 44.1 kHz or 48 kHz, and so on. For the sake of simplicity we will be focusing on the PCM format, a well-known standard for WAV files, which means that the AudioFormat field of the header will have the value 1 (given in the standard). The audio itself will have a 16-bit depth, which means that an individual sample can take one of 2^16 = 65536 distinct values: 0 to 65535 if the lowest value is 0.
Why does a 16-bit depth give 65536 possible values?
If a data type is represented as a sequence of n bits, then the total number of values it can uniquely represent is 2 raised to the power n (each bit is either 0 or 1, so every extra bit doubles the count). This is why ASCII, which is a 7-bit encoding, can store up to 2^7 = 128 unique symbols.
Alternatively, we can split the range in half, i.e. lowest = -32768 and highest = +32767 (zero takes up one slot, so the positive side is one short). This is what most audio systems use, as negative values represent downward displacement of the particles of the medium. If we were to use only positive values we would lose this sense of direction. Mathematically it doesn't make a difference, but the signed form mimics the physical world.
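The arithmetic above can be checked with a couple of lines of C. This is only a sketch: value_count and to_signed_sample are hypothetical helper names, and the conversion assumes the usual two's-complement interpretation of 16-bit samples.

```c
#include <stdint.h>

/* Number of distinct values an n-bit sample can take: 2^n (valid for n < 32). */
uint32_t value_count(unsigned bits) {
    return (uint32_t)1 << bits;
}

/* Reinterpret an unsigned 16-bit pattern as a signed sample in -32768..32767. */
int16_t to_signed_sample(uint16_t raw) {
    /* Patterns 0x8000..0xFFFF map to the negative half of the range. */
    return (raw < 0x8000) ? (int16_t)raw : (int16_t)((int32_t)raw - 65536);
}
```

Here value_count(16) is 65536 and value_count(7) is 128, while the bit pattern 0x8000 maps to -32768 and 0x7FFF maps to 32767.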
This is all that I will cover as far as the multimedia side of things goes; for a more in-depth look at WAV, go here
Coding the parser
It follows from the file format above that the data inside a WAV file is stored in a fixed order, and a program that analyzes data laid out according to such rules is called a parser.
In our case the parser itself is surprisingly easy. Remember, at the end of the day it's all just a contiguous sequence of bytes; since we know the format of the WAV file, we can simply interpret those bytes. The pseudocode for this parser would look like:
Verify() <- WAV // verify if the file is a valid WAV
readNumberOfSamples() <- WAV // read the total number of samples from the header
readLeftAudio() <- // read the audio data for the left channel
readRightAudio() <- // read the audio data for the right channel
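As an illustration of the Verify() step, here is one way it could look in C. This is a sketch, not the code used in this series: verify_wav is a hypothetical name, and it assumes the canonical 44-byte header (real-world files can carry extra chunks before the data chunk, which this check would reject).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the buffer looks like a canonical 16-bit PCM WAV file, else 0. */
int verify_wav(const uint8_t *data, size_t len) {
    if (len < 44) return 0;                          /* too short to hold the header */
    if (memcmp(data,      "RIFF", 4) != 0) return 0; /* ChunkID */
    if (memcmp(data + 8,  "WAVE", 4) != 0) return 0; /* Format */
    if (memcmp(data + 12, "fmt ", 4) != 0) return 0; /* Subchunk1ID */
    if (memcmp(data + 36, "data", 4) != 0) return 0; /* Subchunk2ID */

    /* Both fields are 2 bytes, little-endian. */
    uint16_t audio_format = (uint16_t)(data[20] | (data[21] << 8));
    uint16_t bits_per_sample = (uint16_t)(data[34] | (data[35] << 8));
    return audio_format == 1 && bits_per_sample == 16;
}
```

Running a check like this first means the later steps can rely on the fixed offsets without re-validating them.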
readNumberOfSamples in C would look like:
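//C
#include <stdint.h>

int readNumberOfSamples(const uint8_t *data)
{
    // Subchunk2Size is stored little-endian in bytes 40..43 of the header
    uint32_t a = data[40];
    uint32_t b = data[41];
    uint32_t c = data[42];
    uint32_t d = data[43];
    uint32_t sampleSize = a | b << 8 | c << 16 | d << 24; // size of the audio data in bytes
    int numberOfSamples = (int)(sampleSize / 4);           // 4 bytes per stereo sample
    return numberOfSamples;
}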
The Subchunk2Size field of the header tells us how big the audio data is. Without this number our program wouldn't know when to stop reading.
Metadata
Another reason why we need Subchunk2Size is to allow us to put metadata in the file. Say you wanted to add the name of the artist who wrote the song: you cannot put that information inside the header, right? The WAV header standard doesn't have an artist-name field, so where do we put that information? We put it after the end of the audio data. Subchunk2Size tells us how many bytes are reserved for the actual audio data, after which we can put our metadata. The total size of the file is then: header + audioData + metaData
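A rough sketch of how a reader could locate that trailing region, in C and assuming the canonical 44-byte header. audio_end_offset and trailing_bytes are hypothetical helpers; the sketch does not try to interpret the metadata itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset where the audio data ends and any trailing metadata begins.
   Assumes the canonical 44-byte header with Subchunk2Size at offsets 40..43. */
size_t audio_end_offset(const uint8_t *data) {
    uint32_t subchunk2_size = (uint32_t)data[40]
                            | ((uint32_t)data[41] << 8)
                            | ((uint32_t)data[42] << 16)
                            | ((uint32_t)data[43] << 24);
    return (size_t)44 + (size_t)subchunk2_size;
}

/* Number of bytes after the audio data (0 if there are none or the file is truncated). */
size_t trailing_bytes(const uint8_t *data, size_t file_len) {
    size_t end = audio_end_offset(data);
    return (file_len > end) ? file_len - end : 0;
}
```

With a Subchunk2Size of 100 in a 200-byte file, the audio ends at offset 144 and the remaining 56 bytes are available for metadata.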
Subchunk2Size is stored as 4 separate bytes (least significant byte first), so we shift each byte into place and OR them back together. This gives us the total size of the audio data in bytes, not the number of samples. To get the number of samples we divide by 4. Why 4? Because for the 16-bit stereo PCM format in our case each sample is 4 bytes (2 bytes left + 2 bytes right).
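The divisor 4 is specific to 16-bit stereo. As a hedged generalization, the size of one sample can be derived from the NumChannels and BitsPerSample header fields; bytes_per_frame and frame_count below are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Bytes occupied by one sample frame (one value for every channel). */
uint32_t bytes_per_frame(uint16_t num_channels, uint16_t bits_per_sample) {
    return (uint32_t)num_channels * (bits_per_sample / 8u);
}

/* Number of frames in `data_size` bytes of audio; 0 if the format is degenerate. */
uint32_t frame_count(uint32_t data_size, uint16_t num_channels, uint16_t bits_per_sample) {
    uint32_t frame = bytes_per_frame(num_channels, bits_per_sample);
    return frame ? data_size / frame : 0;
}
```

For 2 channels at 16 bits this yields the 4 bytes used throughout this post; a mono file at the same depth would use 2.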
Let's convert the above C code to Zig:
// Zig
export fn returnSamples(data: [*]const u8) i32 {
    // Subchunk2Size: 4 little-endian bytes at offsets 40..43
    const val1: i32 = @intCast(data[40]);
    const val2: i32 = @intCast(data[41]);
    const val3: i32 = @intCast(data[42]);
    const val4: i32 = @intCast(data[43]);
    const subchunk2d: i32 = val1 | val2 << 8 | val3 << 16 | val4 << 24;
    const samples: i32 = @divFloor(subchunk2d, 4);
    return samples;
}
The above code in Zig looks really similar to the C code (well, of course: Zig was built as a better C, so the two resemble each other in many ways).
- data: [*]const u8 is the pointer to the data. The bytes are unsigned (u8 rather than i8) so that values of 128 and above are not sign-extended when widened, which would corrupt the OR that follows.
- @intCast is a built-in function which converts between integer types. We have a u8 and want to cast it to an i32.
- For division we won't be using the / operator, as it's ambiguous for signed integers: you don't know if the developer wanted to truncate or floor the result. Zig makes you spell out the choice, which helps write explicit code. Here we will be using @divFloor (the @ prefix is Zig's naming convention for built-in functions).
- The export keyword is necessary when targeting WASM; we will cover this later.
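To see why the distinction matters, here is a small C sketch contrasting the two behaviours. C's / operator truncates toward zero (guaranteed since C99); div_floor is a hypothetical helper that rounds toward negative infinity instead, which is what @divFloor does in Zig.

```c
/* C's `/` truncates toward zero. */
int div_trunc(int a, int b) {
    return a / b;
}

/* Flooring division: rounds toward negative infinity. Assumes b != 0. */
int div_floor(int a, int b) {
    int q = a / b;
    /* Step down by one when there is a remainder and the signs differ. */
    if ((a % b != 0) && ((a < 0) != (b < 0))) {
        q -= 1;
    }
    return q;
}
```

For -7 divided by 2, truncation gives -3 while flooring gives -4. For non-negative operands the two agree, which is why either choice works for our byte count.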
Seems chill so far, right? Let's move on to finding the audio samples now. Ready? 1 2 3
// The buffer lives at file scope so the returned pointer stays valid after the function returns.
// for simplicity I decided to use a static array of size 65536, feel free to make it a dynamic array.
var values = [_]i32{0} ** 65536;

export fn returnLeftSection(data: [*]const u8, data_len: usize) [*]i32 {
    var len: usize = 44; // the audio data starts right after the 44-byte header
    var array_val: usize = 0;
    const val1: i32 = @intCast(data[40]);
    const val2: i32 = @intCast(data[41]);
    const val3: i32 = @intCast(data[42]);
    const val4: i32 = @intCast(data[43]);
    const subchunk2d: i32 = val1 | val2 << 8 | val3 << 16 | val4 << 24;
    const samples: usize = @intCast(@divFloor(subchunk2d, 4));
    while (len + 4 <= 44 + samples * 4 and len + 4 <= data_len and array_val < values.len) {
        const byte1: u16 = data[len];
        const byte2: u16 = data[len + 1];
        // combine the little-endian pair, then reinterpret the bits as a signed sample
        const leftValue: i16 = @bitCast(byte1 | byte2 << 8);
        values[array_val] = leftValue;
        array_val += 1;
        len += 4;
    }
    return &values;
}
Let's break down the code.
- We find the number of samples the same way we did above.
- We create an array to store the values. It is declared outside the function because a pointer to a local array would be left dangling once the function returns.
- We iterate through the audio section's data and extract the left channel's value from each sample. (See how we multiply samples by 4 in the loop condition: each sample is 4 bytes and len counts bytes. The 44 accounts for the header, and the remaining checks keep us inside both the input and the output buffer.)
The code for the right channel is almost the same, except we increment the offset by 2 beforehand so that it reads the right channel's value inside each sample instead of the left's.
...
len += 2;
while (len + 2 <= 44 + samples * 4 and len + 2 <= data_len and array_val < values.len) {
    const byte1: u16 = data[len];
    const byte2: u16 = data[len + 1];
...
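For comparison, here is a sketch in C that extracts both channels in a single pass instead of using two separate functions. It is not the approach used in this series, just an illustration of the same idea: deinterleave is a hypothetical function, audio is assumed to point at the first audio byte (offset 44 in the canonical layout), and the caller is assumed to provide two output arrays of capacity cap.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an unsigned 16-bit pattern to a signed sample without relying on
   implementation-defined narrowing. */
int16_t as_signed16(uint16_t raw) {
    return (raw < 0x8000) ? (int16_t)raw : (int16_t)((int32_t)raw - 65536);
}

/* Split interleaved 16-bit stereo PCM into left/right arrays.
   Returns the number of frames written (at most `cap`). */
size_t deinterleave(const uint8_t *audio, size_t audio_len,
                    int16_t *left, int16_t *right, size_t cap) {
    size_t frames = audio_len / 4; /* 2 bytes left + 2 bytes right */
    if (frames > cap) {
        frames = cap;
    }
    for (size_t i = 0; i < frames; i++) {
        const uint8_t *p = audio + i * 4;
        left[i]  = as_signed16((uint16_t)(p[0] | (p[1] << 8)));
        right[i] = as_signed16((uint16_t)(p[2] | (p[3] << 8)));
    }
    return frames;
}
```

Doing both channels at once walks the data a single time, at the cost of needing two output buffers up front.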
Wrapping Up
Great job on making it this far!
In this chapter we dove into the WAV format and how we can build a parser for it in Zig. In the next post we will compile what we have built so far to the WASM target and bootstrap it with some HTML and CSS.
Helpful Links:
- Entire Code
- More on WASM