Day 12: Beat ‘dis: Re-addressing Audio Analysis

Happy first 2017 Friday! A long while ago, back in one of my actual real jobs, around 4pm every Friday I used to pick a random disco track and blast it into the office and dance around, making sure everyone was ready for the weekend. I think everyone really appreciated it… really 😎

Here we are again: it’s Friday, the first full weekend of 2017 is nearly upon us, and you made it through the first week! Let’s get some music going, add some disco visuals, party, and whilst we do all of this we get to do some maths too. It’s like a web dev’s dream over here 🤓

Really Ruth, what is this all about?

I want to introduce a concept I’ve been considering for a while, and one I need to start working on. So this is the start. There’ll be more work to be done after this article, but it’s the new year, so let’s begin at the beginning.

If you don’t know me by now, I’m Ruth 👋 I would say ‘in my spare time’ here, but recently it’s more like ‘in my real time’: I create visualisations to music, in a browser, mostly with browser code (aka JavaScript). In 2016 I started a collective called { Live : JS }. It brings together a bunch of artists like myself, who all build software and perform live shows with music, lights, projections, hardware and, well, JavaScript.

Yeh, it’s crazy, and it’s a little cool too. If you want to chat to us we have an open Slack channel, and we open source our code in our GitHub org too.

You were talking about audio?

Yes! Yes I was 😆 Specifically analysing audio, which is pretty important for what I do. Let’s take the analyser node of the web audio API for example, which is pretty sweet. It returns velocity values for frequencies (or the volume of each individual pitch, if that’s easier to understand). Up until now I’ve been using those velocity values just as they are returned; let’s take this visual for example:

I’ve done it by creating a pretty small array to represent the frequencies, each ‘flare’ represents an item in the array, and as the velocity changes so does the size of that flare.

The thing is, it does that with frequencies between 0 Hertz and roughly 20,000 Hertz. (Hertz is what we measure frequency in, btw. Strictly, the analyser goes up to half the sample rate of the audio, which is usually a little above 20,000.) OK, that’s alright, that’s about what the human ear hears.

The other thing is, though, those super high frequencies don’t factor a lot in day to day sounds and not really in music, and I really want to start dealing specifically with music. So if we bear this in mind, we can reduce the range of frequencies we want to analyse.

And there is one more thing (probably the most significant). The web audio API means we receive back a linear distribution of those frequencies. Now that’s cool and everything but, let’s take a brief look at music frequencies… (have you read the word frequencies enough yet?).

Musical notes are named with letters, namely A-G (there are sharps and flats, but for the purposes of this article we can concentrate on the natural notes). The A above middle C on a piano is 440 Hertz, the A an octave (8 notes) below that is 220 Hertz, and the A an octave above it is 880 Hertz. See what happened there? It halved and it doubled. That is a logarithmic scale, not a linear one.
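To make the halving and doubling concrete, here’s a tiny sketch. It assumes standard equal temperament tuning (12 semitones per octave, with that A at 440 Hertz), and `noteFrequency` is just a name I’ve made up for illustration, not part of any API.

```javascript
// Hypothetical helper: frequency of the note a given number of semitones
// away from the A above middle C (440 Hertz), assuming equal temperament.
function noteFrequency(semitonesFromA) {
  return 440 * Math.pow(2, semitonesFromA / 12);
}

console.log(noteFrequency(-12)); // 220, the A an octave below
console.log(noteFrequency(0));   // 440
console.log(noteFrequency(12));  // 880, the A an octave above
```

Every step of 12 semitones multiplies or divides the frequency by two, which is why the notes bunch up at the bottom of a linear frequency scale and spread out at the top.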

So what’s really happening with this analysis is we’re getting data, but if we start being a little bit clever with that data, we can get better data.

I know right. Some days you gotta luv our jobs!

Why is this so important? Well let’s look at that demo again ⬆️ Yes it looks pretty cool, I made it myself, thank you ☺️ But I think it could look better. As the track plays, the vis itself stays pretty flat and there’s more activity around the bottom end of the spectrum (the top right in the case of the flares), and literally none at the top end (top left). This is all really due to our linear distribution and large range of frequencies detected.

Now I think I can pimp this visual, and not just this vis either. I’m currently knee deep in building out all my analysis, MIDI, visualisation hacks into a much more workable piece of software. A really great piece of functionality would be more control over the sound analysis.

Think about this: Beat detection – cool. High, mid and low frequencies – cool. Brightness – cool. We could even go as far as detecting what type of music is being played and adjusting things accordingly. Like if a piece of music is very bass-y, then our high, mid and low frequency bands may differ from those of a piece of music that has overall higher sounds.

Now we can do this with the analyser node as is, but it would be better if the data we start with were closer to the data we actually need.

Oh I see your theory! How do we start?

Great, now we’re all on the same page. First things first: reducing our frequency array. Instead of starting with a small array like before (we can end up choosing our array resolution later), we’ll chuck the item count up to get a more accurate analysis.

var freqDataArray = new Uint8Array(2048);

Now we could simply cut that array and just use the items from the first quarter, with the idea that up to 5,000 Hertz is the first 25% of an array representing 0 – 20,000 Hertz:

// slice() returns a new array, so keep hold of the result
var musicFreqs = freqDataArray.slice(0, 512);

Why 5,000 Hertz? This is where the majority of sound lies on the spectrum. We can adjust this value later depending on the music we’re listening to and what instruments are used. I chose this value via an educated assumption, just as a starting point. (You can bear in mind, for instance, that spoken conversation sits under 3,000 Hertz.)
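If you want to check which array item lines up with which frequency, here’s a rough sketch. It assumes a sample rate of 44,100 Hertz and an fftSize of 4096 (which is what gives 2048 items); in a real page you’d read the sample rate from your audio context. `binToHertz` and `hertzToBin` are names I’ve made up for this example.

```javascript
// Hypothetical helpers. Each item in the analyser's array covers an equal
// slice of 0 Hertz up to half the sample rate, so one item is
// sampleRate / fftSize Hertz wide.
function binToHertz(index, sampleRate, fftSize) {
  return index * sampleRate / fftSize;
}

function hertzToBin(hertz, sampleRate, fftSize) {
  return Math.round(hertz * fftSize / sampleRate);
}

// Assumed values: a 44,100 Hertz context and an fftSize of 4096.
const sampleRate = 44100;
const fftSize = 4096;

console.log(binToHertz(512, sampleRate, fftSize));  // 5512.5
console.log(hertzToBin(5000, sampleRate, fftSize)); // 464
```

So under those assumptions the first 512 items reach a little above 5,000 Hertz, which is close enough for a starting point.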

Or we could filter the sound with the audio API, considering we have that functionality anyway. Here I use the lowpass type of the biquad filter, which only lets through frequencies up to a cut-off that I set, 5,000 Hertz in this case.

const biQuadNodeFil = audioAPI.createBiquadFilter();
biQuadNodeFil.type = "lowpass";
biQuadNodeFil.frequency.value = 5000;

audioTrack.connect(biQuadNodeFil);
biQuadNodeFil.connect(analyserNode);
analyserNode.connect(audioAPI.destination);
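For completeness, here’s a sketch of the whole chain in one function, including creating the analyser. The function name and its parameters are my own for illustration; `audioCtx` is expected to be an AudioContext and `sourceNode` whatever audio source you’re playing.

```javascript
// A sketch of the full wiring, under the assumptions described above.
function buildFilteredAnalyser(audioCtx, sourceNode, cutOffHertz) {
  const filter = audioCtx.createBiquadFilter();
  filter.type = "lowpass";
  filter.frequency.value = cutOffHertz;

  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 4096; // frequencyBinCount is then 2048

  // source -> filter -> analyser -> speakers
  sourceNode.connect(filter);
  filter.connect(analyser);
  analyser.connect(audioCtx.destination);

  return { filter: filter, analyser: analyser };
}

// In a browser, something like:
// const audioCtx = new AudioContext();
// const source = audioCtx.createMediaElementSource(document.querySelector("audio"));
// const nodes = buildFilteredAnalyser(audioCtx, source, 5000);
```

One thing to bear in mind: a lowpass biquad filter fades frequencies out gradually above the cut-off rather than chopping them off dead, so items just past 5,000 Hertz will get quieter rather than drop straight to zero.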

Check out the below CodePen, I’m spitting out the array of frequency velocities onto the page. When you hit the play button those values constantly start updating. If you toggle the filter on, you’ll notice the values in the top part of the array diminish. Also notice that the track playing hasn’t changed that much (as we’re still mostly within the range used in music).

If you’re interested there’s a cutOffVal var sitting in the JavaScript, you can reduce this down even more and see and hear more of a difference in frequencies being filtered out.

But it’s still a linear thingy bob…

Yes it is! We’ve only done the first part: capping the frequencies at 5,000 Hertz. Yeah, we could be missing a few high noises in certain music, but for the purposes of this exercise this bit was the easy bit, so let’s keep it simple.

Caveat! ⚠️ I know your brain is flagging a little warning: the data is still going to be the data, regardless of how we represent it in an array. Let’s take beat detection for example. One way of doing this is to look for a spike in the levels: when a certain number of array items go over a certain velocity (let’s say 75% of them go above the value 150), there’s a spike in the music and that could be construed as a beat. We can detect that kind of thing on the array as it stands now or after we do the next stage of manipulation, and we’d get very similar, if not the same, results. So if that’s all you were looking to do, you could probably just stop here and work with a smaller array and just be like 👍, but if we wanted to look more into different ranges of frequencies (say I wanted my whole screen to ‘bulge’ on the bass) the next stage would be important. Read: There’s only so much we can do; this is essentially another stage of filtering, not re-analysing from the source.
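That spike check could be sketched like this. It’s a deliberately naive version of the idea above, and `isSpike` and its parameters are names I’ve invented for the example; the 150 and 75% are the same illustrative numbers, not tuned values.

```javascript
// Hypothetical helper: true when at least `fraction` of the items
// in the array are above `threshold`.
function isSpike(freqData, threshold, fraction) {
  let loudItems = 0;
  for (let i = 0; i < freqData.length; i++) {
    if (freqData[i] > threshold) {
      loudItems++;
    }
  }
  return freqData.length > 0 && loudItems / freqData.length >= fraction;
}

const quiet = new Uint8Array([20, 40, 160, 30]);
const loud = new Uint8Array([200, 180, 160, 30]);

console.log(isSpike(quiet, 150, 0.75)); // false, only 1 of 4 items is above 150
console.log(isSpike(loud, 150, 0.75));  // true, 3 of 4 items are above 150
```

You’d call something like this on every animation frame with the latest frequency data, and it works the same whether you hand it the full array or the reduced one.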

So how do we spread apart these values and try to mimic the musical log scale? Here I’m going to pose one way and, as I said at the top of the article, this is the beginning of my studies in this area. I’m hoping that as I progress with implementing better analysis within my applications, further ways will present themselves.

This method groups these array items, and averages out the values returned within them. The groups get bigger and bigger as the frequencies get higher, much like the gaps within frequencies get bigger the further up the piano we play.

function adjustFreqData() {
  // get frequency data into the Uint8Array created earlier,
  // then keep only the first 512 items (the range we care about)
  analyserNode.getByteFrequencyData(freqDataArray);
  var musicFreqs = freqDataArray.slice(0, 512);

  var newFreqs = [], prevRangeStart = 0, prevItemCount = 0;
  // looping for my new 16 items
  for (let j = 1; j < 17; j++) {
    // define sample size: 1, 2, 2, 4, 4, 8, 8 ... doubling every second item
    var pow, itemCount, rangeStart;
    if (j % 2 === 1) {
      pow = (j - 1) / 2;
    } else {
      pow = j / 2;
    }
    itemCount = Math.pow(2, pow);
    // each range starts halfway through the previous one, so neighbours overlap
    if (prevItemCount === 1) {
      rangeStart = 0;
    } else {
      rangeStart = prevRangeStart + (prevItemCount / 2);
    }

    // get average value, add to new array
    var total = 0;
    for (let k = rangeStart; k < rangeStart + itemCount; k++) {
      total += musicFreqs[k];
    }
    newFreqs.push(total / itemCount);
    // update
    prevItemCount = itemCount;
    prevRangeStart = rangeStart;
  }
  return newFreqs;
}
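If you’d like to see exactly which array items end up in each group, here’s the same loop pulled apart into two small functions you can run without any audio. This is only a restatement of the function above for poking at; `bandRanges` and `averageBands` are names I’ve made up for it.

```javascript
// Hypothetical helper that reproduces the ranges adjustFreqData() walks over.
function bandRanges(bandCount) {
  const ranges = [];
  let prevStart = 0, prevCount = 0;
  for (let j = 1; j <= bandCount; j++) {
    // sizes go 1, 2, 2, 4, 4, 8, 8 ...
    const count = Math.pow(2, Math.floor(j / 2));
    // start halfway through the previous range (or at 0 right after the first item)
    const start = prevCount === 1 ? 0 : prevStart + prevCount / 2;
    ranges.push({ start: start, count: count });
    prevStart = start;
    prevCount = count;
  }
  return ranges;
}

// Average the items of `data` that fall inside each range.
function averageBands(data, ranges) {
  return ranges.map(function (range) {
    let total = 0;
    for (let k = range.start; k < range.start + range.count; k++) {
      total += data[k];
    }
    return total / range.count;
  });
}

const ranges = bandRanges(16);
console.log(ranges[0]);  // { start: 0, count: 1 }
console.log(ranges[15]); // { start: 254, count: 256 }
```

The last group runs from item 254 up to item 509, so the 16 groups fit inside the 512 items we kept, and because each group starts halfway through the one before, neighbouring groups share some of their items.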

Let’s take our modified array and use that in our original visualisation. Each part still represents an item in the array, but you can see how much better balanced the animation now is. Originally it was very bottom heavy; now the spread is more even and it moves more dramatically. Awesome.

Have a post NY party!

Chuck the music up, fullscreen your browser, celebrate 2017! It’s going to be a better year, surely 🎉✨

So what’s next? Well back in November I started doing a bunch of experiments around audio vis in a browser for Codevember. This work is a little side step to add to the bigger picture, and make these vis more awesome. I’ll be sure to add some of this into the VJ software I’m starting to build too and any more work in this area I’m sure I will share.

Want to discuss anything in this article further? Just tweet me. I’m always happy to chat about the things I write about, especially when they involve the audios 🎶

Credits: Big props to my dad for pairing with me on both the theory and code for this, there were times when I don’t think I understood myself but he still did. Also thanks to Kinnoha for happily providing some musical goodness for me to serve up, you can check out more of her ear magic on Soundcloud.

I’m glad to see that others also recognize that music theory is the way to go for audio reactive visualizations. Hybrid systems that measure pure musical properties along with audio energy to pick up patterns. This is what prompted me to write clubber.js and the results are awesome. Here are some examples https://goo.gl/7tDFmr https://goo.gl/411PTg https://goo.gl/9yBZnJ https://wizgrav.github.io/clubber/ . You’re gonna love the tool