Monthly Archives: March 2007

Minim: An Audio Library for Processing

It’s here, the first release of my audio library for Processing: Minim.

Here are some of the cool features:

  • Released under the GPL, source is included in the full distribution.
  • AudioFileIn: Mono and Stereo playback of WAV, AIFF, AU, SND, and MP3 files.
  • AudioFileOut: Mono and Stereo audio recording either buffered or direct to disk.
  • AudioInput: Mono and Stereo input monitoring.
  • AudioOutput: Mono and Stereo sound synthesis.
  • AudioSignal: A simple interface for writing your own sound synthesis classes.
  • Comes with all the standard waveforms, a pink noise generator and a white noise generator. Additionally, you can extend the Oscillator class for easy implementation of your own periodic waveform.
  • AudioEffect: A simple interface for writing your own audio effects.
  • Comes with low pass, high pass, band pass, and notch filters. Additionally, you can extend the IIRFilter class for easy implementation of your own IIR filters.
  • Easy to attach signals and effects to AudioInputs and AudioOutputs. All the mixing and processing is taken care of for you.
  • Provides an FFT class for doing spectrum analysis.
  • Provides a BeatDetect class for doing beat detection.

Visit the download page, then take a look at the Quickstart Guide or dive right into the Javadocs.

Why Make Video Games?

I’m pretty new to this whole video game programming thing and because of that I’ve been asked in a couple of interviews why I want to make games. I think it’s an important enough point to address here.

Large Problem Space

Writing video games means knowing at least a little bit about a whole lot of stuff: graphics, sound, physics, collision detection and response, decision-trees, state machines, animation, UI, character control, camera control, etc. If you are making a game that focuses on a unique mechanic or some kind of real-world modeling you might have to learn a bit about something that isn’t normally included in a game.

For example, when Heather and Phil wrote the design for Glee, they initially decided that they wanted actors in the game to respond to individual instruments in the music. That would have meant diving into the complicated field of music information retrieval. I told them tracking individual instruments would be way too hard but that we could probably track beats. So, we went with that and I spent about a month reading up on beat detection and implementing an algorithm in Processing (which has since been improved upon and incoporated into my audio library for Processing, Minim).

Getting to learn about all these different problem domains and implementing workable solutions (even if they are simplistic, like using only circles for collision detection) is a lot of fun. I love learning about new things and I love it even more when I feel like I have a strong enough grasp on the subject to use it in some meaningful way. Since the problem space will likely change from game to game, it means that there will always be something new to sink my teeth into or an opportunity to revisit a problem that I haven’t thought about for a while.

Pretty Pictures

Games never use the standard Windows GUI. That’s cool. It’s nice to work on something that doesn’t look like every other desktop application. It’s nice to work in 3D where I feel like there’s something to explore out there. It’s nice to just be able to rotate around 3D models, even. Making a game feels so much different than writing a web app or other standard GUI based app. It really changes my relationship to the software.

Interactivity

Games are interactive, but more than that, games are usually highly reactive. A game responds to user input in more ways, and in more unpredictable ways, than a spreadsheet or a word processor or an audio editor. Not only is it a lot of fun to program that reactivity, but it is also a lot of fun to test it, to find the quirks in it, to iron those out, to be surprised by something you’d never imagined happening. It’s a huge payoff to design a procedural animation or actor state system with lots of different behaviors and transition cases and then to see it actually working the way I planned. I mean, it’s way cooler than seeing the chunk of HTML I requested from a web server show up in the place on the page where I wanted it to appear, even if I’m using the AJAX technique to make it happen.

Fun

Games are fun (if they’re good, anyway). I like making something that is fun to use. I like watching other people play the game I’ve made, with a smile on their face. This was really brought home to me at GAMMA 01. I got to see the game I’d programmed up on a big projection screen, vector graphics smiley faces pulsing to the music, being played by all sorts of people. Every time I went by the computer, someone was playing the game and enjoying it. That’s awesome! When I get to work on a game that is commercially released and hopefully get to see people enjoying it in their own homes, investing time in it like I’ve invested time in great games like Katamari Damacy, Guitar Hero, Elebits, and Okami, that’s going to be even more awesome.

FFT Averages

This post assumes you already know what an FFT is, so if you don’t, I suggest reading Chapter 8 and Chapter 12 of The Scientist and Engineer’s Guide to Digital Signal Processing. What I’m going to discuss is two different ways of averaging contiguous bands of an FFT. Before I do that, I’d like to review exactly what the frequency bands of an FFT represent.

What’s In An FFT?

Let’s say you have a sample buffer that has 1024 samples in it. If you run this through an FFT you will get a frequency domain described by two arrays that are each 1024 values long. However, typically the values in these arrays are not used directly. Instead, each pair of values (real[0] and imag[0], real[1] and imag[1], etc) is used to calculate the amplitude of each point in the FFT and that value is then used as the value of the spectrum at that point. Even more confusing to those new to the FFT is that the spectrum of 1024 samples is only 513 values long (the array runs from 0 to 512). The reason for this is because the values above spectrum[512] are, in practice, meaningless because they represent frequencies above the Nyquist frequency (one-half the sampling rate). So now we’ve simplified our two 1024 value arrays down to one array that is 513 values long. What do these 513 values represent, exactly?

Each point of the FFT describes the spectral density of a frequency band centered on a frequency that is a fraction of the sampling rate. Spectral density describes how much signal (amplitude) is present per unit of bandwidth. In other words, each point of an FFT is not describing a single frequency, but a frequency band whose size is proportional to the number of points in the FFT. The bandwidth of each band is 2/N, expressed as a fraction of the total bandwidth (i.e. N/2, which corresponds to the point at one-half the sampling rate), where N is the length of the time domain signal (1024 in our example). The exceptions to this are the first and last bands (spectrum[0] and spectrum[512]), whose bandwidth is 1/N. This makes more sense when talking about actual frequencies, so:

Given a sample buffer of 1024 samples that were sampled at 44100 Hz, a 1024 point FFT will give us a frequency spectrum of 513 points, with a total bandwidth of 22050 Hz. Each point i in the FFT represents a frequency band centered on the frequency i/1024 * 44100 whose bandwidth is 2/1024 * 22050 = 43.0664062 Hz, with the exception of spectrum[0] and spectrum[512], whose bandwidth is 1/1024 * 22050 = 21.5332031 Hz. Knowing this, we can get on to the business of averaging contiguous bands.

Linear Averages

We’ve got an FFT with 513 spectrum values, but we want to represent the spectrum as 32 bands, so we’ve decided to simply group together frequency bands by averaging their spectrum values. The obvious way to do this is to simply break the 513 values into 32 chunks of (nearly) equal size:

int avgWidth = (int)513/32;
for (int i = 0; i < 32; i++)
{
  float avg = 0;
  int j;
  for (j = 0; j < avgWidth; j++)
  {
    int offset = j + i*avgWidth;
    if ( offset < 513 )
    {
      avg += spectrum[offset];
    }
    else break;
  }
  avg /= j;
  averages[i] = avg;
}

The problem with this method is that most of the useful information in the spectrum is all below 15000 Hz. By creating average bands in a linear fashion, important detail is lost in the lower frequencies. Consider just the first average band in this example: it corresponds roughly to the frequency range of 0 to 689 Hz, that’s more than half of the keyboard of a piano!

Logarithmic Averages

A better way to group the spectrum would be in a logarithmic fashion. A natural way to do this is for each average to span an octave. Starting from the top of the spectrum, we could group frequencies like so (this assumes a sampling rate of 44100 Hz):

11025 to 22050 Hz
5512 to 11025 Hz
2756 to 5512 Hz
1378 to 2756 Hz
689 to 1378 Hz
344 to 689 Hz
172 to 344 Hz
86 to 172 Hz
43 to 86 Hz
22 to 43 Hz
11 to 22 Hz
0 to 11 Hz

This gives us only 12 bands, but already it is more useful than the 32 linear bands. We could now easily track a kick drum and snare drum, for example. If we want more than 12 bands, we could split each octave in two, or three, or whatever, the fineness would be limited only by the size of the FFT.

Knowing what frequency each point in the FFT corresponds to, and also how wide the frequency band for that point is, allows us to compute the logarithmically spaced averages. First we need to be able to map a frequency to the FFT spectrum. These functions will accomplish that (timeSize is N):

public float getBandWidth()
{
  return (2f/(float)timeSize) * (sampleRate / 2f);
}
 
public int freqToIndex(int freq)
{
  // special case: freq is lower than the bandwidth of spectrum[0]
  if ( freq < getBandWidth()/2 ) return 0;
  // special case: freq is within the bandwidth of spectrum[512]
  if ( freq > sampleRate/2 - getBandWidth()/2 ) return 512;
  // all other cases
  float fraction = (float)freq/(float) sampleRate;
  int i = Math.round(timeSize * fraction);
  return i;
}

This may not seem clear at first, but it is simply the inverse of mapping an index to a frequency, which was mentioned above: Freq(i) = (i/timeSize) * sampleRate. Here’s how we’d use these functions to compute the logarithmic averages listed above:

for (int i = 0; i < 12; i++)
{
  float avg = 0;
  int lowFreq;
  if ( i == 0 ) 
    lowFreq = 0;
  else
    lowFreq = (int)((sampleRate/2) / (float)Math.pow(2, 12 - i));
  int hiFreq = (int)((sampleRate/2) / (float)Math.pow(2, 11 - i));
  int lowBound = freqToIndex(lowFreq);
  int hiBound = freqToIndex(hiFreq);
  for (int j = lowBound; j <= hiBound; j++)
  {
    avg += spectrum[j];
  }
  // line has been changed since discussion in the comments
  // avg /= (hiBound - lowBound);
  avg /= (hiBound - lowBound + 1);
  averages[i] = avg;
}

This is hard coded to compute only 12 averages, which is not ideal, but it would be easy enough to determine the number of octaves based on the sample rate and the smallest bandwidth desired for a single octave.

LinLogAverages is an applet demonstrating the difference between linear averages and logarithmic averages.

Prototype: Cannon Fodder (v03)

Cannon Fodder (v03) is the third iteration on a prototype for a game inspired by the barrel cannons in Donkey Kong Country. I recently downloaded DKC for my Wii and was reminded of how much fun the levels that use the cannons are. The point of this prototype was to get the rotation and movement of cannons working and to see how much fun it is to just shoot the comet around. The only controls are the space bar, which fires a cannon once it’s loaded (sometimes the comet sticks, just keep pressing!). My idea is to keep that as the only player input and simply work in a lot of variety in the level design: moving cannons, rotating cannons, auto-fire cannons, maybe surfaces with different amounts of bounce.

Heather thought that it was kind of boring because sometimes you have to wait a long time for the comet to intersect with a cannon. But I find it pretty interesting to just watch the comet bounce around the stage. It is sometimes a bit like listening to one of Steve Reich’s phase pieces: the comet will get into a bouncing loop that encloses the moving cannon but the period of the loop is not the same as the period of the moving cannon and the shape of the loop is slowing changing. It’s possible to watch an interaction like that occur and then start guessing how many more loops it will take before the comet intersects the cannon.

WordPress Themes

So I’ve been browsing the WordPress Theme Viewer looking for a nice, understated, mildly geeky theme for this blog. I really don’t want to have to design one myself. However, judging by the standard fare found in the theme viewer, it’s starting to feel like I might be doing just that. There really seems to be a problem with over-designing, putting the design above content (themes with a ridiculously thin column for blog posts, for example), bad color combinations, etc etc. I’m not saying I’m a design wiz, because I’m not, but some of these people seem to be designing with their eyes closed. By far the coolest theme I found is CLI 2.1.2 by Rod McFarland. Like the name suggests, it is a command line interface to a WordPress blog. Totally cool and also almost totally unusable. I mean, if you don’t want people to read your blog, ever, use that theme. Even still, I’m impressed by the execution and the sheer geeky opaqueness of the concept.

Drum Machine

Drum Machine is a sketch I made to test out the AudioSample class in the audio library I’m working on. It is a really simple grid-style drum machine. There are three rows of sixteen buttons each. The rows are, from top to bottom: hi-hat, snare, and kick drum. The buttons correspond to sixteenth notes. So, you only get one measure to make your sick-ass beat. You can change the tempo by clicking in the BPM box and dragging the mouse up and down. It does a pretty good job staying in tempo, even though I didn’t do anything particularly fancy to calculate beat length.

Unfortunately, there is no way to load or save beats that you make and also no way to choose different samples. Perhaps these features will be added in a future version. For now, it’s just a fun little sketch to play with.