Archive for the ‘DSP’ Category

Minim: An Audio Library for Processing

Tuesday, March 27th, 2007

It’s here, the first release of my audio library for Processing: Minim.

Here are some of the cool features:

  • Released under the GPL, source is included in the full distribution.
  • AudioFileIn: Mono and Stereo playback of WAV, AIFF, AU, SND, and MP3 files.
  • AudioFileOut: Mono and Stereo audio recording either buffered or direct to disk.
  • AudioInput: Mono and Stereo input monitoring.
  • AudioOutput: Mono and Stereo sound synthesis.
  • AudioSignal: A simple interface for writing your own sound synthesis classes.
  • Comes with all the standard waveforms, a pink noise generator and a white noise generator. Additionally, you can extend the Oscillator class for easy implementation of your own periodic waveform.
  • AudioEffect: A simple interface for writing your own audio effects.
  • Comes with low pass, high pass, band pass, and notch filters. Additionally, you can extend the IIRFilter class for easy implementation of your own IIR filters.
  • Easy to attach signals and effects to AudioInputs and AudioOutputs. All the mixing and processing is taken care of for you.
  • Provides an FFT class for doing spectrum analysis.
  • Provides a BeatDetect class for doing beat detection.

Visit the download page, then take a look at the Quickstart Guide or dive right into the Javadocs.

FFT Averages

Wednesday, March 21st, 2007

This post assumes you already know what an FFT is, so if you don’t, I suggest reading Chapter 8 and Chapter 12 of The Scientist and Engineer’s Guide to Digital Signal Processing. What I’m going to discuss is two different ways of averaging contiguous bands of an FFT. Before I do that, I’d like to review exactly what the frequency bands of an FFT represent.

What’s In An FFT?

Let’s say you have a sample buffer that has 1024 samples in it. If you run this through an FFT you will get a frequency domain described by two arrays that are each 1024 values long. However, typically the values in these arrays are not used directly. Instead, each pair of values (real[0] and imag[0], real[1] and imag[1], etc) is used to calculate the amplitude of each point in the FFT and that value is then used as the value of the spectrum at that point. Even more confusing to those new to the FFT is that the spectrum of 1024 samples is only 513 values long (the array runs from 0 to 512). The reason for this is because the values above spectrum[512] are, in practice, meaningless because they represent frequencies above the Nyquist frequency (one-half the sampling rate). So now we’ve simplified our two 1024 value arrays down to one array that is 513 values long. What do these 513 values represent, exactly?

Each point of the FFT describes the spectral density of a frequency band centered on a frequency that is a fraction of the sampling rate. Spectral density describes how much signal (amplitude) is present per unit of bandwidth. In other words, each point of an FFT is not describing a single frequency, but a frequency band whose size is proportional to the number of points in the FFT. The bandwidth of each band is 2/N, expressed as a fraction of the total bandwidth (i.e. N/2, which corresponds to the point at one-half the sampling rate), where N is the length of the time domain signal (1024 in our example). The exceptions to this are the first and last bands (spectrum[0] and spectrum[512]), whose bandwidth is 1/N. This makes more sense when talking about actual frequencies, so:

Given a sample buffer of 1024 samples that were sampled at 44100 Hz, a 1024 point FFT will give us a frequency spectrum of 513 points, with a total bandwidth of 22050 Hz. Each point i in the FFT represents a frequency band centered on the frequency i/1024 * 44100 whose bandwidth is 2/1024 * 22050 = 43.0664062 Hz, with the exception of spectrum[0] and spectrum[512], whose bandwidth is 1/1024 * 22050 = 21.5332031 Hz. Knowing this, we can get on to the business of averaging contiguous bands.

Linear Averages

We’ve got an FFT with 513 spectrum values, but we want to represent the spectrum as 32 bands, so we’ve decided to simply group together frequency bands by averaging their spectrum values. The obvious way to do this is to simply break the 513 values into 32 chunks of (nearly) equal size:

int avgWidth = (int)513/32;
for (int i = 0; i < 32; i++)
{
  float avg = 0;
  int j;
  for (j = 0; j < avgWidth; j++)
  {
    int offset = j + i*avgWidth;
    if ( offset < 513 )
    {
      avg += spectrum[offset];
    }
    else break;
  }
  avg /= j;
  averages[i] = avg;
}

The problem with this method is that most of the useful information in the spectrum is all below 15000 Hz. By creating average bands in a linear fashion, important detail is lost in the lower frequencies. Consider just the first average band in this example: it corresponds roughly to the frequency range of 0 to 689 Hz, that’s more than half of the keyboard of a piano!

Logarithmic Averages

A better way to group the spectrum would be in a logarithmic fashion. A natural way to do this is for each average to span an octave. Starting from the top of the spectrum, we could group frequencies like so (this assumes a sampling rate of 44100 Hz):

11025 to 22050 Hz
5512 to 11025 Hz
2756 to 5512 Hz
1378 to 2756 Hz
689 to 1378 Hz
344 to 689 Hz
172 to 344 Hz
86 to 172 Hz
43 to 86 Hz
22 to 43 Hz
11 to 22 Hz
0 to 11 Hz

This gives us only 12 bands, but already it is more useful than the 32 linear bands. We could now easily track a kick drum and snare drum, for example. If we want more than 12 bands, we could split each octave in two, or three, or whatever, the fineness would be limited only by the size of the FFT.

Knowing what frequency each point in the FFT corresponds to, and also how wide the frequency band for that point is, allows us to compute the logarithmically spaced averages. First we need to be able to map a frequency to the FFT spectrum. These functions will accomplish that (timeSize is N):

public float getBandWidth()
{
  return (2f/(float)timeSize) * (sampleRate / 2f);
}
 
public int freqToIndex(int freq)
{
  // special case: freq is lower than the bandwidth of spectrum[0]
  if ( freq < getBandWidth()/2 ) return 0;
  // special case: freq is within the bandwidth of spectrum[512]
  if ( freq > sampleRate/2 - getBandWidth()/2 ) return 512;
  // all other cases
  float fraction = (float)freq/(float) sampleRate;
  int i = Math.round(timeSize * fraction);
  return i;
}

This may not seem clear at first, but it is simply the inverse of mapping an index to a frequency, which was mentioned above: Freq(i) = (i/timeSize) * sampleRate. Here’s how we’d use these functions to compute the logarithmic averages listed above:

for (int i = 0; i < 12; i++)
{
  float avg = 0;
  int lowFreq;
  if ( i == 0 ) 
    lowFreq = 0;
  else
    lowFreq = (int)((sampleRate/2) / (float)Math.pow(2, 12 - i));
  int hiFreq = (int)((sampleRate/2) / (float)Math.pow(2, 11 - i));
  int lowBound = freqToIndex(lowFreq);
  int hiBound = freqToIndex(hiFreq);
  for (int j = lowBound; j <= hiBound; j++)
  {
    avg += spectrum[j];
  }
  // line has been changed since discussion in the comments
  // avg /= (hiBound - lowBound);
  avg /= (hiBound - lowBound + 1);
  averages[i] = avg;
}

This is hard coded to compute only 12 averages, which is not ideal, but it would be easy enough to determine the number of octaves based on the sample rate and the smallest bandwidth desired for a single octave.

LinLogAverages is an applet demonstrating the difference between linear averages and logarithmic averages.