DSP

You are currently browsing the archive for the DSP category.

I wrote a UGen the other day that allows you to change the tick rate (or sample rate, if you like), of any UGen that you patch to it. After making a mouse-driven example to test it out, I created this example so that the change in the tick rate could be driven by the audio itself. Here, the current loudness (RMS amplitude, to be exact) of the audio being looped directly adjusts the playback speed. Louder parts result in a higher tick rate. What you get sounds like a really broken record player. Not only is it stuck looping the same four bars, but it sounds like the timing belt is having all sorts of issues. Check it out!

I’ve been meaning to try out some digital Moog filter code. A digital Moog filter is simply a filter that attempts to model the analog filters found on Moog synthesizers. It’s got two knobs: frequency and resonance. I yanked some code from musicdsp.org, a wonderful resource, and wrote a really simple UGen with it. I meant to write more of a song, but there’s never enough time for that in these evening coding forays. So instead, you get this kind of avant-garde slow tuning thing using amplitude modulation and tweaking of the filter values. Have a listen!

Eventually, we plan to add a UGen to Minim that will allow you to control the generating speed of any other UGen, but that got me thinking about how you might do something similar with existing components. So, I struck upon the idea of using an Oscil as a looping sampler by using an audio file as the Waveform and setting the frequency of the Oscil to be very low. The fun thing about this is that if you set the Oscil to a negative frequency, the sound will play in reverse. Additionally, I thought it’d be fun to be able to automate the changing of the frequency by having an LFO control the frequency of each Oscil (left and right channels of the original audio file). So here’s a sketch that lets you play with this setup:

Screenshot of the sample_oscil sketch.

I’m picking my way through Real Sound Synthesis for Interactive Applications by Perry R. Cook (one of the authors of the STK) and decided to start making small apps to demonstrate the different kinds of synthesis he covers in the book. This first one let’s you set the variables of a mass-spring-damper system and then trigger it, which is exactly the same as setting the coefficients of a two-pole IIR filter and sending an impulse into it (as it turns out). Check it:

Sceenshot of the mass-spring-damper applet.

I’m working on adding the ability to do frequency modulation in Minim. I didn’t figure it was a very complicated thing. I knew that I’d have to do it while generating each sample, so it couldn’t be done as an AudioEffect. But when I went looking online about how to do it, I found a number of articles that described the technique in words, with somewhat dense equations, but none that had easy to understand code examples. I did find this article, but the code in it is some form of Lisp, a language I am not familiar with. I thought I had it figured out a couple times, but every time I tried something new I still wasn’t achieving the correct result.

So today I asked my friend The Mysterious H how it works because he’s a smart guy and builds synths out of Atari sound chips and shit like that. He goes, “Oh it’s easy.”

Every time you want to generate a sample of the signal you are modulating (the carrier), generate a sample of your modulator and add that to the phase you use to generate your carrier sample.

Said another way: Say you’ve got a signal called f. At every time step t (which is your phase), you generate a sample s by evaluating f(t). If you want to modulate the frequency of f with a modulator called m, then at every t you will do this:

s = f(t + m(t))

The speed of the modulation is determined by the frequency of your modulator and the amount of the modulation (how many hertz above and below the frequency of your carrier signal the audible signal will swing) is determined by the amplitude of the modulator. In my tests I found that the amplitude had to be around 0.001 to achieve what one would consider vibrato. Setting it to 0.1 made for some pretty intense alien sounds when the frequency of the modulator was up in audio signal range (20 Hz and up, say).

It’s lots of fun to play with and will be coming soon!

Tags: , ,

It’s here, the first release of my audio library for Processing: Minim.

Here are some of the cool features:

  • Released under the GPL, source is included in the full distribution.
  • AudioFileIn: Mono and Stereo playback of WAV, AIFF, AU, SND, and MP3 files.
  • AudioFileOut: Mono and Stereo audio recording either buffered or direct to disk.
  • AudioInput: Mono and Stereo input monitoring.
  • AudioOutput: Mono and Stereo sound synthesis.
  • AudioSignal: A simple interface for writing your own sound synthesis classes.
  • Comes with all the standard waveforms, a pink noise generator and a white noise generator. Additionally, you can extend the Oscillator class for easy implementation of your own periodic waveform.
  • AudioEffect: A simple interface for writing your own audio effects.
  • Comes with low pass, high pass, band pass, and notch filters. Additionally, you can extend the IIRFilter class for easy implementation of your own IIR filters.
  • Easy to attach signals and effects to AudioInputs and AudioOutputs. All the mixing and processing is taken care of for you.
  • Provides an FFT class for doing spectrum analysis.
  • Provides a BeatDetect class for doing beat detection.

Visit the download page, then take a look at the Quickstart Guide or dive right into the Javadocs.

FFT Averages

This post assumes you already know what an FFT is, so if you don’t, I suggest reading Chapter 8 and Chapter 12 of The Scientist and Engineer’s Guide to Digital Signal Processing. What I’m going to discuss is two different ways of averaging contiguous bands of an FFT. Before I do that, I’d like to review exactly what the frequency bands of an FFT represent.

What’s In An FFT?

Let’s say you have a sample buffer that has 1024 samples in it. If you run this through an FFT you will get a frequency domain described by two arrays that are each 1024 values long. However, typically the values in these arrays are not used directly. Instead, each pair of values (real[0] and imag[0], real[1] and imag[1], etc) is used to calculate the amplitude of each point in the FFT and that value is then used as the value of the spectrum at that point. Even more confusing to those new to the FFT is that the spectrum of 1024 samples is only 513 values long (the array runs from 0 to 512). The reason for this is because the values above spectrum[512] are, in practice, meaningless because they represent frequencies above the Nyquist frequency (one-half the sampling rate). So now we’ve simplified our two 1024 value arrays down to one array that is 513 values long. What do these 513 values represent, exactly?

Each point of the FFT describes the spectral density of a frequency band centered on a frequency that is a fraction of the sampling rate. Spectral density describes how much signal (amplitude) is present per unit of bandwidth. In other words, each point of an FFT is not describing a single frequency, but a frequency band whose size is proportional to the number of points in the FFT. The bandwidth of each band is 2/N, expressed as a fraction of the total bandwidth (i.e. N/2, which corresponds to the point at one-half the sampling rate), where N is the length of the time domain signal (1024 in our example). The exceptions to this are the first and last bands (spectrum[0] and spectrum[512]), whose bandwidth is 1/N. This makes more sense when talking about actual frequencies, so:

Given a sample buffer of 1024 samples that were sampled at 44100 Hz, a 1024 point FFT will give us a frequency spectrum of 513 points, with a total bandwidth of 22050 Hz. Each point i in the FFT represents a frequency band centered on the frequency i/1024 * 44100 whose bandwidth is 2/1024 * 22050 = 43.0664062 Hz, with the exception of spectrum[0] and spectrum[512], whose bandwidth is 1/1024 * 22050 = 21.5332031 Hz. Knowing this, we can get on to the business of averaging contiguous bands.

Linear Averages

We’ve got an FFT with 513 spectrum values, but we want to represent the spectrum as 32 bands, so we’ve decided to simply group together frequency bands by averaging their spectrum values. The obvious way to do this is to simply break the 513 values into 32 chunks of (nearly) equal size:

int avgWidth = (int)513/32;
for (int i = 0; i < 32; i++)
{
  float avg = 0;
  int j;
  for (j = 0; j < avgWidth; j++)
  {
    int offset = j + i*avgWidth;
    if ( offset < 513 )
    {
      avg += spectrum[offset];
    }
    else break;
  }
  avg /= j;
  averages[i] = avg;
}

The problem with this method is that most of the useful information in the spectrum is all below 15000 Hz. By creating average bands in a linear fashion, important detail is lost in the lower frequencies. Consider just the first average band in this example: it corresponds roughly to the frequency range of 0 to 689 Hz, that’s more than half of the keyboard of a piano!

Logarithmic Averages

A better way to group the spectrum would be in a logarithmic fashion. A natural way to do this is for each average to span an octave. Starting from the top of the spectrum, we could group frequencies like so (this assumes a sampling rate of 44100 Hz):

11025 to 22050 Hz
5512 to 11025 Hz
2756 to 5512 Hz
1378 to 2756 Hz
689 to 1378 Hz
344 to 689 Hz
172 to 344 Hz
86 to 172 Hz
43 to 86 Hz
22 to 43 Hz
11 to 22 Hz
0 to 11 Hz

This gives us only 12 bands, but already it is more useful than the 32 linear bands. We could now easily track a kick drum and snare drum, for example. If we want more than 12 bands, we could split each octave in two, or three, or whatever, the fineness would be limited only by the size of the FFT.

Knowing what frequency each point in the FFT corresponds to, and also how wide the frequency band for that point is, allows us to compute the logarithmically spaced averages. First we need to be able to map a frequency to the FFT spectrum. These functions will accomplish that (timeSize is N):

public float getBandWidth()
{
  return (2f/(float)timeSize) * (sampleRate / 2f);
}
 
public int freqToIndex(int freq)
{
  // special case: freq is lower than the bandwidth of spectrum[0]
  if ( freq < getBandWidth()/2 ) return 0;
  // special case: freq is within the bandwidth of spectrum[512]
  if ( freq > sampleRate/2 - getBandWidth()/2 ) return 512;
  // all other cases
  float fraction = (float)freq/(float) sampleRate;
  int i = Math.round(timeSize * fraction);
  return i;
}

This may not seem clear at first, but it is simply the inverse of mapping an index to a frequency, which was mentioned above: Freq(i) = (i/timeSize) * sampleRate. Here’s how we’d use these functions to compute the logarithmic averages listed above:

for (int i = 0; i < 12; i++)
{
  float avg = 0;
  int lowFreq;
  if ( i == 0 )
    lowFreq = 0;
  else
    lowFreq = (int)((sampleRate/2) / (float)Math.pow(2, 12 - i));
  int hiFreq = (int)((sampleRate/2) / (float)Math.pow(2, 11 - i));
  int lowBound = freqToIndex(lowFreq);
  int hiBound = freqToIndex(hiFreq);
  for (int j = lowBound; j <= hiBound; j++)
  {
    avg += spectrum[j];
  }
  // line has been changed since discussion in the comments
  // avg /= (hiBound - lowBound);
  avg /= (hiBound - lowBound + 1);
  averages[i] = avg;
}

This is hard coded to compute only 12 averages, which is not ideal, but it would be easy enough to determine the number of octaves based on the sample rate and the smallest bandwidth desired for a single octave.

LinLogAverages is an applet demonstrating the difference between linear averages and logarithmic averages.