Manual: FFT

[ javadoc | examples ]

FFT implements a Fast Fourier Transform. It is an efficient way to calculate the Complex Discrete Fourier Transform, which is a particular kind of Fourier Transform. A Fourier Transform is an algorithm that transforms a signal in the time domain, such as a sample buffer, into a signal in the frequency domain, often called the spectrum.

// Constructs an FFT that will accept sample buffers 
// that are timeSize long and have been recorded 
// with a sample rate of sampleRate. 
// timeSize must be a power of two. 
// This will throw an exception if it is not.
FFT(int timeSize, float sampleRate)

The spectrum does not represent individual frequencies, but actually represents frequency bands centered on particular frequencies. The center frequency of each band is usually expressed as a fraction of the sampling rate of the time domain signal and is equal to the index of the frequency band divided by the total number of bands. The total number of frequency bands is usually equal to the length of the time domain signal, but access is only provided to frequency bands with indices less than half the length, because they correspond to frequencies below the Nyquist frequency. In other words, given a signal of length N, there will be N/2 frequency bands in the spectrum.

As an example, if you construct an FFT with a timeSize of 1024 and and a sampleRate of 44100 Hz, then the spectrum will contain values for frequencies below 22010 Hz, which is the Nyquist frequency (half the sample rate). If you ask for the value of band number 5, this will correspond to a frequency band centered on 5/1024 * 44100 = 0.0048828125 * 44100 = 215 Hz. The width of that frequency band is equal to 2/1024, expressed as a fraction of the total bandwidth of the spectrum. The total bandwith of the spectrum is equal to the Nyquist frequency, which in this case is 22100, so the bandwidth is equal to about 50 Hz. It is not necessary for you to remember all of these relationships, though it is good to be aware of them. The getFreq allows you to query the spectrum with a frequency in Hz and the method getBandWidth will return the bandwidth in Hz of each frequency band in the spectrum.

The Forward Transform

// Performs a forward transform on buffer.
forward(float[] buffer)
forward(AudioBuffer buffer)
// Performs a forward transform on timeSize() samples of buffer, starting at offset
forward(float[] buffer, int offset)
forward(AudioBuffer buffer, int offset)
// Returns the size of the spectrum created by this transform.
// In other words, the number of frequency bands produced by this transform. 
// This is typically equal to timeSize()/2 + 1, see above for an explanation.
int specSize()
// Returns the amplitude of the requested frequency band.
float getBand(int i) 
// Gets the amplitude of the requested frequency (Hz) in the spectrum.
float getFreq(float freq)  
// Returns the index of the frequency band that contains the requested frequency.
int freqToIndex(float freq)

A typical usage of FFT is to analyze a signal so that the frequency spectrum may be represented in some way, typically with vertical lines. You could do this in Processing with the following code, where audio is an AudioSource and fft is an FFT.

fft.forward(audio.left);
for (int i = 0; i < fft.specSize(); i++)
{
   // draw the line for frequency band i, 
   // scaling it by 4 so we can see it a bit better
   line(i, height, i, height - fft.getBand(i) * 4);
}

Windowing

// A constant indicating no window
static int NONE
// A constant indicating a Hamming window
static int HAMMING
// Sets the window to use on the samples 
// before taking the forward transform.
window(int which)

Windowing is the process of shaping the audio samples before transforming them to the frequency domain. If you call the window method with one of the window constants, the sample buffers passed to the object for analysis will be shaped by the current window before being transformed. The result of using a window is to reduce the noise in the spectrum somewhat.

Code Sample (online example)

/**
  * This sketch demonstrates how to use an FFT to analyze an 
  * AudioBuffer and draw the resulting spectrum. <br />
  * It also allows you to turn windowing on and off, 
  * but you will see there is not much difference in the spectrum.<br />
  * Press 'w' to turn on windowing, press 'e' to turn it off.
  */
 
import ddf.minim.analysis.*;
import ddf.minim.*;
 
Minim minim;
AudioPlayer jingle;
FFT fft;
String windowName;
 
void setup()
{
  size(512, 200, P3D);
  textMode(SCREEN);
 
  minim = new Minim(this);
 
  jingle = minim.loadFile("jingle.mp3", 2048);
  jingle.loop();
 
  // create an FFT object that has a time-domain buffer 
  // the same size as jingle's sample buffer
  // note that this needs to be a power of two 
  // and that it means the size of the spectrum
  // will be 512. see the online tutorial for more info.
  fft = new FFT(jingle.bufferSize(), jingle.sampleRate());
 
  textFont(createFont("Arial", 16));
 
  windowName = "None";
}
 
void draw()
{
  background(0);
  stroke(255);
  // perform a forward FFT on the samples in jingle's left buffer
  // note that if jingle were a MONO file, 
  // this would be the same as using jingle.right or jingle.left
  fft.forward(jingle.mix);
  for(int i = 0; i < fft.specSize(); i++)
  {
    // draw the line for frequency band i, scaling it by 4 so we can see it a bit better
    line(i, height, i, height - fft.getBand(i)*4);
  }
  fill(255);
  // keep us informed about the window being used
  text("The window being used is: " + windowName, 5, 20);
}
 
void keyReleased()
{
  if ( key == 'w' ) 
  {
    // a Hamming window can be used to shape the sample buffer that is passed to the FFT
    // this can reduce the amount of noise in the spectrum
    fft.window(FFT.HAMMING);
    windowName = "Hamming";
  }
 
  if ( key == 'e' ) 
  {
    fft.window(FFT.NONE);
    windowName = "None";
  }
}
 
void stop()
{
  // always close Minim audio classes when you finish with them
  jingle.close();
  minim.stop();
 
  super.stop();
}

Averages

FFT also has methods that allow you to request the creation of an average spectrum. An average spectrum is simply a spectrum with fewer bands than the full spectrum where each average band is the average of the amplitudes of some number of contiguous frequency bands in the full spectrum.

// Sets the number of averages used when computing 
// the spectrum and spaces the averages in a linear manner.
void linAverages(int numAvg)
// Sets the number of averages used when computing
// the spectrum based on the minimum bandwidth for an octave           
// and the number of bands per octave.
void logAverages(int minBandwidth, int bandsPerOctave)          
// Sets the object to not compute averages.
void noAverages()          
// Gets the value of the ith average.
float getAvg(int i)
// Returns the number of averages currently being calculated.
int avgSize()
// Calculate the average amplitude of the 
// frequency band bounded by lowFreq and hiFreq, inclusive.
float calcAvg(float lowFreq, float hiFreq)

The linAverages method allows you to specify the number of averages that you want and will group frequency bands into groups of equal number. So if you have a spectrum with 512 frequency bands and you ask for 64 averages, each average will span 8 bands of the full spectrum.

The logAverages will group frequency bands by octave and allows you to specify the size of the smallest octave to use (in Hz) and also how many bands to split each octave into. So you might ask for the smallest octave to be 60 Hz and to split each octave into two bands. The result is that the bandwidth of each average is different. One frequency is an octave above another when it’s frequency is twice that of the lower frequency. So, 120 Hz is an octave above 60 Hz, 240 Hz is an octave above 120 Hz, and so on. When octaves are split, they are split based on Hz, so if you split the octave 60-120 Hz in half, you will get 60-90Hz and 90-120Hz. You can see how these bandwidths increase as your octave sizes grow. For instance, the last octave will always span sampleRate/4 - sampleRate/2, which in the case of audio sampled at 44100 Hz is 11025-22010 Hz. These logarithmically spaced averages are usually much more useful than the full spectrum or the linearly spaced averages because they map more directly to how humans perceive sound.

The calcAvg method allows you to specify the frequency band you want an average calculated for. You might ask for 60-500Hz and this function will group together the bands from the full spectrum that fall into that range and average their amplitudes for you.

If you don’t want any averages calculated, then you can call noAverages(). This will not impact your ability to use the calcAvg method, it will merely prevent the object from calculating an average array every time you use the forward method.

Code Sample (online example)

/**
  * This sketch demonstrates the difference between linearly spaced averages and logarithmically spaced averages 
  * (averages that are grouped by octave). It also draw the full spectrum so you can see how each set of averages 
  * compares to it.
  * <p>
  * From top to bottom:
  * <ul>
  *  <li>The full spectrum.</li>
  *  <li>The spectrum grouped into 30 linearly spaced averages.</li>
  *  <li>The spectrum grouped logarithmically into 10 octaves, each split into 3 bands.</li>
  * </ul>
  * 
  */
 
import ddf.minim.analysis.*;
import ddf.minim.*;
 
Minim minim;  
AudioPlayer jingle;
FFT fftLin;
FFT fftLog;
float height3;
float height23;
 
void setup()
{
  size(512, 300, P3D);
  height3 = height/3;
  height23 = 2*height/3;
 
  minim = new Minim(this);
  jingle = minim.loadFile("jingle.mp3", 2048);
  // loop the file
  jingle.loop();
  // create an FFT object that has a time-domain buffer the same size as jingle's sample buffer
  // note that this needs to be a power of two 
  // and that it means the size of the spectrum will be 1024. 
  // see the online tutorial for more info.
  fftLin = new FFT(jingle.bufferSize(), jingle.sampleRate());
  // calculate the averages by grouping frequency bands linearly. use 30 averages.
  fftLin.linAverages(30);
  fftLog = new FFT(jingle.bufferSize(), jingle.sampleRate());
  // calculate averages based on a miminum octave width of 22 Hz
  // split each octave into three bands
  // this should result in 30 averages
  fftLog.logAverages(22, 3);
  rectMode(CORNERS);
}
 
void draw()
{
  background(0);
  // perform a forward FFT on the samples in jingle's mix buffer
  // note that if jingle were a MONO file, this would be the same as using jingle.left or jingle.right
  fftLin.forward(jingle.mix);
 
  stroke(255);
  noFill();
  // draw the full spectrum
  for(int i = 0; i < fftLin.specSize(); i++)
  {
    line(i, height3, i, height3 - fftLin.getBand(i)*2);
  }
 
  noStroke();
  fill(255);
  // draw the linear averages
  int w = int(width/fftLin.avgSize());
  for(int i = 0; i < fftLin.avgSize(); i++)
  {
    // draw a rectangle for each average, multiply the value by 5 so we can see it better
    rect(i*w, height23, i*w + w, height23 - fftLin.getAvg(i)*5);
  }
 
  // draw the logarithmic averages
  fftLog.forward(jingle.mix);
  w = int(width/fftLog.avgSize());
  for(int i = 0; i < fftLog.avgSize(); i++)
  {
    // draw a rectangle for each average, multiply the value by 5 so we can see it better
    rect(i*w, height, i*w + w, height - fftLog.getAvg(i));
  }
}
 
void stop()
{
  // always close Minim audio classes when you are done with them
  jingle.close();
  // always stop Minim before exiting
  minim.stop();
 
  super.stop();
}

Inverse Transform

// Performs an inverse transform of the 
// frequency spectrum and places the result in buffer.
void inverse(AudioBuffer buffer)
// Returns the length of the time domain signal
// generated by inverse
int timeSize()     
// Sets the amplitude of the requested frequency in the spectrum to a.
void setFreq(float freq, float a)
// Scales the amplitude of the requested frequency by a.
void scaleFreq(float freq, float s)
// Sets the amplitude of the ith frequency band to a.
void setBand(int i, float a)
// Scales the amplitude of the ith frequency band by s.
void scaleBand(int i, float s)

FFT also supports taking the inverse transform of a spectrum. This means that a frequency spectrum will be transformed into a time domain signal and placed in a provided sample buffer. The length of the time domain signal will be timeSize() long. The set and scale methods allow you the ability to shape the spectrum already stored in the object before taking the inverse transform.

The sketch below demonstrates very simply how you might use the inverse FFT to modify an audio signal. Press ‘f’ to perform the forward FFT, then press ‘s’ to set one of the frequency bands to 150. Now press ‘d’ to take the inverse FFT. You will see that the wave form now looks like two sine waves that have been added together. In fact, this is exactly the case. The sine wave that has been added has the same frequency as the frequency band that we artificially changed the value of. You might wonder what the actual frequency added to the spectrum is. That frequency is a fraction of the sampling rate, which can be found with the formula f = i/N where f is the fraction of the sampling rate, i is the index of the frequency band, and N is the time-domain size of the FFT. In this case we have a 512 point FFT and we are changing the frequency band at index 20. So in our case f = 20/512 = 0.0390625 Our sampling rate is 44100 Hz, a value passed in the SineWave constructor, so the frequency in Hz that is being added to the spectrum is 44100 * 0.0390625 = 1722.65625 Hz

Code Sample (online example)

/**
 * This sketch demonstrates very simply how you might use the inverse FFT to modify an audio signal.<br />
 * Press 'f' to perform the forward FFT, then press 's' to set one of the frequency bands to 150.<br />
 * Now press 'd' to take the inverse FFT. You will see that the wave form now looks like two sine waves that have
 * been added together. In fact, this is exactly the case. The sine wave that has been added has the 
 * same frequency as the frequency band that we artificially changed the value of.<br />
 * <br />
 * You might wonder what the actual frequency added to the spectrum is.
 * That frequency is a fraction of the sampling rate, which can be found with the formula <b>f = i/N</b>
 * where <b>f</b> is the fraction of the sampling rate, <b>i</b> is the index of the frequency band, 
 * and <b>N</b> is the time-domain size of the FFT. In this case we have a 512 point FFT and we are 
 * changing the frequency band at index 20. So in our case <b>f = 20/512 = 0.0390625</b>
 * Our sampling rate is 44100 Hz, a value passed in the Sine constructor,
 * so the frequency in Hz that is being added to the spectrum is <b>44100 * 0.0390625 = 1722.65625 Hz</b>
 *
 */
 
import ddf.minim.analysis.*;
import ddf.minim.signals.*;
 
FFT fft;
SineWave sine;
float[] buffer;
int bsize = 512;
 
void setup()
{
  size(512, 300, P3D);
  // create an FFT with a time-domain size the same as the size of buffer
  // it is required that these two values be the same
  // and also that the value is a power of two
  fft = new FFT(bsize, 44100);
  sine = new SineWave(600, 1, 44100);
  buffer = new float[bsize];
  // fill the buffer with a sine wave
  sine.generate(buffer);
}
 
void draw()
{
  background(0);
  noStroke();
  fill(255, 128);
  // draw the waveform
  for(int i = 0; i < buffer.length; i++)
  {
    ellipse(i, 50 + buffer[i]*10, 2, 2);
  }
  noFill();
  stroke(255);
  // draw the spectrum
  for(int i = 0; i < fft.specSize(); i++)
  {
    line(i, height, i, height - fft.getBand(i));
  }
  stroke(255, 0, 0);
  line(width/2, height, width/2, 0);
}
 
 
void keyReleased()
{
  if ( key == 'f' ) 
  {
    println("Performing a Forward FFT on buffer.");
    fft.forward(buffer);
  }
  if ( key == 'd' ) 
  {
    println("Performing an Inverse FFT and putting the result in buffer.");
    fft.inverse(buffer);
  }
  if ( key == 's' )
  {
    // by setting frequency band 20 to a high value, we are basically mixing in a sine wave at that frequency
    // after setting the frequency band and then taking the inverse FFT, you will see the waveform change
    println("Setting frequency band 20 to 150.");
    fft.setBand(20, 150);
  }
}

12 thoughts on “Manual: FFT

  1. Pingback: Sabrina’s External Brain… | Week 7: Sound and Sound Input

  2. In the windowing sketch, the comments say:

    * It also allows you to turn windowing on and off,
    * but you will see there is not much difference in the spectrum

    Switching the vertical axis to log scale (dB) makes the difference much clearer; rectangular windowing causes lots of spread of spectral peaks, which hamming windows fix. You can see this by changing the line() command in the draw() method:

    // draw the line for frequency band i using dB scale
    line(i, height, i, height – Math.round(2*20*Math.log10(100*fft.getBand(i))));

    DAn.

  3. Thanks for pointing this out! With the next release I expect there will be a better example demonstrating how the windowing functions change the spectrum and there will also be a many more windows to choose from.

  4. “If you ask for the value of band number 5, this will correspond to a frequency band centered on 5/1024 * 44100 = 0.0048828125 * 44100 = 215 Hz.”

    So you’re saying that if you want the value of band number 1024, then it would correspond to a frequency of 44100 Hz. However, with a 44kHz sampling rate, the maximum frequency information you can obtain is 22kHz (determined by the Nyquist bandwidth calculation).

    Could you please explain this further? How would something sampled at 44kHz allow your FFT to have bands anywhere past 22kHz?

    Thanks.

  5. See the paragraph above the sentence you quote: “The total number of frequency bands is usually equal to the length of the time domain signal, but access is only provided to frequency bands with indices less than half the length, because they correspond to frequencies below the Nyquist frequency. In other words, given a signal of length N, there will be N/2 frequency bands in the spectrum.”

  6. So with a timesize of 1024 like above, you can only ask for the value of band numbers up to 512? It won’t give you anything from 513-1024?

  7. @msk That’s correct. This FFT class is meant to be audio-centric, not general purpose.

    @devant Not directly through the public methods. However, the arrays where all the data is stored are declared protected, so you could do something like extend FFT and access them in the methods of your class.

  8. Pingback: FFT + Realidad Aumentada « Emiliusvgs Projects

  9. Pingback: creating a visualization for my own media player???

  10. Pingback: CAAD workspace: general information for CAAD architects

  11. Pingback: [Midterm] Drawing Machine Playtests | The Flavorful

Comments are closed.