FFT Averages

This post assumes you already know what an FFT is, so if you don’t, I suggest reading Chapter 8 and Chapter 12 of The Scientist and Engineer’s Guide to Digital Signal Processing. What I’m going to discuss is two different ways of averaging contiguous bands of an FFT. Before I do that, I’d like to review exactly what the frequency bands of an FFT represent.

What’s In An FFT?

Let’s say you have a sample buffer that has 1024 samples in it. If you run this through an FFT you will get a frequency domain described by two arrays that are each 1024 values long. However, typically the values in these arrays are not used directly. Instead, each pair of values (real[0] and imag[0], real[1] and imag[1], etc) is used to calculate the amplitude of each point in the FFT and that value is then used as the value of the spectrum at that point. Even more confusing to those new to the FFT is that the spectrum of 1024 samples is only 513 values long (the array runs from 0 to 512). The reason for this is because the values above spectrum[512] are, in practice, meaningless because they represent frequencies above the Nyquist frequency (one-half the sampling rate). So now we’ve simplified our two 1024 value arrays down to one array that is 513 values long. What do these 513 values represent, exactly?

Each point of the FFT describes the spectral density of a frequency band centered on a frequency that is a fraction of the sampling rate. Spectral density describes how much signal (amplitude) is present per unit of bandwidth. In other words, each point of an FFT is not describing a single frequency, but a frequency band whose size is proportional to the number of points in the FFT. The bandwidth of each band is 2/N, expressed as a fraction of the total bandwidth (i.e. N/2, which corresponds to the point at one-half the sampling rate), where N is the length of the time domain signal (1024 in our example). The exceptions to this are the first and last bands (spectrum[0] and spectrum[512]), whose bandwidth is 1/N. This makes more sense when talking about actual frequencies, so:

Given a sample buffer of 1024 samples that were sampled at 44100 Hz, a 1024 point FFT will give us a frequency spectrum of 513 points, with a total bandwidth of 22050 Hz. Each point i in the FFT represents a frequency band centered on the frequency i/1024 * 44100 whose bandwidth is 2/1024 * 22050 = 43.0664062 Hz, with the exception of spectrum[0] and spectrum[512], whose bandwidth is 1/1024 * 22050 = 21.5332031 Hz. Knowing this, we can get on to the business of averaging contiguous bands.

Linear Averages

We’ve got an FFT with 513 spectrum values, but we want to represent the spectrum as 32 bands, so we’ve decided to simply group together frequency bands by averaging their spectrum values. The obvious way to do this is to simply break the 513 values into 32 chunks of (nearly) equal size:

int avgWidth = (int)513/32;
for (int i = 0; i < 32; i++)
  float avg = 0;
  int j;
  for (j = 0; j < avgWidth; j++)
    int offset = j + i*avgWidth;
    if ( offset < 513 )
      avg += spectrum[offset];
    else break;
  avg /= j;
  averages[i] = avg;

The problem with this method is that most of the useful information in the spectrum is all below 15000 Hz. By creating average bands in a linear fashion, important detail is lost in the lower frequencies. Consider just the first average band in this example: it corresponds roughly to the frequency range of 0 to 689 Hz, that’s more than half of the keyboard of a piano!

Logarithmic Averages

A better way to group the spectrum would be in a logarithmic fashion. A natural way to do this is for each average to span an octave. Starting from the top of the spectrum, we could group frequencies like so (this assumes a sampling rate of 44100 Hz):

11025 to 22050 Hz
5512 to 11025 Hz
2756 to 5512 Hz
1378 to 2756 Hz
689 to 1378 Hz
344 to 689 Hz
172 to 344 Hz
86 to 172 Hz
43 to 86 Hz
22 to 43 Hz
11 to 22 Hz
0 to 11 Hz

This gives us only 12 bands, but already it is more useful than the 32 linear bands. We could now easily track a kick drum and snare drum, for example. If we want more than 12 bands, we could split each octave in two, or three, or whatever, the fineness would be limited only by the size of the FFT.

Knowing what frequency each point in the FFT corresponds to, and also how wide the frequency band for that point is, allows us to compute the logarithmically spaced averages. First we need to be able to map a frequency to the FFT spectrum. These functions will accomplish that (timeSize is N):

public float getBandWidth()
  return (2f/(float)timeSize) * (sampleRate / 2f);
public int freqToIndex(int freq)
  // special case: freq is lower than the bandwidth of spectrum[0]
  if ( freq < getBandWidth()/2 ) return 0;
  // special case: freq is within the bandwidth of spectrum[512]
  if ( freq > sampleRate/2 - getBandWidth()/2 ) return 512;
  // all other cases
  float fraction = (float)freq/(float) sampleRate;
  int i = Math.round(timeSize * fraction);
  return i;

This may not seem clear at first, but it is simply the inverse of mapping an index to a frequency, which was mentioned above: Freq(i) = (i/timeSize) * sampleRate. Here’s how we’d use these functions to compute the logarithmic averages listed above:

for (int i = 0; i < 12; i++)
  float avg = 0;
  int lowFreq;
  if ( i == 0 ) 
    lowFreq = 0;
    lowFreq = (int)((sampleRate/2) / (float)Math.pow(2, 12 - i));
  int hiFreq = (int)((sampleRate/2) / (float)Math.pow(2, 11 - i));
  int lowBound = freqToIndex(lowFreq);
  int hiBound = freqToIndex(hiFreq);
  for (int j = lowBound; j <= hiBound; j++)
    avg += spectrum[j];
  // line has been changed since discussion in the comments
  // avg /= (hiBound - lowBound);
  avg /= (hiBound - lowBound + 1);
  averages[i] = avg;

This is hard coded to compute only 12 averages, which is not ideal, but it would be easy enough to determine the number of octaves based on the sample rate and the smallest bandwidth desired for a single octave.

LinLogAverages is an applet demonstrating the difference between linear averages and logarithmic averages.

19 thoughts on “FFT Averages

  1. two questions regarding the linlogaverages, in this line:
    return (2f/(float)timeSize) * sampleRate;

    what is 2f? f?

    secondly, in this line:
    avg /= (hiBound – lowBound)
    is there not occassions where both hiBound and lowBound are 0? hence you would be dividing by zero. is that right?

    thanks – other then these two questions the above has been a real help.

  2. 2f tells Java that the 2 should be a floating point number, rather than an integer.

    lowBound could wind up being zero, if lowFreq is zero. However, the only way that hiBound could be zero is if sampleRate is zero, which will never be the case. If your sampleRate was zero that would mean you were sampling a signal zero times per second, which would mean you’d have no digital signal.

  3. if ( freq “less” getBandWidth()/2 ) return 0;

    getbandwidth = 86.13 for 44100, 1024
    hence, hifreq is less than half this for first three bands. hence, hibound gets set to zero on three occassions.

    is this not correct? am i being stupid?
    sorry fot he incomplete threads – it doesn’t like the less than char.

  4. Using Google calculator, I have the first three hiFreq computations (for i = 0 to 2) equalling 182, 220, and 272 respectively. All of these are greater than 43, which is the value that line of code checks freq against.

  5. Ah, you are correct, I was misremembering how Math.pow() works. I just checked out the code for doing this in Minim and the potential divide by zero line looks like this:

    avg /= (hiBound – lowBound + 1);

    That probably means I ran into the divide by zero problem while developing the library. Sorry about that. I’m going to update this post.

  6. what about geometric averages or arithmetic squared averages? not sure if it even makes sense.

  7. About the low and hibound being zero;
    when I implement your code, I get the following:

    range 1=[0; 10], indexes 0-0
    range 2=[10; 21], indexes 0-0
    range 3=[21; 43], indexes 0-1
    range 4=[43; 86], indexes 1-2
    range 5=[86; 172], indexes 2-4
    range 6=[172; 344], indexes 4-8
    range 7=[344; 689], indexes 8-16
    range 8=[689; 1378], indexes 16-32
    range 9=[1378; 2756], indexes 32-64
    range 10=[2756; 5512], indexes 64-128
    range 11=[5512; 11025], indexes 128-256
    range 12=[11025; 22050], indexes 256-512

    So,should I just discard the two first ranges? How should I interpret these two ranges? Thanks for any tips :)

  8. Please erase my last comment, I was being too fast and not taking enough time to think it through. It’s impossible to get more than 10 ranges, because the bandwidth for each frequency bin is 43.0664062Hz (in the files I am using here, with a samplerate of 44100 and an fft on 1024 samples). Range 1 & 2, as in the example, will always return the index 0 because the requested frequency interval is smaller than each bin contains.

  9. hello

    I am using minim: FFT and I am testing Logarithmic Averages.
    How can I get the spectrum[j]? please help me.

  10. What I’ve noticed is that the number of bands equals the log2 of N, N being the number of samples in your FFT.
    For an FFT with size 1024 this makes log2(1024) = 10.

    Now from the second band to the last this is no problem.
    You just use the ranges from 2^(bandno-2) to twice this number. So band 2 gives 2^(2-2) = 1 to 2 and band 10 gives 2^(10-2) = 256 to 512. Exactly like in the example given.

    But the first band is somewhat of a difficult example. If u use a frequency of N/2 with amplitude 1 you get a nice 1 in your result (at the appropriate frequency sample), but if you use a constant DC with the value 1, you get 2 as a result in the FFT.

    So my question is how to use the first band exactly. The page at http://www.dspguide.com/ch8/5.htm shows the first band runs from 0Hz to f0/N Hz where f0 is the sampling rate and N the size of the FFT. When we use the example of 44100 and 1024, we get band 0 running from 0 to 22Hz. Now how do we combine this band with band 1 (running from 22 to 66 with a center of 44Hz) to get our center frequency of 33Hz?

    My guess is (0.5*band0 + band1)/2. This so we take into account the double size of band0.

    Another idea if you want a nicer output, we hear things not only with a log frequency, but with a log amplitude as well. So when you have the results, take the log10 of those. (but beware log10(0) is undefined).

    These are just my 2 cents, any comments would be really appreciated.

  11. Pingback: Beat It!
  12. When you say “Given a sample buffer of 1024 samples that were sampled at 44100 Hz”

    Q1) It means 44100 samples every second, so out of 44,100 samples did you take the first 1024 samples or randomly picked 1024 samples from 44100 samples?

    Q2) Do you mean that the maximum bandwidth of the original data is 22100Hz so 44,100 is enough as per shannon theorem?




  13. A1) The first 1024 samples, and then the next 1024, and so on. Always contiguous blocks of samples in the order they are generated.

    A2) I mean that the frequency range represented by an FFT done on audio that has a sample rate of 44100 Hz will be 22100 Hz. That audio may contain frequencies higher than 22100 Hz, but you won’t see them in the FFT. If the audio had a sample rate of 96000 Hz, you’d be able to see frequencies in the FFT as high as 48000 Hz.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">