Minim
core | ugens | analysis
 
The Analysis package contains classes for analyzing audio in real-time.

Fast Fourier Transform

A Fourier Transform is an algorithm that transforms a signal in the time domain, such as a sample buffer, into a signal in the frequency domain, often called the spectrum. The spectrum does not represent individual frequencies, but actually represents frequency bands centered on particular frequencies. The center frequency of each band is usually expressed as a fraction of the sampling rate of the time domain signal and is equal to the index of the frequency band divided by the total number of bands. The total number of frequency bands is usually equal to the length of the time domain signal, but access is only provided to frequency bands with indices less than half the length, because they correspond to frequencies below the Nyquist frequency. In other words, given a signal of length N, there will be N/2 frequency bands in the spectrum.

Beat (or Onset) Detection

The BeatDetect class allows you to analyze an audio stream for beats (rhythmic onsets). Beat Detection Algorithms by Frederic Patin describes beats in the following way:

The human listening system determines the rhythm of music by detecting a pseudo periodical succession of beats. The signal which is intercepted by the ear contains a certain energy, this energy is converted into an electrical signal which the brain interprets. Obviously, The more energy the sound transports, the louder the sound will seem. But a sound will be heard as a beat only if his energy is largely superior to the sound's energy history, that is to say if the brain detects a brutal variation in sound energy. Therefore if the ear intercepts a monotonous sound with sometimes big energy peaks it will detect beats, however, if you play a continuous loud sound you will not perceive any beats. Thus, the beats are big variations of sound energy.
In fact, the two algorithms in this class are based on two algorithms described in that paper.

BeatDetect has two modes: sound energy tracking and frequency energy tracking. In sound energy mode, the level of the buffer, as returned by level(), is used as the instant energy in each frame. Beats, then, are spikes in this value, relative to the previous one second of sound. In frequency energy mode, the same process is used but instead of tracking the level of the buffer, an FFT is used to obtain a spectrum, which is then divided into average bands using logAverages(), and each of these bands is tracked individually. The result is that it is possible to track sounds that occur in different parts of the frequency spectrum independently (like the kick drum and snare drum).

In sound energy mode you use isOnset() to query the algorithm and in frequency energy mode you use isOnset(int i), isKick(), isSnare(), and isRange() to query particular frequnecy bands or ranges of frequency bands. It should be noted that isKick(), isSnare(), and isHat() merely call isRange() with values determined by testing the algorithm against music with a heavy beat and they may not be appropriate for all kinds of music. If you find they are performing poorly with your music, you should use isRange() directly to locate the bands that provide the most meaningful information for you.