Nyquist FFT and Inverse FFT Tutorial

Nyquist provides functions for FFT and inverse FFT operations on streams of audio data. Because sounds can be of any length, but an FFT operates on a fixed amount of data, FFT processing is typically done in short blocks or windows that move through the audio. Thus, a stream of samples is converted in to a sequence of FFT frames representing short-term spectra.

Nyquist does not have a special data type corresponding to a sequence of FFT frames. This would be nice, but it would mean creating a large set of operations suitable for processing frame sequences. Another approach, and perhaps the most "pure" would be to convert a single sound into a multichannel sound, with one channel per bin of the FFT.

Instead, Nyquist violates its "pure" functional model and resorts to objects for FFT processing. A sequence of frames is represented by an XLISP object. Whenever you send the selector :next to the object, you get back either NIL, indicating the end of the sequence, or you get an array of FFT coefficients.

The Nyquist function FFT (mnemonic, isn't it?) returns one of the frame sequence generating objects. You can pass any frame sequence generating object to another function, IFFT, and turn the sequence back into audio.

With  FFT and IFFT, you can create all sorts of interesting processes. The main idea is to create intermediate objects that both accept and generate sequences of frames. These objects can operate on the frames to implement the desired spectral-domain processes. Examples of this will follow, but first, let's try something simpler.

Simple FFT Example

[Special thanks to James Valenti for detailed comments and explanations about this code. -RBD]

Just to convince ourselves that this works and to explore the FFT function, let's perform an FFT on a sinusoid. To make the outcome predictable, we will use 32 samples of sinusoid with a period of 8 samples. If we take a 32-point FFT, the spectrum should show a strong 4th harmonic (there are 4 8-sample periods) and zero everywhere else.

Before the test, we will introduce a simple function to create an FFT sequence generating object, which is called an iterator: The function is called make-fft1-iterator, and creates an object of class fft1-class (I am using fft1 rather than fft because this is a simplified version that will be improved later). Objects of this class will have the fields sound, length, and skip. Create the class by sending class (the super class of all classes) a :new message to create a new class that will be a sub-class of class:

(setf fft1-class (send class :new '(sound length skip)))
Add a method named :next to the fft1-class. The method will have a formal argument list of () and the code body of (snd-fft sound length skip nil), so this is simply a wrapper for calling snd-fft which has takes as parameters: the source sound, the fft length, the number of samples to advance for each fft, and the sound to use as a window function (in this case, no windowing is indicated by nil.) Each time snd-fft is called, it will advance by skip to the next set of samples. The function (and thus the :next method) will return an array containing the results of the next short-time fft:
(send fft1-class :answer :next '() '(
    (snd-fft sound length skip nil)))
Add a method named :isnew to the fft1-class. This is a "standard" method that is called automatically to initialize a new object of the class when the :new method is sent to the class. Any parameters sent to :new are passed on to :isnew, and in this case the formal parameter list is (snd len skp) and the body of the function is three setf's that copy the parameters into the fields (or instance variables) of the new object, which are named sound, length, and skip:
(send fft1-class :answer :isnew '(snd len skp) '(
    (setf sound snd)
    (setf length len)
    (setf skip skp)))
Make a new function named make-fft1-iterator to create an fft1-iterator with the parameters sound, length, and skip. The function makes a new object by sending the :new message to the fft1-class. Because fft1-class consumes the sound (by calling snd-fft), this function copies the sound with snd-copy to avoid any unexpected side effects on the sound parameter.
(defun make-fft1-iterator (sound length skip)
  (send fft1-class :new (snd-copy sound) length skip))

For test data, create a sinusoid. lfo generates a one second sinusoid. The control-srate-abs transformation changes the sample rate generated by the lfo to points and since the duration is one second, there will be exactly points samples generated. The cycles parameter is the frequency of the sinusoid, so a value of 1 will generate one cycle (in one second) and a value of 2 will generate 2 cycles in one second, etc.

;; create a 1-second sinusoid with points samples at cycles hz:
(defun short-sine (points cycles)
  (control-srate-abs points (lfo cycles)))
In the fft-test function, the let introduces a list of local variables (just fft-iterfft-iter, it will be ignored by this function. Inside the let, fft-iter is set to a newly created instance of fft1-iterator, initialized with a signal that has 4 cycles of a sinusoid in 32 points, with an fft length of 32 samples and a skip of 32 samples (zero overlap):
(defun fft-test ()
  (let (fft-iter)
    ;; signal will have 4 cycles in 32 points:
    (setf fft-iter (make-fft1-iterator (short-sine 32 4) 32 32))
    (display "fft-test" (send fft-iter :next))))

Running this prints an array of nearly-zero values except for the 9th element (index 8), which is -1. The layout is as follows:

The output should look like this:
> (load "fft1.lsp")
;loading "fft1.lsp"
;[ gc: total 20640, 3360 free; samples 12 KB, 7KB free ]
;fft-test : (SEND FFT-ITER :NEXT) = #(1.53076e-017 0 0 0 0 0 0 -3.06152e-017
; -1 0 0 0 0 0 0 3.06152e-017
; 0 0 0 0 0 0 0 -3.06152e-017
; 8.9407e-008 0 0 0 0 0 0 1.53076e-017)
;T
;>

Thus, the element at index 8 is the 4th Sine component, indicating a 4th harmonic as expected.

Simple Inverse FFT Example

Now, let's try to reverse the FFT and recover the sinusoid. We will use the snd-ifft function, which takes five parameters:

The result returned from snd-ifft is a sound which is assigned here to local variable ifft-snd. In this case, the sound will have a start time of 0, a sample rate of 32, the spectral frames (actually just one frame) will be obtained by sending the :next message to fft-iter, the skip between successive frames is 32, and no windowing is applied.

(defun ifft-test ()
  (let (fft-iter ifft-snd)
    (setf fft-iter (make-fft1-iterator (short-sine 32 4) 32 32))
    (setf ifft-snd (snd-ifft 0 32 fft-iter 32 NIL))
    (display "fft-ifft" (snd-length ifft-snd 200))
    (display "fft-ifft" (snd-samples ifft-snd 200))
    ifft-snd )) ; return the sound

Now call the function:

(ifft-test)

Looking at the printed samples, you can see that the result is a close approximation to the expected sinusoid. Since ifft-test returns a sound, you can call (play (ifft-test)), but in this case you won't hear much because the sound is only 32 samples, and there may also be problems trying to play a sound that is only sampled at 32 samples per second.

Exercise: Modify short-sine to produce a cosine and observe the difference in the FFT. Confirm that the IFFT reproduces the original cosine. (Hint: the parameters to LFO should be: (lfo cycles 1.0 *sine-table* 90.0), indicating frequency, duration, the waveform, and the initial phase, respectively.)
 

Sequence of FFT Frames

Let's create an FFT iterator that reads sounds from a file. This is similar to what was done before, but here, s-read is used to read samples instead of using lfo to compute samples:

(defun file-fft1 (filename frame-length skip)
  (make-fft1-iterator (s-read filename) frame-length skip))

And here is a function to recover the sound file from spectral frames and play it. The sound will start at time 0, at the sample rate *sound-srate*, using iterator to generate successive spectral frames, advancing skip samples between fft's, and no windowing. Note: *sound-srate* is a global variable from Nyquist that is the default sample rate for sounds. It can be read directly, but you should only modify it by calling sound-srate-abs:

(defun play-fft1 (iterator skip)
  (play (snd-ifft 0 *sound-srate* iterator skip NIL)))

Define a filename for testing. Remember that the backslash (\) character is the escape character for XLisp strings, so you need to type two backslashes to denote one backslash character. Windows will also accept forward slashes (/) as a separator, and the forward slash does not need to be escaped. Be sure to test this with a mono file at the default sample rate of 44100:

;; a convenient sound file name (change this to one of your mono soundfiles):
(setf sfn "D:\\brain\\outro\\soup.wav")
Define a function to call play-fft1 using sfn as the input function for analysis, a frame length (fft size) of 512, and a skip size of 512 (no overlap). The start time (0), sample rate (*sound-srate*) and windowing function (none) are hard-coded in play-fft1.
(defun file-test () (play-fft1 (file-fft1 sfn 512 512) 512))

If you do not hear your original file, try playing the original file by typing (play-file sfn), and make sure the file is a mono file at 44,100 Hz sample rate (to match *sound-srate* used in play-fft1). Check out the file by typing (sf-info sfn).

You can try different skip parameters, e.g. to play the sound at double speed, try this:

(play-fft1 (file-fft1 sfn 512 512) 256)

XLisp and Garbage Collection

In the previous example, you may have noticed a lot of messages that look something like the following:

[ gc: total 18640, 3114 free; samples 49KB, 39KB free ]

What the gc: Messages Mean

These messages are printed whenever the garbage collector runs to reclaim memory that was allocated but is no longer in use by the program. The memory found by the garbage collector is reused in future computation to avoid allocating more memory than is necessary. The particular message here says that 18640 "cells" have been allocated. Each cell is 10 bytes, so there are 186,400 bytes, or about 186KB used to store XLisp programs and data. This does not include the XLisp interpreter, Nyquist functions, and system libraries, that take an additional 500KB or so. It also does not include audio samples, which are stored separately. Of the 18640 cells, 3114 were free after garbage collection. Of the 49KB of audio data (samples) allocated by Nyquist, 39KB were free for reuse after after garbage collection.

Making GC More Effective

Nyquist and XLisp do a pretty good job of effective, automatic garbage collection, but since XLisp was designed to run on small machines, it favors using a minimal amount of memory, and this may result in frequent garbage collections. FFT processing, in particular, makes heavy use of XLisp, since every floating point number in every FFT frame requires the allocation of a 10-byte cell. When you are processing with FFTs, you can often gain some speed by telling XLisp to allocate more memory cells. Do this by calling expand:

(expand 100)

The parameter tells how many additional internal blocks of memory to allocate. Each block is 2000 cells, or about 20KB, so if the parameter is 100, you will allocate about 2MB. After allocating memory, the next garbage collection might look like:

[ gc: total 218640, 204298 free; samples 49KB, 44KB free ]

Notice that now, garbage collection frees a higher proportion of memory than before   (204298 out of 218640 cells). Garbage collection is more efficient when there is a higher proportion of memory to free.

On the other hand, you should not allocate more memory than necessary. If you allocate too much, parts of Nyquist and other programs will be paged out of RAM and onto your disk drive. Since Nyquist must access all of its memory for every garbage collection, it is very inefficient to have any of that memory reside on disk.

Should you always expand memory? Almost all non-FFT audio processing uses special data structures that are outside of XLisp and are managed without the use of the garbage collector. Since audio processing normally takes over 95% of the execution time, you normally do not have to worry much about the efficiency of XLisp.

Simple Spectral Processing

The main purpose of FFT and IFFT functions is to enable processing in the spectral domain. Let's create a very simple example in which certain frequencies are set to zero, creating a filter effect. (Note that this is not a very good way to create filters because of boundary effects.) We need to create an object that will access an FFT iterator to get frames, and that will serve as an iterator to deliver frames. The code is as follows:

(setf fft-hp-class (send class :new '(source bins)))

(send fft-hp-class :answer :next '() '(
  ; get a new frame of fft coefficients from source
  (let ((frame (send source :next))) ; creates local variable - frame
    ; a note about let:
    ; the first list after let names new local variables
    ; the following lists (below) are expressions to be evaluated
    ;
    ; if frame is nil, there are no more frames to process
    (cond (frame
           ; if frame is not nil, then do the following
           (dotimes (i bins)
             ; set all the bins from 0 to bins-1 (these are low frequency bins) to zero
             ; the order of coefficients is:
             ;    DC coefficient
             ;    first real
             ;    first imaginary
             ;    second real
             ;    second imaginary
             ;    ...
             ;    if the length is even, then last is the real
             ;       coefficient of the Nyquist frequency
             ;
             ; (Note: aref gets the ith element of the array)
             (setf (aref frame i) 0.0)))) ; end of cond
    frame))) ; frame is the last expression of the let, so it is the return value
Create an fft-hp-class method called :isnew to initialize the fft-hp-class class variables source and bins. The source will be a file-fft.

A refresher as to what goes on with file-fft: file-fft1 reads a file into a sound variable and calls make-fft1-iterator. Then make-fft1-iterator returns an fft-1 class by sending fft1-class :new with a copy of the sound. It makes this copy because it will have skip samples removed every invocation of fft1-class:next because this method calls snd-fft which removes the samples every time a new fft frame is calculated.

(send fft-hp-class :answer :isnew '(s b) '(
    (setf source s)
    (setf bins b)))
This is a function to make and return an instance of the fft-hp-class class.
(defun make-fft-hp (source bins)
  ; Send the fft-hp-calss a :new message to create a new instance of the class
  ; and to initialize the instance with the values of source and bins
  ; Returns an object that responds to the :next message by getting the next
  ; fft frame, zeroing the low frequencies, and returning the frame.
  (send fft-hp-class :new source bins))
In play-fft1, fft frames from make-fft-hp are converted back into audio. The audio is reconstructed using a skip of 512.
(defun hp-test ()
  (play-fft1 (make-fft-hp (file-fft1 sfn 512 512) 11) 512))

Note that make-fft-hp takes a parameter, bins, that tells how many coefficients of the FFT to zero. At 44,100 Hz, the frame rate is 44100/512 = 86.13 Hz, and this corresponds to the first bin (coefficients at locations 1 and 2). Since we specified 11 for the bins parameter, this will zero the DC component and 5 complex pairs, representing frequencies up to 86.12 * 5 = 430.7 Hz. (If you are trying this on a laptop computer with small speakers, or if you are using a sound without low frequencies, you may want to increase the number of bins in hp-test from 11 to 30 to make the effect really obvious.)

You will notice some raspy sounds in the output. Since the FFT frames are not smoothed by any window or overlapped, there are serious windowing artifacts that are easily heard. We will look at ways to reduce this problem later.

Spectral Modulation

An interesting effect is to let one spectrum modulate another one. By multiplying the amplitude spectrum of one signal onto that of another, one can superimpose the two signals. (Note that adding spectra is simply a way to mix them and is equivalent to addition in the time domain.)

Let's create a bright FM sound to serve as a spectral modulation "carrier". First, create the FM sound. This sound has an initial, middle, and fiinal modulation index as well as a (constant) pitch:

(defun fm-tone (step mi1 mi2 mi3)
  (let ((hz (step-to-hz step)))
    (setf mi1 (* mi1 hz))
    (setf mi2 (* mi2 hz))
    (setf mi3 (* mi3 hz))
    (fmosc c4 (partial step 
                       (control-srate-abs *sound-srate* 
                         (pwl 0 mi1 0.5 mi2 1 mi3 1))))))

The modulated sound will be a cluster of three FM tones. You could use just one tone, but I have had better luck with chords or noise. You want to make sure the signal has plenty of amplitude at many frequencies; otherwise, the spectral modulation will not have anything to modulate.

(defun mod-snd ()
  (sum
    (fm-tone c3 15 20 15)   ;; adjust FM parameters here
    (fm-tone d3 15 20 15)   ;; adjust FM parameters here
    (fm-tone e3 15 20 15))) ;; adjust FM parameters here

Next, we need a class to do the modulation. As before, objects of this class respond to the :next message. In this case, the :next method sends  :next to both sources. It then multiplies the coefficients of one by the computed amplitude spectrum of the other.

(setf fft-modulator-class (send class :new '(src1 src2)))

(send fft-modulator-class :answer :isnew '(s1 s2) '(
    (setf src1 s1)
    (setf src2 s2)))

(send fft-modulator-class :answer :next '() '(
  (let ((frame1 (send src1 :next))
        (frame2 (send src2 :next))
        n half_n)
    (cond ((and frame1 frame2)
           ; multiply frame2 by the amplitude coefficients of frame1
           (setf (aref frame2 0) (* (aref frame2 0) (aref frame1 0))) ;; DC
           (setf n (- (length frame1) 1))
           ; Subtracted 1 because we already took care of DC component
           (setf half_n (/ n 2)) ; integer divide
           (dotimes (i half_n)
             (let* ((i2 (+ i i 2))
                    (i2m1 (- i2 1))
                    (amp (sqrt (+ (* (aref frame1 i2m1) (aref frame1 i2m1))
                                  (* (aref frame1 i2)   (aref frame1 i2))))))
                (setf (aref frame2 i2m1) (* (aref frame2 i2m1) amp))
                (setf (aref frame2 i2) (* (aref frame2 i2) amp))))
           (cond ((= n (+ half_n half_n 2)) ;; n is even -> nyquist component
                  (setf (aref frame2 n) (* (aref frame2 n) (aref frame1 n)))))
           frame2)
          (t nil)))))

(defun make-fft-modulator (src1 src2)
  (send fft-modulator-class :new src1 src2))

The code for  :next is longer than you might expect, but it is basically just a vector multiplication. Lisp is not ideal for numerical algorithms like this. The following code will test the new class:

(defun mod-test ()
  (let ((fs 512)) ;; frame size
    (play-fft1 (make-fft-modulator 
                 (file-fft1 sfn fs fs)
                 (make-fft1-iterator (mod-snd) fs fs))
               fs)))

Here, we pass in iterators for the sound stored in a file and for the modulated sound. This example seems to work best if the sound in the file is a voice. You can also try stretching the speech by reducing the step parameter to file-fft and by making the modulated sound longer. You might also try different sizes of FFTs using the frame size (fs) parameter. If the FFT frame size is too big, you will resolve individual harmonics of the voice, and if it is too small, you will not resolve formants. 256 and 512 seem to be good numbers when working at 44,100 Hz sampling rates.

Smoothing Windows

In the previous examples, and especially in the last one, a lot of buzzing and noise is created because the boundaries of the FFT frames are no longer continuous when the spectrum is altered. To reduce the buzzing effect, we can multiply the frame by a smoothing window before reconstructing the time domain signal. We can also use a smoothing window on the input side, before taking the FFT. For now, we will try just a smoothing window on the IFFT side.

The smoothing window will be a raised cosine pulse, although other window shapes could be used. The following code implements a raised cosine. (A raised cosine is a smooth bell-shaped curve that rises from zero to 1 and back again. The curve is one period of a sinusoid or cosine, but "raised" so that its minimum value is at zero.) The lfo function computes a sine curve, so we use an initial phase of 270 degrees to get the proper shape:

(defun raised-cosine ()
  (scale 0.5 
    (sum (const 1) 
         (lfo (/ 1.0 (get-duration 1)) 1 *sine-table* 270))))

Next, we need a function to create a window of the proper length. The sample rate of the window does not matter because the FFT and IFFT functions will simply take the proper number of samples (without any interpolation or resampling). Therefore, we will use the frame-size as the sample rate, insuring that there are frame-size samples:

(defun fft-window (frame-size)
  (control-srate-abs frame-size (raised-cosine)))

Now we can pass in a window to the IFFT. Let's redo the spectral modulation example with smoothing windows on the IFFT side. First, we will rewrite play-fft to use windowing:

(defun play-fft (iterator frame-size skip)
  (play (snd-ifft 0 *sound-srate* iterator 
                  skip (fft-window frame-size))))

Now, we can rewrite mod-test:

(defun mod-test-w ()
  (let ((fs 512)) ;; frame size
    (play-fft (make-fft-modulator 
                (file-fft1 sfn fs (/ fs 2))
                (make-fft1-iterator (mod-snd) fs (/ fs 2)))
              fs (/ fs 2))))

Notice that we reduced the step size to half the frame size. This is because we want the windows to overlap by 50%, creating a nice cross-fade from one frame to the next.

You might guess that the square windowing on the analysis (FFT) side is injecting high frequencies into the spectrum. We can introduce windowing there too. We can modify make-fft-iterator to use a window. This will require changes to fft1-class, so we will use the name fft-class:

(setf fft-class (send class :new '(sound length skip window)))

(send fft-class :answer :next '() '(
    (snd-fft sound length skip window)))

(send fft-class :answer :isnew '(snd len skp) '(
    (setf sound snd)
    (setf length len)
    (setf skip skp)
    (setf window (fft-window len)) ))

(defun make-fft-iterator (sound length skip)
  (send fft-class :new (snd-copy sound) length skip))

Now, we can rewrite file-fft to use windowing:

(defun file-fft (filename frame-length skip)
  (make-fft-iterator (s-read filename) frame-length skip))

Now, we can rewrite mod-test yet again:

(defun mod-test-ww ()
  (let ((fs 512)) ;; frame size
    (play-fft (make-fft-modulator 
                (file-fft sfn fs (/ fs 2))
                (make-fft-iterator (mod-snd) fs (/ fs 2)))
              fs (/ fs 2))))

This version seems to have less definition for speech input than the previous one. It might be that some windowing artifacts in the right places are actually beneficial! In fact, some simple experiments indicate that, in this example, the speech sounds better when the (mod-snd) sound is analyzed without windowing. This probably creates more noise to be modulated by the speech signal, so more speech information gets through.

We can also stretch the result by taking smaller steps during the analysis (FFT) stage so that more frames are generated. I will also increase the frame size to obtain fewer frames/second.

(defun mod-test-wws ()
  (let ((fs 1024)) ;; frame size
    (play-fft (make-fft-modulator 
                (file-fft sfn fs (/ fs 16))
                (make-fft1-iterator (mod-snd) fs (/ fs 16)))
              fs (/ fs 2))))

Acknowledgments

Cort Lippe and Zack Settel have explored cross synthesis and other effects with FFTs extensively, and their work inspired these examples.