📐 Module 4 · The Math Arsenal · Chapter 4.5 · 11 min read

Fourier Transform

Write any signal as a sum of sinusoids — a bridge from signal to compute.

What you'll learn here

  • Explain the idea of decomposing a signal into frequency components
  • Write the continuous and discrete (DFT) Fourier transforms
  • State the FFT's O(N log N) complexity and why it matters
  • Recall the convolution theorem (convolution in time = multiplication in frequency)
  • Show why SIDRA crossbars often skip FFT by doing convolution directly

Hook: Notes in Music = Secret of Signals

Press C, E, G on a piano simultaneously — you hear a chord. The chord is a complex wave, but your brain separates it into three notes. That’s the natural Fourier transform your ear performs.

Joseph Fourier showed in 1822: every periodic signal can be written as a sum of sines and cosines. That’s the foundation of modern signal processing (radio, audio, image, MRI, JPEG, MP3, WiFi, 5G — all of it).

Why is Fourier important for AI?

  1. Convolution theorem: convolution in the time domain = multiplication in the frequency domain. Foundation of CNNs.
  2. FFT (Fast Fourier Transform): an O(N log N) algorithm, taking N² operations down to N log N. One of the cornerstones of modern computing.
  3. Spectral methods: processing signals in frequency space is often more efficient (filtering, compression, denoising).

Interesting question for SIDRA: can the crossbar do Fourier? Yes — the DFT is an MVM (matrix × vector). But more interesting: SIDRA can do convolution directly in the crossbar → it doesn’t need FFT.

This chapter builds Fourier from intuition to math, reveals the FFT trick, and shows why SIDRA “skips” Fourier.

Intuition: Signal = Sum of Frequencies

A square wave looks simple, but it has infinitely many frequency components:

\text{square}(t) = \frac{4}{\pi}\left(\sin t + \frac{\sin 3t}{3} + \frac{\sin 5t}{5} + \frac{\sin 7t}{7} + \ldots\right)

Only odd harmonics, with falling amplitude. Add enough and a square wave emerges (Gibbs phenomenon leaves a small ripple, but the approximation is good).
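Numerically, the partial sums converge quickly away from the jumps (a minimal numpy sketch; the 0.1 margin around the discontinuities is an illustrative choice to keep the Gibbs ripple out of the measured interval):

```python
import numpy as np

# Partial sums of the square-wave Fourier series: adding more odd
# harmonics shrinks the error away from the jumps (the Gibbs overshoot
# persists only right at the discontinuities, excluded here).
t = np.linspace(0.1, np.pi - 0.1, 1000)   # stay clear of the jumps
target = np.ones_like(t)                  # square(t) = +1 on (0, pi)

def partial_sum(t, n_terms):
    k = np.arange(1, 2 * n_terms, 2)      # odd harmonics 1, 3, 5, ...
    return (4 / np.pi) * np.sum(np.sin(np.outer(t, k)) / k, axis=1)

for n in (1, 5, 50):
    err = np.max(np.abs(partial_sum(t, n) - target))
    print(f"{n:3d} terms -> max error {err:.3f}")
```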

General principle: every “well-behaved” signal is a sum of sinusoids. Decomposing it = the Fourier transform.

A signal’s “spectrum”:

  • Single sinusoid → one peak (at that frequency)
  • Square wave → odd harmonics 1f, 3f, 5f, …
  • Impulse → flat spectrum (equal at every frequency)
  • White noise → flat spectrum (random)
  • Beethoven’s 5th → a cluster of complex peaks

Practical use:

  • Filtering: remove unwanted frequencies (bass/treble, denoising).
  • Compression: JPEG/MP3 drop the unimportant frequencies (compression).
  • Function analysis: which frequencies dominate? FFT shows.
  • Convolution: filtering is a convolution; in the frequency domain it becomes multiplication → fast.
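The filtering bullet above can be made concrete (a numpy sketch; the 5 Hz tone, the noise level, and the 20 Hz cutoff are all illustrative choices):

```python
import numpy as np

# Frequency-domain low-pass filtering: FFT, zero the high bins, inverse
# FFT. A 5 Hz sine buried in noise comes back with much less noise power.
rng = np.random.default_rng(0)
fs, n = 1000, 1000                       # 1 kHz sampling, 1 s of signal
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * rng.standard_normal(n)

spectrum = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(n, d=1 / fs)
spectrum[freqs > 20] = 0                 # brick-wall low-pass at 20 Hz
filtered = np.fft.irfft(spectrum, n)

print("noise power before:", np.mean((noisy - clean) ** 2))
print("noise power after: ", np.mean((filtered - clean) ** 2))
```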

Formalism: Continuous and Discrete Fourier

L1 · Basics

Continuous Fourier transform:

For a continuous signal x(t):

X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t} \, dt

Inverse:

x(t) = \int_{-\infty}^{\infty} X(f)\, e^{2\pi i f t} \, df

X(f) is complex: its amplitude says how strong a frequency is, its phase says when it occurs.

Discrete Fourier (DFT):

For an array of N samples:

X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}, \quad k = 0, 1, \ldots, N-1

This is an MVM — a matrix × vector product with entries W_{kn} = e^{-2\pi i kn/N}. Complexity: O(N²).
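This matrix view is easy to verify numerically (a numpy sketch; N = 8 is an arbitrary small size):

```python
import numpy as np

# The DFT is literally a matrix-vector product: build W[k, n] =
# exp(-2*pi*i*k*n/N) explicitly and compare the O(N^2) MVM against
# numpy's O(N log N) FFT.
N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # N x N DFT matrix

x = np.random.default_rng(1).standard_normal(N)
assert np.allclose(W @ x, np.fft.fft(x))        # same answer, either way
```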

FFT (Fast Fourier Transform):

The Cooley–Tukey algorithm (1965): split into even and odd samples, then recurse. Complexity O(N log N).

Multiplication counts, DFT (N²) vs FFT (N log N):

  • N = 256: 65,536 vs 2,048
  • N = 1024: ~1M vs 10,240
  • N = 1M: ~10¹² vs ~20M

Without the FFT, modern digital signal processing would be impossible; it is arguably the most important algorithm in modern computing.

L2 · Full

Convolution theorem:

Take two signals x(t) and h(t). Their convolution:

(x * h)(t) = \int x(\tau)\, h(t - \tau) \, d\tau

Fourier transform:

\mathcal{F}\{x * h\} = X(f) \cdot H(f)

Convolution in time = multiplication in frequency. Very powerful:

  • Time-domain convolution: O(N²).
  • Go to frequency + multiply + come back (FFT): O(N log N).

Practical effect: long convolutions (radar, audio, image processing) are accelerated with FFT.
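The theorem is directly checkable in numpy (a sketch; the lengths 64 and 16 are arbitrary — the key detail is zero-padding both signals to at least N + M − 1 samples so the circular FFT convolution matches the linear one):

```python
import numpy as np

# Convolution theorem in practice: the linear convolution of x (len N)
# and h (len M) equals the inverse FFT of the product of their FFTs,
# provided both are zero-padded to at least N + M - 1 samples.
rng = np.random.default_rng(2)
x, h = rng.standard_normal(64), rng.standard_normal(16)
L = len(x) + len(h) - 1                          # 79 output samples

direct = np.convolve(x, h)                       # O(N*M) in the time domain
via_fft = np.fft.irfft(np.fft.rfft(x, L) * np.fft.rfft(h, L), L)

assert np.allclose(direct, via_fft)
```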

CNN tie-in:

A CNN’s conv layer is convolution with a small filter h (3×3 or 5×5). With filters that small, direct convolution is fast and the FFT is usually unnecessary.

But with large filters (e.g., global attention) the FFT kernel can be used.

Convolution on SIDRA:

The SIDRA crossbar can do convolution directly. Sliding-window MVM:

  • Filter weights programmed into the crossbar.
  • For each position, the input window enters, MVM runs → output pixel.
  • N×N filter, M×M image → ~M² MVMs (one MVM over N² weights per output pixel).

Crossbar ~10 ns/MVM → full image convolution at microsecond scale.
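The sliding-window MVM view can be sketched in numpy for the 1-D case (the per-position dot product stands in for one crossbar MVM; the sizes 256 and 32 are illustrative):

```python
import numpy as np

# Crossbar-style direct convolution: the filter h is "programmed" once
# (a fixed row of weights); each sliding window of the input is one MVM.
rng = np.random.default_rng(3)
x = rng.standard_normal(256)
h = rng.standard_normal(32)

# Zero-pad so every output position sees a full 32-sample window.
pad = np.concatenate([np.zeros(len(h) - 1), x, np.zeros(len(h) - 1)])
out = np.array([h[::-1] @ pad[i:i + len(h)]      # one "MVM" per output
                for i in range(len(x) + len(h) - 1)])

assert np.allclose(out, np.convolve(x, h))       # 287 output samples
```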

Need the FFT? For SIDRA, usually no — small filters direct, large filters could use FFT (still MVM, still maps to crossbar).

L3 · Deep

FFT on the crossbar:

The DFT matrix W_{kn} = e^{-2\pi i kn/N} is fixed, so it can be programmed into the crossbar → the DFT becomes an MVM.

But: complex numbers. Memristors are real-valued conductances. Fix: 2× the size — store real and imaginary parts separately. An N×N complex DFT matrix → a 2N×2N real crossbar.

A 256-point DFT: 256×256 complex → 512×512 real = four 256×256 SIDRA crossbars. Possible, but the FFT’s O(N log N) advantage disappears (the crossbar MVM is already a single parallel step, effectively O(1)).

Practical decision: in SIDRA, do compute-heavy convolution directly in the crossbar; FFT not needed. FFT is more sensible on a digital CPU/DSP (specialized hardware exists).

Other Fourier-like transforms:

  • DCT (Discrete Cosine Transform): the basis of JPEG, compression.
  • Hadamard transform: ±1 entries → no multiplications, only additions and subtractions. A fast approximation.
  • Wavelet: time + frequency together. Wavelet-based compression (JPEG-2000).

All MVM-based → doable on SIDRA, but specialized digital hardware is usually more efficient.
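Of these, the Hadamard transform's structure is the easiest to sketch: Sylvester's recursive construction gives a matrix of pure ±1 entries, so applying it needs only additions and subtractions (a minimal numpy sketch):

```python
import numpy as np

# Sylvester construction of the Hadamard matrix: doubling via the block
# pattern [[H, H], [H, -H]] keeps every entry at +1 or -1, so the
# transform itself needs no multiplications, only sign flips and adds.
def hadamard(n):                            # n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(4)
assert np.all(np.abs(H) == 1)               # ±1 entries only
assert np.allclose(H @ H.T, 4 * np.eye(4))  # orthogonal up to a scale
```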

Spectral methods in AI:

Recent years have brought spectral neural networks:

  • Fourier Neural Operators (Li et al. 2020): partial-differential-equation solvers.
  • Spectral attention (Lee-Thorp et al. 2022): FFT to accelerate attention.
  • Wavelet transformer (Wang et al. 2024): hybrid time-frequency.

These models are interesting for SIDRA — partly hybrid (CMOS FFT + crossbar MVM) at the Y10 horizon.

Experiment: Compute a 4-Sample DFT by Hand

Take the 4-sample sequence \mathbf{x} = (1, 0, -1, 0).

DFT:

X_k = \sum_{n=0}^{3} x_n e^{-2\pi i k n / 4} = \sum_{n=0}^{3} x_n e^{-i\pi k n / 2}

X_0 = 1 + 0 - 1 + 0 = 0
X_1 = 1 + 0 \cdot e^{-i\pi/2} - 1 \cdot e^{-i\pi} + 0 = 1 - (-1) = 2
X_2 = 1 + 0 - 1 \cdot e^{-i 2\pi} + 0 = 1 - 1 = 0
X_3 = 1 + 0 - 1 \cdot e^{-i 3\pi} + 0 = 1 - (-1) = 2

\mathbf{X} = (0, 2, 0, 2)

Interpretation: only the k = 1 and k = 3 bins are nonzero, so the signal looks like \cos(\pi n / 2). As expected: the sequence (1, 0, -1, 0) is literally a sampled cosine.

On a SIDRA crossbar:

DFT matrix WW:

W = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{bmatrix}

Split real and imaginary:

W_R = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & -1 & 0 \\ 1 & -1 & 1 & -1 \\ 1 & 0 & -1 & 0 \end{bmatrix}, \quad W_I = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}

Two separate crossbars (4×4 each) compute the DFT. Result: real + i × imaginary.
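A numpy check of this split (W_R and W_I are exactly the matrices above, up to floating-point rounding; the test input is the chapter's sequence (1, 0, −1, 0)):

```python
import numpy as np

# Complex DFT on real-valued crossbars: split W into its real and
# imaginary parts. For a real input, two real MVMs give Re(X) and Im(X).
N = 4
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)
W_R, W_I = W.real, W.imag                 # the two "crossbar" matrices

x = np.array([1.0, 0.0, -1.0, 0.0])       # the chapter's test signal
X = W_R @ x + 1j * (W_I @ x)              # two real crossbar MVMs

assert np.allclose(X, np.fft.fft(x))      # -> (0, 2, 0, 2)
```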

But for a digital FFT: a 4-point DFT needs only 4 log₂ 4 = 8 complex multiplies — sub-microsecond work. SIDRA is overkill at this size. For a 1024-point DFT (common in signal processing), the crossbar becomes a real option.

Quick Quiz

1/6 · Core claim of the Fourier transform?

Lab Exercise

Convolution vs FFT on SIDRA crossbar for a 256-sample signal.

Data:

  • Signal: x \in \mathbb{R}^{256} (e.g., audio).
  • Filter: h \in \mathbb{R}^{32} (e.g., a low-pass filter).
  • Convolution y = x * h: |y| = 256 + 32 - 1 = 287 samples.

Compute complexity:

(a) Direct-convolution FLOPs?
(b) FFT-based: FFT(x, 512) + FFT(h, 512, zero-padded) + elementwise multiply + inverse FFT (512). FLOPs?
(c) Direct convolution on SIDRA: how many MVMs, how many ns?
(d) FFT on SIDRA: 512-point DFT crossbar size, MVMs, time?
(e) Which is better for SIDRA?

Solutions

(a) Direct: 287 outputs × (32 multiplies + 31 adds) = 287 × 63 ≈ 18K FLOPs.

(b) FFT(512) ≈ 512 × log₂ 512 = 512 × 9 ≈ 4,600 FLOPs per transform. Three transforms (forward x, forward h, inverse) + 512 elementwise multiplies ≈ 14,300 FLOPs. Less than direct, but the advantage is small because the filter is small.

(c) Direct on SIDRA: the 32-tap filter occupies 32 rows of a 256×256 crossbar; 287 sliding windows → 287 MVMs at ~10 ns each. Total ≈ 2,900 ns ≈ 3 µs.

(d) FFT on SIDRA: the 512×512 DFT matrix needs 4 crossbars (256×256); with a separate real and imaginary matrix, 8 crossbars. 3 transforms × 4 MVMs = 12 MVMs × 10 ns = 120 ns, plus ~5 ns elementwise ≈ 125 ns total. Much faster.

(e) For large signals, FFT on SIDRA is good — especially for parallel batches. For small signals, direct convolution suffices. Hybrid: pick by signal size. Compiler decision (chapter 6.7).
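The arithmetic in (a) and (b) — and the 287-sample output — can be sanity-checked numerically (a numpy sketch with random data; the FLOP counts use the same rough operation model as the solutions above):

```python
import numpy as np

# Lab sanity check: x (256) * h (32) yields 287 samples, and 512-point
# FFT convolution matches the direct result (512 >= 287, so the circular
# wrap-around does not corrupt the output).
rng = np.random.default_rng(4)
x, h = rng.standard_normal(256), rng.standard_normal(32)

direct = np.convolve(x, h)                        # 256 + 32 - 1 = 287 samples
via_fft = np.fft.irfft(np.fft.rfft(x, 512) * np.fft.rfft(h, 512), 512)[:287]
assert direct.shape == (287,) and np.allclose(direct, via_fft)

direct_flops = 287 * (32 + 31)                    # rough count from (a)
fft_flops = 3 * 512 * 9 + 512                     # rough count from (b)
print(direct_flops, fft_flops)                    # 18081 14336
```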

Cheat Sheet

  • Fourier: every signal is a sum of sinusoids.
  • DFT: the discrete transform, X_k = \sum_n x_n e^{-2\pi i kn/N}. Complexity O(N²).
  • FFT: Cooley–Tukey 1965, O(N log N). A cornerstone of modern computing.
  • Convolution theorem: \mathcal{F}\{x*h\} = X \cdot H. The mathematical basis of CNN convolutions.
  • SIDRA tie: DFT is an MVM → crossbar can do it, but direct convolution is often more efficient.
  • Hybrid: FFT crossbar for large signals; direct for small.

Vision: Spectral AI and SIDRA's Place

Most AI today works in the time domain (Transformers included). Spectral AI is a new wave:

  • Y1 (today): Convolution direct on SIDRA; spectral analysis on CMOS FFT.
  • Y3 (2027): Spectral attention prototype (Lee-Thorp style). Language model inference 2-3× faster.
  • Y10 (2029): Hybrid time-frequency processing. Image and audio models.
  • Y100 (2031+): Fourier Neural Operator hardware. Weather, fluid dynamics, PDE-based scientific AI.
  • Y1000 (long horizon): Quantum FFT (Shor’s basis) hybrids. Classical + quantum spectral AI.

Meaning for Türkiye: spectral AI is new — big rivals haven’t dominated yet. SIDRA + spectral processing = collaboration opening with research bodies like TÜBİTAK BİLGEM.

Unexpected future: a direct-physical-Fourier chip. Optical waveguides naturally separate a signal’s frequency components as light (a prism!). On a photonic SIDRA Y100, Fourier could be “free”.

Further Reading

  • Next chapter: 4.6 — Quantization and Quantization Error
  • Previous: 4.4 — Probability and Noise
  • Classical reference: Oppenheim & Schafer, Discrete-Time Signal Processing.
  • Fourier math: Bracewell, The Fourier Transform and Its Applications.
  • FFT discovery: Cooley & Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comput. 1965.
  • Fourier Neural Operators: Li et al., Fourier Neural Operator for Parametric Partial Differential Equations, ICLR 2021.
  • FNet (spectral attention): Lee-Thorp et al., FNet: Mixing Tokens with Fourier Transforms, NAACL 2022.