🔌 Module 5 · Chip Hardware · Chapter 5.10 · 10 min read

Noise Models

SIDRA's analog reality — controlling noise by design.

What you'll learn here

  • Name SIDRA's six noise sources (thermal, shot, 1/f, programming, IR drop, drift)
  • Apply the noise-model math (σ formulas)
  • Compute MVM SNR at cell/column/crossbar levels
  • Explain noise-aware compiler strategies
  • Validate Y1's tolerable noise budget against practical AI models

Hook: Analog = Noisy

Digital CMOS: a bit is 0 or 1. Noise = 0. SIDRA analog: continuous current. Every read is noisy.

We covered noise theory in 4.4. This chapter gives the practical SIDRA noise model: how many sources, which dominates, how to compute.

Bottom line: Y1 ~5% RMS noise → 6 effective bits → enough for INT8 AI. Y10 ~2% → 8 bits.

Intuition: 6 Noise Sources

When reading a SIDRA cell:

| Source | Chapter | Typical magnitude |
| --- | --- | --- |
| 1. Thermal (Johnson) | 4.4 | ~5 nA RMS |
| 2. Shot | 4.4 | ~5-20 nA RMS |
| 3. 1/f (flicker) | 4.4 | ~10 nA RMS long-term |
| 4. Programming | 5.5 (ISPP) | ~5% (~50 nA @ 1 µA) |
| 5. IR drop | 5.12 | ~5% systematic |
| 6. Drift | 5.2 | ~1% / year |

Total (per cell): $\sigma \approx \sqrt{5^2 + 10^2 + 10^2 + 50^2 + 50^2} \approx 75$ nA (~7.5% on a 1 µA signal; drift is excluded because annual refresh handles it).

At ~7.5% relative noise, a single cell resolves only a few effective bits. Crossbar averaging does better.
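A quick sanity check of the root-sum-square total above; the values mirror the table, and drift is left out since refresh corrects it rather than read circuitry seeing it:

```python
import math

# Per-cell noise sources in nA RMS (from the table above; drift excluded,
# since annual refresh corrects it rather than it appearing at read time).
sources_nA = {
    "thermal": 5,
    "shot": 10,
    "flicker_1f": 10,
    "programming": 50,
    "ir_drop": 50,
}

# Independent sources combine in quadrature (root-sum-square).
sigma_total_nA = math.sqrt(sum(s ** 2 for s in sources_nA.values()))
signal_nA = 1000  # 1 uA read current

print(f"sigma_total ~ {sigma_total_nA:.0f} nA")                 # ~72 nA (quoted ~75)
print(f"relative    ~ {100 * sigma_total_nA / signal_nA:.1f} %")  # ~7.2 %
```

The programming and IR-drop terms dominate the sum, which is why ISPP tuning and double-ended drive are the highest-leverage fixes.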

Formalism: Six Noise Sources

L1 · Basics

1. Thermal (Johnson-Nyquist): $\sigma_T^2 = 4kTG\,\Delta f$. Typical SIDRA: 5-10 nA.

2. Shot: $\sigma_S^2 = 2qI\,\Delta f$. Dominant at low currents: 5-20 nA.

3. 1/f (flicker): $S_I(f) = K I^2 / f$. Long-term drift source. RMS ~10 nA.

4. Programming (post-ISPP):

With ISPP, $\sigma_G \approx 1\%$, so $\sigma_I = \sigma_G \cdot V \approx 1$ nA at 100 µS, 0.25 V. Small.

Without ISPP (single-pulse) ~5% → 50 nA.

5. IR drop:

End-of-WL cells see less voltage. Systematic error 5%. Fix: double-ended drive (chapter 5.12).

6. Drift:

Retention is finite. Typical 1% drift per year. Annual refresh.

Combined model: $\sigma_{\text{total}}^2 = \sigma_T^2 + \sigma_S^2 + \sigma_{1/f}^2 + \sigma_P^2 + \sigma_{IR}^2 + \sigma_D^2$

This assumes independent sources; in practice, 1/f noise and drift are partially correlated.

L2 · Full

MVM-level SNR:

A crossbar sums 256 cell currents per column, reading all columns in parallel. Along a column, signal power grows as N² while independent noise power grows as N, so SNR improves by √N (16×, or +24 dB, for N = 256).

Cell SNR (single read):

Signal 1 µA, noise 75 nA (all sources) → SNR = 13 → 22 dB.

Column SNR:

Signal $= \sum_i G_i V_i \approx 256 \cdot \bar{I} \approx 256$ µA. Noise $= \sqrt{256} \cdot 75$ nA $= 1.2$ µA. SNR = 256/1.2 ≈ 213 → 47 dB.

256-column parallel crossbar:

Single-MVM SNR = 47 dB → ~8 effective bits. Good.

Model-level noise:

Single MVM is 8 bits. But AI models stack 10+ layers → noise compounds. 12-layer GPT-2: σ_out = √12 · σ_layer = 3.5 × 5% = 17%. Still tolerable (classification margin 20-50%).
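The cell → column → model chain above can be checked numerically. A sketch using the standard ENOB conversion $(\mathrm{SNR_{dB}} - 1.76)/6.02$; the helper names are illustrative:

```python
import math

def snr_db(signal, noise):
    """Amplitude SNR in dB."""
    return 20 * math.log10(signal / noise)

def effective_bits(db):
    """Standard ENOB conversion for a full-scale signal."""
    return (db - 1.76) / 6.02

cell_db = snr_db(1000, 75)                    # 1 uA signal, 75 nA noise
col_db = cell_db + snr_db(math.sqrt(256), 1)  # +24 dB from 256-way summation

print(f"cell:   {cell_db:.1f} dB")                                     # ~22.5 dB
print(f"column: {col_db:.1f} dB ({effective_bits(col_db):.1f} bits)")  # ~47 dB, ~7.4 bits

# Independent per-layer noise compounds in quadrature across a model.
layers, sigma_layer = 12, 0.05
print(f"12 layers: {100 * math.sqrt(layers) * sigma_layer:.0f} % output noise")  # ~17 %
```

The ~7.4-bit result is what the text rounds to "~8 effective bits".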

Tolerance:

AI models tolerate noise. 5-10% RMS is acceptable. SIDRA operates in this band.

Averaging improvement:

4× reads → noise drops by √4 = 2×. Single MVM 15 ns → 4 reads 60 ns. Throughput 4× lower but SNR +6 dB (9 effective bits).

Averaging on critical layers, single read on non-critical. Compiler decision.
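The read-averaging trade-off fits in one helper. A sketch: the 15 ns single-read latency comes from the text, and linear latency/energy scaling with read count is assumed:

```python
import math

def averaged_read(sigma_rel, n_reads, t_single_ns=15.0):
    """n_reads independent reads: noise falls as 1/sqrt(n),
    latency (and energy) grow linearly, power SNR gains 10*log10(n) dB."""
    return {
        "sigma_rel": sigma_rel / math.sqrt(n_reads),
        "latency_ns": t_single_ns * n_reads,
        "snr_gain_db": 10 * math.log10(n_reads),
    }

print(averaged_read(0.05, 1))  # 5.0 % noise, 15 ns, +0 dB
print(averaged_read(0.05, 4))  # 2.5 % noise, 60 ns, +6 dB
```

This is the table the compiler consults per layer: spend 4× latency only where the +6 dB matters.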

L3 · Deep

Noise-aware compiler (chapter 6.7):

The compiler analyzes model weights:

  • Big weights (|w| > 0.5): sensitive → SIDRA programming, averaging.
  • Small weights (|w| < 0.1): unimportant → prune, don’t write to crossbar.
  • Mid weights: standard 8-bit.
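A minimal sketch of that triage pass; the thresholds come from the bullets above, while the function name and labels are illustrative, not the actual compiler API:

```python
def triage_weight(w, big=0.5, small=0.1):
    """Noise-aware mapping decision for a single weight."""
    if abs(w) > big:
        return "big"    # sensitive: precise ISPP programming + read averaging
    if abs(w) < small:
        return "small"  # prune: never written to the crossbar
    return "mid"        # standard 8-bit programming

weights = [0.72, -0.03, 0.25, -0.61, 0.09]
print([triage_weight(w) for w in weights])
# ['big', 'small', 'mid', 'big', 'small']
```

In a real pass this would run over whole tensors, but the per-weight decision rule is the whole idea.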

Whisper example:

30% pruning → smaller model, noise impact reduced.

Noise-injection training (QAT + NI):

Add ~5% noise to weights during training → model becomes robust. SIDRA inference noise is already learned.

Accuracy: standard INT8 76% → noise-injection training 76.5% (small gain).

Temperature effects:

HfO₂ conductance: $G(T) = G_0 \exp(-E_a/kT)$. Typical $E_a = 0.2$ eV → a 25°C to 85°C swing shifts conductance by ~40%.

Calibration: thermal sensor per CU, calibrate per cluster. Temperature-aware scale factor.
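The temperature-aware scale factor can be sketched from the Arrhenius model above. This computes the raw, uncompensated conductance ratio; the constant and function name are illustrative, not the actual calibration routine:

```python
import math

K_B_EV = 8.617e-5  # Boltzmann constant in eV/K

def g_scale(t_c, t_ref_c=25.0, e_a_ev=0.2):
    """G(T)/G(T_ref) for G = G0 * exp(-Ea / kT); a calibration step
    would divide measured currents by this factor."""
    t, t_ref = t_c + 273.15, t_ref_c + 273.15
    return math.exp(e_a_ev / K_B_EV * (1.0 / t_ref - 1.0 / t))

print(f"G(85C)/G(25C) = {g_scale(85):.2f}")
```

Per-CU thermal sensors feed `t_c`; the per-cluster calibration in the text amounts to applying this ratio as a scale factor on read currents.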

Noise + temperature + drift = composite model:

The SIDRA simulator (chapter 6.8) simulates all. Used for model testing/validation.

Real Y1 estimate:

Lab measurements:

  • Single cell σ = 75 nA (lab).
  • Column σ (256 parallel) = 1.2 µA.
  • MVM effective bits = 8.
  • Model accuracy loss (INT8 benchmark) = 0.3-0.5%.

Y10 target: σ reduced to ~30% of the Y1 value (~2% relative, via ISPP improvement + tighter layout), giving 10 effective bits.

Experiment: MNIST Inference Noise Impact

Model: MLP 784 → 128 → 10. 2 layers.

Noise:

  • Layer 1: σ = 5% relative.
  • Layer 2: σ = 5% relative.
  • Combined: σ_out = √(5² + 5²) = 7% relative.

MNIST accuracy:

  • FP32: 98%
  • INT8 quantized: 97.8%
  • INT8 + 5% noise (SIDRA): 97.5%
  • INT8 + 10% noise: 96.5%

Averaging effect (4×):

  • 5% → 2.5%.
  • New accuracy: 97.9% (very close to FP32).
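A quick Monte Carlo reproduces the averaging effect at the MVM level. The 5% noise level follows the text, the dimensions are illustrative, and this is a throwaway sketch, not the SIDRA simulator of chapter 6.8:

```python
import math
import random

random.seed(7)

def noisy_mvm(w, x, sigma_rel, n_avg=1):
    """Matrix-vector product with multiplicative per-read weight noise,
    averaged over n_avg independent reads."""
    out = []
    for row in w:
        acc = 0.0
        for _ in range(n_avg):
            acc += sum(wi * (1.0 + random.gauss(0.0, sigma_rel)) * xi
                       for wi, xi in zip(row, x))
        out.append(acc / n_avg)
    return out

n_in, n_out = 128, 32
w = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
x = [random.gauss(0, 1) for _ in range(n_in)]
clean = noisy_mvm(w, x, 0.0)

def rel_err(y):
    num = sum((a - b) ** 2 for a, b in zip(y, clean))
    den = sum(b ** 2 for b in clean)
    return math.sqrt(num / den)

print(f"1x read: {rel_err(noisy_mvm(w, x, 0.05)):.3f}")           # ~0.05
print(f"4x avg : {rel_err(noisy_mvm(w, x, 0.05, n_avg=4)):.3f}")  # ~0.025
```

The 4× average lands near half the single-read error, matching the √n prediction.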

Y10 (2% noise):

  • Single read 97.9%.
  • No averaging needed.

Energy:

  • Y1 + 4× averaging: 4 mJ × 4 = 16 mJ.
  • Y10 single read: 3 mJ. 5× more efficient.

Quick Quiz

1. How many main noise sources does SIDRA have?

Lab Exercise

Y1 noise budget limits.

Y1:

  • Cell σ = 75 nA.
  • Column σ = 1.2 µA.
  • MVM effective bits = 8.

Questions:

(a) Total noise for ResNet-50 (50 layers)? (b) Tolerable accuracy drop on ImageNet? (c) Which layers should use 4× averaging? (d) What does 25 → 85°C swing do? (e) What’s the Y10 σ target?

Solutions

(a) Worst case, layer noise adds in quadrature: √50 · 5% ≈ 35% total (in practice the model tolerates much of this).

(b) ImageNet FP32 76% → SIDRA INT8 noise ~74-75% (1-2% drop).

(c) The first 3 and last 3 layers (most sensitive: the first set the feature basis, the last sit closest to the output). Middle layers: single read.

(d) Temperature-aware scaling: $G_{\text{actual}} / G_{\text{target}}(T)$. The compiler performs periodic calibration. Net effect < 1%.

(e) Y10 σ target ~2% (ISPP improvements + tighter layout). 10 effective bits. Noise drops → accuracy holds at FP32 levels.

Cheat Sheet

  • 6 noise sources: thermal, shot, 1/f, programming, IR drop, drift.
  • Cell σ: ~75 nA (~7% on a 1 µA signal).
  • Column SNR: improves by √N = 16× (+24 dB) for N = 256.
  • Effective MVM: ~8 bits in Y1.
  • AI tolerance: 5-10% noise → 0.5-2% accuracy loss.
  • Mitigation: averaging, noise-injection training, temperature calibration.
  • Y10 target: σ 2%, 10 effective bits.

Vision: Make Noise a Feature

  • Y1: Noise tolerated.
  • Y3: Noise-aware compiler, weight-specific design.
  • Y10: Controlled-stochastic memristor (noise level tunable).
  • Y100: Noise as regularizer (Bayesian NN, MCMC).
  • Y1000: Noise = compute (probabilistic AI).

Further Reading

  • Next chapter: 5.11 — Power and Thermal Management
  • Previous: 5.9 — Compute Engine and DMA
  • Memristor noise: Rodriguez et al., Noise analysis in memristor-based neural networks, IEEE TED 2018.
  • Noise-aware training: Joshi et al., Accurate deep neural network inference using computational phase-change memory, Nature Comm. 2020.
  • Analog AI reliability: Ambrogio et al., Nature 2023.