Noise Models
SIDRA's analog reality — controlling noise by design.
Prerequisites
What you'll learn here
- Name SIDRA's six noise sources (thermal, shot, 1/f, programming, IR drop, drift)
- Apply the noise-model math (σ formulas)
- Compute MVM SNR at cell/column/crossbar levels
- Explain noise-aware compiler strategies
- Validate Y1's tolerable noise budget against practical AI models
Hook: Analog = Noisy
Digital CMOS: a bit is either 0 or 1; noise below the switching threshold is simply absorbed. SIDRA analog: the signal is a continuous current, so every read carries noise.
We covered noise theory in 4.4. This chapter gives the practical SIDRA noise model: how many sources, which dominates, how to compute.
Bottom line: Y1 ~5% RMS noise → 6 effective bits → enough for INT8 AI. Y10 ~2% → 8 bits.
Intuition: 6 Noise Sources
When reading a SIDRA cell:
| Source | Chapter | Typical magnitude |
|---|---|---|
| 1. Thermal (Johnson) | 4.4 | ~5 nA RMS |
| 2. Shot | 4.4 | ~5-20 nA RMS |
| 3. 1/f (flicker) | 4.4 | ~10 nA RMS long-term |
| 4. Programming | 5.5 ISPP | ~5% absolute (~50 nA @ 1 µA) |
| 5. IR drop | 5.12 | ~5% systematic |
| 6. Drift | 5.2 | ~1% / year |
Total (per cell): ~75 nA RMS (~7.5% on a 1 µA signal).
That per-cell noise limits single-read precision. Crossbar-level summation and averaging do better.
Formalism: Six Noise Sources
1. Thermal (Johnson-Nyquist): σ_th = √(4·k_B·T·G·Δf). Typical SIDRA: 5-10 nA.
2. Shot: σ_shot = √(2·q·I·Δf). Dominant at low currents: 5-20 nA.
3. 1/f (flicker): power spectral density S(f) ∝ 1/f; the main long-term noise source. RMS ~10 nA.
4. Programming (post-ISPP):
With ISPP: σ_prog ≈ 1 nA at 100 µS, read at 0.25 V. Negligible.
Without ISPP (single-pulse): σ_prog ≈ 5% → 50 nA on a 1 µA signal.
5. IR drop:
End-of-WL cells see less voltage. Systematic error 5%. Fix: double-ended drive (chapter 5.12).
6. Drift:
Retention is finite. Typical 1% drift per year. Annual refresh.
Combined model: independent sources add in quadrature,
σ_total = √(σ_th² + σ_shot² + σ_1/f² + σ_prog² + σ_IR² + σ_drift²).
This assumes independence; in practice 1/f and drift are partially correlated, so the quadrature sum is a slight underestimate.
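The quadrature combination is easy to check numerically. A minimal sketch, using midpoint values from the table above (the exact per-source magnitudes are illustrative assumptions, not measured data); it roughly reproduces the ~75 nA total:

```python
import math

# Assumed Y1 per-cell RMS noise terms in nA, taken as midpoints of the
# ranges in the table above (illustrative, not measured values).
sigma = {
    "thermal":      5.0,
    "shot":        15.0,
    "flicker":     10.0,
    "programming": 50.0,   # single-pulse case, ~5% of 1 uA
    "ir_drop":     50.0,   # ~5% systematic, treated here as RMS
    "drift":       10.0,   # ~1%/year on 1 uA
}

# Independent sources combine in quadrature: sigma_total = sqrt(sum sigma_i^2)
sigma_total = math.sqrt(sum(s ** 2 for s in sigma.values()))
signal_nA = 1000.0  # 1 uA read current
print(f"total sigma = {sigma_total:.0f} nA "
      f"({100 * sigma_total / signal_nA:.1f}% of a 1 uA signal)")
```

Note that the programming and IR-drop terms dominate: reducing either has far more effect than improving the fundamental thermal/shot floor.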
MVM-level SNR:
A crossbar column sums N = 256 cell currents (with 256 columns read in parallel). Signal power grows as N², independent noise power as N, so power SNR improves by N, i.e., amplitude SNR improves by √N = 16.
Cell SNR (single read):
Signal 1 µA, noise 75 nA (all sources) → SNR = 1000/75 ≈ 13 → ~22 dB.
Column SNR:
Signal = 256 × 1 µA = 256 µA. Noise = √256 × 75 nA = 1.2 µA. SNR = 256/1.2 ≈ 213 → ~47 dB.
256-column parallel crossbar:
Single-MVM SNR = 47 dB → ~8 effective bits. Good.
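The cell-to-column SNR chain above can be verified in a few lines. A sketch using the standard ADC relation ENOB = (SNR_dB − 1.76)/6.02 (the function names are mine):

```python
import math

def snr_db(signal, noise):
    """Amplitude SNR in dB."""
    return 20 * math.log10(signal / noise)

def effective_bits(snr_dB):
    # Standard ADC relation: ENOB = (SNR_dB - 1.76) / 6.02
    return (snr_dB - 1.76) / 6.02

cell_signal, cell_sigma = 1.0, 0.075    # 1 uA signal, 75 nA noise
N = 256                                  # cells summed per column

col_signal = N * cell_signal             # signal amplitude adds as N
col_sigma = math.sqrt(N) * cell_sigma    # independent noise adds as sqrt(N)

col_snr = snr_db(col_signal, col_sigma)
print(f"cell:   {snr_db(cell_signal, cell_sigma):.0f} dB")
print(f"column: {col_snr:.0f} dB, {effective_bits(col_snr):.1f} effective bits")
```

The column ENOB comes out near 7.5 bits, which the chapter rounds to ~8.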
Model-level noise:
Single MVM is 8 bits. But AI models stack 10+ layers → noise compounds. 12-layer GPT-2: σ_out = √12 · σ_layer = 3.5 × 5% = 17%. Still tolerable (classification margin 20-50%).
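The √L compounding across layers is a one-line check:

```python
import math

# Independent per-layer noise compounds as sqrt(L) across L stacked layers.
L, sigma_layer = 12, 0.05          # 12-layer GPT-2-scale stack, 5% RMS per layer
sigma_out = math.sqrt(L) * sigma_layer
print(f"output noise: {sigma_out:.1%}")   # ~17%
```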
Tolerance:
AI models tolerate noise. 5-10% RMS is acceptable. SIDRA operates in this band.
Averaging improvement:
4× reads → noise drops by √4 = 2×. Single MVM 15 ns → 4 reads 60 ns. Throughput 4× lower but SNR +6 dB (9 effective bits).
Averaging on critical layers, single read on non-critical. Compiler decision.
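The averaging trade-off (noise down by √k, latency up by k) can be sketched as follows; the 15 ns single-MVM latency is taken from the text, and the helper name is an assumption:

```python
import math

def averaged(sigma, reads, t_read_ns=15.0):
    """k repeated reads: noise drops by sqrt(k), latency grows by k."""
    return sigma / math.sqrt(reads), reads * t_read_ns

sigma = 0.05                        # 5% single-read RMS noise
for k in (1, 4):
    s, t = averaged(sigma, k)
    gain_db = 20 * math.log10(math.sqrt(k))
    print(f"{k}x reads: sigma={s:.1%}, latency={t:.0f} ns, SNR gain=+{gain_db:.0f} dB")
```

This is exactly the knob the compiler turns: spend 4x latency on a layer only where the extra ~1 effective bit matters.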
Noise-aware compiler (chapter 6.7):
The compiler analyzes model weights:
- Big weights (|w| > 0.5): sensitive → SIDRA programming, averaging.
- Small weights (|w| < 0.1): unimportant → prune, don’t write to crossbar.
- Mid weights: standard 8-bit.
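The partitioning rule above can be sketched as a tiny classifier. The function name is hypothetical; the thresholds (0.5, 0.1) come from the text:

```python
def classify_weight(w, hi=0.5, lo=0.1):
    """Assign a crossbar handling strategy to one weight (sketch)."""
    if abs(w) > hi:
        return "averaged"   # sensitive: careful programming + 4x read averaging
    if abs(w) < lo:
        return "pruned"     # unimportant: never written to the crossbar
    return "standard"       # mid-range: standard 8-bit single read

weights = [0.7, -0.02, 0.3, -0.55, 0.05]
print([classify_weight(w) for w in weights])
```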
Whisper example:
30% pruning → smaller model, noise impact reduced.
Noise-injection training (QAT + NI):
Add ~5% noise to weights during training → model becomes robust. SIDRA inference noise is already learned.
Accuracy: standard INT8 76% → noise-injection training 76.5% (small gain).
Temperature effects:
HfO₂ conductance is thermally activated and follows an Arrhenius law: G(T) ∝ exp(−E_a / k_B·T). At typical activation energies, a 25 °C → 85 °C swing changes conductance by ~40%.
Calibration: thermal sensor per CU, calibrate per cluster. Temperature-aware scale factor.
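The temperature-aware scale factor follows directly from the Arrhenius model. A sketch assuming an activation energy of 0.05 eV, a hypothetical value chosen because it reproduces the ~40% swing quoted above (it is not a stated SIDRA parameter):

```python
import math

K_B_EV = 8.617e-5   # Boltzmann constant in eV/K

def g_rel(t_c, e_a=0.05, t_ref_c=25.0):
    """Arrhenius conductance ratio G(T)/G(T_ref).
    e_a = 0.05 eV is an assumed activation energy (illustrative)."""
    t, t_ref = t_c + 273.15, t_ref_c + 273.15
    return math.exp(-e_a / (K_B_EV * t)) / math.exp(-e_a / (K_B_EV * t_ref))

ratio = g_rel(85.0)
scale = 1.0 / ratio     # per-cluster correction factor from the thermal sensor
print(f"G(85C)/G(25C) = {ratio:.2f}, correction scale = {scale:.2f}")
```

The per-CU thermal sensor supplies T; the compiler multiplies column outputs by the correction scale during its periodic calibration pass.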
Noise + temperature + drift = composite model:
The SIDRA simulator (chapter 6.8) simulates all. Used for model testing/validation.
Real Y1 estimate:
Lab measurements:
- Single cell σ = 75 nA (lab).
- Column σ (256 parallel) = 1.2 µA.
- MVM effective bits = 8.
- Model accuracy loss (INT8 benchmark) = 0.3-0.5%.
Y10 target: σ reduced to ~30% of the Y1 value, i.e. ~2% RMS (ISPP improvements + tighter layout), for 10 effective bits.
Experiment: MNIST Inference Noise Impact
Model: MLP 784 → 128 → 10. 2 layers.
Noise:
- Layer 1: σ = 5% relative.
- Layer 2: σ = 5% relative.
- Combined: σ_out = √(5² + 5²) = 7% relative.
MNIST accuracy:
- FP32: 98%
- INT8 quantized: 97.8%
- INT8 + 5% noise (SIDRA): 97.5%
- INT8 + 10% noise: 96.5%
Averaging effect (4×):
- 5% → 2.5%.
- New accuracy: 97.9% (very close to FP32).
Y10 (2% noise):
- Single read 97.9%.
- No averaging needed.
Energy:
- Y1 + 4× averaging: 4 mJ × 4 = 16 mJ.
- Y10 single read: 3 mJ. 5× more efficient.
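The two-layer compounding in this experiment can be checked with a small Monte Carlo: inject relative weight noise into a random 784 → 128 → 10 MLP and measure the relative output error with and without 4x averaging. The network weights are random stand-ins, not a trained MNIST model, so only the error magnitudes (not accuracies) are meaningful:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((784, 128)) / 28.0   # scaled random stand-in weights
W2 = rng.standard_normal((128, 10)) / 12.0
x = rng.standard_normal(784)

def forward(sigma):
    """Two-layer MLP with relative read noise injected at each layer."""
    h = np.maximum((W1 * (1 + sigma * rng.standard_normal(W1.shape))).T @ x, 0)
    return (W2 * (1 + sigma * rng.standard_normal(W2.shape))).T @ h

ref = forward(0.0)  # noiseless reference output

def rel_err(sigma, trials=200):
    """Mean relative L2 output error over Monte Carlo trials."""
    errs = [np.linalg.norm(forward(sigma) - ref) / np.linalg.norm(ref)
            for _ in range(trials)]
    return float(np.mean(errs))

print(f"single read (5% per layer): {rel_err(0.05):.1%}")
print(f"4x averaged (2.5% per layer): {rel_err(0.025):.1%}")
```

The single-read error lands near the √(5² + 5²) ≈ 7% predicted above, and averaging roughly halves it.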
Quick Quiz
Lab Exercise
Y1 noise budget limits.
Y1:
- Cell σ = 75 nA.
- Column σ = 1.2 µA.
- MVM effective bits = 8.
Questions:
(a) Total noise for ResNet-50 (50 layers)?
(b) Tolerable accuracy drop on ImageNet?
(c) Which layers should use 4× averaging?
(d) What does a 25 → 85 °C swing do?
(e) What's the Y10 σ target?
Solutions
(a) Independent layers add in quadrature: √50 × 5% RMS ≈ 35% total (worst case; in practice the model tolerates much of it).
(b) ImageNet FP32 76% → SIDRA INT8 noise ~74-75% (1-2% drop).
(c) The first 3 and last 3 layers (most sensitive: input feature extraction and the layers closest to the output). Middle layers single-read.
(d) Temperature-aware scaling: rescale outputs by G(T_ref)/G(T) using the per-CU thermal sensor; the compiler performs periodic calibration. Net effect < 1%.
(e) Y10 σ target ~2% (ISPP improvements + tighter layout). 10 effective bits. Noise drops → accuracy holds at FP32 levels.
Cheat Sheet
- 6 noise sources: thermal, shot, 1/f, programming, IR drop, drift.
- Cell σ: ~75 nA (~7% on a 1 µA signal).
- Column SNR: improves by √N = 16 → +24 dB over a single cell.
- Effective MVM: ~8 bits in Y1.
- AI tolerance: 5-10% noise → 0.5-2% accuracy loss.
- Mitigation: averaging, noise-injection training, temperature calibration.
- Y10 target: σ 2%, 10 effective bits.
Vision: Make Noise a Feature
- Y1: Noise tolerated.
- Y3: Noise-aware compiler, weight-specific design.
- Y10: Controlled-stochastic memristor (noise level tunable).
- Y100: Noise as regularizer (Bayesian NN, MCMC).
- Y1000: Noise = compute (probabilistic AI).
Further Reading
- Next chapter: 5.11 — Power and Thermal Management
- Previous: 5.9 — Compute Engine and DMA
- Memristor noise: Rodriguez et al., Noise analysis in memristor-based neural networks, IEEE TED 2018.
- Noise-aware training: Joshi et al., Accurate deep neural network inference using computational phase-change memory, Nature Comm. 2020.
- Analog AI reliability: Ambrogio et al., Nature 2023.