Probability and Noise
Memristor noise isn't always a bug — sometimes it's a feature.
Prerequisites
What you'll learn here
- Define random variable, expected value (E), variance (Var)
- State the formulas for Normal (Gaussian), Bernoulli, Poisson distributions and their use cases
- Explain the physics of thermal (Johnson), shot, and 1/f noise
- Compute SNR (Signal-to-Noise Ratio) for a SIDRA crossbar
- Show how noise can be useful in AI (regularizer, dropout)
Hook: Perfection Isn't Possible — and Isn't Needed
An ideal chip: every signal precise, every measurement correct, every computation deterministic. A practical chip: noise on every signal, error in every measurement, estimation in every computation.
A SIDRA Y1 cell stores 8-bit (256-level) conductance. But thermal noise, shot noise, drift, IR drop, and temperature swings drop effective accuracy to ~6 bits. Two bits lost. Is that a problem?
Answer: usually not. Sometimes an advantage.
- 6 bits is enough for AI inference (INT8 is standard, INT4 is widespread).
- Noise plays the role of a regularizer in classical AI (dropout, weight noise).
- The brain synapse is already noisy (vesicles are probabilistic) — feature, not bug.
- SIDRA’s real position: not “deterministic digital”, but “noisy but efficient analog”.
This chapter covers probability fundamentals, noise sources, how SIDRA measures and tames them, and shows that noise can help AI learning.
Intuition: Probability and Expected Value
A random variable (RV) is a variable whose value is random.
- A die roll: X ∈ {1, 2, 3, 4, 5, 6}, equal probability.
- A memristor read current: I = I₀ + η, where I₀ is the "true" value and η is Gaussian noise.
Probability distribution: the probability of every value.
- Die: P(X = k) = 1/6 for every k.
- Memristor: η ~ N(0, σ²) — zero-mean Gaussian.
Expected value (E): long-run average.
- Die: E[X] = (1+2+3+4+5+6)/6 = 3.5.
- Memristor: E[I] = I₀ (noise has zero mean).
Variance: how scattered around the mean.
- Die: Var(X) = E[(X − 3.5)²] ≈ 2.92.
- Memristor: Var(I) = σ².
Standard deviation: σ = √Var. Same units, "typical deviation" size.
- Memristor: σ_I = σ. Typical SIDRA: σ ≈ 5% of I₀.
Intuition: a single measurement is noisy, but the average of many measurements is much sharper. Central limit theorem: the standard deviation of an N-sample average is σ/√N. 100 measurements → 10× improvement.
SIDRA practical use: if a single MVM is repeated 10× over 100 µs, noise drops by √10 ≈ 3.2× and effective accuracy rises from 6 bits toward ~7-8 bits — but speed drops 10×. Trade-off.
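The σ/√N averaging effect can be checked with a small simulation. This is a sketch with illustrative values (I₀ = 1 µA, per-read σ = 5% of I₀ — assumed, not SIDRA measurements):

```python
import numpy as np

rng = np.random.default_rng(0)

I0 = 1e-6           # "true" read current: 1 uA (illustrative)
sigma = 0.05 * I0   # per-read noise sigma: 5% of I0 (assumed)

# 10,000 experiments, each averaging N noisy reads of the same cell
N = 100
reads = I0 + rng.normal(0.0, sigma, size=(10_000, N))
single_std = reads[:, 0].std()        # scatter of one read
avg_std = reads.mean(axis=1).std()    # scatter of the N-read average

print(f"single read: {single_std / I0:.2%} of I0")
print(f"{N}-read avg: {avg_std / I0:.3%} of I0 "
      f"(theory: sigma/sqrt(N) = {sigma / np.sqrt(N) / I0:.3%})")
```

With N = 100 the measured scatter of the average lands right on σ/√N — the 10× improvement quoted above.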
Formalism: Distributions, Noise Models, SNR
Three core distributions:
Bernoulli: X ∈ {0, 1}, P(X = 1) = p, P(X = 0) = 1 − p.
- Expected value: E[X] = p.
- Variance: Var(X) = p(1 − p).
- Use: single-bit event (vesicle release, bit read).
Normal (Gaussian): X ~ N(µ, σ²).
- Density: f(x) = (1/√(2πσ²)) · e^(−(x−µ)²/(2σ²)).
- E[X] = µ, Var(X) = σ².
- Use: thermal noise, measurement error, weight initialization.
Poisson: P(X = k) = λᵏ e^(−λ) / k!, k = 0, 1, 2, …
- E[X] = Var(X) = λ.
- Use: spike count, photon count, rare events.
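The three mean/variance claims are easy to verify empirically with NumPy's samplers (parameter values here are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Bernoulli(p): E = p, Var = p(1-p)
p = 0.3
bern = (rng.random(n) < p).astype(float)

# Normal(mu, sigma^2): E = mu, Var = sigma^2
mu, sigma = 2.0, 0.5
norm = rng.normal(mu, sigma, n)

# Poisson(lam): E = Var = lam
lam = 4.0
pois = rng.poisson(lam, n)

print(f"Bernoulli: mean {bern.mean():.3f} (~{p}), var {bern.var():.3f} (~{p*(1-p):.2f})")
print(f"Normal:    mean {norm.mean():.3f} (~{mu}), var {norm.var():.3f} (~{sigma**2})")
print(f"Poisson:   mean {pois.mean():.3f} (~{lam}), var {pois.var():.3f} (~{lam})")
```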
Expected-value rules:
- Linearity: E[aX + bY] = a·E[X] + b·E[Y] (always).
- If independent: E[XY] = E[X] · E[Y].
Variance rules:
- Var(aX + b) = a² Var(X).
- If independent: Var(X + Y) = Var(X) + Var(Y).
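These rules are what make the noise bookkeeping in the rest of the chapter work, so a quick numerical check is worthwhile (X, Y, a, b are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
X = rng.normal(1.0, 2.0, n)
Y = rng.normal(-3.0, 0.5, n)   # drawn independently of X
a, b = 2.0, -1.0

# Linearity of E (holds with or without independence)
print((a * X + b * Y).mean(), "~", a * X.mean() + b * Y.mean())
# Var(aX + b) = a^2 Var(X): shifting by b changes nothing, scaling by a squares
print((a * X + b).var(), "~", a**2 * X.var())
# Independence: variances add
print((X + Y).var(), "~", X.var() + Y.var())
```

The variance-addition rule is exactly why independent noise sources later combine as a root-sum-of-squares.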
Three physical noise sources:
1. Thermal noise (Johnson-Nyquist): i_thermal = √(4kT·G·Δf)
- k = Boltzmann constant (1.38 × 10⁻²³ J/K)
- T = temperature (K)
- G = conductance (S)
- Δf = bandwidth (Hz)
Numbers: T = 300 K, G = 100 µS, Δf = 100 MHz → i_thermal ≈ 12.9 nA.
Typical MVM output: 1-10 µA → signal-to-thermal amplitude ratio ≈ 77-775; with the other noise sources included, the system lands in the ~30-40 dB range.
2. Shot noise: i_shot = √(2q·I·Δf)
- q = electron charge (1.6 × 10⁻¹⁹ C)
- I = average current
Numbers: I = 1 µA, Δf = 100 MHz → i_shot ≈ 5.7 nA.
Dominates at low current; here it is the same order as thermal.
3. 1/f (flicker) noise: S_I(f) = A · I² / f
- A = material constant (HfO₂ ≈ 10⁻¹¹).
Grows as frequency drops (slow drift source). Dominates over long retention.
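The thermal and shot formulas can be wrapped in two small helpers to reproduce the numbers above (T = 300 K, G = 100 µS, Δf = 100 MHz, I = 1 µA):

```python
import math

k_B = 1.38e-23   # Boltzmann constant, J/K
q = 1.6e-19      # electron charge, C

def thermal_noise(T, G, df):
    """Johnson-Nyquist rms noise current for conductance G over bandwidth df."""
    return math.sqrt(4 * k_B * T * G * df)

def shot_noise(I, df):
    """Shot rms noise current for mean current I over bandwidth df."""
    return math.sqrt(2 * q * I * df)

# Reproduce the worked numbers: 12.9 nA thermal, 5.7 nA shot
print(f"thermal: {thermal_noise(300, 100e-6, 100e6) * 1e9:.1f} nA")
print(f"shot:    {shot_noise(1e-6, 100e6) * 1e9:.1f} nA")
```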
Total noise (independent sources): i_total = √(i_thermal² + i_shot² + i_1/f²)
SNR (Signal-to-Noise Ratio): SNR = P_signal / P_noise = I_signal² / i_noise²
In dB: SNR_dB = 10 log₁₀(SNR).
- 30 dB → 1000× signal:noise → ~5 effective bits.
- 40 dB → 10000× → ~6.5 bits.
- 60 dB → 10⁶× → ~10 bits.
SIDRA Y1 target: ~30-40 dB.
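The dB-to-effective-bits conversion above follows the usual ENOB rule of thumb, bits ≈ (SNR_dB − 1.76)/6.02, which a few lines of Python make concrete:

```python
def effective_bits(snr_db):
    """ENOB rule of thumb: bits ~ (SNR_dB - 1.76) / 6.02."""
    return (snr_db - 1.76) / 6.02

for db in (30, 40, 60):
    power_ratio = 10 ** (db / 10)
    print(f"{db} dB -> {power_ratio:.0f}x signal:noise -> "
          f"{effective_bits(db):.1f} effective bits")
```

This reproduces the table: ~4.7 bits at 30 dB, ~6.4 at 40 dB, ~9.7 at 60 dB.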
Crossbar noise in detail:
Total noise current in one column of a 256×256 crossbar: independent per-cell noises add in power, i_col² = Σᵢ i_cell,i².
So i_col = √256 · i_cell = 16 · i_cell.
Signal also sums: I_col = Σᵢ Gᵢ·Vᵢ ≈ 256 · Ī_cell (mean).
SNR: SNR_col = (256 · Ī_cell)² / (16 · i_cell)² = 256 · SNR_cell.
Crossbar SNR is N = 256× the per-cell SNR (+24 dB). Good news.
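A Monte Carlo check of the N× claim, using assumed per-cell numbers (1 µA mean current, 0.25 µA noise σ, i.e. a programming-noise-dominated cell):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 256                 # cells summed per column
I_cell = 1e-6           # mean per-cell current (assumed)
i_noise = 0.25e-6       # per-cell noise sigma (assumed)

snr_cell = (I_cell / i_noise) ** 2   # per-cell SNR in power: 16

# Many trials: sum 256 noisy cell currents into one column current
col = (I_cell + rng.normal(0, i_noise, size=(20_000, N))).sum(axis=1)
snr_col = (col.mean() / col.std()) ** 2

print(f"cell SNR {snr_cell:.0f}, column SNR {snr_col:.0f}, "
      f"ratio ~ {snr_col / snr_cell:.0f} (expect N = {N})")
```

The simulated ratio comes out at ~256, i.e. the column gains 10·log₁₀(256) ≈ 24 dB over a single cell.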
Programming noise:
You can’t program a memristor exactly. After programming: G = G_target + ε with ε ~ N(0, σ_G²) — σ_G is small with ISPP, noticeably larger with basic one-shot pulses.
This noise is persistent — unlike thermal, it doesn’t change per read. In AI it acts as “weight quantization noise”. Modern DL is designed to tolerate it (post-training quantization).
Drift:
Conductance changes slowly: G(t) = G₀ · (t/t₀)^(−ν) (power-law model, ν = drift exponent). Typical: ~5% drift per year.
Fix: periodic refresh (re-program a few cells per month) or drift-aware compiler (predict and compensate).
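A drift-aware compiler needs exactly this kind of prediction. Below is a minimal sketch of the power-law model; the exponent ν ≈ 0.003 is an assumed value chosen so that the model reproduces the ~5%/year figure, not a measured SIDRA parameter:

```python
def drift(G0, t_seconds, t0=1.0, nu=0.003):
    """Power-law drift model G(t) = G0 * (t/t0)**(-nu).

    nu ~ 0.003 is an assumed exponent that yields ~5% loss per year.
    """
    return G0 * (t_seconds / t0) ** (-nu)

G0 = 10e-6                       # 10 uS initial conductance
year = 365 * 24 * 3600.0
loss = 1 - drift(G0, year) / G0
print(f"drift after 1 year: {loss:.1%}")
```

A compensation pass would simply multiply the expected column current by 1/(1 − loss) at inference time, or schedule a refresh once the predicted loss crosses a threshold.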
Is noise bad for AI?
Surprisingly, mostly no, and sometimes it even helps:
- Weight noise = stochastic regularizer: adding small noise to weights reduces overfitting (Hinton et al. 1992).
- Dropout: randomly disable neurons during training → more robust model. SIDRA’s natural “sneak path” noise can do something similar.
- Stochastic gradient: SGD’s strength is its noise → finds good minima.
- Bayesian networks: weights are actually distributions. SIDRA hardware noise produces this naturally.
SIDRA Y10 target: controlled-stochastic memristor — noise level tunable by design. Optimize per AI workload.
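Two of the mechanisms above (weight noise and dropout) fit in a few lines of NumPy. This is a toy forward pass, not SIDRA code; the 5% noise level and 0.8 keep-probability are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

W = rng.normal(0.0, 0.1, size=(128, 10))   # toy weight matrix
x = rng.normal(0.0, 1.0, size=128)         # one input vector

# Hardware-style weight noise: every "read" of W is slightly perturbed,
# which during training acts as a stochastic regularizer
sigma_w = 0.05 * np.abs(W).mean()          # ~5% relative noise (assumed)
y_noisy = x @ (W + rng.normal(0.0, sigma_w, W.shape))

# Dropout: randomly zero inputs during training; inverted scaling by
# 1/keep preserves the expected activation
keep = 0.8
mask = (rng.random(128) < keep) / keep
y_drop = (x * mask) @ W

print(y_noisy.shape, y_drop.shape)
```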
Experiment: Compute the SNR of a Cell
A SIDRA Y1 cell:
- G = 20 µS (between HRS and LRS)
- V = 0.1 V (read voltage)
- T = 300 K
- Δf = 100 MHz
- ISPP programming noise: σ_G ≈ 2.5 µS
Signal current: I = G · V = 20 µS × 0.1 V = 2 µA.
Thermal noise: i_thermal = √(4kT·G·Δf) ≈ 5.8 nA.
Shot noise: i_shot = √(2q·I·Δf) ≈ 8.0 nA.
Programming noise (in current units): σ_G · V = 2.5 µS × 0.1 V = 0.25 µA = 250 nA.
Total: i_total = √(250² + 5.8² + 8.0²) ≈ 250 nA.
Programming noise dominates (much larger than thermal/shot).
SNR: (2 µA / 250 nA)² ≈ 64 → ≈ 18 dB.
Effective bits: (18 − 1.76)/6.02 ≈ 2.7 bits for a single-shot read of a single cell.
Crossbar level (256 cells per column): SNR rises 256× → +24 dB → ~42 dB, ~7 effective bits.
Bottom line: SIDRA Y1 has ~7 effective bits per column. Enough for INT8 inference; not for FP32.
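The whole calculation fits in a short script. The parameter values (G = 20 µS, V = 0.1 V, σ_G = 2.5 µS, T = 300 K, Δf = 100 MHz) are assumptions consistent with the 250 nA programming-noise figure:

```python
import math

k_B, q = 1.38e-23, 1.6e-19

# Assumed Y1 cell parameters (match the 250 nA programming-noise figure)
G, V, T, df = 20e-6, 0.1, 300.0, 100e6   # 20 uS, 0.1 V, 300 K, 100 MHz
sigma_G = 2.5e-6                          # ISPP residual spread, 2.5 uS

I = G * V                                     # signal current: 2 uA
i_th = math.sqrt(4 * k_B * T * G * df)        # thermal noise
i_sh = math.sqrt(2 * q * I * df)              # shot noise
i_pr = sigma_G * V                            # programming noise: 250 nA
i_tot = math.sqrt(i_th**2 + i_sh**2 + i_pr**2)

snr_db = 10 * math.log10((I / i_tot) ** 2)
col_db = snr_db + 10 * math.log10(256)        # +24 dB from a 256-cell column
print(f"cell: {snr_db:.1f} dB, column: {col_db:.1f} dB")
```

Swapping in tighter ISPP (σ_G halved) or multi-read averaging immediately shows the improvement paths below.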
Improvement paths:
- Tighter ISPP: σ_G = 2.5 µS → 1.25 µS → programming noise drops 50%.
- Multi-read averaging (4 samples): σ drops by √4 = 2× (50%).
- Cold operation (T = 250 K): thermal noise power drops ~17% (amplitude ~9%).
Y10 target: ~50 dB SNR, ~9-10 effective bits.
Quick Quiz
Lab Exercise
SNR analysis for MNIST classification on SIDRA Y1.
Scenario:
- MNIST: 28×28 = 784 pixels, 10 classes.
- 2-layer MLP: 784 × 128 → 128 × 10.
- Each layer on SIDRA crossbars: first layer 4 crossbars (256×256), second layer 1 crossbar.
- Each crossbar column: 6-7 effective bits of SNR.
- Inference: one forward pass.
Data:
- Typical MNIST classification accuracy (FP32 model): 98%.
- After INT8 quantization: 97.5% (1% loss).
- INT4 quantization: 94% (4% loss).
- 6-bit effective (SIDRA): 96-97% expected.
Questions:
(a) MVMs per inference? Latency in ns? (b) How much noise per MVM (mean current × 5%)? (c) How does total noise accumulate across 2 layers? (d) How many MVM averages to hit 96% classification? (e) How much does averaging extend inference? Practical?
Solutions
(a) Layer 1: 784×128 weights → 784 input rows need 4 blocks of 256, and the 128 output columns fit in one crossbar → 4 crossbars (these MVMs can run in parallel). Layer 2: 128×10 → 1 MVM. Total 5 MVMs; executed sequentially: 5 × 10 ns = 50 ns.
(b) Each MVM output ~10 µA. Programming noise 5% → 0.5 µA. Thermal/shot ~50 nA. Total √(500² + 50²) ≈ 502 nA per output → ~5% relative.
(c) Two layers in series → relative noise adds in RMS: √(5%² + 5%²) ≈ 7%. As long as the classification margin exceeds this, accuracy holds.
(d) 5-bit effective → ~93% accuracy. 6-bit (single MVM) → 96%. 7-bit (4× averaging) → 97-98%. 4× averaging suffices.
(e) 4× averaging: 4 × 50 ns = 200 ns/inference. Still 5M inferences/s. Practical. SIDRA Y1: 5M MNIST/s. Compare: H100 ~100M MNIST/s, but at 700 W. SIDRA ~20× slower but ~230× less energy per inference.
Note: Y1 is overkill for MNIST (419M cells, MNIST model needs ~100K). Real role: parallel batches of small models.
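The tiling and throughput arithmetic from (a) and (e) can be sketched as follows (10 ns per MVM is the assumed crossbar latency from the scenario):

```python
import math

# Tiling the 784x128 / 128x10 MLP onto 256x256 crossbars (ceil per dimension)
tiles_l1 = math.ceil(784 / 256) * math.ceil(128 / 256)   # layer 1: 4 crossbars
tiles_l2 = math.ceil(128 / 256) * math.ceil(10 / 256)    # layer 2: 1 crossbar
mvms = tiles_l1 + tiles_l2                               # 5 MVMs total

t_mvm = 10e-9   # 10 ns per MVM (assumed)
for avg in (1, 4):
    t = avg * mvms * t_mvm   # sequential execution, with averaging repeats
    print(f"{avg}x averaging: {t * 1e9:.0f} ns/inference -> "
          f"{1 / t / 1e6:.0f}M inferences/s")
```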
Cheat Sheet
- Random variable: a variable whose value is random. E[X] = mean, Var(X) = scatter, σ = √Var = typical deviation.
- Three distributions: Normal (noise), Bernoulli (binary), Poisson (count).
- Three noises: Thermal (4kTG·Δf), Shot (2qI·Δf), 1/f (drift).
- Total noise: i_total = √(i_thermal² + i_shot² + i_1/f²).
- SNR: signal²/noise². dB = 10 log₁₀ SNR.
- Crossbar SNR: cell SNR × N (parallelism wins).
- SIDRA Y1: ~30-40 dB SNR, ~6-7 effective bits.
- Noise = feature: stochastic regularizer, dropout effect, Bayesian nets.
Vision: Make Noise a Design Tool
Classical engineering: noise is the enemy. Modern AI: noise is a friend. SIDRA brings that paradigm into silicon:
- Y1 (today): Noise is “tolerated bad” — enough for INT8 inference.
- Y3 (2027): ISPP improvement + temperature compensation → SNR 50 dB, 9 bits.
- Y10 (2029): Controlled-stochastic memristor — noise level programmable. For Bayesian nets, dropout-replication, stochastic MAC.
- Y100 (2031+): Noise-aware compiler — train the model knowing per-cell noise profiles. Hardware-software co-design.
- Y1000 (long horizon): Noise-energy co-optimization. AI models use noise as a compute resource (sampling, MCMC).
Meaning for Türkiye: the noise-tolerant AI design race has just begun. SIDRA is an early move. Combine academia + workshop + industry (ASELSAN, ASELSAN AI etc.) → Türkiye’s first national “noise-aware AI architecture”.
Unexpected future: the stochastic AI era. Instead of today’s deterministic models, AI that gives probabilistic answers (a distribution of answers, with confidence). That mirrors the brain; SIDRA hardware stochasticity is the natural carrier. The Y100 version of ChatGPT returns not just an “answer” but “answer + confidence interval”.
Further Reading
- Next chapter: 4.5 — Fourier Transform
- Previous: 4.3 — Derivative and Gradient
- Probability foundation: Ross, A First Course in Probability.
- Stochastic processes: Ross, Introduction to Probability Models.
- Thermal noise: Nyquist, Thermal agitation of electric charge in conductors, Phys. Rev. 1928.
- Shot noise: Schottky 1918 (original).
- Memristor noise: Suri et al., Physical aspects of low power synapses based on phase change memory devices, J. Appl. Phys. 2012.
- Noise as regularizer in AI: Hinton et al., Improving neural networks by preventing co-adaptation of feature detectors, arXiv 2012 (dropout).
- Bayesian neural networks: Neal, Bayesian Learning for Neural Networks, Springer 1996.