🔌 Module 5 · Chip Hardware · Chapter 5.8 · 10 min read

MUX, Decoder, and Analog ECC

Pick the cell, fix the error — the control backbone of large arrays.

What you'll learn here

  • Explain multiplexer (MUX) and address decoder roles
  • Sketch the Y1 WL/BL selection mechanism
  • State why ECC (Error Correction Code) is needed and Hamming-code basics
  • Distinguish analog ECC strategies (redundancy, averaging, sigma-delta)
  • Compute Y1 cell-failure tolerance

Hook: Pick the Right Cell from 419M

SIDRA Y1 has 419M memristors. Which one to read, when? Address decoder + multiplexer.

Also: some cells are broken (manufacturing defects), some reads are noisy (analog error). Detect and correct. ECC.

This chapter covers SIDRA’s “control backbone” and fault-tolerance mechanisms.

Intuition: From Address to Cell to Correct Answer

MUX (Multiplexer):

N inputs, 1 output. Address bits select:

  • 256:1 MUX = 8-bit address, picks 1 of 256 cells.
  • WL MUX: which row to drive.
  • BL MUX: which column to read.

Decoder:

Address → one-hot signal. 8-bit address → 256 outputs, only one high.

In Y1, every CU has a WL decoder + BL decoder at its head. A MUX runs across the 16 crossbars.

ECC (Error Correction Code):

Bit-level: parity, Hamming, BCH, Reed-Solomon. Plain parity only detects an error; Hamming corrects one-bit errors; BCH and Reed-Solomon correct multi-bit errors.

Analog-level: redundancy (3× copies, majority vote), averaging (5× reads), sigma-delta (cancel quantization error).

Formalism: Decoder + MUX + ECC

L1 · Beginner

8-bit address → pick one of 256 cells:

Address a[7:0] (256 combinations).

Decoder: 8-input, 256-output combinational circuit. Output o[i] = 1 ⇔ address = i.

Circuit: 256 eight-input AND gates (simple but big). Optimized: a tree-based decoder with log_2(256) = 8 levels, one per address bit.

256:1 MUX:

Each decoder output controls one transmission gate, so only the selected cell connects to the crossbar.

Circuit: 256 transmission gates + decoder. Area: ~250 µm² (in 28 nm CMOS).
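
The decoder and MUX described above can be modeled behaviorally in a few lines. This is a minimal sketch in Python standing in for the hardware; the names `decode_one_hot` and `mux_select` and the placeholder cell readings are illustrative assumptions, not SIDRA identifiers.

```python
def decode_one_hot(address: int, bits: int = 8) -> list[int]:
    """Address -> one-hot vector: output i is 1 only when address == i."""
    size = 1 << bits                      # 8 bits -> 256 outputs
    if not 0 <= address < size:
        raise ValueError(f"address must fit in {bits} bits")
    return [1 if i == address else 0 for i in range(size)]


def mux_select(inputs: list[float], address: int) -> float:
    """N:1 MUX modeled as decoder outputs gating one input onto the output."""
    bits = (len(inputs) - 1).bit_length()  # 256 inputs -> 8 address bits
    enables = decode_one_hot(address, bits)
    # Exactly one enable is 1, so the sum passes through the selected input.
    return sum(x * en for x, en in zip(inputs, enables))


cells = [float(i) * 0.5 for i in range(256)]   # placeholder cell readings
one_hot = decode_one_hot(0xA3)
selected = mux_select(cells, 0xA3)
```

Multiplying by the enable bit plays the role of the transmission gate: every unselected input is blocked and contributes nothing to the output.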

Y1 hierarchy:

  • Within a CU: 16 crossbars → 16:1 MUX (4-bit address).
  • Within a crossbar: 256×256 → WL decoder (8-bit) + BL decoder (8-bit).
  • Cluster: 25 CUs → 25:1 routing matrix.
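
One way to picture the hierarchy is to split a flat cell index inside one cluster into its (CU, crossbar, row, column) fields. The sizes come from the list above; the field order and the names `CellAddress` and `split_cluster_address` are assumptions made for this sketch, since the text does not define the actual address layout.

```python
from typing import NamedTuple

ROWS = COLS = 256          # crossbar dimensions from the text
CROSSBARS_PER_CU = 16      # selected by the 16:1 MUX (4-bit address)
CUS_PER_CLUSTER = 25       # selected by the 25:1 routing matrix


class CellAddress(NamedTuple):
    cu: int
    crossbar: int
    row: int      # word line (WL) index
    col: int      # bit line (BL) index


def split_cluster_address(index: int) -> CellAddress:
    """Split a flat cell index within one cluster into hierarchy fields."""
    cells_per_cluster = CUS_PER_CLUSTER * CROSSBARS_PER_CU * ROWS * COLS
    if not 0 <= index < cells_per_cluster:
        raise ValueError("index outside the cluster")
    index, col = divmod(index, COLS)       # lowest 8 bits: BL decoder
    index, row = divmod(index, ROWS)       # next 8 bits: WL decoder
    cu, crossbar = divmod(index, CROSSBARS_PER_CU)
    return CellAddress(cu, crossbar, row, col)


addr = split_cluster_address(3 * 16 * 65536 + 5 * 65536 + 200 * 256 + 17)
```

Because 25 is not a power of two, the CU field is range-checked rather than masked; the row and column fields map directly onto the two 8-bit decoders.
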
L2 · Full

Hamming code (classical bit-level ECC):

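n data bits + k parity bits, n + k = 2^k - 1 for a full-length code; shortened codes satisfy n + k ≤ 2^k - 1.

Corrects single-bit errors. Detecting double-bit errors at the same time needs one extra overall parity bit (SECDED). Common: (7, 4) Hamming → 4 data + 3 parity.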

For SIDRA: byte (8 bit) uses (12, 8) Hamming = 4 parity bits. Each cell read passes ECC check.

In practice:

  • Data: 8 bits
  • Storage: 12 bits (4 parity)
  • After read: compute parity, find any flipped bit, correct.

Overhead: 50% storage. At one bit per cell, Y1 holds 419M / 12 ≈ 35M data bytes (~35 MB net).
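
The parity-bit count follows from requiring the syndrome to name every bit position plus the "no error" case, i.e. 2^k ≥ n + k + 1. A small sketch of that rule (the function name `parity_bits_needed` is illustrative):

```python
def parity_bits_needed(data_bits: int) -> int:
    """Smallest k with 2**k >= data_bits + k + 1 (single-error correction)."""
    k = 0
    while (1 << k) < data_bits + k + 1:
        k += 1
    return k


for n in (4, 8, 16, 32, 64):
    k = parity_bits_needed(n)
    print(f"({n + k}, {n}) code: {k} parity bits, overhead {k / n:.1%}")
```

Longer words amortize the parity cost: 8 data bits need 4 parity bits (50%), while 64 data bits need only 7 (about 11%). The trade-off is that one codeword then spans more cells, so two faults are more likely to land in the same word.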

Analog ECC:

Bit-level ECC is digital. Analog quantization errors differ. Strategies:

1. Redundancy: write each weight to 3 cells, take majority.

  • Overhead: 200%.
  • Tolerates: 1 cell failure / 3.

2. Averaging: write each weight to 1 cell, read 5 times, average.

  • Overhead: 5× read time (storage unchanged).
  • Tolerates: random noise reduces by √5 = 2.2×.

3. Sigma-delta: track weight + error, fold the error into the next cell.

  • Overhead: small.
  • Tolerates: quantization error nullified long-term.
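
The first two strategies can be tried out numerically. The sketch below assumes Gaussian read noise, a median as the analog stand-in for a majority vote, and made-up conductance values; none of these details come from the Y1 design.

```python
import random
import statistics


def vote_of_three(reads: list[float]) -> float:
    """Analog 'majority' over 3 copies: the median ignores one outlier cell."""
    return statistics.median(reads)


def averaged_read(read_once, n: int = 5) -> float:
    """Read the same cell n times and average to suppress random noise."""
    return sum(read_once() for _ in range(n)) / n


rng = random.Random(0)            # fixed seed so the sketch is repeatable
TRUE_G, SIGMA = 50.0, 1.0         # assumed conductance and read noise (a.u.)


def noisy_read() -> float:
    return TRUE_G + rng.gauss(0.0, SIGMA)


# One of three copies is stuck far from the target: the median still holds.
voted = vote_of_three([49.8, 50.1, 2.0])

# Spread of single reads vs. 5-read averages over many trials.
single = [noisy_read() for _ in range(4000)]
avg5 = [averaged_read(noisy_read, 5) for _ in range(4000)]
ratio = statistics.pstdev(single) / statistics.pstdev(avg5)   # close to sqrt(5)
```

The median tolerates one bad copy out of three regardless of how far off it is, while averaging only helps against zero-mean noise: a stuck cell read five times gives the same wrong value five times.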

SIDRA Y1 approach: triple redundancy for critical cells (references, biases); 2× averaging + Hamming bytes for the rest. Total overhead ~30%.

L3 · Deep

Failure modes and tolerance:

Cell failure types:

  1. Stuck-at-LRS: cell won’t program, always low R. Rate: ~0.5%.
  2. Stuck-at-HRS: cell won’t program, always high R. Rate: ~0.3%.
  3. Read fail: noisy measurement, beyond margin. Rate: ~0.1%.
  4. Drift fail: value drifts over time. Rate: ~0.1%/year.

Total Y1 production cell failure: ~1%.

419M × 0.01 = 4.2M faulty cells in Y1.

Tolerance strategy:

  • Per crossbar, spare rows (256 + 4 redundant). Boot tests map out bad rows.
  • Per crossbar, spare columns (256 + 4 redundant). Same.
  • Byte-level ECC (Hamming).

Test and remapping:

At boot, every crossbar is tested:

  1. Program all cells → read → compare.
  2. Faulty cells go in a table (cell-failure map).
  3. The compiler uses that table → routes weights to good cells.
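
The boot test and the spare-row strategy could be sketched as follows. The function names, the 0.05 tolerance, and the "worst rows first" policy are assumptions for illustration; the text does not say how Y1 ranks rows, and spare rows are assumed fault-free here.

```python
ROWS, COLS, SPARE_ROWS = 256, 256, 4      # sizes from the text


def build_failure_map(written, read_back, tol=0.05):
    """Boot test: cells whose read-back misses the written value go in the map."""
    return {
        (r, c)
        for r in range(len(written))
        for c in range(len(written[r]))
        if abs(written[r][c] - read_back[r][c]) > tol
    }


def remap_rows(failure_map, max_bad_per_row=0):
    """Assign the worst rows to spares; returns {logical_row: physical_row}."""
    bad_count = {}
    for r, _ in failure_map:
        bad_count[r] = bad_count.get(r, 0) + 1
    worst = sorted((r for r, n in bad_count.items() if n > max_bad_per_row),
                   key=lambda r: -bad_count[r])
    # Spare rows sit physically after the 256 regular rows.
    return {r: ROWS + i for i, r in enumerate(worst[:SPARE_ROWS])}


written = [[0.5] * COLS for _ in range(ROWS)]
read_back = [row[:] for row in written]
read_back[10][3] = 0.0        # simulated stuck cells
read_back[10][7] = 0.0
read_back[200][0] = 1.0
fmap = build_failure_map(written, read_back)
row_map = remap_rows(fmap)
```

Rows that cannot be given a spare stay in the failure map, which is where the compiler step takes over and routes weights around the remaining bad cells.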

Sigma-delta as decision aid:

Each bit read carries a confidence. Sigma-delta tracks the error cumulatively and folds it into the next reading. The same error-feedback idea is standard in sigma-delta ADCs.
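
The "fold the error into the next value" idea is a first-order error-feedback loop. The sketch below applies it to quantizing a run of weights onto coarse levels; the step size and weight values are made up, and how SIDRA actually implements the loop is not stated in this chapter.

```python
def sigma_delta_quantize(targets, step):
    """Quantize each target to a multiple of `step`, carrying the residual
    error into the next value (first-order error feedback)."""
    carried = 0.0
    outputs = []
    for t in targets:
        wanted = t + carried                 # target plus accumulated error
        q = round(wanted / step) * step      # nearest available level
        carried = wanted - q                 # error left for the next cell
        outputs.append(q)
    return outputs


targets = [0.30] * 10          # ten identical weights; levels are 0.25 apart
plain = [round(t / 0.25) * 0.25 for t in targets]
shaped = sigma_delta_quantize(targets, 0.25)
plain_err = abs(sum(plain) - sum(targets))     # errors add up: 10 * 0.05
shaped_err = abs(sum(shaped) - sum(targets))   # bounded by step / 2
```

Plain rounding makes the same 0.05 error ten times, so the sum is off by 0.5. With feedback, individual cells are still off, but the accumulated error never exceeds half a step, which is what matters when the cell currents are summed along a column.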

Reed-Solomon (block ECC):

Tolerates multi-byte errors (e.g., a whole row corrupt). Y1 uses cluster-level Reed-Solomon. Overhead 10-20%.

Practical Y1 total ECC overhead:

  • Cell redundancy (critical): 5%
  • Byte Hamming: 50% (critical bytes), 0% (non-critical)
  • Cluster Reed-Solomon: 15%
  • Practical total: ~30% overhead.

419M cells × 0.7 ≈ 290M effective weights. Still big (290 MB at one byte per weight); the 124M-parameter GPT-2 fits.

MTBF (Mean Time Between Failures):

Y1 production yield 75% → 25% chips scrapped.

Operating MTBF: cell-failure rate ~0.1%/year → 419M cells × 0.001 ≈ 0.42M new failures/year. The cell-failure map gets a periodic update.

After 10 years: ~4.2M new failures, roughly doubling the initial fault count. ECC handles it. A 10-year lifetime is realistic.

Experiment: Hamming Code Step-by-Step

8-bit data: 10110011 (binary).

(12, 8) Hamming: add 4 parity bits.

Positions (1-12):

  • p1, p2, d1, p3, d2, d3, d4, p4, d5, d6, d7, d8.

Data d1-d8 = 1, 0, 1, 1, 0, 0, 1, 1.

Parity compute:

p1 = XOR(d1, d2, d4, d5, d7) = 1 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ 1 = 1
p2 = XOR(d1, d3, d4, d6, d7) = 1 ⊕ 1 ⊕ 1 ⊕ 0 ⊕ 1 = 0
p3 = XOR(d2, d3, d4, d8) = 0 ⊕ 1 ⊕ 1 ⊕ 1 = 1
p4 = XOR(d5, d6, d7, d8) = 0 ⊕ 0 ⊕ 1 ⊕ 1 = 0

Final 12-bit: p1 p2 d1 p3 d2 d3 d4 p4 d5 d6 d7 d8 = 1 0 1 1 0 1 1 0 0 0 1 1.

Storage: 12 cells.

Error simulation: d3 (position 6) flipped.

Read: 1 0 1 1 0 0 1 0 0 0 1 1 (d3 changed).

Decoder compute:

Recompute parity, compare with received.

p1’ = 1 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ 1 = 1 ✓
p2’ = 1 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ 1 = 1 ✗ (received 0)
p3’ = 0 ⊕ 0 ⊕ 1 ⊕ 1 = 0 ✗ (received 1)
p4’ = 0 ⊕ 0 ⊕ 1 ⊕ 1 = 0 ✓

Syndrome (p4 p3 p2 p1, with 1 for each failed check): 0110 = 6 → position 6 is in error. Flip bit 6: 0 ⊕ 1 = 1.

Result: original d3 = 1 recovered.

Win: a 1-bit error was corrected automatically. Memristor drift over years flipping a cell → ECC catches it.
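
The worked example can be reproduced in code. This is a minimal sketch of a (12, 8) Hamming encoder and corrector using the same bit positions as above; the function names are illustrative, and it handles single-bit errors only (no extra SECDED parity bit).

```python
PARITY_POS = (1, 2, 4, 8)
DATA_POS = (3, 5, 6, 7, 9, 10, 11, 12)


def hamming_encode(data: list[int]) -> list[int]:
    """8 data bits d1..d8 -> 12-bit codeword, positions 1..12."""
    word = [0] * 13                         # index 0 unused, 1-based positions
    for pos, bit in zip(DATA_POS, data):
        word[pos] = bit
    for p in PARITY_POS:
        # Parity p covers every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 13) if i & p and i != p) % 2
    return word[1:]


def hamming_correct(code: list[int]) -> tuple[list[int], int]:
    """Return (data bits, syndrome); syndrome 0 means no error detected."""
    word = [0] + list(code)
    syndrome = 0
    for p in PARITY_POS:
        if sum(word[i] for i in range(1, 13) if i & p) % 2:
            syndrome += p                   # failed check adds its position
    if 1 <= syndrome <= 12:
        word[syndrome] ^= 1                 # flip the bit the syndrome names
    return [word[i] for i in DATA_POS], syndrome


data = [1, 0, 1, 1, 0, 0, 1, 1]
code = hamming_encode(data)
received = code.copy()
received[6 - 1] ^= 1                        # flip position 6 (d3)
fixed, syndrome = hamming_correct(received)
```

A syndrome of 13 to 15 names no valid position in a 12-bit word, so the sketch leaves such a word unchanged; that case signals a multi-bit error this code cannot repair.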

Quick Quiz

Question 1 of 6: What does a MUX do?

Lab Exercise

Y1 ECC budget analysis.

Y1:

  • 419M cells.
  • Manufacturing failure 1% → 4.2M faulty.
  • Drift +0.1%/year → 0.4M/year new.

ECC strategies:

  • Hamming (12,8): 50% overhead.
  • Triple redundancy: 200%.
  • Sigma-delta: 0% (sequential).

Questions:

(a) Apply Hamming to all cells: net effective data bytes?

(b) Triple redundancy only for “critical” 20% of weights (references, biases): net?

(c) Hybrid (Hamming for critical + averaging for non-critical) targeting 30% overhead?

(d) After 10 years (4M extra failures): accuracy drop?

(e) Annual periodic refresh requirement?

Solutions

(a) 419M / 12 × 8 ≈ 279M data bits, i.e. 419M / 12 ≈ 35M data bytes at one bit per cell (50% overhead). Corrects 1-bit errors.

(b) Critical 20% = 84M weights × 3 = 252M cells. Non-critical 80% = 335M weights × 1 = 335M cells. Total 587M > 419M. Doesn’t fit. Drop critical to 5% → 21M × 3 + 398M × 1 = 461M. Still over. To fit at 20% critical, the weight count W must satisfy 1.4 × W ≤ 419M, so W ≈ 299M. In practice: minimal redundancy.

(c) Hybrid: 50% of cells under Hamming (50% overhead), 50% under averaging or sigma-delta (0% storage overhead). Net overhead = 25%, inside the 30% target. Raw 419M × 0.75 = 314M effective weights.

(d) 10 years → 4M failures / 314M total = 1.3% extra error rate. ECC + redundancy mask most → AI accuracy drop below 0.5%.

(e) Refresh: once a year, reprogram critical cells (per failure map). Total refresh: ~5% of cells (~21M) × 100 µs ≈ 2,100 s if done one cell at a time; spread in parallel across the ~6,400 crossbars (419M / 65,536 cells each), it takes well under 1 s.
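
The redundancy arithmetic in (b) and (c) can be checked with a few lines. The helper names are illustrative, and the hybrid line follows the solution's own accounting (half the cells at 50% overhead) rather than an independent model.

```python
CELLS = 419e6


def cells_needed(weights: float, critical_fraction: float) -> float:
    """Cells used when critical weights get 3 copies and the rest get 1."""
    return weights * (3 * critical_fraction + (1 - critical_fraction))


def max_weights(critical_fraction: float) -> float:
    """Largest weight count that fits in CELLS at this critical fraction."""
    return CELLS / (3 * critical_fraction + (1 - critical_fraction))


need_20 = cells_needed(CELLS, 0.20)     # ~587M cells: over budget
need_05 = cells_needed(CELLS, 0.05)     # ~461M cells: still over budget
fit_20 = max_weights(0.20)              # weights that do fit at 20% critical
hybrid = CELLS * (1 - 0.5 * 0.5)        # (c): half the cells at 50% overhead
```

Storing 419M weights with any redundancy can never fit in 419M cells, so the useful question is the one `max_weights` answers: how many weights fit for a given critical fraction.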

Cheat Sheet

  • MUX: N inputs → 1 output (address selection).
  • Decoder: address → one-hot output.
  • Y1 hierarchy: Cluster MUX (25:1), CU MUX (16:1), crossbar decoder (256×256).
  • ECC: bit-level (Hamming, Reed-Solomon), analog-level (redundancy, averaging, sigma-delta).
  • Y1 production failure: ~1%, tolerated via redundant rows/cols + ECC.
  • Boot test: 100 ms, builds failure map.
  • Total ECC overhead: ~30%. Net: 290M weights.

Vision: Fault-Tolerant AI Hardware

Roadmap targets (Y3 to Y1000):

  • Y3: smart ECC (compiler-aware), overhead 20%.
  • Y10: model-aware redundancy (critical layers protected), overhead 15%.
  • Y100: Self-healing crossbar — cells degrade → auto-refresh + reroute. Overhead 5%.
  • Y1000: Bio-compatible self-repair (organic synapses).

For Türkiye: fault-tolerant hardware design is critical for space/defense. ASELSAN, TUSAŞ collaboration → SIDRA-based satellite/defense AI products.

Further Reading

  • Next chapter: 5.9 — Compute Engine and DMA
  • Previous: 5.7 — TIA: Transimpedance Sensing
  • ECC classic: Lin & Costello, Error Control Coding, 2nd ed.
  • Analog ECC: Akarvardar et al., Analog circuit techniques for error-tolerant memory systems, JSSC 2021.
  • Memristor reliability: Govoreanu et al., RRAM endurance and retention, IEEE EDM 2017.