DAC — SAR + ISPP
Number to voltage to cell — the start of SIDRA's programming chain.
What you'll learn here
- State the DAC's core role and why 8-bit suffices in SIDRA
- Explain the SAR (Successive Approximation Register) DAC architecture
- Write the ISPP algorithm step-by-step as pseudocode
- Compute the DAC's energy and area budget for Y1
- Distinguish programming vs read DACs
Hook: Digital Intent, Analog Voltage
The SIDRA crossbar lives in the analog world: Ohm’s law, Kirchhoff’s laws, voltages and currents. Software is digital: bits, bytes, integers. The bridge is the DAC (Digital-to-Analog Converter).
Y1 has two DAC roles:
- Read DAC: during inference, applies an input voltage to each row (8-bit input → 0-0.5 V).
- Programming DAC: writes each cell to one of 256 levels via an ISPP voltage (8-bit weight → SET pulse).
A wrong DAC output corrupts the whole MVM. This chapter details Y1’s DAC design and the ISPP (Incremental Step Pulse Programming) algorithm.
Intuition: Convert Digital to Voltage
A DAC’s only job: 8-bit number (0-255) → analog voltage (0-V_max).
Naive: divided reference:
For 256 distinct voltage levels:
- A reference voltage (V_ref = 0.5 V) crosses a resistor string.
- 256 taps (0, V_ref/256, 2V_ref/256, …, 255V_ref/256).
- A multiplexer picks the right tap for the 8-bit input.
Numbers:
- Step: V_ref / 256 = 1.95 mV.
- Precision: ±LSB/2 = ±0.98 mV.
- 8 bits give 256 levels, which matches the 256-level SIDRA cell quantization.
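The numbers above can be checked with a minimal sketch of an ideal resistor-string DAC. The function name `string_dac` is hypothetical, and the model ignores resistor mismatch and MUX loading.

```python
V_REF = 0.5   # volts, reference voltage from the text
N_BITS = 8

def string_dac(code: int, v_ref: float = V_REF, n_bits: int = N_BITS) -> float:
    """Ideal resistor-string DAC: the MUX selects tap `code` out of 2**n_bits."""
    levels = 2 ** n_bits
    if not 0 <= code < levels:
        raise ValueError(f"code must be in 0..{levels - 1}")
    return code * v_ref / levels

lsb = V_REF / 2 ** N_BITS
print(f"LSB      = {lsb * 1e3:.3f} mV")
print(f"half LSB = {lsb / 2 * 1e3:.3f} mV")
print(f"code 255 = {string_dac(255) * 1e3:.2f} mV")
```

Note that the top code lands one LSB below V_ref, which is why the highest tap is 255·V_ref/256.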
Real designs — more elaborate:
256 resistors + 256:1 MUX → too much area. Practical: capacitor-based SAR DAC or R-2R ladder:
- R-2R ladder: only 2 resistor values (R, 2R), 16 resistors for 8 bits. Compact.
- Capacitor SAR: 256 unit capacitors, binary-weighted.
- Hybrid: 6-bit MSB + 2-bit LSB structures.
SIDRA Y1 uses R-2R ladder + buffer (area + power balance).
What is ISPP?
A single pulse won’t program a cell exactly to a target G (noise, variation). Iterative approach:
- Apply a low voltage pulse.
- Read the cell (measure G_actual).
- If error: increase pulse width or voltage; repeat.
- Stop when G_target ± tolerance is hit.
5-15 iterations → 1% precision for 256 levels.
Formalism: SAR DAC and the ISPP Algorithm
SAR DAC (Successive Approximation Register):
Originally an ADC technique, “inverted” → DAC. 8-bit SAR DAC:
input: 8-bit number n (0-255)
V_ref = 0.5 V
output: V_out = (n / 256) × V_ref
R-2R ladder structure:
V_ref
|
R────R────R────R───... (8 R, MSB → LSB)
| | | |
2R 2R 2R 2R
| | | |
b_7 b_6 b_5 ... b_0
↓ ↓ ↓ ↓
←── output (V_out)
Each bit b_i, when on, contributes V_ref/2^(8-i): b_7 (MSB) adds V_ref/2 and b_0 (LSB) adds V_ref/256.
V_out = V_ref × (b_7/2 + b_6/4 + b_5/8 + … + b_0/256).
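As a sanity check on the binary-weighted sum, here is a small sketch of an ideal R-2R output. `r2r_vout` is a hypothetical helper; it assumes perfectly matched resistors and an ideal output buffer.

```python
def r2r_vout(code: int, v_ref: float = 0.5, n_bits: int = 8) -> float:
    """Ideal R-2R output: bit b_i contributes v_ref / 2**(n_bits - i)."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    v_out = 0.0
    for i in range(n_bits):
        bit = (code >> i) & 1              # extract b_i (b_0 is the LSB)
        v_out += bit * v_ref / 2 ** (n_bits - i)
    return v_out

for code in (0, 1, 128, 255):
    print(f"n = {code:3d} -> V_out = {r2r_vout(code) * 1e3:.3f} mV")
```

Summing the per-bit contributions gives the same value as (n / 256) × V_ref for every code, so the two formulas in this section agree.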
Practical energy:
- Unit resistor: ~10 kΩ.
- DAC current: V_ref / R = 50 µA.
- DAC energy/conversion: 50 µA × 0.5 V × 10 ns = 0.25 pJ.
Y1: 256 DACs × 0.25 pJ = 64 pJ per CU MVM. Across Y1, 1.6M DACs × 50M conversions/s × 0.25 pJ ≈ 20 W, far above the 3 W TDP.
Fix: run DACs only on active rows (sparsity). At 30% activity the total falls to 6 W, which is still too high. A simpler DAC (4-bit + dither) costs about 0.05 pJ per conversion. Y1 takes a hybrid approach.
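The energy figures can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers quoted in this section; the variable names are illustrative, and the model treats each conversion as a constant current draw for 10 ns.

```python
V_REF = 0.5        # V
R_UNIT = 10e3      # ohm, unit resistor
T_CONV = 10e-9     # s per conversion
DACS_PER_CU = 256
N_DACS = 1.6e6     # read DACs on Y1
CONV_RATE = 50e6   # conversions per second per DAC

i_dac = V_REF / R_UNIT               # ladder current
e_conv = i_dac * V_REF * T_CONV      # energy per conversion
e_cu = DACS_PER_CU * e_conv          # energy per CU MVM
p_total = N_DACS * CONV_RATE * e_conv
p_sparse = 0.30 * p_total            # only 30% of rows active

print(f"I_dac   = {i_dac * 1e6:.0f} uA")
print(f"E_conv  = {e_conv * 1e12:.2f} pJ")
print(f"E_cu    = {e_cu * 1e12:.0f} pJ")
print(f"P_total = {p_total:.1f} W, at 30% activity = {p_sparse:.1f} W")
```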
ISPP algorithm (pseudocode):
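def ispp(cell, target_G, tolerance=1e-6):
    """Program `cell` to `target_G` (siemens); return True on success.

    apply_pulse() and read_cell() are hardware-driver hooks defined elsewhere.
    """
    set_voltage = 1.0        # V, starting SET amplitude
    set_width = 10e-9        # s, starting SET width (10 ns)
    pulse_voltage, pulse_width = set_voltage, set_width
    max_iterations = 15
    for i in range(max_iterations):
        # Apply the pending pulse (SET, or partial RESET after an overshoot)
        apply_pulse(cell, voltage=pulse_voltage, width=pulse_width)
        # Read back the conductance and compare with the target
        G_actual = read_cell(cell)
        error = target_G - G_actual
        if abs(error) <= tolerance:
            return True
        if error > 0:        # undershoot: need more SET
            set_voltage += 0.05      # small increment
            set_width *= 1.2         # +20%
            pulse_voltage, pulse_width = set_voltage, set_width
        else:                # overshoot: partial RESET
            pulse_voltage = -0.5     # negative (RESET)
            pulse_width = 5e-9       # short
    return False
Typical iterations: 5-15. Each takes ~50 ns (pulse + read). Total: 250-750 ns per cell.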
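To try the pulse, read, adjust loop without hardware, the sketch below runs it against a toy cell. Everything about `ToyCell` is an assumption made for illustration (a saturating SET response and a fixed RESET decrement, with made-up constants); it is not SIDRA device data. `ispp_sim` is a hypothetical name for a self-contained version of the same loop.

```python
from dataclasses import dataclass

@dataclass
class ToyCell:
    """Hypothetical memristor model; constants are illustrative only."""
    G: float = 5e-6          # siemens, starts in HRS
    G_MAX: float = 100e-6    # siemens, saturation level
    K_SET: float = 24e-6     # siemens per (1 V x 10 ns) SET pulse
    K_RESET: float = 8e-6    # siemens per (1 V x 5 ns) RESET pulse

    def apply_pulse(self, voltage: float, width: float) -> None:
        if voltage > 0:   # SET: gain shrinks as the cell approaches G_MAX
            self.G += self.K_SET * voltage * (width / 10e-9) * (1 - self.G / self.G_MAX)
        else:             # partial RESET: small fixed decrement
            self.G -= self.K_RESET * abs(voltage) * (width / 5e-9)

    def read(self) -> float:
        return self.G

def ispp_sim(cell, target_G, tolerance=1e-6, max_iterations=15):
    """Return (converged, pulses_used) for the pulse/read/adjust loop."""
    set_v, set_w = 1.0, 10e-9
    pulse = (set_v, set_w)
    for i in range(1, max_iterations + 1):
        cell.apply_pulse(*pulse)
        error = target_G - cell.read()
        if abs(error) <= tolerance:
            return True, i
        if error > 0:               # undershoot: stronger, longer SET
            set_v += 0.05
            set_w *= 1.2
            pulse = (set_v, set_w)
        else:                       # overshoot: short partial RESET
            pulse = (-0.5, 5e-9)
    return False, max_iterations

for target in (50e-6, 45e-6):
    cell = ToyCell()
    ok, n = ispp_sim(cell, target)
    print(f"target {target * 1e6:.0f} uS: ok={ok}, pulses={n}, G={cell.G * 1e6:.2f} uS")
```

With these made-up constants the 50 µS target converges after two SET pulses, while the 45 µS target overshoots on the second pulse and needs one corrective RESET. Real cells are noisier, which is why the text quotes 5-15 iterations.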
Crossbar parallel programming:
Can 256 cells in the same row be programmed in parallel? Not with the WL pulse alone: the same WL voltage reaches every cell in the row, but each cell has its own G_target and may need a different pulse width or pulse count.
Strategy:
- WL pulse: parallel (256 cells, same voltage).
- BL control: per-cell (each column has its own transistor gate).
- Result: 256 cells get the pulse simultaneously, but BL transistors choose which actually receive it.
Programming order:
- From the lowest G_target to the highest.
- Each cycle: pulse → read all 256 → check errors → repeat.
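The shared-WL, gated-BL scheme can be sketched as a loop over program cycles. This is a simplified illustration, not the Y1 controller: it assumes every received pulse adds a fixed conductance step (a made-up 1.6 µS), so cells only move upward and no RESET is needed. `program_row` and its constants are hypothetical.

```python
G0 = 5e-6       # siemens, all cells start in HRS (assumed)
STEP = 1.6e-6   # siemens gained per received pulse (toy value, below 2 x TOL)
TOL = 1e-6      # siemens, programming tolerance

def program_row(targets, g0=G0, step=STEP, tol=TOL, max_cycles=200):
    """Program one row: shared WL pulse, per-column BL enable mask."""
    g = [g0] * len(targets)
    for cycle in range(max_cycles):
        # Read all columns, then enable only the BLs still below target.
        enable = [gi < t - tol for gi, t in zip(g, targets)]
        if not any(enable):
            return g, cycle          # cycle == number of WL pulses issued
        # One WL pulse reaches every column; only enabled BLs receive it.
        g = [gi + step if en else gi for gi, en in zip(g, enable)]
    raise RuntimeError("row did not converge")

targets = [12e-6, 25e-6, 50e-6, 80e-6]
g, cycles = program_row(targets)
for t, gi in zip(targets, g):
    print(f"target {t * 1e6:5.1f} uS -> programmed {gi * 1e6:5.1f} uS")
print(f"WL pulses issued: {cycles}")
```

Cells with low targets drop out of the enable mask first, which matches the lowest-to-highest programming order, and the row finishes when the highest-target cell is done.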
Time: 256 cells × ~5 iterations × 50 ns = 64 µs/row. All 256 rows × 64 µs ≈ 16 ms/crossbar. Programmed one at a time, Y1's 6400 crossbars would need 6400 × 16 ms ≈ 100 s; with clusters programming in parallel this drops to ~5-10 s. Done once (model load).
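The timing estimate is plain multiplication, sketched here with the section's own figures. The 16-way parallelism is taken from the 16 clusters mentioned later in this chapter, and the sketch assumes an even split of crossbars across clusters.

```python
CELLS_PER_ROW = 256
ROWS = 256
ITERATIONS = 5       # average ISPP iterations per cell
T_ITER = 50e-9       # s per iteration (pulse + read)
N_CROSSBARS = 6400
N_CLUSTERS = 16      # clusters programming in parallel (assumed even split)

t_row = CELLS_PER_ROW * ITERATIONS * T_ITER
t_crossbar = ROWS * t_row
t_serial = N_CROSSBARS * t_crossbar
t_parallel = t_serial / N_CLUSTERS

print(f"per row      : {t_row * 1e6:.0f} us")
print(f"per crossbar : {t_crossbar * 1e3:.1f} ms")
print(f"Y1 serial    : {t_serial:.0f} s")
print(f"Y1, {N_CLUSTERS} clusters in parallel: {t_parallel:.1f} s")
```

The parallel figure comes out near 6.6 s, inside the ~5-10 s range quoted above.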
DAC precision and linearity:
Two error types at the DAC output:
- Differential nonlinearity (DNL): deviation of each step between adjacent levels from the ideal 1 LSB. Target: within ±0.5 LSB.
- Integral nonlinearity (INL): deviation of each level from the ideal straight line. Target: within ±1 LSB.
SIDRA Y1 DAC typical:
- DNL: ±0.5 LSB (good).
- INL: ±1.5 LSB (moderate).
If INL grows, output levels drift away from the ideal straight line, inputs are encoded with a systematic error, and MVM accuracy drops.
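For reference, DNL and INL can be computed from a list of measured output levels as below. This is a generic sketch using the common definitions (step error relative to 1 LSB, and deviation from an ideal line anchored at the first level); the sample data is invented and is not Y1 measurement data.

```python
def dnl_inl(levels, lsb):
    """Return (dnl, inl) in LSB units from measured DAC output levels."""
    # DNL[k]: how far the step from level k to k+1 is from one ideal LSB.
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0 for k in range(len(levels) - 1)]
    # INL[k]: how far level k sits from the ideal line through levels[0].
    inl = [(levels[k] - (levels[0] + k * lsb)) / lsb for k in range(len(levels))]
    return dnl, inl

# Invented 2-bit example, expressed directly in LSB units.
measured = [0.0, 1.0, 2.5, 3.0]
dnl, inl = dnl_inl(measured, lsb=1.0)
print("DNL:", dnl)
print("INL:", inl)
print(f"worst DNL = {max(abs(d) for d in dnl)} LSB, worst INL = {max(abs(x) for x in inl)} LSB")
```

In this example one step is 1.5 LSB wide and the next only 0.5 LSB, so the DNL is +0.5 and then -0.5 LSB, and the INL peaks at 0.5 LSB on the third level.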
Thermal effects:
The DAC reference voltage V_ref is temperature-sensitive: roughly 100 ppm of drift per 1°C. Across -25°C to 85°C (a 110°C span) → 110 × 100 ppm = 1.1% drift.
Fix: bandgap voltage reference (V_BG) — very stable vs T (~10 ppm/°C). Y1 has 4 bandgap references per cluster.
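To see what the drift means at 8-bit resolution, the sketch below converts it to LSBs at full scale. It assumes the drift scales the whole output range linearly; `full_scale_drift` is a hypothetical helper.

```python
T_MIN, T_MAX = -25.0, 85.0   # degrees C, operating range from the text
N_LEVELS = 256               # 8-bit DAC

def full_scale_drift(ppm_per_degc, t_min=T_MIN, t_max=T_MAX):
    """Fractional V_ref drift across the temperature range."""
    return (t_max - t_min) * ppm_per_degc * 1e-6

for name, ppm in (("uncompensated", 100.0), ("bandgap", 10.0)):
    d = full_scale_drift(ppm)
    print(f"{name:13s}: {d * 100:.2f}% of V_ref = {d * N_LEVELS:.2f} LSB at full scale")
```

Under this assumption the uncompensated reference moves the top code by almost 3 LSB, while the bandgap reference keeps it well under half an LSB.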
Programming vs read DACs:
| Property | Read DAC | Program DAC |
|---|---|---|
| Voltage range | 0-0.5 V | -2 V to +2 V |
| Precision | 8-bit | 8-bit (with iterative ISPP) |
| Speed | 10 ns | 100 ns/pulse |
| Count (Y1) | 1.6M (256/CU) | Fewer (shared) |
| Energy | 0.25 pJ/conv | 5 pJ/pulse |
Programming DACs are fewer because they’re only used at program time and can be shared within each cluster.
ISPP at 1% precision:
Cells respond differently → calibration first. Before ISPP starts:
- Apply a test pulse → measure R_initial.
- From R_initial, estimate the cell's pulse-to-conductance (Pulse-G) curve.
- Adjust the ISPP target to that curve.
Result: cell-to-cell variance 5% → after ISPP, 1%.
Lifetime impact:
Each ISPP cycle adds wear-out. 15 iterations = 15 SETs. Endurance 10⁶ → ~67K reprogram cycles per cell post-ISPP. Plenty for inference.
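The reprogramming budget follows directly from the endurance figure. A quick sketch, assuming every ISPP iteration costs one endurance cycle and ignoring any extra wear from corrective RESET pulses:

```python
ENDURANCE = 1_000_000   # write cycles per cell, from the text

for pulses_per_program in (5, 15):   # best and worst case ISPP iteration counts
    reprograms = ENDURANCE // pulses_per_program
    print(f"{pulses_per_program:2d} pulses per program -> {reprograms:,} reprogram cycles")
```

The 15-pulse worst case gives the ~67K figure above; a cell that converges in 5 pulses lasts about three times longer.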
Modern alternatives:
- Closed-loop SET: measure in-situ during the pulse. Faster, but more complex.
- Multi-shot programming: several small pulses in parallel.
- Stochastic programming: random pulses + filtering.
SIDRA Y1 uses classical ISPP. Y10+ targets closed-loop.
Experiment: 50 µS Target with ISPP
Program a cell to G_target = 50 µS. Tolerance ±1 µS.
Iteration 0: G_initial = 5 µS (HRS).
Iteration 1: Pulse 1.0 V, 10 ns. → G_1 = 28 µS. Error (target − actual): +22 µS.
Iteration 2: Pulse 1.05 V, 12 ns. → G_2 = 41 µS. Error: +9 µS.
Iteration 3: Pulse 1.1 V, 14.4 ns. → G_3 = 49 µS. Error: +1 µS, within tolerance. Done!
3 iterations × 50 ns = 150 ns total.
If iteration 3 had overshot instead (G_3 = 55 µS):
Iteration 4: Negative pulse -0.5 V, 5 ns. → G_4 = 49.5 µS. Done.
Practical total: 3-7 iterations, ~150-350 ns.
Full crossbar (65K cells):
- ~5 iterations average per cell.
- The 256 cells in a row share each WL pulse, but every cell has its own target, so BL control is per cell and the estimate counts cells sequentially.
- 256 cells × 5 iterations × 50 ns = 64 µs/row; × 256 rows ≈ 16 ms/crossbar.
- Y1's 16 clusters program in parallel → ~1 ms per crossbar on average.
- Total Y1 model load: 6400 crossbars × ~1 ms ≈ 6.4 s.
Done once (after load, non-volatile, persistent).
Lab Exercise
Optimize Y1 DAC energy budget.
Y1 DACs:
- 1.6M DACs (256 read DACs per CU).
- Each: 0.25 pJ/conversion.
- Conversion rate: 50M MVM/s.
Total DAC energy: 1.6 × 10⁶ × 50 × 10⁶ × 0.25 × 10⁻¹² = 20 W.
Above the 3 W TDP. Problem.
Questions:
(a) DAC energy at 30% activity?
(b) Drop DAC bit depth from 8 → 4 bit + dither?
(c) Halve the DAC count (sharing)? Performance impact?
(d) Y10 target: TDC bypasses the DAC. Total saving?
(e) Y1 final practical DAC energy estimate?
Solutions
(a) 20 W × 0.30 = 6 W. Still 2× TDP.
(b) 4-bit DAC: 16 levels, 0.05 pJ/conv (5× less than 0.25 pJ). 16 levels suffice for INT4 inference (modern AI). Total: 6 W × 0.20 = 1.2 W. Practical.
(c) DAC sharing (one DAC per 2 rows). Half count = 800K DACs. Energy: 0.6 W. But: parallel reads no longer possible → MVM time 2× longer (15 → 30 ns). Throughput halved.
(d) TDC alternative (chapter 5.6): time-domain readout → bypasses ADC and simplifies DACs. Estimate: 50% energy savings. Y10 DAC contribution ~0.3 W.
(e) Y1 final estimate: 4-bit dither + sparsity + sharing → ~0.8 W DAC share. Within TDP. Y10 ~0.2 W (TDC + 7 nm).
Lesson: ADC/DAC area + power dominates analog AI chips. Y1’s design works around these limits; Y10+ uses TDC to leap past them.
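The reduction chain in solutions (a) to (c) can be replayed as successive scale factors. This is a sketch of the idealized arithmetic only; the ~0.8 W figure in (e) is the chapter's practical estimate and is not derived here.

```python
P_RAW = 20.0   # W, raw read-DAC power from the exercise
TDP = 3.0      # W

steps = [
    ("(a) 30% row activity", 0.30),
    ("(b) 4-bit + dither, 0.05 pJ vs 0.25 pJ", 0.05 / 0.25),
    ("(c) one DAC shared by 2 rows", 0.5),
]

power = P_RAW
history = []
for label, factor in steps:
    power *= factor                      # apply each saving on top of the last
    history.append(power)
    verdict = "within" if power <= TDP else "above"
    print(f"{label:40s} -> {power:5.2f} W ({verdict} TDP)")
```

Sparsity alone is not enough: the budget first drops under the 3 W TDP at step (b), and step (c) buys a further halving at the cost of MVM throughput.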
Cheat Sheet
- DAC: digital number → analog voltage. Crossbar input.
- Y1 DAC: 8-bit, R-2R ladder, 0-0.5 V, 0.25 pJ/conversion.
- ISPP: iterative programming. Pulse → read → adjust → repeat. 5-15 iterations, 1% precision.
- Programming vs read DAC: voltage range, frequency, count differ.
- Y1 model load: ~6.4 s (6400 crossbars at ~16 ms each, 16 clusters in parallel). Once, then non-volatile.
- Energy budget: raw DAC alone 20 W (worst case) → 0.8 W with sparsity + dither.
Vision: Beyond DAC — Sigma-Delta and Direct Spike
Conventional SAR DACs take up a large share of analog AI chip area. New architectures:
- Y1 (today): R-2R ladder DAC, 8-bit. Standard approach.
- Y3 (2027): Sigma-Delta DAC — noise shaping for effective 12-bit (smaller area).
- Y10 (2029): TDC (Time-to-Digital Converter) read → DAC bypass. Voltage encoded as time.
- Y100 (2031+): Direct spike-encoding. Input = spike train → no DAC. Bio-compatible.
- Y1000 (long horizon): Optical DAC — intensity encoding in photonic waveguides.
Meaning for Türkiye: mixed-signal circuit design (DAC, ADC) is the “mature” side of semiconductors. Türkiye’s TÜBİTAK BİLGEM, ASELSAN, etc. are strong here. SIDRA channels that capability into neuromorphic AI.
Unexpected: a fully DAC-free architecture. With spike-based encoding, the DAC vanishes entirely. SIDRA Y100 steps in that direction: not clock-driven digital compute, but event-driven analog compute.
Further Reading
- Next chapter: 5.6 — TDC: Time-Domain Readout
- Previous: 5.4 — YILDIRIM Chip Architecture
- DAC design classic: Razavi, Design of Analog CMOS Integrated Circuits, Ch. 12.
- SAR architecture: Wikipedia “Successive Approximation ADC”.
- ISPP for memristors: Kim et al., Programming algorithms for multilevel-cell phase-change memory, ASP-DAC 2014.
- Sigma-Delta DAC: Schreier & Temes, Understanding Delta-Sigma Data Converters, 2nd ed.