🔌 Module 5 · Chip Hardware · Chapter 5.5 · 12 min read

DAC — SAR + ISPP

Number to voltage to cell — the start of SIDRA's programming chain.

What you'll learn here

  • State the DAC's core role and why 8-bit suffices in SIDRA
  • Explain the SAR (Successive Approximation Register) DAC architecture
  • Write the ISPP algorithm step-by-step as pseudocode
  • Compute the DAC's energy and area budget for Y1
  • Distinguish programming vs read DACs

Hook: Digital Intent, Analog Voltage

The SIDRA crossbar lives in the analog world: Ohm’s law, Kirchhoff, voltages and currents. Software is digital: bits, bytes, integers. The bridge is the DAC (Digital-to-Analog Converter).

Y1 has two DAC roles:

  1. Read DAC: during inference, applies an input voltage to each row (8-bit input → 0-0.5 V).
  2. Programming DAC: writes each cell to one of 256 levels via an ISPP voltage (8-bit weight → SET pulse).

A wrong DAC output corrupts the whole MVM. This chapter details Y1’s DAC design and the ISPP (Incremental Step Pulse Programming) algorithm.

Intuition: Convert Digital to Voltage

A DAC’s only job: 8-bit number (0-255) → analog voltage (0-V_max).

Naive: divided reference:

For 256 distinct voltage levels:

  • A reference voltage (V_ref = 0.5 V) crosses a resistor string.
  • 256 taps (0, V_ref/256, 2V_ref/256, …, 255V_ref/256).
  • A multiplexer picks the right tap for the 8-bit input.

Numbers:

  • Step: V_ref / 256 = 1.95 mV.
  • Precision: ±LSB/2 = ±0.98 mV.
  • 8 bits give 256 levels, which matches the SIDRA cell quantization.
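
The step and precision figures above follow directly from V_ref and the bit depth. A minimal sketch, assuming an ideal resistor-string DAC with taps at k·V_ref/256; the names `lsb` and `dac_ideal` are illustrative and not part of any SIDRA API:

```python
V_REF = 0.5   # V, reference voltage from the text
N_BITS = 8    # DAC resolution

def lsb(v_ref=V_REF, n_bits=N_BITS):
    """Voltage step between adjacent codes (1 LSB)."""
    return v_ref / (1 << n_bits)

def dac_ideal(code, v_ref=V_REF, n_bits=N_BITS):
    """Ideal tap voltage for an integer code in [0, 2**n_bits - 1]."""
    if not 0 <= code < (1 << n_bits):
        raise ValueError("code out of range")
    return code * lsb(v_ref, n_bits)

print(f"LSB        = {lsb() * 1e3:.2f} mV")            # 1.95 mV
print(f"half LSB   = {lsb() / 2 * 1e3:.2f} mV")        # 0.98 mV
print(f"full scale = {dac_ideal(255) * 1e3:.2f} mV")   # 498.05 mV
```

Note that the highest code lands one LSB below V_ref, which is why the top tap is 255·V_ref/256 rather than V_ref itself.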

Real designs — more elaborate:

A string of 256 resistors plus a 256:1 MUX costs too much area. Practical designs use a capacitor-based SAR DAC or an R-2R ladder:

  • R-2R ladder: only 2 resistor values (R, 2R), 16 resistors for 8 bits. Compact.
  • Capacitor SAR: 256 unit capacitors, binary-weighted.
  • Hybrid: 6-bit MSB + 2-bit LSB structures.

SIDRA Y1 uses R-2R ladder + buffer (area + power balance).

What is ISPP?

A single pulse won’t program a cell exactly to a target G (noise, variation). Iterative approach:

  1. Apply a low voltage pulse.
  2. Read the cell (measure G_actual).
  3. If error: increase pulse width or voltage; repeat.
  4. Stop when G_target ± tolerance is hit.

5-15 iterations → 1% precision for 256 levels.

Formalism: SAR DAC and the ISPP Algorithm

L1 · Beginner

SAR DAC (Successive Approximation Register):

Successive approximation is originally an ADC technique; the binary-weighted DAC inside the SAR loop is the part reused here. For an 8-bit DAC:

input: 8-bit number n (0-255)
V_ref = 0.5 V

output: V_out = (n / 256) × V_ref

R-2R ladder structure:

             V_ref
              |
   R────R────R────R───...   (8 R, MSB → LSB)
   |    |    |    |
   2R   2R   2R   2R
   |    |    |    |
   b_7  b_6  b_5  ...  b_0
   ↓    ↓    ↓    ↓
   ←── output (V_out)

Each bit b_i, when on, contributes V_ref/2^(8-i): b_7 adds V_ref/2 and b_0 adds V_ref/256.

V_out = V_ref × (b_7/2 + b_6/4 + b_5/8 + … + b_0/256).
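
The bit-weighted sum can be checked against the closed form V_out = (n / 256) × V_ref. The following is a sketch of an ideal ladder only (no resistor mismatch, no buffer offset); `r2r_output` is a hypothetical helper name:

```python
V_REF = 0.5  # V

def r2r_output(code, v_ref=V_REF, n_bits=8):
    """Sum the per-bit contributions of an ideal R-2R ladder."""
    v_out = 0.0
    for i in range(n_bits):
        bit = (code >> i) & 1
        # Bit b_i weighs V_ref / 2^(n_bits - i): b_7 -> 1/2, b_0 -> 1/256.
        v_out += bit * v_ref / (1 << (n_bits - i))
    return v_out

# The bitwise sum agrees with (n / 256) * V_ref for every 8-bit code.
assert all(abs(r2r_output(n) - n / 256 * V_REF) < 1e-12 for n in range(256))

print(r2r_output(0b10000000))  # MSB only -> 0.25
print(r2r_output(0b00000001))  # LSB only -> 0.001953125
```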

Practical energy:

  • Unit resistor: ~10 kΩ.
  • DAC current: V_ref / R = 50 µA.
  • DAC energy/conversion: 50 µA × 0.5 V × 10 ns = 0.25 pJ.

Y1: 256 DACs × 0.25 pJ = 64 pJ per CU MVM. Chip-wide, 1.6M DACs × 50M conv/s × 0.25 pJ ≈ 20 W, far above the 3 W TDP.

Fix: run DACs only on active rows (sparsity). At 30% activity that is 6 W, still too high. A simpler DAC (4-bit + dither) costs about 0.05 pJ per conversion. Y1 takes a hybrid approach.
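
The energy figures above can be reproduced in a few lines. This is a rough sketch that counts only the static ladder current (I = V_ref / R) over one conversion window; the function names and the `activity` parameter are assumptions made for illustration:

```python
def conversion_energy(v_ref=0.5, r_unit=10e3, t_conv=10e-9):
    """Static energy per conversion: I * V * t with I = V_ref / R."""
    current = v_ref / r_unit            # 50 uA for the values in the text
    return current * v_ref * t_conv     # joules

def dac_power(n_dacs, conv_rate, energy_per_conv, activity=1.0):
    """Average power when only a fraction `activity` of the DACs convert."""
    return n_dacs * conv_rate * energy_per_conv * activity

e_conv = conversion_energy()
print(f"{e_conv * 1e12:.2f} pJ per conversion")                     # 0.25
print(f"{256 * e_conv * 1e12:.0f} pJ per CU MVM")                   # 64
print(f"{dac_power(1.6e6, 50e6, e_conv):.1f} W, all DACs active")   # 20.0
print(f"{dac_power(1.6e6, 50e6, e_conv, activity=0.3):.1f} W at 30% activity")  # 6.0
```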

L2 · Full

ISPP algorithm (pseudocode):

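def ispp(cell, target_G, tolerance=1e-6):
    # apply_pulse() and read_cell() are the hardware access hooks,
    # assumed to be provided elsewhere. Conductances are in siemens.
    set_voltage = 1.0    # V, SET amplitude (start)
    set_width = 10e-9    # s, SET width (10 ns)
    voltage, width = set_voltage, set_width  # next pulse to apply
    max_iterations = 15

    for _ in range(max_iterations):
        # Apply the pulse chosen in the previous iteration
        apply_pulse(cell, voltage=voltage, width=width)

        # Read back and compare with the target
        G_actual = read_cell(cell)
        error = target_G - G_actual

        if abs(error) <= tolerance:
            return True  # success

        if error > 0:  # still below target: stronger SET
            set_voltage += 0.05  # small increment
            set_width *= 1.2     # +20%
            voltage, width = set_voltage, set_width
        else:  # overshoot: short partial RESET, SET ramp is kept
            voltage, width = -0.5, 5e-9

    return False  # failure: not converged within max_iterations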

Typical iterations: 5-15. Each ~50 ns (pulse + read). Total: 250 ns - 750 ns per cell.

Crossbar parallel programming:

Can 256 cells in the same row be programmed in parallel? Not naively: the same WL voltage hits every cell, yet each cell has its own G_target and may need a different pulse count or width.

Strategy:

  • WL pulse: parallel (256 cells, same voltage).
  • BL control: per-cell (each column has its own transistor gate).
  • Result: the pulse is offered to all 256 cells simultaneously, but the BL transistors choose which cells actually receive it.

Programming order:

  • From the lowest G_target to the highest.
  • Each cycle: pulse → read all 256 → check errors → repeat.
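
The row-level loop (shared WL pulse, per-column BL gating, verify read, repeat) can be sketched with a deliberately simple toy model. The assumption that every enabled cell gains a fixed conductance step per pulse is purely illustrative; real cells respond nonlinearly and with noise. `program_row` and its parameters are hypothetical names:

```python
def program_row(targets_uS, step_uS=2.0, g_init_uS=5.0, tol_uS=1.0, max_rounds=200):
    """Toy program-verify loop for one row (all values in microsiemens).

    Assumption: each enabled cell gains exactly `step_uS` per shared WL pulse.
    """
    g = [g_init_uS] * len(targets_uS)
    # BL enable mask: True means the column transistor passes the pulse.
    enabled = [g_i < t - tol_uS for g_i, t in zip(g, targets_uS)]
    rounds = 0
    while any(enabled) and rounds < max_rounds:
        # One shared WL pulse; only enabled columns actually receive it.
        g = [g_i + step_uS if en else g_i for g_i, en in zip(g, enabled)]
        # Verify read of the whole row, then inhibit columns that have arrived.
        enabled = [en and g_i < t - tol_uS
                   for g_i, en, t in zip(g, enabled, targets_uS)]
        rounds += 1
    return g, rounds

final_g, rounds = program_row([10.0, 20.0, 50.0])
print(final_g, rounds)  # [9.0, 19.0, 49.0] 22
```

The cell with the lowest target is inhibited first, and the number of rounds is set by the highest target in the row, which is why the ordering from lowest to highest G_target falls out naturally.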

Time: 256 cells × ~5 iterations × 50 ns = 64 µs/row. 256 rows × 64 µs ≈ 16 ms/crossbar. Programming Y1's 6400 crossbars one after another would take 6400 × 16 ms ≈ 100 s; with clusters programmed in parallel this drops to ~5-10 s. It is done once, at model load.
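
The timing chain above is plain multiplication, sketched here so the assumptions are easy to vary. The 16-way parallelism uses the cluster count quoted later in this chapter and assumes the clusters program fully independently; the constant names are illustrative:

```python
T_ITER = 50e-9          # s per pulse + read iteration
ITERATIONS = 5          # average iterations per cell
CELLS_PER_ROW = 256
ROWS = 256
CROSSBARS = 6400
PARALLEL_CLUSTERS = 16  # assumption: clusters program fully in parallel

t_row = CELLS_PER_ROW * ITERATIONS * T_ITER
t_crossbar = ROWS * t_row
t_serial = CROSSBARS * t_crossbar
t_parallel = t_serial / PARALLEL_CLUSTERS

print(f"row:      {t_row * 1e6:.0f} us")       # 64 us
print(f"crossbar: {t_crossbar * 1e3:.1f} ms")  # 16.4 ms
print(f"serial:   {t_serial:.0f} s")           # 105 s
print(f"16-way:   {t_parallel:.1f} s")         # 6.6 s
```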

L3 · Deep

DAC precision and linearity:

Two error types at the DAC output:

  1. Differential nonlinearity (DNL): deviation of each step between successive levels from the ideal 1 LSB. Target: within ±0.5 LSB.
  2. Integral nonlinearity (INL): deviation of each level from the ideal straight line. Target: within ±1 LSB.

SIDRA Y1 DAC typical:

  • DNL: ±0.5 LSB (good).
  • INL: ±1.5 LSB (moderate).

As INL grows, output levels drift further from their ideal values, the voltages applied to the rows are distorted, and MVM accuracy drops.
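
Both metrics are easy to compute from a list of measured output levels. A minimal sketch, assuming INL is taken against the ideal line through the first measured level with the nominal LSB (an endpoint-fit or best-fit line is also common); the measurement values are hypothetical:

```python
def dnl_inl(levels, lsb):
    """Return (DNL, INL) in LSB units for measured output voltages.

    DNL[k] compares each step to the ideal 1 LSB; INL[k] compares each
    level to the ideal line through the first measured level.
    """
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0
           for k in range(len(levels) - 1)]
    inl = [(levels[k] - levels[0]) / lsb - k
           for k in range(len(levels))]
    return dnl, inl

LSB = 0.5 / 256
# Hypothetical measurement: code 2 sits half an LSB too high.
measured = [0.0, 1.0 * LSB, 2.5 * LSB, 3.0 * LSB]
dnl, inl = dnl_inl(measured, LSB)
print("DNL:", dnl)  # [0.0, 0.5, -0.5]
print("INL:", inl)  # [0.0, 0.0, 0.5, 0.0]
```

A single misplaced level shows up as a positive and a negative DNL entry on either side of it, while INL flags the level itself.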

Thermal effects:

The DAC reference voltage V_ref is temperature-sensitive: about 100 ppm of drift per 1°C. Across -25°C to 85°C, that is 110°C × 100 ppm = 1.1% drift.

Fix: bandgap voltage reference (V_BG) — very stable vs T (~10 ppm/°C). Y1 has 4 bandgap references per cluster.
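
To see why the bandgap matters for an 8-bit DAC, the drift can be expressed in LSBs at full scale. A small sketch using the temperature coefficients quoted above; the helper names are illustrative:

```python
def drift_fraction(tempco_ppm_per_c, t_min_c, t_max_c):
    """Worst-case fractional reference drift across a temperature span."""
    return tempco_ppm_per_c * 1e-6 * (t_max_c - t_min_c)

def drift_in_lsb(fraction, n_bits=8):
    """Full-scale error, in LSBs, caused by a fractional reference drift."""
    return fraction * (1 << n_bits)

for name, tempco in [("uncompensated", 100.0), ("bandgap", 10.0)]:
    frac = drift_fraction(tempco, -25.0, 85.0)
    print(f"{name}: {frac * 100:.2f}% = {drift_in_lsb(frac):.2f} LSB at full scale")
```

With these inputs the uncompensated reference drifts by several LSBs at full scale, while the bandgap keeps the drift below half an LSB.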

Programming vs read DACs:

Property        Read DAC         Program DAC
Voltage range   0-0.5 V          -2 V to +2 V
Precision       8-bit            8-bit (with iterative ISPP)
Speed           10 ns            100 ns/pulse
Count (Y1)      1.6M (256/CU)    Fewer (shared)
Energy          0.25 pJ/conv     5 pJ/pulse

Programming DACs are fewer because they are used only at program time and can be shared. They are shared at cluster level.

ISPP at 1% precision:

Cells respond differently → calibration first. Before ISPP starts:

  • Apply a test pulse → measure R_initial.
  • Estimate R_initial → Pulse-G curve (per cell).
  • Adjust the ISPP target to that curve.

Result: cell-to-cell variance 5% → after ISPP, 1%.

Lifetime impact:

Each ISPP iteration adds wear. At 15 iterations per programming pass (15 SET pulses) and an endurance of 10⁶ cycles, a cell survives 10⁶ / 15 ≈ 67K reprogramming passes. Plenty for inference.
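
The endurance budget is a single division, shown here for the worst case and for a typical 5-iteration pass. It assumes every ISPP pulse counts as one full endurance cycle, which is a conservative simplification; `reprogram_budget` is a hypothetical helper:

```python
ENDURANCE_CYCLES = 10**6   # cycles a cell tolerates (figure from the text)

def reprogram_budget(pulses_per_program, endurance=ENDURANCE_CYCLES):
    """How many full reprogramming passes fit within the endurance limit."""
    return endurance // pulses_per_program

print(reprogram_budget(15))  # worst case, 15 pulses -> 66666
print(reprogram_budget(5))   # typical case, 5 pulses -> 200000
```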

Modern alternatives:

  • Closed-loop SET: measure in-situ during the pulse. Faster, but more complex.
  • Multi-shot programming: several small pulses in parallel.
  • Stochastic programming: random pulses + filtering.

SIDRA Y1 uses classical ISPP. Y10+ targets closed-loop.

Experiment: 50 µS Target with ISPP

Program a cell to G_target = 50 µS. Tolerance ±1 µS.

Iteration 0: G_initial = 5 µS (HRS).

Iteration 1: Pulse 1.0 V, 10 ns. → G_1 = 28 µS. Error (G_target - G_actual): +22 µS, still below target.

Iteration 2: Pulse 1.05 V, 12 ns. → G_2 = 41 µS. Error: +9 µS.

Iteration 3: Pulse 1.1 V, 14 ns. → G_3 = 49 µS. Error: +1 µS, within tolerance. Done!

3 iterations × 50 ns = 150 ns total.

If iteration 3 had overshot instead (G_3 = 55 µS):

Iteration 4: Negative pulse -0.5 V, 5 ns. → G_4 = 49.5 µS. Done.

Practical total: 3-7 iterations, ~150-350 ns.

Full crossbar (65K cells):

  • ~5 iterations average per cell.
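  • The WL pulse reaches all 256 cells of a row at once, but each cell has its own target, so BL control is effectively sequential.
  • 256 cells × 5 iterations × 50 ns = 64 µs/row; × 256 rows ≈ 16 ms/crossbar.
  • 6400 crossbars × 16 ms ≈ 100 s if programmed serially; 16 clusters in parallel → ~1 ms per crossbar on average.
  • Total Y1 model load: 6400 × ~1 ms ≈ 6.4 s.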

Done once (after load, non-volatile, persistent).

Lab Exercise

Optimize Y1 DAC energy budget.

Y1 DACs:

  • 1.6M DACs (256 read DACs per CU).
  • Each: 0.25 pJ/conversion.
  • Conversion rate: 50M MVM/s.

Total DAC power: 1.6 × 10⁶ × 50 × 10⁶ × 0.25 × 10⁻¹² = 20 W.

Well above the 3 W TDP. Problem.

Questions:

(a) DAC energy at 30% activity?

(b) Drop DAC bit depth from 8 → 4 bit + dither?

(c) Halve the DAC count (sharing)? Performance impact?

(d) Y10 target: TDC bypasses the DAC. Total saving?

(e) Y1 final practical DAC energy estimate?

Solutions

(a) 20 W × 0.30 = 6 W. Still 2× TDP.

(b) 4-bit DAC: 16 levels, 0.05 pJ/conv (5× less). 16 levels suffice for INT4 inference (modern AI). Total: 6 W × 0.20 = 1.2 W. Practical.

(c) DAC sharing (one DAC per 2 rows). Half the count = 800K DACs. Power: 1.2 W × 0.5 = 0.6 W. But fully parallel row reads are no longer possible, so MVM time doubles (15 → 30 ns) and throughput is halved.

(d) TDC alternative (chapter 5.6): time-domain readout → bypasses ADC and simplifies DACs. Estimate: 50% energy savings. Y10 DAC contribution ~0.3 W.

(e) Y1 final estimate: 4-bit dither + sparsity + sharing → ~0.8 W DAC share. Within TDP. Y10 ~0.2 W (TDC + 7 nm).
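
The chain of reductions in (a) to (c) can be written as successive scale factors on the raw 20 W. This sketch only reproduces those three steps; the ~0.8 W in (e) is the text's own practical estimate and is not derived here. The list name `steps` is illustrative:

```python
RAW_W = 1.6e6 * 50e6 * 0.25e-12   # all DACs, 8-bit, converting every cycle

steps = [
    ("30% row activity",     0.30),         # (a)
    ("4-bit + dither",       0.05 / 0.25),  # (b) energy-per-conversion ratio
    ("one DAC per two rows", 0.50),         # (c), at half the throughput
]

power = RAW_W
print(f"raw: {power:.2f} W")            # 20.00 W
for label, factor in steps:
    power *= factor
    print(f"+ {label}: {power:.2f} W")  # 6.00, 1.20, 0.60
```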

Lesson: ADC/DAC area and power dominate analog AI chips. Y1’s design works around these limits; Y10+ uses TDC to leap past them.

Cheat Sheet

  • DAC: digital number → analog voltage. Crossbar input.
  • Y1 DAC: 8-bit, R-2R ladder, 0-0.5 V, 0.25 pJ/conversion.
  • ISPP: iterative programming. Pulse → read → adjust → repeat. 5-15 iterations, 1% precision.
  • Programming vs read DAC: voltage range, frequency, count differ.
  • Y1 model load: ~6.4 s with 16 clusters in parallel. Once, then non-volatile.
  • Energy budget: raw DAC alone 20 W (worst case) → ~0.8 W with sparsity + 4-bit dither + sharing.

Vision: Beyond DAC — Sigma-Delta and Direct Spike

Conventional DACs consume much of an analog AI chip's area. New architectures:

  • Y1 (today): R-2R ladder DAC, 8-bit. Standard approach.
  • Y3 (2027): Sigma-Delta DAC — noise shaping for effective 12-bit (smaller area).
  • Y10 (2029): TDC (Time-to-Digital) read → DAC bypass. Voltage encoded as time.
  • Y100 (2031+): Direct spike-encoding. Input = spike train → no DAC. Bio-compatible.
  • Y1000 (long horizon): Optical DAC — intensity encoding in photonic waveguides.

Meaning for Türkiye: mixed-signal circuit design (DAC, ADC) is the “mature” side of semiconductors. Türkiye’s TÜBİTAK BİLGEM, ASELSAN, etc. are strong here. SIDRA channels that capability into neuromorphic AI.

Unexpected: a fully DAC-free architecture. With spike-based encoding, the DAC vanishes entirely. SIDRA Y100 steps in that direction: not clock-driven digital compute, but event-driven analog compute.

Further Reading

  • Next chapter: 5.6 — TDC: Time-Domain Readout
  • Previous: 5.4 — YILDIRIM Chip Architecture
  • DAC design classic: Razavi, Design of Analog CMOS Integrated Circuits, Ch. 12.
  • SAR architecture: Wikipedia “Successive Approximation ADC”.
  • ISPP for memristors: Kim et al., Programming algorithms for multilevel-cell phase-change memory, ASP-DAC 2014.
  • Sigma-Delta DAC: Schreier & Temes, Understanding Delta-Sigma Data Converters, 2nd ed.