💻 Module 6 · Software Stack · Chapter 6.4 · 9 min read

The ISPP Algorithm, Step by Step

Iterative programming hitting 256 levels — software perspective.

What you'll learn here

  • Detail the ISPP algorithm at firmware level with pseudocode
  • Explain the adaptive pulse-width strategy
  • Describe how ISPP failure (max iterations) is handled
  • Summarize the compiler ↔ ISPP API
  • Identify Y10's closed-loop ISPP target

Hook: 256 Levels at 1% Precision

Chapter 5.5 covered the principle behind ISPP (Incremental Step Pulse Programming). This chapter covers the firmware implementation in detail.

Goal: program 256 discrete conductance (G) levels at ±1% precision. The algorithm runs in the RISC-V firmware, once per cell.

Intuition: Iterative Approach

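A single SET pulse without feedback lands with 5% or more error. ISPP's pulse-then-verify feedback brings that down to 1%:

Target G = 50 µS.
Pulse 1 → measure 30 µS → error -20 µS.
Pulse 2 → measure 42 µS → error -8 µS.
Pulse 3 → measure 49 µS → error -1 µS. Done!

Convergence takes 3-15 iterations, 5-7 on average. The pulse width or voltage rises with each iteration while the cell is still below target.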

Formalism: Pseudocode + Adaptive Control

L1 · Beginner

Core ISPP firmware code:

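#include <stdint.h>
#include <stdlib.h>   // abs()

typedef struct {
    uint32_t cell_addr;
    int target_g;           // µS × 100 (fixed-point)
    int tolerance;          // ±µS × 100
    int max_iterations;     // typically 15
} ispp_params_t;

// Programs one cell with pulse-and-verify. Returns ISPP_SUCCESS once the
// measured conductance is within tolerance, ISPP_FAIL after max_iterations.
int ispp_program_cell(const ispp_params_t *p) {
    int pulse_voltage = INITIAL_VOLTAGE;  // mV (1500 = 1.5V)
    int pulse_width   = INITIAL_WIDTH;    // ns (10)
    
    for (int i = 0; i < p->max_iterations; i++) {
        apply_set_pulse(p->cell_addr, pulse_voltage, pulse_width);
        int g_actual = read_cell(p->cell_addr);   // same units as target_g
        // Sign convention here: positive = still below target.
        // (The traces in this chapter print measured - target, the opposite sign.)
        int error = p->target_g - g_actual;
        
        if (abs(error) < p->tolerance) {
            return ISPP_SUCCESS;
        }
        
        if (error > 0) {
            // Undershoot → stronger SET on the next iteration
            pulse_voltage += 50;        // +50 mV
            pulse_width = pulse_width * 12 / 10;  // +20% (integer math, truncates)
            
            // Saturate at the hardware limits
            if (pulse_voltage > MAX_VOLTAGE) pulse_voltage = MAX_VOLTAGE;
            if (pulse_width > MAX_WIDTH) pulse_width = MAX_WIDTH;
        } else {
            // Overshoot → small RESET, then restart from the gentlest SET pulse
            apply_reset_pulse(p->cell_addr, -500, 5);  // -0.5V, 5ns
            pulse_voltage = INITIAL_VOLTAGE;
            pulse_width = INITIAL_WIDTH;
        }
    }
    
    // Iteration budget exhausted: the caller treats the cell as failed
    return ISPP_FAIL;
}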
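
To watch the loop converge without hardware, the sketch below runs the same control flow against a toy cell. The cell model is an assumption made purely for illustration, not device physics: each SET pulse adds V × W / 40 to the conductance (in µS × 100), and a stuck cell ignores every pulse. The names sim_cell_t, sim_set_pulse, sim_reset_pulse and sim_ispp, the ISPP_SUCCESS / ISPP_FAIL values, and the MAX_VOLTAGE / MAX_WIDTH limits are all hypothetical.

```c
#include <stdlib.h>

#define ISPP_SUCCESS    0      /* assumed value */
#define ISPP_FAIL       (-1)   /* assumed value */
#define INITIAL_VOLTAGE 1500   /* mV, from the text */
#define INITIAL_WIDTH   10     /* ns, from the text */
#define MAX_VOLTAGE     3000   /* mV, assumed limit */
#define MAX_WIDTH       100    /* ns, assumed limit */

/* Toy cell: conductance in µS × 100; a stuck cell never changes. */
typedef struct { int g; int stuck; } sim_cell_t;

/* Assumed linear response: delta G = V × W / 40. */
static void sim_set_pulse(sim_cell_t *c, int mv, int ns) {
    if (!c->stuck) c->g += mv * ns / 40;
}

/* Assumed small RESET: pull the conductance down by 2 µS, floor at 0. */
static void sim_reset_pulse(sim_cell_t *c) {
    if (!c->stuck) c->g = (c->g > 200) ? c->g - 200 : 0;
}

/* Same pulse-and-verify flow as the firmware loop, against the toy cell.
   iters_used reports how many pulses were applied. */
static int sim_ispp(sim_cell_t *c, int target_g, int tolerance,
                    int max_iter, int *iters_used) {
    int v = INITIAL_VOLTAGE, w = INITIAL_WIDTH;
    for (int i = 0; i < max_iter; i++) {
        sim_set_pulse(c, v, w);
        *iters_used = i + 1;
        int error = target_g - c->g;          /* positive = below target */
        if (abs(error) < tolerance) return ISPP_SUCCESS;
        if (error > 0) {                      /* undershoot: stronger pulse */
            v += 50;
            w = w * 12 / 10;
            if (v > MAX_VOLTAGE) v = MAX_VOLTAGE;
            if (w > MAX_WIDTH) w = MAX_WIDTH;
        } else {                              /* overshoot: back off, restart */
            sim_reset_pulse(c);
            v = INITIAL_VOLTAGE;
            w = INITIAL_WIDTH;
        }
    }
    return ISPP_FAIL;
}
```

With a target of 50 µS (5000) and a tolerance of 1 µS (100), this toy cell converges on the 7th pulse at 49.99 µS. A stuck cell uses up all 15 iterations and returns ISPP_FAIL, which is the case the fail-handling path has to cover.
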

L2 · Full

Adaptive pulse strategy:

The first few iterations are aggressive (large steps). As the cell approaches the target, the steps get smaller, so convergence is roughly logarithmic.

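// Pulse width (ns) chosen from the error magnitude.
// Thresholds are assumed to be in µS; scale them by 100 if error is held
// in the µS × 100 fixed-point format.
int mag = abs(error);
if (mag > 50) pulse_width = 100;        // big step
else if (mag > 20) pulse_width = 50;
else if (mag > 5) pulse_width = 20;
else pulse_width = 10;                  // fine tune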
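
The threshold ladder can be wrapped in a small helper so it is easy to test in isolation. This is a minimal sketch: the function name adaptive_pulse_width is hypothetical, the error is assumed to be in µS, and taking the magnitude (so that the sign convention does not matter) is an assumption rather than something the chapter specifies.

```c
#include <stdlib.h>

/* Returns the next pulse width in ns for a given remaining error in µS.
   Thresholds and widths are the ones quoted in the text. */
static int adaptive_pulse_width(int error_us) {
    int mag = abs(error_us);     /* assumption: only the magnitude matters */
    if (mag > 50) return 100;    /* big step */
    if (mag > 20) return 50;
    if (mag > 5)  return 20;
    return 10;                   /* fine tune */
}
```

For errors of 55, 30, 13 and 3 µS the helper returns 100, 50, 20 and 10 ns, so the pulse shrinks as the cell closes in on its target.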

Crossbar parallel programming:

The 256 cells in a row can be programmed in parallel: the same pulse is applied to the row, and bit-line (BL) control selects which cells receive it.

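#include <stdbool.h>
#include <stdlib.h>   // abs()

// Programs one 256-cell row in parallel.
// targets and tolerance use the same units as the values returned by read_row.
void program_row(int row, const int targets[256], int tolerance) {
    int active[256];
    for (int col = 0; col < 256; col++) active[col] = 1;  // all cells active
    
    int voltage = INITIAL_VOLTAGE;  // mV
    int width   = INITIAL_WIDTH;    // ns
    
    for (int iter = 0; iter < 15; iter++) {
        // Pulse all cells that are still active
        apply_row_pulse(row, active, voltage, width);
        
        // Read back the whole row
        int reads[256];
        read_row(row, reads);
        
        // Per-cell check: drop cells that are within tolerance
        bool all_done = true;
        for (int col = 0; col < 256; col++) {
            if (!active[col]) continue;
            if (abs(reads[col] - targets[col]) < tolerance) {
                active[col] = 0;  // this cell is done
            } else {
                all_done = false;
            }
        }
        
        if (all_done) return;
        
        // Strengthen the shared pulse, saturating at the hardware limits
        voltage += 50;
        width = width * 12 / 10;
        if (voltage > MAX_VOLTAGE) voltage = MAX_VOLTAGE;
        if (width > MAX_WIDTH) width = MAX_WIDTH;
    }
    // Cells still marked active here did not converge (see ISPP fail handling).
}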

Time: 256 cells in parallel → ~16 iterations × 50 ns = 800 ns/row (a round upper bound; the loop above stops at 15). 256 rows × 800 ns ≈ 205 µs/crossbar, or roughly 200 µs.
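
The timing estimate is plain multiplication, and writing it out makes the rounding visible. A minimal sketch with hypothetical helper names (row_time_ns, crossbar_time_ns); the 50 ns pulse-and-verify cycle and the 256-row crossbar come from the text.

```c
#include <stdint.h>

/* Time to program one row: one pulse-and-verify cycle per iteration. */
static uint64_t row_time_ns(uint32_t iterations, uint32_t cycle_ns) {
    return (uint64_t)iterations * cycle_ns;
}

/* Time to program a whole crossbar, one row after another. */
static uint64_t crossbar_time_ns(uint32_t rows, uint64_t row_ns) {
    return (uint64_t)rows * row_ns;
}
```

With 16 iterations of 50 ns, row_time_ns gives 800 ns, and crossbar_time_ns(256, 800) gives 204,800 ns, which is the figure rounded to 200 µs.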

L3 · Deep

ISPP fail handling:

After 15 iterations without success:

  1. The cell is classified as stuck-LRS or stuck-HRS.
  2. It is added to the failure map.
  3. ECC remaps it to a redundant cell.
  4. The compiler reads the failure map and writes the weight to another cell.

Y1 ISPP fail rate: 0.5%. Manageable.
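
A failure map can be as simple as one bit per cell. The sketch below is one possible layout, not the firmware's actual data structure: the type failure_map_t and its functions are hypothetical, and the 256 × 256 crossbar size is taken from the row and column counts used in this chapter.

```c
#include <stdint.h>
#include <string.h>

#define XBAR_ROWS 256
#define XBAR_COLS 256

/* One bit per cell: 65,536 bits = 8 KiB per crossbar. */
typedef struct {
    uint8_t  bits[XBAR_ROWS * XBAR_COLS / 8];
    uint32_t failed_count;
} failure_map_t;

static void failure_map_init(failure_map_t *m) {
    memset(m, 0, sizeof *m);
}

static int failure_map_is_failed(const failure_map_t *m, int row, int col) {
    uint32_t idx = (uint32_t)row * XBAR_COLS + (uint32_t)col;
    return (m->bits[idx >> 3] >> (idx & 7)) & 1;
}

/* Marks a cell as failed; marking the same cell twice counts it once. */
static void failure_map_mark(failure_map_t *m, int row, int col) {
    uint32_t idx = (uint32_t)row * XBAR_COLS + (uint32_t)col;
    if (failure_map_is_failed(m, row, col)) return;
    m->bits[idx >> 3] |= (uint8_t)(1u << (idx & 7));
    m->failed_count++;
}
```

The programming loop would call failure_map_mark whenever ISPP returns ISPP_FAIL, and the remapping step would consult failure_map_is_failed before placing a weight.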

Closed-loop ISPP (Y10 target):

Classic ISPP: pulse → wait → measure. Closed-loop: measure during the pulse, stop instantly.

Pros:

  • 30% time reduction.
  • Higher precision (less overshoot).

Cons:

  • More complex circuitry.

Tested in Y10 prototype.
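
The difference from classic ISPP can be modelled in a few lines of software. In the sketch below the pulse is advanced 1 ns at a time and cut off as soon as the monitored conductance reaches the target. This only illustrates the control idea: in hardware the comparison would be done by analog circuitry during the pulse, and the linear rate_per_ns cell response and the function name closed_loop_pulse are assumptions.

```c
/* Applies a SET pulse in 1 ns slices and stops once *g reaches target_g
   or max_ns is used up. Conductance values are in µS × 100.
   Returns the pulse length actually applied, in ns. */
static int closed_loop_pulse(int *g, int target_g, int rate_per_ns, int max_ns) {
    int t = 0;
    while (t < max_ns && *g < target_g) {
        *g += rate_per_ns;   /* toy model: linear rise while the pulse is on */
        t++;
    }
    return t;
}
```

Because the pulse stops within one slice of crossing the target, the overshoot in this model is bounded by rate_per_ns, which matches the "less overshoot" advantage listed above.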

Temperature compensation:

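Conductance follows an Arrhenius temperature dependence, G(T) ∝ exp(-Ea / (k·T)). A per-crossbar thermal sensor supplies T, and the firmware applies a correction to the target: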

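// T_REF: reference temperature at which targets are specified (assumed; not defined in this chapter)
int g_target_corrected = (int)(g_target * exp_lookup(-(Ea / k) * (1.0f / T - 1.0f / T_REF)));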

ISPP then programs against the corrected target, which reduces temperature-induced drift.
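
One way to read the correction, assuming a single activation energy E_a and a reference temperature T_ref at which the targets are specified (T_ref is an assumption; the chapter does not name one):

```latex
\begin{align}
G(T) &= G_0 \exp\!\left(-\frac{E_a}{k_B T}\right) \\
\frac{G(T)}{G(T_{\mathrm{ref}})} &= \exp\!\left[-\frac{E_a}{k_B}\left(\frac{1}{T} - \frac{1}{T_{\mathrm{ref}}}\right)\right] \\
G_{\mathrm{target}}(T) &= G_{\mathrm{target}}(T_{\mathrm{ref}})\,
  \exp\!\left[-\frac{E_a}{k_B}\left(\frac{1}{T} - \frac{1}{T_{\mathrm{ref}}}\right)\right]
\end{align}
```

At T = T_ref the factor is exactly 1, so the correction vanishes. The unknown prefactor G_0 cancels in the ratio, which is why only the temperature difference from the reference enters the lookup.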

Compiler ↔ ISPP API:

Driver IOCTL LOAD_MODEL:

struct model_data {
    weight_t weights[N];     // FP32 weights
    int n_layers;
    layer_info_t layers[N_LAYERS];
};

Firmware:

  1. Receive the compiler-quantized weights.
  2. Map to cells (cluster, CU, crossbar, row, col).
  3. Program each cell with ISPP.
  4. Update failure map.
  5. Signal “model loaded” to driver.
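
Step 2 can be sketched as a mixed-radix decode of a flat weight index. The 256 × 256 crossbar size comes from this chapter; the number of crossbars per CU and CUs per cluster are not given here, so they are left as parameters. The names cell_loc_t and map_weight are hypothetical, and the row-major ordering is an assumption.

```c
#include <stdint.h>

#define XBAR_ROWS 256
#define XBAR_COLS 256

typedef struct {
    uint32_t cluster, cu, crossbar, row, col;
} cell_loc_t;

/* Decodes a flat weight index, column changing fastest.
   xbars_per_cu and cus_per_cluster are hypothetical geometry parameters. */
static cell_loc_t map_weight(uint64_t idx, uint32_t xbars_per_cu,
                             uint32_t cus_per_cluster) {
    cell_loc_t loc;
    loc.col      = (uint32_t)(idx % XBAR_COLS);       idx /= XBAR_COLS;
    loc.row      = (uint32_t)(idx % XBAR_ROWS);       idx /= XBAR_ROWS;
    loc.crossbar = (uint32_t)(idx % xbars_per_cu);    idx /= xbars_per_cu;
    loc.cu       = (uint32_t)(idx % cus_per_cluster); idx /= cus_per_cluster;
    loc.cluster  = (uint32_t)idx;
    return loc;
}
```

With the column index changing fastest, 256 consecutive weights land in the same row, which lines up with programming a whole row per pulse.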

Performance:

Y1 model load:

  • Serial estimate: 419M cells × 800 ns / 256 (cells per row, programmed in parallel) = 1.3 seconds.
  • In practice: ~640 ms, because some clusters are programmed in parallel.

The first inference therefore waits on the order of 1 second for the load. This cost is paid once per model load.
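
The serial estimate can be reproduced with integer arithmetic. A minimal sketch; the helper name model_load_us is hypothetical, and it assumes the load is a sequence of equal-length programming steps, each covering a fixed number of cells.

```c
#include <stdint.h>

/* Load time in µs: ceil(cells / parallel) programming steps of step_ns each. */
static uint64_t model_load_us(uint64_t cells, uint64_t parallel, uint64_t step_ns) {
    uint64_t steps = (cells + parallel - 1) / parallel;  /* round up */
    return steps * step_ns / 1000;
}
```

For 419,000,000 cells, 256 cells per step and 800 ns per step, this gives 1,309,375 µs, which is the 1.3 seconds quoted above.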

Experiment: ISPP Trace

Cell #12345 target G = 75 µS:

Iter 1: V=1500mV, W=10ns → G=20 µS, err=-55
Iter 2: V=1550mV, W=12ns → G=45 µS, err=-30
Iter 3: V=1600mV, W=14ns → G=62 µS, err=-13
Iter 4: V=1650mV, W=17ns → G=72 µS, err=-3
Iter 5: V=1700mV, W=20ns → G=76 µS, err=+1 → SUCCESS!

Total: 5 iterations × 50 ns = 250 ns.

Failed example (cell #98765, stuck-HRS):

Iter 1-15: G always under 5 µS, target 50.
Iter 15: SUCCESS=false, FAIL signal.

The cell is disabled and ECC remaps it to a redundant cell.
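
Trace lines like these are easy to emit from firmware with a small formatter. This is a sketch only: format_trace_line is a hypothetical name, and it uses ASCII "->" and "uS" in place of "→" and "µS" to stay safe on consoles without UTF-8. The error is printed as measured minus target, matching the sign used in the traces.

```c
#include <stdio.h>

/* Writes one trace line into buf. Conductances are in µS.
   Returns what snprintf returns: the length the full line needs. */
static int format_trace_line(char *buf, size_t n, int iter, int mv, int ns,
                             int g_us, int target_us) {
    return snprintf(buf, n, "Iter %d: V=%dmV, W=%dns -> G=%d uS, err=%+d",
                    iter, mv, ns, g_us, g_us - target_us);
}
```

The %+d conversion always prints a sign, so an undershoot shows as err=-55 and a slight overshoot as err=+1.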

Quick Quiz

Question 1 of 6: What is the typical ISPP iteration count?

Lab Exercise

ISPP optimization:

(a) Classic ISPP: 5% initial error, 5 iterations → 1%. Total 250 ns/cell.
(b) Closed-loop: 175 ns/cell, 30% faster.
(c) Adaptive pulse: 200 ns/cell, 20% faster.
(d) Both combined (Y10 target): 150 ns/cell, 40% faster.

Y10 model load (10B cells, 50K parallel columns): 10B / 50K × 150 ns = 30 ms, roughly 20× faster than Y1 (640 ms).

Inference downtime is critical (for model swaps) → Y10 wins big.
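
The percentages in the exercise can be checked with one small helper. A minimal sketch; percent_faster is a hypothetical name, and it uses integer arithmetic, so results are truncated to whole percent.

```c
/* Percent reduction in per-cell programming time relative to a baseline. */
static int percent_faster(int base_ns, int new_ns) {
    return (base_ns - new_ns) * 100 / base_ns;
}
```

Against the 250 ns classic baseline, 175 ns, 200 ns and 150 ns come out as 30%, 20% and 40%. The model-load comparison is a plain ratio: 640 ms / 30 ms is about 21, which the text rounds to 20×.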

Cheat Sheet

  • ISPP: iterative programming, 5% → 1% error.
  • 3-15 iterations, average 5-7.
  • Adaptive pulse: error-based pulse width.
  • Parallel: 256 cells/row simultaneously.
  • Fail handling: failure map + ECC.
  • Y1 model load: 640 ms.
  • Y10 model load (closed-loop + adaptive pulse): 30 ms.

Vision: Write Speed = Online Learning

Faster ISPP enables online learning. Continuous weight updates like STDP:

  • Y1: static model (training external).
  • Y3: last-layer online ISPP (1-second update).
  • Y10: 30 ms ISPP → fine-tuning in seconds.
  • Y100: microsecond ISPP → hardware STDP. Bio-compatible plasticity.

Further Reading