🔌 Module 5 · Chip Hardware · Chapter 5.13 · 9 min read

Signal Chain and Packaging

From memristor to chip pin — the end-to-end data path.

What you'll learn here

  • Identify the full signal chain (memristor → TIA → ADC → compute → DMA → PCIe)
  • Explain PCIe bandwidth and latency
  • Compare chip packaging tech (BGA, FC-BGA, CoWoS)
  • State Y1's packaging choice and its electrical/thermal implications
  • Briefly understand signal integrity, ESD protection, and I/O control

Hook: From Memristor to Pin

A SIDRA inference: voltage applied → current out → digitized → goes to the CPU. That path is the signal chain.

At the same time the chip’s pins connect it to the world: PCIe, power, thermal sensor, JTAG. Packaging is the physical interface.

This chapter summarizes the end-to-end path.

Intuition: The Dataflow

Memristor cell (analog G)
    ↓ (Ohm: I = G·V)
WL/BL intersection (current)
    ↓ (KCL: column sum)
TIA (current → voltage)
    ↓
ADC or TDC (analog → digital)
    ↓
Compute engine (activation, bias, norm)
    ↓
L1 → L2 → L3 SRAM (DMA)
    ↓
PCIe controller (16 GB/s)
    ↓
PCIe pins (BGA package)
    ↓
PCIe slot (host CPU)

Each stage was detailed in earlier chapters. This chapter focuses on packaging.
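The analog front of the chain (Ohm's law per cell, KCL along the column, TIA, ADC) can be sketched as a toy model. This is a minimal illustration only: the conductances, read voltages, feedback resistor, reference voltage, and ADC resolution are all assumed values, not Y1 specifications, and the function names are hypothetical.

```python
# Toy model of one crossbar column: memristor cells -> TIA -> ADC.
# Every component value here is an illustrative assumption, not a Y1 spec.

def column_current(conductances_s, voltages_v):
    """Ohm's law per cell (I = G*V), then the KCL sum along the column."""
    return sum(g * v for g, v in zip(conductances_s, voltages_v))

def tia_output(current_a, feedback_ohm):
    """Ideal transimpedance amplifier: V_out = I_in * R_f."""
    return current_a * feedback_ohm

def adc_code(voltage_v, v_ref, bits):
    """Ideal unipolar ADC: map [0, v_ref) onto 2**bits codes, with clipping."""
    levels = 2 ** bits
    code = int(voltage_v / v_ref * levels)
    return max(0, min(levels - 1, code))

g = [10e-6, 20e-6, 5e-6, 15e-6]  # cell conductances in siemens (assumed)
v = [0.2, 0.2, 0.0, 0.2]         # word-line read voltages in volts (assumed)

i_col = column_current(g, v)                   # column current in amps
v_tia = tia_output(i_col, feedback_ohm=50e3)   # TIA output in volts
code = adc_code(v_tia, v_ref=1.0, bits=8)      # digital code sent to compute
print(f"I = {i_col * 1e6:.1f} uA, V = {v_tia:.3f} V, code = {code}")
```

With these assumed values the column carries 9 µA, the TIA outputs 0.45 V, and the 8-bit ADC returns code 115.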

Formalism: PCIe and Packaging

L1 · Beginner

PCIe 5.0 spec (Y1):

  • 4 lanes.
  • 32 GT/s per lane (Giga Transfer/s).
  • Net bandwidth: ~16 GB/s per direction (32 GT/s × 4 lanes ÷ 8 bits per byte, less 128b/130b encoding overhead).
  • Latency: 100-500 ns (round-trip CPU ↔ chip).
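The bandwidth figure follows from the lane rate, lane count, and the 128b/130b line encoding used by PCIe 5.0. A minimal sketch of that arithmetic (the function name is hypothetical, and transaction-layer overhead is ignored):

```python
def pcie_bandwidth_gb_s(gt_per_s, lanes, enc_payload=128, enc_total=130):
    """Per-direction bandwidth in GB/s after line encoding.

    One transfer carries one bit per lane, so GT/s * lanes gives raw Gb/s.
    """
    raw_gbit_s = gt_per_s * lanes
    return raw_gbit_s * (enc_payload / enc_total) / 8

y1_bw = pcie_bandwidth_gb_s(32, 4)     # Y1: PCIe 5.0, 4 lanes
x16_bw = pcie_bandwidth_gb_s(32, 16)   # same generation, 16 lanes
print(f"x4: {y1_bw:.2f} GB/s, x16: {x16_bw:.2f} GB/s per direction")
```

The x4 result is about 15.75 GB/s, which the chapter rounds to 16 GB/s.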

Y1 PCIe packet:

PCIe TLP (Transaction Layer Packet):

  • Header (3-4 DW = 12-16 bytes).
  • Data payload (256-4096 bytes typical).
  • Overhead: ~5%.

Typical inference: a 1 KB input fits in one PCIe transaction and crosses the link in roughly 100 ns (about 64 ns of wire time at 16 GB/s, plus protocol overhead).
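To see where the ~5% overhead and the transfer time come from, the header cost and wire time can be estimated per TLP. This is a rough sketch that counts only the 16-byte header; it assumes the rounded 16 GB/s link rate and ignores link-layer framing, sequence numbers, and CRC, so real overhead is somewhat higher. The function names are hypothetical.

```python
def tlp_overhead(payload_bytes, header_bytes=16):
    """Fraction of a TLP's bytes spent on the header (header only)."""
    return header_bytes / (payload_bytes + header_bytes)

def wire_time_ns(total_bytes, bandwidth_gb_s=16.0):
    """Serialization time on the link; 1 GB/s equals 1 byte per ns."""
    return total_bytes / bandwidth_gb_s

for payload in (256, 1024, 4096):
    print(f"{payload:5d} B payload: {tlp_overhead(payload) * 100:.1f}% header overhead")

# One TLP carrying the 1 KB inference input plus a 16-byte header.
t_input_ns = wire_time_ns(1024 + 16)
print(f"1 KB input: {t_input_ns:.0f} ns on the wire")
```

Under these assumptions the overhead is about 5.9% at 256 bytes and shrinks as the payload grows, while the 1 KB input needs about 65 ns of wire time.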

Packaging types:

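| Type | Pin count | Speed | Thermal | Cost |
|---|---|---|---|---|
| QFP (legacy) | 100-200 | low | low | cheap |
| BGA | 500-2000 | mid | mid | mid |
| FC-BGA | 500-3000 | high | good | expensive |
| CoWoS (TSMC) | 5000+ | very high | excellent | very expensive |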

Y1 choice: FC-BGA (Flip-Chip Ball Grid Array).

  • ~1500 pins (PCIe + power + I/O).
  • 30 mm × 30 mm package.
  • Thermal: heat spreader on the top surface.

L2 · Full

FC-BGA details:

Flip-chip: the die is mounted “face down”. Solder bumps on the die's active surface bond to the package substrate. The substrate carries a ball-grid array on its underside, which is soldered to the PCB.

Pros:

  • Short electrical path (high speed).
  • Direct power delivery (through bumps).
  • Thermal: heat spreader bonds directly to the die.

Y1 inside the package:

  • Die: 10 mm × 10 mm.
  • Substrate: 30 mm × 30 mm.
  • Heat spreader: 25 mm × 25 mm.
  • Ball pitch: 0.5 mm → up to ~50 × 50 = 2500 ball sites, of which ~1500 are populated.
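The ball-site count follows from the pitch and the usable side length. A small sketch of that estimate; the 25 mm usable area (substrate minus an edge margin) is an assumption made here to match the ~50 × 50 grid, and the function name is hypothetical.

```python
def ball_sites(side_mm, pitch_mm):
    """Number of sites in a full square grid at the given ball pitch."""
    per_side = int(side_mm / pitch_mm)
    return per_side * per_side

full_substrate = ball_sites(30.0, 0.5)  # entire 30 mm substrate, no margin
with_margin = ball_sites(25.0, 0.5)     # assumed 25 mm usable area
print(full_substrate, with_margin)
```

A full 30 mm grid would offer 3600 sites, so the 2500 figure implies an edge margin, and either count leaves headroom over the ~1500 pins Y1 needs.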

Signal integrity (SI):

For high-speed PCIe (32 GT/s):

  • Characteristic impedance: 100 Ω differential.
  • Eye diagram: open enough for BER < 10⁻¹².
  • Equalization: pre-emphasis + decision feedback.

Y1’s PCIe interface includes on-chip equalization.
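A BER target is easier to judge as a time between errors. The sketch below converts the 10⁻¹² target at 32 GT/s into a mean interval per lane; it assumes independent, uniformly distributed bit errors, and the function name is hypothetical.

```python
def mean_seconds_between_errors(bit_rate_per_s, ber):
    """Average time between bit errors on one lane at a given BER."""
    return 1.0 / (bit_rate_per_s * ber)

t_one_lane = mean_seconds_between_errors(32e9, 1e-12)
t_four_lanes = t_one_lane / 4  # four lanes see errors four times as often
print(f"one lane: {t_one_lane:.2f} s, four lanes: {t_four_lanes:.2f} s")
```

At the BER limit a single lane would see one bit error roughly every 31 seconds, which is why the link still relies on CRC and replay above the physical layer.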

ESD (Electrostatic Discharge) protection:

Per-pin ESD diodes (HBM 2 kV class). Static charge → routed to ground, chip safe.

Power delivery (PDN):

Y1 draws 3 W at 1 V, i.e. 3 A, distributed via ~100 power bumps (~30 mA per bump). Decoupling capacitors sit on-package and on-die.
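The supply current and per-bump share are simple to check. A minimal sketch, assuming the current splits evenly across the power bumps (real distributions are uneven); the function name is hypothetical.

```python
def supply_current_a(power_w, voltage_v):
    """Supply current from P = V * I."""
    return power_w / voltage_v

i_total = supply_current_a(3.0, 1.0)   # Y1: 3 W at 1 V
power_bumps = 100                      # approximate count from the text
i_per_bump_ma = i_total / power_bumps * 1000
print(f"{i_total:.1f} A total, {i_per_bump_ma:.0f} mA per bump")
```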

L3 · Deep

Heterogeneous packaging (Y10+ target):

CoWoS (Chip-on-Wafer-on-Substrate) — TSMC. Multiple dies on one interposer:

  • SIDRA (logic + crossbar).
  • HBM (high-bandwidth memory).
  • I/O die.

Y10 target: SIDRA + 1 GB HBM3 stack. 1 TB/s bandwidth.

3D stack (Y100):

Two SIDRA dies stacked. Connected via TSV (Through-Silicon Via).

2× density, 2× bandwidth, but cooling is harder.

PCB design:

Y1 PCB:

  • 8-12 layers.
  • PCIe slot (CPU connection).
  • Power regulators (12 V → 1 V switching).
  • Clock oscillator.
  • USB JTAG (debug).

PCB typical 100 mm × 60 mm. Datacenter add-in card form factor.

Yield and test:

Wafer testing:

  1. Die sort (probe card).
  2. Yield ~75%.
  3. Good dies packaged.
  4. Package test: 95% pass.

Net packaged-product yield: 0.75 × 0.95 ≈ 71%.
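Stage yields multiply, so the final figure and the number of dies needed per shipped part can be computed directly. A short sketch using the two yields above; the function name is hypothetical.

```python
import math

def final_yield(*stage_yields):
    """Overall yield is the product of the independent stage yields."""
    return math.prod(stage_yields)

y = final_yield(0.75, 0.95)      # die sort, then package test
dies_per_good_unit = 1 / y       # dies started per shipped part
print(f"yield = {y:.4f}, dies per good unit = {dies_per_good_unit:.2f}")
```

About 1.4 dies must be started for every packaged part that ships.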

Manufacturing line (detail in module 7):

  • Wafer: TSMC (28 nm CMOS).
  • BEOL memristor: SIDRA workshop (UNAM).
  • Packaging: ASE or Amkor (Taiwan).
  • Test: SIDRA workshop.

Annual production target:

Y1 ~10K-100K chips/year (workshop capacity). Y10 millions (with mini-fab). Y100 tens of millions (full fab).

Experiment: Full-Signal-Chain Latency

GPT-2 1-token inference end-to-end:

| Stage | Time |
|---|---|
| 1. CPU input prep | 100 ns |
| 2. PCIe transfer (in) | 50 ns |
| 3. Y1 L3 → L2 → L1 DMA | 50 ns |
| 4. 12-layer MVM + compute | 5 µs |
| 5. Y1 L1 → L2 → L3 DMA | 50 ns |
| 6. PCIe transfer (out) | 50 ns |
| 7. CPU post-process | 100 ns |
| Total | ~5.4 µs |

Inference: 5 µs. Wrap (PCIe + DMA + CPU): 0.4 µs, i.e. ~7% overhead. PCIe alone accounts for 0.1 µs (~2%).

On-chip (DMA + compute): 5.1 µs. Off-chip (CPU + PCIe): 0.3 µs. PCIe is efficient.
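The budget can be re-derived from the stage table. The sketch below sums the stages and splits out the wrapper and PCIe shares; the dictionary keys are hypothetical labels for the table rows.

```python
# Stage times in nanoseconds, taken from the latency table above.
stages_ns = {
    "cpu_prep": 100,
    "pcie_in": 50,
    "dma_in": 50,
    "mvm_compute": 5000,
    "dma_out": 50,
    "pcie_out": 50,
    "cpu_post": 100,
}

total_ns = sum(stages_ns.values())
wrapper_ns = total_ns - stages_ns["mvm_compute"]   # everything but compute
pcie_ns = stages_ns["pcie_in"] + stages_ns["pcie_out"]

print(f"total = {total_ns / 1000:.1f} us")
print(f"wrapper share = {wrapper_ns / total_ns:.1%}")
print(f"PCIe share = {pcie_ns / total_ns:.1%}")
```

The wrapper comes to 400 ns of 5400 ns (about 7.4%), of which the two PCIe transfers are 100 ns (about 1.9%).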

Throughput:

Sequential single-thread: 5.4 µs/token = 185K tokens/s.

Pipelined (16 parallel clusters): 16× = 3M tokens/s.

H100 batch 32: ~10M tokens/s. SIDRA Y1 batch 16: 3M tokens/s. About 3× lower throughput, but 200× less energy.
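The throughput figures follow from the per-token latency. A minimal sketch, assuming the 16 clusters scale perfectly with no shared-resource contention (an idealization) and taking the H100 number from the text as given:

```python
token_latency_s = 5.4e-6        # end-to-end latency per token
clusters = 16                   # parallel clusters, assumed to scale ideally

single_tps = 1 / token_latency_s
pipelined_tps = single_tps * clusters
h100_tps = 10e6                 # H100 batch-32 figure quoted in the text

print(f"single: {single_tps:,.0f} tokens/s")
print(f"pipelined: {pipelined_tps:,.0f} tokens/s")
print(f"H100 / Y1 ratio: {h100_tps / pipelined_tps:.2f}")
```

The pipelined estimate is about 2.96M tokens/s, so the H100 figure is roughly 3.4× higher.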

Quick Quiz

Question 1 of 6: What are Y1's PCIe specs?

Lab Exercise

Y1 packaging-options analysis.

Options:

  1. QFP (300 pins, cheap, low speed).
  2. BGA (1500 pins, mid).
  3. FC-BGA (1500 pins, high speed, good thermal). Y1 choice.
  4. CoWoS (5000 pins, very expensive). Y10 target.

Questions:

(a) For PCIe 5.0 × 16 lanes (total 64 GB/s) → how many pins?

(b) For Y1 thermal dissipation (3 W) → what kind of heat spreader?

(c) Y10 (30 W) packaging change?

(d) Y100 (100 W) total cost: CoWoS + liquid cooling?

(e) Can a packaging line be set up in Türkiye?

Solutions

(a) PCIe 5.0 × 16 lanes: 16 lanes × 2 (TX/RX) = 32 differential pairs × 2 wires = 64 signal pins for the PCIe lanes alone, before ground-return, reference-clock, and sideband pins. Adding power + I/O + others gives ~1200 pins total. FC-BGA needed.
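The signal-pin count can be checked by counting wires per lane. This sketch counts only the high-speed data pins; ground-return, reference-clock, and sideband pins are deliberately excluded, and the function name is hypothetical.

```python
def pcie_signal_pins(lanes):
    """High-speed data pins only: each lane has a TX pair and an RX pair."""
    directions = 2        # transmit and receive
    wires_per_pair = 2    # differential signalling uses two wires
    return lanes * directions * wires_per_pair

print(pcie_signal_pins(4))    # Y1's x4 link
print(pcie_signal_pins(16))   # the x16 case in question (a)
```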

(b) 3 W → small Cu heat spreader (the 25 mm × 25 mm lid from the package spec). Passive cooling suffices: with R_th ~15°C/W, the rise is 3 W × 15°C/W = 45°C above ambient.
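The passive-cooling claim rests on the steady-state relation between power, thermal resistance, and temperature rise. A minimal sketch, assuming a 25°C ambient (not stated in the text) and a single lumped thermal resistance; the function name is hypothetical.

```python
def junction_temp_c(power_w, r_th_c_per_w, ambient_c=25.0):
    """Steady-state temperature: ambient plus P * R_th."""
    return ambient_c + power_w * r_th_c_per_w

t_y1 = junction_temp_c(3.0, 15.0)     # Y1 at 3 W, passive spreader
t_y10 = junction_temp_c(30.0, 15.0)   # 30 W through the same R_th
print(f"Y1: {t_y1:.0f} C, 30 W with the same R_th: {t_y10:.0f} C")
```

At 3 W the estimate is about 70°C, while 30 W through the same thermal resistance would be far beyond any silicon limit, which is why a higher-power part needs an added heat sink to lower R_th.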

(c) Y10 30 W → bigger heat spreader + heat sink. Same package (FC-BGA), with extra heat sink.

(d) Y100 CoWoS: ~$500/package (TSMC). Liquid cooling + radiator: ~$100/system. Total chip + cooling: ~$1000/Y100. 1M units/year → $1B capex.

(e) Türkiye packaging: ASELSAN, BİLGEM run low-pin BGA. FC-BGA is more advanced; not yet in Türkiye but feasible in 5-10 years. Investment ~$50-100M (mid-large single-line fab).

Cheat Sheet

  • Signal chain: memristor → TIA → ADC → compute → DMA → PCIe.
  • Y1 PCIe: 5.0 × 4 lanes ≈ 16 GB/s per direction.
  • Packaging: FC-BGA, ~1500 pins, 30×30 mm.
  • Thermal: flip-chip heat spreader, R_th 15°C/W.
  • ESD: diodes per pin.
  • Y10 target: CoWoS + HBM3 (1 TB/s).
  • Y100: 3D stack + liquid.
  • End-to-end inference: Y1 ~5.4 µs/token (~7% overhead).

Vision: The Heterogeneous Integration Era

  • Y1: single-die FC-BGA.
  • Y3: SIDRA + sensor chiplet (camera AI).
  • Y10: SIDRA + HBM3 CoWoS.
  • Y100: SIDRA + photonic + HBM 3D stack.
  • Y1000: Wafer-scale (Cerebras-style) + bio-compatible.

For Türkiye: heterogeneous packaging needs mini-fab infrastructure. ASELSAN collaboration + national investment realistic for the Y10 era.

Further Reading