Signal Chain and Packaging
From memristor to chip pin — the end-to-end data path.
Prerequisites
What you'll learn here
- Identify the full signal chain (memristor → TIA → ADC → compute → DMA → PCIe)
- Explain PCIe bandwidth and latency
- Compare chip packaging tech (BGA, FC-BGA, CoWoS)
- State Y1's packaging choice and its electrical/thermal implications
- Briefly understand signal integrity, ESD protection, and I/O control
Hook: From Memristor to Pin
A SIDRA inference: voltage applied → current out → digitized → goes to the CPU. That path is the signal chain.
At the same time the chip’s pins connect it to the world: PCIe, power, thermal sensor, JTAG. Packaging is the physical interface.
This chapter summarizes the end-to-end path.
Intuition: The Dataflow
Memristor cell (analog G)
↓ (Ohm: I = G·V)
WL/BL intersection (current)
↓ (KCL: column sum)
TIA (current → voltage)
↓
ADC or TDC (analog → digital)
↓
Compute engine (activation, bias, norm)
↓
L1 → L2 → L3 SRAM (DMA)
↓
PCIe controller (16 GB/s)
↓
PCIe pins (BGA package)
↓
PCIe slot (host CPU)
Each stage was detailed in earlier chapters. This chapter focuses on packaging.
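As a rough illustration of the analog front end of this path, the sketch below chains an ideal crossbar column sum, an ideal TIA, and an ideal ADC. All component values (conductances, feedback resistance, reference voltage, 8-bit resolution) are made-up assumptions for the example, not Y1 parameters.

```python
def crossbar_column_currents(G, v_in):
    """I_j = sum_i G[i][j] * v_in[i]: Ohm's law per cell, KCL per column."""
    n_rows, n_cols = len(G), len(G[0])
    return [sum(G[i][j] * v_in[i] for i in range(n_rows)) for j in range(n_cols)]


def tia(currents, r_feedback):
    """Ideal transimpedance amplifier: V = I * R_f (sign inversion ignored)."""
    return [i * r_feedback for i in currents]


def adc(voltages, v_ref, bits):
    """Ideal unsigned ADC: clamp to [0, v_ref], quantize to 2**bits levels."""
    max_code = (1 << bits) - 1
    codes = []
    for v in voltages:
        v = min(max(v, 0.0), v_ref)
        codes.append(round(v / v_ref * max_code))
    return codes


# Hypothetical 2x2 crossbar: conductances in siemens, inputs in volts.
G = [[10e-6, 50e-6],
     [20e-6, 5e-6]]
v_in = [0.2, 0.1]

currents = crossbar_column_currents(G, v_in)   # amperes per column
voltages = tia(currents, r_feedback=50e3)      # assumed 50 kOhm feedback
codes = adc(voltages, v_ref=1.0, bits=8)       # assumed 8-bit, 1 V full scale
print(codes)
```

The digital codes are what the compute engine, SRAM hierarchy, and PCIe controller handle from that point on.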
Formalism: PCIe and Packaging
PCIe 5.0 spec (Y1):
- 4 lanes.
- 32 GT/s per lane (gigatransfers per second).
- Net bandwidth: ~16 GB/s in each direction (4 lanes × 32 GT/s ÷ 8 bits per byte; ~15.75 GB/s after 128b/130b encoding).
- Latency: 100-500 ns (round-trip CPU ↔ chip).
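The bandwidth figure follows from the lane count and transfer rate. A minimal sketch, assuming one bit per transfer per lane and the 128b/130b line encoding used by PCIe 3.0 through 5.0 (the function name is illustrative):

```python
def pcie_bandwidth_gb_per_s(lanes, gt_per_s, encoding_efficiency=128 / 130):
    """Usable bandwidth per direction in GB/s, before packet overhead."""
    return lanes * gt_per_s * encoding_efficiency / 8


y1_link = pcie_bandwidth_gb_per_s(lanes=4, gt_per_s=32)
x16_link = pcie_bandwidth_gb_per_s(lanes=16, gt_per_s=32)
print(f"x4: {y1_link:.2f} GB/s, x16: {x16_link:.2f} GB/s")
```

This covers line encoding only; TLP headers and flow control reduce the payload rate further.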
Y1 PCIe packet:
PCIe TLP (Transaction Layer Packet):
- Header (3-4 DW = 12-16 bytes).
- Data payload (256-4096 bytes typical).
- Overhead: ~5%.
Typical inference: input 1 KB → 1 PCIe transaction. Done in ~100 ns.
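To see where a figure of this order comes from, the sketch below estimates the pure wire time of a transfer split into TLPs. It assumes a 16-byte header per TLP and a 16 GB/s link, and it ignores link-layer framing, flow control, and controller latency, so it is a lower bound rather than a measured value. The function name and default parameters are illustrative.

```python
import math


def tlp_wire_time_ns(payload_bytes, max_payload_bytes=4096,
                     header_bytes=16, link_gb_per_s=16.0):
    """Return (number of TLPs, wire time in ns) for one transfer."""
    n_tlps = math.ceil(payload_bytes / max_payload_bytes)
    total_bytes = payload_bytes + n_tlps * header_bytes
    # bytes divided by GB/s gives nanoseconds directly.
    return n_tlps, total_bytes / link_gb_per_s


print(tlp_wire_time_ns(1024))                         # 1 KB in one TLP
print(tlp_wire_time_ns(1024, max_payload_bytes=256))  # same data, 256-byte TLPs
```

With a smaller maximum payload the same 1 KB is split across four TLPs, which adds header bytes but changes the wire time only slightly.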
Packaging types:
| Type | Pin count | Signal speed | Thermal performance | Cost |
|---|---|---|---|---|
| QFP (legacy) | 100-200 | low | low | cheap |
| BGA | 500-2000 | mid | mid | mid |
| FC-BGA | 500-3000 | high | good | expensive |
| CoWoS (TSMC) | 5000+ | very high | excellent | very expensive |
Y1 choice: FC-BGA (Flip-Chip Ball Grid Array).
- ~1500 pins (PCIe + power + I/O).
- 30 mm × 30 mm package.
- Thermal: heat spreader on the top surface.
FC-BGA details:
Flip-chip: the die is packaged “face down”. Solder bumps on the active side of the die bond it to the substrate. The underside of the substrate carries the ball-grid array, which is soldered to the PCB.
Pros:
- Short electrical path (high speed).
- Direct power delivery (through bumps).
- Thermal: the back of the die faces up, so the heat spreader attaches directly to it.
Y1 inside the package:
- Die: 10 mm × 10 mm.
- Substrate: 30 mm × 30 mm.
- Heat spreader: 25 mm × 25 mm.
- Ball pitch: 0.5 mm → a grid of roughly 50 × 50 = 2500 ball sites, enough for the ~1500 pins used.
Signal integrity (SI):
For high-speed PCIe (32 GT/s):
- Characteristic impedance: 100 Ω differential.
- Eye diagram: eye opening sufficient for BER < 10⁻¹².
- Equalization: transmitter pre-emphasis + receiver decision-feedback equalization (DFE).
Y1’s PCIe interface includes on-chip equalization.
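Two quick numbers put these requirements in perspective: the width of one bit on the wire, and how often a bit error occurs at the target BER. The sketch assumes one bit per transfer per lane; the function names are illustrative.

```python
def unit_interval_ps(gt_per_s):
    """Duration of one bit (unit interval) in picoseconds."""
    return 1e3 / gt_per_s


def mean_seconds_between_errors(gt_per_s, ber):
    """Average time between bit errors on one lane at a given BER."""
    bits_per_second = gt_per_s * 1e9
    return 1.0 / (bits_per_second * ber)


print(unit_interval_ps(32))                      # ps per bit at 32 GT/s
print(mean_seconds_between_errors(32, 1e-12))    # seconds per error, per lane
```

At 32 GT/s the eye is only about 31 ps wide, which is why equalization is needed, and a BER of 10⁻¹² still means roughly one raw bit error every half minute per lane.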
ESD (Electrostatic Discharge) protection:
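Per-pin ESD diodes, rated for the 2 kV Human Body Model (HBM) class; this HBM is the ESD test model, not the high-bandwidth memory discussed later. Static charge is routed to ground, keeping the chip safe.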
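To get a feel for what the diodes must absorb, the sketch below assumes "HBM 2 kV" refers to the Human Body Model ESD test, whose standard network is a 100 pF capacitor discharged through 1.5 kΩ. The function names are illustrative, and the device under test is approximated as a short circuit.

```python
def hbm_peak_current_a(v_precharge, r_series_ohm=1500.0):
    """Peak discharge current into a (near) short-circuit pin."""
    return v_precharge / r_series_ohm


def hbm_time_constant_ns(c_pf=100.0, r_series_ohm=1500.0):
    """RC decay time constant of the discharge, in nanoseconds."""
    return r_series_ohm * c_pf * 1e-12 * 1e9


print(f"{hbm_peak_current_a(2000):.2f} A peak")
print(f"{hbm_time_constant_ns():.0f} ns time constant")
```

So each protected pin has to shunt a pulse of a bit over one ampere that decays within a few hundred nanoseconds.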
Power delivery (PDN):
Y1 draws 3 W at 1 V, i.e. 3 A. The current is distributed via bumps (~100 power bumps), with decoupling capacitors both on-package and on-die.
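The per-bump load follows directly from these figures. A minimal sketch, assuming the current is shared evenly and that all ~100 power bumps carry supply current (in practice some are ground returns and the sharing is uneven):

```python
def supply_current_a(power_w, voltage_v):
    """Total supply current from P = V * I."""
    return power_w / voltage_v


def current_per_bump_ma(power_w, voltage_v, n_power_bumps):
    """Average current per bump in mA, assuming perfectly even sharing."""
    return supply_current_a(power_w, voltage_v) / n_power_bumps * 1e3


print(supply_current_a(3, 1))           # amperes
print(current_per_bump_ma(3, 1, 100))   # mA per bump
```

About 30 mA per bump is a light load, so the bump count here is driven more by IR drop and inductance than by current-carrying limits.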
Heterogeneous packaging (Y10+ target):
CoWoS (Chip-on-Wafer-on-Substrate) — TSMC. Multiple dies on one interposer:
- SIDRA (logic + crossbar).
- HBM (high-bandwidth memory).
- I/O die.
Y10 target: SIDRA + 1 GB HBM3 stack. 1 TB/s bandwidth.
3D stack (Y100):
Two SIDRA dies stacked. Connected via TSV (Through-Silicon Via).
2× density, 2× bandwidth, but cooling is harder.
PCB design:
Y1 PCB:
- 8-12 layers.
- PCIe slot (CPU connection).
- Power regulators (12 V → 1 V switching).
- Clock oscillator.
- USB JTAG (debug).
Typical PCB size: 100 mm × 60 mm, in a datacenter add-in-card form factor.
Yield and test:
Wafer testing:
- Die sort (probe card).
- Yield ~75%.
- Good dies packaged.
- Package test: 95% pass.
Net packaged-product yield: 0.75 × 0.95 ≈ 0.71, i.e. about 71%.
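The same multiplication tells you how many dies must enter the flow to ship a given number of good parts. A small sketch with illustrative function names; the 10,000-unit target is just the low end of the Y1 production range used as an example.

```python
import math


def net_yield(*stage_yields):
    """Overall yield is the product of the per-stage yields."""
    return math.prod(stage_yields)


def dies_to_start(target_good_units, *stage_yields):
    """Dies that must be probed to end up with the target number of good parts."""
    return math.ceil(target_good_units / net_yield(*stage_yields))


print(f"{net_yield(0.75, 0.95):.4f}")
print(dies_to_start(10_000, 0.75, 0.95))
```

Because the stages multiply, improving die-sort yield pays off more than improving the already high package-test pass rate.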
Manufacturing line (detail in module 7):
- Wafer: TSMC (28 nm CMOS).
- BEOL memristor: SIDRA workshop (UNAM).
- Packaging: ASE (Taiwan) or Amkor (US-headquartered).
- Test: SIDRA workshop.
Annual production target:
Y1 ~10K-100K chips/year (workshop capacity). Y10 millions (with mini-fab). Y100 tens of millions (full fab).
Experiment: Full-Signal-Chain Latency
GPT-2 1-token inference end-to-end:
| Stage | Time |
|---|---|
| 1. CPU input prep | 100 ns |
| 2. PCIe transfer (in) | 50 ns |
| 3. Y1 L3 → L2 → L1 DMA | 50 ns |
| 4. 12-layer MVM + compute | 5 µs |
| 5. Y1 L1 → L2 → L3 DMA | 50 ns |
| 6. PCIe transfer (out) | 50 ns |
| 7. CPU post-process | 100 ns |
| Total | ~5.4 µs |
Inference (stage 4): 5 µs. Wrapper (PCIe + DMA + CPU): 0.4 µs, i.e. 0.4 / 5.4 ≈ 7% overhead; PCIe alone accounts for 0.1 µs (≈ 2%).
On-chip (stages 3-5): 5.1 µs. Off-chip (CPU + PCIe): 0.3 µs. PCIe is efficient.
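The budget in the table can be tallied by category to see where the time goes. The stage times are copied from the table; the category labels are added here for grouping.

```python
# (stage name, category, time in ns), taken from the latency table.
STAGES_NS = [
    ("CPU input prep", "cpu", 100),
    ("PCIe transfer (in)", "pcie", 50),
    ("L3 -> L2 -> L1 DMA", "dma", 50),
    ("12-layer MVM + compute", "compute", 5000),
    ("L1 -> L2 -> L3 DMA", "dma", 50),
    ("PCIe transfer (out)", "pcie", 50),
    ("CPU post-process", "cpu", 100),
]


def budget_by_category(stages):
    """Sum stage times per category."""
    totals = {}
    for _name, category, ns in stages:
        totals[category] = totals.get(category, 0) + ns
    return totals


totals = budget_by_category(STAGES_NS)
total_ns = sum(totals.values())
wrapper_ns = total_ns - totals["compute"]

print(f"total: {total_ns / 1000:.1f} us")
for category, ns in totals.items():
    print(f"{category:8s} {ns:5d} ns  {100 * ns / total_ns:5.1f} %")
print(f"wrapper overhead: {100 * wrapper_ns / total_ns:.1f} %")
```

The MVM stage dominates; everything outside it adds up to 400 ns.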
Throughput:
Sequential single-thread: 5.4 µs/token = 185K tokens/s.
Pipelined (16 parallel clusters): 16× = 3M tokens/s.
H100 batch 32: ~10M tokens/s. SIDRA Y1 batch 16: 3M tokens/s. That is about 3× lower throughput, but at 200× less energy.
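The throughput figures are the reciprocal of the per-token latency, scaled by the number of parallel clusters. A sketch that assumes ideal linear scaling with no contention on PCIe or SRAM:

```python
def tokens_per_second(latency_us, parallel_units=1):
    """Throughput for a given per-token latency, assuming perfect scaling."""
    return parallel_units * 1e6 / latency_us


single = tokens_per_second(5.4)
pipelined = tokens_per_second(5.4, parallel_units=16)
print(f"single: {single / 1e3:.0f}K tokens/s, 16 clusters: {pipelined / 1e6:.2f}M tokens/s")
```

Real pipelines rarely scale perfectly, so the 16-cluster figure is best read as an upper bound.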
Quick Quiz
Lab Exercise
Y1 packaging-options analysis.
Options:
- QFP (300 pins, cheap, low speed).
- BGA (1500 pins, mid).
- FC-BGA (1500 pins, high speed, good thermal). Y1 choice.
- CoWoS (5000 pins, very expensive). Y10 target.
Questions:
(a) For PCIe 5.0 × 16 lanes (total 64 GB/s) → how many pins?
(b) For Y1 thermal dissipation (3 W) → what kind of heat spreader?
(c) Y10 (30 W) packaging change?
(d) Y100 (100 W) total cost: CoWoS + liquid cooling?
(e) Can a packaging line be set up in Türkiye?
Solutions
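(a) PCIe 5.0 × 16 lanes: each lane has one TX and one RX differential pair, so 16 × 2 = 32 pairs × 2 wires = 64 signal pins for the PCIe data lanes alone (ground-return and reference-clock pins come on top). Adding power, I/O, and others gives ~1200 pins total. FC-BGA needed.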
(b) 3 W → small heat spreader (10 mm² Cu plate). Passive cooling suffices: at R_th ~15°C/W the rise is 3 W × 15°C/W = 45°C above ambient.
(c) Y10 30 W → bigger heat spreader + heat sink. Same package (FC-BGA), with extra heat sink.
(d) Y100 CoWoS: ~$100/system. Total chip+cooling: ~$1B capex.
(e) Türkiye packaging: ASELSAN, BİLGEM run low-pin BGA. FC-BGA is more advanced; not yet in Türkiye but feasible in 5-10 years. Investment ~$50-100M (mid-large single-line fab).
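The thermal reasoning behind (b) and (c) can be checked with the steady-state relation ΔT = P × R_th. In the sketch below, the 60°C allowed rise is a hypothetical budget chosen for illustration, not a Y1 or Y10 specification.

```python
def temperature_rise_c(power_w, r_th_c_per_w):
    """Steady-state temperature rise above ambient: dT = P * R_th."""
    return power_w * r_th_c_per_w


def max_r_th_c_per_w(power_w, allowed_rise_c):
    """Largest thermal resistance that keeps the rise within budget."""
    return allowed_rise_c / power_w


print(temperature_rise_c(3, 15))     # Y1 at 3 W with passive cooling
print(temperature_rise_c(30, 15))    # 30 W through the same thermal path
print(max_r_th_c_per_w(30, 60))      # R_th needed for a 60 C rise at 30 W
```

At 30 W the passive 15°C/W path would give a rise of several hundred degrees, which is why a heat sink with a much lower thermal resistance is required.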
Cheat Sheet
- Signal chain: memristor → TIA → ADC → compute → DMA → PCIe.
- Y1 PCIe: 5.0 × 4 lanes = 16 GB/s.
- Packaging: FC-BGA, ~1500 pins, 30×30 mm.
- Thermal: flip-chip heat spreader, R_th 15°C/W.
- ESD: diodes per pin.
- Y10 target: CoWoS + HBM3 (1 TB/s).
- Y100: 3D stack + liquid cooling.
- End-to-end inference: Y1 ~5.4 µs/token (~7% overhead).
Vision: The Heterogeneous Integration Era
- Y1: single-die FC-BGA.
- Y3: SIDRA + sensor chiplet (camera AI).
- Y10: SIDRA + HBM3 CoWoS.
- Y100: SIDRA + photonic + HBM 3D stack.
- Y1000: Wafer-scale (Cerebras-style) + bio-compatible.
For Türkiye: heterogeneous packaging needs mini-fab infrastructure. An ASELSAN collaboration plus national investment is realistic for the Y10 era.
Further Reading
- Next chapter: 5.14 — Y1 / Y10 / Y100 Comparison
- Previous: 5.12 — Metal Lines and IR Drop
- Packaging: Lau, Heterogeneous Integrations, Springer 2018.
- PCIe spec: PCI-SIG official documentation.
- CoWoS: Yu et al., TSMC presentations.