Power and Thermal Management
How to live in a 3 W budget — DVFS, cooling, power delivery.
Prerequisites
What you'll learn here
- Break down the Y1 power budget by component
- Explain DVFS (Dynamic Voltage and Frequency Scaling)
- Compute thermal resistance and cooling strategies
- Detail power gating and clock gating design
- State Y10 → Y100 power-scaling concerns
Hook: 3 W = One LED Bulb
Y1’s TDP (Thermal Design Power) is 3 W — the same as an LED bulb. H100 is 233× that (700 W).
This 3 W:
- Energy economics (edge devices: battery life).
- Temperature (chip ~60°C, easy to cool).
- Budget discipline (every CU has its share).
This chapter explains how Y1 holds 3 W, how DVFS works, and how temperature is managed.
Intuition: Three Power Strategies
Static power:
Transistor leakage (off-state still flows). 28 nm CMOS Y1: ~50 mW (~2% TDP). Manageable.
Dynamic power:
CMOS switching: .
- : activity factor (~5-30%).
- : switching capacitance.
- : voltage.
- : clock frequency.
Y1 dynamic power: ~2 W (most of TDP).
Crossbar power:
Memristor read energy (static DC). Per active crossbar ~10 mW.
Y1 strategy:
- DVFS: low activity → drop frequency + voltage → less energy.
- Power gating: turn idle CUs off (zero power).
- Clock gating: clock only active blocks.
Formalism: Power Math and Thermal
Y1 power budget:
| Component | Power | Share |
|---|---|---|
| Crossbar (analog) | 0.5 W | 17% |
| ADC | 1.0 W | 33% |
| DAC | 0.8 W | 27% |
| Compute engine | 0.3 W | 10% |
| SRAM + DMA | 0.2 W | 7% |
| Clock + I/O | 0.2 W | 7% |
| Total | 3.0 W | TDP |
Activity factor:
In inference, only part of the chip is active. Typical 30% activity → 3 W. 100% activity (worst case) → 10 W (above TDP, throttle).
DVFS (Dynamic Voltage Frequency Scaling):
| Mode | Frequency | Voltage | Power | Performance |
|---|---|---|---|---|
| Idle | 100 MHz | 0.6 V | 100 mW | ~1% |
| Low | 250 MHz | 0.7 V | 500 mW | 25% |
| Mid | 500 MHz | 0.8 V | 1.5 W | 50% |
| High | 1 GHz | 1.0 V | 3 W | 100% |
Power scales as V²f → low V is a huge save. Standard in modern CPU/GPU.
Power gating:
Cut power to idle CUs. Zero power off-state. Wake-up ~1 µs. Y1 has 16 cluster-independent gating.
Thermal management:
Chip dissipates 3 W. If cooling is insufficient temperature climbs → memristor drift accelerates → ECC margin shrinks.
Thermal resistance:
- : power (W).
- : thermal resistance (°C/W).
- : temperature rise above ambient.
Y1 cooling options:
| Cooling | @ 3W | Device example | |
|---|---|---|---|
| Passive (heat spreader) | 15 °C/W | 45°C | Smartphone |
| Small heat sink | 8 °C/W | 24°C | Laptop |
| Active fan | 3 °C/W | 9°C | Embedded server |
| Liquid | 0.5 °C/W | 1.5°C | Datacenter |
For Y1, passive heat spreader is enough (45°C @ 25°C ambient = 70°C). Memristor up to 85°C is OK.
Y10 (30 W) → small heat sink. Y100 (100 W) → active fan. Y1000 → liquid.
Throttling:
If temperature approaches 80°C, drop frequency → less power → cooler. Feedback loop.
Typical throttle: 80°C → DVFS High → Mid. 90°C → Low. 100°C → critical shutdown.
Thermal sensors:
Y1 has 4 thermal sensors per cluster (16 total). On-chip calibrated. The compute engine reads continuously.
Voltage regulator:
On-chip voltages 0.6-1.0 V. External supply 1.8 V or 3.3 V. LDO (Low Dropout Regulator) or switching converter in between.
Y1: switched-capacitor regulator + bandgap reference. Output selectable 0.6/0.8/1.0 V (DVFS).
Efficiency:
LDO: V_out / V_in = 33% (1.0 V output, 3.3 V input). Low. Switching: 85-90%. Better.
Y1 uses a switching converter.
Power delivery network (PDN):
For 3 W draw at 1 V → 3 A. Thick metal wires. PDN typically uses upper 5 layers (M16-M20).
PDN IR drop: 100 mV droop (10% on 1 V) tolerable. Decoupling capacitors (~1 nF/mm²) help.
Boot-time power-up:
Chip wake: voltages ramp, clock tree stabilizes, calibration, model load. ~1 second.
Low-power modes:
- Active: full operation, 3 W.
- Idle: clock-gated, only SRAM active, ~50 mW.
- Sleep: most blocks off, periodic wake, ~5 mW.
- Deep sleep: only RTC, ~0.5 mW (memristor non-volatile, model preserved).
On mobile devices, deep sleep most of the time → battery life lengthens.
Thermal modeling:
Chip thermal map: hot spots at ADC + DAC regions. Compiler load-balances across hot spots.
Spread heat across the chip → uniform temperature → uniform memristor drift.
Y100 thermal: 100 W, 1 cm², heat density ~10 W/cm² → cooling critical. Microfluidic cooling (chapter 5.15).
Experiment: Y1 Power Profile
Scenario: GPT-2 inference, 30% average activity.
Instantaneous power profile:
| Phase | Active block | Power |
|---|---|---|
| Token input | DMA + L3 | 0.5 W |
| Embedding lookup | 1 cluster | 1.5 W |
| Layer 1 attention | 4 CUs | 2.0 W |
| Layer 1 FFN | 4 CUs | 2.5 W |
| Layers 2-12 sequentially | 4 CUs/seq | 2.5 W avg |
| Output projection | 1 cluster | 1.0 W |
| Token output | DMA + L3 | 0.5 W |
Average: ~2 W per token (over 1 µs).
1000 tokens (1 ms): 2 W × 1 ms = 2 mJ. Easy to cool.
Worst case (sustained max):
All 16 clusters + every CU active → ~10 W. Above TDP 3W → throttle: 1 GHz → 500 MHz, 4 W → still over → 250 MHz, 1.5 W. Half performance.
Temperature:
- Passive cooling 15°C/W → 3 W × 15 = 45°C above ambient.
- 25°C ambient + 45°C = 70°C die. Within memristor 85°C limit.
1 hour sustained: 3 W × 3600 s = 10.8 kJ energy. In battery terms: an 18650 cell (3.6 V × 3 Ah = 39 kJ) = 3.6 hours sustained. Mobile scenario at 10% activity → 36 hours.
Quick Quiz
Lab Exercise
Battery-life calculation for an edge device.
Device:
- SIDRA Y1 (3 W TDP, 10% average activity) + ARM CPU (1 W) + sensors (0.5 W).
- Battery: 4000 mAh @ 3.7 V = 14.8 Wh = 53 kJ.
Use scenarios:
(a) 100% SIDRA active (continuous inference): battery life? (b) 10% SIDRA active (always-on listening): battery life? (c) 1% SIDRA + idle (deep sleep most of the time): battery life? (d) Compare: same device with GPU instead: 700 W * 1% = 7 W battery insufficient. (e) Smart-earbuds battery target with SIDRA?
Solutions
(a) 3 W (SIDRA) + 1 W (CPU) + 0.5 W (sensor) = 4.5 W. 14.8 / 4.5 = 3.3 hours. Continuous AI is short.
(b) 0.3 W (SIDRA 10%) + 1 W + 0.5 W = 1.8 W. 14.8 / 1.8 = 8.2 hours. Good for a speech assistant.
(c) 0.03 W (SIDRA 1%) + 0.05 W (CPU sleep) + 0.1 W (sensor) = 0.18 W. 14.8 / 0.18 = 82 hours = 3.4 days. Smartwatch-class continuous wear.
(d) GPU 7 W + system 1.5 W = 8.5 W → 1.7 hours. SIDRA 8.2 hours = 5× longer.
(e) A SIDRA Y1-based earbud with a 24-hour battery target is realistic (always-on speech recognition + translation).
Cheat Sheet
- Y1 TDP: 3 W.
- Power budget: ADC 33%, DAC 27%, crossbar 17%, compute 10%.
- DVFS: 4 modes, V²f savings.
- Cooling: Y1 passive, Y10 heat sink, Y100 active/liquid.
- Power gating: idle blocks off.
- Temperature: Y1 ~70°C (limit 85°C).
- Battery life: edge device 8-80 hours by activity.
Vision: The Sustainable AI Era
- Y1: 3 W → mobile/edge.
- Y3: 10 W → laptop, smart camera.
- Y10: 30 W → workstation, datacenter blade.
- Y100: 100 W → datacenter (liquid cooling).
- Y1000: 100 W but 100× performance → photonic + 3D stack.
For Türkiye: edge AI devices are a new market. SIDRA-based 24-hour-battery devices can ship from 2027-2028. ASELSAN, BİLGEM, university startups can lead this front.
Further Reading
- Next chapter: 5.12 — Metal Lines and IR Drop
- Previous: 5.10 — Noise Models
- Thermal management: Pedram, Power Aware Design Methodologies, Springer.
- DVFS: Hennessy & Patterson, Computer Architecture, Ch. 1.
- Microfluidic cooling: Tuckerman & Pease, IEEE EDL 1981 (classic).