🔌 Module 5 · Chip Hardware · Chapter 5.11 · 10 min read

Power and Thermal Management

How to live within a 3 W budget: DVFS, cooling, power delivery.

Prerequisites

5.10 — Noise Models

What you'll learn here

Break down the Y1 power budget by component
Explain DVFS (Dynamic Voltage and Frequency Scaling)
Compute thermal resistance and cooling strategies
Detail power gating and clock gating design
State Y10 → Y100 power-scaling concerns

Hook: 3 W = One LED Bulb

Y1's TDP (Thermal Design Power) is 3 W, about the same as an LED bulb. An H100 draws 233× that (700 W).

This 3 W budget drives three things:

Energy economics: on edge devices, power draw sets battery life.
Temperature: the chip runs at ~60°C and is easy to cool.
Budget discipline: every CU gets its own share.

This chapter explains how Y1 holds 3 W, how DVFS works, and how temperature is managed.

Intuition: Three Power Strategies

Static power:

Transistor leakage: current still flows when a transistor is off. For Y1 in 28 nm CMOS this is ~50 mW (~2% of TDP), which is manageable.

Dynamic power:

CMOS switching: $P = \alpha C V^2 f$.

$\alpha$ : activity factor (~5-30%).
$C$ : switching capacitance.
$V$ : voltage.
$f$ : clock frequency.

Y1 dynamic power: ~2 W (most of TDP).
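A minimal sketch of the switching-power formula, useful for getting a feel for the magnitudes. The operating point (activity factor 0.2, 1.0 V, 1 GHz) is an assumption chosen for illustration, and the function names are hypothetical; the source only gives the ~2 W total.

```python
def dynamic_power(alpha: float, c_farads: float, v_volts: float, f_hz: float) -> float:
    """CMOS switching power P = alpha * C * V^2 * f, in watts."""
    return alpha * c_farads * v_volts**2 * f_hz


def effective_capacitance(p_watts: float, alpha: float, v_volts: float, f_hz: float) -> float:
    """Invert the formula: the switched capacitance implied by a power figure."""
    return p_watts / (alpha * v_volts**2 * f_hz)


# Assumed operating point (not from the source): alpha = 0.2, 1.0 V, 1 GHz, 2 W.
c_eff = effective_capacitance(2.0, alpha=0.2, v_volts=1.0, f_hz=1e9)
print(f"implied switched capacitance: {c_eff * 1e9:.1f} nF")

# Same alpha and C, but at 0.8 V and 500 MHz.
p_scaled = dynamic_power(0.2, c_eff, 0.8, 500e6)
print(f"dynamic power at 0.8 V, 500 MHz: {p_scaled:.2f} W")
```

Under these assumptions, halving the frequency and dropping the voltage by 20% cuts dynamic power to about a third, because the voltage term is squared.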

Crossbar power:

Memristor reads draw a static DC current: ~10 mW per active crossbar.

Y1 strategy:

DVFS: low activity → drop frequency + voltage → less energy.
Power gating: turn idle CUs off (zero power).
Clock gating: clock only active blocks.

Formalism: Power Math and Thermal

L1 · Beginner

Y1 power budget:

Component	Power	Share
Crossbar (analog)	0.5 W	17%
ADC	1.0 W	33%
DAC	0.8 W	27%
Compute engine	0.3 W	10%
SRAM + DMA	0.2 W	7%
Clock + I/O	0.2 W	7%
Total	3.0 W	TDP
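The shares in the table can be recomputed from the per-component figures. This is a small sketch for checking the budget; the dictionary layout and variable names are illustrative only.

```python
# Per-component power in watts, copied from the Y1 budget table.
budget_w = {
    "Crossbar (analog)": 0.5,
    "ADC": 1.0,
    "DAC": 0.8,
    "Compute engine": 0.3,
    "SRAM + DMA": 0.2,
    "Clock + I/O": 0.2,
}

total_w = sum(budget_w.values())
shares_pct = {name: round(100 * p / total_w) for name, p in budget_w.items()}

# ADC and DAC together: the data converters' slice of the budget.
converter_share = (budget_w["ADC"] + budget_w["DAC"]) / total_w

for name, pct in shares_pct.items():
    print(f"{name:18s} {budget_w[name]:.1f} W  {pct:2d}%")
print(f"total: {total_w:.1f} W, converters: {converter_share:.0%}")
```

The rounded shares add up to 101% because each entry is rounded independently. The converters (ADC plus DAC) account for 60% of the budget, more than the crossbars themselves.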

Activity factor:

During inference only part of the chip is active. A typical 30% activity draws 3 W. 100% activity (worst case) would draw 10 W, which is above TDP and triggers throttling.

DVFS (Dynamic Voltage and Frequency Scaling):

Mode	Frequency	Voltage	Power	Performance
Idle	100 MHz	0.6 V	100 mW	~1%
Low	250 MHz	0.7 V	500 mW	25%
Mid	500 MHz	0.8 V	1.5 W	50%
High	1 GHz	1.0 V	3 W	100%

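Dynamic power scales as $V^2 f$, so lowering the voltage along with the frequency saves far more than lowering the frequency alone. DVFS is standard in modern CPUs and GPUs.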
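To see how much of the table follows from $V^2 f$ alone, the sketch below scales the High-mode power down to each mode and prints it next to the table value. The helper name `v2f_scaled_power` is hypothetical, and pure $V^2 f$ scaling ignores leakage and any fixed overhead.

```python
# (frequency in Hz, voltage in V, table power in W) for each DVFS mode.
MODES = {
    "Idle": (100e6, 0.6, 0.1),
    "Low": (250e6, 0.7, 0.5),
    "Mid": (500e6, 0.8, 1.5),
    "High": (1e9, 1.0, 3.0),
}


def v2f_scaled_power(mode: str, ref: str = "High") -> float:
    """Power predicted by pure V^2 * f scaling from the reference mode."""
    f, v, _ = MODES[mode]
    f_ref, v_ref, p_ref = MODES[ref]
    return p_ref * (v / v_ref) ** 2 * (f / f_ref)


for name, (_, _, p_table) in MODES.items():
    p_est = v2f_scaled_power(name)
    print(f"{name:5s} table = {p_table:.2f} W   V^2 f estimate = {p_est:.2f} W")
```

The Idle estimate lands close to the table, while the Low and Mid table entries sit above the pure estimate. One plausible reading, not stated in the source, is that the table includes power that does not scale with $V^2 f$.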

Power gating:

Power gating cuts the supply to idle CUs, so a gated block draws essentially zero power. Wake-up takes ~1 µs. Y1 can gate each of its 16 clusters independently.

L2 · Full

Thermal management:

The chip dissipates 3 W. If cooling is insufficient, temperature climbs, memristor drift accelerates, and the ECC margin shrinks.

Thermal resistance:

$\Delta T = P \cdot R_{th}$

$P$ : power (W).
$R_{th}$ : thermal resistance (°C/W).
$\Delta T$ : temperature rise above ambient.

Y1 cooling options:

Cooling	$R_{th}$	$\Delta T$ @ 3W	Device example
Passive (heat spreader)	15 °C/W	45°C	Smartphone
Small heat sink	8 °C/W	24°C	Laptop
Active fan	3 °C/W	9°C	Embedded server
Liquid	0.5 °C/W	1.5°C	Datacenter

For Y1 a passive heat spreader is enough: a 45°C rise over a 25°C ambient gives a 70°C die, below the memristors' 85°C limit.

Higher-power tiers need stronger cooling: Y10 (30 W) → heat sink. Y100 (100 W) → active fan or liquid. Y1000 → liquid.
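The relation $\Delta T = P \cdot R_{th}$ can be run in both directions: forward to get the die temperature for a given cooler, and backward to get the thermal resistance a given power level requires. The sketch below does both. It assumes the 85°C memristor limit and 25°C ambient also apply to Y10 and Y100, which the source does not state, and the function names are illustrative.

```python
AMBIENT_C = 25.0
MEMRISTOR_LIMIT_C = 85.0

# Thermal resistance in °C/W, copied from the cooling table.
COOLING_RTH = {
    "Passive (heat spreader)": 15.0,
    "Small heat sink": 8.0,
    "Active fan": 3.0,
    "Liquid": 0.5,
}


def die_temperature(power_w: float, r_th: float, ambient_c: float = AMBIENT_C) -> float:
    """Steady-state die temperature: T = T_ambient + P * R_th."""
    return ambient_c + power_w * r_th


def max_thermal_resistance(power_w: float,
                           limit_c: float = MEMRISTOR_LIMIT_C,
                           ambient_c: float = AMBIENT_C) -> float:
    """Largest R_th (°C/W) that keeps the die at or below limit_c."""
    return (limit_c - ambient_c) / power_w


for name, r_th in COOLING_RTH.items():
    t = die_temperature(3.0, r_th)
    print(f"{name:24s} die = {t:5.1f} °C, margin = {MEMRISTOR_LIMIT_C - t:5.1f} °C")

for chip, power_w in {"Y1": 3.0, "Y10": 30.0, "Y100": 100.0}.items():
    print(f"{chip}: needs R_th <= {max_thermal_resistance(power_w):.1f} °C/W")
```

Under these assumptions Y1 tolerates up to 20 °C/W, while Y10 would need about 2 °C/W and Y100 about 0.6 °C/W. The $R_{th}$ values in the table are therefore best read as Y1-class figures; larger chips need proportionally better heat paths.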

Throttling:

If the temperature approaches 80°C, the chip drops its frequency, which lowers power and cools the die. This forms a feedback loop.

Typical throttle policy: at 80°C, DVFS steps from High to Mid. At 90°C, it steps to Low. At 100°C, critical shutdown.
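The feedback loop can be sketched with the thresholds above and a first-order thermal model. The 45°C ambient, the 10 s thermal time constant, and the 1 s step are hypothetical values picked so that throttling actually triggers; the mode powers come from the DVFS table and the 15 °C/W from the passive-cooling row.

```python
AMBIENT_C = 45.0   # hypothetical hot ambient, chosen so throttling triggers
R_TH = 15.0        # passive heat spreader, °C/W
TAU_S = 10.0       # hypothetical thermal time constant
DT_S = 1.0         # simulation step in seconds

MODE_POWER_W = {"High": 3.0, "Mid": 1.5, "Low": 0.5, "Shutdown": 0.0}


def throttle_mode(temp_c: float) -> str:
    """Map die temperature to the fastest DVFS mode the policy allows."""
    if temp_c >= 100.0:
        return "Shutdown"
    if temp_c >= 90.0:
        return "Low"
    if temp_c >= 80.0:
        return "Mid"
    return "High"


def simulate(steps: int, temp_c: float = AMBIENT_C):
    """First-order thermal model driven by the throttle policy."""
    history = []
    for _ in range(steps):
        mode = throttle_mode(temp_c)
        target_c = AMBIENT_C + MODE_POWER_W[mode] * R_TH  # steady state for this mode
        temp_c += (DT_S / TAU_S) * (target_c - temp_c)    # relax toward it
        history.append((mode, temp_c))
    return history


history = simulate(60)
peak_c = max(t for _, t in history)
modes_used = {m for m, _ in history}
print(f"peak die temperature: {peak_c:.1f} °C, modes used: {sorted(modes_used)}")
```

In this toy run the die settles into a band around 80°C, alternating between High and Mid. A real governor would likely add hysteresis to avoid that rapid toggling, but that is a design choice beyond what the source describes.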

Thermal sensors:

Y1 has 4 thermal sensors per cluster (16 total). On-chip calibrated. The compute engine reads continuously.

L3 · Deep

Voltage regulator:

On-chip rails run at 0.6-1.0 V, while the external supply is 1.8 V or 3.3 V. An LDO (low-dropout regulator) or a switching converter bridges the two.

Y1 uses a switched-capacitor regulator with a bandgap reference. The output is selectable across the DVFS voltages (0.6/0.7/0.8/1.0 V).

Efficiency:

LDO: efficiency ≈ $V_{out} / V_{in}$ = 1.0 V / 3.3 V ≈ 30%, which is low. Switching: 85-90%, which is much better.

Y1 uses a switching converter.
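The efficiency gap translates directly into input power and wasted heat. The sketch below compares the two options for a 3 W load; it treats the LDO as ideal (no quiescent current) and takes 85% as the switching efficiency, the lower end of the quoted range. Function names are illustrative.

```python
def ldo_efficiency(v_out: float, v_in: float) -> float:
    """Ideal LDO: load current flows through the full V_in, so efficiency = V_out / V_in."""
    return v_out / v_in


def input_power(load_w: float, efficiency: float) -> float:
    """Power drawn from the external supply to deliver load_w on-chip."""
    return load_w / efficiency


LOAD_W = 3.0

ldo_eff = ldo_efficiency(1.0, 3.3)
p_in_ldo = input_power(LOAD_W, ldo_eff)
p_in_switching = input_power(LOAD_W, 0.85)

print(f"LDO:       efficiency {ldo_eff:.0%}, input {p_in_ldo:.1f} W, "
      f"lost {p_in_ldo - LOAD_W:.1f} W")
print(f"Switching: efficiency 85%, input {p_in_switching:.2f} W, "
      f"lost {p_in_switching - LOAD_W:.2f} W")
```

With a 3.3 V input the LDO would burn about 6.9 W to deliver 3 W, more than twice the chip's own TDP, while the switching converter loses roughly half a watt.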

Power delivery network (PDN):

Drawing 3 W at 1 V means 3 A of supply current, which calls for thick metal wires. The PDN typically uses the upper 5 metal layers (M16-M20).

PDN IR drop: a 100 mV droop (10% of 1 V) is tolerable. Decoupling capacitors (~1 nF/mm²) help.
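The current and droop figures together imply a resistance budget for the whole delivery path. A short sketch using Ohm's law, with hypothetical function names:

```python
def supply_current(power_w: float, v_supply: float) -> float:
    """Supply current I = P / V."""
    return power_w / v_supply


def max_pdn_resistance(droop_v: float, current_a: float) -> float:
    """Largest end-to-end PDN resistance that keeps IR drop within droop_v."""
    return droop_v / current_a


i_supply = supply_current(3.0, 1.0)        # 3 W at 1 V
r_budget = max_pdn_resistance(0.1, i_supply)  # 100 mV droop allowed

print(f"supply current: {i_supply:.1f} A")
print(f"PDN resistance budget: {r_budget * 1e3:.1f} mΩ")
```

A budget of about 33 mΩ for the entire path is why the delivery network relies on thick upper-layer metal.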

Boot-time power-up:

On wake, the chip ramps its voltages, lets the clock tree stabilize, runs calibration, and loads the model. The whole sequence takes ~1 second.

Low-power modes:

Active: full operation, 3 W.
Idle: clock-gated, only SRAM active, ~50 mW.
Sleep: most blocks off, periodic wake, ~5 mW.
Deep sleep: only RTC, ~0.5 mW (memristor non-volatile, model preserved).

Mobile devices spend most of their time in deep sleep, which extends battery life.
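The effect of duty cycling can be estimated as a time-weighted average over the four modes. The mode powers come from the list above; the usage profile (1% active, 4% idle, 95% deep sleep) is a hypothetical example, not a figure from the source.

```python
# Power per mode in watts, from the low-power mode list.
MODE_POWER_W = {
    "active": 3.0,
    "idle": 0.050,
    "sleep": 0.005,
    "deep_sleep": 0.0005,
}


def average_power(time_fractions: dict) -> float:
    """Time-weighted average power; fractions must sum to 1."""
    if abs(sum(time_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(MODE_POWER_W[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical always-available device that is rarely busy.
profile = {"active": 0.01, "idle": 0.04, "deep_sleep": 0.95}
avg_w = average_power(profile)
print(f"average power: {avg_w * 1e3:.1f} mW")
```

Even at 1% duty, the active slice dominates the average (30 mW of about 32 mW), so shortening active time matters more than shaving the sleep floor.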

Thermal modeling:

The chip's thermal map shows hot spots in the ADC and DAC regions. The compiler balances load so that work is spread away from those hot spots.

Spreading heat across the chip gives a uniform temperature and therefore uniform memristor drift.

Y100 thermal: 100 W over 1 cm² gives a heat density of ~100 W/cm², so cooling is critical. Microfluidic cooling is covered in chapter 5.15.

Experiment: Y1 Power Profile

Scenario: GPT-2 inference, 30% average activity.

Instantaneous power profile:

Phase	Active block	Power
Token input	DMA + L3	0.5 W
Embedding lookup	1 cluster	1.5 W
Layer 1 attention	4 CUs	2.0 W
Layer 1 FFN	4 CUs	2.5 W
Layers 2-12 sequentially	4 CUs/seq	2.5 W avg
Output projection	1 cluster	1.0 W
Token output	DMA + L3	0.5 W

Average: ~2 W over one token (~1 µs).

1000 tokens (1 ms): 2 W × 1 ms = 2 mJ. Easy to cool.
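The energy figures follow from $E = P \cdot t$. A brief sketch using the average power and per-token time quoted above; the variable and function names are illustrative.

```python
AVG_POWER_W = 2.0      # average power during inference
TOKEN_TIME_S = 1e-6    # time per token

energy_per_token_j = AVG_POWER_W * TOKEN_TIME_S


def inference_energy(tokens: int) -> float:
    """Total energy in joules for a run of the given token count."""
    return tokens * energy_per_token_j


print(f"energy per token: {energy_per_token_j * 1e6:.1f} µJ")
print(f"1000 tokens: {inference_energy(1000) * 1e3:.1f} mJ")
print(f"tokens per joule: {1 / energy_per_token_j:,.0f}")
```

At 2 µJ per token, one joule covers about half a million tokens under these figures.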

Worst case (sustained max):

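All 16 clusters with every CU active draw ~10 W, well above the 3 W TDP, so the chip throttles. Dropping from 1 GHz to 500 MHz gives 4 W, which is still over budget. Dropping to 250 MHz gives 1.5 W, which fits. At 250 MHz the chip delivers about a quarter of peak performance (the Low row of the DVFS table).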
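The step-down can be written as a search for the fastest mode whose worst-case power fits the TDP. The worst-case power per mode is taken from the paragraph above; the assumption that throughput scales linearly with clock frequency is mine, and the names are hypothetical.

```python
TDP_W = 3.0

# (mode, frequency in MHz, worst-case power in W), fastest first.
WORST_CASE = [
    ("High", 1000, 10.0),
    ("Mid", 500, 4.0),
    ("Low", 250, 1.5),
]


def first_mode_within_tdp(modes=WORST_CASE, tdp_w: float = TDP_W):
    """Return the fastest (mode, MHz, W) entry that fits the budget, or None."""
    for name, freq_mhz, power_w in modes:
        if power_w <= tdp_w:
            return name, freq_mhz, power_w
    return None


mode, freq_mhz, power_w = first_mode_within_tdp()
relative_throughput = freq_mhz / WORST_CASE[0][1]  # assumes throughput ~ frequency

print(f"sustained worst case runs in {mode} at {freq_mhz} MHz, {power_w} W")
print(f"relative throughput: {relative_throughput:.0%}")
```

If throughput tracks clock frequency, 250 MHz corresponds to a quarter of the 1 GHz rate.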

Temperature:

Passive cooling 15°C/W → 3 W × 15 = 45°C above ambient.
25°C ambient + 45°C = 70°C die. Within memristor 85°C limit.

1 hour sustained: 3 W × 3600 s = 10.8 kJ. In battery terms, an 18650 cell (3.6 V × 3 Ah = 10.8 Wh ≈ 39 kJ) lasts 3.6 hours at a sustained 3 W. A mobile scenario at 10% activity stretches that to 36 hours.
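The unit conversions behind the 18650 estimate are easy to get wrong, so here they are spelled out. A minimal sketch with illustrative helper names; it ignores regulator losses and battery derating.

```python
def battery_energy_wh(capacity_ah: float, nominal_v: float) -> float:
    """Stored energy in watt-hours: Ah * V."""
    return capacity_ah * nominal_v


def wh_to_kj(energy_wh: float) -> float:
    """1 Wh = 3600 J = 3.6 kJ."""
    return energy_wh * 3.6


def runtime_hours(energy_wh: float, load_w: float) -> float:
    """Hours of operation at a constant load."""
    return energy_wh / load_w


cell_wh = battery_energy_wh(3.0, 3.6)   # 18650: 3 Ah at 3.6 V
print(f"cell energy: {cell_wh:.1f} Wh = {wh_to_kj(cell_wh):.1f} kJ")
print(f"sustained 3 W: {runtime_hours(cell_wh, 3.0):.1f} h")
print(f"10% activity (0.3 W): {runtime_hours(cell_wh, 0.3):.0f} h")
```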

Quick Quiz

1/6 · Y1 TDP?

300 mW · 3 W · 30 W · 300 W

Lab Exercise

Battery-life calculation for an edge device.

Device:

SIDRA Y1 (3 W TDP, 10% average activity) + ARM CPU (1 W) + sensors (0.5 W).
Battery: 4000 mAh @ 3.7 V = 14.8 Wh = 53 kJ.

Use scenarios:

(a) 100% SIDRA active (continuous inference): battery life?
(b) 10% SIDRA active (always-on listening): battery life?
(c) 1% SIDRA + idle (deep sleep most of the time): battery life?
(d) Compare: the same device with a GPU instead draws 700 W × 1% = 7 W. Is the battery sufficient?
(e) Smart-earbuds battery target with SIDRA?

Solutions

(a) 3 W (SIDRA) + 1 W (CPU) + 0.5 W (sensor) = 4.5 W. 14.8 / 4.5 = 3.3 hours. Continuous AI is short.

(b) 0.3 W (SIDRA 10%) + 1 W + 0.5 W = 1.8 W. 14.8 / 1.8 = 8.2 hours. Good for a speech assistant.

(c) 0.03 W (SIDRA 1%) + 0.05 W (CPU sleep) + 0.1 W (sensor) = 0.18 W. 14.8 / 0.18 = 82 hours = 3.4 days. Smartwatch-class continuous wear.

(d) GPU 7 W + system 1.5 W = 8.5 W → 1.7 hours. SIDRA at 8.2 hours lasts about 5× longer.

(e) A SIDRA Y1-based earbud with a 24-hour battery target is realistic (always-on speech recognition + translation).
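The solutions for (a) through (d) can be reproduced with a short script that sums the component loads for each scenario and divides the battery energy by the total. The scenario labels and dictionary layout are illustrative; the wattages are the ones used in the solutions.

```python
BATTERY_WH = 4.0 * 3.7  # 4000 mAh at 3.7 V = 14.8 Wh

# Component loads in watts for each scenario, from the solutions.
SCENARIOS_W = {
    "a: continuous inference": {"sidra": 3.0, "cpu": 1.0, "sensors": 0.5},
    "b: always-on listening": {"sidra": 0.3, "cpu": 1.0, "sensors": 0.5},
    "c: mostly deep sleep": {"sidra": 0.03, "cpu": 0.05, "sensors": 0.1},
    "d: GPU at 1% duty": {"gpu": 7.0, "system": 1.5},
}


def battery_life_hours(loads_w: dict, battery_wh: float = BATTERY_WH) -> float:
    """Battery energy divided by total load; ignores conversion losses."""
    return battery_wh / sum(loads_w.values())


life_h = {name: battery_life_hours(loads) for name, loads in SCENARIOS_W.items()}

for name, hours in life_h.items():
    print(f"{name:26s} {sum(SCENARIOS_W[name].values()):5.2f} W  {hours:6.1f} h")
```

Changing a single wattage in `SCENARIOS_W` shows which component dominates each scenario; in (b), for example, the CPU and sensors draw five times what SIDRA does.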

Cheat Sheet

Y1 TDP: 3 W.
Power budget: ADC 33%, DAC 27%, crossbar 17%, compute 10%.
DVFS: 4 modes, V²f savings.
Cooling: Y1 passive, Y10 heat sink, Y100 active/liquid.
Power gating: idle blocks off.
Temperature: Y1 ~70°C (limit 85°C).
Battery life: edge device 8-80 hours by activity.

Vision: The Sustainable AI Era

Y1: 3 W → mobile/edge.
Y3: 10 W → laptop, smart camera.
Y10: 30 W → workstation, datacenter blade.
Y100: 100 W → datacenter (liquid cooling).
Y1000: 100 W but 100× performance → photonic + 3D stack.

For Türkiye: edge AI devices are a new market. SIDRA-based 24-hour-battery devices can ship from 2027-2028. ASELSAN, BİLGEM, university startups can lead this front.
