🔌 Module 5 · Chip Hardware · Chapter 5.11 · 10 min read

Power and Thermal Management

How to live in a 3 W budget — DVFS, cooling, power delivery.

What you'll learn here

  • Break down the Y1 power budget by component
  • Explain DVFS (Dynamic Voltage and Frequency Scaling)
  • Compute thermal resistance and cooling strategies
  • Detail power gating and clock gating design
  • State Y10 → Y100 power-scaling concerns

Hook: 3 W = One LED Bulb

Y1’s TDP (Thermal Design Power) is 3 W — the same as an LED bulb. H100 is 233× that (700 W).

This 3 W budget shapes three things:

  • Energy economics (edge devices: battery life).
  • Temperature (chip ~70°C with passive cooling at 25°C ambient, easy to cool).
  • Budget discipline (every CU has its share).

This chapter explains how Y1 holds 3 W, how DVFS works, and how temperature is managed.

Intuition: Three Power Strategies

Static power:

Transistor leakage: current still flows in the off state. For Y1 in 28 nm CMOS this is ~50 mW (~2% of TDP), which is manageable.

Dynamic power:

CMOS switching: $P = \alpha C V^2 f$.

  • $\alpha$: activity factor (~5-30%).
  • $C$: switching capacitance.
  • $V$: voltage.
  • $f$: clock frequency.

Y1 dynamic power: ~2 W (most of TDP).
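
As a rough sketch of the switching-power formula, the snippet below evaluates $P = \alpha C V^2 f$ and back-solves the effective switched capacitance implied by ~2 W of dynamic power. The operating point (α = 0.30, 1.0 V, 1 GHz) is an assumption chosen from ranges quoted in this chapter, and the resulting capacitance is illustrative, not a published Y1 parameter.

```python
def dynamic_power(alpha: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """CMOS switching power P = alpha * C * V^2 * f, in watts."""
    return alpha * cap_f * vdd ** 2 * freq_hz


def effective_capacitance(power_w: float, alpha: float, vdd: float, freq_hz: float) -> float:
    """Invert the formula to get the switched capacitance C, in farads."""
    return power_w / (alpha * vdd ** 2 * freq_hz)


# Assumed operating point (hypothetical, for illustration only):
# alpha = 0.30, V = 1.0 V, f = 1 GHz, dynamic power = 2 W.
c_eff = effective_capacitance(2.0, 0.30, 1.0, 1e9)
print(f"effective capacitance: {c_eff * 1e9:.2f} nF")

# Same capacitance and activity, but at 0.8 V and 500 MHz.
p_scaled = dynamic_power(0.30, c_eff, 0.8, 500e6)
print(f"dynamic power at 0.8 V, 500 MHz: {p_scaled:.2f} W")
```

Dropping to 0.8 V and half the clock cuts the dynamic term to 0.64 × 0.5 = 32% of its original value, which is the lever DVFS relies on.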

Crossbar power:

Memristor reads draw static DC current: ~10 mW per active crossbar.

Y1 strategy:

  1. DVFS: low activity → drop frequency + voltage → less energy.
  2. Power gating: turn idle CUs off (zero power).
  3. Clock gating: clock only active blocks.

Formalism: Power Math and Thermal

L1 · Beginner

Y1 power budget:

| Component | Power | Share |
|---|---|---|
| Crossbar (analog) | 0.5 W | 17% |
| ADC | 1.0 W | 33% |
| DAC | 0.8 W | 27% |
| Compute engine | 0.3 W | 10% |
| SRAM + DMA | 0.2 W | 7% |
| Clock + I/O | 0.2 W | 7% |
| Total | 3.0 W | TDP |
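
The shares in the budget can be re-derived from the wattages. The sketch below does that in integer milliwatts to avoid floating-point noise; the dictionary name and layout are illustrative choices, not part of any SIDRA tooling.

```python
# Y1 power budget from the table above, in milliwatts.
budget_mw = {
    "Crossbar (analog)": 500,
    "ADC": 1000,
    "DAC": 800,
    "Compute engine": 300,
    "SRAM + DMA": 200,
    "Clock + I/O": 200,
}

total_mw = sum(budget_mw.values())
shares_pct = {name: round(100 * mw / total_mw) for name, mw in budget_mw.items()}

for name, mw in budget_mw.items():
    print(f"{name:18s} {mw / 1000:.1f} W  {shares_pct[name]:3d}%")
print(f"{'Total':18s} {total_mw / 1000:.1f} W")

# Data converters (ADC + DAC) as a fraction of the whole budget.
converter_share = (budget_mw["ADC"] + budget_mw["DAC"]) / total_mw
print(f"ADC + DAC share: {converter_share:.0%}")
```

The rounded shares add up to 101% because of rounding. The converters alone account for 60% of the budget, which is why ADC and DAC regions show up as hot spots later in the chapter.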

Activity factor:

In inference, only part of the chip is active. A typical 30% activity draws 3 W. 100% activity (worst case) would draw ~10 W, which exceeds TDP and triggers throttling.

DVFS (Dynamic Voltage Frequency Scaling):

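| Mode | Frequency | Voltage | Power | Performance |
|---|---|---|---|---|
| Idle | 100 MHz | 0.6 V | 100 mW | ~1% |
| Low | 250 MHz | 0.7 V | 500 mW | 25% |
| Mid | 500 MHz | 0.8 V | 1.5 W | 50% |
| High | 1 GHz | 1.0 V | 3 W | 100% |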

Dynamic power scales as V²f, so lowering the voltage saves disproportionately because it enters squared. DVFS is standard in modern CPUs and GPUs.
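
To see how far pure V²f scaling goes, the sketch below scales the High mode's 3 W down to each DVFS operating point and prints it next to the table value. Treating the entire 3 W as V²f-scalable is a simplifying assumption; the table's Low and Mid figures come out higher than this estimate, which would be consistent with part of the budget not scaling with the digital clock, though the source does not say so.

```python
# (frequency in Hz, voltage in V, power from the DVFS table in W)
DVFS_TABLE = {
    "Idle": (100e6, 0.6, 0.1),
    "Low": (250e6, 0.7, 0.5),
    "Mid": (500e6, 0.8, 1.5),
    "High": (1e9, 1.0, 3.0),
}


def v2f_power(freq_hz: float, vdd: float, ref=DVFS_TABLE["High"]) -> float:
    """Scale the reference mode's power by (V / V_ref)^2 * (f / f_ref)."""
    f_ref, v_ref, p_ref = ref
    return p_ref * (vdd / v_ref) ** 2 * (freq_hz / f_ref)


for mode, (freq_hz, vdd, p_table) in DVFS_TABLE.items():
    p_est = v2f_power(freq_hz, vdd)
    print(f"{mode:5s} V^2 f estimate {p_est:.3f} W   table {p_table:.3f} W")
```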

Power gating:

Cut power to idle CUs: zero power in the off state, with a wake-up time of ~1 µs. Y1 has 16 independently gated clusters.

L2 · Full

Thermal management:

The chip dissipates 3 W. If cooling is insufficient, temperature climbs → memristor drift accelerates → ECC margin shrinks.

Thermal resistance:

$\Delta T = P \cdot R_{th}$

  • $P$: power (W).
  • $R_{th}$: thermal resistance (°C/W).
  • $\Delta T$: temperature rise above ambient.

Y1 cooling options:

| Cooling | $R_{th}$ | $\Delta T$ @ 3 W | Device example |
|---|---|---|---|
| Passive (heat spreader) | 15 °C/W | 45°C | Smartphone |
| Small heat sink | 8 °C/W | 24°C | Laptop |
| Active fan | 3 °C/W | 9°C | Embedded server |
| Liquid | 0.5 °C/W | 1.5°C | Datacenter |

For Y1, a passive heat spreader is enough: 45°C rise + 25°C ambient = 70°C die. Memristors are OK up to 85°C.

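Y10 (30 W) → heat sink. Y100 (100 W) → active fan. Y1000 → liquid. The required thermal resistance falls as power rises: staying under 85°C at 25°C ambient leaves a 60°C rise, so the cooler must reach ≤ 2 °C/W at 30 W and ≤ 0.6 °C/W at 100 W, well below the Y1-class values in the table.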
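
The thermal-resistance relation can be checked numerically. The sketch below computes the die temperature for each cooling option at 3 W, then inverts the relation to get the largest thermal resistance that keeps a part under 85°C. The 25°C ambient and 85°C limit are the figures used in this section; applying them unchanged to Y10 and Y100 is an assumption.

```python
# Thermal resistance per cooling option, in degC/W (from the table above).
COOLING_R_TH = {
    "Passive (heat spreader)": 15.0,
    "Small heat sink": 8.0,
    "Active fan": 3.0,
    "Liquid": 0.5,
}


def die_temp_c(power_w: float, r_th: float, ambient_c: float = 25.0) -> float:
    """Steady-state die temperature: ambient + P * R_th."""
    return ambient_c + power_w * r_th


def max_r_th(power_w: float, t_max_c: float = 85.0, ambient_c: float = 25.0) -> float:
    """Largest thermal resistance that keeps the die at or below t_max_c."""
    return (t_max_c - ambient_c) / power_w


for name, r_th in COOLING_R_TH.items():
    print(f"{name:24s} die at {die_temp_c(3.0, r_th):.1f} degC for 3 W")

for chip, power_w in {"Y1": 3.0, "Y10": 30.0, "Y100": 100.0}.items():
    print(f"{chip:5s} needs R_th <= {max_r_th(power_w):.2f} degC/W")
```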

Throttling:

If die temperature approaches 80°C, the controller drops frequency → less power → cooler die. This is a feedback loop.

Typical throttle policy: at 80°C, DVFS High → Mid; at 90°C, → Low; at 100°C, critical shutdown.
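
A minimal sketch of that policy follows. `throttle_mode` maps a temperature reading to a DVFS mode using the three thresholds above, and `sustainable_mode` picks the fastest mode whose steady-state temperature stays below the first threshold. Both function names are hypothetical, the mode powers come from the DVFS table, and a real controller would likely add hysteresis and sensor filtering, which this sketch omits.

```python
from typing import Optional

# Mode power in watts, taken from the DVFS table.
MODE_POWER_W = {"High": 3.0, "Mid": 1.5, "Low": 0.5}


def throttle_mode(temp_c: float) -> str:
    """Map a die temperature reading to a DVFS mode."""
    if temp_c >= 100.0:
        return "Shutdown"
    if temp_c >= 90.0:
        return "Low"
    if temp_c >= 80.0:
        return "Mid"
    return "High"


def sustainable_mode(ambient_c: float, r_th: float, limit_c: float = 80.0) -> Optional[str]:
    """Fastest mode whose steady-state die temperature stays below limit_c."""
    for mode in ("High", "Mid", "Low"):
        if ambient_c + MODE_POWER_W[mode] * r_th < limit_c:
            return mode
    return None  # no mode can be held below the limit


print(throttle_mode(85.0))           # Mid
print(sustainable_mode(25.0, 15.0))  # High: 25 + 3.0 * 15 = 70 degC
print(sustainable_mode(45.0, 15.0))  # Mid: High would settle at 90 degC
```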

Thermal sensors:

Y1 has 4 thermal sensors per cluster (16 total). They are calibrated on-chip, and the compute engine reads them continuously.

L3 · Deep

Voltage regulator:

On-chip voltages are 0.6-1.0 V; the external supply is 1.8 V or 3.3 V. An LDO (Low Dropout Regulator) or a switching converter sits in between.

Y1: switched-capacitor regulator + bandgap reference. Output selectable at 0.6/0.7/0.8/1.0 V (the DVFS levels).

Efficiency:

LDO: efficiency is at most V_out / V_in ≈ 30% (1.0 V output, 3.3 V input). Low. Switching: 85-90%. Better.

Y1 uses a switching converter.
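
The efficiency gap translates directly into input power. The sketch below compares an ideal LDO (efficiency capped at V_out / V_in, quiescent current ignored) with a switching converter at the 85-90% range quoted above, both delivering 3 W at 1.0 V. The function names are illustrative.

```python
def ldo_efficiency(v_out: float, v_in: float) -> float:
    """Upper bound on LDO efficiency: the voltage drop is burned as heat."""
    return v_out / v_in


def input_power_w(load_w: float, efficiency: float) -> float:
    """Power drawn from the external supply to deliver load_w."""
    return load_w / efficiency


LOAD_W = 3.0  # Y1 TDP delivered at 1.0 V

for v_in in (3.3, 1.8):
    eta = ldo_efficiency(1.0, v_in)
    print(f"LDO from {v_in} V: {eta:.1%} efficient, draws {input_power_w(LOAD_W, eta):.2f} W")

for eta in (0.85, 0.90):
    print(f"Switching at {eta:.0%}: draws {input_power_w(LOAD_W, eta):.2f} W")
```

From a 3.3 V rail, an LDO would waste more than twice the chip's own power as heat, which is hard to justify inside a 3 W thermal budget.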

Power delivery network (PDN):

A 3 W draw at 1 V means 3 A, which needs thick metal wires. The PDN typically uses the upper 5 layers (M16-M20).

PDN IR drop: a 100 mV droop (10% of 1 V) is tolerable. Decoupling capacitors (~1 nF/mm²) help.
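
The droop budget sets a ceiling on PDN resistance. The sketch below treats the whole network as a single lumped resistor, which is a deliberate simplification (a real PDN is a distributed mesh with package and bump contributions), and derives the current and the maximum resistance from the figures above.

```python
def load_current_a(power_w: float, vdd: float) -> float:
    """Supply current: I = P / V."""
    return power_w / vdd


def max_pdn_resistance_ohm(droop_v: float, current_a: float) -> float:
    """Largest lumped resistance that keeps the IR drop within droop_v."""
    return droop_v / current_a


current = load_current_a(3.0, 1.0)            # 3 W at 1 V
r_max = max_pdn_resistance_ohm(0.100, current)  # 100 mV budget

print(f"load current: {current:.1f} A")
print(f"max lumped PDN resistance: {r_max * 1e3:.1f} mOhm")
```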

Boot-time power-up:

Chip wake sequence: voltages ramp, the clock tree stabilizes, calibration runs, the model loads. Total: ~1 second.

Low-power modes:

  • Active: full operation, 3 W.
  • Idle: clock-gated, only SRAM active, ~50 mW.
  • Sleep: most blocks off, periodic wake, ~5 mW.
  • Deep sleep: only RTC, ~0.5 mW (memristor non-volatile, model preserved).

Mobile devices spend most of their time in deep sleep, which lengthens battery life.
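
The effect of duty cycling can be estimated as a time-weighted average of the mode powers listed above. In the sketch below, the 1% active / 99% deep sleep split is a hypothetical usage profile, not a measured one, and wake-up energy between modes is ignored.

```python
# Mode power in watts, from the low-power mode list.
MODE_POWER_W = {
    "Active": 3.0,
    "Idle": 0.050,
    "Sleep": 0.005,
    "Deep sleep": 0.0005,
}


def average_power_w(time_fractions: dict) -> float:
    """Time-weighted average power; fractions must sum to 1."""
    total = sum(time_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"time fractions sum to {total}, expected 1.0")
    return sum(MODE_POWER_W[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical profile: active 1% of the time, deep sleep otherwise.
profile = {"Active": 0.01, "Deep sleep": 0.99}
avg_w = average_power_w(profile)
print(f"average power: {avg_w * 1e3:.1f} mW")
```

At this duty cycle the active bursts dominate: deep sleep contributes about 0.5 mW of the roughly 30.5 mW average.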

Thermal modeling:

Chip thermal map: hot spots at ADC + DAC regions. Compiler load-balances across hot spots.

Spread heat across the chip → uniform temperature → uniform memristor drift.

Y100 thermal: 100 W over 1 cm² gives a heat density of ~100 W/cm² → cooling is critical. Microfluidic cooling (chapter 5.15).

Experiment: Y1 Power Profile

Scenario: GPT-2 inference, 30% average activity.

Instantaneous power profile:

| Phase | Active block | Power |
|---|---|---|
| Token input | DMA + L3 | 0.5 W |
| Embedding lookup | 1 cluster | 1.5 W |
| Layer 1 attention | 4 CUs | 2.0 W |
| Layer 1 FFN | 4 CUs | 2.5 W |
| Layers 2-12 sequentially | 4 CUs/seq | 2.5 W avg |
| Output projection | 1 cluster | 1.0 W |
| Token output | DMA + L3 | 0.5 W |

Average: ~2 W per token (over 1 µs).

1000 tokens (1 ms): 2 W × 1 ms = 2 mJ. Easy to cool.
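
The energy figures follow from E = P × t. The short sketch below reproduces them from the ~2 W average and 1 µs per token stated above; the tokens-per-joule figure is derived from those two numbers only.

```python
def energy_j(power_w: float, duration_s: float) -> float:
    """Energy in joules: E = P * t."""
    return power_w * duration_s


AVG_POWER_W = 2.0      # average power during a token
TOKEN_TIME_S = 1e-6    # 1 microsecond per token

per_token_j = energy_j(AVG_POWER_W, TOKEN_TIME_S)
batch_j = 1000 * per_token_j
tokens_per_joule = 1.0 / per_token_j

print(f"per token:   {per_token_j * 1e6:.1f} uJ")
print(f"1000 tokens: {batch_j * 1e3:.1f} mJ")
print(f"tokens per joule: {tokens_per_joule:,.0f}")
```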

Worst case (sustained max):

All 16 clusters + every CU active → ~10 W, above the 3 W TDP → throttle: 1 GHz → 500 MHz gives 4 W, still over → 250 MHz gives 1.5 W. That is the Low mode in the DVFS table: 25% of peak performance.

Temperature:

  • Passive cooling 15°C/W → 3 W × 15 = 45°C above ambient.
  • 25°C ambient + 45°C = 70°C die. Within memristor 85°C limit.

1 hour sustained: 3 W × 3600 s = 10.8 kJ. In battery terms: an 18650 cell (3.6 V × 3 Ah = 10.8 Wh ≈ 39 kJ) lasts 3.6 hours at a sustained 3 W. A mobile scenario at 10% activity (0.3 W) → 36 hours.
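
The 18650 arithmetic can be written out as a small sketch. It assumes the cell's full nominal energy is usable and ignores regulator losses and the rest of the system, so real runtimes would be shorter.

```python
def battery_energy_wh(voltage_v: float, capacity_ah: float) -> float:
    """Nominal battery energy in watt-hours."""
    return voltage_v * capacity_ah


def runtime_h(energy_wh: float, load_w: float) -> float:
    """Hours of runtime at a constant load."""
    return energy_wh / load_w


cell_wh = battery_energy_wh(3.6, 3.0)  # 18650 cell: 3.6 V, 3 Ah
cell_kj = cell_wh * 3.6                # 1 Wh = 3.6 kJ

print(f"cell energy: {cell_wh:.1f} Wh = {cell_kj:.1f} kJ")
print(f"sustained 3 W:      {runtime_h(cell_wh, 3.0):.1f} h")
print(f"10% activity 0.3 W: {runtime_h(cell_wh, 0.3):.1f} h")
```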

Quick Quiz

Question 1 of 6: What is Y1's TDP?

Lab Exercise

Battery-life calculation for an edge device.

Device:

  • SIDRA Y1 (3 W TDP, 10% average activity) + ARM CPU (1 W) + sensors (0.5 W).
  • Battery: 4000 mAh @ 3.7 V = 14.8 Wh = 53 kJ.

Use scenarios:

(a) 100% SIDRA active (continuous inference): battery life?

(b) 10% SIDRA active (always-on listening): battery life?

(c) 1% SIDRA + idle (deep sleep most of the time): battery life?

(d) Compare: the same device with a GPU instead (700 W × 1% = 7 W): battery insufficient.

(e) Smart-earbuds battery target with SIDRA?

Solutions

(a) 3 W (SIDRA) + 1 W (CPU) + 0.5 W (sensor) = 4.5 W. 14.8 / 4.5 = 3.3 hours. Continuous AI is short.

(b) 0.3 W (SIDRA 10%) + 1 W + 0.5 W = 1.8 W. 14.8 / 1.8 = 8.2 hours. Good for a speech assistant.

(c) 0.03 W (SIDRA 1%) + 0.05 W (CPU sleep) + 0.1 W (sensor) = 0.18 W. 14.8 / 0.18 = 82 hours = 3.4 days. Smartwatch-class continuous wear.

(d) GPU 7 W + system 1.5 W = 8.5 W → 14.8 / 8.5 = 1.7 hours. SIDRA at 8.2 hours lasts roughly 5× longer.

(e) A SIDRA Y1-based earbud with a 24-hour battery target is realistic (always-on speech recognition + translation).
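
Solutions (a) through (d) can be reproduced with one helper. The sketch below lists each scenario's component loads as given in the solutions and divides the 14.8 Wh battery by their sum; the scenario labels and dictionary layout are illustrative.

```python
BATTERY_WH = 4.0 * 3.7  # 4000 mAh at 3.7 V = 14.8 Wh

# Component loads in watts, as listed in the solutions above.
SCENARIOS_W = {
    "(a) continuous inference": [3.0, 1.0, 0.5],  # SIDRA, CPU, sensors
    "(b) 10% active": [0.3, 1.0, 0.5],
    "(c) 1% active + sleep": [0.03, 0.05, 0.1],
    "(d) GPU at 1%": [7.0, 1.5],                  # GPU, rest of system
}


def battery_life_h(loads_w, battery_wh: float = BATTERY_WH) -> float:
    """Hours of runtime for the summed component loads."""
    return battery_wh / sum(loads_w)


for name, loads in SCENARIOS_W.items():
    hours = battery_life_h(loads)
    print(f"{name:26s} {sum(loads):5.2f} W  {hours:6.1f} h")
```

In scenario (b) the CPU and sensors draw 1.5 W of the 1.8 W total, so further battery gains there would have to come from the rest of the system rather than from SIDRA.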

Cheat Sheet

  • Y1 TDP: 3 W.
  • Power budget: ADC 33%, DAC 27%, crossbar 17%, compute 10%.
  • DVFS: 4 modes, V²f savings.
  • Cooling: Y1 passive, Y10 heat sink, Y100 active/liquid.
  • Power gating: idle blocks off.
  • Temperature: Y1 ~70°C (limit 85°C).
  • Battery life: edge device 8-80 hours by activity.

Vision: The Sustainable AI Era

  • Y1: 3 W → mobile/edge.
  • Y3: 10 W → laptop, smart camera.
  • Y10: 30 W → workstation, datacenter blade.
  • Y100: 100 W → datacenter (liquid cooling).
  • Y1000: 100 W but 100× performance → photonic + 3D stack.

For Türkiye: edge AI devices are a new market. SIDRA-based 24-hour-battery devices can ship from 2027-2028. ASELSAN, BİLGEM, university startups can lead this front.

Further Reading

  • Next chapter: 5.12 — Metal Lines and IR Drop
  • Previous: 5.10 — Noise Models
  • Thermal management: Pedram, Power Aware Design Methodologies, Springer.
  • DVFS: Hennessy & Patterson, Computer Architecture, Ch. 1.
  • Microfluidic cooling: Tuckerman & Pease, IEEE EDL 1981 (classic).