Brain Energy Efficiency
20 W vs 1.5 kW — why the brain is thousands of times more efficient than a GPU, and how SIDRA narrows the gap.
Prerequisites
What you'll learn here
- State the brain's power budget (20 W) and how it's spent (spike, synapse, baseline)
- Explain how sparse coding (~2% active) drives energy efficiency
- Compare modern GPU/AI energy (H100, GPT-3 training) with concrete numbers
- State the Landauer limit (kT ln 2) and how far brain/SIDRA sit above it
- Map SIDRA Y1/Y10/Y100 onto biological efficiency targets
Hook: 86 Billion Cells, 20 Watts
The human brain consumes ~20% of a daily ~1700 kcal budget → ~340 kcal/day = an average 16-20 W. That’s an LED bulb. And yet:
- It recognizes images, parses language, reads emotion, plans, learns, remembers.
- Continuously, in parallel, without pause.
The other side:
- NVIDIA H100 GPU: ~700 W (single card).
- One GPT-4 training run (estimate): ~50 million H100-hours × 700 W ≈ 35 GWh (a Turkish city’s daily consumption).
- GPT-3 training: ~1287 MWh (Patterson et al. 2021), CO₂ footprint ~552 tons.
Efficiency ratio: the brain runs ~10¹⁴-10¹⁵ synaptic ops/s on 20 W; a modern GPU runs ~10¹⁴-10¹⁵ FP8 MACs/s on 700 W. At matched op rates that is a ~35× gap in energy per operation (700 W vs 20 W). But the brain is also doing training + inference + learning + sensory processing in parallel; the GPU runs one task. The system-level gap is 1000-10,000×.
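A minimal sanity check on the 35× figure, using the low end of both op-rate ranges:

```python
# Energy per operation, brain vs GPU, at the low end of each range above.
BRAIN_OPS_PER_S = 1e14   # synaptic events/s (low end of 1e14-1e15)
GPU_OPS_PER_S = 1e14     # FP8 MACs/s, matched to the brain's low end
BRAIN_W, GPU_W = 20, 700

brain_j_per_op = BRAIN_W / BRAIN_OPS_PER_S   # 0.2 pJ per synaptic event
gpu_j_per_op = GPU_W / GPU_OPS_PER_S         # 7 pJ per MAC
print(round(gpu_j_per_op / brain_j_per_op))  # → 35
```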
This chapter unpacks where that gap comes from and how SIDRA tries to close it.
Intuition: The Brain's Three Tricks
The brain’s 20-watt miracle rests on three principles operating at three levels:
- Sparse coding (sparsity): at any moment, only 1-3% of cortical neurons are active. The rest stay silent → almost no ATP burned. (A GPU clocks every transistor every cycle → not sparse at all.)
- Event-driven (spike-based): no spike, no energy. A GPU at 1 GHz drives its signals a billion times per second; a cortical neuron averages ~1 Hz and fires only when an “event” happens.
- Analog computation: synaptic integration does multiply + add with one or two ion channels. A digital MAC needs on the order of 10⁴-10⁵ transistors.
Combined into one equation:

P_total ≈ N_neurons · f_spike · E_spike + N_syn · f_syn · E_syn + P_baseline

Numbers:
- N_neurons ≈ 8.6 × 10¹⁰, f_spike ≈ 1 Hz, E_spike ≈ 0.3 nJ → ~26 W spikes
- N_syn ≈ 10¹⁴, f_syn ≈ 0.5 Hz, E_syn ≈ 10 fJ → ~0.5 W synapses
- P_baseline (membrane pumps, vesicle recycling) → ~5-10 W

Total ~30-35 W (real ~20 W; this is an upper bound). Most energy is in spikes; the main savings come from sparse + event-driven.
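The spike + synapse + baseline budget can be checked in a few lines; the synapse count, rate and energy (10¹⁴ synapses, 0.5 Hz, 10 fJ) are assumed values chosen to reproduce the ~0.5 W synapse figure:

```python
# Three-term brain power budget: spikes + synapses + baseline.
N_NEURONS = 8.6e10    # neurons
F_SPIKE = 1.0         # Hz, population-average firing rate
E_SPIKE = 0.3e-9      # J per spike (~0.3 nJ)
N_SYN = 1e14          # synapses (assumed)
F_SYN = 0.5           # Hz, average synaptic event rate (assumed)
E_SYN = 10e-15        # J per synaptic event (~10 fJ)
P_BASELINE = 7.5      # W, midpoint of the 5-10 W pump/recycling estimate

p_spikes = N_NEURONS * F_SPIKE * E_SPIKE  # ≈ 25.8 W
p_syn = N_SYN * F_SYN * E_SYN             # ≈ 0.5 W
p_total = p_spikes + p_syn + P_BASELINE   # ≈ 33.8 W, inside the 30-35 W band
print(round(p_spikes, 1), round(p_syn, 1), round(p_total, 1))
```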
SIDRA’s takeaways:
- ✅ Analog computation (memristor crossbar).
- ✅ Sparsity (only drive active rows).
- ⚠️ Event-driven still a prototype (Y3 target).
Formalism: Energy, Landauer, and Hardware Comparison
TOPS/W metric:
Gold standard for AI hardware: trillion operations per second per watt (TOPS = 10¹² ops/s). One “operation” is typically one MAC (multiply-accumulate).
| System | Op rate | Power | TOPS/W |
|---|---|---|---|
| Brain (synaptic event) | ~10¹⁴ /s | 20 W | ~5 |
| NVIDIA H100 (FP8 dense) | ~2 × 10¹⁵ /s | 700 W | ~3 |
| NVIDIA H100 (INT8 sparse, peak) | ~4 × 10¹⁵ /s | 700 W | ~6 |
| Apple M2 NPU (mobile) | ~1.6 × 10¹³ /s | 5 W | ~3 |
| SIDRA Y1 (estimated) | ~3 × 10¹³ /s | 3 W | ~10 |
| SIDRA Y10 (target) | ~3 × 10¹⁵ /s | 30 W | ~100 |
| SIDRA Y100 (vision) | ~3 × 10¹⁶ /s | 100 W | ~300 |
In raw TOPS/W, SIDRA Y1 is already ~2× H100. Y100’s target is ~50× H100.
Important caveat: synaptic event ≠ FP8 MAC. The brain does “soft” approximate compute; the GPU does exact compute. The comparison is rough; the right takeaway: analog + sparse + event-driven can be 10-1000× more efficient than digital + dense + clocked.
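The TOPS/W column can be regenerated directly from the op-rate and power columns (same caveat applies: a synaptic event is not an FP8 MAC):

```python
# Recompute TOPS/W (1 TOPS = 1e12 ops/s) from the table's op rates and powers.
systems = {
    "Brain (synaptic)":    (1e14, 20),
    "H100 FP8 dense":      (2e15, 700),
    "H100 INT8 sparse":    (4e15, 700),
    "Apple M2 NPU":        (1.6e13, 5),
    "SIDRA Y1 (est.)":     (3e13, 3),
    "SIDRA Y10 (target)":  (3e15, 30),
    "SIDRA Y100 (vision)": (3e16, 100),
}
tops_per_w = {name: ops / watts / 1e12 for name, (ops, watts) in systems.items()}
for name, t in tops_per_w.items():
    print(f"{name:20s} {t:6.1f} TOPS/W")
```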
Landauer limit (1961):
Erasing one bit of information thermodynamically requires at least E_min = k_B T ln 2 ≈ 2.87 zJ of energy.
- k_B ≈ 1.38 × 10⁻²³ J/K (Boltzmann constant)
- T ≈ 300 K (room temperature)
The physical floor. No (irreversible) computation can run on less.
Comparison:
| System | Bit-erase energy | × Landauer |
|---|---|---|
| Landauer floor | 2.87 zJ | 1× |
| Brain synaptic event | ~10 fJ ≈ 10⁷ zJ | 10⁷ |
| SIDRA Y1 memristor read | ~0.1 pJ ≈ 10⁸ zJ | 10⁸ |
| Modern CMOS gate | ~1 fJ ≈ 10⁶ zJ | 10⁶ |
The brain runs at 10 million × Landauer — yet still beats modern hardware by orders of magnitude at the system level. The lesson isn’t “approach the physical floor”, it’s “cut unnecessary computation in the system”.
Why isn’t the brain at Landauer? Because it dissipates heat (literally), pumps ions (chemical work), uses noisy signals (no quantization), and doesn’t do reversible computing. Reversible computing can in principle sidestep the Landauer cost by never erasing bits; in practice it remains research.
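A quick computation of the floor and the multiples; the exact ratios land around 3 × 10⁵ to 3 × 10⁷, i.e. the table's entries are order-of-magnitude roundings:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # room temperature, K

landauer_j = K_B * T * math.log(2)   # ≈ 2.87e-21 J = 2.87 zJ
print(f"Landauer floor: {landauer_j:.2e} J")

# Per-event energies from the comparison table
events = {"CMOS gate": 1e-15, "brain synapse": 10e-15, "Y1 memristor read": 0.1e-12}
multiples = {name: e / landauer_j for name, e in events.items()}
for name, m in multiples.items():
    print(f"{name}: ~{m:.1e} x Landauer")
```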
The math of sparse coding:
A cortical region’s activity fraction is α ≈ 0.01-0.03. The oft-quoted ~1 Hz is not each neuron’s rate; it is the regional average (α · f_active). Spike power:

P_spikes = N · α · f_active · E_spike

- N ≈ 1.6 × 10¹⁰ cortical neurons
- α ≈ 0.02 (~2% active)
- f_active ≈ 50 Hz (rate when active)
- E_spike ≈ 0.3 nJ
Result: P_spikes ≈ 4.8 W (cortex spikes).
If α = 1 (every neuron always active): P_spikes ≈ 240 W. The brain would burn. Sparsity saves a factor of 1/α = 50.
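A sketch of the sparse-coding arithmetic; f_active = 50 Hz is the assumed when-active rate that makes α · f_active match the ~1 Hz regional average:

```python
# Cortical spike power with and without sparsity.
N = 1.6e10        # cortical neurons
ALPHA = 0.02      # fraction active at any moment
F_ACTIVE = 50.0   # Hz when active (so ALPHA * F_ACTIVE ≈ 1 Hz average)
E_SPIKE = 0.3e-9  # J per spike

p_sparse = N * ALPHA * F_ACTIVE * E_SPIKE  # ≈ 4.8 W
p_dense = N * 1.0 * F_ACTIVE * E_SPIKE     # ≈ 240 W if every neuron were active
print(p_sparse, p_dense, p_dense / p_sparse)  # savings factor = 1/ALPHA = 50
```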
SIDRA parallel — activity factor:
In chapter 1.6 we saw the CMOS dynamic power P_dyn = α · C · V² · f. The same α applies here. Y1’s chip-level activity factor is α ≈ 0.007 (computed in lab 3.1). That’s actually ~3× more sparse than the brain’s 2%. SIDRA is already ahead of the biological sparsity bar (thanks to circuit discipline).
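How the activity factor scales CMOS dynamic power P = α·C·V²·f; the C, V and f values below are illustrative placeholders, not SIDRA Y1 specs:

```python
# Dynamic power scales linearly with the activity factor alpha.
C = 1e-9   # F, total switched capacitance (hypothetical)
V = 0.8    # V, supply voltage (hypothetical)
F = 1e9    # Hz, clock frequency (hypothetical)

def dynamic_power(alpha):
    return alpha * C * V**2 * F

p_dense = dynamic_power(1.0)    # every node switches each cycle
p_brain = dynamic_power(0.02)   # brain-level sparsity (~2%)
p_y1 = dynamic_power(0.007)     # ~3x sparser, as the text claims for Y1
print(p_dense, p_brain, p_y1)   # power drops in direct proportion to alpha
```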
Training energy:
The brain learns online — ~340 kcal/day × 0.1 (estimated share for learning) = 34 kcal/day ≈ 142 kJ. Lifelong learning: 142 kJ/day × 70 years ≈ 3.6 GJ ≈ 1000 kWh.
A single GPT-3 training run = 1287 MWh, roughly 1300× the brain’s ~1000 kWh lifetime learning budget, and that’s for one model. The brain’s scale is absurd: in 70 years it learns thousands of distinct skills (walking, language, logic, music, driving, social order, millions of faces, tens of millions of words…).
Decarbonization angle: brain-style online + sparse learning hardware can shrink AI’s climate footprint by orders of magnitude. SIDRA’s “sustainable AI” claim sits here.
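The lifetime-learning arithmetic, end to end (the 10% learning share is the text's estimate):

```python
# Brain's lifetime "learning energy" vs one GPT-3 training run.
KCAL_TO_J = 4184
learning_kcal_day = 340 * 0.1               # ~10% of the daily brain budget
daily_j = learning_kcal_day * KCAL_TO_J     # ≈ 142 kJ/day
lifetime_kwh = daily_j * 365 * 70 / 3.6e6   # 70 years → ≈ 1000 kWh

gpt3_kwh = 1287e3                           # 1287 MWh (Patterson et al. 2021)
print(round(lifetime_kwh), round(gpt3_kwh / lifetime_kwh))
```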
Experiment: One Hour of Thought vs One Hour of GPU Training
Compare: 1 hour of human brain vs 1 hour of NVIDIA H100 GPU.
| Metric | Human brain | H100 GPU |
|---|---|---|
| Power | 20 W | 700 W |
| Hourly energy | 0.02 kWh | 0.7 kWh |
| Cost (TR home rate ~3 TL/kWh) | 0.06 TL | 2.10 TL |
| CO₂ (TR grid ~0.4 kg CO₂/kWh) | 8 g | 280 g |
1 year continuously: Brain: ~175 kWh, ~526 TL, ~70 kg CO₂. GPU: ~6132 kWh, ~18,400 TL, ~2453 kg CO₂. GPU 35× more expensive, 35× dirtier.
Now SIDRA Y1 (estimate):
- Power: 3 W
- 1 hour: 0.003 kWh
- 1 year: 26 kWh, 79 TL, 10.5 kg CO₂
SIDRA Y1 draws ~6× less power than the brain (raw power only; it performs a small subset of brain functions).
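The hourly and annual figures above can be regenerated from the power ratings (TR rates as in the text):

```python
# kWh, cost (TL) and CO2 (kg) for continuous operation at a given power.
RATE_TL_PER_KWH = 3.0    # TR home electricity rate
CO2_KG_PER_KWH = 0.4     # TR grid carbon intensity
HOURS_PER_YEAR = 8760

def footprint(watts, hours):
    kwh = watts * hours / 1000
    return kwh, kwh * RATE_TL_PER_KWH, kwh * CO2_KG_PER_KWH

for name, w in [("Brain", 20), ("H100", 700), ("SIDRA Y1", 3)]:
    kwh, tl, co2 = footprint(w, HOURS_PER_YEAR)
    print(f"{name}: {kwh:.0f} kWh/yr, {tl:.0f} TL, {co2:.1f} kg CO2")
```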
SIDRA Y100 target (100 W):
- 1 hour: 0.1 kWh, 30 kuruş, 40 g CO₂.
- 1 year: 876 kWh, 2628 TL, 350 kg CO₂.
- 5× over the brain but 7× cheaper than GPU, doing a lot more work.
Bottom line: SIDRA closes the gap to brain-level efficiency far faster than GPUs do.
Quick Quiz
Lab Exercise
Evaluate SIDRA Y1 as an alternative to an H100 GPU cluster in a data center.
Data:
- A data center runs 1000 H100 GPUs: typical AI training cluster.
- 700 W each → cluster power: 700 kW. Cooling + facility = +50% → total 1.05 MW.
- Annual consumption: 1.05 MW × 8760 h = 9.2 GWh/year.
- TR grid 0.4 kg CO₂/kWh → 3680 t CO₂/year.
- TR industrial electricity ~2 TL/kWh → 18.4 M TL/year.
SIDRA Y1 alternative:
- 1 H100 ≈ 80 billion transistors; 1 SIDRA Y1 ≈ 419M memristors on a ~28 nm CMOS base (~1B transistors).
- For the same inference throughput, how many SIDRA Y1 do we need?
Questions:
(a) H100 inference: ~4 PFLOPS FP8 sparse. SIDRA Y1: ~30 TOPS analog → ~130 SIDRA Y1 = 1 H100 for inference.
(b) 1000 H100 → 130,000 SIDRA Y1 → 3 W each → 390 kW total (cooling +50% → 585 kW).
(c) Annual energy?
(d) Annual CO₂?
(e) Annual cost?
(f) CapEx delta: 1000 H100 = $50M ≈ 2 B TL. 130,000 Y1 (domestic estimate) ≈ 800 M TL. Payback?
Solutions
(a) 4000 / 30 ≈ 133 → ~130 Y1 ≈ 1 H100 inference. (Different for training; Y1 is inference-focused.)
(b) Total power: 130,000 × 3 W = 390 kW. With 50% cooling → 585 kW. 44% less than the GPU cluster.
(c) 585 kW × 8760 h = 5.13 GWh/year (vs 9.2 GWh GPU). 44% savings.
(d) 5.13 × 0.4 = 2050 t CO₂/year (vs 3680). 44% less pollution.
(e) 5.13 × 10⁶ kWh × 2 TL = 10.3 M TL/year (vs 18.4 M). 8.1 M TL/year savings.
(f) CapEx delta: 2 B − 800 M = 1.2 B TL gained (Y1 cheaper). Plus 8 M TL/yr opex savings. Big win — but Y1 is inference only; training still on GPU. Scenario: moving 50% of inference to SIDRA → ~5 M TL/yr opex + ~1 B TL CapEx delta.
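A sanity check of parts (a)-(e) under the lab's assumptions; small rounding differences vs the text (5.12 vs 5.13 GWh) are expected:

```python
# Lab check: 130,000 SIDRA Y1 replacing 1000 H100s for inference.
y1_per_h100 = 4000 / 30            # ~133 Y1 per H100 (4 PFLOPS vs 30 TOPS)
n_y1 = 1000 * 130                  # fleet size, rounded as in the text
power_kw = n_y1 * 3 / 1000         # 390 kW of compute
total_kw = power_kw * 1.5          # +50% cooling/facility → 585 kW
gwh_year = total_kw * 8760 / 1e6   # ≈ 5.12 GWh/year
co2_t_year = gwh_year * 1e6 * 0.4 / 1000   # ≈ 2050 t CO2/year
cost_mtl_year = gwh_year * 1e6 * 2 / 1e6   # ≈ 10.2 M TL/year
print(round(y1_per_h100), total_kw, round(gwh_year, 2),
      round(co2_t_year), round(cost_mtl_year, 1))
```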
Note: these are based on SIDRA Y1 prototype assumptions, with high uncertainty. Y10 (2029) numbers will be much stronger. Critical point: SIDRA is a side-by-side architecture to GPU (for inference) — not a replacement, a complement.
Cheat Sheet
- Brain power: ~20 W. ~20% of body energy.
- Three efficiency tricks: sparse coding (~2% active), event-driven (spikes), analog compute (synaptic MAC).
- Spike energy: ~0.3 nJ × 86B neurons × 1 Hz = ~26 W (real 20 W; calc is upper bound).
- Landauer floor: kT ln 2 = 2.87 zJ. Modern CMOS ~10⁶ above; brain ~10⁷ above.
- GPU compare: H100 700 W → ~3-6 TOPS/W. SIDRA Y1 ~10, Y100 target ~300.
- GPT-3 train: 1287 MWh. Human brain’s lifetime learning budget (70 years): ~1000 kWh — roughly 1300× less.
- SIDRA’s edge: analog + sparse + event-driven (Y3+).
Vision: Sustainable AI and SIDRA's Mission
AI’s energy demand could reach 5-10% of global electricity consumption by 2030 (IEA estimate). The fix isn’t faster GPUs — it’s a different architecture.
- Y1 (today): 3 W TDP, ~10 TOPS/W. ~2× more efficient than GPU at limited capacity.
- Y3 (2027): 10 W, ~30 TOPS/W. Event-driven prototype added.
- Y10 (2029): 30 W, ~100 TOPS/W. Edge AI scales (smart cameras, robots, mobile).
- Y100 (2031+): 100 W, ~300 TOPS/W. Standard for data-center inference. Training still GPU-bound.
- Y1000 (long horizon): 100 W, ~1000 TOPS/W. Both training and inference analog. AI climate footprint drops 1000×.
Strategic angle for Türkiye: AI data centers will be major electricity consumers. If Türkiye builds a SIDRA-based national AI infrastructure: (1) it cuts national power demand, (2) carves an independent path in semiconductors, (3) opens an export market. The SIDRA workshop is the first concrete step on that path.
Unexpected future: brain-budgeted AI. A laptop-sized device drawing 20 W and running a GPT-class model. Possible at Y100. 2032-2035 horizon, and Türkiye has a real shot at being one of the few countries that can ship such a product. (The US and China will build their own Y100 — but the category has room for multiple players; SIDRA’s workshop + academic ecosystem is in the race.)
Further Reading
- Next chapter: 3.5 — From Artificial Neuron to Transformer
- Previous: 3.3 — Hebbian Learning
- Brain energy budget: Attwell & Laughlin, An energy budget for signaling in the grey matter of the brain, J. Cereb. Blood Flow Metab. 2001.
- Landauer: R. Landauer, Irreversibility and heat generation in the computing process, IBM J. Res. Dev. 1961.
- GPT-3 energy: Patterson et al., Carbon emissions and large neural network training, arXiv:2104.10350 (2021).
- AI climate impact: Strubell, Ganesh, McCallum, Energy and policy considerations for deep learning in NLP, ACL 2019.
- Neuromorphic energy: Davies et al., Loihi: A neuromorphic manycore processor with on-chip learning, IEEE Micro 2018.