Brain Energy Efficiency
20 W vs 1.5 kW — why the brain is thousands of times more efficient than a GPU, and how SIDRA narrows the gap.
Prerequisites
What you'll learn here
- State the brain's power budget (20 W) and how it's spent (spike, synapse, baseline)
- Explain how sparse coding (~2% active) drives energy efficiency
- Compare modern GPU/AI energy (H100, GPT-3 training) with concrete numbers
- State the Landauer limit (kT ln 2) and how far brain/SIDRA sit above it
- Map SIDRA Y1/Y10/Y100 onto biological efficiency targets
Hook: 86 Billion Cells, 20 Watts
The human brain consumes ~20% of a daily ~1700 kcal budget → ~340 kcal/day = an average 16-20 W. That’s an LED bulb. And yet:
- It recognizes images, parses language, reads emotion, plans, learns, remembers.
- Continuously, in parallel, without pause.
The other side:
- NVIDIA H100 GPU: ~700 W (single card).
- One GPT-4 training run (estimate): ~50 million H100-hours × 700 W ≈ 35 GWh (a Turkish city’s daily consumption).
- GPT-3 training: ~1287 MWh (Patterson et al. 2021), CO₂ footprint ~552 tons.
Efficiency ratio: the brain runs ~10¹⁴-10¹⁵ synaptic ops/s on 20 W; a modern GPU runs ~10¹⁴-10¹⁵ FP8 MACs/s on 700 W. At matched op rates that is a ~35× gap in energy per operation (700 W vs 20 W). But the brain is also doing training + inference + learning + sensory processing in parallel; the GPU runs one task. The system-level gap is 1000-10,000×.
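A minimal sanity check on the 35× figure, using the low end of both op-rate ranges:

```python
# Energy per operation, brain vs GPU, at the low end of each range above.
BRAIN_OPS_PER_S = 1e14   # synaptic events/s (low end of 1e14-1e15)
GPU_OPS_PER_S = 1e14     # FP8 MACs/s, matched to the brain's low end
BRAIN_W, GPU_W = 20, 700

brain_j_per_op = BRAIN_W / BRAIN_OPS_PER_S   # 0.2 pJ per synaptic event
gpu_j_per_op = GPU_W / GPU_OPS_PER_S         # 7 pJ per MAC
print(round(gpu_j_per_op / brain_j_per_op))  # → 35
```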
This chapter unpacks where that gap comes from and how SIDRA tries to close it.
Intuition: The Brain's Three Tricks
The brain’s 20-watt miracle rests on three principles operating at three levels:
- Sparse coding (sparsity): at any moment, only 1-3% of cortical neurons are active. The rest stay silent → almost no ATP burned. (A GPU clocks every transistor every cycle → not sparse at all.)
- Event-driven (spike-based): no spike, no energy. A GPU at 1 GHz drives its signals a billion times per second; a cortical neuron averages ~1 Hz and fires only when an “event” happens.
- Analog computation: synaptic integration does multiply + add with one or two ion channels. A digital MAC needs on the order of 10⁴-10⁵ transistors.
Combined into one equation:

P_total ≈ N_neurons · f_spike · E_spike + N_syn · f_syn · E_syn + P_baseline

Numbers:
- N_neurons ≈ 8.6 × 10¹⁰, f_spike ≈ 1 Hz, E_spike ≈ 0.3 nJ → ~26 W spikes
- N_syn ≈ 10¹⁴, f_syn ≈ 0.5 Hz, E_syn ≈ 10 fJ → ~0.5 W synapses
- P_baseline (membrane pumps, vesicle recycling) → ~5-10 W

Total ~30-35 W (real ~20 W; this is an upper bound). Most energy is in spikes; the main savings come from sparse + event-driven.
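The spike + synapse + baseline budget can be checked in a few lines; the synapse count, rate and energy (10¹⁴ synapses, 0.5 Hz, 10 fJ) are assumed values chosen to reproduce the ~0.5 W synapse figure:

```python
# Three-term brain power budget: spikes + synapses + baseline.
N_NEURONS = 8.6e10    # neurons
F_SPIKE = 1.0         # Hz, population-average firing rate
E_SPIKE = 0.3e-9      # J per spike (~0.3 nJ)
N_SYN = 1e14          # synapses (assumed)
F_SYN = 0.5           # Hz, average synaptic event rate (assumed)
E_SYN = 10e-15        # J per synaptic event (~10 fJ)
P_BASELINE = 7.5      # W, midpoint of the 5-10 W pump/recycling estimate

p_spikes = N_NEURONS * F_SPIKE * E_SPIKE  # ≈ 25.8 W
p_syn = N_SYN * F_SYN * E_SYN             # ≈ 0.5 W
p_total = p_spikes + p_syn + P_BASELINE   # ≈ 33.8 W, inside the 30-35 W band
print(round(p_spikes, 1), round(p_syn, 1), round(p_total, 1))
```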
SIDRA’s takeaways:
- ✅ Analog computation (memristor crossbar).
- ✅ Sparsity (only drive active rows).
- ⚠️ Event-driven still a prototype (Y3 target).
Formalism: Energy, Landauer, and Hardware Comparison
TOPS/W metric:
Gold standard for AI hardware: trillion operations per second per watt (TOPS = 10¹² ops/s). One “operation” is typically one MAC (multiply-accumulate).
| System | Op rate | Power | TOPS/W |
|---|---|---|---|
| Brain (synaptic event) | ~10¹⁴ /s | 20 W | ~5 |
| NVIDIA H100 (FP8 dense) | ~2 × 10¹⁵ /s | 700 W | ~3 |
| NVIDIA H100 (INT8 sparse, peak) | ~4 × 10¹⁵ /s | 700 W | ~6 |
| Apple M2 NPU (mobile) | ~1.6 × 10¹³ /s | 5 W | ~3 |
| SIDRA Y1 (estimated) | ~3 × 10¹³ /s | 3 W | ~10 |
| SIDRA Y10 (target) | ~3 × 10¹⁵ /s | 30 W | ~100 |
| SIDRA Y100 (vision) | ~3 × 10¹⁶ /s | 100 W | ~300 |
In raw TOPS/W, SIDRA Y1 is already ~2× H100. Y100’s target is ~50× H100.
Important caveat: synaptic event ≠ FP8 MAC. The brain does “soft” approximate compute; the GPU does exact compute. The comparison is rough; the right takeaway: analog + sparse + event-driven can be 10-1000× more efficient than digital + dense + clocked.
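The TOPS/W column can be regenerated directly from the op-rate and power columns (same caveat applies: a synaptic event is not an FP8 MAC):

```python
# Recompute TOPS/W (1 TOPS = 1e12 ops/s) from the table's op rates and powers.
systems = {
    "Brain (synaptic)":    (1e14, 20),
    "H100 FP8 dense":      (2e15, 700),
    "H100 INT8 sparse":    (4e15, 700),
    "Apple M2 NPU":        (1.6e13, 5),
    "SIDRA Y1 (est.)":     (3e13, 3),
    "SIDRA Y10 (target)":  (3e15, 30),
    "SIDRA Y100 (vision)": (3e16, 100),
}
tops_per_w = {name: ops / watts / 1e12 for name, (ops, watts) in systems.items()}
for name, t in tops_per_w.items():
    print(f"{name:20s} {t:6.1f} TOPS/W")
```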
Landauer limit (1961):
Erasing one bit of information thermodynamically requires at least E_min = k_B T ln 2 ≈ 2.87 zJ of energy.
- k_B ≈ 1.38 × 10⁻²³ J/K (Boltzmann constant)
- T ≈ 300 K (room temperature)
The physical floor. No (irreversible) computation can run on less.
Comparison:
| System | Bit-erase energy | × Landauer |
|---|---|---|
| Landauer floor | 2.87 zJ | 1× |
| Brain synaptic event | ~10 fJ ≈ 10⁷ zJ | 10⁷ |
| SIDRA Y1 memristor read | ~0.1 pJ ≈ 10⁸ zJ | 10⁸ |
| Modern CMOS gate | ~1 fJ ≈ 10⁶ zJ | 10⁶ |
The brain runs at 10 million × Landauer — yet still beats modern hardware by orders of magnitude at the system level. The lesson isn’t “approach the physical floor”, it’s “cut unnecessary computation in the system”.
Why isn’t the brain at Landauer? Because it dissipates heat (literally), pumps ions (chemical work), uses noisy signals (no quantization), and doesn’t do reversible computing. Reversible computing can in principle sidestep the Landauer cost by never erasing bits; in practice it remains research.
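A quick computation of the floor and the multiples; the exact ratios land around 3 × 10⁵ to 3 × 10⁷, i.e. the table's entries are order-of-magnitude roundings:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # room temperature, K

landauer_j = K_B * T * math.log(2)   # ≈ 2.87e-21 J = 2.87 zJ
print(f"Landauer floor: {landauer_j:.2e} J")

# Per-event energies from the comparison table
events = {"CMOS gate": 1e-15, "brain synapse": 10e-15, "Y1 memristor read": 0.1e-12}
multiples = {name: e / landauer_j for name, e in events.items()}
for name, m in multiples.items():
    print(f"{name}: ~{m:.1e} x Landauer")
```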
The math of sparse coding:
A cortical region’s activity fraction is α ≈ 0.01-0.03. The oft-quoted ~1 Hz is not each neuron’s rate; it is the regional average (α · f_active). Spike power:

P_spikes = N · α · f_active · E_spike

- N ≈ 1.6 × 10¹⁰ cortical neurons
- α ≈ 0.02 (~2% active)
- f_active ≈ 50 Hz (rate when active)
- E_spike ≈ 0.3 nJ
Result: P_spikes ≈ 4.8 W (cortex spikes).
If α = 1 (every neuron always active): P_spikes ≈ 240 W. The brain would burn. Sparsity saves a factor of 1/α = 50.
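A sketch of the sparse-coding arithmetic; f_active = 50 Hz is the assumed when-active rate that makes α · f_active match the ~1 Hz regional average:

```python
# Cortical spike power with and without sparsity.
N = 1.6e10        # cortical neurons
ALPHA = 0.02      # fraction active at any moment
F_ACTIVE = 50.0   # Hz when active (so ALPHA * F_ACTIVE ≈ 1 Hz average)
E_SPIKE = 0.3e-9  # J per spike

p_sparse = N * ALPHA * F_ACTIVE * E_SPIKE  # ≈ 4.8 W
p_dense = N * 1.0 * F_ACTIVE * E_SPIKE     # ≈ 240 W if every neuron were active
print(p_sparse, p_dense, p_dense / p_sparse)  # savings factor = 1/ALPHA = 50
```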
SIDRA parallel — activity factor:
In chapter 1.6 we saw the CMOS dynamic power P_dyn = α · C · V² · f. The same α applies here. Y1’s chip-level activity factor is α ≈ 0.007 (computed in lab 3.1). That’s actually ~3× more sparse than the brain’s 2%. SIDRA is already ahead of the biological sparsity bar (thanks to circuit discipline).
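How the activity factor scales CMOS dynamic power P = α·C·V²·f; the C, V and f values below are illustrative placeholders, not SIDRA Y1 specs:

```python
# Dynamic power scales linearly with the activity factor alpha.
C = 1e-9   # F, total switched capacitance (hypothetical)
V = 0.8    # V, supply voltage (hypothetical)
F = 1e9    # Hz, clock frequency (hypothetical)

def dynamic_power(alpha):
    return alpha * C * V**2 * F

p_dense = dynamic_power(1.0)    # every node switches each cycle
p_brain = dynamic_power(0.02)   # brain-level sparsity (~2%)
p_y1 = dynamic_power(0.007)     # ~3x sparser, as the text claims for Y1
print(p_dense, p_brain, p_y1)   # power drops in direct proportion to alpha
```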
Training energy:
The brain learns online — ~340 kcal/day × 0.1 (estimated share for learning) = 34 kcal/day ≈ 142 kJ. Lifelong learning: 142 kJ/day × 70 years ≈ 3.6 GJ ≈ 1000 kWh.
A single GPT-3 training run = 1287 MWh, roughly 1300× the brain’s ~1000 kWh lifetime learning budget, and that’s for one model. The brain’s scale is absurd: in 70 years it learns thousands of distinct skills (walking, language, logic, music, driving, social order, millions of faces, tens of millions of words…).
Decarbonization angle: brain-style online + sparse learning hardware can shrink AI’s climate footprint by orders of magnitude. SIDRA’s “sustainable AI” claim sits here.
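The lifetime-learning arithmetic, end to end (the 10% learning share is the text's estimate):

```python
# Brain's lifetime "learning energy" vs one GPT-3 training run.
KCAL_TO_J = 4184
learning_kcal_day = 340 * 0.1               # ~10% of the daily brain budget
daily_j = learning_kcal_day * KCAL_TO_J     # ≈ 142 kJ/day
lifetime_kwh = daily_j * 365 * 70 / 3.6e6   # 70 years → ≈ 1000 kWh

gpt3_kwh = 1287e3                           # 1287 MWh (Patterson et al. 2021)
print(round(lifetime_kwh), round(gpt3_kwh / lifetime_kwh))
```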
Experiment: One Hour of Thought vs One Hour of GPU Training
Compare: 1 hour of human brain vs 1 hour of NVIDIA H100 GPU.
| Metric | Human brain | H100 GPU |
|---|---|---|
| Power | 20 W | 700 W |
| Hourly energy | 0.02 kWh | 0.7 kWh |
| Cost (TR home rate ~3 TL/kWh) | 0.06 TL | 2.10 TL |
| CO₂ (TR grid ~0.4 kg CO₂/kWh) | 8 g | 280 g |
1 year continuously: Brain: ~175 kWh, ~526 TL, ~70 kg CO₂. GPU: ~6132 kWh, ~18,400 TL, ~2453 kg CO₂. GPU 35× more expensive, 35× dirtier.
Now SIDRA Y1 (estimate):
- Power: 3 W
- 1 hour: 0.003 kWh
- 1 year: 26 kWh, 79 TL, 10.5 kg CO₂
SIDRA Y1 draws ~6× less power than the brain (raw power only; it performs a small subset of brain functions).
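The hourly and annual figures above can be regenerated from the power ratings (TR rates as in the text):

```python
# kWh, cost (TL) and CO2 (kg) for continuous operation at a given power.
RATE_TL_PER_KWH = 3.0    # TR home electricity rate
CO2_KG_PER_KWH = 0.4     # TR grid carbon intensity
HOURS_PER_YEAR = 8760

def footprint(watts, hours):
    kwh = watts * hours / 1000
    return kwh, kwh * RATE_TL_PER_KWH, kwh * CO2_KG_PER_KWH

for name, w in [("Brain", 20), ("H100", 700), ("SIDRA Y1", 3)]:
    kwh, tl, co2 = footprint(w, HOURS_PER_YEAR)
    print(f"{name}: {kwh:.0f} kWh/yr, {tl:.0f} TL, {co2:.1f} kg CO2")
```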
SIDRA Y100 target (100 W):
- 1 hour: 0.1 kWh, 30 kuruş, 40 g CO₂.
- 1 year: 876 kWh, 2628 TL, 350 kg CO₂.
- 5× over the brain but 7× cheaper than GPU, doing a lot more work.
Bottom line: SIDRA closes the gap to brain-level efficiency far faster than GPUs do.
Quick Quiz
Lab Exercise
Evaluate SIDRA Y1 as an alternative to an H100 GPU cluster in a data center.
Data:
- A data center runs 1000 H100 GPUs: typical AI training cluster.
- 700 W each → cluster power: 700 kW. Cooling + facility = +50% → total 1.05 MW.
- Annual consumption: 1.05 MW × 8760 h = 9.2 GWh/year.
- TR grid 0.4 kg CO₂/kWh → 3680 t CO₂/year.
- TR industrial electricity ~2 TL/kWh → 18.4 M TL/year.
SIDRA Y1 alternative:
- 1 H100 ≈ 80 billion transistors; 1 SIDRA Y1 ≈ 419M memristors on a ~28 nm CMOS base (~1B transistors).
- For the same inference throughput, how many SIDRA Y1 do we need?
Questions:
(a) H100 inference: ~4 PFLOPS FP8 sparse. SIDRA Y1: ~30 TOPS analog → ~130 SIDRA Y1 = 1 H100 for inference.
(b) 1000 H100 → 130,000 SIDRA Y1 → 3 W each → 390 kW total (cooling +50% → 585 kW).
(c) Annual energy?
(d) Annual CO₂?
(e) Annual cost?
(f) CapEx delta: 1000 H100 = $50M ≈ 2 B TL. 130,000 Y1 (domestic estimate) ≈ 800 M TL. Payback?
Solutions
(a) 4000 / 30 ≈ 133 → ~130 Y1 ≈ 1 H100 inference. (Different for training; Y1 is inference-focused.)
(b) Total power: 130,000 × 3 W = 390 kW. With 50% cooling → 585 kW. 44% less than the GPU cluster.
(c) 585 kW × 8760 h = 5.13 GWh/year (vs 9.2 GWh GPU). 44% savings.
(d) 5.13 × 0.4 = 2050 t CO₂/year (vs 3680). 44% less pollution.
(e) 5.13 × 10⁶ kWh × 2 TL = 10.3 M TL/year (vs 18.4 M). 8.1 M TL/year savings.
(f) CapEx delta: 2 B − 800 M = 1.2 B TL gained (Y1 cheaper). Plus 8 M TL/yr opex savings. Big win — but Y1 is inference only; training still on GPU. Scenario: moving 50% of inference to SIDRA → ~5 M TL/yr opex + ~1 B TL CapEx delta.
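A sanity check of parts (a)-(e) under the lab's assumptions; small rounding differences vs the text (5.12 vs 5.13 GWh) are expected:

```python
# Lab check: 130,000 SIDRA Y1 replacing 1000 H100s for inference.
y1_per_h100 = 4000 / 30            # ~133 Y1 per H100 (4 PFLOPS vs 30 TOPS)
n_y1 = 1000 * 130                  # fleet size, rounded as in the text
power_kw = n_y1 * 3 / 1000         # 390 kW of compute
total_kw = power_kw * 1.5          # +50% cooling/facility → 585 kW
gwh_year = total_kw * 8760 / 1e6   # ≈ 5.12 GWh/year
co2_t_year = gwh_year * 1e6 * 0.4 / 1000   # ≈ 2050 t CO2/year
cost_mtl_year = gwh_year * 1e6 * 2 / 1e6   # ≈ 10.2 M TL/year
print(round(y1_per_h100), total_kw, round(gwh_year, 2),
      round(co2_t_year), round(cost_mtl_year, 1))
```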
Note: these are based on SIDRA Y1 prototype assumptions, with high uncertainty. Y10 (2029) numbers will be much stronger. Critical point: SIDRA is a side-by-side architecture to GPU (for inference) — not a replacement, a complement.
Cheat Sheet
- Brain power: ~20 W. ~20% of body energy.
- Three efficiency tricks: sparse coding (~2% active), event-driven (spikes), analog compute (synaptic MAC).
- Spike energy: ~0.3 nJ × 86B neurons × 1 Hz = ~26 W (real 20 W; calc is upper bound).
- Landauer floor: kT ln 2 = 2.87 zJ. Modern CMOS ~10⁶ above; brain ~10⁷ above.
- GPU compare: H100 700 W → ~3-6 TOPS/W. SIDRA Y1 ~10, Y100 target ~300.
- GPT-3 train: 1287 MWh. Human brain’s lifetime learning budget (70 years): ~1000 kWh — roughly 1300× less.
- SIDRA’s edge: analog + sparse + event-driven (Y3+).
Vision: Sustainable AI and SIDRA's Mission
AI’s energy demand could reach 5-10% of global electricity consumption by 2030 (IEA estimate). The fix isn’t faster GPUs — it’s a different architecture.
- Y1 (today): 3 W TDP, ~10 TOPS/W. ~2× more efficient than GPU at limited capacity.
- Y3 (2027): 10 W, ~30 TOPS/W. Event-driven prototype added.
- Y10 (2029): 30 W, ~100 TOPS/W. Edge AI scales (smart cameras, robots, mobile).
- Y100 (2031+): 100 W, ~300 TOPS/W. Standard for data-center inference. Training still GPU-bound.
- Y1000 (long horizon): 100 W, ~1000 TOPS/W. Both training and inference analog. AI climate footprint drops 1000×.
Strategic angle for Türkiye: AI data centers will be major electricity consumers. If Türkiye builds a SIDRA-based national AI infrastructure: (1) it cuts national power demand, (2) carves an independent path in semiconductors, (3) opens an export market. The SIDRA workshop is the first concrete step on that path.
Unexpected future: brain-budgeted AI. A laptop-sized device drawing 20 W and running a GPT-class model. Possible at Y100. 2032-2035 horizon, and Türkiye has a real shot at being one of the few countries that can ship such a product. (The US and China will build their own Y100 — but the category has room for multiple players; SIDRA’s workshop + academic ecosystem is in the race.)
Further Reading
- Next chapter: 3.5 — From Artificial Neuron to Transformer
- Previous: 3.3 — Hebbian Learning
- Brain energy budget: Attwell & Laughlin, An energy budget for signaling in the grey matter of the brain, J. Cereb. Blood Flow Metab. 2001.
- Landauer: R. Landauer, Irreversibility and heat generation in the computing process, IBM J. Res. Dev. 1961.
- GPT-3 energy: Patterson et al., Carbon emissions and large neural network training, arXiv:2104.10350 (2021).
- AI climate impact: Strubell, Ganesh, McCallum, Energy and policy considerations for deep learning in NLP, ACL 2019.
- Neuromorphic energy: Davies et al., Loihi: A neuromorphic manycore processor with on-chip learning, IEEE Micro 2018.