End-to-End Production Stack Lab
Module 6's closing dive — a SIDRA app from scratch to deploy.
Prerequisites
What you'll learn here
- Tie together Module 6's nine preceding chapters with one end-to-end project
- Trace the PyTorch model → SIDRA inference flow step by step
- Validate how the 5 software layers interact
- Summarize production CI/CD and version management
- Prepare for Module 7 (manufacturing)
Hook: The Whole Module 6 Stack
Module 6's nine preceding chapters covered: driver → kernel module → firmware → ISPP → SDK → PyTorch → compiler → simulator → test. This chapter assembles them in a real-world scenario:
“Deploy a Turkish speech-recognition + translation app on SIDRA Y1.”
Then Module 7 (manufacturing) starts — software ready, time for real wafers.
Intuition: 9 Steps, 1 Product
Scenario: a Turkish startup ships a “Local AI Assistant” product. SIDRA Y1 + ARM CPU + microphone. Whisper-tiny (ASR) + MarianNMT (TR→EN) on SIDRA.
Development steps:
- Pick models (HuggingFace Whisper + Marian).
- Quantization-aware training (with Turkish data).
- Compiler (PyTorch → SIDRA binary).
- Simulator validation (accuracy 95%).
- FPGA prototype test.
- Driver + firmware test.
- Y1 prototype deploy.
- CI/CD pipeline.
- Production and distribution.
Formalism: End-to-End Pipeline
Step 1: Pick and download models.
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
```

39M parameters. Uses 9% of SIDRA Y1's capacity.
Step 2: Fine-tune on Turkish data (CPU/GPU).

```python
import torch
from datasets import load_dataset

tr_dataset = load_dataset("mozilla-foundation/common_voice_13_0", "tr")

# Standard PyTorch fine-tuning loop
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
for epoch in range(3):
    for batch in tr_dataset:
        optimizer.zero_grad()
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
```

Result: Turkish WER 15% (acceptable).

Step 3: QAT (Quantization-Aware Training).
```python
from torch.quantization import prepare_qat, convert

model.train()
model_qat = prepare_qat(model)

# One epoch of quantization-aware training
for batch in tr_dataset:
    optimizer.zero_grad()
    model_qat(**batch).loss.backward()
    optimizer.step()

model_int8 = convert(model_qat.eval())
```

Result: an INT8 model with only 0.5% accuracy loss.
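The INT8 conversion can be pictured with a minimal affine-quantization sketch in plain Python (the scale here is illustrative, not the torch internals): each weight is mapped to an 8-bit code and back, and QAT trains the model to tolerate the resulting rounding error.

```python
def quantize(x: float, scale: float, zero_point: int = 0) -> int:
    """Map a float to an INT8 code: q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the INT8 range

def dequantize(q: int, scale: float, zero_point: int = 0) -> float:
    """Recover an approximate float from the INT8 code."""
    return (q - zero_point) * scale

# A weight of 0.4237 with scale 1/127 survives the round trip with an
# error below scale/2; out-of-range values saturate at the clamp.
scale = 1.0 / 127
w = 0.4237
w_hat = dequantize(quantize(w, scale), scale)
error = abs(w - w_hat)  # < scale / 2
```

The clamp is what makes outliers expensive: anything beyond ±127 scale units saturates, which is why calibration data (Step 4 below) matters for choosing scales.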
Step 4: SIDRA compile.
```python
import sidra

compiled = sidra.compile(
    model_int8,
    calib_data=tr_dataset[:100],
    target="y1",
    optimization_level=2,
)
print(compiled.size_mb, compiled.crossbar_count)
# 78 MB, 600 crossbars
```

Step 5: Simulator validation.
```python
sim = sidra.AccurateSimulator(compiled)
test_wer = sim.benchmark_wer(tr_test_set)
print(f"Sim WER: {test_wer:.1%}")  # 16.5%
```

Acceptance criterion: WER under 20%. Pass.
Step 6: FPGA test.
Run the same binary on the FPGA prototype. Accuracy confirmed.
```python
fpga = sidra.FPGADevice("xilinx_u280")
fpga.deploy(compiled)
fpga_wer = fpga.benchmark_wer(tr_test_set)
# 16.7% (matches the simulator)
```

Step 7: Y1 deploy.
```python
chip = sidra.Chip(0)  # real Y1 hardware
chip.deploy(compiled)

# Inference
audio = record_microphone(seconds=5)
text = chip.infer_whisper(audio)
print(text)  # "Merhaba, nasılsın?"
```

5 seconds of audio → 100 ms inference, 50 mJ of energy.
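The 5 s → 100 ms figure corresponds to a real-time factor of 0.02, i.e. inference runs about 50× faster than the audio it transcribes. A tiny helper (purely illustrative) makes the bookkeeping explicit:

```python
def realtime_factor(audio_seconds: float, inference_ms: float) -> float:
    """Inference time divided by audio duration; < 1.0 means faster than real time."""
    return (inference_ms / 1000.0) / audio_seconds

rtf = realtime_factor(audio_seconds=5.0, inference_ms=100.0)  # 0.02
speedup = 1.0 / rtf                                           # 50x real time
```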
Step 8: Add the translation model.
Marian NMT TR→EN follows a similar pipeline:
```python
nmt_compiled = sidra.compile(marian_model, ...)
chip.deploy([compiled, nmt_compiled])  # both models resident in parallel

text = chip.infer_whisper(audio)       # Turkish transcript
translation = chip.infer_marian(text)  # English translation
print(translation)  # "Hello, how are you?"
```

Total pipeline: 200 ms from audio to text to translation.
Step 9: CI/CD.
GitHub Actions workflow:
```yaml
name: SIDRA Build & Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install SIDRA SDK
        run: pip install sidra
      - name: Test on simulator
        run: python tests/test_whisper.py
      - name: Test on FPGA (self-hosted runner)
        if: github.ref == 'refs/heads/main'
        run: python tests/test_fpga.py
```

Every commit is tested automatically; PR reviews show the CI status.
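A sketch of what `tests/test_whisper.py` might contain: the CI job passes only if simulated WER stays under the 20% acceptance criterion from Step 5. The `FakeSimulator` stub and the gate structure are assumptions for illustration; only the threshold comes from the text.

```python
WER_THRESHOLD = 0.20  # acceptance criterion from the simulator-validation step

class FakeSimulator:
    """Stand-in for the hardware simulator so the gate logic is runnable here;
    benchmark_wer would normally run the compiled Whisper model on a test set."""
    def benchmark_wer(self, test_set):
        return 0.165  # the WER measured in Step 5

def test_whisper_wer_gate():
    sim = FakeSimulator()
    wer = sim.benchmark_wer(test_set=None)
    assert wer < WER_THRESHOLD, f"WER {wer:.1%} exceeds the {WER_THRESHOLD:.0%} gate"

test_whisper_wer_gate()  # raises AssertionError on a quality regression
```

Encoding the acceptance criterion as an assertion is what turns "we checked once" into "every commit is checked".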
Version management:
```
my-app v1.0.0
├── sidra-sdk 1.2.3
│   ├── sidra-driver 1.1.0
│   ├── sidra-firmware 1.0.5
│   └── sidra-compiler 1.2.0
├── pytorch 2.4.0
└── transformers 4.50.0
```

SemVer plus a lock file: production pins stable versions, development tracks the bleeding edge.
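The lock-file discipline boils down to a compatibility rule. A minimal sketch of caret-style SemVer matching (same major version, installed version equal or newer; this ignores the 0.x special case, and real tools like pip and npm implement richer specifiers):

```python
def parse(version: str) -> tuple:
    """'1.2.3' -> (1, 2, 3)"""
    return tuple(int(part) for part in version.split("."))

def compatible(required: str, installed: str) -> bool:
    """Caret-style rule: same major version, and installed >= required."""
    req, inst = parse(required), parse(installed)
    return inst[0] == req[0] and inst >= req

# sidra-sdk pinned at 1.2.3 accepts 1.3.0 but rejects 2.0.0
assert compatible("1.2.3", "1.3.0")
assert not compatible("1.2.3", "2.0.0")
```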
Production deploy:
Stages on the customer device:
- SIDRA driver (kernel module, system install).
- Firmware (Y1 boot ROM, once).
- SDK runtime (ships with app).
- Compiled model (inside the app).
OTA (Over-The-Air) updates: models and firmware can be updated on devices already in the field.
Production scale:
The Local AI Assistant startup ships 100K devices/year. Each uses 1 SIDRA Y1 → 100K chips/year.
SIDRA workshop capacity: 100K/year (chapter 5.14). A match: domestic production is feasible.
Y10 + scale:
Same product on Y10: GPT-4-class model. Phone-sized device. Volume 100K → 1M/year. Mini-fab needed (chapter 7.5).
Experiment: Full-Pipeline Performance
Local AI Assistant Y1:
| Stage | Time |
|---|---|
| Microphone recording | 5000 ms (user speaks) |
| Audio preprocessing (CPU) | 50 ms |
| Whisper inference (SIDRA) | 100 ms |
| Marian translation (SIDRA) | 50 ms |
| TTS (CPU, post-process) | 200 ms |
| Total | 5400 ms |
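The table's arithmetic as a sanity-checkable sketch (stage names and times copied from above):

```python
stage_ms = {
    "microphone_recording": 5000,   # user speaks
    "audio_preprocessing_cpu": 50,
    "whisper_inference_sidra": 100,
    "marian_translation_sidra": 50,
    "tts_cpu": 200,
}

total_ms = sum(stage_ms.values())                    # 5400 ms end to end
sidra_ms = (stage_ms["whisper_inference_sidra"]
            + stage_ms["marian_translation_sidra"])  # 150 ms on SIDRA
```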
Pure inference: 150 ms (SIDRA share). Low latency.
Energy:
- Inference (SIDRA): 50 + 25 = 75 mJ.
- CPU + sensor: ~500 mJ (5-second active use).
- Total: ~575 mJ/interaction.
Battery: 4000 mAh at 3.7 V ≈ 14.8 Wh ≈ 53 kJ. Interactions per charge: 53,000 J / 0.575 J ≈ 92,000. At 100/day → 920 days ≈ 2.5 years of battery.
(In reality: idle power dominates. Active inference energy negligible for battery life.)
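The back-of-the-envelope battery math, assuming a 3.7 V cell (so 4000 mAh ≈ 14.8 Wh ≈ 53 kJ):

```python
def interactions_per_charge(capacity_mah: float, cell_volts: float,
                            mj_per_interaction: float) -> float:
    """Battery energy in joules divided by the energy of one interaction."""
    battery_j = capacity_mah / 1000.0 * cell_volts * 3600.0  # mAh -> Wh -> J
    return battery_j / (mj_per_interaction / 1000.0)         # mJ -> J

n = interactions_per_charge(4000, 3.7, 575)  # roughly 92,000-93,000
days = n / 100                               # at 100 interactions/day
```

As the parenthetical above notes, this is an upper bound: idle power, not inference energy, dominates real battery life.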
Module 6 Closing Quiz
Integrated Lab: Local AI Assistant Design
You’re a Turkish startup. Design a “Local AI Assistant” on SIDRA Y1.
Targets:
- Turkish speech + translation.
- 24-hour battery.
- 200 TL retail.
- 100K/year volume.
Decisions:
(a) Which models? Sizes?
(b) How much spare Y1 capacity is enough?
(c) How to optimize battery life?
(d) Version management: how to OTA update?
(e) Production CapEx and OpEx?
Solutions
(a) Whisper-tiny (39M) + MarianNMT TR-EN (75M) + TTS (50M) = 164M parameters = 39% of SIDRA Y1. Headroom for more models or batches.
(b) Y1 419M, model 164M → 39%. Efficient use.
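The capacity check in (a)/(b) as a one-liner (parameter counts in millions; the 419M weight capacity for Y1 is taken from the text):

```python
def utilization(model_params_m, capacity_m=419):
    """Fraction of on-chip weight capacity used by a set of models."""
    return sum(model_params_m) / capacity_m

u = utilization([39, 75, 50])  # Whisper-tiny + MarianNMT + TTS = 164M
# u ≈ 0.39, leaving ~61% headroom for extra models or batching
```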
(c) DVFS Idle (chapter 5.11) → 100 mW idle. Active inference 3 W but ~5% time. Average: 0.3 W. Battery 14.8 Wh / 0.3 W = 50 hours = 2 days. 24-hour target comfortable.
(d) OTA: the SIDRA SDK supports both firmware and model updates: a ~100 MB download over 4G followed by an ISPP reflash (640 ms on Y1). Invisible to the end user.
(e) CapEx: SIDRA Y1 workshop investment $5M (shared across products), plus $500K of product-specific R&D. OpEx per unit: Y1 chip $50, ARM SoC $20, microphone $30, with the remaining components bringing the BOM to about $243/unit. At 100K units/year and a 200 TL retail price the margin is thin; the Y3 generation brings the unit cost down.
Conclusion: the Local AI Assistant is productizable in 2027. A concrete output of the Turkish AI ecosystem.
Module 6 Cheat Sheet
10 chapters in summary:
- 6.1 OS + PCIe driver basics.
- 6.2 aether-driver internals.
- 6.3 RISC-V firmware.
- 6.4 ISPP algorithm.
- 6.5 SDK layers.
- 6.6 PyTorch backend.
- 6.7 Compiler.
- 6.8 Digital twin.
- 6.9 Test/calibration.
- 6.10 Production-stack lab (this).
Module 6 message: the SIDRA chip’s usefulness depends on the software stack. Five layers co-designed, developed, and tested.
Vision: Hardware + Software Together
Module 6 is software. Module 7 is manufacturing: how is the chip actually made?
- Cleanroom: UNAM, workshop, mini-fab.
- Wafer flow: TSMC 28 nm CMOS + UNAM BEOL memristor.
- Packaging: ASE, Amkor (Taiwan).
- Test: SIDRA workshop.
Hardware + software co-design: SIDRA’s holistic approach. Module 7 completes the last side.
For Türkiye: software is an existing strength, while the hardware infrastructure is a matter of investment. Together they position Türkiye as an AI leader.
Further Reading
- Next module: 🚧 7.1 · Cleanrooms and ISO Classes — Coming soon
- Previous: 6.9 — Test, Calibration, Verification
- Module 5 review: 5.15 — Thermal and Packaging Deep Dive.
- Module 4 review: 4.8 — Linear Algebra Lab.
- Production software: Continuous deployment, GitHub Actions documentation.