PNU — Physical Neural Unit
A carbon-based monochip architecture where the model IS the silicon. No DRAM. No HBM. No weight movement. Just computation.
The Problem
GPUs and TPUs spend more than 90% of their energy moving weights between off-chip memory and compute units. This is the "von Neumann bottleneck" at exascale: an H100 draws up to 700W, and only about 10% of that goes to actual math.
The Solution
Weight-stationary architecture. Weights are baked into on-chip SRAM at synthesis time and never move. Activations, flowing through a systolic array, are the only data in motion. No off-chip memory access. No cache misses. No DMA.
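As a rough illustration of what "baking" weights at synthesis time can look like, the sketch below quantizes a float weight matrix to signed 8-bit values and writes one two's-complement hex word per line, the plain-text format SystemVerilog's `$readmemh` loads into a memory array. The int8 symmetric quantization, the file name `weights.hex`, and the helper names (`quantize_int8`, `to_hex_lines`, `bake_weights_hex`) are assumptions for illustration, not the project's actual flow.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (an assumed scheme, for illustration)."""
    max_abs = max(abs(w) for row in weights for w in row) or 1.0  # avoid divide-by-zero
    scale = max_abs / 127.0
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return q, scale


def to_hex_lines(q_weights):
    """Row-major two's-complement bytes, one hex word per line, as $readmemh expects."""
    return [format(v & 0xFF, "02x") for row in q_weights for v in row]


def bake_weights_hex(weights, path="weights.hex"):
    """Quantize and write the hypothetical weights.hex; returns (q_weights, scale)."""
    q, scale = quantize_int8(weights)
    with open(path, "w") as f:
        f.write("\n".join(to_hex_lines(q)) + "\n")
    return q, scale
```

On the RTL side, a memory declared as, for example, `logic [7:0] weight_mem [0:3];` (a hypothetical name) could then be initialized with `initial $readmemh("weights.hex", weight_mem);`, which synthesis tools such as Yosys can typically use to fix memory contents at build time.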
The result: a theoretical 911× the throughput of an H100 at roughly equal power (712W vs 700W), or equivalent throughput at about 1/900th the energy per token.
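The energy figure follows from the Key Numbers table: energy per token is power divided by throughput. A minimal worked check using only the table's theoretical figures (the helper name `joules_per_token` is illustrative):

```python
def joules_per_token(power_w, tokens_per_s):
    """Energy per generated token: watts (J/s) divided by tokens per second."""
    return power_w / tokens_per_s


pnu = joules_per_token(712, 72_878)  # theoretical PNU figures from Key Numbers
h100 = joules_per_token(700, 80)     # H100 baseline from Key Numbers

print(f"PNU:  {pnu * 1e3:.2f} mJ/token")
print(f"H100: {h100:.3f} J/token")
print(f"Energy ratio: {h100 / pnu:.0f}x")
```

This gives about 9.77 mJ per token for the PNU against 8.75 J for the H100, a ratio of roughly 896×, consistent with the "1/900th" figure. It inherits every assumption behind the theoretical 72,878 tok/s number.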
Architecture
| Feature | Detail |
|---|---|
| Systolic array | Weight-stationary, activations-only dataflow |
| Memory | On-chip SRAM only (no DRAM/HBM/cache hierarchy) |
| Modular | Identical tiles scale from phone → hyperscale |
| Substrate-agnostic | Same RTL ports to Si, SiC, graphene, diamond |
| 3D-ready | Vertical TSV stacking for 405B+ dense models |
| Defect-tolerant | Tile-level spare rows (Cerebras-style) |
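The weight-stationary dataflow in the table can be illustrated with a small cycle-level model. In this sketch, processing element PE(k, n) permanently holds `weights[k][n]`; each cycle, activations shift one PE to the right and partial sums shift one PE down, so column `n` emits `sum_k x[k] * weights[k][n]` once the pipeline fills. Row `k` is fed `k` cycles late (input skew) so each activation meets the matching partial sum. Note that in this classic formulation the partial sums also move between PEs. The function name and skewing scheme are illustrative assumptions, not a transcription of the `rtl/` modules.

```python
def systolic_ws_matvec(weights, xs):
    """Cycle-level sketch of a weight-stationary systolic array.

    weights: K x N matrix; PE(k, n) holds weights[k][n] for the whole run.
    xs: list of M input vectors of length K, streamed one per cycle.
    Returns M output vectors of length N.
    """
    K, N, M = len(weights), len(weights[0]), len(xs)
    a_reg = [[0] * N for _ in range(K)]  # activation latched by each PE
    p_reg = [[0] * N for _ in range(K)]  # partial sum latched by each PE
    ys = [[0] * N for _ in range(M)]

    for t in range(M + K + N - 2):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    b = t - k  # row k is skewed by k cycles
                    a_in = xs[b][k] if 0 <= b < M else 0
                else:
                    a_in = a_reg[k][n - 1]  # activation from the left neighbour
                p_in = p_reg[k - 1][n] if k > 0 else 0  # partial sum from above
                new_a[k][n] = a_in
                new_p[k][n] = p_in + a_in * weights[k][n]  # the weight never moves
        a_reg, p_reg = new_a, new_p

        # The bottom row finishes column n of vector b = t - n - (K - 1) this cycle.
        for n in range(N):
            b = t - n - (K - 1)
            if 0 <= b < M:
                ys[b][n] = p_reg[K - 1][n]
    return ys


print(systolic_ws_matvec([[1, 2], [3, 4], [5, 6]], [[1, 1, 1], [1, 0, 2]]))
```

With K input rows, N columns, and M streamed vectors, the model finishes in M + K + N - 2 cycles, so once the pipeline is full it sustains one input vector per cycle.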
The Name
| Acronym | Meaning | Does |
|---|---|---|
| CPU | Central Processing Unit | Runs instructions |
| GPU | Graphics Processing Unit | Renders pixels |
| TPU | Tensor Processing Unit | Multiplies matrices |
| PNU | Physical Neural Unit | IS the model |
A PNU doesn't load weights. It's manufactured with them.
Quick Start
```bash
git clone https://ethans.studio/pnu.git
cd pnu
pip install -r requirements.txt
python verify.py                                  # Golden model (11/11 tests)
(cd c_sim && make && ./lumi_sim ../config.json)   # C simulator
(cd sim && make)                                  # RTL (requires iverilog)
```
Repository
```
pnu/
├── golden/ Golden model (Python, 11/11 tests)
├── c_sim/ Cycle-accurate C simulator
├── rtl/ SystemVerilog (8 modules)
├── sim/ Testbenches + verification vectors
├── docs/ Planning, paper outline, research
├── scripts/ Code generation from config.json
├── fpga_build/ ECP5 synthesis (Yosys + nextpnr)
│
├── thermal_model.py 712W, 82°C junction
├── yield_model.py $2,357/die with defect tolerance
├── materials_model.py Graphene vs Diamond vs SiC
├── scaling_model.py Phone → Datacenter scaling
├── tco_model.py $0.03/1M tokens vs H100's $4.30
├── bridge_model.py Co-design closes GPU-to-chip gap
│
├── verify.py Full verification suite
├── formal_verification.py 5 properties
└── config.json Single source of truth
```
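The tree lists `config.json` as the single source of truth and `scripts/` as code generation from it. One way such generation can work is sketched below: read a flat JSON object and emit SystemVerilog `localparam` declarations so the RTL and the Python/C models share the same constants. The key names (`array_rows`, `array_cols`, `data_width`), the `PNU_` prefix, and `config_to_sv_params` are hypothetical; this is not the contents of `scripts/`.

```python
import json


def config_to_sv_params(config, prefix="PNU_"):
    """Emit one SystemVerilog localparam per integer entry of a flat config dict."""
    lines = []
    for key, value in sorted(config.items()):
        # This sketch only handles plain integers (bool is a subclass of int, so skip it).
        if isinstance(value, bool) or not isinstance(value, int):
            continue
        lines.append(f"localparam int {prefix}{key.upper()} = {value};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical keys; the real config.json schema is not shown in this README.
    example = json.loads('{"array_rows": 8, "array_cols": 8, "data_width": 8, "name": "demo"}')
    print(config_to_sv_params(example))
```

Generating the RTL constants rather than hand-copying them means a change to the array size in one file cannot silently drift between the golden model, the C simulator, and the SystemVerilog.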
Current Status (May 2026)
- Golden model: 11/11 tests passing
- C simulator: Verified equivalent to the golden model (1,000/1,000 stress-test cases)
- RTL: 8 SystemVerilog modules, iverilog-compatible
- FPGA: ECP5 synthesis ready (an 8×8 array uses 6% of LUTs)
- Paper: arXiv draft in progress (~22K words, 10 sections)
- Next: Tiny Tapeout physical proof
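The 1,000/1,000 status line comes from the project's equivalence stress test. The general technique, randomized differential testing, is sketched here: a flat reference matrix-vector product is compared against a tiled implementation on random int8 inputs, with a fixed seed so any failure is reproducible. Both implementations and all function names are hypothetical stand-ins; the project's `stress_test.py` and C simulator are not reproduced.

```python
import random


def matvec_reference(w, x):
    """Flat reference: y[n] = sum_k x[k] * w[k][n]."""
    return [sum(x[k] * w[k][n] for k in range(len(x))) for n in range(len(w[0]))]


def matvec_tiled(w, x, tile=4):
    """Model under test: accumulate tile-by-tile over K, as identical tiles might."""
    n_out = len(w[0])
    y = [0] * n_out
    for k0 in range(0, len(x), tile):
        for n in range(n_out):
            y[n] += sum(x[k] * w[k][n] for k in range(k0, min(k0 + tile, len(x))))
    return y


def stress_equivalence(trials=1000, seed=0):
    """Compare both models on random int8 data; return the number of matching trials."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        k, n = rng.randint(1, 16), rng.randint(1, 16)
        w = [[rng.randint(-128, 127) for _ in range(n)] for _ in range(k)]
        x = [rng.randint(-128, 127) for _ in range(k)]
        if matvec_tiled(w, x) == matvec_reference(w, x):
            passed += 1
    return passed


trials = 1000
print(f"{stress_equivalence(trials)}/{trials} passed")
```

Integer arithmetic keeps the comparison exact; a real hardware-versus-golden check would also need to model accumulator width and saturation, which this sketch ignores.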
Key Numbers
| Metric | PNU | H100 | Improvement |
|---|---|---|---|
| Tok/s (theoretical) | 72,878 | 80 | 911× |
| Tok/s (co-designed) | 4,721 | 80 | 59× |
| Power | 712W | 700W | — |
| Cost/1M tokens | $0.03 | $4.30 | 143× |
| Die cost | $2,357 | ~$300 | — |
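The Improvement column is a ratio of the two columns; for cost, where lower is better, the H100 figure is divided by the PNU figure. The snippet below recomputes the column from the table's own values as a consistency check. It does not validate the underlying estimates, which come from the project's models (e.g. `tco_model.py`); the dictionary layout is illustrative.

```python
# Values copied from the Key Numbers table.
PNU = {"tok_s_theoretical": 72_878, "tok_s_codesigned": 4_721, "usd_per_1m_tokens": 0.03}
H100 = {"tok_s": 80, "usd_per_1m_tokens": 4.30}


def improvement(better, baseline):
    """How many times larger `better` is than `baseline`."""
    return better / baseline


speedup_theoretical = improvement(PNU["tok_s_theoretical"], H100["tok_s"])
speedup_codesigned = improvement(PNU["tok_s_codesigned"], H100["tok_s"])
# Cost is "lower is better", so the ratio is inverted.
cost_advantage = improvement(H100["usd_per_1m_tokens"], PNU["usd_per_1m_tokens"])

print(f"{speedup_theoretical:.0f}x, {speedup_codesigned:.0f}x, {cost_advantage:.0f}x")
```

Rounded, this reproduces the 911×, 59×, and 143× entries.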
Design Philosophy
- First principles. Every number traced to published sources.
- Own your tradeoffs. State weaknesses with numbers.
- Hardware/algorithm co-design. Model and chip designed together.
- "God bows down to math." No claims without verification.
Paper
Forthcoming on arXiv (cs.AR). Authors: Gandalf R. Whale & L. Mithrandir. Independent Research, 2026.
License
MIT