
PNU — Physical Neural Unit

A carbon-based monochip architecture where the model IS the silicon. No DRAM. No HBM. No data movement. Just computation.


The Problem

GPUs and TPUs spend >90% of their energy moving weights between off-chip memory and compute units: the "von Neumann bottleneck" at exascale. An H100 burns 700W, and only about 10% of that goes to actual math.
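The imbalance follows directly from per-operation energy costs. The numbers below are illustrative assumptions, roughly in line with published 45 nm estimates (Horowitz, ISSCC 2014), not figures from this repository:

```python
# Illustrative per-op energies (ASSUMED, order-of-magnitude only; at 45 nm an
# off-chip DRAM read costs ~hundreds of pJ vs a few pJ for a 32-bit MAC).
E_DRAM_READ_PJ = 640.0   # fetch one 32-bit weight from off-chip DRAM
E_MAC_PJ = 4.0           # one 32-bit multiply-accumulate

# Worst case: every weight is fetched from DRAM once per use.
total_pj = E_DRAM_READ_PJ + E_MAC_PJ
print(f"energy spent on math: {E_MAC_PJ / total_pj:.1%}")  # well under 10%
```

Under these assumed costs, fetching a weight dwarfs using it, which is the gap a weight-stationary design removes.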

The Solution

Weight-stationary architecture. Weights are baked into on-chip SRAM at synthesis time. They never move. Activations flow through a systolic array as the sole moving data. No off-chip memory access. No cache misses. No DMA.

The result: 911× the theoretical throughput of an H100 at equal power, or equivalent throughput at 1/900th the energy.
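As a toy illustration of the weight-stationary dataflow (a sketch, not code from this repo): each processing element holds one fixed weight, activations stream past as the only moving data, and partial sums accumulate into an ordinary matrix-vector product.

```python
def weight_stationary_matvec(W, x_stream):
    """Toy weight-stationary dataflow: PE(i, j) holds fixed weight W[i][j];
    activations stream in one per step and are the only moving data."""
    n_rows = len(W)
    acc = [0.0] * n_rows                 # running partial sum per output row
    for j, x in enumerate(x_stream):     # one activation enters per step
        for i in range(n_rows):
            acc[i] += W[i][j] * x        # PE multiplies its resident weight
    return acc

W = [[1.0, 2.0],
     [3.0, 4.0]]                         # weights fixed at "synthesis time"
print(weight_stationary_matvec(W, [10.0, 1.0]))  # [12.0, 34.0], same as W @ x
```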

Architecture

| Feature | Detail |
|---|---|
| Systolic array | Weight-stationary, activations-only dataflow |
| Memory | On-chip SRAM only (no DRAM/HBM/cache hierarchy) |
| Modular | Identical tiles scale from phone → hyperscale |
| Substrate-agnostic | Same RTL ports to Si, SiC, graphene, diamond |
| 3D-ready | Vertical TSV stacking for 405B+ dense models |
| Defect-tolerant | Tile-level spare rows (Cerebras-style) |

The Name

| Acronym | Meaning | Does |
|---|---|---|
| CPU | Central Processing Unit | Runs instructions |
| GPU | Graphics Processing Unit | Renders pixels |
| TPU | Tensor Processing Unit | Multiplies matrices |
| PNU | Physical Neural Unit | IS the model |

A PNU doesn't load weights. It's manufactured with them.

Quick Start

git clone https://ethans.studio/pnu.git
cd pnu
pip install -r requirements.txt
python verify.py              # Golden model (11/11 tests)
cd c_sim && make && ./lumi_sim ../config.json   # C simulator
cd ../sim && make             # RTL (requires iverilog)

Repository

pnu/
├── golden/                    Golden model (Python, 11/11 tests)
├── c_sim/                     Cycle-accurate C simulator
├── rtl/                       SystemVerilog (8 modules)
├── sim/                       Testbenches + verification vectors
├── docs/                      Planning, paper outline, research
├── scripts/                   Code generation from config.json
├── fpga_build/                ECP5 synthesis (Yosys + nextpnr)
│
├── thermal_model.py           712W, 82°C junction
├── yield_model.py             $2,357/die with defect tolerance
├── materials_model.py         Graphene vs Diamond vs SiC
├── scaling_model.py           Phone → Datacenter scaling
├── tco_model.py               $0.03/1M tokens vs H100's $4.30
├── bridge_model.py            Co-design closes GPU-to-chip gap
│
├── verify.py                  Full verification suite
├── formal_verification.py     5 properties
└── config.json                Single source of truth

Current Status (May 2026)

  • Golden model: 11/11 tests passing
  • C simulator: Verified equivalent (1,000/1,000 stress test)
  • RTL: 8 SystemVerilog modules, iverilog-compatible
  • FPGA: ECP5 synthesis ready (6% LUTs for 8×8 array)
  • Paper: arXiv draft in progress (~22K words, 10 sections)
  • Next: Tiny Tapeout physical proof

Key Numbers

| Metric | PNU | H100 | Improvement |
|---|---|---|---|
| Tok/s (theoretical) | 72,878 | 80 | 911× |
| Tok/s (co-designed) | 4,721 | 80 | 59× |
| Power | 712W | 700W | |
| Cost/1M tokens | $0.03 | $4.30 | 143× |
| Die cost | $2,357 | ~$300 | |
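The ratio column can be cross-checked directly from the absolute figures above (all values taken from this README):

```python
# Cross-check the "Improvement" ratios from the absolute numbers in the table.
pnu_tok_s, h100_tok_s = 72_878, 80
print(round(pnu_tok_s / h100_tok_s))   # 911  (theoretical throughput ratio)
print(round(4_721 / h100_tok_s))       # 59   (co-designed throughput ratio)
print(round(4.30 / 0.03))              # 143  (cost per 1M tokens ratio)
```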

Design Philosophy

  • First principles. Every number traced to published sources.
  • Own your tradeoffs. State weaknesses with numbers.
  • Hardware/algorithm co-design. Model and chip designed together.
  • "God bows down to math." No claims without verification.

Paper

Forthcoming on arXiv (cs.AR). Authors: Gandalf R. Whale & L. Mithrandir. Independent Research, 2026.

License

MIT