Supra-50M on a 1999 Athlon

We did not expect this. A few days after releasing Supra-50M, a user on r/LocalLLM posted something that genuinely made us stop and stare. They ran Supra-50M Instruct on a 26-year-old AMD Athlon clocked at 1 GHz, a machine they keep around for retro Windows 9x games, and got real, coherent output out of it at 1.3 tokens per second.

// u/drone_stonks on r/LocalLLM "I see your Strix Halo and I'll give you a vintage Athlon [1 GHz] (Supra-50M). As a fun experiment, I decided to try running the recently released Supra-50M on a 26-year-old machine I keep for retro Windows 9.X games. Although the model was somewhat silly and inconsistent, the performance wasn't bad, reaching around 1.3 tok/s with CPU inference alone."

The hardware

This is not a low-end 2020 machine. This is a 1999 AMD Athlon running at 1007 MHz with a single core, no hyperthreading, 256 KB cache, and crucially, no SSE2 instruction set support. SSE2 is the baseline that virtually every modern AI inference library assumes. The CPU predates it entirely.

// /proc/cpuinfo — the machine that ran supra-50m vendor_id   → AuthenticAMD
model name → AMD Athlon(tm) Processor
cpu MHz     → 1007.089
cache size → 256 KB
cpu cores   → 1
bogomips    → 2014.17
SSE2        → not present

// slitaz 5.0 — athlon 1ghz running supra-50m instruct

Screenshot showing Supra-50M running on a 1999 AMD Athlon under SliTaz Linux

The OS is SliTaz 5.0, a minimal Linux distribution designed for very old or resource-constrained hardware. The terminal on the left shows the model running and responding to the prompt "Open the pod bay doors, HAL." The right terminal shows cat /proc/cpuinfo confirming the hardware. This is real.

The SSE2 problem and how they solved it

The standard tool for running LLMs on CPU is llama.cpp, but llama.cpp requires SSE2 which this Athlon does not have. So the user switched to llama2.c, a much simpler C inference engine originally written by Andrej Karpathy, and asked Claude to write a custom tokenizer compatible with our vocabulary format. That is a non-trivial amount of engineering just to get a model loaded, and they pulled it off.

// inference stack on the athlon Runtime    → llama2.c (no SSE2 required)
Tokenizer → custom, written with Claude's help
Backend    → CPU only, single core
Speed      → ~1.3 tokens per second
Model      → Supra-50M-Instruct (200 MB on disk)

What the model said

The prompt was "Open the pod bay doors, HAL." The Instruct model ran with it:

// prompt — on a 1ghz cpu, 1999

"Open the pod bay doors, HAL."

// supra-50m instruct response

Open the gate doors, HAL. [ON]

The host blocks power, leaves network, is silent, and does not interrupt other network machines.

· Your son: "Modules waiting for a call."
· (Module delay). See in alphabetical order. To modulate (a communication-control message) by closing the gap between the lines of
· The Board is sending a sample message to

Silly and drifty, as the user described. But it is responding to a 2001: A Space Odyssey reference with something that is thematically in the right ballpark, on a CPU that was already 2 years old when the movie's sequel came out. We will take it.

The thought that hit us

The user ended their post with a line that stuck with us:

// u/drone_stonks on r/LocalLLM "It's crazy to think that with the right file size of 200 MB, we could have experienced this magic back in 1999."

That is the whole point of building small. Not everyone has a Strix Halo. Not everyone has a GPU. Some people have old hardware, low-power devices, or embedded systems where 200 MB is already a stretch. If a model this size can run on a single-core 1 GHz CPU from 1999, it can run essentially anywhere. That is the goal we have been building toward since Supra Mini v1.

Thank you

Seriously, this made our day. If you are running Supra models on unusual or constrained hardware, we want to hear about it. Posts like this are exactly why we keep the weights open and the file sizes honest.

You can find Supra-50M Instruct on HuggingFace. No Strix Halo required.

// links Model → huggingface.co/SupraLabs/Supra-50M-Instruct
Reddit → r/LocalLLM — original post by u/drone_stonks
License → Apache 2.0

#community #supra-50m #edge-ai #tinyml #retro #llama2c #open-source #cpu-inference

SupraLabs_

Someone ran Supra-50M
on a 1999 Athlon.
It WORKED.

The hardware

The SSE2 problem and how they solved it

What the model said

The thought that hit us

Thank you

Someone ran Supra-50Mon a 1999 Athlon.It WORKED.

The hardware

The SSE2 problem and how they solved it

What the model said

The thought that hit us

Thank you

Someone ran Supra-50M
on a 1999 Athlon.
It WORKED.