We did not expect this. A few days after releasing Supra-50M, a user on r/LocalLLM posted something that genuinely made us stop and stare. They ran Supra-50M Instruct on a 26-year-old AMD Athlon clocked at 1 GHz, a machine they keep around for retro Windows 9x games, and got real, coherent output out of it at 1.3 tokens per second.
The hardware
This is not a low-end 2020 machine. This is a 1999 AMD Athlon running at 1007 MHz with a single core, no hyperthreading, 256 KB cache, and crucially, no SSE2 instruction set support. SSE2 is the baseline that virtually every modern AI inference library assumes. The CPU predates it entirely.
model name → AMD Athlon(tm) Processor
cpu MHz → 1007.089
cache size → 256 KB
cpu cores → 1
bogomips → 2014.17
SSE2 → not present
The OS is SliTaz 5.0, a minimal Linux distribution designed for very old or resource-constrained hardware. The terminal on the left shows the model running and responding to the prompt "Open the pod bay doors, HAL." The right terminal shows cat /proc/cpuinfo confirming the hardware. This is real.
The SSE2 problem and how they solved it
The standard tool for running LLMs on CPU is llama.cpp, but llama.cpp requires SSE2 which this Athlon does not have. So the user switched to llama2.c, a much simpler C inference engine originally written by Andrej Karpathy, and asked Claude to write a custom tokenizer compatible with our vocabulary format. That is a non-trivial amount of engineering just to get a model loaded, and they pulled it off.
Tokenizer → custom, written with Claude's help
Backend → CPU only, single core
Speed → ~1.3 tokens per second
Model → Supra-50M-Instruct (200 MB on disk)
What the model said
The prompt was "Open the pod bay doors, HAL." The Instruct model ran with it:
The host blocks power, leaves network, is silent, and does not interrupt other network machines.
· Your son: "Modules waiting for a call."
· (Module delay). See in alphabetical order. To modulate (a communication-control message) by closing the gap between the lines of
· The Board is sending a sample message to
Silly and drifty, as the user described. But it is responding to a 2001: A Space Odyssey reference with something that is thematically in the right ballpark, on a CPU that was already 2 years old when the movie's sequel came out. We will take it.
The thought that hit us
The user ended their post with a line that stuck with us:
That is the whole point of building small. Not everyone has a Strix Halo. Not everyone has a GPU. Some people have old hardware, low-power devices, or embedded systems where 200 MB is already a stretch. If a model this size can run on a single-core 1 GHz CPU from 1999, it can run essentially anywhere. That is the goal we have been building toward since Supra Mini v1.
Thank you
Seriously, this made our day. If you are running Supra models on unusual or constrained hardware, we want to hear about it. Posts like this are exactly why we keep the weights open and the file sizes honest.
You can find Supra-50M Instruct on HuggingFace. No Strix Halo required.
Reddit → r/LocalLLM — original post by u/drone_stonks
License → Apache 2.0
SupraLabs_