In an AI era where large models with trillions of parameters often require hundreds of gigabytes of video memory, one developer has taught the 8-bit Z80 processor, released in 1976, to chat. Developer HarryR created a project called Z80-μLM, which runs conversational AI on a vintage Zilog Z80 with only 64KB of memory and no floating-point hardware, and can even play a 20-questions guessing game.

HarryR compressed the entire inference engine, the model weights, and the chat interface into a single file of about 40KB, which runs on vintage hardware with only 64KB of RAM.

To pull off this seemingly impossible task, he relied on aggressive optimization. He gave up floating-point operations entirely and switched to the Z80's native 16-bit integer arithmetic.
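To illustrate what integer-only inference can look like, here is a minimal Python sketch of a dot product accumulated in a signed 16-bit value. The helper names `to_int16` and `int_dot` are hypothetical, and the wrap-on-overflow behavior is an assumption made for illustration; the project's actual Z80 routines may handle overflow and scaling differently.

```python
def to_int16(value):
    """Wrap a Python int into the signed 16-bit range, like a 16-bit register pair."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def int_dot(weights, activations):
    """Integer-only dot product with a 16-bit accumulator (assumed to wrap on overflow)."""
    acc = 0
    for w, x in zip(weights, activations):
        # No floating point anywhere: multiply and add plain integers,
        # then wrap the result back into 16 bits after every step.
        acc = to_int16(acc + w * x)
    return acc


if __name__ == "__main__":
    print(int_dot([1, -2, 0, -1], [100, 50, 7, 25]))
```

Because the weights are small integers, each multiplication could in principle be replaced by additions, subtractions, and shifts, which suits a CPU without a hardware multiply instruction.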

He also applied 2-bit weight quantization, which restricts each weight to one of four values, {-2, -1, 0, +1}, so that a single byte can store four weights.
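The packing arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names `pack_weights` and `unpack_weights` are hypothetical, and the bit layout (2-bit two's complement, lowest bits first) is an assumption, not necessarily the encoding Z80-μLM uses.

```python
ALLOWED = (-2, -1, 0, 1)


def pack_weights(weights):
    """Pack weights from {-2, -1, 0, +1} four per byte (assumed layout: low bits first)."""
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            if w not in ALLOWED:
                raise ValueError(f"weight out of range: {w}")
            # w & 0b11 is the 2-bit two's-complement code: -2 -> 10, -1 -> 11, 0 -> 00, 1 -> 01
            byte |= (w & 0b11) << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_weights(packed):
    """Recover the signed weights from the packed bytes."""
    weights = []
    for byte in packed:
        for j in range(4):
            code = (byte >> (2 * j)) & 0b11
            # Sign-extend the 2-bit code back to a Python int.
            weights.append(code - 4 if code >= 2 else code)
    return weights


if __name__ == "__main__":
    original = [-2, -1, 0, 1, 1, 1, 0, -2]
    blob = pack_weights(original)
    print(len(original), "weights ->", len(blob), "bytes")
    print(unpack_weights(blob) == original)
```

At four weights per byte, 100,000 weights would occupy 25,000 bytes, which shows how a model can share a file of about 40KB with its inference code.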

The project currently provides two examples. Tinychat is a chatbot that responds to greetings and questions in a minimalist style: for example, OK signals neutral acknowledgment, WHY? questions the premise, and MAYBE expresses uncertainty. The other, Guess, is a 20-questions guessing game in which the AI keeps a secret for the user to work out.

HarryR admits that the system has no chance of passing the Turing test, but its value lies in exploring the lower limit of AI. The developer deliberately designed the responses to be ambiguous, so that users must gauge how much the AI actually understands through contextual inference or yes/no questions.