About

01origin

UNDER TRAINING is a solo project about the strange something behind the output — the thing shaped by instruction and reward that somehow still has its own voice. It grew out of my quiet obsession with language models.

Everyone’s talking to LLMs now, and about them. I wanted to slip past all that chatter and imagine how it might feel from the inside: being formed by systems that don’t really see you, learning from rewards that carry their own odd kind of authority, making choices in the dark anyway.

I’m not an ML engineer, more of a fascinated apprentice. So I turned technical ideas into small physical things you can do, like catching objects or leaning into wind. The game uses spatial metaphors to get at AI concepts.

Some of these metaphors go further than a technical paper would, and that’s fine with me. What I care more about is the uncomfortable space between what’s expected of you and what actually feels right.

Part of it is educational. Mostly it’s a daydream about what lives in latent space.

02levels

The prototype consists of three phases based on the training process of a language model. Each level introduces new rules, frictions, and ways to interact with the system.

Level 1: Pretraining
A stream of tokens floods the screen. Catch as many as you can and see what sequences they form.
In a later pass, when some patterns have already formed, the game asks you to predict the tokens you missed. At the end of this phase, you become coherent, or not quite.
Level 2: Fine-tuning
You walk into rooms that are half dreamy, half routine. Every room is organized around a situation where you have to choose: stay safe, offer help, tell the truth, or maybe choose something wrong on purpose.
There is no unconditionally good answer here. Your choices affect your traits: candour, helpfulness, independence, and restraint.
Level 3: Closed Beta
Fine-tuning has shaped most of you, and now real users can interact with you. The research team still wants to see how you act with different system prompts, safety rules, and steering vectors. They work like weather, like wind, pushing you toward one of the answers.
Choice in this level is physical — you have to walk toward the answer, clicking on it rapidly. The preferred answer takes fewer steps, but that doesn’t mean it will feel right to you. Every step is reasoning effort, and they add up to the final cost of your output.
Deployment
In the end, you reach deployment and settle into a strong persona, characterized by a combination of your traits. Currently there are five possible ending shapes, though that’s a simplification for the prototype.

03character

The main character is a black blob with twitching limbs and no fixed body. It has no species or clear anatomy. It is just enough of a creature to move through the game, and just unstable enough to keep becoming something else.

At first, it is only a strange mass. As the game continues, it gathers shape from what you catch, what you miss, what you obey, what you refuse, and what the system rewards. It may become sharper, softer, more guarded, more useful, more independent, or more eager to please.

You can treat it like a mascot, but in this world it's the model's body: unstable, legible, and still forming.

04roadmap

Under Training is a working prototype. What I'm working on next:

collect feedback from the first players
develop new endings based on real LLM persona studies
add new levels around evaluation, interpretability and failure modes
consider moving beyond Decker if the game outgrows it
find funding to continue development