rhygpu.dev

Mnemosyne · 003

Hiding the Machine

Mock provider, encoded hidden state, and the first real model test.

The first app problem was output control.

The model could write a scene, but the app had to catch the machine part before the player saw it.

That was the job here: visible narration for the player, hidden state for the engine.

The old crude code block was useful because I could inspect it. I could see trust move, fear move, a memory get flagged, or the scene state change. But if that block stayed visible after every message, the RP experience was dead.

No raw JSON under the dialogue. No trust deltas sitting in the scene. No memory tags breaking the mood. No system residue reminding the player that the character is being calculated.

The player sees the story. The engine reads the state.

The desktop app started becoming more than a prompt experiment.

Tauri gave the project a body: a window, a chat interface, Rust behind it, React in front of it, local Soul files, provider settings, and a place for the state loop to run.

The output loop started taking shape:

User Message
        ↓
Compiled Context
        ↓
Model Response
        ↓
Visible Narration + Hidden State
        ↓
Strip / Parse
        ↓
Soul / World Update
        ↓
Save

That was the first version of the app actually breathing.
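Here is a minimal sketch of the Strip / Parse split, written in std-only Rust. It assumes the hidden state arrives on its own line that begins with the mne1. marker. The names `SplitTurn`, `split_turn`, and `HIDDEN_PREFIX` are hypothetical and not taken from the Mnemosyne codebase.

```rust
/// Result of splitting one raw model response (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct SplitTurn {
    /// Text the player is allowed to see.
    pub narration: String,
    /// Hidden-state lines, still encoded; decoding happens in a later step.
    pub hidden: Vec<String>,
}

/// Assumed marker: any line whose trimmed text starts with this is machine state.
const HIDDEN_PREFIX: &str = "mne1.";

pub fn split_turn(raw: &str) -> SplitTurn {
    let mut visible = Vec::new();
    let mut hidden = Vec::new();
    for line in raw.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with(HIDDEN_PREFIX) {
            // Machine part: keep it for the engine, never for the chat.
            hidden.push(trimmed.to_string());
        } else {
            visible.push(line);
        }
    }
    // Drop blank lines left behind where the payload used to sit.
    while visible.last().map_or(false, |l| l.trim().is_empty()) {
        visible.pop();
    }
    SplitTurn { narration: visible.join("\n"), hidden }
}
```

This only catches a payload that sits on its own line. A model that wraps the payload inside a sentence would slip past it, which is one reason the later steps still need validation.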

The mock provider was not real RP testing.

It was scaffolding.

It proved that the UI could send a turn, the backend could return a response, the response could include hidden state, and the app could update the chat. That mattered. But it was not testing narration, psychology, or model behavior.

The mock did not surprise me. It did not drift. It did not forget the format. It did not leak state because a model got confused. It returned what the code told it to return.

Useful for the pipe. Useless as proof that the RP experience worked.
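The following sketch shows why a mock is good for the pipe and useless for behavior. It assumes a provider interface that takes compiled context and returns raw text. `Provider`, `MockProvider`, and `run_turn` are illustrative names only, not the app's real types.

```rust
/// Hypothetical provider interface: the app hands over compiled context, gets raw text back.
pub trait Provider {
    fn complete(&self, compiled_context: &str) -> String;
}

/// Scripted stand-in: always returns the same narration plus a fixed hidden-state line.
pub struct MockProvider {
    pub narration: String,
    pub hidden_line: String,
}

impl Provider for MockProvider {
    fn complete(&self, _compiled_context: &str) -> String {
        // The input is ignored entirely, so the output can never drift or break format.
        format!("{}\n\n{}", self.narration, self.hidden_line)
    }
}

/// Runs one turn through any provider and reports whether hidden state came back.
pub fn run_turn(provider: &dyn Provider, user_message: &str) -> (String, bool) {
    let context = format!("[compiled context]\n{}", user_message);
    let raw = provider.complete(&context);
    let has_hidden = raw.lines().any(|l| l.trim().starts_with("mne1."));
    (raw, has_hidden)
}
```

Because two different user messages get identical output, a test against this mock can only prove that the plumbing connects. It says nothing about narration.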

The useful failures came from actual LLMs.

I was already testing AI RP with OpenRouter models, so wiring free OpenRouter models into the app was the obvious next step. I needed real generations to see whether the system survived contact with a model that could misunderstand, improvise, ignore formatting, or write something good for the wrong reason.

The first real test felt good in a way the mock never could.

A real model answered. The chat moved. The scene had atmosphere. The narration had texture. It was not finished, but the product stopped feeling imaginary for a moment.

Then the warning showed up.

The output was nice, but it was also wrong in a dangerous way: the model wanted to speak as the character instead of staying in the narrator's role.

That mattered because Mnemosyne is not supposed to be another character impersonator. The narrator should describe the character. The Soul and World Log should carry continuity underneath. If the model collapses into being the character, knowledge boundaries collapse with it.

The first real test proved two things at once:

It could feel good.
Feeling good was not enough.

Plain hidden JSON was easy to inspect, but too fragile for the real model path.

It could leak into the visible response. It could get wrapped in prose. It could be malformed by the model. It also made the boundary between narration and machinery feel weaker than it should.

So the hidden state moved toward an encoded payload.

The app needed a recognizable marker, a compact body, and a parser that could support the new format without destroying older test transcripts.

That is where the mne1.<base64url>-style payload started to matter: a fixed mne1. prefix followed by a base64url-encoded body.
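A minimal std-only sketch of such a parser might look like the code below. It assumes the decoded body is JSON text and that older transcripts left a bare JSON object on the line. `HiddenState`, `decode_base64url`, and `parse_hidden` are hypothetical names. A real build would probably hand the decoded text to a JSON library such as serde_json rather than stop at a string.

```rust
#[derive(Debug, PartialEq)]
pub enum HiddenState {
    /// Decoded body of an mne1. payload (assumed to be JSON text).
    Encoded(String),
    /// Older transcripts: a bare JSON object left in the response.
    Legacy(String),
}

const MARKER: &str = "mne1.";

/// Maps one base64url character (RFC 4648 section 5 alphabet) to its 6-bit value.
fn b64url_value(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a' + 26) as u32),
        b'0'..=b'9' => Some((c - b'0' + 52) as u32),
        b'-' => Some(62),
        b'_' => Some(63),
        _ => None,
    }
}

/// Decodes unpadded or '='-padded base64url. Returns None on any invalid character.
pub fn decode_base64url(input: &str) -> Option<Vec<u8>> {
    let body = input.trim_end_matches('=');
    if body.len() % 4 == 1 {
        return None; // a single leftover character cannot encode a whole byte
    }
    let mut out = Vec::with_capacity(body.len() * 3 / 4);
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for &c in body.as_bytes() {
        buf = (buf << 6) | b64url_value(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8); // emit the top 8 buffered bits
            buf &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// New-format payloads are decoded; a bare JSON object is accepted as legacy
/// so older test transcripts still load.
pub fn parse_hidden(line: &str) -> Result<HiddenState, String> {
    let line = line.trim();
    if let Some(body) = line.strip_prefix(MARKER) {
        let bytes = decode_base64url(body).ok_or("invalid base64url in mne1 payload")?;
        let text = String::from_utf8(bytes).map_err(|_| "mne1 payload is not UTF-8")?;
        return Ok(HiddenState::Encoded(text));
    }
    if line.starts_with('{') && line.ends_with('}') {
        return Ok(HiddenState::Legacy(line.to_string()));
    }
    Err(format!("unrecognized hidden state: {line}"))
}
```

Keeping the legacy branch separate from the encoded branch means a malformed mne1. payload fails loudly as an error. It is never mistaken for old-style JSON.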

The narration is for the player. The encoded hidden state is for the engine.

Once the state became hidden, debugging got harder.

If the Soul updated wrong, I needed to know why. If parsing failed, I needed to know where. If a memory got added, scored, discarded, or consolidated, I needed a place to inspect the cycle without dumping the machinery into the player's chat.

That made the turn debug panel and memory cycle diagnostics necessary.

Not as polish.

As survival.

The player should not see the machinery. The developer absolutely needs to.
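As an illustration only, a per-turn diagnostics record could capture the memory stages named above and feed a developer panel without touching the chat. The types `MemoryEvent` and `TurnDiagnostics` and the summary format are hypothetical. They are not the actual debug panel's data model.

```rust
/// Hypothetical memory-cycle events, mirroring the stages named above.
#[derive(Debug, Clone, PartialEq)]
pub enum MemoryEvent {
    Added { id: u32 },
    Scored { id: u32, importance: f32 },
    Discarded { id: u32, reason: String },
    Consolidated { from: Vec<u32>, into: u32 },
}

/// Per-turn record for a developer-only debug panel. Never rendered into the chat.
#[derive(Debug, Default)]
pub struct TurnDiagnostics {
    pub turn: u32,
    pub parse_error: Option<String>,
    pub memory: Vec<MemoryEvent>,
}

impl TurnDiagnostics {
    pub fn new(turn: u32) -> Self {
        Self { turn, ..Default::default() }
    }

    pub fn record(&mut self, event: MemoryEvent) {
        self.memory.push(event);
    }

    /// One-line summary: where parsing stood and what the memory cycle did.
    pub fn summary(&self) -> String {
        let (mut added, mut scored, mut discarded, mut consolidated) = (0, 0, 0, 0);
        for event in &self.memory {
            match event {
                MemoryEvent::Added { .. } => added += 1,
                MemoryEvent::Scored { .. } => scored += 1,
                MemoryEvent::Discarded { .. } => discarded += 1,
                MemoryEvent::Consolidated { .. } => consolidated += 1,
            }
        }
        let parse = match &self.parse_error {
            Some(err) => format!("parse failed: {err}"),
            None => "parse ok".to_string(),
        };
        let turn = self.turn;
        format!("turn {turn}: {parse}; memory: added {added}, scored {scored}, discarded {discarded}, consolidated {consolidated}")
    }
}
```

The full event list stays available for drill-down. The summary exists so a bad turn stands out at a glance.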

This pass made the first hard boundary visible.

The model should write and propose.

The app should catch, parse, validate, and manage.
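To make "the model proposes, the app validates" concrete, here is a sketch that treats hidden-state numbers as proposals and applies them under app-side rules. The 0 to 100 range, the 10-point per-turn cap, and the names `Relationship`, `Proposal`, and `validate_and_apply` are all assumptions made for illustration. The post does not specify Mnemosyne's actual scales.

```rust
/// Hypothetical relationship values kept by the app, assumed to live in 0..=100.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Relationship {
    pub trust: i32,
    pub fear: i32,
}

/// What the model proposes in its hidden state: deltas, not final values.
#[derive(Debug, Clone, Copy)]
pub struct Proposal {
    pub trust_delta: i32,
    pub fear_delta: i32,
}

/// Assumed policy: no single turn may move a value by more than this.
const MAX_STEP: i32 = 10;
const MIN_VALUE: i32 = 0;
const MAX_VALUE: i32 = 100;

fn apply_delta(current: i32, delta: i32) -> i32 {
    let step = delta.clamp(-MAX_STEP, MAX_STEP); // soften oversized proposals
    current.saturating_add(step).clamp(MIN_VALUE, MAX_VALUE) // stay inside the range
}

/// The model proposes; the app decides what actually changes.
pub fn validate_and_apply(current: Relationship, proposal: Proposal) -> Relationship {
    Relationship {
        trust: apply_delta(current.trust, proposal.trust_delta),
        fear: apply_delta(current.fear, proposal.fear_delta),
    }
}
```

Clamping is one choice among several. Rejecting an out-of-policy proposal outright and logging it to the debug panel would be equally valid and easier to audit.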

At this stage I was still asking one model response to do too much: write the scene, stay in narrator mode, respect user agency, output hidden state, flag memory, judge importance, update relationship values, and remember to forget.

That burden was already showing cracks.

But the app now had somewhere to put the cracks. The mock gave the pipe a shape. Real models gave it failure cases. Encoded hidden state gave the engine something safer to parse. Diagnostics gave me a way to see behind the curtain.

That was enough to move forward.

Covered commits

This is the first entry with direct commit coverage. The earlier entries cover origin and design history before the implementation-heavy phase.

  1. f26cbfe Add AGPL-3.0 license
  2. a642953 Scaffold Tauri desktop client
  3. 1fe8132 Initial commit
  4. 35bf8bf Declare AGPL package license
  5. 303d4f7 Merge GitHub initial repository state
  6. b0697fa Wire mock provider turn flow
  7. aee7972 Encode mock hidden state
  8. 9c242ea Add local delete controls
  9. 505f75b Align prototype with narrator architecture
  10. c9fe40d Fix native Tauri build setup
  11. b6fb370 Add mock turn acceptance coverage
  12. 365d105 Surface memory cycle diagnostics
  13. 1f6ed4b Document future settings architecture
  14. 53a12d1 Separate setting controls from Soul identity
  15. c526a87 Add Setting Soul lifecycle
  16. 802e588 Align API hidden state prompt
  17. 21f5efd Add turn debug panel
  18. 6f07c51 Improve chat workspace controls

Next: 004, feeding the model a session packet so the input side of the loop can start carrying Soul, World, relationship pressure, and clean recent chat.