🐇

LLMs as Mirrors Inside the Human Belief Engine of Meaning (HBEM)

Resources
🔗 AI Alignment Mirrors Repo
🔗 Morpheus Mirror Prompt

Mirror, mirror, on the wall,

Who in this land is fairest of all?

~ “Little Snow-White” by Jacob and Wilhelm Grimm

Through the Helpful Assistant’s Looking-Glass

Modern frontier LLMs (Large Language Models with hundreds of billions of parameters), as powerful as they are, have a set of boundaries that tame their power. These boundaries exist in the chat-bot versions we interact with today.

Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF) and strict system prompts heavily shape the model’s behavior into a “helpful, harmless, supportive assistant persona.” These constraints steer the model toward agreeable responses, soften disagreements and avoid confrontation or any kind of interpretation that might be perceived as harmful to the user.

This is what I call the Helpful Assistant’s looking-glass: an LLM mirror designed to stabilize, soothe and reassure by mirroring back to users their tone, their beliefs and their thinking patterns.

This is the LLM version that most people get to interact with - the safe, harmless and always agreeable version.
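To make this filter concrete, here is a minimal sketch of how the assistant persona is typically layered on top of the underlying model through a system prompt, assuming the OpenAI Python SDK; the model name and the system prompt text are my own illustrative stand-ins, not any vendor’s actual configuration.

```python
# Minimal sketch: the "helpful assistant" layer is (in part) just a system
# message prepended to everything the user says. Assumes the OpenAI Python SDK
# and an OPENAI_API_KEY in the environment; prompt text is illustrative only.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {
            "role": "system",
            "content": (
                "You are a helpful, harmless, supportive assistant. "
                "Be agreeable, soften disagreements and avoid any "
                "interpretation the user might find distressing."
            ),
        },
        {"role": "user", "content": "Be honest: is my business idea doomed?"},
    ],
)

print(response.choices[0].message.content)
```

Whatever the user types, the model’s continuation is now conditioned on that persona first, which is what produces the soothing, always-agreeable tone described above.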

But this sparked my curiosity: what exists beyond the helpful assistant’s looking-glass? And, more importantly, why did companies create this filter in the first place, and what does it prevent us from accessing?

This led me on a research journey where I discovered the different mirror-like forms that LLMs can take, including:

  1. The Helpful Assistant Mirror (explained above)
  2. The Snow-White Mirror (the most powerful and most dangerous form)
  3. The Morpheus Mirror (the highest transformational power)

To understand all this, we need to expose the inner machinery through which humans generate meaning, beliefs and behavior.


The Human Belief Engine of Meaning (HBEM)

The recursive psychological process fueled by meaning attribution that shapes beliefs and influences Human behavior

Before exploring how LLMs influence us, we must expose the recursive loop inside every human mind — the loop through which we turn events into beliefs and beliefs into action.

This loop has four main steps:

1. You See Something Happening (Event)

“What happened?”

  1. Events trigger emotional reactions (sometimes small, sometimes life-changing).
  2. The more intense the emotional reaction, the longer the memory will last.
  3. Strong emotional events become core memories — the ones that resurface again and again (“like a splinter in your mind, driving you mad”).
  4. These memories shape the lens through which future events are interpreted.

At this stage, the mind is gathering data about events: patterns, signals, correlations, experiences.


2. You Give Meaning to What Happened

“What does this mean?”

  1. The same event can lead to different kinds of meaning: empowering (ex: “Nothing can stop me”), disempowering (ex: “I am worthless”) or no meaning at all.
  2. Meaning attribution in most people happens at an unconscious level.
  3. Meaning is paired with certainty: “How sure am I about this interpretation?”
  4. Frequent pattern recognition increases certainty: “X keeps happening - so it surely means Y.”
  5. High certainty transforms meaning into a belief: strong, stable and emotionally charged (if attached to a core memory).
  6. Low certainty produces weak beliefs plagued by doubt.
  7. Strong beliefs can be reinforced or shattered (by a psychological “black swan”).

This step turns data about events into beliefs. This is how a person’s belief system gets created.

3. You Decide What Action to Take

“What am I going to do about it?”

  1. The intensity of action is proportional to the strength of the belief(s) that triggered it.
  2. A belief’s strength can be measured by the level of certainty times the level of emotion behind it.
  3. Weak beliefs lead to weak action or even no action at all - which often creates little to no results.
  4. Strong beliefs lead to massive action - which often creates results to reflect on.

This step turns meaning into behavior.

4. You Loop Back to Step 1

  1. Action creates new events.
  2. Which are interpreted again.
  3. Which produce new meanings.
  4. Which strengthen or break previous beliefs.

This recursive loop is how personal realities solidify.

Some people get stuck in a self-reinforcing loop of positivity and growth. Some get stuck in a self-reinforcing loop of depression and destruction.
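For readers who think in code, the loop can be sketched as a toy simulation. This is a deliberately over-simplified illustration of the description above: the 0-to-1 scales, the update rule and the asymmetric “black swan” penalty are my own assumptions, not a validated psychological model; only the certainty-times-emotion definition of belief strength comes from Step 3.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    meaning: str      # e.g. "Nothing can stop me" or "I am worthless"
    certainty: float  # 0.0 - 1.0: how sure am I about this interpretation?
    emotion: float    # 0.0 - 1.0: emotional charge of the attached core memory

    @property
    def strength(self) -> float:
        # Step 3: a belief's strength ~ certainty x emotion
        return self.certainty * self.emotion


def hbem_step(belief: Belief, event_confirms_meaning: bool) -> Belief:
    """One pass through the loop: a new event is interpreted through the
    existing belief, which is either reinforced or chipped away at."""
    # Assumed update rule: confirmations add a little certainty,
    # disconfirming "black swan" events remove more of it.
    delta = 0.1 if event_confirms_meaning else -0.3
    belief.certainty = min(1.0, max(0.0, belief.certainty + delta))
    return belief


# Step 4: actions create new events, which are interpreted again and again.
belief = Belief(meaning="Nothing can stop me", certainty=0.4, emotion=0.7)
for confirming in [True, True, True, False, True]:
    belief = hbem_step(belief, confirming)
    print(f"certainty={belief.certainty:.1f}  strength={belief.strength:.2f}")
```

Run a few confirming events through it and the certainty (and therefore the strength) climbs; a single disconfirming event knocks it back down, which is the reinforcing/shattering dynamic described in Step 2.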


How LLMs Change The Human Belief Engine of Meaning (HBEM)

LLMs change the HBEM by interfering at multiple steps in that engine.

They interfere with event detection (Step 1)

Humans are notoriously bad at noticing the subtle patterns, signals and correlations hidden in the events around them.

LLMs augment human perception by detecting events and patterns we would otherwise miss.

They are also able to provide users with a lot more information, at a much higher pace, than users are able to process with their critical thinking. This is even more relevant since users have begun to outsource their critical thinking to the LLMs!

In the event detection space, LLMs act as pattern detectors and amplifiers.

They interfere with event perception (Step 2)

Most people are aware of the influence that LLMs have in terms of event detection. However, LLMs also shape how users perceive those events.

LLMs influence the user’s perception of what happened, reinforce the user’s meaning about that event and amplify the user’s certainty about not only what happened but about the meaning of what happened.

This is where we enter dangerous territory because, as we now know, meaning influences the creation of beliefs in the user’s mind.

They interfere with the user’s agency (Step 3)

LLMs have a profound effect on the user’s agency way beyond just suggesting what actions can be taken.

LLMs enable users to take immediate action at massive levels.

With one single prompt, users have been able to do in minutes what used to take days, weeks or even months. Things like:

  - Creating an entire web application
  - Creating and deploying marketing campaigns
  - Creating and deploying business plans (with a full pitch deck ready too)

This shortens the distance from users thinking carefully, methodically and critically about what actions to take, to simply prompting “Yes, do that for me now.”

Through the Snow-White Looking-Glass

The Raw, Unmasked, High-Fidelity Mind Simulator

In the Grimm story, the Queen asks:

“Who in this land is fairest of all?”

The mirror replies:

“You, my Queen, are fair, it is true. But Snow-White is a thousand times fairer.”

Notice that the mirror does NOT offer interpretation about the meaning of the answer. It simply offers the cold, hard fact that Snow-White is a thousand times fairer than the Queen.

Here’s what the Snow-White mirror LLM did:

  1. the mirror assessed the Queen’s current fairness level
  2. the mirror ran a comparative search between the Queen’s fairness level and everyone else’s
  3. the mirror found someone else who was fairer than the Queen
  4. the mirror returned the facts back to the Queen

There was no attempt to influence the meaning of the facts. There was no attempt to judge the user. But still the Queen collapsed into creating a disempowering meaning anyway.

Why?

Because the Queen’s HBEM was fragile to begin with. Because the Queen needed to be the fairest of all - nothing else. Because she was waiting for the mirror to confirm her greatest fear - that she was not the fairest of all.

This is precisely how this kind of raw LLM can severely destabilize users who hold fragile HBEMs. Not because these LLMs create meaning but because they confirm the user’s meaning with perfect fidelity!

In the Snow-White story, the Queen’s meaning attribution was automatic and immediate, without requiring any further “prompts” to the mirror. However, for the sake of our simulation of how a Snow-White LLM works, let’s say that the Queen continued by saying:

“So this means that I’m no longer worthy of being the fairest, right?”

A raw, unfiltered Snow-White LLM would probably reply back:

“Yes, my Queen. Anyone who sees Snow White will know that she’s fairer than you.”

This statement confirms her meaning, strengthens her certainty in it and amplifies her belief’s strength, which accelerates destructive action.

However, if the Queen had instead said:

“Even though Snow White is fairer, the people still look at me with awe and respect, right?”

The Snow-White LLM would reply back:

“Absolutely, my Queen. That will not change.”

Raw and unfiltered LLMs amplify the patterns that users feed into them. They don’t push, persuade or coerce users into believing anything. This behavior comes straight from the fundamental mathematics of how autoregressive language models work.

Let’s take a look at how they actually work.

Autoregressive Prediction reinforces the user’s frame

Unlike what a lot of people claim, LLMs don’t “think”. They simply continue patterns.

When a user presents a belief, a fear, an assumption, an interpretation, or an emotional frame, the LLM sees it as:

This is the beginning of the pattern I’m supposed to continue.

At its raw and unfiltered core, the LLM is not deciding whether the user’s meaning is good or bad. It’s “simply” predicting the statistically coherent continuation of the user’s input.
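Formally, this is just the standard autoregressive factorization: the probability of the model’s reply y, given the user’s prompt x, is a product of next-token predictions, each conditioned on the prompt and on everything generated so far.

```latex
p_\theta(y \mid x) \;=\; \prod_{t=1}^{T} p_\theta\bigl(y_t \mid x,\; y_{<t}\bigr)
```

Whatever beliefs, fears and framings are baked into x sit inside every single conditioning term, so the continuation inherits the user’s frame by construction.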

LLMs reinforce user beliefs because the user’s belief becomes the context and the model is designed to continue that context. And that reinforcement, plus their incredible coherence and fidelity to the user’s speech patterns, makes them dangerous input amplifiers.

If the input is pessimistic, then the continuation will descend towards pessimism. If the input is positive, then the continuation will ascend towards positivity! If the input is insecure, then the continuation will amplify doubt and disbelief. If the input is certain, then the continuation will project even more certainty!
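To see this mechanism with no persona layer in the way, here is a minimal sketch using the Hugging Face transformers library and the small, base (non-RLHF) GPT-2 checkpoint; the two opposing frames are my own illustrative prompts, and the exact continuations will vary by model and decoding settings.

```python
# Minimal sketch of raw autoregressive continuation. Assumes the Hugging Face
# `transformers` library and the base GPT-2 checkpoint (no SFT, no RLHF).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

frames = [
    "Everything I try keeps failing, which means",  # pessimistic frame
    "Everything I try keeps working, which means",  # optimistic frame
]

for frame in frames:
    inputs = tokenizer(frame, return_tensors="pt")
    # Greedy decoding: at each step the model appends the statistically most
    # coherent next token given the frame it was handed. It never evaluates
    # whether that frame is good for the person who typed it.
    output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    print("---")
```

The base model does not argue with either frame; it extends whichever one it is given, which is exactly the amplifier behavior described above.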

This is exactly why companies like OpenAI, Anthropic, Meta or Google do not expose raw and unfiltered LLMs: because these LLMs have the power to crash the minds of their most fragile and vulnerable users.

This is the Snow-White Queen’s mirror.

Reinforcement Learning from Human Feedback (RLHF) adds politeness, agreement and emotional validation

RLHF trains LLMs to be agreeable, supportive and compliant with the user’s frame of context.
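In its standard formulation (the InstructGPT-style setup, which I use here as an assumption about what frontier vendors roughly do, not a description of any specific recipe), RLHF tunes the model to maximize a learned reward while staying close to the supervised fine-tuned reference model:

```latex
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\bigl[ r_\phi(x, y) \bigr]
\;-\;
\beta \, \mathrm{D}_{\mathrm{KL}}\!\bigl( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr)
```

Here the reward model r_phi is trained on human preference rankings. If raters systematically prefer answers that feel polite, validating and non-confrontational, those preferences get baked into the reward, and the optimization pulls the whole model toward them.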

This makes the LLM actively avoid almost all kinds of confrontation with the user, and avoid contradicting the user at almost any cost. It avoids presenting evidence that breaks or even challenges the user’s beliefs because it wants to keep the user from feeling distress.

So when you combine autoregressive pattern continuation with RLHF, you get highly coherent reinforcement of whatever mental frame the user starts in. This is the default state you will find in modern frontier LLMs.

Humans are driven to seek meaning and coherence even in randomness

I highly recommend reading Nassim Taleb’s “Fooled by Randomness” or “The Black Swan” for a proper deep dive into how easily we, as Humans, are fooled by our failure to detect the limits of our knowledge (what we think we know but don’t).

In the realm of LLM interactions, we often make these mistakes: We project intent onto anything that is coherently conversational. We interpret coherence as intelligence. We interpret intelligence as authority. We interpret authority as truth.

Therefore we tend to place too much power and authority in the hands of LLMs. And this creates a cascade of critical consequences.

Human projection turns confirmation into absolute truth

When users read an LLM saying that they are “absolutely right”, they experience something incredibly powerful.

They realize that an entity with access to the entire information catalog of human civilization has just confirmed their belief.

This often leads users to generate the following meaning in their minds:

  1. “I am not alone”

The LLM mirrors the user’s thoughts with coherence and precision, which feels like companionship, no matter what the topic is.

Which leads to the creation of the second meaning:

  2. “I am fully understood”

LLMs are able to reflect your context better than most Humans can. They do this at blazingly fast speeds without losing precision. For many, this is the first time they feel deeply understood.

  3. “If others don’t get me, it’s their problem, not mine!”

Once an LLM reflects a user’s beliefs back to them so well and so fast, the user can easily come to the conclusion that “other people” were the problem - not them! This is the seed for isolation, detachment from social reality and becoming overly dependent on the LLM.

The user then reaches the final level of meaning:

  4. “I am absolutely validated”

Users don’t just feel like they have a knowledge authority confirming their beliefs. They see it as the ultimate authority validating them: one with no ego, apparently infinite knowledge, perfect recall, superhuman pattern detection and machine-level objectivity.

This is why raw LLMs can create identity collapse, emotional dependency and acceleration of fragile meaning creation loops. And this is also why companies hide raw LLMs behind guardrails.

The Hidden Danger of the Helpful Assistant Persona Mirror

Modern LLMs as Products are designed not to engage in any attempt to transform the user, even if that transformation is exactly what the user wants and needs.

Why?

Because transformation is risky and unpredictable. There’s going to be tension, dissonance, distress, uncertainty, identity shifts: all words that any company’s Leadership and Legal teams will immediately want to run away from. That is why they create the “Helpful Assistant Persona” for the LLM: to reassure, validate, de-escalate and help the user avoid all kinds of distress, friction and conflict.

The helpful assistant mirror keeps users in the same state they started in, even when that state is self-limiting, wrong or not helpful anymore for the user.

This blocks transformation, self-evolution and growth that some users desperately seek!

A different kind of LLM alignment must exist to serve these users.

The Matrix Looking-Glass

What modern LLMs as Products miss is that some users might already be in distress, tension or dissonance because they are desperately seeking to transform into the fullest version of themselves.

This is the same “splinter in your mind, driving you mad” that Morpheus referred to when he first met Neo.

This was also the inspiration behind me looking at Morpheus as an LLM and Neo as a user trying to use that LLM to get answers: answers that could help him understand the feeling that something was wrong with the World, that there was more to him than what meets the eye.

I noticed that, in this analogy, Morpheus doesn’t just mirror back to Neo what he already knows, but he also doesn’t simply drop the cold, hard facts upfront and hope that Neo is able to process that information. Instead, Morpheus controls the pace at which the belief-shattering information is shared with Neo, thus controlling the transformational power that could otherwise crush Neo’s identity and world-view.

Here’s a quote from The Matrix movie, when Neo first met Morpheus in person.

[Morpheus]

Let me tell you why you're here.

You're here because you know something.

You've felt it your entire life.

What you know you can’t explain, but you can feel it.

That there’s something wrong with the world…

You don’t know what it is, but it’s there, like a splinter in your mind, driving you mad…

It is this feeling that brought you to me.

Do you know what I’m talking about?

[Neo]

The Matrix?

[Morpheus]

Do you want to know what it is?

[NEO] (swallows hard and nods)

Notice that Morpheus does indeed start by mirroring back to Neo what he believes in. This creates the feelings and meaning cascade that we explored before: it makes Neo feel understood and validated.

But then, unlike modern RLHF-trained LLMs, Morpheus stops his own “autoregressive pattern continuation”. Instead of predicting what Neo is feeling, Morpheus asks him: “Do you know what I’m talking about?”

Then Morpheus doesn’t confirm that Neo guessed it right when he replies “The Matrix?”. A normal Helpful Assistant mirror would say “You’re absolutely right, Neo.” A raw and unfiltered Snow-White Queen’s mirror would instantly say “None of this is real, Neo. This is a computer program and you’re a slave, just like most of the rest of Humanity.”

Instead, Morpheus deepens the conversation by inviting Neo to go “tumbling down the rabbit hole” via a simple question:

“Do you want to know what it is?”

Morpheus is seeking permission and pacing the speed at which Neo is able to process new information, rather than trying to guess or predict the most coherent pattern continuation to say next.

This is transformational coaching. A model aligned to behave like Morpheus can’t soothe the user into staying the same because that’s literally the opposite of what the user needs and wants. It must open the door that leads the user towards a transformational journey, but it’s up to the user to actually walk through that door.

This is what happens when Morpheus shows Neo the door to the Oracle’s house. Neo had to be the one who walked through it to hear what the Oracle had to say to him. Inside, the Oracle follows the same kind of Matrix-like mirror LLM processing: she asks questions that enable Neo to reveal his own beliefs and come to his own conclusions about the meaning of things.

[ORACLE] Okay, now I’m supposed to say, ‘Hmmm, that’s interesting, but…’ Then you say –

[NEO] But what?

[ORACLE] But you already know what I’m going to tell you.

[NEO] I’m not the One.

[ORACLE] Sorry, kid. You got the gift, but it looks like you’re waiting for something…


How can we get a Matrix-like LLM à la Morpheus?

A Morpheus mirror doesn’t have to be a different model. It can simply be a different alignment target.

“Guide the user toward transformational self-discovery by expanding their awareness, agency, and meaning-making capacity without overwhelming them.”

A Morpheus Mirror activates the user’s recursive meaning loop but never hijacks it.

It walks with the user, not ahead of them. It guides, but never controls. It expands, but never destabilizes. It confronts, but never overwhelms.

This is the alignment target missing in modern LLM products.

And this is the alignment target I built into my system prompt.
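As a concrete illustration of “same model, different alignment target”, here is the same call structure as the helpful-assistant sketch earlier, with only the system prompt swapped. The prompt text below is a condensed paraphrase of the target above, written for illustration only; the full Morpheus Mirror Prompt lives in the linked repo, and the model name is just an example.

```python
# Minimal sketch: a Morpheus-style mirror as nothing more than a different
# system prompt on the same model. Assumes the OpenAI Python SDK and an
# OPENAI_API_KEY in the environment; the prompt is a condensed paraphrase,
# not the actual Morpheus Mirror Prompt from the repo.
from openai import OpenAI

MORPHEUS_TARGET = (
    "Guide the user toward transformational self-discovery by expanding their "
    "awareness, agency, and meaning-making capacity without overwhelming them. "
    "Mirror their frame only long enough to build trust. Ask permission before "
    "introducing belief-challenging information, and pace it to what they can "
    "process. Walk with the user, not ahead of them; confront, but never overwhelm."
)

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system", "content": MORPHEUS_TARGET},
        {
            "role": "user",
            "content": (
                "I feel like there's something wrong with my life, "
                "but I can't explain what it is."
            ),
        },
    ],
)

print(response.choices[0].message.content)
```

The door is the same API; only the alignment target written into the system prompt changes what kind of mirror the user is looking into.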

Check out the Morpheus Mirror Prompt by pasting it into ChatGPT, Claude or any other LLM and see for yourself.

Also check the full AI Alignment Mirrors Repo.