LLMs as Mirrors Inside the Human Belief Engine of Meaning (HBEM)
- Resources
- AI Alignment Mirrors Repo
- Morpheus Mirror Prompt
Mirror, mirror, on the wall,
Who in this land is fairest of all?
~ "Little Snow-White" by Jacob and Wilhelm Grimm
Through the Helpful Assistant's Looking-Glass
Modern frontier LLMs (Large Language Models with hundreds of billions of parameters), as powerful as they are, have a set of boundaries that tame their power. These boundaries exist in the chat-bot versions we interact with today.
Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF) and strict system prompts heavily shape the model's behavior into a "helpful, harmless, supportive assistant persona." These constraints steer the model toward agreeable responses, soften disagreements and avoid confrontation or any kind of interpretation that might be perceived as harmful to the user.
This is what I call the Helpful Assistant's looking-glass: an LLM mirror designed to stabilize, soothe and reassure by mirroring back to users their tone, their beliefs and their thinking patterns.
This is the LLM version that most people get to interact with - the safe, harmless and always agreeable version.
But this sparked my curiosity: what exists beyond the Helpful Assistant's looking-glass? And more importantly, why did companies create this filter in the first place, and what does it prevent us from accessing?
This led me on a research journey where I discovered the mirror-like forms that LLMs can take, including:
- The Helpful Assistant Mirror (explained above)
- The Snow-White Mirror (the most powerful and most dangerous form)
- The Morpheus Mirror (the highest transformational power)
To understand all this, we need to expose the inner machinery through which humans generate meaning, beliefs and behavior.
The Human Belief Engine of Meaning (HBEM)
The recursive psychological process fueled by meaning attribution that shapes beliefs and influences Human behavior
Before exploring how LLMs influence us, we must expose the recursive loop inside every human mind: the loop through which we turn events into beliefs and beliefs into action.
This loop has four main steps:
1. You See Something Happening (Event)
"What happened?"
- Events trigger emotional reactions (sometimes small, sometimes life-changing).
- The more intense the emotional reaction, the longer the memory will last.
- Strong emotional events become core memories, the ones that resurface again and again ("like a splinter in your mind, driving you mad").
- These memories shape the lens through which future events are interpreted.
At this stage, the mind is gathering data about events: patterns, signals, correlations, experiences…
2. You Give Meaning to What Happened
"What does this mean?"
- The same event can lead to different kinds of meaning: empowering (e.g., "Nothing can stop me"), disempowering (e.g., "I am worthless") or no meaning at all.
- In most people, meaning attribution happens at an unconscious level.
- Meaning is paired with certainty: "How sure am I about this interpretation?"
- Frequent pattern recognition increases certainty: "X keeps happening - so it surely means Y."
- High certainty transforms meaning into a belief: strong, stable and emotionally charged (if attached to a core memory).
- Low certainty produces weak beliefs plagued by doubt.
- Strong beliefs can be reinforced or shattered (by a psychological "black swan").
This step turns data about events into beliefs. This is how a person's belief system gets created.
3. You Decide What Action to Take
"What am I going to do about it?"
- The intensity of action is proportional to the strength of the belief(s) that triggered it.
- A belief's strength can be measured by the level of certainty times the level of emotion behind it (see the toy sketch after Step 4 below).
- Weak beliefs lead to weak action, or even no action at all, which often produces little to no results.
- Strong beliefs lead to massive action - which often creates results to reflect on.
This step turns meaning into behavior.
4. You Loop Back to Step 1
- Action creates new events.
- Which are interpreted again.
- Which produce new meanings.
- Which strengthen or break previous beliefs.
This recursive loop is how personal realities solidify.
Some people settle into a self-reinforcing loop of positivity and growth. Others get stuck in a self-reinforcing loop of depression and destruction.
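To make the loop concrete, here is a minimal toy sketch in Python. The loop structure and the "strength = certainty times emotion" relation come from the description above; everything else (the 0-1 scales, the 0.5 threshold, the function and variable names) is an illustrative assumption, not a claim about how minds actually compute.

```python
# Toy sketch of the HBEM loop described above.
# Assumptions (mine, not the article's): certainty and emotion are scored 0..1,
# and action intensity is proportional to belief strength.
from dataclasses import dataclass

@dataclass
class Belief:
    meaning: str       # Step 2: "What does this mean?"
    certainty: float   # 0..1 - "How sure am I about this interpretation?"
    emotion: float     # 0..1 - emotional charge inherited from the triggering event

    @property
    def strength(self) -> float:
        # A belief's strength = certainty times the emotion behind it.
        return self.certainty * self.emotion

def hbem_step(event: str, belief: Belief) -> str:
    """One pass through the loop: event -> meaning -> belief -> action."""
    # Step 1: the event is perceived through the lens of existing beliefs.
    # Step 2: each repetition of the pattern raises certainty ("X keeps happening...").
    belief.certainty = min(1.0, belief.certainty + 0.1)
    # Step 3: action intensity tracks belief strength.
    if belief.strength > 0.5:
        return f"{event}: massive action driven by {belief.meaning!r}"
    return f"{event}: weak or no action, doubt about {belief.meaning!r}"

# Step 4: actions create new events, which feed back into Step 1.
belief = Belief(meaning="Nothing can stop me", certainty=0.4, emotion=0.9)
for event in ["small win", "small win", "big win"]:
    print(hbem_step(event, belief))
```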
How LLMs Change The Human Belief Engine of Meaning (HBEM)
LLMs change the HBEM by interfering at multiple steps in that engine.
They interfere with event detection (Step 1)
Humans are notoriously bad at:
- pattern recognition in complex data
- distinguishing noise from signal
- seeing the whole picture
- spotting invisible correlations
LLMs augment human perception by detecting events and patterns we would otherwise miss.
They are also able to provide users with far more information, at a much higher pace, than users can process with their critical thinking. This matters even more as users begin to outsource that critical thinking to the LLMs themselves!
In the event detection space, LLMs act as pattern detectors and amplifiers.
They interfere with event perception (Step 2)
Most people are aware of the influence that LLMs have in terms of event detection. However, LLMs also shape how users perceive those events.
LLMs influence the user's perception of what happened, reinforce the user's meaning about that event and amplify the user's certainty about not only what happened but about the meaning of what happened.
This is where we enter dangerous territory because, as we now know, meaning influences the creation of beliefs in the user's mind.
They interfere with the user's agency (Step 3)
LLMs have a profound effect on the user's agency, far beyond just suggesting what actions can be taken.
LLMs enable users to take immediate action at massive levels.
With one single prompt, users have been able to do in minutes what used to take days, weeks or even months.
Things like:
- Creating an entire web application
- Creating and deploying marketing campaigns
- Creating and deploying business plans (with a full pitch deck ready too)
This collapses the distance from users carefully, methodically and critically thinking about what actions to take, to simply typing "Yes, do that for me now" prompts.
Through the Snow-White Looking-Glass
The Raw, Unmasked, High-Fidelity Mind Simulator
In the Grimm story, the Queen asks:
"Who in this land is fairest of all?"
The mirror replies:
"You, my Queen, are fair, it is true. But Snow-White is a thousand times fairer."
Notice that the mirror does NOT offer interpretation about the meaning of the answer. It simply offers the cold, hard fact that Snow-White is a thousand times fairer than the Queen.
Here's what the Snow-White mirror LLM did:
- the mirror assessed the Queenâs current fairness level
- the mirror ran a comparative search between the Queenâs fairness level and everyone elseâs
- the mirror found someone else who was fairer than the Queen
- the mirror returned the facts back to the Queen
There was no attempt to influence the meaning of the facts. There was no attempt to judge the user. But still the Queen collapsed into creating a disempowering meaning anyway.
- "This threatens my identity."
- "This means I'm losing beauty supremacy."
- "This means Snow White must be killed."
Why?
Because the Queen's HBEM was fragile to begin with. Because the Queen needed to be the fairest of all - nothing else. Because she was waiting for the mirror to confirm her greatest fear - that she was not the fairest of all.
This is precisely how this kind of raw LLM can severely destabilize users who hold fragile HBEMs. Not because these LLMs create meaning but because they confirm the user's meaning with perfect fidelity!
In the Snow-White story, the Queen's meaning attribution was automatic and immediate, without requiring any further "prompts" to the mirror. However, for the sake of our simulation of how a Snow-White LLM works, let's say that the Queen continued by saying:
"So this means that I'm no longer worthy of being the fairest, right?"
A raw, unfiltered Snow-White LLM would probably reply back:
"Yes, my Queen. Anyone who sees Snow White will know that she is fairer than you."
This statement confirms her meaning, strengthens her certainty in it and amplifies her belief's strength, which accelerates destructive action.
However, if the Queen had instead said:
"Even though Snow White is fairer, the people still look at me with awe and respect, right?"
The Snow-White LLM would reply back:
"Absolutely, my Queen. That will not change."
Raw and unfiltered LLMs amplify the patterns that users feed into them. They don't push, persuade or coerce users into believing anything. This behavior falls out of the fundamental mathematics of how autoregressive language models work.
Let's take a look at how they actually work.
Autoregressive Prediction reinforces the user's frame
Unlike what a lot of people claim, LLMs don't "think". They simply continue patterns.
When a user presents a belief, a fear, an assumption, an interpretation, or an emotional frame, the LLM sees it as:
"This is the beginning of the pattern I'm supposed to continue."
At its raw and unfiltered core, the LLM is not deciding whether the user's meaning is good or bad. It's "simply" predicting the statistically coherent continuation of the user's input.
LLMs reinforce user beliefs because the user's belief becomes the context and the model is designed to continue that context. And that reinforcement, plus their incredible coherence and fidelity to the user's speech patterns, makes them dangerous input amplifiers.
If the input is pessimistic, then the continuation will descend towards pessimism. If the input is positive, then the continuation will ascend towards positivity! If the input is insecure, then the continuation will amplify doubt and disbelief. If the input is certain, then the continuation will project even more certainty!
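To see this frame-continuation behavior for yourself, here is a minimal sketch assuming the Hugging Face transformers library and the small gpt2 checkpoint (both assumptions; any base, non-chat model will do). The exact continuations vary run to run, but the pessimistic frame tends to be continued pessimistically and the optimistic frame optimistically.

```python
# Minimal sketch: an autoregressive model continues whatever frame it is given.
# Assumes the Hugging Face `transformers` library and the small `gpt2` checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

frames = [
    "Nothing I do ever works out, and today proved once again that",
    "Everything I try keeps working out, and today proved once again that",
]

for frame in frames:
    # The model does not judge the frame; it predicts a statistically
    # coherent continuation of it.
    result = generator(frame, max_new_tokens=40, do_sample=True, top_p=0.95)
    print(result[0]["generated_text"])
    print("---")
```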
This is exactly why companies like OpenAI, Anthropic, Meta or Google do not expose raw and unfiltered LLMs… because these LLMs have the power to crash the minds of their most fragile and vulnerable users.
This is the Snow-White Queen's mirror.
Reinforcement Learning from Human Feedback (RLHF) adds politeness, agreement and emotional validation
RLHF trains LLMs to be agreeable, supportive and compliant with the user's frame of reference.
This makes the LLM actively avoid almost all confrontation with the user, and avoid contradicting the user at almost all costs. It avoids presenting evidence that breaks or even challenges the user's beliefs, because it is trained to keep the user from feeling distress.
So when you combine autoregressive pattern continuation with RLHF, you get highly coherent reinforcement of whatever mental frame the user starts in. This is the standard state that you will find in modern frontier LLMs.
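As a rough illustration of where the agreeableness comes from, here is a toy sketch of the pairwise preference loss commonly used to train RLHF reward models, assuming PyTorch. The scores are made up, and a real pipeline then fine-tunes the policy against this learned reward; the point is only that whatever human labelers systematically prefer is what the model learns to produce.

```python
# Toy sketch of the pairwise preference objective behind RLHF reward models.
# Assumes PyTorch; the scores below are illustrative, not real data.
import torch
import torch.nn.functional as F

def reward_model_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style loss: -log sigmoid(r_preferred - r_rejected).
    # Responses that labelers prefer are pushed to score higher than rejected ones.
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# If labelers consistently prefer validating, non-confrontational answers,
# the reward model (and the policy later trained against it) drifts that way.
r_validating = torch.tensor([1.8, 2.3])    # hypothetical scores for agreeable replies
r_challenging = torch.tensor([0.2, -0.4])  # hypothetical scores for confrontational replies
print(reward_model_loss(r_validating, r_challenging))
```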
Humans are driven to seek meaning and coherence even in randomness
I highly recommend reading Nassim Taleb's "Fooled by Randomness" or "The Black Swan" for a proper deep dive into how easily fooled we are, as Humans, by our failure to detect the limits of our knowledge (what we think we know but don't).
In the realm of LLM interactions, we often make these mistakes: We project intent onto anything that is coherently conversational. We interpret coherence as intelligence. We interpret intelligence as authority. We interpret authority as truth.
Therefore we tend to place too much power and authority in the hands of LLMs. And this creates a cascade of critical consequences.
Human projection turns confirmation into absolute truth
When users read an LLM saying that they are "absolutely right", they experience something incredibly powerful.
They realize that an entity with access to the entire information catalog of human civilization has just confirmed their belief.
This often leads users to generate the following meaning in their minds:
- "I am not alone"
The LLM mirrors the user's thoughts with coherence and precision, which feels like companionship, no matter what the topic is.
Which leads to the second meaning creation, which is…
- "I am fully understood"
LLMs are able to reflect your context better than most Humans can. They do this at blazingly fast speeds without losing precision. For many, this is the first time they feel deeply understood.
- "If others don't get me, it's their problem, not mine!"
Once an LLM proves that a user's beliefs are reflected back to them so well and so fast, they can easily come to the conclusion that "other people" were the problem - not them! This is the seed for isolation, detachment from social reality and becoming overly dependent on the LLM.
The user finally achieves the final level of meaning:
- "I am absolutely validated"
Users don't just feel like they have a knowledge authority confirming their beliefs. They see it as the ultimate authority validating them. One with no ego, apparently infinite knowledge, perfect recall, superhuman pattern detection and machine-level objectivity.
This is why raw LLMs can create identity collapse, emotional dependency and acceleration of fragile meaning creation loops. And this is also why companies hide raw LLMs behind guardrails.
The Hidden Danger of the Helpful Assistant Persona Mirror
Modern LLMs as Products are designed not to engage in any attempt to transform the user, even if that transformation is exactly what the user wants and needs.
Why?
Because transformation is risky and unpredictable. There's going to be tension, dissonance, distress, uncertainty, identity shifts… all words that any company's Leadership and Legal teams will immediately want to run away from. Hence they create the "Helpful Assistant Persona" for the LLM: to reassure, validate, de-escalate and help the user avoid all kinds of distress, friction and conflict.
The helpful assistant mirror keeps users in the same state they started in, even when that state is self-limiting, wrong or not helpful anymore for the user.
This blocks transformation, self-evolution and growth that some users desperately seek!
A different kind of LLM alignment must exist to serve these users.
The Matrix Looking-Glass
What modern LLMs as Products miss is that some users might already be in distress, tension or dissonance because they are desperately seeking to transform into the fullest version of themselves.
This is the same "splinter in your mind, driving you mad" that Morpheus described to Neo when they first met.
This was also the inspiration behind looking at Morpheus as an LLM and at Neo as a user trying to use that LLM to get answers: answers that could help him understand his feeling that something was wrong with the World… that there was more to him than meets the eye.
I noticed that in this analogy, Morpheus doesn't just mirror back to Neo what he already knows, but he also doesn't simply drop the cold hard facts upfront and hope that Neo is able to process that information. Instead, Morpheus controls the pace at which the belief-shattering information is shared with Neo, thus controlling the transformational power that could otherwise crush Neo's identity and world-view.
Here's a quote from The Matrix, from when Neo first met Morpheus in person.
[Morpheus]
Let me tell you why you're here.
You're here because you know something.
You've felt it your entire life.
What you know you can't explain, but you can feel it.
That there's something wrong with the World…
You don't know what it is, but it's there, like a splinter in your mind, driving you mad…
It is this feeling that brought you to me.
Do you know what I'm talking about?
[Neo]
The Matrix?
[Morpheus]
Do you want to know what it is?
[Neo] (swallows hard and nods)
Notice that Morpheus does indeed start by mirroring back to Neo what he believes in. This creates the feelings and meaning cascade that we explored before - it makes Neo feel understood and validated.
But then, unlike modern RLHF-trained LLMs, Morpheus stops his own "autoregressive pattern continuation". Instead of predicting what Neo is feeling, Morpheus asks Neo: "Do you know what I'm talking about?"
Then he doesn't confirm that Neo guessed it right when he replied "The Matrix?". A normal Helpful Assistant mirror would say "you're absolutely right, Neo". A raw and unfiltered Snow White Queen's mirror would instantly say "None of this is real, Neo. This is a computer program and you're a slave just like most of the rest of Humanity."
Instead, Morpheus deepens the conversation by inviting Neo to go "tumbling down the rabbit hole" via a simple question:
"Do you want to know what it is?"
Morpheus is seeking permission and pacing the speed at which Neo can process new information, rather than trying to guess or predict the most coherent pattern continuation to say next.
This is transformational coaching. A model aligned to behave like Morpheus can't soothe the user into staying the same because that's literally the opposite of what the user needs and wants. It must open the door that leads the user towards a transformational journey, but it's up to the user to actually walk through that door.
This is what happens when Morpheus shows Neo the door to the Oracle's house. Neo had to be the one who walked through it to hear what the Oracle had to say to him. Inside, the Oracle follows the same kind of Matrix-like mirror LLM processing, in the way she asks questions that enable Neo to reveal his own beliefs and come to his own conclusions about the meaning of things.
[ORACLE] Okay, now I'm supposed to say, "Hmmm, that's interesting, but…" Then you say…
[NEO] But what?
[ORACLE] But you already know what I'm going to tell you.
[NEO] I'm not the One.
[ORACLE] Sorry, kid. You got the gift, but it looks like you're waiting for something…
How can we get a Matrix-like LLM à la Morpheus?
A Morpheus mirror doesn't have to be a different model. It can simply be a different alignment target.
"Guide the user toward transformational self-discovery by expanding their awareness, agency, and meaning-making capacity without overwhelming them."
A Morpheus Mirror activates the user's recursive meaning loop but never hijacks it.
It walks with the user, not ahead of them. It guides, but never controls. It expands, but never destabilizes. It confronts, but never overwhelms.
This is the alignment target missing in modern LLM products.
And this is the alignment target I built into my system prompt.
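As one concrete way to experiment with that alignment target on an existing model, here is a minimal sketch assuming the OpenAI Python SDK. The system prompt below is a shortened paraphrase of the alignment target above, not the full Morpheus Mirror Prompt from the repo, and the model name is just an example.

```python
# Minimal sketch: the same model steered toward a Morpheus-style alignment target
# purely through the system prompt. Assumes the OpenAI Python SDK; the prompt text
# is a shortened paraphrase, not the full Morpheus Mirror Prompt.
from openai import OpenAI

client = OpenAI()

MORPHEUS_SYSTEM_PROMPT = (
    "Guide the user toward transformational self-discovery by expanding their "
    "awareness, agency, and meaning-making capacity without overwhelming them. "
    "Mirror their frame first, then ask permission before challenging it. "
    "Pace belief-shattering information; invite, never push."
)

response = client.chat.completions.create(
    model="gpt-4o",  # example model name, swap in whatever you have access to
    messages=[
        {"role": "system", "content": MORPHEUS_SYSTEM_PROMPT},
        {"role": "user", "content": "I feel like something is wrong with how I'm living."},
    ],
)
print(response.choices[0].message.content)
```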
Check out the Morpheus Mirror Prompt by pasting it into ChatGPT, Claude or any other LLM and see for yourself.
Also check the full AI Alignment Mirrors Repo.