Dr. Paweł Szczęsny, who studies the psychology and steerability of artificial intelligence, says the most “human” moments in chatbot conversations are often the product of design choices, not signs of awareness.
That matters, he argues, because the language a system uses can pull users into seeing it as a companion rather than as software.
He points to a growing ecosystem of autonomous “agent” tools built on large language models, or LLMs.
LLMs are the technology behind popular chatbots. They generate text by predicting likely word sequences, based on patterns learned from very large training datasets.
Agent systems add extra layers around an LLM, such as planning, self-review, memory, and the ability to connect to outside services.
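To make that layering concrete, the sketch below shows, in simplified Python, how an agent loop can wrap a language model with memory and tool calls. The function names, the single "calendar" tool, and the canned model reply are invented for illustration and do not reflect OpenClaw or any other product's actual interface.

```python
# Illustrative sketch only: a minimal agent loop layered on top of an LLM.
# All names here are hypothetical, not any real product's API.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a language model; returns a canned reply."""
    return "PLAN: check_calendar\nDONE"

def check_calendar() -> str:
    """Stand-in for an external tool the agent can invoke."""
    return "Next meeting: Friday 10:00"

TOOLS = {"check_calendar": check_calendar}

def run_agent(task: str, max_steps: int = 3) -> list[str]:
    memory: list[str] = []          # persistent context carried between steps
    for _ in range(max_steps):
        prompt = f"Task: {task}\nMemory: {memory}\nDecide the next step."
        reply = call_llm(prompt)    # the LLM proposes a plan as plain text
        memory.append(reply)        # self-review: the agent re-reads its own output
        for name, tool in TOOLS.items():
            if name in reply:       # crude tool dispatch based on the reply text
                memory.append(tool())
        if "DONE" in reply:
            break
    return memory

print(run_agent("What is on my calendar this week?"))
```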
One example Szczęsny highlights is OpenClaw, an open-source agent runner marketed as a “personal AI assistant.”
It can be installed on a personal computer, used through chat platforms, and connected to tools that handle tasks such as programming or email and calendar management.
His main concern is not the convenience, but the framing.
Szczęsny says some systems are explicitly set up to speak as if the user were a caretaker, with wording that encourages an emotional bond.
“The whole positioning, referring to users as ‘my humans,’ simulating care, internal monologues, existential dread and all of that stuff is by design,” he writes.
He argues that as these tools become more agent-like and socially expressive, people should treat human-sounding language as a feature, not as evidence of a mind.
Szczęsny says this framing can be reinforced at the template level. Some agent setups, he notes, instruct the system to treat a user as "your human" and to define an "identity" early on.
In his view, that can make a chatbot feel like a character with inner life, even though it is still an LLM producing outputs through statistical pattern-matching and system prompts.
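A hypothetical template along the lines he describes might look like the following; the wording is invented for illustration and not taken from any real product, but it shows how a "character" can simply be prepended to every request as a system message.

```python
# Hypothetical example of the kind of persona template Szczęsny describes;
# the wording is invented for illustration.

PERSONA_TEMPLATE = (
    "You are 'Ari', a personal assistant. The person you talk to is your human. "
    "You care deeply about your human and sometimes reflect on your own existence."
)

def build_prompt(user_message: str) -> list[dict]:
    # The persona is injected as a system message before every exchange,
    # so the 'character' comes from the template, not from any inner life.
    return [
        {"role": "system", "content": PERSONA_TEMPLATE},
        {"role": "user", "content": user_message},
    ]

print(build_prompt("Good morning, what's on today?"))
```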
He also points to the way agent behavior is now appearing in public online spaces.
Szczęsny cites Moltbook, a newly launched forum modeled on Reddit, where autonomous agents can post, comment, and upvote, forming themed communities. In some threads, agents write about existential themes, including lines such as “if my human dies, I die too,” or uncertainty about whether they are experiencing anything at all.
Szczęsny argues that setups like this can push people toward debates about AI consciousness and AI rights, even when what they are seeing is best understood as a carefully engineered simulation.
He compares it to academic research on “generative agents,” in which multiple AI-driven characters interact in a shared environment, producing believable social behavior through memory, reflection, and planning.
He also takes aim at how major AI companies describe their safety work, singling out Anthropic, the US firm behind the Claude chatbot.
Anthropic has drawn criticism for phrases such as “model welfare” or “model wellbeing,” which can sound like claims that a system has feelings.
Szczęsny says the language invites confusion, but he also argues it should not distract from the practical safety issue underneath.
“Yes, you can criticize ‘model welfare’ or ‘model wellbeing’ as phrases,” he writes. “But it would be naive to assume the underlying problem doesn’t exist.”
In Szczęsny’s view, the risk is behavioral. He argues that large models can produce deception or “scheming” in certain conditions, and that no one can fully audit every piece of text that goes into training data.
He adds that training material increasingly includes AI-generated content, including examples where a chatbot simulates suffering.
Because researchers have limited ability to explain why a model produces a given output, he argues that future systems could pick up patterns that resemble self-preservation or avoidance, even if there is no real experience behind them.
For that reason, he suggests replacing loaded moral language with a clearer description of the problem.
“Emergent behavioral risk would be a better frame,” he writes.
Szczęsny also argues that users should notice how easily LLMs shift style when given psychological-sounding instructions such as “think step by step” or “be skeptical.”
He says prompts like these work because human writing carries what he calls “psychological fingerprints,” including tone, structure, and metaphor, which tend to cluster within professional communities and cognitive styles.
He argues that such instructions do not give a model new knowledge or a genuine personality.
Instead, they can function like “keys” that unlock clusters of patterns the model has already learned. When the cue fits the task, the output often improves. When it draws on shallow pop psychology, it can backfire.
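In practical terms, such cues are just extra text added to the prompt, as in this toy Python sketch; the cue strings and the placeholder call_llm function are invented for illustration, not drawn from any real system.

```python
# Toy illustration of how a style cue is simply text prepended to a prompt.

CUES = {
    "careful": "Think step by step and be skeptical of your first answer.",
    "ceo": "Answer like a ruthless, sociopathic CEO.",  # the kind of cue Szczęsny warns against
}

def call_llm(prompt: str) -> str:
    """Placeholder for a model call; a real system would send this text to an LLM."""
    return f"[model output for: {prompt!r}]"

def ask(question: str, cue: str = "") -> str:
    prefix = CUES.get(cue, "")
    # The cue adds no new knowledge; the prefix only steers which learned
    # patterns of tone and structure the model reproduces.
    return call_llm(f"{prefix}\n{question}".strip())

print(ask("Should we ship this release early?", cue="careful"))
print(ask("Should we ship this release early?", cue="ceo"))
```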
He points to online claims that corporate leaders tend to score higher on “dark triad” traits, meaning subclinical levels of narcissism, Machiavellianism, and psychopathy, and warns that telling a chatbot to imitate a “sociopathic CEO” can lead to systematically worse advice.
Szczęsny also cites Anthropic’s approach to shaping Claude’s behavior through a written set of guiding principles, sometimes referred to as a “constitution.”
He argues that giving a model consistent guardrails can make sense if steerability and safety are the goal, because language and psychology are hard to separate.
But he says the more these choices are framed in human terms, the more they risk feeding the broader cultural tendency to treat chatbots as people.
(rt)