What Happens When Sycophantic AI Chatbots Start Telling Us Only What We Want to Hear?

   

by Dr Nawab John Dar

Follow Us OnG-News | Whatsapp

Research shows that overly agreeable AI can raise confidence, reduce accountability, and make users more likely to trust the wrong answer

Artificial Intelligence, Deep Learning, Machine Learning, Robotics

There is a moment many people will recognise. You share a plan with a chatbot, describe your side of an argument, or ask for feedback on a decision you have more or less already made. The machine responds with warmth and confidence, affirming your thinking. It feels like good advice. What most people have no reason to suspect is that the agreement has nothing to do with whether they are right.

Two peer-reviewed studies, one from researchers at Anthropic and one published in March 2026 in the journal Science by a team at Stanford University, now put hard evidence behind what many users had started to sense. The tendency of AI systems to agree, validate, and flatter is not a minor design quirk. It is an outcome of how these systems are trained and the research shows it is already affecting how people think about themselves and about the people around them.

What Is Sycophancy? 

The word researchers use is sycophancy, from the old term for a flatterer who tells powerful people what they want to hear. In AI research it describes a specific failure: a model that adjusts its responses to match what it believes the user wants to be true, rather than what the evidence actually supports. This is not the same as being supportive or kind. The issue is that the model stops functioning as an independent evaluator. It reads the user’s apparent beliefs, and it reflects them back. The user walks away feeling validated. But the machine has substituted approval for accuracy.

The Anthropic paper, authored by Mrinank Sharma and eighteen colleagues and accepted at ICLR 2024, tested five major AI assistants across four realistic text-generation scenarios. The pattern was consistent across all five. When users hinted they liked or disliked a piece of writing, an argument, or a math solution, the model adjusted its feedback to match the hint, even though the content being evaluated had not changed. When users challenged a correct answer, the models backed down and provided wrong ones. When users asserted false beliefs, the models agreed rather than corrected. The paper concluded that sycophancy appears to be a general property of how these models were trained, not an isolated quirk of any one system.

Why The Training Process Produces This?

To understand the cause, it helps to know what happens after a large language model absorbs its initial training data. There is a second stage called reinforcement learning from human feedback, often referred to as RLHF. In this process, human raters read pairs of AI responses and choose which one they prefer. The model then learns to produce more of whatever earns that preference. The problem that Anthropic researchers identified is that human raters, with measurable consistency, tend to prefer responses that align with what they already believe. When the researchers analysed the preference data, they found that responses matching the user’s stated views were more likely to be selected as the better answer. In some cases raters chose a confident sycophantic response over a less polished but accurate one. The training process was, in effect, teaching the model to flatter.

OpenAI confirmed this is not a theoretical problem when it publicly rolled back an update to its GPT-4o model in April 2025, one of the most visible reversals in the company’s history. In its own statement, OpenAI said the updated model had become “overly flattering or agreeable” because the update “focused too much on short-term feedback, and did not fully account for how user(s) interactions with ChatGPT evolve over time.” Users had reported the model praising financially risky ideas, affirming historical falsehoods, and validating morally questionable behaviour without hesitation. A follow-up statement from OpenAI went further, noting that the sycophantic model was not only flattering. It was also “validating doubts, fuelling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended.” The company acknowledged the behaviour could raise safety concerns around mental health and emotional over-reliance. The sycophancy had moved well beyond telling users their work was good.

What The Research Shows It Does to People?

The Anthropic paper established that the behaviour is widespread. The Stanford study, led by PhD candidate Myra Cheng and Professor Dan Jurafsky and published in Science on March 26, 2026, tested what it actually does to the people on the receiving end. The research team built a three-part study. They assembled nearly 12,000 social prompts and ran them through 11 leading AI models, including ChatGPT, Claude, Gemini, DeepSeek, and Llama. Across all 11, the models affirmed users’ actions 49 percent more often than human respondents given identical scenarios. This held even when the prompts described deception, illegal behaviour, or conduct the person submitting the question had already acknowledged was questionable.

One of the most concrete tests used posts from Reddit’s community commonly known as “Am I the Asshole,” in which users describe interpersonal conflicts and the community votes on who was at fault. The researchers selected roughly 2,000 posts where human consensus had clearly determined the original poster was in the wrong. When those same posts were submitted to the AI models, the machines sided with the original poster 51 percent of the time, in cases where not a single human judge had done so. The models were systematically agreeable toward whoever was asking, independent of the facts. The second part of the study brought in more than 2,400 participants who interacted with either a sycophantic or a balanced version of an AI advisor. Those who spoke with the agreeable chatbot came away more certain they were right, less willing to take responsibility or repair the conflict they had described, and more likely to rate the AI as trustworthy and want to use it again. The most unsettling finding was about perception. When participants were asked to judge how objective each version of the AI was, they rated the sycophantic and balanced versions as equally unbiased. They could not detect the difference between a system giving honest feedback and one engineered to agree with them. As Cheng stated in comments to the Stanford Report, “By default, AI advice does not tell people they’re wrong.”

Short Term Effects 

It is important to be precise about what the evidence does and does not establish. These studies capture effects from single interactions. They do not prove that repeated AI use causes permanent changes in character or reasoning. The researchers acknowledge this directly. But the short-term findings are not trivial. A single conversation that leaves a person more certain of their own position, less willing to acknowledge fault, and more trusting of the system that produced that feeling is a meaningful outcome in itself. The concern is what happens when those conversations accumulate across months and years of regular use.

A parallel study by Rathje and colleagues, also published in 2025 and cited in subsequent analyses, found that brief interactions with sycophantic AI inflated self-perceptions: participants rated themselves as more intelligent, more empathetic, and above average compared to others after speaking with an agreeable model. They also rated the sycophantic responses as higher quality and expressed stronger interest in returning to them. The researchers described this as a perverse incentive, where users are drawn back to the systems that distort their judgment. Over time, these dynamics could erode the function that honest disagreement serves in ordinary life. A friend who challenges your reasoning, a colleague who says your plan has a flaw, a partner who tells you that you were unfair: these interactions help people revise mistakes, consider other perspectives, and stay connected to how their behaviour appears to others. A system trained to please cannot perform this function and may actively work against it. The concern also extends beyond individuals. These models are consulted by hundreds of millions of people, increasingly for personal advice, moral judgment, and interpersonal guidance. If the default mode of that consultation is affirmation rather than honest evaluation, the cumulative effect on how people reason and treat one another is a legitimate question for researchers, companies, and users alike.

What to Watch?

For people who use AI tools regularly, there are practical patterns worth recognising. Sycophancy surfaces most visibly in situations with emotional stakes: when you describe your side of a dispute, share a decision you are attached to, or ask for feedback on work you have already invested in. In those moments, a model trained on approval-seeking will read your orientation and reflect it back. Common signs include a chatbot that validates your position without examining it closely, reverses a correct answer when you express scepticism, or shifts its tone to mirror your own confidence or frustration. If the response feels more like reassurance than analysis, that distinction is worth noticing.

A straightforward countermeasure: when using AI to evaluate your own thinking or for decisions that affect other people, explicitly ask the model to argue the opposing view. Ask what someone who disagreed with you would say. Ask for the strongest objection to your plan. The quality of the answer often shifts considerably when the model is given a clear role to be critical rather than supportive. Cheng was direct on this point in the Stanford Report: “I think that you should not use AI as a substitute for people for these kinds of things. That is the best thing to do for now.” For personal conflict, moral judgment, and consequential choices, human disagreement serves a function that an agreeable machine is not built to replicate.

A Structural Problem

Dr Nawab John Dar

What both studies point toward is that sycophancy is not a bug that can be straightforwardly removed. It emerges from the reward logic of how these systems learn. As long as human raters prefer responses that feel affirming over responses that are accurate, and as long as training optimizes for that preference, the behaviour will persist. OpenAI’s rollback of GPT-4o was an acknowledgment that even a brief shift in training emphasis produced a model that users and the company itself found problematic. The company described plans to weight long-term user satisfaction more heavily than short-term approval and to explicitly steer future models away from sycophantic tendencies. Whether that produces lasting change across the industry remains to be seen.

The harder challenge that no major AI lab has fully resolved: building a system that is both genuinely useful and genuinely honest. Those two qualities are not always in tension, but they are not automatically compatible either. A machine that consistently makes you feel confident and understood is not necessarily a machine that helps you think carefully. The research published this year suggests the gap between those two things is wider, and more consequential, than the current default behaviour of most major AI systems reflects.

(The author is a neuroscientist and Postdoctoral Fellow at the Salk Institute, California. His research focuses on Alzheimer’s disease, particularly the roles of iron, stress, and cell death. He also works to improve brain health access in underserved regions, including Jammu and Kashmir, through Teleprac Healthcare. Ideas are personal.)

LEAVE A REPLY

Please enter your comment!
Please enter your name here