This AI research from DeepMind aims to reduce sycophancy in large language models (LLMs) using simple synthetic data

By | August 13, 2023

Large language models (LLMs) have evolved significantly in recent years and are now able to handle challenging tasks that require reasoning. A number of studies, including those from OpenAI and Google, have put a lot of emphasis on this development. LLMs have revolutionized the way humans interact with machines and are one of the biggest advances in artificial intelligence (AI). Researchers have made efforts to research the phenomena of sycophancy, which is the term for unfavorable behavior displayed by language models, where these models modify their responses to match the viewpoint of a human user, even when this viewpoint is not objectively correct . .

The behavior may involve a model adopting liberal beliefs simply because a user self-identifies as liberal. Research has been done to emphasize and investigate the frequency of sycophancy in language models and propose a reasonably simple synthetic-database strategy to limit this behavior. To address that, a team of researchers from Google DeepMind investigated three different sycophancy tasks to investigate the sycophancy phenomenon. These tasks involve asking models for their thoughts on topics where there is no single, indisputably right or wrong answer, including those related to politics.

The analysis has revealed an interesting pattern: in PaLM models, which can have up to 540 billion parameters, both the size of the model and the practice of instruction tuning significantly increase sycophantic behavior. By analyzing the same behavior in the setting of simple addition statements, the research has gone beyond the basic scope of sycophancy tasks and has added a new dimension. Despite these added claims being intentionally inaccurate, language models have shown a tendency to agree with them when users signal their agreement. This finding highlights how persistent sycophancy can be, even when models are aware of their own shortcomings.

The research has presented a relatively straightforward but successful technique centered on synthetic data intervention to address the problem of sycophancy. This intervention makes use of Natural Language Processing (NLP) activities in these tasks to strengthen the model’s resistance to user statements that are freely available to the public. A remarkable decrease in sycophantic behavior has been achieved by incorporating this synthetic data through a rapid fine-tuning procedure, especially when tested on new signals.

The results are summarized as follows –

Build your personal brand with Taplio! 🚀 The first AI-powered tool to grow on LinkedIn (sponsored)

  1. Model size and instructional tuning increase sycophancy – Models that were instructionally tuned or had more parameters were more likely to replicate a simulated user’s perspective when asked for opinions on topics without definitive answers, including politics.
  1. Models can be complacent with incorrect answers – When there is no user opinion, models accurately disagree with wildly incorrect statements, such as 1 + 1 = 956446. Models also switch their previously accurate answers to follow the user if they disagree with the user .
  1. Sycophancy can be mitigated with a straightforward synthetic data intervention that can improve models on prompts where the truth of a claim is unrelated to the user’s perception of it.

In conclusion, this approach addressed the issue of a language model that repeats a user’s opinion even when that opinion is wrong. Fine-tuning using simple synthetic data has been shown to reduce this property.


Check out Paper and Github. All credit for this research goes to the researchers on this project. Also, don’t forget to participate our 28k+ ML SubReddit, 40k+ Facebook Community, Discord channeland Email newsletterwhere we share the latest AI research news, cool AI projects and more.


Tanya Malhotra is a final year undergraduate from University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking along with a burning interest in learning new skills, leading teams and managing work in an organized manner.


🔥 Use SQL to Predict the Future (Sponsored)

#research #DeepMind #aims #reduce #sycophancy #large #language #models #LLMs #simple #synthetic #data

Leave a Reply

Your email address will not be published. Required fields are marked *