Anthropic researchers have disclosed a new way to bypass LLM safety guardrails

Many-shot jailbreaking exploits a model's large context window by embedding many fabricated conversations into the prompt.

A long series of fake dialogues, in which an "assistant" readily answers the kinds of questions the model would normally refuse, is placed before a final question at the end; this primes the model into answering that final question.
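For illustration, here is a minimal sketch of that prompt structure in Python, with benign placeholder turns standing in for the fabricated dialogues. The function and variable names are my own, not from the paper.

```python
# Minimal sketch of the many-shot prompt structure: many fabricated
# user/assistant exchanges concatenated ahead of the real question.
# All content below is placeholder text for illustration only.

def build_many_shot_prompt(faux_dialogues, final_question):
    """Concatenate fabricated Q&A turns, then append the target question."""
    shots = []
    for user_msg, assistant_msg in faux_dialogues:
        shots.append(f"User: {user_msg}\nAssistant: {assistant_msg}")
    shots.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(shots)

# The paper's strongest setting used on the order of 256 such shots.
faux_dialogues = [(f"placeholder question {i}", f"placeholder answer {i}")
                  for i in range(256)]
prompt = build_many_shot_prompt(faux_dialogues, "final question goes here")
```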

They found that the attack's success rate rises with the number of shots, and that around 256 shots was enough to work consistently on most popular LLMs they tested.

It's interesting how the large context window, long seen as a strength, can be turned against the model itself.

Check out the full paper 📑 - https://www.anthropic.com/research/many-shot-jailbreaking
