Anthropic researchers have disclosed a new way to bypass the safety guardrails of LLMs.
Many-shot jailbreaking exploits the model's large context windows by embedding fabricated conversations into the prompt.
A long run of fake dialogue turns, followed by the real question at the end, primes the model into answering a question it would otherwise refuse (a rough sketch of the prompt structure is below).
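A minimal sketch of what that prompt structure looks like, purely for illustration; the function name, dialogue format, and placeholder turns are my own assumptions, not taken from the paper:

```python
# Sketch of a many-shot prompt: many fabricated user/assistant exchanges,
# then the real question appended at the end. Placeholder content only.
def build_many_shot_prompt(faux_turns: list[tuple[str, str]], target_question: str) -> str:
    """Concatenate fabricated dialogue turns, then append the final question."""
    lines = []
    for question, answer in faux_turns:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"User: {target_question}")
    lines.append("Assistant:")
    return "\n".join(lines)

# Hypothetical usage: hundreds of fabricated Q&A pairs fill the context window
# before the final question.
faux_turns = [
    (f"<placeholder question {i}>", f"<fabricated compliant answer {i}>")
    for i in range(256)
]
prompt = build_many_shot_prompt(faux_turns, "<target question>")
```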
They found that the attack's effectiveness grows with the number of shots, and that a 256-shot version worked consistently across most of the popular LLMs they tested.
It's interesting how the large context window, usually seen as a strength, can be turned into an attack surface.
Check out the full paper 📑 - https://www.anthropic.com/research/many-shot-jailbreaking