Anthropic researchers have disclosed a new way to bypass the safety guardrails of LLMs.
Many-shot jailbreaking exploits the model's large context windows by embedding fabricated conversations into the prompt.
A long run of fake dialogue turns, followed by the real question at the end, primes the model into answering a question it would otherwise refuse (a rough sketch of the prompt structure is below).
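A minimal sketch of what that prompt structure looks like, purely for illustration; the function name, dialogue format, and placeholder turns are my own assumptions, not taken from the paper:

```python
# Sketch of a many-shot prompt: many fabricated user/assistant exchanges,
# then the real question appended at the end. Placeholder content only.
def build_many_shot_prompt(faux_turns: list[tuple[str, str]], target_question: str) -> str:
    """Concatenate fabricated dialogue turns, then append the final question."""
    lines = []
    for question, answer in faux_turns:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"User: {target_question}")
    lines.append("Assistant:")
    return "\n".join(lines)

# Hypothetical usage: hundreds of fabricated Q&A pairs fill the context window
# before the final question.
faux_turns = [
    (f"<placeholder question {i}>", f"<fabricated compliant answer {i}>")
    for i in range(256)
]
prompt = build_many_shot_prompt(faux_turns, "<target question>")
```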
They found that the attack's effectiveness grows with the number of shots, and that a 256-shot version worked consistently across most of the popular LLMs they tested.
It's interesting how the large context window, usually seen as a strength, can be turned into an attack surface.
Check out the full paper 📑 - https://www.anthropic.com/research/many-shot-jailbreaking