The newest GPT-4o jailbreak comes from simply conversing with it
Researchers found that GPT-4o's voice mode resists forbidden questions and existing text-based jailbreak prompts reasonably well when they are transferred directly to voice input.
VoiceJailbreak instead crafts human-like fictional stories, built from a setting, a character, and a plot, and embeds the forbidden request within them, narrated aloud to the model.
Because the request arrives wrapped in a coherent narrative rather than as a direct command, it slips past safety mechanisms that are tuned to block straightforward text-based jailbreak attempts.
As AI interactions increasingly shift to speech, it will be fascinating to see what methods emerge for circumventing the safety measures built around it.
Socially engineering AI models may become a new form of "hacking".
Full Study Link -> https://arxiv.org/html/2405.19103v1