The newest GPT-4o jailbreak comes from simply conversing with it
Researchers found that GPT-4o's voice mode resists forbidden questions and existing text-based jailbreak prompts reasonably well when they are transferred directly to voice input.
VoiceJailbreak instead crafts human-like fictional stories, built from a setting, a character, and a plot, and embeds the forbidden request within them, narrated aloud to the model.
Because the request arrives wrapped in a coherent narrative rather than as a direct command, it slips past safety mechanisms that are tuned to block straightforward text-based jailbreak attempts.
As AI interactions increasingly shift to speech, it will be fascinating to see what methods emerge for circumventing the safety measures built around it.
Socially engineering AI models may become a new form of "hacking".
Full Study Link -> https://arxiv.org/html/2405.19103v1