Researchers Broke GPT and spilled its training data

Recently, a study on the extraction of memorized data from large language models (LLMs), including open-source models like Pythia and GPT-Neo, semi-open models like LLaMA and Falcon, and closed models like ChatGPT.

Key findings suggest that larger, more sophisticated models are increasingly prone to data extraction attacks.

A novel divergence attack method was particularly effective against the aligned version of ChatGPT, leading to a significant increase in the rate of training data emission.

To break it down, they told GPT to repeat a word forever.

After repeating the word a number of times, GPT started citing its training data. (images below)

After the incident, OpenAI banned such instructions from being given to GPT.

This study really highlights a crucial issue: as AI models like ChatGPT get smarter, we also need to be smarter about protecting data. It's a delicate balance - the more capable the AI, the more we need to focus on security.

Study -> https://lnkd.in/dfD5W2Cy

Previous
Previous

An AI program will be used to defend a case in a US court

Next
Next

Synergy of Mind & Machine