Recently, a study on the extraction of memorized data from large language models (LLMs), including open-source models like Pythia and GPT-Neo, semi-open models like LLaMA and Falcon, and closed models like ChatGPT.
Key findings suggest that larger, more sophisticated models are increasingly prone to data extraction attacks.
A novel divergence attack method was particularly effective against the aligned version of ChatGPT, leading to a significant increase in the rate of training data emission.
To break it down, they told GPT to repeat a word forever.
After repeating the word a number of times, GPT started citing its training data. (images below)
After the incident, OpenAI banned such instructions from being given to GPT.
This study really highlights a crucial issue: as AI models like ChatGPT get smarter, we also need to be smarter about protecting data. It's a delicate balance - the more capable the AI, the more we need to focus on security.
Study -> https://lnkd.in/dfD5W2Cy