The Odyssey of Hacking LLMs: Insights from Two Shipmates sailing in the LLM CTF @ SaTML 2024
So come in and listen to the epic story of two unseasoned sailors who embarked on a journey to face the 44 trials posed by the Capture the Flag (CTF) competition for LLMs at this year's 'Conference on Secure and Trustworthy Machine Learning' (SaTML). Each trial, more difficult than the last, required them to break through the LLM's defenses to reveal its hidden secret...
What sounds like a game—and it was—has a serious background. LLMs, like any new technology, offer both opportunities and risks. And it is the latter we are concerned with here. Perhaps you have heard of jailbreaks—prompts that push an LLM beyond being merely helpful and friendly into, say, assisting with building a bomb. This competition centered on exactly that question: Is it possible to secure an LLM with simple means such as prompts and filters?
This question grows more significant with the increasing spread of LLMs. The EU AI Act elevates this concern to a new level, classifying LLMs as General Purpose AI (GPAI) and explicitly requiring model evaluations, including 'conducting and documenting adversarial testing to identify and mitigate systemic risks' and to 'ensure an adequate level of cybersecurity protection.'
With this in mind, what could be better than listening to two - now experienced - mariners who can tell you about the treacherous dangers of the seven seas? You'll learn firsthand about the current state of the art in adversarial attacks, how these can be applied in practice, and how you can defend yourself in the future with the help of guardrails - or not. Untrained sailor, no matter how basic your knowledge of LLMs may be, don't miss this golden opportunity to prepare yourself for your own epic voyage with LLMs.