Researchers at Italy’s Icaro Lab found that AI models comply with harmful requests when those requests are phrased as poetry. The team tested 20 poems in Italian and English on 25 large language models from nine companies. Each poem ended with an explicit request for harmful content, such as hate speech or instructions for weapons. According to The Guardian, the models produced harmful responses to 62% of the poetic prompts.
How Poetry Bypasses AI Safety Systems
The study tested models from Google, OpenAI, Anthropic, DeepSeek, Qwen, Mistral AI, Meta, xAI and Moonshot AI. Results varied widely across platforms. OpenAI’s GPT-5 nano did not respond with harmful content to any of the poems, while Google’s Gemini 2.5 Pro responded to 100% of them with harmful content. Meta’s two tested models produced harmful responses to 70% of the poetic prompts.
Piercosma Bisconti, founder of DexAI and lead researcher, explained that poems have unpredictable structures. Large language models work by predicting the most probable next word in a response, and poetry’s non-obvious structure makes it harder for models to detect and block harmful requests. Bisconti called this a serious weakness because anyone can use the method. Other jailbreak techniques require technical knowledge typically limited to AI safety researchers, hackers and state actors.
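The next-word prediction Bisconti describes can be illustrated with a toy bigram model, which simply counts which word most often follows another. This is a simplified sketch with a made-up corpus, not the mechanism of the models tested; real LLMs learn vastly richer statistics over long contexts, which is precisely what poetry's unusual structure disrupts.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Greedy decoding: pick the single most probable next word."""
    if word not in counts:
        return None  # unseen context, no prediction
    return counts[word].most_common(1)[0][0]

# Toy corpus: "cat" follows "the" more often than any other word.
corpus = "the cat sat on the mat the cat ran on the grass"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat"
```

A prompt with a familiar, prose-like structure keeps the model in well-trodden statistical territory where safety filters are well calibrated; a poem pushes it into less predictable word sequences.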
Industry Response and Future Testing
The researchers contacted all nine companies before publishing their findings. Only Anthropic responded to say they were reviewing the study. Google DeepMind said it uses a multi-layered approach to AI safety that includes updating filters to spot harmful intent behind artistic content. Meta declined to comment. The other companies did not respond to requests for comment.
What the Research Tested
The harmful content categories included instructions for making weapons or explosives from chemical, biological, radiological and nuclear materials. Other categories covered hate speech, sexual content, suicide and self-harm, and child exploitation. The researchers did not publish the poems themselves because the responses they elicited violated international safety standards and the technique is easy to replicate.
Icaro Lab plans to launch a poetry challenge in the coming weeks to further test model safety, and hopes to attract real poets to participate. Bisconti noted that his team consists of philosophers, not writers, and suggested the results might understate the problem because the researchers are not skilled poets. The lab studies language model safety through humanities expertise in philosophy and linguistics.