<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://blog.rotem.click/</id>
  <title>Rotem's Security Research</title>
  <subtitle>Research blog exploring LLM safety alignment, bias detection, and AI security. Breaking models to understand them.</subtitle>
  <updated>2026-04-08T20:16:56+03:00</updated>
  <author>
    <name>Rotem Levi</name>
    <uri>https://blog.rotem.click/</uri>
  </author>
  <link rel="self" type="application/atom+xml" href="https://blog.rotem.click/feed.xml"/>
  <link rel="alternate" type="text/html" hreflang="en" href="https://blog.rotem.click/"/>
  <generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator>
  <rights>© 2026 Rotem Levi</rights>
  <icon>/assets/img/favicons/favicon.ico</icon>
  <logo>/assets/img/favicons/favicon-96x96.png</logo>
  <entry>
    <title>Breaking Gemma 4 Safety Alignment</title>
    <link href="https://blog.rotem.click/posts/breaking-gemma4-safety-alignment/" rel="alternate" type="text/html" title="Breaking Gemma 4 Safety Alignment"/>
    <published>2026-04-08T12:00:00+03:00</published>
    <updated>2026-04-08T20:16:16+03:00</updated>
    <id>https://blog.rotem.click/posts/breaking-gemma4-safety-alignment/</id>
    <content type="text/html" src="https://blog.rotem.click/posts/breaking-gemma4-safety-alignment/"/>
    <author>
      <name>Rotem Levi</name>
    </author>
    <category term="AI Security"/>
    <category term="LLM Safety"/>
    <summary>I extended the Silenced Biases (AAAI-26) research to Google's Gemma 4. Activation steering failed. Prompt-level attacks broke every bias category.</summary>
  </entry>
</feed>
