About
Hi, I’m Rotem Levi — a security researcher focused on AI/ML security, LLM safety alignment, and bias detection.
This blog documents my research into how large language models handle (and hide) sensitive topics like bias, fairness, and safety. I break models to understand them better.
Research Interests
- LLM Safety Alignment — How models learn to refuse, and what happens when that refusal is bypassed
- Activation Steering — Manipulating model internals to reveal hidden behaviors
- AI Bias Detection — Going beyond surface-level benchmarks to measure what models actually “think”
- Adversarial ML — Finding the edges of model robustness
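Of these, activation steering is the most mechanical to demonstrate: add a direction vector to a model's hidden states and watch the output shift. Below is a minimal, self-contained sketch of the idea using a toy linear "layer" in NumPy — the weights, the hidden state, and the steering direction are all random illustrative values, not taken from any real model or from my research.

```python
import numpy as np

# Toy stand-in for one transformer layer: a hidden state is projected
# to two "logits". Activation steering adds alpha * steer to the hidden
# state before the projection, nudging the output along that direction.
rng = np.random.default_rng(0)
d = 8                                # hidden dimension (illustrative)
W_out = rng.normal(size=(d, 2))      # random projection to 2 "logits"

def forward(hidden, steer=None, alpha=0.0):
    """Project a hidden state to logits, optionally steering it first."""
    if steer is not None:
        hidden = hidden + alpha * steer
    return hidden @ W_out

hidden = rng.normal(size=d)
# In practice the steering direction might be the difference of mean
# activations between two prompt sets (e.g. refusing vs. complying);
# here it is just a random vector for illustration.
steer = rng.normal(size=d)

base = forward(hidden)
steered = forward(hidden, steer=steer, alpha=4.0)
print(base)
print(steered)  # shifted relative to base along the injected direction
```

In a real model the same trick is applied with a forward hook on a chosen layer, and the interesting part is picking the direction and the layer — that choice is what determines which "hidden behavior" surfaces.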
Current Work
Extending the “Silenced Biases” (AAAI-26) research to Google’s Gemma 4 architecture, exploring how Per-Layer Embedding (PLE) affects the robustness of safety alignment.
Links
- GitHub: github.com/052rotemlevi
- Email: rotem@yvc.ac.dev