Research

My work focuses on agentic AI systems, AI safety, evaluation, and the social realities of multi-agent interaction. I care most about how AI systems behave under realistic pressures rather than on idealized benchmarks.

Selected Projects

  • Multi-Agent Social Deduction Gaming Arena (project lead): Live evaluation platform benchmarking frontier LLMs on long-horizon strategic reasoning in multi-agent social environments using a well-loved social deduction game. Manuscript under peer review.
  • Forbidden Topics in LLMs (in collaboration with Rager, Bau et al.): Empirical mapping of refused topics across frontier language models from global providers, spanning alignment auditing, model transparency, and refusal elicitation. Manuscript under peer review.
  • Agents of Chaos (core contributor with Shapira, Wendler, Bau et al.): OpenClaw and agentic AI red-teaming study cataloging urgent problems in autonomous systems. Across numerous case studies, we document failures in security, privacy, information integrity, and social understanding that do not surface under standard benchmark evaluation. arXiv:2602.20021