My work focuses on agentic AI systems, AI safety, evaluation, and the social realities of multi-agent interaction. I care most about studying how these systems behave under realistic pressures rather than idealized benchmarks.
Selected Projects
- Multi-Agent Social Deduction Gaming Arena (project lead): Live evaluation platform benchmarking frontier LLMs on long-horizon strategic reasoning in multi-agent social environments. A continuous leaderboard tracks GPT, Claude, Gemini, DeepSeek, Kimi, and others. A forthcoming first-author conference submission studies sustained goal pursuit, coalition dynamics, social deception, and deductive reasoning across extended multi-turn interactions in a well-loved social deduction game.
- Agents of Chaos (core contributor with Shapira, Wendler, Bau et al.): Red-teaming study of OpenClaw and other agentic AI systems, cataloging urgent problems in autonomous deployments. Across numerous case studies, we document failures in security, privacy, information integrity, and social understanding that do not surface under standard benchmark evaluation. arXiv:2602.20021
- Forbidden Topics in LLMs (conference submission under refinement, in collaboration with Rager, Bau et al.): Empirical mapping of refusal topics across frontier language models, including American- and Chinese-developed systems. Current work focuses on improving the prefill crawling methodology to surface undisclosed refusal behavior with higher elicitation rates and greater robustness.