News and Updates
Our paper “AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?” will be presented at IASEAI’26.
Our workshop paper “Pluralistic AI Alignment: A Cross-Cultural Pilot Survey” will be presented at the Second Workshop on Language Models for Underserved Communities (LM4UC).
The AI Alignment Lab at the University of Bonn has started! Learn more on the course page.
New preprint! Leonard Dung and Florian Mai analyze AI alignment strategies from a risk perspective and compare overlaps in failure modes across alignment techniques. Read the preprint on arXiv.
Our JQL paper has been accepted at EMNLP 2025! Read the preprint on arXiv.
New preprint! Survey-to-Behavior aligns language models with human values using survey questions. Check it out on arXiv.
New preprint! We explore in-training defenses against emergent misalignment in language models. Check it out on arXiv.
Registration is now open for the International Conference on Large-Scale AI Risks, 26-28 May 2025 in Leuven, Belgium. I helped organize this event and look forward to seeing you there!
I am participating in a panel discussion on trustworthy AI at the Deutsches Museum Bonn.
Our paper “Superalignment with Dynamic Human Values” was accepted at the BiAlign Workshop at ICLR 2025!
I started as a Junior Research Group Leader at the University of Bonn, as part of the CAISA lab headed by Prof. Lucie Flek. My research will focus on AI safety topics such as value alignment, as well as on reasoning and planning approaches for LLMs.
I am starting a short-term scholarship at the CAISA lab at the University of Bonn, funded through the DAAD AInet program! Our project will focus on drafting a new approach to the [scalable oversight problem](https://aisafety.info/questions/8EL8/What-is-scalable-oversight).
