News and Updates
Our paper “AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?” will be presented at IASEAI’26.
Our workshop paper “Pluralistic AI Alignment: A Cross-Cultural Pilot Survey” will be presented at the Second Workshop on Language Models for Underserved Communities (LM4UC).
The AI Alignment Lab at the University of Bonn has started! Learn more on the course page.
New preprint! Leonard Dung and Florian Mai analyze AI alignment strategies from a risk perspective and compare overlaps in failure modes across alignment techniques. Read the preprint on arXiv.
Our JQL paper has been accepted at EMNLP 2025! Read the preprint on arXiv.
New preprint! Survey-to-Behavior aligns language models with human values using survey questions. Check it out on arXiv.
New preprint! We explore in-training defenses against emergent misalignment in language models. Check it out on arXiv.
Registration is now open for the International Conference on Large-Scale AI Risks, 26-28 May 2025 in Leuven, Belgium. I helped organize this event and look forward to seeing you there!
I am participating in a panel discussion on trustworthy AI at the Deutsches Museum Bonn.
Our paper “Superalignment with Dynamic Human Values” was accepted at the BiAlign Workshop at ICLR 2025!
I started as a Junior Research Group Leader at the University of Bonn, as part of the CAISA lab headed by Prof. Lucie Flek. My research will focus on AI safety topics such as value alignment, as well as on reasoning and planning approaches for LLMs.
I am starting a short-term scholarship at the CAISA lab at the University of Bonn, funded through the DAAD AInet program! Our project will focus on drafting a new approach to the [scalable oversight problem](https://aisafety.info/questions/8EL8/What-is-scalable-oversight).
