Superalignment with Dynamic Human Values
Published in ICLR 2025 Workshop on Bidirectional Human-AI Alignment (BiAlign), 2025
This paper sketches a roadmap for training a superhuman reasoning model to decompose complex tasks into subtasks that humans can reliably evaluate, addressing the twin alignment challenges of scalable oversight and dynamic human values.
Recommended citation: Florian Mai, David Kaczér, Nicholas Kluge Corrêa, Lucie Flek. (2025). "Superalignment with Dynamic Human Values." ICLR 2025 Workshop on Bidirectional Human-AI Alignment (BiAlign).