Publications

Refereed Publications

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Usman Anwar, Abulhair Saparov*, Javier Rando*, Daniel Paleka*, Miles Turpin*, Peter Hase*, Ekdeep Singh*, Erik Jenner*, Stephen Casper*, Oliver Sourbut*, Benjamin Edelman*, Zhaowei Zhang*, Mario Gunther*, Anton Korinek*, Jose Hernandez-Orallo*, Lewis Hammond†, Eric Bigelow†, Alex Pan†, Lauro Langosco†, Tomasz Korbak†, Heidi Zhang†, Ruiqi Zhong†, Seán Ó hÉigeartaigh‡, Gabriel Rachet†, Giulio Corsi‡, Alan Chan‡, Markus Anderljung‡, Lillian Edwards‡, Yoshua Bengio‡, Danqi Chen‡, Samuel Albanie‡, Tegan Maharaj‡, Jakob Foerster‡, Florian Tramer‡, He He‡, Atoosa Kasirzadeh‡, Yejin Choi‡, David Krueger.

*indicates major contribution, †indicates minor contribution, ‡indicates advisory role.

Characterizing Manipulation from AI Systems

Micah Carroll†, Alan Chan†, Henry Ashton, David Krueger.

ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (2023)

Thinker: Learning to Plan and Act

Stephen Chung, Ivan Anokhin, David Krueger.

Neural Information Processing Systems (2023)

Harms from Increasingly Agentic Algorithmic Systems

Alan Chan, Rebecca Salganik, Zhonghao He, John Burden, Yawen Duan, Shalaleh Rismani, Alva Markelius, Katherine Collins, Maryam Molamohammadi, Chris Pang, Lauro Langosco, Konstantinos Voudouris, Wanru Zhao, Dmitrii Krasheninnikov, Michelle Lin, Alex Mayhew, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj.

ACM Conference on Fairness, Accountability, and Transparency (2023)

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker.

International Conference on Learning Representations (2023) (Spotlight, top 25%)

Broken Neural Scaling Laws

Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger.

International Conference on Learning Representations (2023)

Mechanistic Mode Connectivity

Ekdeep Singh Lubana, Eric J. Bigelow, Robert Dick, David Krueger, Hidenori Tanaka.

International Conference on Machine Learning (2023)

Defining and Characterizing Reward Gaming

Joar Skalse, Niki Howe, Dmitrii Krasheninnikov, David Krueger.

Neural Information Processing Systems (2022)

Goal Misgeneralization in Deep Reinforcement Learning

Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, David Krueger.

International Conference on Machine Learning (2022)

Out-of-Distribution Generalization via Risk Extrapolation (REx)

David Krueger, Ethan Caballero, Jörn-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville.

International Conference on Machine Learning (2021) (Oral)

Filling Gaps in Trustworthy Development of AI

Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, Tegan Maharaj, Noa Zilberman.

Science (2021)