Publications
Refereed Publications
Implicit meta-learning may lead language models to trust more reliable sources
Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, Tegan Maharaj, David Krueger
International Conference on Machine Learning (2024)
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar, Abulhair Saparov*, Javier Rando*, Daniel Paleka*, Miles Turpin*, Peter Hase*, Ekdeep Singh Lubana*, Erik Jenner*, Stephen Casper*, Oliver Sourbut*, Benjamin Edelman*, Zhaowei Zhang*, Mario Günther*, Anton Korinek*, Jose Hernandez-Orallo*, Lewis Hammond†, Eric Bigelow†, Alex Pan†, Lauro Langosco†, Tomasz Korbak†, Heidi Zhang†, Ruiqi Zhong†, Seán Ó hÉigeartaigh‡, Gabriel Recchia†, Giulio Corsi‡, Alan Chan‡, Markus Anderljung‡, Lilian Edwards‡, Yoshua Bengio‡, Danqi Chen‡, Samuel Albanie‡, Tegan Maharaj‡, Jakob Foerster‡, Florian Tramèr‡, He He‡, Atoosa Kasirzadeh‡, Yejin Choi‡, David Krueger‡
* indicates major contribution, † indicates minor contribution, ‡ indicates advisory role.
Characterizing Manipulation from AI Systems
Micah Carroll†, Alan Chan†, Henry Ashton, David Krueger
ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (2023)
Thinker: Learning to Plan and Act
Stephen Chung, Ivan Anokhin, David Krueger
Neural Information Processing Systems (2023)
Harms from Increasingly Agentic Algorithmic Systems
Alan Chan, Rebecca Salganik, Zhonghao He, John Burden, Yawen Duan, Shalaleh Rismani, Alva Markelius, Katherine Collins, Maryam Molamohammadi, Chris Pang, Lauro Langosco, Konstantinos Voudouris, Wanru Zhao, Dmitrii Krasheninnikov, Michelle Lin, Alex Mayhew, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj
ACM Conference on Fairness, Accountability, and Transparency (2023)
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker
International Conference on Learning Representations (Spotlight, top 25%) (2023)
Broken Neural Scaling Laws
Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger
International Conference on Learning Representations (2023)
Mechanistic Mode Connectivity
Ekdeep Singh Lubana, Eric J Bigelow, Robert Dick, David Krueger, Hidenori Tanaka
International Conference on Machine Learning (2023)
Defining and Characterizing Reward Gaming
Joar Skalse, Niki Howe, Dmitrii Krasheninnikov, David Krueger
Neural Information Processing Systems (2022)
Goal Misgeneralization in Deep Reinforcement Learning
Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, David Krueger
International Conference on Machine Learning (2022)
Out-of-Distribution Generalization via Risk Extrapolation (REx)
David Krueger, Ethan Caballero, Jörn-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville
International Conference on Machine Learning (Oral) (2021)
Filling gaps in trustworthy development of AI
Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, Tegan Maharaj, Noa Zilberman
Science (2021)