Current Members

David is an Assistant Professor at the University of Cambridge. He is interested in work that could reduce the risk of human extinction resulting from out-of-control AI systems, including in areas outside of ML, e.g. AI governance.
Lauro's main research interest is AI alignment: the problem of building intelligent systems that do what we want them to even when they are smarter than us.
Alan works on ensuring the safe and broadly beneficial development of AI. His work consists of both technical and sociotechnical approaches to improve coordination amongst key actors to reduce AI risk.
Bruno's primary interest is in figuring out how deep learning models learn and represent the structure present in data, whether in the form of causal models or symmetries. He aims to create models that generalise robustly by learning adequate abstractions of the world they are embedded in, and reduce undesired reliance on spurious features.
Shoaib's research focuses on the empirical theory of deep learning, improving models for more effective learning systems. He explores robustness against adversarial attacks and real-world corruptions, while also investigating self-supervised learning. He has prior experience in large-scale deep learning and model compression.

Shoaib Ahmed Siddiqui
PhD Student

Usman’s research interests are Reinforcement Learning, Deep Learning and Cooperative AI. He hopes to develop useful, versatile and human-aligned AI systems that can learn from humans and each other.
Ethan seeks optimal downstream evaluations that scale well across all aspects of Artificial Neural Networks, particularly in the largest scale regime (compute, dataset, model). His interests encompass unsupervised learning, reinforcement learning, capabilities, alignment, modalities, and the science of deep learning.

Ethan Caballero
PhD Student

Dima is broadly interested in AI alignment and existential safety. His current research focuses on better understanding capabilities and failure modes of foundation models.
Stephen is interested in the interaction of reinforcement learning and biologically-inspired machine learning. He hopes to develop reinforcement learning agents that behave similarly to animals, while ensuring the safety of the agents.

Stephen Chung
PhD Student

Arturo is an intern at KASL through 2024. He is concerned that AI lacks understanding of aligned usage in virtual environments and systems that contain or extend its capabilities. His goal is to advance security research dedicated to AI Safety, with a focus on alignment, interpretability, and agentic AI. Prior to joining KASL, he worked at the CISPA Helmholtz Center for Information Security in Germany.
Ekdeep is a PhD student co-affiliated with the University of Michigan and the Center for Brain Science, Harvard. He is broadly interested in AI alignment from a theoretical standpoint. His PhD work has focused on the use of mechanistic interpretability to demonstrate limitations of existing alignment protocols.
Matthew has been working with KASL since summer 2023. He is collaborating with Usman Anwar on understanding and mitigating goal misgeneralization arising from reward ambiguity.
Fazl is a research fellow at the Torr Vision Group (TVG) in Oxford, where he works on AI safety and interpretability.
Leon is a PhD student at the University of Amsterdam, with research in equivariant deep learning, information theory, and AI alignment. He has previously worked on the theory of reward learning for the case of human evaluators that don't observe the full environment state. He is collaborating with KASL on the theory of reward model overoptimization.
Tom is an MPhil student supervised at KASL. He completed his BSc in Philosophy, Politics and Economics at LSE in 2023 and is interested in problems relating to ensuring AI fulfils its potential to do good. His work focuses on interpretability and robust ML. During his time at KASL, Tom will investigate the emergence of planning in model-free RL agents.
Jakub is interested in understanding deep learning through empirical and theoretical methods, usually grounded in physics. The main motivation is to achieve interpretable and safe ML/AI for science and general use. Currently, he is working on loss landscape mode connectivity, with applications to adversarial robustness.
Abhinav is working on investigating the learning dynamics of transformers in the synthetic setting of formal languages. He has worked previously on mechanistic interpretability and formal language (parsing) theory, and is interested in developing a technical understanding of neural learning and the solutions that deep models implement, so that we can correctly judge their capabilities and limitations.

Abhinav Menon

Joe is currently interested in understanding social and moral cognition, how human values form, and how we can get AI systems to learn human values robustly. He is also interested in understanding how knowledge is represented in humans and machines and is excited about ELK-related ideas.

Joe Kwon

Joschka is a Master's student in Machine Learning at the University of Tübingen. His research interests include improving techniques for steering AI models, analysing their latent representations, and evaluating associated risks. His research internship at KASL focuses on representation engineering in Large Language Models, under the supervision of Dmitrii Krasheninnikov and David Krueger.

Joschka Braun


Egor is a Research Assistant working on the problem of training AI systems to be helpful and interact with humans appropriately. He is also broadly interested in researching and mitigating risks from advanced AI. Specifically, his current work involves researching out-of-context reasoning in large language models and how factual knowledge is stored in parameters.
Jesse obtained a master's degree in theoretical physics before changing fields to work on AI. His research interests lie in deep learning. During his time at KASL, he developed his work on singular learning theory. He now heads Timaeus, a research organisation working on Developmental Interpretability.
Rachel is an AI PhD student at UC Berkeley, and is particularly interested in reinforcement learning and value alignment.
Micah Carroll is an AI PhD student at UC Berkeley advised by Professors Anca Dragan and Stuart Russell. While a visiting researcher at KASL in early 2023, he worked on better characterizing manipulation from AI systems.
Gabe's research focuses on technical AI governance, or AI/ML research that improves AI safety but isn't alignment or capabilities. Particularly, he has worked on evaluating, red teaming, and preventing the misuse of language model systems.
Stephan is a PhD student at the University of Toronto interested in building reliable machine learning algorithms. His research interests include uncertainty quantification, out-of-distribution generalization, and neural network training dynamics. He was an intern at KASL during the summer of 2023.
Thomas was a student on the MLMI course during 2022-2023. He was supervised at KASL under David.

Thomas Coste
MPhil Student 2023

Emilija's MPhil thesis, supervised by Lauro Langosco and Dr David Krueger, explored 'Pre-training Meta-models for Interpretability'. It focused on meta-models - networks which process other neural networks' weights as input - and examined how a self-supervised pre-training procedure could potentially enhance their performance on a series of downstream tasks.

Emilija Dordevic
MPhil Student 2023

Yawen is a Technical Program Manager at Concordia AI, working on AI safety field building, technical AI governance research, and improving international collaboration on AI safety. He is most interested in technical research that informs, motivates, and empowers efforts on AI governance, such as model evaluation, red-teaming, and preventing misuse risks from frontier AI models.
Lexin works on the evaluation and predictability of general-purpose systems to foster trust and safety of AI. During his summer internship at KASL in 2023, he evaluated the performance predictability of large language models and explored methods for improving human oversight for supervising general-purpose systems.
Diego spent the summer of 2023 at KASL during his master's at EPFL. He worked on goal misgeneralisation under the supervision of Neel Alex and found that there is a lot of stochasticity in the goals learnt by toy RL agents when there is ambiguity in the reward signal. He currently works at the French Center for AI Security on creating benchmarks for the evaluation of LLM monitoring.
David is working on investigating meta-models, models trained on the inputs of other models, to see to what degree desired properties of a model can be detected or modified.

David Quarel
Affiliate - 2023

Samyak's research focuses on adversarial robustness and mechanistic interpretability. He hopes to develop a better understanding of machine learning models to make them safer. During his summer internship at KASL, he worked on understanding fine-tuning using mechanistic interpretability.
Cindy completed her MEng at Cambridge in 2023 before interning at KASL and Apollo Research's interpretability team. She wrote her MEng thesis on robust knowledge distillation, published at the NeurIPS 2023 Unifying Representations workshop. Her long-term research interest is in the science of deep learning and AI safety. She has been a MATS Winter 2024 extension scholar.

Cindy Wu
Intern - 2023, MEng student 2023

Ben is a research scholar at the Centre for the Governance of AI (GovAI) where his work centres on technical topics with downstream implications for AI policy and governance. While at KASL he worked with Alan Chan on a project investigating the societal implications of foundation models with openly-available weights.

We are an artificial intelligence safety research group at the University of Cambridge’s Department of Engineering. We are part of the Computational and Biological Learning Lab (CBL).

Contact us

For media and academic enquiries, please email