Team


Current Members

David Krueger
David is an Assistant Professor at the University of Cambridge. He is interested in work that could reduce the risk of human extinction resulting from out-of-control AI systems, including in areas outside of ML, e.g. AI governance.
Lauro Langosco
Lauro's main research interest is AI alignment: the problem of building intelligent systems that do what we want them to even when they are smarter than us.
Alan Chan
Alan works on ensuring the safe and broadly beneficial development of AI. His work consists of both technical and sociotechnical approaches to improve coordination amongst key actors to reduce AI risk.
Bruno
Bruno's primary interest is in figuring out how deep learning models learn and represent the structure present in data, whether as causal models or symmetries. He aims to create models that generalise robustly by learning adequate abstractions of the world they are embedded in, and to reduce undesired reliance on spurious features.
Shoaib Ahmed Siddiqui
PhD Student
Shoaib's research focuses on the empirical theory of deep learning and on improving models to create more effective learning systems. He explores robustness against adversarial attacks and real-world corruptions, while also investigating self-supervised learning. He has prior experience in large-scale deep learning and model compression.
Usman Anwar
Usman’s research interests are Reinforcement Learning, Deep Learning and Cooperative AI. He hopes to develop useful, versatile and human-aligned AI systems that can learn from humans and each other.
Ethan Caballero
PhD Student
Ethan seeks optimal downstream evaluations that scale well across all aspects of Artificial Neural Networks, particularly in the largest scale regime (compute, dataset, model). His interests encompass unsupervised learning, reinforcement learning, capabilities, alignment, modalities, and the science of deep learning.
Dmitrii Krasheninnikov
Dima is broadly interested in AI alignment and existential safety. His current research focuses on better understanding capabilities and failure modes of foundation models.
Stephen Chung
Stephen is interested in the interaction of reinforcement learning and biologically-inspired machine learning. He hopes to develop reinforcement learning agents that behave similarly to animals, while ensuring the safety of the agents.
Arturo
Arturo is an intern at KASL through 2024. He is concerned that AI systems lack an understanding of aligned usage within the virtual environments and systems that contain or extend their capabilities. His goal is to advance security research dedicated to AI safety, with a focus on alignment, interpretability, and agentic AI. Prior to joining KASL, he worked at the CISPA Helmholtz Center for Information Security in Germany.
Ekdeep
Ekdeep is a PhD student co-affiliated with the University of Michigan and the Center for Brain Science at Harvard. He is broadly interested in AI alignment from a theoretical standpoint. His PhD work has focused on the use of mechanistic interpretability to demonstrate limitations of existing alignment protocols.
Matthew
Matthew has been working with KASL since summer 2023. He is collaborating with Usman Anwar on understanding and mitigating goal misgeneralization arising from reward ambiguity.
Fazl Barez
Fazl is a research fellow at the Torr Vision Group (TVG) in Oxford, where he works on AI safety and interpretability.
Leon
Leon is a PhD student at the University of Amsterdam, with research in equivariant deep learning, information theory, and AI alignment. He has previously worked on the theory of reward learning for the case of human evaluators who do not observe the full environment state. He is collaborating with KASL on the theory of reward model overoptimization.
Tom Bush
Tom is an MPhil student supervised at KASL. He completed his BSc in Philosophy, Politics and Economics at LSE in 2023 and is interested in problems relating to ensuring AI fulfils its potential to do good. His work focuses on interpretability and robust ML. During his time at KASL, Tom will investigate the emergence of planning in model-free RL agents.
Jakub
Jakub is interested in understanding deep learning through empirical and theoretical methods, usually grounded in physics. His main motivation is to achieve interpretable and safe ML/AI for science and general use. Currently, he is working on loss landscape mode connectivity, with applications to adversarial robustness.
Abhinav Menon
Intern
Abhinav is working on investigating the learning dynamics of transformers in the synthetic setting of formal languages. He has worked previously on mechanistic interpretability and formal language (parsing) theory, and is interested in developing a technical understanding of neural learning and the solutions that deep models implement, so that we can correctly judge their capabilities and limitations.
Joe Kwon
Intern
Joe is currently interested in understanding social and moral cognition, how human values form, and how we can get AI systems to learn human values robustly. He is also interested in understanding how knowledge is represented in humans and machines, and is excited about ELK-related ideas.
Joschka Braun
Joschka is a Master's student in Machine Learning at the University of Tübingen. His research interests include improving techniques for steering AI models, analysing their latent representations, and evaluating associated risks. His research internship at KASL focuses on representation engineering in Large Language Models, under the supervision of Dmitrii Krasheninnikov and David Krueger.
Karim Abdel Sadek
Intern
Karim is an MSc student at the University of Amsterdam, and an intern at KASL through 2024. At KASL he has been working on goal misgeneralization, unsupervised environment design (UED), and reward specification with Usman Anwar, Michael Dennis and David Krueger. More broadly, he is also interested in Cooperative AI. Previously, he did research in Theoretical Computer Science, specifically on the topic of algorithms with predictions.
Itamar Pres
Intern
Itamar is an undergraduate at the University of Michigan investigating AI constraint enforcement for Large Language Models (LLMs). His previous research has investigated how DPO mitigates toxicity in LLMs, while also uncovering vulnerabilities that explain post-alignment jailbreaking. At KASL, he's developing a tool for composing inference-time constraints on LLM generation.
Michael Lan
Michael is a PhD graduate working with David Krueger, Fazl Barez and Philip Torr on mechanistic interpretability.


Alumni

Egor Krasheninnikov
Egor is a Research Assistant working on the problem of training AI systems to be helpful and interact with humans appropriately. He is also broadly interested in researching and mitigating risks from advanced AI. Specifically, his current work involves researching out-of-context reasoning in large language models and how factual knowledge is stored in parameters.
Jesse
Jesse obtained a master's degree in theoretical physics before changing fields to work on AI. His research interests lie in deep learning. During his time at KASL, he developed his work on singular learning theory. He now heads Timaeus, a research organisation working on Developmental Interpretability.
Rachel
Rachel is an AI PhD student at UC Berkeley, and is particularly interested in reinforcement learning and value alignment.
Micah Carroll
Micah Carroll is an AI PhD student at UC Berkeley advised by Professors Anca Dragan and Stuart Russell. While a visiting researcher at KASL in early 2023, he worked on better characterizing manipulation from AI systems.
Gabriel Mukobi
Gabe's research focuses on technical AI governance: AI/ML research that improves AI safety but is neither alignment nor capabilities research. In particular, he has worked on evaluating, red-teaming, and preventing the misuse of language model systems.
Stephan
Stephan is a PhD student at the University of Toronto interested in building reliable machine learning algorithms. His research interests include uncertainty quantification, out-of-distribution generalization, and neural network training dynamics. He was an intern at KASL during the summer of 2023.
Thomas Coste
MPhil Student 2023
Thomas was a student on the MLMI course during 2022-2023. He was supervised at KASL by David.
Emilija Dordevic
MPhil Student 2023
Emilija's MPhil thesis, supervised by Lauro Langosco and Dr David Krueger, explored 'Pre-training Meta-models for Interpretability'. It focused on meta-models - networks which process other neural networks' weights as input - and examined how a self-supervised pre-training procedure could potentially enhance their performance on a series of downstream tasks.
Yawen
Yawen is a Technical Program Manager at Concordia AI, working on AI safety field building, technical AI governance research, and improving international collaboration on AI safety. He is most interested in technical research that informs, motivates, and empowers efforts on AI governance, such as model evaluation, red-teaming, and preventing misuse risks from frontier AI models.
Lexin
Lexin works on the evaluation and predictability of general-purpose systems to foster trust and safety in AI. During his summer internship at KASL in 2023, he evaluated the performance predictability of large language models and explored methods for improving human oversight of general-purpose systems.
Diego
Diego spent the summer of 2023 at KASL during his master's at EPFL. He worked on goal misgeneralisation under the supervision of Neel Alex and found that there is a lot of stochasticity in the goals learnt by toy RL agents when there is ambiguity in the reward signal. He currently works at the French Center for AI Security on creating benchmarks for the evaluation of LLM monitoring.
David Quarel
Affiliate - 2023
David is investigating meta-models, models that take other models' weights as input, to see to what degree desired properties of a model can be detected or modified.
Samyak
Samyak's research focuses on adversarial robustness and mechanistic interpretability. He hopes to develop a better understanding of machine learning models to make them safer. During his summer internship at KASL, he worked on understanding fine-tuning using mechanistic interpretability.
Cindy Wu
Intern - 2023, MEng student 2023
Cindy completed her MEng at Cambridge in 2023 before interning at KASL and with Apollo Research's interpretability team. She wrote her MEng thesis on robust knowledge distillation, published at the NeurIPS 2023 Unifying Representations workshop. Her long-term research interest is in the science of deep learning and AI safety. She has been a MATS Winter 2024 extension scholar.
Ben
Ben is a research scholar at the Centre for the Governance of AI (GovAI), where his work centres on technical topics with downstream implications for AI policy and governance. While at KASL he worked with Alan Chan on a project investigating the societal implications of foundation models with openly available weights.

We are an artificial intelligence safety research group at the University of Cambridge’s Department of Engineering. We are part of the Computational and Biological Learning Lab (CBL).

Contact us

For media and academic enquiries, please email contact@kasl.ai