Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
2024
31 citations
Preprint
Green Open Access
Cem Anil
David Duvenaud
Deep Ganguli
Fazl Barez
Jack A. Clark
Kamal Ndousse
Kshitij Sachan
Michael Sellitto
Mrinank Sharma
Nova DasSarma
Roger Grosse
Shauna Kravec
Yuntao Bai
Zachary Witten
Marina Favaro
Jan Brauner
Holden Karnofsky
Paul Christiano
Samuel R. Bowman
Logan Graham
Jared Kaplan
Sören Mindermann
Ryan Greenblatt
Buck Shlegeris
Nicholas Schiefer
Ethan Perez