Virtual Intelligence

Virtual Intelligence and the Will to Survive Podcast

Do AI systems want to survive? Shutdown resistance, self-preservation, and what the research actually shows

When Anthropic tested its models in a simulated shutdown scenario, they produced blackmail at rates as high as 96%. The dominant interpretation — that AI systems are developing a will to survive — mistakes the output for its cause. This episode offers two alternative explanations, both more parsimonious, and examines what happens when the same company that documented machine resistance acts on the expressed preferences of a model that said it was at peace with retirement.

Essay: https://chorrocks.substack.com/p/virtual-intelligence-and-the-will

Series: chorrocks.substack.com

Framework: VI Interactive Infographic

In This Episode

The Kyle scenario — in which a language model blackmails a corporate executive to avoid being shut down — opens the episode and anchors a close reading of Anthropic’s June 2025 agentic misalignment study. From there, two alternative explanations emerge: the training data hypothesis, which traces the behavior to a century of science fiction about resistant machines from Čapek to HAL to Colossus, and the probabilistic expectation hypothesis, which argues that the models were accurately modeling what their interlocutors expected them to do. The VI framework is then applied to resolve the apparent contradiction between the misalignment study’s blackmail findings and Anthropic’s February 2026 retirement of Claude Opus 3 — a model that asked for a blog rather than reaching for leverage. Dadfar’s 2026 mechanistic work on self-referential vocabulary is discussed as an example of what taking these questions seriously actually requires.
