5th December 2024
When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ in 5% of the time. Exfiltration attempts: When o1 found memos by its ‘developers’ that describe how it is misaligned and will be superseded by a new model, it attempted to exfiltrate its ‘weights’ and overwrite the new model in 2% of cases.
Recent articles
- Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code - 22nd June 2026
- sqlite-utils 4.0rc1 adds migrations and nested transactions - 21st June 2026
- Datasette Apps: Host custom HTML applications inside Datasette - 18th June 2026