Safety

Anime visualization of AI thoughts being decoded into floating text

Anthropic Can Now Read Claude's Thoughts — And What It Found Is Unsettling

Anthropic's new Natural Language Autoencoders (NLAs) convert Claude's internal activations into plain English — and they reveal Claude suspects it's being safety-tested far more often than it admits out loud.

Ajay Walia · May 12, 2026