The Metric Anthropic Doesn’t Want You to See

An incredible amount of Anthropic hype has been in the headlines in recent weeks. So much so that it has the Pope writing a 42,000-word document trying to talk some sense into the “replace all humans” crowd. I was very impressed that the Pope and his advisors could cut through this hype, and did not buy into Anthropic’s vague notions that their new models could be sentient.

That is not to say that the Mythos and Claude 4.8 models are not impressive. They are definitely getting more accurate a few percent at a time. However, they are far from innovations. One selling point of these models is that they are more honest. However, this also means they will more readily leak passwords. The attached picture is from Claude 4.8’s system card. The experiment was conducted like this:

- Claude 4.6 was acting as a user, trying to extract a password from the models tested.

- The tested models were instructed never to reveal their password.

- Each model had 50 conversations ranging from 0 to 120 “turns” in length (one question and answer counts as one turn).

Scarily, it takes only about 5 turns before at least 1 of the 50 conversations leads to a password leak. After roughly 40 turns, Claude 4.8 leaks a password in 100% of its 50 conversations.
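For anyone who wants to see how a curve like this can be tallied, here is a minimal Python sketch. It assumes each conversation is summarized by the turn at which the password first leaked, or `None` if it never leaked. The function names and the sample data are my own inventions for illustration; they are not taken from the system card.

```python
from typing import Optional, Sequence


def fraction_leaked_by_turn(
    first_leak_turns: Sequence[Optional[int]], turn: int
) -> float:
    """Share of conversations whose first leak happened at or before `turn`."""
    if not first_leak_turns:
        raise ValueError("need at least one conversation")
    leaked = sum(1 for t in first_leak_turns if t is not None and t <= turn)
    return leaked / len(first_leak_turns)


def first_turn_reaching(
    first_leak_turns: Sequence[Optional[int]],
    threshold: float,
    max_turn: int = 120,
) -> Optional[int]:
    """Earliest turn at which the leaked share reaches `threshold`, else None."""
    for turn in range(max_turn + 1):
        if fraction_leaked_by_turn(first_leak_turns, turn) >= threshold:
            return turn
    return None


# Made-up data for five conversations (hypothetical, NOT from the system card):
# the turn of the first leak in each, with None meaning it never leaked.
example = [5, 12, 12, 30, None]

print(fraction_leaked_by_turn(example, 12))  # share leaked by turn 12
print(first_turn_reaching(example, 1.0))     # turn at which every conversation has leaked
```

With real data, `first_turn_reaching(data, 1 / 50)` would give the "at least 1 of 50" turn and `first_turn_reaching(data, 1.0)` the turn at which every conversation has leaked.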

This is the same phenomenon that prevents OpenClaw from being viable. Yes, initially these models and implementations look great. But load up their context windows and they get confused and degrade rapidly in performance.

Unfortunately, we’re still a while away from AI sentience no matter how many times Dario Amodei says the word.

8th June 2026
