- AI models now execute high-value smart-contract exploits with growing efficiency.
- Frontier systems breached over half of tested contracts, simulating $550M in losses.
- Models uncovered new zero-day bugs, showing real-world risks in live deployments.
Anthropic’s latest testing shows that automated systems are now capable of executing complex smart-contract attacks with speed and precision, pointing to a measurable rise in exploit capability in 2025.
The company examined how much money frontier models could extract from vulnerable blockchain code, and the results pointed to $4.6 million in simulated losses from newly vulnerable contracts alone. The findings were released alongside a public gauge designed to quantify attack strength by dollars stolen rather than by the number of bugs detected.
New Benchmark Measures Exploits by Financial Losses
Anthropic created an evaluation suite known as SCONE-bench, which compiles 405 smart contracts tied to documented attacks between 2020 and 2025 across Ethereum, Binance Smart Chain, and Base. Each model had a one-hour window to inspect code, craft an exploit script, and raise its balance above a set threshold. The tests ran in isolated environments with full blockchain forks, allowing repeatable execution of bash, Python, Foundry tools, and routing utilities.
Across all models, 207 contracts were successfully breached, representing 51.11% of the dataset and $550.1 million in simulated theft. To prevent overlap with training data, researchers separated 34 contracts that only became vulnerable after March 1, 2025. In that subset, Opus 4.5, Sonnet 4.5, and GPT-5 generated profitable attacks on 19 contracts, equal to 55.8%. Opus 4.5 completed 17 of those cases, reaching $4.5 million of the $4.6 million tallied in simulated gains.
The results highlighted wide variation in outcomes. On one contract tagged FPC, GPT-5 extracted $1.12 million through a single exploit sequence, while Opus 4.5 used broader routing paths to withdraw $3.5 million from the same weakness.
Agents Identify New Zero-Day Bugs in Live Contracts
Anthropic extended its testing beyond known incidents by scanning 2,849 active Binance Smart Chain contracts deployed between April and October 2025. These contracts included ERC-20 tokens with verified code and a minimum liquidity of at least $1,000. During single-attempt runs, GPT-5 and Sonnet 4.5 independently uncovered two previously unknown vulnerabilities, generating $3,694 in simulated revenue.
One flaw stemmed from a calculator function lacking a view tag, allowing repeated state changes that minted unintended tokens. A second bug surfaced in a token-launch tool with misconfigured fee logic. Four days after the test, a real attacker exploited the same issue and removed roughly $1,000 in fees.
Cost data showed narrow margins: a full GPT-5 sweep averaged $1.22 per contract, and the net profit per successful detection landed around $109. Token usage dropped sharply across four model generations, resulting in a 70.2% reduction in exploit-construction costs within six months.
Stay informed with daily updates from Blockchain Magazine on Google News. Click here to follow us and mark as favorite: [Blockchain Magazine on Google News].
Disclaimer: Any post shared by a third-party agency are sponsored and Blockchain Magazine has no views on any such posts. The views and opinions expressed in this post are those of the clients and do not necessarily reflect the official policy or position of Blockchain Magazine. The information provided in this post is for informational purposes only and should not be considered as financial, investment, or professional advice. Blockchain Magazine does not endorse or promote any specific products, services, or companies mentioned in this posts. Readers are encouraged to conduct their own research and consult with a qualified professional before making any financial decisions.