> [...] beating Claude Code (32%) at roughly $0.17 per vulnerability found
Claude Code is an agent harness, not an LLM.
Claude is a brand (or group of LLMs), not an LLM.
Yes, and the article author is fully aware of that. Thank you for pointing out this small mistake though.
It looks like the author is specifically avoiding the model's name because the results are really weird.
Opus 4.8/4.7 scored 28%
Opus 4.6 scored 37%
So the author thought: let's not get into that, just write "Claude".
Not weird at all, given the variance in Opus' quality over the last few months.
Wild guess: I wouldn't be surprised if Opus 4.6 was run quantized for a while, and 4.7/4.8 have quantization-aware training (QAT) for that nerfed size.
Where is the weird part?
Many people think Opus 4.6 was the best.
It costs nothing to not be pedantic.
Possibly nothing, other than accuracy.
The dollar amount is meaningless without a comparison, and no other model has a price tag. Sloppy article.
Claude Code is the only way to get access to the actual amortized cost of running a Claude-scale model. The consumer (non-enterprise) API is extremely expensive, with increasing marginal costs for the user and fat profit margins for Anthropic. If you want to approximate the cost for a state-level attacker who can run the model on their own hardware, Claude Code is probably the best guess at the amortized cost.