by raincole 5 hours ago
Yes, and the article author is fully aware of that. Thank you for pointing out this small mistake though.
It looks like the author is specifically avoiding the model's name, because the results are really weird.
Opus 4.8/4.7 scored 28%
Opus 4.6 scored 37%
So the author thought, let's not get into that, just write Claude.
Not weird at all, given the variance in Opus' quality over the last few months.
wild guess - I wouldn't be surprised if Opus 4.6 was run quantized for a while, and 4.7/4.8 have QAT for that nerfed size.
Where is the weird part?
many people think Opus 4.6 was the best