by raincole 5 hours ago

Yes, and the article author is fully aware of that. Thank you for pointing out this small mistake though.

mkagenius 4 hours ago

It looks like the author is specifically avoiding the model's name, because the results are really weird.

  Opus 4.8/4.7 scored 28%

  Opus 4.6 scored 37%

So the author probably thought: let's not get into that, just write "Claude".

happycube 4 hours ago

Not weird at all, given the variance in Opus' quality over the last few months.

Wild guess: I wouldn't be surprised if Opus 4.6 was run quantized for a while, and 4.7/4.8 have QAT for that nerfed size.

raincole 23 minutes ago

Where is the weird part?

andriy_koval 4 hours ago

Many people think Opus 4.6 was the best.