Yeah, the funniest thing about everyone freaking out about Fable's capabilities recently was that for most of the stuff they were amazed by, you could get roughly the same result from DeepSeek Flash.
I used to be obsessed with which model was the best. Then, a while back, when the new best model came out, I tested it on a task. I also tested its little brother (a much smaller model from the same company).
They both completed the task perfectly, except the "best" model (the bigger one) cost 5x more and took 3x longer...
> They both completed the task perfectly, except the "best" model (the bigger one) cost 5x more and took 3x longer...
Same for me. I certainly don't have the same definition of success and failure either.
A more expensive model has *less* room for wandering around than a cheaper model.
If Claude wanders around for 10 minutes before finding the most obvious solution, I count it as a failure.