Why is DeepSeek V4 Flash better than Pro in your benchmarks?
It's 100% due to tool use -- Flash adapts much better to our custom harness with tool names that are not identical to what models were likely trained on. DeepSeek V4 Pro performs much worse in that aspect than almost all other recent releases, for whatever reason.
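To make "tool names that are not identical to what models were likely trained on" concrete, here is a rough sketch of how one might score that. It is not the actual harness from the benchmark. The tool names (`fs_peek`, `sh_exec`), the JSON call shape, and both helper functions are hypothetical and only there for illustration.

```python
import json

# Hypothetical tool registry: the names deliberately differ from common ones
# such as "read_file" or "bash" that a model has probably seen in training.
# Each tool maps to the exact set of argument names it expects (an assumption).
TOOLS = {
    "fs_peek": {"path"},      # assumed stand-in for a file-read tool
    "sh_exec": {"command"},   # assumed stand-in for a shell tool
}


def is_valid_call(raw: str) -> bool:
    """Return True if raw is a JSON tool call that matches the registry."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        return False
    if not isinstance(args, dict):
        return False
    # The call must use exactly the argument names the tool declares.
    return set(args) == TOOLS[name]


def valid_call_rate(raw_calls: list[str]) -> float:
    """Fraction of raw model outputs that are valid calls (0.0 if empty)."""
    if not raw_calls:
        return 0.0
    return sum(is_valid_call(c) for c in raw_calls) / len(raw_calls)
```

A model that falls back to a familiar name like `read_file` would fail `is_valid_call` here even if its intent was right, which is the kind of gap a harness with custom names would expose.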
I have also found DeepSeek Flash beating Pro in some of my own internal evals for tasklet.ai. It's really surprising and I don't understand it.
Maybe they distilled Claude for the Flash version and not for the other, hence the better tool use and programming benchmarks.
Same. It's rare, but I have observed it twice to date.
Some blog post I read a few weeks back said that DeepSeek V4 Flash at xHigh effort beats even the Pro model at xHigh effort.
The rumour is that it's trained on Opus, but who knows.
Oh, of course, all the DeepSeek and GLM models are. Multiple people have seen GLM self-report that it is Claude, which makes it super obvious.
I think the surprising thing is that I expected Flash to be a pure distillation and strictly worse in quality, but clearly it's more nuanced than that.
Claude claims to be DeepSeek under some circumstances:
https://www.reddit.com/r/DeepSeek/comments/1rd5jw7/claude_so...