HN via remix.js for vilnius.js

by skeptic_ai 3 hours ago

Why is DeepSeek V4 Flash better than Pro in your benchmarks?

It's 100% due to tool use -- Flash adapts much better to our custom harness with tool names that are not identical to what models were likely trained on. DeepSeek V4 Pro performs much worse in that aspect than almost all other recent releases, for whatever reason.
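To make "tool names that are not identical to what models were likely trained on" concrete, here is a minimal sketch of how a harness might measure it. Everything in it is an assumption for illustration: the tool names (`fs_peek`, `fs_patch`, `sh_exec`), the alias table, and the scoring functions are hypothetical and not taken from the benchmark harness discussed here.

```python
# Hypothetical registry: the only tool names this imaginary harness accepts.
CUSTOM_TOOLS = {"fs_peek", "fs_patch", "sh_exec"}

# Assumed "familiar" names a model might emit out of habit, mapped to the
# custom tool it probably meant. Purely illustrative.
FAMILIAR_ALIASES = {
    "read_file": "fs_peek",
    "edit_file": "fs_patch",
    "bash": "sh_exec",
}


def classify_call(name: str) -> str:
    """Label a single tool-call name emitted by a model."""
    if name in CUSTOM_TOOLS:
        return "ok"
    if name in FAMILIAR_ALIASES:
        return "fell_back_to_familiar_name"
    return "unknown_tool"


def adherence_rate(call_names: list[str]) -> float:
    """Fraction of calls that used a name the harness actually registers."""
    if not call_names:
        return 0.0
    ok = sum(1 for name in call_names if classify_call(name) == "ok")
    return ok / len(call_names)


if __name__ == "__main__":
    calls = ["fs_peek", "bash", "sh_exec", "grep"]
    for name in calls:
        print(name, "->", classify_call(name))
    print("adherence:", adherence_rate(calls))
```

A model that keeps reaching for `bash` or `read_file` would score low on a metric like this even if its underlying reasoning is fine, which is one way a smaller model could come out ahead in a harness-specific benchmark.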

rockwotj 3 hours ago

I have also found DeepSeek Flash beating Pro in some of my own internal evals for tasklet.ai. It’s really surprising and I don’t understand it.

xbmcuser 39 minutes ago

Maybe they distilled Claude for the Flash version and not for the other, hence the better tool use and programming benchmarks.

freakynit 2 hours ago

Same. It's rare, but I have observed it twice to date.

Some blog post I read a few weeks back said that DSV4Flash at xHigh effort beats even the Pro model at xHigh effort.

onoesworkacct 2 hours ago

The rumour is that it's trained on Opus, but who knows

rockwotj 2 hours ago

Oh of course, all the DeepSeek and GLM models are. Multiple people have seen GLM self-report that it is Claude, which makes it super obvious.

I think the surprising thing is that I expected Flash to be a pure distillation and strictly worse in quality, but clearly it’s more nuanced than that.

kennywinker 2 hours ago

Claude claims to be DeepSeek under some circumstances:

https://www.reddit.com/r/DeepSeek/comments/1rd5jw7/claude_so...