by jfaat 2 hours ago

> but if you only want to use the best model available, it isn't there yet

I'm trying to wrap my head around exactly why so may people seem to want the best model available when it has recently become clear that most halfway decent models can write damn good code for a fraction of the price. And the frontier models get nerfed constantly so you with open weight you can get something slightly less performant but way more stable. Almost like buying a Ferrari for your daily commute instead of a Toyota or even a Mercedes.

I think there are several factors. Certainly marketing making us think we need the shiny thing which is rampant online and very smart people think they aren't susceptible to. There's a lot of really odd 'I trust Anthropic/OpenAI more than Deepseek' which tends to ignore, for starters, that you can run choose your provider and still save a ton. I also think there's some amount of addiction and brand loyalty where a Ferrari is one hell of a drive so that you turn your nose up at that sensible Toyota. Oh the other one I see used is like oh only fable can oneshot updating my embedded systems thing from 1975 to rust which is great but let's recognize how niche that is.

And it ends up just coming across as people are getting SO reliant on the tools so fast. Maybe it's ok to think and like read a few lines of code and work with these agents to convert your thing to rust or center your div. Even if coding is over which in some sense it certainly is, don't turn your mind into the wall-e people yet. I found myself guilty of this so often. It takes way more time and effort to do things via prompt and I wouldn't just open the editor and fix it because that dopamine hit of the magic the abstraction provided was so strong.

So I'm pretty much done using the 'best' (on benchmarks, if money isn't an object, etc etc) models available. After a year on Sonnet/Opus/GPT5x I'm having way better results with open weights models that don't get lobotomized weekly. I'm finding ways to do the crafting part of building software by focusing on honing my harness and workflow. I'm enjoying changing the oil on my Toyota after a year of almost flying off cliffs in my Ferrari and if I can check my ego it's a purely positive thing.

andai an hour ago | [-1 more]

Yeah, the funniest thing about everyone freaking out about Fable's capabilities recently was that for most of the stuff they were amazed by, you could get roughly the same result from DeepSeek Flash.

I used to be obsessed with what's the best model. Then a while back when the new best model came out, I tested it on a task. I also tested its little brother (much smaller model from same company).

They both completed the task perfectly except the "best" model (the bigger one) cost 5x more and took 3x longer...

realusername 35 minutes ago | [-0 more]

> They both completed the task perfectly except the "best" model (the bigger one) cost 5x more and took 3x longer...

Same for me, I certainly don't have the same definition of success and failure either.

A more expensive model has *less* rooms for wandering around than a cheaper model.

If Claude wanders around during 10min until finding the most obvious solution, then I count it as a failure.

cik 36 minutes ago | [-0 more]

I've landed in a similar place by reducing effort and cutting up tasks. I find that more exacting specifications to the models, yield significantly less need for "effort". Combining each with multjple git worktrees and an integration branch for the current worktrees themselves has yielded incresible results.

This also allows me to play with, and mix codex, claude cli, and others. This is my happy spot for the last two months.

ssk42 2 hours ago | [-2 more]

What is your favorite harness for the open weights?

jfaat an hour ago | [-0 more]

We built our own and aren't done open sourcing it but before that I got to a really good place with opencode plus some custom agents, pi family is good too although I haven't used it as much. We made an agent to design a spec, one to implement by dispatching subagents, one to validate against the plan, things like that. All of this helps claude/gpt too IME. For open models it has helped them stay out of loops (e.g. Kimi's but WAIT) and for frontier it helps them stay on task and not invent bloated patterns

NamlchakKhandro 2 hours ago | [-0 more]

pi-mono