by sluongng 15 hours ago

Yeah the 8 agents limit aligns well with my conversations with folks in the leading labs

https://open.substack.com/pub/sluongng/p/stages-of-coding-ag...

I think we need much different toolings to go beyond 1 human - 10 agents ratio. And much much different tooling to achieve a higher ratio than that

Scea91 11 hours ago | [-1 more]

I don't think number of parallel agents is the right productivity metric, or at least you need to account for agent efficiency.

Imagine a superhuman agent who does not need to run in endless loops. It could generate 100k line code-base in a few minutes or solve smaller features in seconds.

In a way, the inefficiency is what leads people to parallelism. There is only room for it because the agents are slow, perhaps the more inefficient and slower the individual agents are, the more parallel we can be.

sluongng 24 minutes ago | [-0 more]

Yeah, I don't disagree with your assessment at all. I think the H2A ratio is still a good metric for the AI adoption rate of an organization. At a higher H2A ratio, you will also start to hear people measuring things using token volumes, which I think is also a similar metric (because most models nowadays run on a relatively fixed Tokens/second speed).

All of this is not a direct signal to a productivity boost. I think at higher volumes, you will need to start to account for the "yield" rate of the token volumes above: what are the volumes of tokens that get to the final production deployment? At which stage is it a constraint on the yield? Is it the models, or is it the harness, or something else (i.e. Code Review, CI/CD, Security Scans etc...)? And then it becomes an optimization problem to reduce the Cost of Goods Sold while improving/maintaining Revenues. The "productivity" will then be dissolved into multiple separate but more tangible metrics.

schipperai 14 hours ago | [-1 more]

Few experiments like gas town, the compiler from Anthropic or the browser from Cursor managed to reach the Rocket stage, though in their reports the jagged intelligence of the LLMs was eerily apparent. Do you think we also need better models?

sluongng 13 hours ago | [-0 more]

I do. The reason why the current generation of agents are good at coding is because the labs have sufficient time and computes to generate synthetic chain-of-thoughts data, feed those data through RL before use them to train the LLMs. These distillation takes time, time which starts from the release of the previous generation of models.

So we are just now getting agents which can reliably loop themselves for medium size tasks. This generation opens a new door towards agent-managing-agents chain of thoughts data. I think we would only get multi-agents with high reliability sometimes by the mid to end of 2026, assuming no major geopolitical disruption.