by gas9S9zw3P9c 16 hours ago

I'd love to see what is being achieved by these massive parallel agent approaches. If it's so much more productive, where is all the great software that's being built with it? What is the OP building?

Most of what I'm seeing is AI influencers promoting their shovels.

TheCowboy 7 hours ago | [-5 more]

> If it's so much more productive, where is all the great software that's being built with it?

This is such a new and emerging area that I don't understand how this is a constructive comment on any level.

You can be skeptical of the technology in good faith, but I think one shouldn't be against people being curious and engaging in experimentation. A lot of us are actively trying to see what exactly we can build with this, and I'm not an AI influencer by any means. How do we find out without trying?

I still feel like we're at a "building tools to build tools" stage in multi-agent coding. A lot of interesting projects are springing up to see if they can get many agents to effectively coordinate on a project. If anything, it would be useful to understand what failed and why, so one can have an informed opinion.

tedeh 7 hours ago | [-3 more]

I don't think it is unreasonable to ask where all the great AI-built software is. There have been comments here on HN about people becoming 30 to 50 times more productive than before.

To put a statement like that into perspective (50 times more productive): in the first week of the year, you would accomplish about as much as in the whole previous year put together.

theshrike79 2 hours ago | [-0 more]

I haven't made any "great" software ever in my life. With AI or without.

But with AI assistance I've made SO MANY "useful", "handy" and "nifty" tools that I would've never bothered to spend the time on.

Like just last night I had Claude make a shell script on a whim that lets me use fzf to choose a running tmux session - with a preview of what the session's screen looks like.

Could I make it by hand? Yep. Would I have bothered? Most likely no.

Now it got done and iterated on my second monitor while I was watching 21 Bridges on my main monitor and eating snacks. (Chadwick Boseman was great in it)
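
The core of that kind of script is small. A minimal sketch, assuming tmux and fzf are installed (the preview options and fallbacks are illustrative, not the commenter's actual script):

```shell
# Pick a running tmux session with fzf, previewing the selected session's
# current screen via capture-pane. Exits quietly if tmux/fzf are missing
# or no tmux server is running.
command -v tmux >/dev/null && command -v fzf >/dev/null || exit 0
session=$(tmux list-sessions -F '#{session_name}' 2>/dev/null \
  | fzf --preview 'tmux capture-pane -ep -t {}') || exit 0
# switch-client works from inside tmux; attach-session covers the outside case
tmux switch-client -t "$session" 2>/dev/null || tmux attach-session -t "$session"
```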

sofal 6 hours ago | [-0 more]

I'd question your assumption that the software would be "great". I think we're seeing the volume of software increase faster than before. The average quality of the total volume of software will almost certainly decrease. It's not a contradiction for productivity in that respect to increase while quality decreases.

TheCowboy 6 hours ago | [-0 more]

I'm honestly not a big fan of when people throw out numbers implying a high degree of rigor without actually showing me evidence so I can judge for myself. If you're this much more productive, then use some % of that newly discovered productivity to show us.

But building software does tend to come with a lag even with AI. And we're also just more likely to see its influence in existing software first.

I'd rather be asking where it is AND actively trying to explore this space so I have a better grasp of the engineering challenges. I think there are just too many interesting things happening to wave it all off.

mycall 7 hours ago | [-0 more]

The hard part about extracting patterns right now is that they shift every 2-4 months (it was every 6-12 months in 2024-2025). What works for you today might be obsolete in May.

mitjam 2 hours ago | [-0 more]

From personal experience, software that was developed with agents doesn't hit the road because:

a) learning and adapting is at first more effort, not less
b) learning with experiments is faster
c) experiencing the acceleration first-hand is demoralising
d) distribution/marketing efficiency is declining at an accelerating rate (if you want to keep it human-generated)
e) maintenance effort is not decelerating as fast as creation effort

Yet I believe your statement is wrong in the first place. A lot of new code is already created with AI assistance, and part of the acceleration in AI itself can be attributed to increased use of AI in software engineering (from research to planning to execution).

ecliptik 15 hours ago | [-3 more]

It's for personal use, and I wouldn't call it great software, but I used Claude Code Teams in parallel to create a Fluxbox-compatible window compositor for Wayland [1].

Overall effort was a few days of agentic vibe-coding over a period of about 3 weeks. Would have been faster, but the parallel agents burn through tokens extremely quickly and hit Max plan limits in under an hour.

1. https://github.com/ecliptik/fluxland

choiway 9 hours ago | [-1 more]

This is really cool. Out of curiosity did you know how to do this sort of programming prior to LLMs?

ecliptik 8 hours ago | [-0 more]

Not really, most of my programming experience is for devops/sysadmin scripts with shell/perl. I can read python/ruby from supporting application teams, but never programmed a large project or application with it myself. Last I used C was 25 years ago in some college courses and was never very good with it.

indigodaddy 14 hours ago | [-0 more]

Pretty cool!

fhd2 14 hours ago | [-15 more]

Even if somebody shows you what they've built with it, you're none the wiser. All you'll know is that it seemingly works well enough for a greenfield project.

The jury is still very far out on how agentic development affects mid/long term speed and quality. Those feedback cycles are measured in years, not weeks. If we bother to measure at all.

People in our field generally don't do what they know works, because by and large, nobody really knows, beyond personal experiences, and I guess a critical mass doesn't even really care. We do what we believe works. Programming is a pop culture.

suzzer99 11 hours ago | [-4 more]

Does good design up front matter as much if an AI can refactor in a few hours something that would take a good developer a month? Refactoring is one of those tasks that's tedious and too non-trivial for traditional automation, but it seems perfect for an AI. Especially if you already have all the tests.

qudat 9 hours ago | [-1 more]

I’m constantly using code agents to work on feature development and they are constantly getting things wrong. They can refactor high level concepts but I have to nudge them to think about the proper abstractions. I don’t see how a multiagent flow could handle those interactions. The bus factor is 1, me.

cloverich 3 hours ago | [-0 more]

Try building review skills based on how you review. I built one recently based on how I review some of the concurrent backend stuff one of our tools does. I have it auto-run on every PR. It's great, it catches tons of stuff, and ranks the issues by severity. Over 10 reviews, only 1 false positive (hallucination) and several critical catches. I wish I'd set it up sooner.

You can also, after those sessions where they get stuff wrong, ask for an analysis of what the agent got wrong that session and have it produce a ranked list. I just started doing that and wow, it comes up with pretty solid lists. I'm not sure if it's sustainable to simply consolidate and prune it, but maybe it is?
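
In Claude Code, a review skill like that is essentially a markdown file with YAML frontmatter, e.g. at `.claude/skills/concurrency-review/SKILL.md`. A hypothetical sketch (the skill name and checklist items are made up, not the commenter's actual skill):

```markdown
---
name: concurrency-review
description: Use when reviewing PRs that touch the background job queue or any shared mutable state.
---

When reviewing, check each diff hunk for:

1. Locks acquired in a consistent order (see docs/locking.md).
2. Shared state mutated outside a lock or atomic.
3. Blocking calls made while a lock is held.

Rank every finding by severity (critical / major / minor) and include
file:line references in each comment.
```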

veilrap 11 hours ago | [-0 more]

Upgrades, API compatibility, and cross version communication are really important in some domains. A bad design can cause huge pain downstream when you need to make a change.

Jensson 9 hours ago | [-0 more]

> Especially if you already have all the tests.

Most tests people write have to be changed if you refactor.

briantakita 13 hours ago | [-4 more]

I am now releasing software for projects that have spent years on the back-burner. From my perspective, agent loops have been a success. It makes the impractical pipe-dream doable.

Nadya 12 hours ago | [-0 more]

Yeah, I have a never-ending list of things I could easily make myself if I could set aside 7-10 hours to plan them out, develop, and troubleshoot, but which are also low-priority enough that they sit on the back burner perpetually.

Now these things are being made. I can justify spending 5-10 minutes on something without being upset if AI can't solve the problem yet.

And if not, I'll try again in 6 months. These aren't time sensitive problems to begin with or they wouldn't be rotting on the back burner in the first place.

sarchertech 12 hours ago | [-2 more]

That’s completely ignoring the point of the person you are responding to. They weren’t talking about small greenfield projects.

briantakita 9 hours ago | [-1 more]

Agent loops also enable the "hard discipline" of making sure all of the tests are written, documentation is up to date, specs are explicitly documented, etc. Stuff that often gets dropped/deprioritized due to time pressure and exhaustion. Gains from automation apply to greenfield and complex legacy projects alike.

sarchertech 8 hours ago | [-0 more]

Well, that's more on topic as a response to the original poster. Still not really in keeping with the original thread's question, though: show me the beef.

echelon 13 hours ago | [-4 more]

I'm using Claude Code (loving it) and haven't dipped into the agentic parallel worker stuff yet.

Where does one get started?

How do you manage multiple agents working in parallel on a single project? Surely not the same working directory tree, right? Copies? Different branches / PRs?

You can't use your Claude Code login and have to pay API prices, right? How expensive does it get?

ecliptik 12 hours ago | [-0 more]

Check out Claude Code Team Orchestration [1].

Set an env var and ask Claude to create a team. If you're running in tmux it will take over the session and spawn multiple agents, all coordinated through a "manager" agent. I recommend running it sandboxed with --dangerously-skip-permissions, otherwise it's endless approvals.

Churns through tokens extremely quickly, so be mindful of your plan/budget.

1. https://code.claude.com/docs/en/agent-teams

bernardom 11 hours ago | [-2 more]

git checkout four copies of your repo (repo, repo_2, repo_3, repo_4) and open Claude Code within each one. Works pretty well! With the $100 subscription I usually don't get limited in a day. A lot of thinking needs to go into giving it the right context (markdown specs in the repo work for us).

Obv, work on things that don't affect each other, otherwise you'll be asking them to look across PRs and that's messy.

cronin101 9 hours ago | [-1 more]

Look into git worktrees and thank me later!
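
For anyone unfamiliar: worktrees give each agent its own checkout and branch while sharing a single object store, so you skip the four-full-clones approach. A quick sketch (it uses a throwaway repo here so it's self-contained; in practice you'd start from your real repo):

```shell
set -e
# Throwaway repo so the sketch runs anywhere
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "init"
# One worktree per agent, each on its own branch, sharing one .git store:
git worktree add -q ../repo-agent1 -b agent1
git worktree add -q ../repo-agent2 -b agent2
git worktree list
```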

jjmarr 11 hours ago | [-4 more]

I just avoided $1.8 million/year in review time w/ parallel agents for a code review workflow.

We have 500+ custom rules that are context sensitive because I work on a large and performance sensitive C++ codebase with cooperative multitasking. Many things that are good are non-intuitive and commercial code review tools don't get 100% coverage of the rules. This took a lot of senior engineering time to review.

Anyway, I set up a massive parallel-agent infrastructure in CI that chunks the review guidelines into tickets, adds them to a queue, and has agents spit out GitHub code review comments. Then a manager agent validates the comments/suggestions using scripts and posts the review. Since these are coding agents, they can autonomously gather context or run code to validate their suggestions.

Instantly reduced mean time to merge by 20% in an A/B test. Assuming 50% of time on review, my org would've needed 285 more review hours a week for the same effect. Super high signal as well, it catches far more than any human can and never gets tired.

Likewise, we can scale this to any arbitrary review task, so I'm looking at adding benchmarking and performance tuning suggestions for menial profiling tasks like "what data structure should I use".

sarchertech 7 hours ago | [-3 more]

>$1.8 million

That sounds like a completely made up bullshit number that a junior engineer would put on a resume. There’s absolutely no way you have enough data to state that with anything approaching the confidence you just did.

jjmarr 7 hours ago | [-2 more]

It's definitely a resume number I calculated as a junior engineer. Feel free to give feedback on my math.

It is based on $125/hr and it assumes review time is inversely proportional to number of review hours.

Then time to merge can be modelled as

T_total = T_fixed + T_review

where fixed time is stuff like CI. For the sake of this T_fixed = T_review i.e. 50% of time is spent in review. (If 100% of time is spent in review it's more like $800k so I'm being optimistic)

T_review is proportional to 1/(review hours).

We know the T_total has been reduced by 23.4% in an A/B test, roughly, due to this AI tool, so I calculate how much equivalent human reviewer time would've been needed to get the same result under the above assumptions. This creates the following system of equations:

T_total_new = T_fixed + T_review_new

T_total_new = T_total * (1 - r)

where r = 23.4%. This simplifies to:

T_review_new = T_review - r * T_total

since T_review / T_review_new = capacity_new / capacity_old (because inverse proportionality assumption). Call this capacity ratio `d`. Then d simplifies to:

d = 1/(1 - r/(T_review/T_total))

T_review/T_total is % of total review time spent on PR, so we call that `a` and get the expression:

d = 1 / (1 - r/a)

Then at 50% of total time spent on review a=0.5 and r = 0.234 as stated. Then capacity ratio is calculated at:

d ≈ 1.8797

Likewise, we have about 40 reviewers devoting 20% of a 40 hr workweek, giving us 320 hours. Multiply by (d - 1) and get roughly 281.504 hours of additional time, or $35,188/week, which over 52 weeks is a little over $1.8 million/year.

Ofc I think we cost more than $125 once you consider health insurance and all that, likewise our reviewers are probably not doing 20% of their time consistently, but all of those would make my dollar value higher.

The most optimistic assumption I made is 50% of time spent on review.
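
The arithmetic does follow from those assumptions; a quick script to reproduce it (every constant here is one of the comment's stated assumptions, not a measured value):

```shell
awk 'BEGIN {
  r = 0.234               # claimed reduction in time-to-merge from the A/B test
  a = 0.5                 # assumed fraction of total time spent in review
  d = 1 / (1 - r / a)     # capacity ratio, ~1.8797
  hours = 40 * 0.20 * 40  # 40 reviewers at 20% of a 40 hr week = 320 h/week
  extra = hours * (d - 1)      # ~281.5 additional review hours/week
  annual = extra * 125 * 52    # at $125/hr, ~$1.83M/year
  printf "d=%.4f extra_hours=%.1f annual=$%.0f\n", d, extra, annual
}'
```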

sarchertech 6 hours ago | [-1 more]

The feedback is: don't put it on a resume, because it looks ridiculous. I can almost guarantee you that the A/B test design wasn't rigorous enough for you to be that confident in your numbers.

But even if that is correct you need a much longer time frame to tell if reviews using this new tool are equivalent as a quality control measure.

And you have so many assumptions built into this that your number is worthless. You aren't controlling for all the variables you need to control for. How do you know that workers spend 8 hours a week on reviews vs. spending 2 hours and slacking off the other 6? How do you know that the change of process created by using this tool doesn't just cause the reviewers to work harder, and that they won't stop doing that once the novelty wears off? What if reviewers start relying on this tool to catch a certain class of errors for which it has low sensitivity?

It's also a moot point if they don't actually end up saving the money you say they will. It could be that all the savings are eaten up because the reviewers just use the extra time to dick around on Hacker News. It could be that people aren't able to make productive use of the time saved. Maybe they were already maxing out their time doing other useful activities.

All of this screams junior engineer took very limited results and extrapolated to say “saved the company millions” without nearly enough supporting evidence. Run your tool for 6 months, take an actual business outcome like time to merge PRs, measure that, and put that on your resume.

It’s incredibly common for a junior engineer to create some new tooling, and come up with some numbers to justify how this new tooling saves the company millions in labor. I have never once seen these “savings” actually pan out.

jjmarr 5 hours ago | [-0 more]

I took it off LinkedIn and replaced it with a time-to-merge reduction of 20% over two weeks of PRs (rounding down). I expect to justify the expenditure to non-technical managers in my current role, which is why I picked dollars.

> All of this screams junior engineer took very limited results and extrapolated to say “saved the company millions” without nearly enough supporting evidence.

That's what the only person in my major who got a job at FAANG in California did, which is why I borrowed the strategy since it seems to work.

> I can almost guarantee you that an A/B test design wasn’t rigorous enough for you to be that confident in your numbers.

Shoot me an email about methodology! It's my username at gmail. I'd be happy to get more mentorship about more rigorous strategies and I can respond to concerns in less of a PR voice.

vishnugupta 6 hours ago | [-0 more]

> great software

Most software is mundane, run-of-the-mill CRUD feature sets. Just yesterday I rolled out 5 new web pages and revamped a landing page in under an hour, which would have easily taken 3-4 days of back and forth.

There is a lot of similar coding happening.

This is the space where AI coding truly shines: repetitive work, all the wiring and routing around adding links, SEO elements, and what not.

Either way, you can try to incorporate AI coding into your flow and see where it takes you.

schipperai 15 hours ago | [-0 more]

I work for Snowflake and the code I'm building is internal. I'm exploring open sourcing my main project which I built with this system. I'd love to share it one day!

onion2k 12 hours ago | [-0 more]

I'm experimenting with building an agent swarm to take a very large existing app that's been built over the past two decades (internal to the company I work for) and reverse-engineer documentation from the code, so I can then use that documentation as the basis for my teams to refactor big chunks of old, no-longer-owned-by-anyone features and to build new features with AI more effectively. The initial work to build a large-scale understanding of exactly what we actually run in prod is a massively parallelizable task that should be a good fit for documentation-writing agents. Early days, but so far my experiments seem to be working out.

Obviously no users will see a benefit directly but I reckon it'll speed up delivery of code a lot.

conception 15 hours ago | [-3 more]

People are building software for themselves.

jvanderbot 15 hours ago | [-1 more]

Correct. I've started recording what I've built (here https://jodavaho.io/posts/dev-what-have-i-wrought.html ), and it's 90% for myself.

The long tail of deployable software always strikes at some point, and monetization is not the first thing I think of when I look at my personal backlog.

I also am a tmux+claude enjoyer, highly recommended.

digitalbase 14 hours ago | [-0 more]

tmux too.

Trying workmux with claude. Really cool combo

hinkley 15 hours ago | [-0 more]

I’ve known too many developers and seen their half-assed definition of Done-Done.

I actually had a manager once who would say Done-Done-Done. He’s clearly seen some shit too.

haolez 16 hours ago | [-0 more]

The influencers generate noise, but the progress is still there. The real productivity gains will start showing up at market scale eventually.

linsomniac 15 hours ago | [-4 more]

In my view, these agent teams have only become mainstream in the last ~3 weeks, since Claude Code released them. Before that they were out there but much more niche, like in Factory or Ralph Wiggum loops.

There is a component to this that keeps a lot of the software being built with these tools underground: there are a lot of very vocal people who are quick with downvotes and criticism of things built with AI tooling, criticism which wouldn't have been applied to the same result (or even a poorer result) if it had been produced by a human.

This is largely why I haven't released one of the tools I've built for internal use: an easy status dashboard for operations people.

Things I've done with agent teams:

- Added a first-class ZFS backend to ganeti
- Rebuilt our "icebreaker" app that we use internally (largely to add special effects and make it more fun)
- Built a "filesystem swiss army knife" for Ansible
- Converted a Lambda function that does image manipulation and watermarking from Pillow to pyvips, and had it build versions in go, rust, and zig for comparison's sake
- Built tooling for regenerating our cache of watermarked images using new branding
- Had it connect to a pair of MS SQL test servers and identify why logshipping was broken between them
- Built an Ansible playbook to deploy a new AWS account
- Made a web app with a simple video poker game (a demo for the local users group; someone there was asking how to get started with AI)
- Had it brainstorm and build 3 versions of a crossword-themed daily puzzle (just to see what it'd come up with; my wife and I are enjoying TiledWords and I wanted to see what AI would produce)

Those are the most memorable things I've used the agent teams to build in the last 3 weeks. Many of those things are internal tools or just toys, as another reply said. Some of those are publicly released or in progress for release. Most of these are in addition to my normal work, rather than as a part of it.

schipperai 14 hours ago | [-1 more]

Further, my POV is that coding agents crossed a chasm only last December with the Opus 4.5 release. Only since then have these kinds of agent-team setups actually worked. It's early days for agent orchestration.

13 hours ago | [-0 more]
[deleted]
gooob 11 hours ago | [-1 more]

can you tell us about this "ansible filesystem swiss army knife"?

linsomniac 8 hours ago | [-0 more]

I'd be happy to! I find in my playbooks that it is fairly cumbersome to set up files and related resources because of the module distinction between copying files, rendering templates, creating directories... There's a lot of boilerplate that has to be repeated.

For 3-4 years I've been toying with this in various forms. The idea is an "fsbuilder" module that makes a task which logically groups filesystem setup (as opposed to grouping by operation, as the ansible.builtin modules do).

You set up the defaults (mode, owner/group, etc.) in the main part of the task, then in your "loop" you list the fs components and any necessary overrides for the defaults. The simplest could for example be:

    - name: Set up app config
      linsomniac.fsbuilder.fsbuilder:
        dest: /etc/myapp.conf
Which defaults to a template with the source of "myapp.conf.j2". But you can also do more complex things like:

    - name: Deploy myapp - comprehensive example with loop
      linsomniac.fsbuilder.fsbuilder:
        owner: root
        group: myapp
        mode: a=rX,u+w
      loop:
        - dest: /etc/myapp/conf.d
          state: directory
        - dest: /etc/myapp/config.ini
          validate: "myapp --check-config %s"
          backup: true
          notify: Restart myapp
        - dest: /etc/myapp/version.txt
          content: "version={{ app_version }}"
        - dest: "/etc/myapp/passwd"
          group: secrets
I am using this extensively in our infrastructure and run ~20 runs a day, so it's fairly well tested.

More information at: https://galaxy.ansible.com/ui/repo/published/linsomniac/fsbu...

CuriouslyC 8 hours ago | [-0 more]

You're not wrong. The current bottleneck is validation. If you use orchestration to ship faster, you have less time to validate what you're building, and the quality goes down.

If you have a really big test suite to build against, you can do more, but we're still a ways off from dark software factories being viable. I guessed ~3 years back in mid 2025 and people thought I was crazy at the time, but I think it's a safe time frame.

hombre_fatal 6 hours ago | [-0 more]

There are so many more iOS apps being published that it takes a week to get a dev account, review times are longer, and app volume is way up. It's not really a thing you're going to notice if you're just going by vibes.

Reebz 11 hours ago | [-0 more]

People are building for themselves. However I’d also reference www.Every.to

They built the popular compound-engineering plugin and have shipped a set of production grade consumer apps. They offer a monthly subscription and keep adding to that subscription by shipping more tools.

verdverm 16 hours ago | [-1 more]

There are dozens and dozens of these submitted to Show HN, though increasingly without the title prefix now. This one doesn't seem any more interesting than the others.

schipperai 15 hours ago | [-0 more]

I picked up a number of things from others sharing their setups. While I agree some aspects of these are repetitive (like using md files for planning), I do find useful things here and there.

calvinmorrison 14 hours ago | [-2 more]

I built an Erlang-based chat server implementing a JMAP extension; Claude wrote the RFC and then wrote the server for it.

mrorigo 14 hours ago | [-1 more]

Erlang FTW. I remember the days at the ol' lab!

calvinmorrison 14 hours ago | [-0 more]

i have no use for it at my work, i wish i did, so i did this project for fun instead.

karel-3d 14 hours ago | [-0 more]

look at Show HN. Half of it is vibe-coded now.