by motbus3 7 hours ago

The only part of the message I think it would be interesting to the author: what if you set two instances to prove each other arguments wrong considering that each reads one of the report as their POV?

I didnt see the full process but I used unet models for tumor detection so I am somewhat familiar with the possible caveats of any evaluation from a engineer perspective.

First, I would like to point that unfortunately, it is not uncommon to go to two different human doctors and also get two unreliable diagnosis and treatment. The biggest problem, in the way people plan to use ai on health is the lack of liability.

A bug on a regular old web site doesn't kill anyway nor cause pain and suffering (most of the times) but misdiagnosis + the fact that a model is very good on presenting arguments even when it is completely wrong.

Claude code, and I am talking about opus 4.8 here, can tell rivers of information about code pattern and develop the poopiest code the next line.

This is a machine that will deliver a sort of templates document based on the input information but it is not exactly doing the work if you don't directly it to do it right constantly.

Because the model isn't thinking I wonder what happens if you set multiple agents to communicate and defend their point with some sort of harsh penalty prompt for not fulfilling its goal. There are some safety system prompts on Claude models that will trigger it to be very carefully to write. Like: you cannot make mistakes. "You need to ensure that it is correct or someone might end up hurt or even dead"

But you would need two agents and a setup to communicate via pipes or files.