by AceJohnny2 8 hours ago

> There's something incredibly peaceful about being in the hands of an expert you trust. [...] AI can absolutely shatter that feeling in an uncomfortable way [...] but I don't know if I can fully trust AI either.

This really is key. We know we can't trust the AI, but at the same time we're also more comfortable asking the AI for clarifications or confronting it. Not having a time-bound appointment or paying by the hour helps a lot. But even then, more information doesn't necessarily help!

I once brought my 11-year-old car, a Civic with 150k miles, to multiple garages. I figured I'd play the "second opinion" game to correlate what the garages recommended to decide on what needed to be done...

I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!

The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.

Aurornis 8 hours ago

I have multiple LLM subscriptions at any given time, plus an array of local models.

When I ask a question outside of my domain of expertise I like to ask all of the LLMs I have access to. I also create separate sessions and ask the same question multiple ways.

It’s revealing to see how many different and contradictory answers I get, most of which are presented confidently.

The last time I ran a medical question through Claude I couldn’t even get consistent answers between sessions.

It’s also scary how easily you can lead each LLM to the answer you have in mind. When I would start asking questions about different options that other LLMs had presented, each session would drift toward that explanation.

marcus_holmes 16 minutes ago

In my day job we tried creating a credit assessor tool using an LLM as the credit assessor.

It did great: it generated a report on the assessed business that was incredibly detailed and plausible.

Then I started running tests and getting into the details, and found that if you ran the same report on the same data, it generated completely different, still very plausible, results. I could run the same source data through the assessment process 10 times and get 10 very different results. We had to can the project and go a different route.

LLMs are designed to produce plausible results, not factual results. We can fix this when using them for software dev by using linters and tests (though we've all had the experience where the LLM invents an API endpoint). I would not trust raw LLM output in any situation where that kind of testing and verification capability isn't present.
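A minimal sketch of the repeat-run test described above, in Python. `assess` is a hypothetical stand-in for whatever wraps the model call, and it is assumed to return a structured, comparable field (say, a risk grade) rather than the full free-text report, since two free-text reports will almost never match verbatim. The stub assessors exist only so the sketch runs offline.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence


def modal_agreement(outputs: Sequence[Hashable]) -> float:
    """Fraction of runs that match the most common output (1.0 = fully consistent)."""
    if not outputs:
        raise ValueError("need at least one output")
    _, count = Counter(outputs).most_common(1)[0]
    return count / len(outputs)


def consistency_check(
    assess: Callable[[str], Hashable], source_data: str, runs: int = 10
) -> float:
    """Feed identical input to `assess` `runs` times and measure agreement.

    `assess` is a hypothetical wrapper around an LLM call that returns a
    comparable summary (e.g. a risk grade), not the full report text.
    """
    outputs = [assess(source_data) for _ in range(runs)]
    return modal_agreement(outputs)


if __name__ == "__main__":
    # Stub assessors standing in for a real model.
    rng = random.Random(0)

    def stable_assessor(data: str) -> str:
        return "grade B"

    def flaky_assessor(data: str) -> str:
        return rng.choice(["grade A", "grade B", "grade C", "decline"])

    print(consistency_check(stable_assessor, "same source data"))  # 1.0
    print(consistency_check(flaky_assessor, "same source data"))
```

A score well below 1.0 on identical input is exactly the failure described above: the output is plausible every time, but it isn't stable.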

Esophagus4 6 hours ago

Have you ever let the LLMs “discuss” with each other to see if that would give better answers?

You might end up with the answer from the most persuasive LLM, but you might also end up with better results.

Wonder if there is a paper out there on this.

scheme271 6 hours ago

The problem is how do you know whether the answer is just the most persuasive or actually the most accurate one? It's hard to figure this out without domain knowledge.

Esophagus4 16 minutes ago

I dunno, I could see it working.

I do something similar with reviewing code: I have one agent write the code and another reviews it, then they go back and forth for a bit improving the code. Seems to yield better results than one agent alone.

Seems like a similar principle.
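A rough sketch of that writer/reviewer back-and-forth, in Python. `write` and `review` are hypothetical callables (in practice, two separately prompted agents), and the stopping rule, "the reviewer returns no feedback, or a round cap is hit", is an assumption for illustration, not how any particular tool does it.

```python
from typing import Callable, Optional

# Hypothetical signatures: a writer takes (task, previous_draft, feedback)
# and returns a new draft; a reviewer returns feedback, or None if satisfied.
Writer = Callable[[str, Optional[str], Optional[str]], str]
Reviewer = Callable[[str, str], Optional[str]]


def write_review_loop(task: str, write: Writer, review: Reviewer, max_rounds: int = 3) -> str:
    """Alternate writer and reviewer until the reviewer has no more feedback
    or the round budget is used up; return the last draft."""
    draft = write(task, None, None)
    for _ in range(max_rounds):
        feedback = review(task, draft)
        if feedback is None:  # reviewer is satisfied, stop early
            break
        draft = write(task, draft, feedback)
    return draft


if __name__ == "__main__":
    # Toy stubs: the "writer" appends a marker per round of feedback,
    # and the "reviewer" is satisfied once two revisions have happened.
    def writer(task: str, prev: Optional[str], feedback: Optional[str]) -> str:
        return "v0" if prev is None else prev + "!"

    def reviewer(task: str, draft: str) -> Optional[str]:
        return None if draft.endswith("!!") else "needs another pass"

    print(write_review_loop("fix the parser", writer, reviewer))  # v0!!
```

The loop only bounds the back-and-forth; it doesn't check correctness. A reviewer model can share the writer's blind spots, which is why external checks like tests and linters still matter.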

XorNot 4 hours ago

Worse is that LLMs are trained to be persuasive by default. The "you're absolutely right..." stereotype exists because these things are A/B tested on response quality, and we know from studies that people reliably rate vibes above anything else. For example, the quality of hospital accommodations likely has some impact on patient outcomes, but while the view and decor of a room certainly don't fundamentally change the quality of the care provided, they are the largest determinant of how well people rate that care.

cadamsdotcom 5 hours ago

The problem with trying to write a paper is the results depend on RNG.

NonHyloMorph 5 hours ago

That doesn't make it different from any other problem measured by statistical significance, averaged over a big enough series of comparisons, no?
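To make the statistical point concrete: if you run two setups (say, a single agent vs. a writer/reviewer pair) on the same tasks and judge which output is better each time, an exact sign test tells you whether the split could plausibly be RNG. A minimal Python sketch, assuming paired comparisons with ties dropped:

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test for paired A-vs-B comparisons (ties removed).

    Null hypothesis: A and B are equally good, so each comparison is a fair
    coin flip and wins_a ~ Binomial(n, 0.5).
    """
    n = wins_a + wins_b
    if n == 0:
        raise ValueError("need at least one non-tied comparison")
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided toward one side...
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    # ...doubled for the other side, capped at 1 (an even split gives p = 1).
    return min(1.0, 2 * one_tail)


if __name__ == "__main__":
    print(sign_test_p_value(7, 3))    # 0.34375: a 7-3 split is easily noise
    print(sign_test_p_value(70, 30))  # far below 0.05 at the same 70% win rate
```

Same 70% win rate, very different conclusions: ten comparisons tell you almost nothing, a hundred tell you a lot.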

john-tells-all 8 hours ago

There's a big difference between a _puzzle_ and a _mystery_. In a puzzle, the goal state is known, and as more pieces (data) appear, the goal gets closer. You know how far you are from the goal.

A mystery is worse. With each additional piece of data, the goal gets farther away. Everything is more and more confusing.

(Popularized by Malcolm Gladwell)

mrlongroots 5 hours ago

Maybe I am missing something but I just find this wrong.

Everything is a puzzle: there is one "Truth" or one diagnosis. You (a smart human) should be able to converge on it by cross-examining your LLMs. By themselves, they have no interest in revealing this and no stakes, which makes them tools that are only useful in the hands of a capable investigator.

scheme271 an hour ago

The problem is that the diagnosis might not be known for a while. There are a few conditions and diseases that require an autopsy for a definitive diagnosis, and are therefore diagnosed based on symptoms in clinical settings.

Paracompact 5 hours ago

> You (a smart human) should be able to converge on it by cross-examining your LLMs.

What makes you think this is fundamentally different from cross-examining ELIZA? There is no guarantee that the LLM will help you converge on anything. Indeed, calling out an LLM on its BS tends to eventually produce an "I don't know and can't help you further" answer (as it should).

mrlongroots 5 hours ago

> There is no guarantee that the LLM will help you converge on anything.

Absolutely. The guarantee does not come from the LLM. The LLM is simply an improved version of Google Search.

The guarantee can only come from a systematic application of epistemic discipline and reasoning, which is very much (smart) human territory.

To put it another way, I could make good decisions with or without LLMs, given some uncertain diagnostics as input. Without them, I would have to trawl through 50 papers myself, and my decision might arrive 5 years too late as a result. LLMs enable that trawling and do some of the legwork in connecting the dots, but they are ultimately only as capable as the orchestrating human.

fc417fc802 4 hours ago

The same goes for a human expert. There's no guarantee of convergence and you could eventually end up at "I don't know".

010101010101 8 hours ago

> The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.

I'd argue that AI _can_ currently provide that, but that it can't do it _reliably_, and that to non-experts it's impossible to differentiate, which makes it all the more dangerous.

margorczynski 8 hours ago

Isn't that the case with human "experts"? If you've dealt with doctors, mechanics, etc., you'll know you can get completely different diagnoses for the same problem, which obviously means (in most cases) that someone you thought was an expert is wrong.

What is needed are studies that take a cold look at the actual results, because AI seems to be held to the standard that it must be perfect or else it's useless. It just needs to be as good as a human for most stuff, but in the long run it will be much better. At least, that's what extrapolating from current reality shows us.

wwweston 5 hours ago

We have systems around humans that exist to manage expertise gaps, credibility signals, and accountability. This is part of what makes humans as good as they are, along with specialized training and some measure of meritocratic selection. We license and regulate and account and litigate to make a system that responds and improves.

Some of this might be applicable to LLMs, but some isn’t and much of it would be resisted. This is one reason we’re not likely to get “as good as a human” because at some level we’re not optimizing for the outcomes; we’re optimizing for speed, convenience, some participant’s economics, and underlying beliefs.

malfist 5 hours ago

I've been going through PT for an injury related to a hypermobility disorder, and I've used an AI to help me come up with "interview questions" to see whether a PT knows anything about hypermobility or is willing to learn. I found it helpful for selecting a new PT after my first PT, whom I trusted, made things worse by prescribing stretches and no load progression from rest and recovery back to deadlifts.

kerabatsos 3 hours ago

People put a lot of faith in human “guardrails”, standards, etc. But the same argument could be made that trusting human experts without discernment is as dangerous as trusting AI or Google or whatever other non-human source. It’s always been the case.

Bratmon 7 hours ago

To provide a competing point of anecdata: A Gemini diagnosis saved me $3,000 in unnecessary repairs on my Civic.

fluidcruft 5 hours ago

YouTube has saved me at least that much in appliance repairs... and it doesn't even have an AI. It's amazing how valuable access to information can be.

ahepp 4 hours ago

I would love to hear more about this

dyauspitr 6 hours ago

Saved me $2000 on a koi pond pump and filtration system

ed_elliott_asc 8 hours ago

The soothing sound of ChatGPT telling us how right and clever we are…how could it possibly hallucinate, certainly not 5.5

nonethewiser 2 hours ago

You’ve really honed in on the key issue. This is exactly how keen Hacker News commenters approach this.

serial_dev 5 hours ago

These tools can’t reliably fix a 4px misalignment on my icon, better ask them about a medical report… but honestly, I would do the same.

Gigachad 3 hours ago

Tbh, for an LLM, pulling data out of medical documents that are in its training set and searchable online is likely a much easier task than fixing some weird CSS alignment issue.

nonethewiser 2 hours ago

You only got 3 opinions on your car? Why not 50? You could have found a more useful signal by getting more information.

I get it - getting an opinion from a mechanic is time consuming. Not true of AI though.

UltraSane 3 hours ago

> There's something incredibly peaceful about being in the hands of an expert you trust

This is the primary business model of enterprise IT, and it's why companies pay so much for 4-hour disk replacement.