HN via remix.js for vilnius.js

by rafterydj 10 hours ago

I feel like I'm going nuts.

There are other commenters saying this is a good practice they've also done for other injuries. You are saying you are an actual radiologist and immediately clock the problems with its advice.

I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know the subject the AI is being asked about that you are likely to find the output helpful.

This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed with which unreviewed and unproven information can be delivered.

dang 5 hours ago | [-0 more]

(We detached this subthread from https://news.ycombinator.com/item?id=48709121.)

appplication 9 hours ago | [-83 more]

This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief.

It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.

Don’t get me wrong, I think we all agree capabilities will eventually improve (and farther-future capabilities could reasonably surpass experts), but it really is unclear if the current transformer architectures with their probabilistic/hallucinatory outputs will plateau before they surpass current experts’ abilities in all promised fields.

cheschire 8 hours ago | [-0 more]

I was a very early adopter in my circles with AI and I shared it with many people. Strangely, I seem to be the most skeptical about AI in my circles as well, but because I was the gateway for many folks, they want to come back and share their experiences with me.

And it's so much like listening to someone in a church congregation sharing their experiences with god. Clear and obvious gaps are hand-waved away exactly how you're describing.

operatingthetan 8 hours ago | [-37 more]

>This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief.

Treating it as if it is an intelligence is the problem.

The problem is that AI psychosis is fundamentally the belief that an LLM is "thinking" at all. Outputs are just believable word vomit which resembles factual information.

singpolyma3 8 hours ago | [-13 more]

That presumes that we have a definition of "thinking" or that we know that anything is "thinking" when in fact neither is true.

The problem is real, but I don't think positing a philosophical root is helpful.

operatingthetan 7 hours ago | [-12 more]

The claim that we are assigning human-like agency to a machine with none is simple and factual.

ForceBru 7 hours ago | [-8 more]

What's "thinking"? What's "agency"? What's "human-like agency"?

If "agency" is making decisions and performing corresponding actions in the real world, then LLMs most definitely LOOK LIKE they're making decisions (what's the next token? which tool to use? what's to say, in general? what idea to convey?) and performing actions (tool use). Can we tell whether they are ACTUALLY making decisions? Well, are the people around me "actually" making decisions? Or are they simply pushed around by circumstances and external forces?

Am I actually making decisions? Did I like DECIDE to write this comment? Maybe? I have no clue...

operatingthetan 6 hours ago | [-7 more]

I think you're mildly obfuscating the issues at hand by diving too deeply into philosophical questions.

It's quite simple, the agency that the LLM appears to have is actually your own. Without a prompt an LLM does nothing. It has no thoughts between prompts about you or your problems.

ForceBru 6 hours ago | [-2 more]

Yes, I'm diving a bit too deeply because I don't really know what "thinking" is and therefore I don't understand how we can so confidently say that LLMs don't think, even though they definitely LOOK like they're thinking. They even have a "Thinking" section in their responses! If I say that a rock doesn't think, it's pretty convincing: does a rock look like it's thinking? No — it doesn't even do anything! But an LLM does look like it's thinking, at least while generating a response. When it's "offline" it's just a bunch of "dead" bytes, sure.

So when it's not active, not responding to a prompt, it's of course not thinking. I'm pretty sure nobody actually questions this. Is your computer "thinking" when it's powered off? Can a piece of metal think? Probably not. So there are no thoughts between prompts, this seems obvious.

Thus, this is a question of "discrete time vs continuous time". LLMs "live" from prompt to prompt. Humans are alive continuously. In some sense, we're prompted by a lot of things all the time. As I'm writing this, I'm seeing stuff, I'm hearing stuff, I can feel various parts of my body, I'm thinking about my problems, my goals, other people's problems and goals, etc. When I'm in a sensory deprivation tank, my brain keeps "entertaining" me by "self-prompting", like a recurrent neural network (I guess it literally is a massive RNN).

So it seems like your definition of "thinking" hinges upon the LLMs being discrete-time and single-threaded (can't think about multiple things in parallel).

IMO a more interesting question is whether an LLM is thinking WHILE IT'S GENERATING A RESPONSE, while it's "alive".

operatingthetan 5 hours ago | [-0 more]

I want to say I really appreciate that you are putting a lot of thought into this, you certainly have interesting concepts here. However I think it seems a bit far off from the discussion I'm trying to have, and I do not have the bandwidth to fully understand and charitably respond to your points.

Shitty-kitty 5 hours ago | [-0 more]

We don't know what thinking is but pattern matching is definitely a big part of it. That's why people see Jesus on a piece of burnt toast.

aspenmartin 6 hours ago | [-3 more]

You are implying definitions that don't seem to be mainstream; thinking is internally manipulating information to reason, infer, plan, solve problems, and form judgments or beliefs. Also, regarding "Without a prompt an LLM does nothing. It has no thoughts between prompts about you or your problems.": it sounds like you paint this as something fundamental? It isn't. Nothing is stopping you from streaming information to an LLM and letting it process that information; this is precisely what people are trying to build.

operatingthetan 6 hours ago | [-2 more]

The machines have no driving force to act in the world. That is fundamental for humans.

Twice in your comment you suggest things that you think that I believe, please do not do this.

aspenmartin 3 hours ago | [-1 more]

“It sounds like you believe” is a question, inviting your clarification. I will continue doing that because it’s perfectly reasonable. Also, “machines have no driving force to act in the world” is a mysterious statement, but because you reacted so badly to anyone questioning you I will just leave it at that.

operatingthetan an hour ago | [-0 more]

That is called a leading question and it is not "perfectly reasonable." Resisting your attempts at bad faith discussion is not "reacting badly." I agree though that we should cease discussion.

keeda 5 hours ago | [-0 more]

Wait, where are we assigning human-like agency in this case? Agency to me means the ability to do something by itself. Here the LLM is not doing anything, it is just responding with information to queries from people, that those people may then act on. (Which you can say about Google searches too, yet we don't ascribe agency to Google.)

singpolyma3 6 hours ago | [-1 more]

The idea that humans have agency is supernatural thinking imo

operatingthetan 6 hours ago | [-0 more]

A free will versus determinism argument doesn't really have a place here. Consider instead that humans factually have 'the illusion of agency.' The LLM does not even have that. It cannot act on its own, it has no ongoing drama or intention. It only reacts to prompts.

aetherson 7 hours ago | [-19 more]

You're confusing the training method with the internal process. If I had you repeatedly attempt to learn how to make believable completions of partial documents about a given topic, you would eventually learn things about that topic and could use your knowledge to create more believable completions of documents about that topic.

operatingthetan 6 hours ago | [-13 more]

LLMs do not learn. You put it out to pasture and create a new one. "Memory" in a session is essentially a context window party trick.

chiply314 6 hours ago | [-0 more]

They already learned. A lot, or basically everything ever written and available digitally.

And context windows work very well. You can 'teach' an LLM a new programming language and other things through it.

aetherson 6 hours ago | [-1 more]

They learn during training, which is what we're talking about.

operatingthetan 5 hours ago | [-0 more]

>which is what we're talking about.

You are anyway, I don't see anyone up the chain saying that.

aspenmartin 6 hours ago | [-8 more]

They do learn in context, and very sample efficiently. Continual learning is an active area of research and we sort of already have something resembling it with persistent context. So yes they do learn.

operatingthetan 6 hours ago | [-7 more]

I consider that to be the illusion of learning. You are not wrong, I think they may actually learn in the future though. But not today.

aspenmartin 6 hours ago | [-6 more]

That’s strange to me, what would you define as learning?

FromTheFirstIn 6 hours ago | [-5 more]

To acquire new knowledge and build your understanding. They don’t understand, so they can’t learn.

operatingthetan 5 hours ago | [-3 more]

Thank you for saying succinctly what I could not. If your consciousness and knowledge fundamentally do not change from your ongoing experience, then you are not learning. This is how the LLM currently functions.

aspenmartin 3 hours ago | [-2 more]

You’re describing the problem of continual learning. As I said, their “consciousness” (for lack of a better term) and knowledge do already change from ongoing experience in context, which is another way of saying only for a short window, today. They are ephemeral, sort of, but that’s a temporary limitation.

FromTheFirstIn 2 hours ago | [-1 more]

I think if your definition of consciousness can fit these things then you’re more open minded than I care to be. Consciousness isn’t really guessing the next thing to say- it’s hard to say what it is, obviously, but blindly feeling forwards with each new conversation doesn’t seem like consciousness or learning to me.

aspenmartin an hour ago | [-0 more]

We aren't talking about consciousness, we're talking about learning.

> Consciousness isn’t really guessing the next thing to say-

I don't know what consciousness is either and these debates are a dumpster fire when they happen, but it sounds like you're arguing that "LLMs are just predicting the next token" (true by construction) implies that they can't learn or reason or be conscious (the first two claims are wrong, and the last isn't falsifiable without a useful definition).

aspenmartin 3 hours ago | [-0 more]

“They don’t understand” is a strong statement, maybe true but depends on what you mean by understand. What is your definition of this? I can’t think of a meaningful definition of “understand” that doesn’t apply to LLMs

lemiffe 6 hours ago | [-0 more]

The LLM itself doesn't, but agents can research, compare, add to their memory, and use that to narrow the results down to a probabilistically higher set of outputs; I have used an LLM for my own MRI results and it was nearly spot-on, verified by a subsequent visit to a specialist. YMMV as they say. But I do believe we are entering the era where LLMs are considering past interactions and long context windows to inform it of personal preferences and history in order to output more accurate results.

goodpoint 6 hours ago | [-4 more]

believable != true

operatingthetan 6 hours ago | [-0 more]

A very important callout. It's the crux of the whole thing really. Humans are easily susceptible to deception by statements that are structured to be believable.

fhdkweig 5 hours ago | [-0 more]

This is what Stephen Colbert called "truthiness". People want to believe what they feel is true even if it is directly contradicted by evidence.

https://en.wikipedia.org/wiki/Truthiness

aetherson 6 hours ago | [-1 more]

Sure. But that's not the subject.

operatingthetan 5 hours ago | [-0 more]

Please stop trying to police what the subject is to suit your own arguments.

corndoge 7 hours ago | [-2 more]

Often times the words produced do have legitimate factual information though. It's less psychosis and more a confluence of well known human tendencies - salience bias, automation bias, etc.

operatingthetan 6 hours ago | [-1 more]

The big problem is often times they don't as well. That's why we can't rely on them.

aspenmartin 6 hours ago | [-0 more]

Same with humans? Doctors, scientists... if a tool has any error rate above zero it's not reliable?

lazide 8 hours ago | [-33 more]

I don’t think they will improve, there is too much incentive to poison the datasets going forward.

A lot of the models up to this point have benefitted, like Google did, from an essentially ‘pre-SEO’ internet.

Now the same tools are being used to generate nigh infinite good sounding bullshit, which poisons the dataset in all sorts of hard to detect ways.

To add insult to injury, the human experts are also not as naive, and have many incentives to poison their own input in subtle ways too.

brokencode 8 hours ago | [-21 more]

I seriously doubt that data set poisoning will be a real limiter in model performance.

For one, if your website/book is poisoned, who is going to trust it for anything at all, much less for training models?

For two, all the major AI labs hire or contract for subject matter experts to create curated data sets, evaluate model performance, etc.

Unless they hire malicious experts, this will provide a growing, high quality data set that should drown out any poisoned pretraining data.

chmod775 7 hours ago | [-12 more]

There's a post every other month where some dude who put nonsense information online celebrates because it actually ended up in some frontier model's weights.

If it's easy enough that some randos can do it for fun, what do you think happens when there's commercial interest behind it?

Obviously companies are going to try nudging AI towards recommending whatever they're selling. It's a logical extension of SEO - and that's a 100 billion USD industry.

Additionally, if I believed myself to be in some sort of spending - err - AI race, I'd try to poison the data sets of my competitors by putting crap out there for others to ingest.

aspenmartin 6 hours ago | [-2 more]

It's not really a problem. We're out of natural tokens anyway. The future is synthetic verifiable traces (already the way we train coding agents).

maxnevermind 5 hours ago | [-1 more]

> synthetic verifiable traces

What does that mean? Is it like when somebody uses a coding agent to develop a feature, and later the input prompts and the resulting PR are used for training, on the presumption that the final PR was a correct implementation of the prompt?

aspenmartin 3 hours ago | [-0 more]

Yeah, it’s rejection sampling: you have an agent, you take a verifiable problem (people use lots of different verification signals, but say unit tests) and have the agent attempt it K times. You accept the trajectories (all context, tool use, etc.; the entire log) that are positively verified and use these as training examples.

The trick is to find the examples that are just in between too difficult and too easy for the existing agent; these have the strongest training signals.
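A minimal Python sketch of the loop described above may help readers who have not seen it. Everything here is illustrative: `rejection_sample`, `is_informative`, the toy agent and verifier, and the 0.2 to 0.8 pass-rate band are hypothetical names and numbers, not anything the commenter or any lab has confirmed using.

```python
import random
from typing import Callable, List


def rejection_sample(
    attempt: Callable[[random.Random], str],
    verify: Callable[[str], bool],
    k: int,
    seed: int = 0,
) -> List[str]:
    """Run the agent k times on one problem and keep only verified trajectories."""
    rng = random.Random(seed)
    trajectories = [attempt(rng) for _ in range(k)]
    return [t for t in trajectories if verify(t)]


def is_informative(pass_rate: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Assumed heuristic: keep problems that are neither too hard nor too easy."""
    return low <= pass_rate <= high


# Toy stand-ins: a real agent would emit a full log of context and tool calls,
# and a real verifier would run something like a unit test suite.
def toy_agent(rng: random.Random) -> str:
    return "return a + b" if rng.random() < 0.5 else "return a - b"


def toy_verifier(trajectory: str) -> bool:
    return trajectory == "return a + b"


K = 20
accepted = rejection_sample(toy_agent, toy_verifier, k=K)
pass_rate = len(accepted) / K
```

The accepted trajectories would become training examples, and `pass_rate` is the signal used to decide whether the problem sits in the useful difficulty range.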

brokencode 4 hours ago | [-6 more]

There are so many better data sources that AI labs can use here that this argument really holds no water at all.

Peer reviewed journals, textbooks, in-house teams of experts, trusted news publications, etc.

The whole idea of scraping large swaths of the internet for training data has always been pretty dubious due to the variable data quality.

I mean, just look at the early Google models that told people to put glue in their pizza due to a joke in the training set. Garbage in, garbage out.

This is one of the first and most obvious problems all of these labs have run into, and countermeasures are only going to improve.

lazide 4 hours ago | [-5 more]

But they don’t, generally. Which is why it is a great argument: it’s easy to falsify, and you can see that it is what is actually happening.

Also, those other sources are getting buried in AI slop too.

brokencode 4 hours ago | [-4 more]

The question is not whether it has happened or will continue to happen. Of course it will always be a problem to some extent.

Your original claim is that this will be enough of a problem to prevent models from improving in expert level knowledge. I completely disagree with this premise.

If the models fail to improve, it will likely be due to limitations in the transformer architecture rather than poisoned training data.

And even then, I doubt that the transformer is the best architecture we will ever come up with.

Clearly it doesn’t learn or think like a human does, since humans don’t need many gigabytes of text samples to learn to talk, so there is some room for improvement.

lazide 4 hours ago | [-3 more]

https://arstechnica.com/science/2025/01/its-remarkably-easy-...

brokencode 4 hours ago | [-2 more]

Great, an article about Llama 2 from early 2025. That doesn’t at all invalidate what I said.

lazide 2 hours ago | [-1 more]

While completely ignoring the fundamental reason. Whoosh.

brokencode an hour ago | [-0 more]

Not sure what point you’re trying to make.

jurgenaut23 6 hours ago | [-0 more]

Do you have examples of such celebrations?

Shitty-kitty 5 hours ago | [-0 more]

They already are. It has become a real problem on Reddit. Especially with the latest in pseudo-science crap like peptides.

Analemma_ 8 hours ago | [-4 more]

I think you underestimate just how much money is being poured into LLM SEO at the moment. It's real quiet because they don't want to draw attention and countermeasures from the frontier labs, but this is getting huge investment, and they will have a monomaniac focus on juicing product results whereas the attention of the labs necessarily has to be spread out.

aspenmartin 6 hours ago | [-0 more]

Data curation is important and expensive and frontier labs can afford to do it right. Natural data isn't the limitation, we are already literally out of tokens. It doesn't matter how much you poison things it's not going to stop the progress train.

tayo42 7 hours ago | [-2 more]

Who's doing LLM SEO right now? How does that work when you only get feedback every few months when a new model is out?

natebc 6 hours ago | [-0 more]

I'm pretty sure the Optimization part is just ... not present at all.

This is how we get LLM summaries presenting something mentioned once by some nutjob in a reddit thread as bona fide FACT

DougN7 6 hours ago | [-0 more]

Look at G2.com - they found their website is highly referenced by AIs and they are leaning into it hard.

microgpt 8 hours ago | [-2 more]

Pretty easy to display one thing to verified browsers (just latest few user-agents from the 10ish different mainstream browsers on the 3 main OSes) and another to anything else.

Yes AI scrapers can easily spoof user-agent, but they fall out of date as the browser updates.

Bit harder to catch them in tarpits and then serve nonsense to whoever triggered the tarpit.
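For concreteness, the user-agent check described above could look roughly like the Python sketch below. The minimum versions in `MIN_MAJOR_VERSION`, the function names, and the "real"/"decoy" labels are all made up for illustration; a real allow-list would cover more browsers and would need updating as releases ship. The check relies only on a self-reported header, so it is trivially spoofable.

```python
import re

# Hypothetical minimum major versions counted as "recent"; these numbers are
# placeholders and would have to track actual browser releases.
MIN_MAJOR_VERSION = {"Chrome": 120, "Firefox": 121}

_UA_PATTERN = re.compile(r"(Chrome|Firefox)/(\d+)")


def looks_like_current_browser(user_agent: str) -> bool:
    """Return True if the User-Agent names a known browser at a recent version."""
    match = _UA_PATTERN.search(user_agent)
    if match is None:
        return False
    name, major = match.group(1), int(match.group(2))
    return major >= MIN_MAJOR_VERSION[name]


def choose_variant(user_agent: str) -> str:
    """Pick which content variant to serve for this request."""
    return "real" if looks_like_current_browser(user_agent) else "decoy"
```

A scraper that copies the latest browser string passes this check, so at best it filters out clients that never update their spoofed header.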

thfuran 7 hours ago | [-1 more]

>Yes AI scrapers can easily spoof user-agent, but they fall out of date as the browser updates.

It’s a hell of a lot easier for a company to ensure that its scrapers all report the latest user agent string than it is to get everyone and their mother to update their browsers in a timely fashion.

microgpt 4 hours ago | [-0 more]

yeah but unless everyone is checking the version, if it's just a handful of websites checking it, they don't.

and browsers forcibly auto-update

rvnx 8 hours ago | [-10 more]

Human doctors use LLMs to diagnose too

OpenEvidence claims

    "More than 40% of U.S. physicians use it daily, and it handled around 20 million clinical consultations per month. Over 100 million Americans were treated by a doctor using it in 2025."

https://www.cnbc.com/2026/01/21/openevidence-chatgpt-for-doc...

something98 8 hours ago | [-7 more]

This is a very misleading statement; most of those physicians are using LLMs to transcribe notes from visits and/or for billing purposes (e.g., proper billing codes).

kjellsbells 7 hours ago | [-3 more]

The problem isn't LLMs per se, it is the shift to trusting the output of the machine coupled with a decline in verifying that the output is reasonable. It's basically what your teachers warned you about with Wikipedia in eighth grade except applied to all areas of life, including medicine. Dictation is already high-stakes and LLMs do not automatically reduce that risk.

Here is an example. My provider sent me this note. I'm quoting verbatim here from my MyChart record:

"Your liver enzymes are high, I would like to order acetaminophen containing medication like Tylenol, I would like to order liver ultrasound I placed ultrasound order in the system, make an appointment for radiology, I would like you to get hepatitis panel lab work done, obtain blood work order, please schedule a well visit to get it done"

When I queried it, this is what I got back. It was a dictation error. You could almost hear the panic in the message:

"Sorry for wrong message earlier, I was dictated message- so could not realize that it was written to take Tylenol type of medicines- I DO NOT RECOMMEND ACETAMINOPHEN CONTAINING MEDICINE - LIKE TYLENOL AND ALCOHOL DUE TO ELEVATED LIVER ENZYMES."

Again the problem is not dictation, or LLMs. The problem is humans ignoring their responsibility to check the output of a machine.

ethbr1 6 hours ago | [-2 more]

> Again the problem is not dictation, or LLMs. The problem is humans ignoring their responsibility to check the output of a machine.

100%. Also, management.

I wish someone would go ahead and coin an AI version of Amdahl's law that states the work speedup from AI is dependent on the amount of unverified AI output used.

Iow, if you 1:1 verified everything, there would be no time savings.

Ergo, you get management saying (1) we demand time savings due to AI & (2) we demand you fully check anything you use AI for.

End result? People skip (2) to hit (1).

Then management burns anyone at the stake whenever inevitable mistakes happen.
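One way to make the proposed "law" concrete is a toy cost model, sketched here in Python. The formula and the function name `ai_speedup` are assumptions made for illustration, not an established result: time with AI is taken to be the generation time plus the verification time scaled by the fraction of output that actually gets checked.

```python
def ai_speedup(
    manual_time: float,
    generate_time: float,
    verify_time: float,
    verified_fraction: float,
) -> float:
    """Speedup over doing the task by hand, under a simple additive cost model.

    verified_fraction is the share of AI output that a human actually checks.
    """
    if not 0.0 <= verified_fraction <= 1.0:
        raise ValueError("verified_fraction must be between 0 and 1")
    time_with_ai = generate_time + verified_fraction * verify_time
    return manual_time / time_with_ai


# Checking costs as much as doing the work (1:1 verification): no savings.
full_check_expensive = ai_speedup(60, 5, 60, 1.0)

# Skipping verification entirely: large apparent speedup, unreviewed output.
no_check = ai_speedup(60, 5, 60, 0.0)

# Checking is much cheaper than doing: savings survive full verification.
full_check_cheap = ai_speedup(60, 5, 10, 1.0)
```

Under this model the outcome hinges on `verify_time` relative to `manual_time`: when the two are equal, full verification erases the gain, and when verification is cheap, the gain survives.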

lazyasciiart 6 hours ago | [-1 more]

But that’s trivially false. There is an entire category of work where it is hard to come up with an answer and easy to verify the answer, which means that if you verified everything there would still be a large time savings.

ethbr1 5 hours ago | [-0 more]

I would question whether that holds in the practical LLM automation space.

Can you think of any real life examples where an LLM is likely to be used?

I think in practice what you're saying is there are problems where there exist efficient deterministic verification methods, and I'm sure that's true.

But that's not the bulk of everyday work LLMs are being asked to do nowadays across industry.

girvo 5 hours ago | [-0 more]

Which is itself a problem, as (in my partner's evaluations as an optometrist) LLMs used for clinical notes have a bad habit of dropping clinically important information, and the biggest providers don’t give you a copy of the raw transcript or a recording.

Which means she ends up spending just as much time as if she’d done it herself as it needs to be verified for accuracy every time…

brokencode 8 hours ago | [-1 more]

OpenEvidence is specifically meant to help clinicians make evidence-based decisions in the diagnosis and treatment of patients, not note transcription.

sxg 8 hours ago | [-0 more]

It does both: https://www.openevidence.com/user-guide/visits-overview

sarchertech 8 hours ago | [-0 more]

Ignoring the fact that this number comes from a company press release, it doesn’t say anything about the number of doctors using it to diagnose, just that they use it.

If a physician uses Google to search for a dosage chart for some drug they rarely prescribe, you wouldn’t say they are using Google to diagnose the patient. You wouldn’t say that either if they used Google to search for the most recent studies on a topic.

sambellll 8 hours ago | [-0 more]

To me this is like a good software engineer using AI.

The fact that they use it doesn't make what the result is any worse or less trustworthy - arguably it makes it better.

It only becomes a problem if they offload all of the thinking to AI.

sublinear 8 hours ago | [-0 more]

Human expertise is also improving all the time and not limited to just connecting dots. When AI seems to surpass a particular human, it's just because the human lacks broader knowledge and fails to investigate further.

An expert already knows they don't know everything. That was never the point. Critical thinking cannot be delegated to AI any more than it can be delegated to a book. There is nothing new going on here.

8 hours ago | [-0 more]

[deleted]

perching_aix 6 hours ago | [-0 more]

> There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks

Do you think it is any more possible to have a proper discussion with someone who preemptively paints the other person as mentally ill? Or someone who preemptively victimizes themselves?

Cause I don't think these are the hallmarks of an honest discussion. See also the entire past decade of political discourse.

Like, consider this:

> It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.

A trivial counter to this is that you can just be an expert at something (e.g. your own work), use the damn thing yourself (professionally), and evaluate the outcomes for yourself. Then maybe remark "LLM good".

Now you come and remark "LLM bad", and point at random "evidence", either of outright other workloads, or even the one at hand: you're asking someone to reject the reality they've already experienced, entirely based on the assumption that they're "merely religious" or "in psychosis". You tell me if that's any more epistemically rigorous and sensible than their story.

TomasBM 8 hours ago | [-6 more]

Why is it psychosis and not lower standards?

While I can understand being skeptical of non-experts' claims that such answers are enough, I don't understand why you call it "psychosis" and not simply naivety or lack of expertise.

At the same time, the new so-called "models" haven't been pure transformer-based LLMs, but entire systems with tools (with access to the Internet), data storage, and the options to trigger additional instances for different tasks.

janmatejka 8 hours ago | [-5 more]

Because some people develop actual psychosis. They go down some rabbit hole with an LLM until the LLM makes them believe they invented a new kind of physics, which makes them go harassing experts who obviously try to ignore them because it's all nonsense.

ruszki 7 hours ago | [-0 more]

For me, what others said and literally showed with Claude Code, et al, and what I’ve been experiencing with it, clearly signal way lower standards. But this was true even before LLMs.

shimman 7 hours ago | [-1 more]

Reminds me of that clip of Travis Kalanick, sexual deviant and harasser of women, talking about "discovering new physics."

natebc 6 hours ago | [-0 more]

The Uber guy? Yeah that was a painful watch.

perching_aix 6 hours ago | [-1 more]

Graciously diagnosed for them by random unqualified people on the internet with an agenda, frequently before even any relevant interaction:

"Oh you like LLMs? You must be in AI psychosis!"

Let's not pretend it is anything more than the run of the mill wet fart of a culture war label. It's quite literally the "TDS" of the anti-AI crowd.

doawoo 6 hours ago | [-0 more]

That's really not the argument being made here, and you're panning it further by claiming this is staunchly anti-LLM.

The idea here is to signal that you can absolutely use LLMs to help you figure something out. But also, they're wrong a lot. So use your own brain too.

qnleigh 8 hours ago | [-2 more]

Totally agree. I'm a scientist, and like most scientists I have some specialized skills that most of my colleagues don't. AI has empowered them to learn and build things that they might have otherwise needed me for. But there have been quite a few cases where it led them very far down a wrong path. This has started happening way more often in the last few months.*

We've known since the beginning that AIs confidently say incorrect things. But now that they can speak confidently about very complex topics, and mostly say correct things, we are letting our guard down and lots of subtle falsehoods are slipping through.

*In one case, I was able to put things back on track because the AI suggested my colleague talk to me; somehow it figured out we were co-workers.

aspenmartin 6 hours ago | [-0 more]

Right, but hallucination rates have been consistently decreasing every model iteration. It's about error rates. As a fellow scientist, I also mess things up. Humans have an error rate. Once that error rate is low enough, it doesn't matter that it's > 0, it matters that it's low enough to be trustworthy and useful. Coding agents of 2024-25 had error rates too large; you couldn't meaningfully vibe code anything and needed a ton of oversight. It's still true but FAR less so, and this is after like a year of iteration.

bitlad 8 hours ago | [-0 more]

>very far down the wrong path.

Absolutely agree. Have seen this first hand

sxg 9 hours ago | [-8 more]

I see your argument, but it's not exactly news that an expert found a flaw in a popular tool. You could say the same about Wikipedia--experts have tons of issues with it, but Wikipedia still provides value to non-experts. The most likely alternative to Wikipedia for non-experts is simply not trying to learn anything new.

Similarly with LLMs, you can't just write them off entirely because they sometimes provide misleading or incorrect advice. The positive utility maximizing view is to learn when you need to call in an expert. I recently moved in to a new house and have used Claude extensively to figure out basic things (e.g., adjusting the garage door height, how to mount a TV). However, when the HVAC suddenly stopped working, I gave Claude a shot for an hour and tried some non-destructive fixes, but then realized I had to call in an HVAC expert.

ohyes 9 hours ago | [-5 more]

The free alternative to Wikipedia is the library, not “don’t learn anything new ever”.

I find Claude is surprisingly similar to a confident but incorrect coworker, with the benefit that Claude will reevaluate when I correct it.

sxg 8 hours ago | [-2 more]

I used the phrase "most likely alternative" intentionally. The library is where people should go to get answers in a world without Wikipedia, but the vast majority of people won't. So in practice, most non-experts either learn from Wikipedia or don't try to learn anything at all.

ohyes 8 hours ago | [-1 more]

Sure, if we’re going to go that broad. People are already leaning heavily towards learning nothing instead of using Wikipedia.

I guess to me it has to be comparable to be an alternative.

Like, I don’t consider doomscrolling x an alternative to reading Wikipedia but I might consider it an alternative to CNN, even though they’re all technically and very broadly activities that I could use to inform myself.

In that same way I don’t consider the multitude of ways I could use my free will necessarily alternatives to each other even though they technically are. It kinda sucks but going that broad feels to me like it breaks the concept of alternative and makes it kind of meaningless.

sxg 8 hours ago | [-0 more]

I get what you're saying, but I'm not deciding what should and shouldn't count as an alternative to X. I'm trying to answer the counterfactual: how do people behave in an alternative world without Wikipedia but otherwise identical to our world?

bflesch 9 hours ago | [-1 more]

Claude will do everything to retain you as a user, because that's one of their most important metrics.

ohyes 8 hours ago | [-0 more]

Excellent point; my colleague has the exact opposite incentive.

frereubu 8 hours ago | [-0 more]

Slightly OT Nitpick: in regard to experts and Wikipedia, when doing a neuroscience-adjacent MSc, experts in the field actually directed me to Wikipedia as an excellent source for high-level neuroanatomy, including recent research, so I'm not sure your blanket description about experts and Wikipedia is correct.

Applejinx 6 hours ago | [-0 more]

You 100% can write them off entirely and go about your business as you previously had done. Ignoring the errors, it is very debatable whether there are even productivity gains beyond: human programmer or whatever is excited and cranked up to unsustainable degrees of activity and thinking to 'keep up' with what he thinks is an AI doing the work.

I'm seeing this fairly often and when it isn't garbage it's a capable person who has gotten inspired by their 'collaboration' in which the busywork is being done by a machine, but they're doing so much directing and correcting that it's not unlike what would happen if they got heavy into meth and went on a tear.

You absolutely can write them off entirely and decide for yourself what level of human-killing speed-freakism you are comfortable pursuing in your productivity. There's a long history of humans managing astonishing levels of productivity through self-destructive means. This is not even cheaper, once the 'first one's free' wears off: it's just a novel method of getting humans to burn themselves harder in the belief that they have a magic feather.

The ones who're really throwing themselves into the situation are the ones who'll burn out, but who aren't setting themselves up for atrophy and learned helplessness. Anyone who believes the technology lets them be a lazy manager just getting paid, is in for an unpleasant discovery.

sbarre 9 hours ago | [-0 more]

> Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading

Yes, this is exactly so. AI is able to confidently sound plausible enough to convince laypersons or anyone who isn't very familiar with the subject matter, which is a big part of the mass-appeal "magic" of ChatGPT and other similar tools. It's like having a know-it-all friend (who also makes shit up to bridge their own knowledge gaps).

In many non-advanced non-specialized situations, AI is right enough to be at best useful or at worst not harmful (usually landing in the middle somewhere).

But speaking for myself, in areas where I consider myself quite proficient, I can very easily spot the subtle inconsistencies and naive conclusions that AI responses provide, and I have to guide/steer/correct it a lot to get good results when the subject matter is complex enough.

david-gpu 6 hours ago | [-2 more]

Last week I went to a highly-specialized tertiary clinic about further treatment for a rare medical condition that I was diagnosed and treated for as a child. The two very specialized doctors I met there confirmed a diagnostic mistake that a specialist had made ten years ago. The only reason I pursued a second opinion, ten years later, was because Google Gemini had explained to me that the specialist ten years ago had performed the wrong type of test for my condition.

Do these LLMs make mistakes? They sure do, I see it all the time. But they can also help people make breakthroughs.

And this isn't the only time that Gemini has helped me diagnose long-term health issues, either.

I am not advocating to trust anything they say blindly, but they can be a great place to form new hypotheses and learn the right terms to look for when you are unfamiliar with a subject.

wasabi991011 5 hours ago | [-1 more]

Can you elaborate on how you use Gemini to diagnose long term health issues? Considering doing the same for myself, but I have no idea what is too much vs too little information, and generally the type of prompt engineering to do.

david-gpu 2 hours ago | [-0 more]

Some folks are not going to like what I am about to say, but what I do is write down as much information that I think may be relevant as possible, trying to avoid leading the witness with any of my preconceived ideas of what may be going on. At the end, I encourage them to ask me questions to get a more complete picture of what may be going on.

After a couple of rounds of that, a picture will start to emerge. The AI will make a few XYZ hypotheses of what may be going on, some of which will make more sense to you than others. This is when you can start searching some of those terms in places like pubmed.ncbi.nlm.nih.gov, including, for example, the diagnostic criteria for XYZ.

One of the ways I often use these AIs, not just in the context of finding possible diagnoses, is requesting them to make the case for and against hypothesis XYZ based on the data you have personally collected. Again, it's not about fully buying everything that comes out of them, but it can help you consider angles or possibilities that did not occur to you, or that you had previously accepted/discarded without sufficient evidence. Think of them as that quirky acquaintance that knows a little bit about everything but sometimes misremembers, rather than as a god-like oracle.

And don't do all this in a single session/context. Start a new context every now and then, because otherwise it tends to go in circles as these AIs are biased towards agreeing with whatever it is you said most recently. Intentionally challenge yourself, re-evaluate the existing data from other perspectives.

Sometimes what you learn is not pleasant, but as more data becomes available, you learn to accept it. Good luck.

meowface 8 hours ago | [-0 more]

I may be missing something, but I think it's unclear that the parent poster here is necessarily actually contradicting anything the AI said. It may depend on the exact information the OP wrote to Claude and GPT. The full transcripts would be needed. (Though there is definitely a separate point that a doctor would generally better know all the right questions to ask, while current LLMs may be making certain assumptions.)

The LLM may have, from its "perspective", implicitly thought the OP was telling it that he had strong reason to believe there was no calcification and was not considering the bigger picture of possibly receiving an incomplete/poor assessment from the medical staff. In fact, the issue here may be the LLM overly trusting doctors vs. trusting its own expertise.

scosman 7 hours ago | [-0 more]

I dunno. I know a lot of software engineering experts. AI isn't always right, but neither are the people, and it's getting better and better.

Software is one domain where it excels because of structured training data and simulation environments, so I'm well aware it's better here than other areas.

Still there's somewhere balanced between saying every time it's "insufficient or incomplete or outright misleading" and "just trust AI". AI's a useful source of information/reasoning/research, but know you need to validate its answers for important decisions.

nlawalker 9 hours ago | [-19 more]

> no one else seems to find this to be quite damning for the AI services being offered, preferring instanced to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information

"Be wowed by the convenience and speed", or merely "take advantage of the mere availability"? What most people find to be damning about expert advice is that they simply can't get it anywhere, at any cost that they can afford.

whatever1 9 hours ago | [-18 more]

So if you want to do a surgery but you don’t see any surgeons around you ask a grocery butcher to have his way?

sxg 9 hours ago | [-11 more]

In certain circumstances, the answer is yes. If an airplane's pilots are incapacitated, do you simply give up and crash the plane because there are no other pilots on board? Or would you rather have someone on the ground try to coach a passenger into at least attempting to land the plane?

ChrisMarshallNY 9 hours ago | [-1 more]

As long as that passenger didn’t have the fish.

acheron 4 hours ago | [-0 more]

Yes, I remember, I had lasagna.

frereubu 8 hours ago | [-2 more]

That's an extreme edge case, which I don't think is in the context of the concerns in this thread.

sxg 8 hours ago | [-1 more]

The specific case doesn't matter--it's meant to make you think about the general question throughout this thread: when an expert isn't available, should non-experts use AI (or other tools) to help themselves? Sometimes the answer is yes because the potential benefits outweigh the potential harms (if any harms exist). But sometimes the answer is no because misleading/incorrect advice can cause a net harm.

frereubu 7 hours ago | [-0 more]

But if the cases where AI use is a net positive are one in a million in medical situations? The argument is surely about the ratio, which many people here are arguing (from anecdote, would be interested to see a real study) is not in its favour, and the potential downsides - from both false positives and negatives - can be huge.

close04 8 hours ago | [-3 more]

A passenger crashing the plane while trying to avoid a certain crash doesn’t make things any worse. An incompetent doctor trying to save you from certain death can make things so much worse. It’s all about weighing the best/worst outcome compared to where you are now.

microgpt 8 hours ago | [-2 more]

I hate to break it to you but death is certain for everyone.

Properly emotionally processing this fact and your complete inability to do anything about it is called an "existential crisis" and if you haven't had one or several yet, you will.

close04 7 hours ago | [-1 more]

I’m not sure what the “revelation” is? How is this related to what I said?

Putting that aside, your philosophy sounds shallow. Death is certain, but how long you have to live and the quality of that life are not predefined. An incompetent passenger-pilot trying to save you from a crash will at worst make no difference. But an incompetent doctor can teach you that death isn’t necessarily the worst outcome.

microgpt 4 hours ago | [-0 more]

There are many healthy psychological ways to accept the certainty of eventual death. But the process is inevitably painful.

I think the different ways people accept death explains a lot of people's psychology, like how you can guess people's attachment styles or Freudian stage fixation. For instance, billionaires who pour all their money into anti-aging research clearly are not handling it well.

jancsika 8 hours ago | [-1 more]

You can choose a) a calm, level-headed passenger who knows they aren't a pilot, or b) a calm, level-headed passenger who almost has their pilot's license but has a medical condition that prevents them from admitting when they lack certain knowledge.

Who do you choose to be coached by an expert on the ground?

rvnx 8 hours ago | [-0 more]

No thank you, I will ask Claude and then ask ChatGPT to challenge me, and do a couple of rounds like that.

The first: Has no clue about anything and therefore no useful knowledge and cannot challenge me

The second one: Is proven to willfully give wrong information and will make me make mistakes for sure.

The LLMs will do their best, even if imperfect, since they summarize what appeared in books.

I prefer to be grounded in what the Airbus / Boeing manuals or pilot training books say, rather than in two far more unreliable sources.

EA-3167 9 hours ago | [-4 more]

People, especially in medical crises, are desperate for answers that they often can't get because their clinicians don't know. The illusion of an all-knowing guru who sounds like their doctor and tells them ANYTHING is extremely alluring. Waiting to hear back from a doctor about test results (which these days probably showed up on your online account the moment they were completed) can be agonizing.

Ok for pain in your shoulder it might not, but how about a woman with a lump in her breast waiting for the mammogram interpretation? How about someone trying to understand disturbing lab results? People are also often pushed these days to move through visits with doctors at a breakneck speed, but the AI will "hear you out" all day.

Part of this is a problem with the AI, part of it a problem with our healthcare systems, and part of it is simply human nature. If you think that OpenAI, Anthropic, Google and the rest weren't aware of this going in you must have very little faith in the intelligence of their members. It's not hard to imagine that the future of LLMs should involve a hell of a lot of liability for the companies running them, but for now it's the Wild West.

bilsbie 9 hours ago | [-2 more]

> but how about

Whatever scenario you come up with my answer is the same.

As an adult I’d like to be able to choose what tools I use to learn about my condition regardless of how well it works or even if it’s likely to mislead me.

There’s risk in every aspect of life and we can’t baby proof everything.

baconmania 8 hours ago | [-0 more]

>choose what tools I use to learn about my condition regardless of how well it works or even if it’s likely to mislead me.

Even if it "works" so poorly that you're not actually learning about your condition?

EA-3167 8 hours ago | [-0 more]

If it's helping you learn about your condition then sure I agree. The issue here is that's not really the case, it's giving you the illusion that you're learning about your condition while feeding you hallucinations and half-truths at best. A recent look at medical advice from these things showed they're no better than a coin flip.

So if you MUST have answers that are at most random guesses, I'd suggest saving a few bucks and asking a coin before flipping it.

perching_aix 6 hours ago | [-0 more]

The companies are 100% aware, yes, and so they did make quite a few changes over the years.

Current trend is that the models will try to explicitly steer you towards "asking better questions from your medical provider", rather than providing diagnoses. They do also evaluate whether something can actually be established rather than just listen and nod along. And so the "you must have very little faith in the intelligence of their members" goes right back against these failure mode ideas.

Now of course, given a sufficiently desperate person, they can probably torture anything they want to hear out of these models. But so can they out of actual people, so that's kind of a high bar. When you get to the point where people are willfully misreading a given piece of text, bets tend to be rather off.

perching_aix 7 hours ago | [-0 more]

No, people don't even go to a butcher, they do it themselves if they can. See the countless stories about farmers and their inventiveness. Example: https://www.youtube.com/watch?v=KKaJhQBusH8

highfrequency 10 hours ago | [-0 more]

Seems natural enough. There will always be complexity and nuance that is missed by an AI model or person - the world is just super detailed. The more expertise you have the more you will be aware of that nuance. That doesn't mean the model or person is not useful as a starting point.

Aurornis 8 hours ago | [-0 more]

> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do is it likely you will find the output helpful.

I always recommend people first try asking LLMs a lot of questions on something they know. Programmers should start by asking LLMs to work on a codebase they’re familiar with.

You’re overstating the problem, though. Even for an expert the LLM will get a lot of things right and can be helpful under a watchful eye.

The real problem is knowing how to identify when it’s on the right track and when you need to correct it, because both cases are presented with the same tone and confidence.

An expert can better identify when the LLM output doesn’t sound plausible. Someone unfamiliar with the topic will think everything it says looks correct.

kryogen1c 9 hours ago | [-0 more]

On the flip side of this problem, novel best practices lag the medical standard of care, other human failures like corruption and competing priorities notwithstanding.

For example, we had to advocate for certain practices during the birth of our first child that became routine during our second several years later.

So, neither side is guaranteed correct, doctor or citizen researcher (which did not include LLMs in my case, for the record). The truest answer is also the most useless one, applicable to all fields: it depends.

The real question is: if you embrace being a layman, whom do you trust more: LLMs/the internet or experts, like doctors? I think the answer is pretty clearly experts.

rapatel0 8 hours ago | [-0 more]

You shouldn’t expect frontier models to work on medical imaging. There is much more that goes into building a medical imaging product. First and foremost is data. Medical imaging datasets are not prevalent on the public internet at the scale necessary to have good performance on medical imaging tasks, especially MRI. Also the labels are super noisy.

This is completely different than asking for general medical reasoning which is more derived from papers, public standards and textbooks.

Text exists at the right scale but images don’t.

je42 8 hours ago | [-0 more]

The question is how far off AI is compared to the professionals that we have access to. The world's best experts are not accessible to most of us. :(

mattgreenrocks 8 hours ago | [-0 more]

You're not. This site was also bullish on using LLMs as therapists, which defeats the very point of them, and reflects a lack of knowledge on what exactly therapists do for people.

More on topic: if the article's author arrived at a definitively negative result would this have shown up on HN?

baxtr 5 hours ago | [-0 more]

This is a serious issue for young people I think.

I have seen outputs that look good but the actual content is bad. If you’re inexperienced in a field you can’t see it because AI makes anything look right.

I have gotten very good results with AI but you can’t take the first answer at face value. You need to be suspicious and challenging until you tweak out the right answer over time.

xivzgrev 6 hours ago | [-0 more]

Well that's part of the problem. AI is not accountable - if you take its advice and hurt yourself, who is responsible?

A real doctor is accountable.

They might both "know" a lot of things but implicitly the party who is accountable is going to be more trustworthy.

And I don't see that going away until AI companies must be licensed for application x and can lose their license / be sued if engaging in malpractice.

serf 7 hours ago | [-0 more]

>I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading

media is awash at the moment with experts chiming in to support AI, saying their fields are being revolutionized, etc.

it seems unsurprising to me that lay opinion would follow the loudest media trumpets.

jstummbillig 9 hours ago | [-6 more]

No, not anytime someone is an actual expert at anything, AI output appears insufficient. That is why experts in various fields use AI.

Then to say "Aha, but all of that is AI psychosis" makes obviously no sense: Why would we trust experts when they offer critique but not when they say "this is helpful"?

Overall: People are not insane. AI makes mistakes and, often, fails completely. AI also helps them do things better, quicker, increasingly so. The jaggedness of AI is confusing and real.

torben-friis 8 hours ago | [-3 more]

How many times have you seen an expert go "yeah these results are good consistently enough for a non expert to trust them without expert assistance"?

There is a huge difference between having a chance of a good result, which can be useful for experts able to filter out the bullshit, and consistent success. I would generate code as a helper, I would never allow a guy from marketing to merge unreviewed AI code.

jstummbillig 6 hours ago | [-0 more]

> How many times have you seen an expert go "yeah these results are good consistently enough for a non expert to trust them without expert assistance"?

But see now we are talking about something else entirely than the claim that I found dubious, which was: "Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading."

Consistently good enough !== anytime insufficient

hectdev 8 hours ago | [-0 more]

That's what I would like to call job security. When you know how to read what is wrong, you can easily catch the mistakes and correct them. AI gets you there faster by doing a lot of things right and you correct the mistakes.

tpmoney 8 hours ago | [-0 more]

I had a realization recently that the problem with "AI isn't consistently good enough" is that this experience is probably not sufficiently distinguishable from the experience most non-experts have with computer systems all the time.

As an industry we've been promising people for decades that if they put all their data into our special software they can get all sorts of information back out that will make life easier for them, reveal new insights and otherwise improve their understanding. But the unspoken caveat has always been that you have to put the right data into the right places, in the right format, in the right way and then you have to ask the right questions, in the right syntax, with the right tools. And if you get any one of those parts wrong, you're not going to get the right answers (or possibly even any answer at all). How many people have had their Excel worksheet that they (or someone else they asked/employed) built for some task that has been working fine for the last year suddenly stop working or start throwing out nonsense numbers because some input changed? Or how many people have experienced their system seemingly throw out meaningless garbage because daylight saving time changed right at the moment the report was being run? Or spent months operating on wrong data because the person who wrote the query misplaced a parenthesis and the query was searching for "(foo AND bar) OR baz" and not "foo AND (bar OR baz)"? For most people, the computer and the programs they use to do their jobs are magical black boxes that most of the time produce mostly the right answers and sometimes get things very very wrong with no indication of what has changed. Which is effectively the same experience they will have with an AI, but now instead of needing to figure out some arcane Excel pivot table and VBA script, they can just dump some raw data and a "natural language" question into the AI.
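To make the misplaced-parenthesis case concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `rows` list is made-up data covering every combination of the placeholder flags foo, bar and baz, and the two helper functions are hypothetical names, not anyone's real query.

```python
from itertools import product

# Made-up rows: every True/False combination of the placeholder flags.
rows = [{"foo": f, "bar": b, "baz": z} for f, b, z in product([False, True], repeat=3)]

def intended(row):
    # What the author meant: foo AND (bar OR baz)
    return row["foo"] and (row["bar"] or row["baz"])

def misplaced(row):
    # What actually ran: (foo AND bar) OR baz
    return (row["foo"] and row["bar"]) or row["baz"]

# Rows the broken query returns that the intended query would not.
wrongly_matched = [r for r in rows if misplaced(r) and not intended(r)]
# Rows the intended query returns that the broken query drops.
wrongly_dropped = [r for r in rows if intended(r) and not misplaced(r)]

for r in wrongly_matched:
    print("extra row:", r)
print("dropped rows:", len(wrongly_dropped))
```

The two versions only disagree on rows where foo is false and baz is true, and the broken one never drops a row it should have kept. So the report still looks plausible, just padded with rows that should not be there, which is roughly why this kind of bug can go unnoticed for months.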

And that's not counting the fact that their experience with looking information up online is about the same as well. How many absolutely confident wrong takes have you encountered online for things you're an expert in? How many of those wrong takes have come straight from supposedly trustworthy sources like news companies or even other people in the field?

For most people, using a computer has always come with the asterisk that you should always be aware that the source you're reading could be very wrong, that the output is only correct assuming all the inputs and all the parts processing that input are also correct and that everything you do should be accompanied by vetting by experts, whether those experts were software developers or domain experts. For most people the only thing that's changed with AI is that it's a one stop shop for their "probably directionally right, almost certainly wrong in the details" access to the digital oracles.

lazide 8 hours ago | [-1 more]

I’ve never seen an expert use AI in their field beyond the initial ‘oh interesting’ stage.

tomaskafka 9 hours ago | [-0 more]

Yes. The PM’s “with AI I know enough to be dangerous, haha” means “I’m actually dangerous and I don’t realize”

gofreddygo 7 hours ago | [-0 more]

This is true in broader contexts too. Bunch of experts can't agree on something fundamental which is hard to prove/ disprove, and they have strong opinions on the topic.

AI is much worse.

jefffoster 8 hours ago | [-0 more]

AI is an expert in everything you are not.

jrockway 7 hours ago | [-0 more]

I came here to post this as my experience. AI is magical when I apply it to something I know nothing about. It far exceeds my expectations every single time. I know nothing, but here is a report with animated graphics explaining exactly what I asked it to explain!

In fields where I'm an expert... it makes a lot of silly mistakes that are annoying and I feel like they would just cascade if I didn't correct them early. (I still think it's a net win, but... I watch it and it watches me, and we both do better work. I'd even apply the "magical" adjective when it does stuff I hate but know how to do, like edit Helm charts. What would normally be 20 minutes of me griping about YAML indentation is just a correct diff in seconds. I'll take it!)

So with that in mind, I tend to distrust output that I can't verify. If a doctor was recommending surgery and I thought the plan was too aggressive, I'd get a second opinion. I don't expect Claude Code to have much medical diagnostic ability, as that is really not what the model is trained for, and I know how it performs on work that it's trained and fine-tuned for. That is not to say the output is wrong and that it can't have diagnostic value, just that I personally wouldn't feel safe trusting it. Wrap up the same model with fine-tuning in the domain and a harness that reminds Claude to do a lot of sanity checks, perhaps with a human in the loop to guide it back onto the rails when it gets hyperfixated on something that doesn't matter? That could very much be a useful AI product.

pwg 8 hours ago | [-0 more]

> Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading.

The term for when the press "gets it wrong" is Gell-Mann Amnesia (https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect).

In that case, when you have personal knowledge of the facts, or know the specific domain area, you can see where the reporter mixed things up.

AI is no different, it's just a bunch of matrix math substituting for "the reporter" regurgitating what it was previously told. So the Gell-Mann Amnesia effect would apply just the same. If you have domain knowledge, you immediately see where the AI got it wrong. When you do not have domain knowledge, you have less chance of seeing where the AI was wrong.

parineum 9 hours ago | [-0 more]

> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading.

AI isn't even the first instance of this phenomenon, news articles are like this as well.

https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect

stymaar 7 hours ago | [-0 more]

> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading

AI assistants are industrializing the Gell-Mann amnesia effect.

beering 9 hours ago | [-0 more]

TFA doesn’t actually state where the bit about shockwave therapy came from and it wasn’t the main point of the article. The concern was about being given useless therapies. The homeopathic analgesic is concerning, at least to me.

I.e. nothing this radiologist said was related to the LLM’s advice.

sevenzero 7 hours ago | [-0 more]

>AI output appears insufficient or incomplete or outright misleading

It has been like this since the rise of "AI". The only people enthusiastic about it are usually the ones hoping to make a profit in one way or another.

suttontom 8 hours ago | [-1 more]

Your instinct is correct, and in a lot of cases it's true. However, I've heard from enough doctors by now (a cardiologist, psychiatrist, and epidemiologist/former physician) that they use medical LLMs and find them extremely helpful, mostly as a way to either bring up knowledge they'd forgotten about or as a way to learn something new and then verify it. I'm extremely skeptical about LLMs in general and the connection to Gell-Mann Amnesia is apt, but I wouldn't necessarily write them off completely like that. There are experts using the models that find them genuinely helpful in their field.

GTP 8 hours ago | [-0 more]

Probably this is the point, and it's a point that has been brought up a lot of times in the past, maybe less in recent times: you need to know the things you're applying an LLM to. In this way, you can keep the good outputs while having the expertise to discard the bad ones.

Hikikomori 8 hours ago | [-0 more]

It's like reading news articles. Seems reasonable until you read an article about something you know, then you see how wrong they can be.

newsclues 9 hours ago | [-1 more]

An LLM is not necessarily an expert system. Once there are expert systems for law, healthcare, accounting, governance…

https://en.wikipedia.org/wiki/Expert_system

microgpt 8 hours ago | [-0 more]

Didn't they try that in the 80s and 90s but discover the real world is too variable for that to work?

meindnoch 9 hours ago | [-0 more]

We're past the point of Gell-Mann amnesia. This is full blown Gell-Mann psychosis.

silisili 9 hours ago | [-0 more]

This is natural and even logically expected. It's just Gell-Mann amnesia in action. The world has more people spouting on things than it has people knowledgeable in said things.

Apply that to the Internet at large, and realize where LLMs got their training. They're basically ConfidentlyIncorrect personified.

grayhatter 8 hours ago | [-1 more]

> This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instanced to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.

Welcome to the club? This new awareness you've found over the true quality of LLM based GenAI output has been what "all the haters" have been mad about for-ever. That the output of LLMs is clearly defective, and they have merely found a cute trick towards making humans think they're less defective than they are actually measured to be.

And the corresponding anger and frustration to push the risks of genai output out onto others, while also aggressively pushing it as a feature you should be using already. You're behind don't you know, and whatever other lie I have to tell to trick you into enough FOMO to pay me 200USD/mo so I can sell FOSS back to you.

An LLM can only output the mean next likely token, and then add a bunch of extra noise on top of that so it feels interesting and not repetitive. None of this is new, the problem is, 50% of humans are below the mean, but have no idea. So when an LLM tells them some lie: well, it sounds so helpful! It's impossible for someone who sounds this helpful to lie to me, liars never sound confident! It must be PERFECT! I'm gonna tell everyone how perfect it is. So the bottom 0-33% think LLMs are fantastic tools that make nearly 0 mistakes in comparison to the bottom 33%. The 33-66%-ish aren't sure: sometimes it's great, but it will make that random mistake sometimes, but I can catch most (or all of them, depending on ego). And the 66%+ are angry about how many people are getting tricked by something so obviously low quality, or are lucky enough to not have to care.

orangecat 7 hours ago | [-0 more]

> An LLM can only output the mean next likely token, and then add a bunch of extra noise on top of that so it feels interesting and not repetitive.

So when an LLM was asked to analyze the unit distance conjecture, it just spat out a bunch of average-or-random tokens that coincidentally happened to correspond to a valid proof that had eluded humans for decades?

stringfood 7 hours ago | [-0 more]

What is happening is that the gap between what the experts and AI know is getting smaller each year. This year, sure, radiologists are mocking AI's ability to interpret MRI results, but the models are a lot better at that this year than last. In five years perhaps radiologists will truly appreciate AI, but I am not holding my breath because radiologists are notoriously slow to adapt to changes in medical science compared to other specialists like anesthesiologists or surgeons.
