There's a big difference between a _puzzle_ and a _mystery_. In a puzzle, the goal state is known, and as more pieces - data - appear, the goal gets closer. You know how far you are from the goal.
A mystery is worse. With each additional piece of data, the goal gets farther away. Everything is more and more confusing.
(Popularized by Malcolm Gladwell)
Maybe I am missing something but I just find this wrong.
Everything is a puzzle: there is one "Truth" or one diagnosis. You (a smart human) should be able to converge on it by cross-examining your LLMs. By themselves, they have no interest in revealing it and no stakes, which makes them tools that are only useful in the hands of a capable investigator.
The problem is that the diagnosis might not be known for a while. There are a few conditions and diseases that require an autopsy for a definitive diagnosis and are therefore diagnosed based on symptoms in clinical settings.
> You (a smart human) should be able to converge on it by cross-examining your LLMs.
What makes you think this is fundamentally different from cross-examining ELIZA? There is no guarantee that the LLM will help you converge on anything. Indeed, calling out an LLM on BS tends to eventually produce an "I don't know and can't help you further" answer (as it should).
> There is no guarantee that the LLM will help you converge on anything.
Absolutely. The guarantee does not come from the LLM. The LLM is simply an improved version of Google Search.
The guarantee can only come from a systematic application of epistemic discipline and reasoning, which is very much (smart) human territory.
Put another way, I could make good decisions with or without LLMs, given some uncertain diagnostics as input. Without them, I would have to trawl through 50 papers myself, and it is possible that my decision would arrive 5 years too late as a result. LLMs enable the trawling and do some of the legwork in connecting the dots, but are ultimately only as capable as the orchestrating human.
The same goes for a human expert. There's no guarantee of convergence and you could eventually end up at "I don't know".