Nightmare because the AI is just generating random text that fits the question.
This is not a fair assessment of what AI is doing.
Studies have found that newer reasoning AIs are about as good at diagnosing illness from a written description of symptoms as doctors are.
Granted, it cannot actually examine a patient, so we're not replacing doctors anytime soon. But your view is obsolete.
They are using the “gold standard for the evaluation of expert medical computing systems”, not a proxy for what a doctor actually does when diagnosing someone.
It may have some utility after diagnosis, but this test doesn’t demonstrate utility for patients.
But I, SCP-426, am a toaster.
I feel the same when visiting a doctor in Canada. In the 2 minutes I get with them at my one appointment a year, I hear a standard script.
Not quite. An LLM generates text that would likely follow. The sky is… “blue”. A patient in pain with a bone protruding from their shin has a… “broken leg”.
The more training data, the more questions it can answer with a reasonable probability of being accurate.
Throwing away a potentially useful analysis just because it’s probabilistic seems a bit like throwing the baby out with the bath water.
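A toy sketch of the "text that would likely follow" idea, to show why "random" is the wrong word. It assumes a hand-written table of next-word probabilities (the numbers and prompts are hypothetical, not taken from any real model) standing in for what a trained model would estimate. Greedy decoding always picks the most probable continuation; sampling is probabilistic but weighted toward likely continuations, not uniformly random.

```python
import random

# Hypothetical next-word probabilities, standing in for a trained model's estimates.
NEXT_WORD_PROBS = {
    "The sky is": {"blue": 0.80, "clear": 0.12, "falling": 0.08},
    "A bone protruding from the shin means a": {
        "broken leg": 0.90,
        "bruise": 0.07,
        "sprain": 0.03,
    },
}


def most_likely_continuation(prompt: str) -> str:
    """Greedy decoding: return the single highest-probability next word."""
    probs = NEXT_WORD_PROBS[prompt]
    return max(probs, key=probs.get)


def sample_continuation(prompt: str, rng: random.Random) -> str:
    """Weighted sampling: likely words are chosen often, unlikely ones rarely."""
    probs = NEXT_WORD_PROBS[prompt]
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


print(most_likely_continuation("The sky is"))  # blue
print(most_likely_continuation("A bone protruding from the shin means a"))  # broken leg
```

Run `sample_continuation` many times and "blue" dominates; "falling" shows up only occasionally. That is probabilistic, but it is not the same as random.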
This is a very peculiar use of the word "random".