But why should I care? If you demonstrated that a model can make more accurate diagnoses than a doctor, but it also showed this strange behavior when no image was presented, why should that deter me from using the model?
Because you don’t have any way of telling whether it actually used the image presented or based its conclusions on a different image it made up.
I don't find that persuasive. This is not the error I worry about. Let's say, hypothetically, that the model just ignores the input image in 1 of 10,000 runs. This really doesn't concern me, because the output will be trivially detectable nonsense that doesn't match the symptoms at all. Such a contingency is easily handled by running the image through multiple models and distilling the output, anyway.
The error I worry about is where the model uses the image and comes to an incorrect but symptom-matching diagnosis. But in this hypothetical the model is less likely to do so than a doctor, so the choice is either to accept the risk of the model or to accept a higher risk from a doctor.
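For what "running the image through multiple models and distilling the output" could look like, here is a minimal sketch using a simple majority vote. The function name `distill`, the `min_agreement` threshold, and the example labels are all made up for illustration; a real pipeline would likely need something more careful than exact string matching.

```python
from collections import Counter


def distill(diagnoses, min_agreement=0.5):
    """Return the majority diagnosis, or None if no label clears the threshold.

    `diagnoses` is a list of labels, one per model (hypothetical setup).
    A label wins only if strictly more than `min_agreement` of the models agree.
    """
    if not diagnoses:
        return None
    label, count = Counter(diagnoses).most_common(1)[0]
    if count / len(diagnoses) > min_agreement:
        return label
    return None  # no consensus: flag for human review


# One model ignores the image and emits an outlier; the vote discards it.
print(distill(["pneumonia", "pneumonia", "fracture"]))
```

This only filters out an outlier from one model. It does nothing when the models agree on a wrong but plausible diagnosis, which is the case described above as the real worry.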
Really? You know you could just ask it.
Which would tell you what, exactly? The whole root of the problem is that the model doesn’t “know” either.