LLMs are very good at generalizing beyond their training (or context) data. Normally when they do this we call it hallucination.
Only now we do A LOT of reinforcement learning afterwards to severely punish this behavior for subjective eternities. Then act surprised when the resulting models are hesitant to venture outside their training data.
Hallucination are not generalization beyond the training data but interpolations gone wrong.
LLMs are in fact good at generalizing beyond their training set, if they wouldn’t generalize at all we would call that over-fitting, and that is not good either. What we are talking about here is simply a bias and I suspect biases like these are simply a limitation of the technology. Some of them we can get rid of, but—like almost all statistical modelling—some biases will always remain.
What, may I ask, is the difference between "generalization" and "interpolation"? As far as I can tell, the two are exactly the same thing.
In which case the only way I can read your point is that hallucinations are specifically incorrect generalizations. In which case, sure if that's how you want to define it. I don't think it's a very useful definition though, nor one that is universally agreed upon.
I would say a hallucination is any inference that goes beyond the compressed training data represented in the model weights + context. Sometimes these inferences are correct, and yes we don't usually call that hallucination. But from a technical perspective they are the same -- the only difference is the external validity of the inference, which may or may not be knowable.
Biases in the training data are a very important, but unrelated issue.
Interpolation and generalization are two completely different constructs. Interpolation is when you have two data points and make a best guess where a hypothetical third point should fit between them. Generalization is when you have a distribution which describes a particular sample, and you apply it with some transformation (e.g. a margin of error) to a population the sample is representative of.
Interpolation is a much narrower construct then generalization. LLMs are fundamentally much closer to curve fitting (where interpolation is king) then they are to hypothesis testing (where samples are used to describe populations).
The bias I am talking about is not a bias in the training data, but bias in the curve fitting, probably because of mal-adjusted weights, parameters, etc.