This is just a misconception about how LLMs work and about what reasoning is.
“There cannot be any reasoning embedded in the model” is a strong statement. What do you mean by reasoning? By any reasonable definition I’m aware of, they clearly are able to exhibit reasoning.
The fact that the pre-training objective is next-token loss says nothing about the models' capabilities or their ability to reason. To be highly successful at next-token prediction you NEED to reason. I’m quite confused here.
LLM output produces the illusion of reasoning. The underlying computation, however, is not reasoning.
If you don’t mind taking a few more words to be more specific, that would be helpful, because what you’re saying doesn’t really make sense at all. You don’t need to trust that the reasoning traces are all faithful representations of the internal computation. There are plenty of other ways to probe models (see Anthropic’s work on circuit tracing).
What else is there to say? LLMs can at most regurgitate approximations of human reasoning steps in the limited forms in which they may be expressed in the training data or interpolations thereof. That's the core essence of what they are. There is no proper reasoning to be found.
"At most" is wrong. RL with verifiable rewards takes you beyond the quality and skills represented in the training data. I'm not aware of meaningful fundamental limits here if you scale compute enough, even though right now it's highly sample-inefficient.
Since you refuse to actually define what you consider to be reasoning, let me at least put one out there: a system exhibits reasoning when its answer depends on nontrivial intermediate computation over the problem. If you find problems with this, fine, but just make an effort to contribute an alternative.
If you increase test-time compute, you get better performance. If the model were just "interpolating", this wouldn't really work, would it? Models can do FrontierMath expert problems (unpublished, expert-authored, peer-reviewed math problems) that require an insane amount of compositional reasoning. If they were regurgitating training data, that wouldn't really work, would it? Chain of thought, while not always faithful to internal computation, improves performance. If the models were just regurgitating information, it wouldn't work that well, would it?
"Regurgitating training data" is also, of course, misleading. Yes, they can memorize parts of the training data, but they also generalize very well.
How do you define reasoning? What does a system have to functionally do in order to qualify for it?
Reasoning includes things like proper use of logic. LLMs have been repeatedly shown to fail horribly at this.
They consistently fail at drawing basic logical conclusions because they cannot build a sufficiently abstract model of certain problems that allows them to grasp their true nature. Otherwise, the whole class of questions of the kind of "how many r's in strawberry" or "do I take the car to the car wash?" would be answered correctly and reliably.
> Reasoning includes things like proper use of logic. LLMs have been repeatedly shown to fail horribly at this.
That models cannot do ALL logic problems does not mean that they cannot properly use logic. They can write Lean-verified proofs of theorems. How is that not logic?
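For anyone unfamiliar with what "Lean-verified" means, here is a toy sketch in Lean 4 syntax. It is my own illustration, not model output, and the theorem name `modus_ponens` is made up for the example. The point is only that the proof checker accepts a proof when every step is logically valid and rejects it otherwise, so a verified proof cannot be bluffed.

```lean
-- Toy illustration (hypothetical, not produced by any model).
-- Modus ponens: from a proof of p and a proof of p → q, derive q.
theorem modus_ponens (p q : Prop) (hp : p) (hpq : p → q) : q :=
  hpq hp

-- A concrete arithmetic fact, checked by computation.
example : 2 + 2 = 4 := rfl
```

If `hpq hp` were replaced with something that does not actually produce a proof of `q`, Lean would refuse to compile the file.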
> They consistently fail at drawing basic logical conclusions because they cannot build a sufficiently abstract model of certain problems that allows them to grasp their true nature.
What does "grasp[ing] their true nature" have to do with what they can do?
> Otherwise, the whole class of questions of the kind of "how many r's in strawberry" or "do I take the car to the car wash?" would be answered correctly and reliably.
Again, the fact that models have interesting failure modes or brittleness does not mean they do not reason.
This is exactly backwards. The brittleness is because they emulate reasoning without actually algorithmically performing it.
Addendum: I pointed to this class of problems specifically because they require the ability to abstract in a way that the question itself does not immediately suggest. Math problems are different in that they are described in terms of art that are closely related to certain patterns of manipulation (that is, the paper texts tend to contain both in close proximity to one another).
For you, a system needs to reason perfectly and flawlessly, all the time? So humans do not reason? Humans don't have brittle failure modes?
> they require the ability to abstract in a way that the question itself does not immediately suggest
Yes, yet there are plenty of other benchmarks of the same kind where LLMs reason perfectly well, and in many cases better than a human could.
> Math problems are different in that they are described in terms of art that are closely related to certain patterns of manipulation (that is, the paper texts tend to contain both in close proximity to one another).
Is your logic really that math problems are actually easier to answer without reasoning and just by blending together closely related papers? I would definitely suggest reading the literature a bit more on this topic.