It's not really a problem. We're out of natural tokens anyway. The future is synthetic verifiable traces (already the way we train coding agents).
> synthetic verifiable traces
What does that mean? Is it like when somebody uses a coding agent to develop a feature, and later the input prompts and the resulting PR are used for training, on the presumption that the final PR was a correct implementation of the prompt?
Yeah, it's rejection sampling. You have an agent, you take a verifiable problem (people use lots of different verification signals, but say unit tests), and have the agent attempt it K times. You accept the trajectories (all context, tool use, etc., the entire log) that are positively verified and use these as training examples.
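To make the loop above concrete, here is a rough sketch, not anyone's actual training pipeline. Everything in it is a stand-in I made up for illustration: `Trajectory`, `rejection_sample`, the toy agent from `make_toy_agent` (it "solves" addition problems and is sometimes off by one), and `verify_sum` playing the role of the unit tests.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    """Hypothetical record of one attempt: the problem, the full log, the answer."""
    problem: str
    log: List[str] = field(default_factory=list)
    answer: int = 0


def rejection_sample(
    problem: str,
    attempt: Callable[[str], Trajectory],
    verify: Callable[[Trajectory], bool],
    k: int,
) -> List[Trajectory]:
    """Run the agent k times and keep only the attempts the verifier accepts."""
    accepted = []
    for _ in range(k):
        traj = attempt(problem)
        if verify(traj):
            accepted.append(traj)
    return accepted


def make_toy_agent(seed: int = 0) -> Callable[[str], Trajectory]:
    """Toy stand-in for a real agent: adds two numbers, sometimes off by one."""
    rng = random.Random(seed)

    def attempt(problem: str) -> Trajectory:
        a, b = (int(x) for x in problem.split("+"))
        guess = a + b + rng.choice([0, 0, 1])
        log = [f"prompt: {problem}", f"answer: {guess}"]
        return Trajectory(problem, log, guess)

    return attempt


def verify_sum(traj: Trajectory) -> bool:
    """Stand-in for a verification signal such as a unit test suite."""
    a, b = (int(x) for x in traj.problem.split("+"))
    return traj.answer == a + b


if __name__ == "__main__":
    kept = rejection_sample("2+3", make_toy_agent(), verify_sum, k=8)
    print(f"kept {len(kept)} of 8 trajectories as training examples")
```

In a real setup `attempt` would be a full agent rollout and `verify` would run the tests against the result; the accepted logs are what go into the training set.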
The trick is to find the examples that sit right between too difficult and too easy for the existing agent; these give the strongest training signal.
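One simple way to pick out those in-between problems, as a sketch only: estimate each problem's pass rate from the K attempts and keep the ones that are neither always solved nor never solved. The function names and the 0.2/0.8 cutoffs below are my own assumptions for illustration, not values anyone in the thread stated.

```python
from typing import Dict, List


def pass_rate(results: List[bool]) -> float:
    """Fraction of attempts on one problem that passed verification."""
    return sum(results) / len(results)


def select_informative(
    outcomes: Dict[str, List[bool]],
    low: float = 0.2,   # assumed cutoff: below this the problem is "too hard"
    high: float = 0.8,  # assumed cutoff: above this the problem is "too easy"
) -> List[str]:
    """Keep problem ids whose empirical pass rate lies in [low, high]."""
    return [
        pid
        for pid, results in outcomes.items()
        if results and low <= pass_rate(results) <= high
    ]


if __name__ == "__main__":
    # Made-up verification outcomes for K = 8 attempts per problem.
    outcomes = {
        "easy": [True] * 8,
        "hard": [False] * 8,
        "medium": [True, False, True, False, False, True, False, False],
    }
    print(select_informative(outcomes))
```

With these made-up outcomes only `medium` (3 of 8 passes) survives the filter: `easy` teaches the agent nothing new, and `hard` yields no accepted trajectories to train on.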