by Charon77 3 hours ago
The issue with Markov chains is that you can't get good next-token prediction over a long enough context. Once you condition on the last 1000 words instead of just the last 2, it's very unlikely that your 'frequency' table has any entry for that exact combination. Markov chains also don't work on token embeddings, which encode some meaning, so a context can't borrow statistics from a similar one, only from an identical one.
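A toy sketch of the sparsity problem, assuming word-level tokens and a made-up training sentence (`build_chain` and `predict` are hypothetical helper names, not from any library). An order-k chain is just a table of next-token counts keyed by the exact last k tokens. Short contexts find a match, but once "dog" falls inside the window the lookup fails, even though "cat sat on the" was in the training text:

```python
from collections import Counter, defaultdict

def build_chain(tokens, order):
    """Count next-token frequencies for every exact context of `order` tokens."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        chain[context][tokens[i + order]] += 1
    return chain

def predict(chain, context):
    """Return the most frequent next token, or None if this exact context was never seen."""
    counts = chain.get(tuple(context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

training = "the cat sat on the mat and the cat ate the rat".split()
query = "a dog sat on the".split()

for order in (1, 2, 3, 4):
    chain = build_chain(training, order)
    print(order, predict(chain, query[-order:]))
# 1 cat
# 2 mat
# 3 mat
# 4 None
```

The chain has no notion that "dog" and "cat" are related; to it they are just different keys. A model working on embeddings could map them to nearby vectors and still make a sensible guess.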