> information-theoretic approach to the process

Can you elaborate on this? I've s...

nighthawk454 · on Nov 19, 2023

I think the analogy is something like: if you have a simple distribution over all words, then that's just word frequency, which is obviously not a good predictor. The 'information' necessary to predict the correct next word is just not there if you're predicting words in a vacuum. In order to be practically useful and predict the right words _in context_, the model must be conditioning on more of the sentence/document (aka more information). So it should not be surprising that a 'glorified autocomplete' has some degree of "understanding", as it would be impossible for it to be any good as an autocompleter otherwise.
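
To make that concrete, here's a rough sketch (not anything from a real model): it measures how uncertain the next word is with and without conditioning on the previous word. The toy corpus and the names `entropy`, `h_unigram`, and `h_conditional` are made up for illustration, and a one-word context is only a stand-in for "more of the sentence/document".

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = "the cat sat on the mat the dog sat on the log".split()


def entropy(counts):
    """Shannon entropy, in bits, of the empirical distribution in a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Predicting "in a vacuum": plain word frequency of every word that has a predecessor.
h_unigram = entropy(Counter(corpus[1:]))

# Predicting in context: next-word counts grouped by the previous word.
by_context = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    by_context[prev][nxt] += 1

n_pairs = len(corpus) - 1
# H(next | prev) = sum over contexts of p(prev) * H(next | prev = that context)
h_conditional = sum(
    (sum(c.values()) / n_pairs) * entropy(c) for c in by_context.values()
)

# The drop in uncertainty is the information the context carries about the next word.
information_gain = h_unigram - h_conditional

print(f"H(next)        = {h_unigram:.3f} bits")
print(f"H(next | prev) = {h_conditional:.3f} bits")
print(f"gain           = {information_gain:.3f} bits")
```

On this corpus the only uncertain context is "the"; every other word has exactly one possible successor, so almost all of the remaining uncertainty sits there. A longer context would, under the same counting scheme, shrink it further.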

theGnuMe · on Nov 19, 2023

That's not information theoretic, that's just conditional probability.

tysam_and · on Nov 27, 2023

You might want to take another look at Shannon's paper, lol, this statement is quite contradictory. Probability _is_ the backbone of information theory, dude! It's quite incredible.

nighthawk454 · on Nov 20, 2023

It is conditional probability, but that is a fundamental concept used in information theory.
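
For reference, these are the standard definitions that tie the two together. The symbols are my own choice here: $X$ is the next word and $C$ is the context it is conditioned on.

```latex
\begin{align}
H(X) &= -\sum_{x} p(x)\,\log_2 p(x) \\
H(X \mid C) &= -\sum_{x,\,c} p(x, c)\,\log_2 p(x \mid c) \\
I(X; C) &= H(X) - H(X \mid C) \;\ge\; 0
\end{align}
```

The conditional probability $p(x \mid c)$ sits inside the conditional entropy, and the mutual information $I(X; C)$ is the reduction in uncertainty about the next word from knowing the context, which is never negative.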