Amusingly, the question "which substring occurs earlier on average" is different from the question "which substring is more likely to occur before the other". In fact the second question sometimes has a circular answer! For example, THH typically (with >50% probability) occurs before HHT, which typically occurs before HTT, which typically occurs before TTH, which typically occurs before THH.
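The cycle above can be checked exactly. Here is a minimal sketch using Conway's overlap ("leading number") formula, which is my choice of method and not something stated above; the function names `overlap`, `expected_wait` and `prob_first` are made up for illustration. It assumes a fair coin and two distinct patterns of the same length.

```python
from fractions import Fraction

def overlap(x, y):
    """Sum of 2**k over every k where the last k flips of x equal the first k flips of y."""
    total = 0
    for k in range(1, min(len(x), len(y)) + 1):
        if x[-k:] == y[:k]:
            total += 2 ** k
    return total

def expected_wait(x):
    """Average number of fair-coin flips until pattern x first appears."""
    return overlap(x, x)

def prob_first(a, b):
    """Probability that pattern a appears before pattern b (Conway's formula)."""
    a_wins = overlap(b, b) - overlap(b, a)
    b_wins = overlap(a, a) - overlap(a, b)
    return Fraction(a_wins, a_wins + b_wins)

if __name__ == "__main__":
    cycle = ["THH", "HHT", "HTT", "TTH", "THH"]
    for a, b in zip(cycle, cycle[1:]):
        print(a, "before", b, "with probability", prob_first(a, b))
    for p in ["THH", "HHT", "HTT", "TTH", "HHH"]:
        print(p, "average wait", expected_wait(p))
```

Under these assumptions the four probabilities around the cycle come out as 3/4, 2/3, 3/4 and 2/3, while all four patterns have the same average waiting time of 8 flips (HHH, by contrast, takes 14 on average). So the two questions really do come apart.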
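Also the question "which substring occurs earlier on average" is intimately connected with algorithms for substring search. For example, if you want to check that a string doesn't contain HHH, it can be enough to look at every third character (if each of them is T, no HHH fits anywhere), but for THH that's not enough, because both T and H occur in THH, so no single character rules out every alignment around it.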
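To make the "every third character" point concrete, here is a rough sketch of a skipping search for a run like HHH. The function `contains_run` is a hypothetical helper written for this comment, not a standard library routine, and it only handles patterns made of one repeated character.

```python
def contains_run(s, ch, k):
    """Return True if s contains k consecutive copies of ch.

    Probes every k-th character; a probe that is not ch rules out
    every window of length k that overlaps it.
    """
    i = k - 1
    while i < len(s):
        if s[i] != ch:
            i += k  # no run of length k can include position i
            continue
        # The probe hit ch: measure the run of ch around position i.
        lo = i
        while lo > 0 and s[lo - 1] == ch:
            lo -= 1
        hi = i
        while hi + 1 < len(s) and s[hi + 1] == ch:
            hi += 1
        if hi - lo + 1 >= k:
            return True
        # s[hi + 1] is not ch (or is past the end), so skip k past it.
        i = hi + 1 + k
    return False

if __name__ == "__main__":
    print(contains_run("TTHTTHTTH", "H", 3))
    print(contains_run("THHTHHH", "H", 3))
```

On a string with few H's this reads roughly a third of the characters. The same trick doesn't carry over to THH: whichever character a probe returns, some alignment of THH is still consistent with it, so a search has to advance by fewer than three positions.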
Fascinating stuff.