In addition to the classic n-gram approaches everyone is suggesting, it is now fairly easy to use a word2vec (google it) representation of the words instead. Obtain a mapping from each word to a vector of x numbers (either by finding a pretrained word2vec model on the internet or by training one on texts from your own domain), and then just run clustering on those vectors instead of on the raw words.
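
To make the idea concrete, here is a minimal sketch in plain Python. The `EMBEDDINGS` table is made up for illustration and stands in for a real word2vec model (real vectors usually have a few hundred dimensions), and the small `kmeans` function is just one possible clustering choice; the names `EMBEDDINGS`, `kmeans` and `cluster_words` are mine, not from any library. In practice you would probably load the vectors with a library such as gensim and use an existing clustering implementation.

```python
import math
import random

# Hypothetical 3-dimensional vectors standing in for a real word2vec model.
# The numbers are invented for this example only.
EMBEDDINGS = {
    "cat":    [0.90, 0.10, 0.00],
    "dog":    [0.80, 0.20, 0.10],
    "kitten": [0.95, 0.05, 0.05],
    "car":    [0.10, 0.90, 0.10],
    "truck":  [0.00, 0.95, 0.20],
    "bus":    [0.10, 0.85, 0.15],
}


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Very small k-means: returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k distinct data points
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels


def cluster_words(words, k):
    """Group words by clustering their vectors; unknown words are skipped."""
    known = [w for w in words if w in EMBEDDINGS]
    labels = kmeans([EMBEDDINGS[w] for w in known], k)
    clusters = {}
    for word, label in zip(known, labels):
        clusters.setdefault(label, []).append(word)
    return list(clusters.values())


if __name__ == "__main__":
    for group in cluster_words(["cat", "dog", "kitten", "car", "truck", "bus"], 2):
        print(group)
```

Words that are missing from the mapping are simply dropped here; with a real model you would have to decide how to handle out-of-vocabulary words, and cosine distance is often a better fit for word vectors than the Euclidean distance used in this sketch.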