A while ago I had a coffee chat with a former colleague at Arimo Inc., where we somehow got to talking about K-means and its properties. At some point I confidently claimed that K-means is just a special case of Gaussian mixture models, with all clusters sharing the same identity covariance matrix. I had no proof, but deep down I believed it was true, and my intuition backed me up (c'mon, the K-means training algorithm is almost exactly the EM algorithm used to train a GMM). I was even tempted to sit down and write the maths out, but we went on discussing other stuff and forgot about K-means.
Without a proof, I failed to convince my colleague, but the question kept hanging in my head. Plus, ever since the slides I made several years ago, I have been planning to write something serious about GMM and its relatives, but have never managed to.
Last night the question somehow popped up again. For god's sake, I can't handle too many questions in my head at once, so I had to sort this one out (it seemed to be the easiest, compared to the others).
The answer is actually written in Bishop's book, section 9.3.2, which is excerpted below.
There we go. Take a GMM, constrain the covariance matrices of all the components to be εI, where ε is a fixed variance shared by all the components, and take the limit ε → 0: we get K-means.
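To see the limit numerically, here is a minimal sketch (toy data point and means are made up for illustration) of the E-step responsibilities when every component has covariance εI: each responsibility is a softmax of −‖x − μ_k‖²/(2ε), and as ε shrinks the softmax collapses onto the nearest mean, i.e. the hard K-means assignment.

```python
import numpy as np

# Hypothetical toy setup: one data point and two cluster means.
x = np.array([0.0, 0.0])
mus = np.array([[1.0, 0.0],   # mu_1, at distance 1 from x
                [3.0, 0.0]])  # mu_2, at distance 3 from x

def responsibilities(x, mus, eps, pis=None):
    """E-step responsibilities for a GMM whose components all have covariance eps * I."""
    k = len(mus)
    pis = np.full(k, 1.0 / k) if pis is None else pis
    sq_dists = np.sum((mus - x) ** 2, axis=1)
    # Softmax of log pi_k - ||x - mu_k||^2 / (2 eps); shift by the max for stability.
    logits = np.log(pis) - sq_dists / (2.0 * eps)
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

print(responsibilities(x, mus, eps=4.0))   # large eps: genuinely soft assignment
print(responsibilities(x, mus, eps=0.01))  # tiny eps: essentially one-hot, like K-means
```

With ε = 4 the point is shared between the two components; with ε = 0.01 virtually all the responsibility lands on the nearest mean, which is exactly the hard assignment step of K-means.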
Things get interesting when you don't take the limit, in which case the assignments stay soft and you end up with the fuzzy C-means algorithm. If you take the limit but the covariance matrices are not constrained to be identity, you get the elliptical K-means algorithm.
Although my gut feeling was wrong (I missed the limit detail), at the end of the day, learning something new every day is a rewarding experience.