fa.bianp.net

Handwritten digits and Locally Linear Embedding

Category: General, Python, scikit-learn

I decided to test my new Locally Linear Embedding (LLE) implementation against a real dataset. At first I didn't think this would turn out very well, since LLE seems to be somewhat fragile, yielding largely different results for small differences in parameters such as number of neighbors or tolerance, but as it turns out, results are not bad at all. The idea is to take a handwritten digit, stored as a 8x8 pixel image and flatten it into a an array of 8x8 = 64 floating-point values.

image0

Then each handwritten digit can be seen as a point in a 64-dimensional space. Of course, visualizing in 64-dimensional spaces is not easy, and that's where Locally Linear Embedding comes handy. We'll use this method to reduce the dimension from 64 to 2 with the hope of preserving most of the underlying manifold structure. The following is a plot of the handwritten digits {0, 1, 2, 3, 4} after performing locally linear embedding. As you can see, some groups are nicely clustered, notably the 0 is isolated while other like {4, 5} are closer, precisely those that are more similar.

image1

Source code for this example can be found here but relies on my manifold branch of scikit-learn.