Handwritten digits and Locally Linear Embedding

By Fabian Pedregosa.

Category: General, Python, scikit-learn

Wed 04 May 2011

I decided to test my new Locally Linear Embedding (LLE) implementation against a real dataset. At first I didn't think this would turn out very well, since LLE seems to be somewhat fragile, yielding very different results for small changes in parameters such as the number of neighbors or the tolerance. As it turns out, the results are not bad at all. The idea is to take a handwritten digit, stored as an 8x8 pixel image, and flatten it into an array of 8x8 = 64 floating-point values.
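
As a minimal sketch of that flattening step, assuming NumPy and a made-up image (the random pixel values below are placeholders, not real digit data):

```python
import numpy as np

# Hypothetical 8x8 grayscale image; random values stand in for real pixel intensities.
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Flatten row by row into a single vector of 8x8 = 64 floating-point values.
point = image.reshape(-1).astype(np.float64)

# The flattening loses nothing: reshaping back recovers the original image.
restored = point.reshape(8, 8)
```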

Each handwritten digit can then be seen as a point in a 64-dimensional space. Of course, visualizing 64-dimensional data is not easy, and that's where Locally Linear Embedding comes in handy. We'll use this method to reduce the dimension from 64 to 2, in the hope of preserving most of the underlying manifold structure. The following is a plot of the handwritten digits {0, 1, 2, 3, 4} after performing locally linear embedding. As you can see, some groups are nicely clustered: notably, the 0 is isolated, while others such as {4, 5} lie closer together, precisely those that are more similar.
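
For readers who want to try something similar, here is a rough sketch of the same reduction. It is not the code behind the plot above: it assumes the `LocallyLinearEmbedding` class and the `load_digits` loader that ship with current scikit-learn releases, and `n_neighbors=30` and the dense eigensolver are illustrative choices of mine.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

# Load only the digits 0 through 4; each row of X is a flattened 8x8 image.
digits = load_digits(n_class=5)
X, y = digits.data, digits.target

# Assumed parameters: 30 neighbors is an illustrative value, and the dense
# eigensolver is chosen here because the dataset is small.
lle = LocallyLinearEmbedding(n_neighbors=30, n_components=2, eigen_solver="dense")

# Map every 64-dimensional point to a 2-dimensional one.
X_2d = lle.fit_transform(X)
```

Each row of `X_2d` can then be drawn as a point in the plane and colored by its label in `y`. Since LLE is sensitive to the number of neighbors, it is worth trying a few values before trusting any one picture.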

Source code for this example can be found here, but it relies on my manifold branch of scikit-learn.