Keep the gradient flowing

Low-level routines for Support Vector Machines

I've been working lately on improving the low-level API of the libsvm bindings in scikit-learn. The goal is to provide an API that encourages efficient use of these libraries by expert users. These methods have lower overhead than the object-oriented interface because they are closer to the …

new get_blas_funcs in scipy.linalg

Today some changes I made to the function scipy.linalg.get_blas_funcs() were merged. The main enhancement is that get_blas_funcs() now also accepts a single string as input parameter and a dtype, so that fetching the BLAS function for a specific type becomes more natural. For example, fetching the gemm routine for …
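A minimal sketch of the single-string call, assuming a current SciPy: `get_blas_funcs('gemm', dtype=...)` returns one function (`dgemm` for float64, `sgemm` for float32), which is then called directly to compute `alpha * A @ B`. The matrices are random test data.

```python
import numpy as np
from scipy.linalg import get_blas_funcs

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

# a single routine name plus a dtype returns a single function (dgemm here)
gemm = get_blas_funcs('gemm', dtype=np.float64)
# gemm computes alpha * A @ B; no C is given, so beta * C contributes nothing
C = gemm(1.0, A, B)
print(np.allclose(C, A @ B))  # True

# the same name with float32 fetches the single-precision routine (sgemm)
sgemm = get_blas_funcs('gemm', dtype=np.float32)
C32 = sgemm(1.0, A.astype(np.float32), B.astype(np.float32))
```

Passing the dtype explicitly avoids having to build dummy arrays just so that get_blas_funcs can infer the type from them.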

Locally linear embedding and sparse eigensolvers

I've been working for some time on implementing a locally linear embedding algorithm for the upcoming manifold module in scikit-learn. While several implementations of this algorithm exist in Python, as far as I know none of them is able to use a sparse eigensolver in the last step of the …
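To illustrate where a sparse eigensolver enters, here is a minimal LLE sketch, not the scikit-learn implementation: the helper names `lle_cost_matrix` and `lle_embedding`, the regularization constant and the small negative shift are all illustrative choices. It builds the sparse matrix M = (I - W)^T (I - W) from the reconstruction weights and asks `scipy.sparse.linalg.eigsh` for its bottom eigenvectors in shift-invert mode.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh
from scipy.spatial import cKDTree


def lle_cost_matrix(X, n_neighbors=10, reg=1e-3):
    """Sparse M = (I - W)^T (I - W) built from LLE reconstruction weights W."""
    n = X.shape[0]
    # query k + 1 neighbors because each point is its own nearest neighbor
    _, ind = cKDTree(X).query(X, k=n_neighbors + 1)
    ind = ind[:, 1:]
    rows, cols, vals = [], [], []
    for i in range(n):
        Z = X[ind[i]] - X[i]                    # neighbors centered on X[i]
        G = Z @ Z.T                             # local Gram matrix (k x k)
        # regularize: G is singular when k exceeds the data dimension
        G += reg * np.trace(G) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        rows.extend([i] * n_neighbors)
        cols.extend(ind[i])
        vals.extend(w / w.sum())                # weights of each row sum to one
    W = sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))
    I_W = sparse.identity(n, format="csr") - W
    return (I_W.T @ I_W).tocsc()


def lle_embedding(M, n_components=2, shift=-1e-6):
    """Bottom eigenvectors of the sparse matrix M via shift-invert eigsh."""
    # M - shift * I is positive definite, and the eigenvalues closest to a
    # small negative shift are the smallest eigenvalues of M
    evals, evecs = eigsh(M, k=n_components + 1, sigma=shift)
    order = np.argsort(evals)
    # the bottom eigenvector is constant (eigenvalue 0), so it is discarded
    return evecs[:, order[1:]], evals[order]


rng = np.random.default_rng(0)
t = rng.uniform(0, 3, 300)
X = np.column_stack([np.cos(t), rng.uniform(0, 1, 300), np.sin(t)])
M = lle_cost_matrix(X)
Y, evals = lle_embedding(M)
print(Y.shape)  # (300, 2)
```

Only a handful of eigenpairs of a very sparse n x n matrix are needed, which is why a sparse solver pays off over a dense eigendecomposition as n grows. In current scikit-learn, `LocallyLinearEmbedding` exposes this choice through its `eigen_solver` parameter (`'arpack'` or `'dense'`).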

scikits.learn is now part of pythonxy

The guys behind pythonxy have been kind enough to add the latest scikit-learn as an additional plugin for their distribution. Having scikit-learn in both pythonxy and EPD will hopefully make it easier to use for Windows users. For now I will continue to make Windows precompiled binaries, but pythonxy …

Least squares with equality constraint

The following algorithm computes the least squares solution, that is, the x minimizing ||Ax - b||, subject to the equality constraint Bx = d. It's a classic algorithm that can be implemented using only a QR decomposition and a least squares solver. This implementation uses numpy and scipy. It makes use of the new linalg.solve_triangular function …
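A minimal sketch of that classic QR-based approach (often called the null-space method), assuming B has full row rank; the function name `lse` is hypothetical and this is not necessarily the exact code from the post. The QR decomposition of B^T splits x into a part fixed by the constraint, obtained with `solve_triangular`, and a free part found with an ordinary least squares solve.

```python
import numpy as np
from scipy import linalg


def lse(A, b, B, d):
    """Minimize ||A x - b||_2 subject to B x = d.

    Assumes B (p x n) has full row rank p <= n and that A restricted to the
    null space of B has full column rank, so the solution is unique.
    """
    p = B.shape[0]
    Q, R = linalg.qr(B.T)          # B^T = Q R; Q is n x n, R is n x p
    R1 = R[:p, :p]                 # upper-triangular p x p block
    Q1, Q2 = Q[:, :p], Q[:, p:]    # Q2 spans the null space of B
    # writing x = Q1 y1 + Q2 y2, the constraint becomes R1^T y1 = d,
    # a lower-triangular system solved with the transpose of R1
    y1 = linalg.solve_triangular(R1, d, trans='T')
    # y2 is unconstrained: minimize ||(A Q2) y2 - (b - A Q1 y1)||
    y2, *_ = linalg.lstsq(A @ Q2, b - A @ (Q1 @ y1))
    return Q1 @ y1 + Q2 @ y2


rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 6)), rng.standard_normal(20)
B, d = rng.standard_normal((2, 6)), rng.standard_normal(2)
x = lse(A, b, B, d)
print(np.allclose(B @ x, d))  # True: the constraint holds
```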

A profiler for Python extensions

Profiling Python extensions has not been a pleasant experience for me, so I made my own package to do the job. Existing alternatives were either hard to use, forcing you to recompile with custom flags (as with gprof), or desperately slow, like valgrind/callgrind. The package I'll talk about is called …

scikit-learn coding sprint in Paris

Yesterday was the scikit-learn coding sprint in Paris. It was great to meet with old developers (Vincent Michel) and new ones, some of whom I already knew from the mailing list, while others came just to say hi and get familiar with the code. It was really great …

py3k in scikit-learn

One thing I'd really like to see done in this Friday's scikit-learn sprint is full support for Python 3. There's a branch where the hard work has been done (porting C extensions, automatic 2to3 conversion, etc.), although joblib still has some bugs and no one has attempted to …

Computing the vector norm

Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10. Last week I discussed with Gael how we should compute the Euclidean norm of a vector a using SciPy. Two approaches suggest themselves, either calling scipy.linalg.norm …
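To make the stability concern concrete, here is a small sketch comparing the naive formula sqrt(a · a) with the BLAS `nrm2` routine fetched through `scipy.linalg.get_blas_funcs`; the helper names `naive_norm` and `blas_norm` are hypothetical. Squaring entries of order 1e200 overflows double precision, while nrm2 implementations are written to avoid that intermediate overflow.

```python
import numpy as np
from scipy import linalg


def naive_norm(a):
    # squares every entry before summing: overflows once entries exceed ~1e154
    return np.sqrt(np.dot(a, a))


def blas_norm(a):
    # BLAS nrm2 is designed to avoid the intermediate overflow of the naive formula
    nrm2 = linalg.get_blas_funcs('nrm2', dtype=a.dtype)
    return nrm2(a)


a = np.array([1e200, 1e200])
print(naive_norm(a))  # inf
print(blas_norm(a))   # finite, about 1.414e200
```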

Smells like hacker spirit

Last weekend I was at FOSDEM presenting scikits.learn (here are the slides I used at the Data Analytics Devroom). Kudos to Olivier Grisel and all the people who organized such a fun and authentic meeting!

New examples in scikits.learn 0.6

The latest release of scikits.learn comes with an awesome collection of examples. These are some of my favorites:

Faces recognition

This example by Olivier Grisel downloads a 58MB faces dataset from Labeled Faces in the Wild and uses PCA for feature extraction and SVC for classification, yielding …
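A minimal sketch of the same PCA + SVC pipeline with current scikit-learn. To keep it self-contained, it swaps in the small digits dataset bundled with scikit-learn instead of downloading Labeled Faces in the Wild, and the number of components and C are illustrative, untuned values.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# digits stands in for Labeled Faces in the Wild (it ships with scikit-learn)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# PCA extracts features (the "eigenfaces" step), then an RBF SVC classifies them
model = make_pipeline(PCA(n_components=30, whiten=True, random_state=0),
                      SVC(kernel="rbf", C=10))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```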

Weighted samples for SVMs

Based on the work of libsvm-dense by Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu, I patched the libsvm distribution shipped with scikits.learn to allow setting weights for individual instances. The motivation behind this is to be able to force a classifier to focus its attention on …
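In current scikit-learn the per-instance weights are passed through the `sample_weight` argument of `fit`. A minimal sketch on hypothetical 1-D toy data: the point at 1.9 sits right next to the other class, and giving it a large weight pushes the linear SVM to classify it correctly.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D toy data: class 0 at 0, 1 and 1.9; class 1 at 2 and 3
X = np.array([[0.0], [1.0], [1.9], [2.0], [3.0]])
y = np.array([0, 0, 0, 1, 1])

# the weight rescales C for each sample, so violating the margin of the
# point at 1.9 becomes 100 times more expensive than for the others
weights = np.array([1.0, 1.0, 100.0, 1.0, 1.0])
clf = SVC(kernel="linear", C=1.0).fit(X, y, sample_weight=weights)
print(clf.predict([[1.9]]))  # [0]
```

With this weight, misclassifying the point at 1.9 would cost more than moving the boundary past the point at 2, so the solver gives up on the latter instead.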

Coming soon ...

Highlights for this release:

* New stochastic gradient descent module by Peter Prettenhofer
* Improved svm module: memory efficiency, automatic class weights.
* Wrapper for liblinear's multi-class SVC (option multi_class in LinearSVC)
* New features and performance improvements of text feature extraction.
* Improved sparse matrix support, both in main classes (GridSearch) and in sparse …
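The `multi_class` option of `LinearSVC` can be sketched as follows with current scikit-learn; the iris dataset and the standardization step are illustrative choices. `crammer_singer` optimizes a single joint objective over all classes, whereas the default `ovr` trains one binary classifier per class.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
# multi_class="crammer_singer" solves one joint multi-class problem
# instead of the default one-vs-rest decomposition ("ovr")
clf = make_pipeline(StandardScaler(),
                    LinearSVC(multi_class="crammer_singer", max_iter=10000))
clf.fit(X, y)
print(clf[-1].coef_.shape)  # (3, 4): one weight vector per class
```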

memory efficient bindings for libsvm

scikits.learn.svm now uses LibSVM-dense instead of LibSVM for some support vector machine related algorithms when the input is a dense matrix. As a result, most of the copies associated with argument passing are avoided, giving a 50% smaller memory footprint and several times less than the python bindings that ship …

solve triangular matrices using scipy.linalg

For some time now I've been missing a function in scipy that exploits the triangular structure of a matrix to efficiently solve the associated system, so I decided to implement it by binding the LAPACK routine "trtrs", which also checks for singularities and is capable of handling several right-hand sides. Contrary …
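A short usage sketch of `scipy.linalg.solve_triangular` as it exists in current SciPy: one call solves a lower-triangular system for three right-hand sides at once, and an exact zero on the diagonal is reported as a `LinAlgError`. The test matrices are arbitrary.

```python
import numpy as np
from scipy.linalg import LinAlgError, solve_triangular

rng = np.random.default_rng(0)
n = 5
# lower-triangular matrix with a dominant diagonal, hence nonsingular
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)
# three right-hand sides, solved in a single call
B = rng.standard_normal((n, 3))
X = solve_triangular(L, B, lower=True)
print(np.allclose(L @ X, B))  # True

# a zero on the diagonal makes the matrix singular; trtrs detects it
L_sing = L.copy()
L_sing[2, 2] = 0.0
try:
    solve_triangular(L_sing, B, lower=True)
except LinAlgError as err:
    print("singular:", err)
```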

LARS algorithm

I've been working lately with Alexandre Gramfort on coding the LARS algorithm in scikits.learn. This algorithm computes the solution to several general linear models used in machine learning: LAR, Lasso, Elastic Net and Forward Stagewise. Unlike the implementation by coordinate descent, the LARS algorithm gives the full coefficient path along the …
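A sketch of computing the full path with `sklearn.linear_model.lars_path` from current scikit-learn, on the bundled diabetes dataset (an illustrative choice). Each column of `coefs` holds the coefficients at one breakpoint of the path, so the whole path comes out of a single call instead of one solve per regularization value.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import lars_path

X, y = load_diabetes(return_X_y=True)

# method="lasso" returns the whole Lasso regularization path in one call;
# method="lar" would give plain Least Angle Regression instead
alphas, active, coefs = lars_path(X, y, method="lasso")

print(coefs.shape)    # (n_features, n_alphas): one column per breakpoint
print(coefs[:, 0])    # at the largest alpha every coefficient is zero
```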

Second scikits.learn coding sprint

Last week the second scikits.learn sprint took place in Paris. It was two days of insane activity (115 commits, 6 branches, 33 coffees) in which we did a lot of work, both implementing new algorithms and fixing or improving old ones. This includes:

* sparse version of Lasso by coordinate …

Support for sparse matrices in scikits.learn

I recently added support for sparse matrices (as defined in scipy.sparse) to some classifiers in scikits.learn. In those classes, the fit method runs the algorithm without converting to a dense representation and also stores parameters in an efficient format. Right now, the only classes that implement …
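A minimal sketch with current scikit-learn, on random sparse data with hypothetical labels: `SVC` is fit directly on a CSR matrix, and the fitted support vectors stay in a sparse format.

```python
import numpy as np
from scipy import sparse
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# 100 samples, 50 features, 10% nonzeros, stored in CSR format
X = sparse.random(100, 50, density=0.1, format="csr", random_state=rng)
# hypothetical labels: is the row sum above the median?
row_sums = np.asarray(X.sum(axis=1)).ravel()
y = (row_sums > np.median(row_sums)).astype(int)

# fit works on the CSR matrix directly, without a dense copy of X
clf = SVC(kernel="linear", C=100).fit(X, y)
print(sparse.issparse(clf.support_vectors_))  # True
print(clf.score(X, y))
```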

Flags to debug Python C extensions

I often find myself debugging Python C extensions from gdb, but usually some variables are hidden because of the aggressive optimizations that distutils enables by default. What I did not know is that you can prevent those optimizations by passing the flags -O0 -fno-inline to gcc in the keyword extra_compile_args (note: this will only …
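A minimal sketch of that setting with setuptools, whose `Extension` accepts the same `extra_compile_args` keyword as distutils; the module name `example` and the file `example.c` are placeholders. These flags are appended after the default ones, and gcc honors the last `-O` option it sees, so `-O0` overrides the default optimization level.

```python
from setuptools import Extension

# "example" and "example.c" are placeholder names for your own extension
ext = Extension(
    "example",
    sources=["example.c"],
    # disable optimizations and inlining so gdb can show every variable;
    # these come after the default flags, and for gcc the last -O option wins
    extra_compile_args=["-O0", "-fno-inline"],
)
# then pass ext_modules=[ext] to setup() as usual
```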

July in Paris

One of the best things about spending summer in Paris: its parks (here, with friends @ Parc Montsouris).

Support Vector Machines with custom kernels using scikits.learn

It is now possible (using the development version as of May 2010) to use Support Vector Machines with custom kernels in scikits.learn. Using it couldn't be simpler: you just pass a callable (the kernel) to the class constructor. For example, a linear kernel would be implemented …
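A minimal sketch with current scikit-learn, where `SVC` still accepts a callable as `kernel`: the callable receives two data matrices and must return their Gram matrix. The function name `linear_kernel` and the blob data are illustrative; on this well-separated data, the custom kernel should reproduce the predictions of the built-in `kernel="linear"`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def linear_kernel(X, Y):
    # Gram matrix of inner products between the rows of X and the rows of Y
    return np.dot(X, Y.T)


X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]], random_state=0)

custom = SVC(kernel=linear_kernel).fit(X, y)
builtin = SVC(kernel="linear").fit(X, y)
print((custom.predict(X) == builtin.predict(X)).all())
```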

How to link against a system-wide BLAS library using numpy.distutils

If your numpy installation uses system-wide BLAS libraries (this will most likely be the case unless you installed it through prebuilt Windows binaries), you can retrieve this information at compile time to link Python modules to BLAS. The function get_info in numpy.distutils.system_info will return a dictionary that contains …

scikits.learn 0.2 release

Today I released a new version of the scikits.learn library for machine learning. This new release includes the new libsvm bindings, Jake VanderPlas' BallTree algorithm for *fast* nearest neighbor queries in high dimension, etc. Here is the official announcement. As usual, it can be downloaded from SourceForge or from …
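A minimal query sketch using `sklearn.neighbors.BallTree` as it exists in current scikit-learn; the random 20-dimensional data and `leaf_size=40` are arbitrary choices.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))  # 1000 points in 20 dimensions

tree = BallTree(X, leaf_size=40)
# distances and indices of the 3 nearest neighbors of the first point
dist, ind = tree.query(X[:1], k=3)
print(ind[0, 0], dist[0, 0])  # 0 0.0: a point is its own nearest neighbor
```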

Plot the maximum margin hyperplane with scikits.learn

Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in. In the case of support vector machines, a data point is viewed as a p-dimensional vector (2-dimensional in this example), and we want …
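A minimal sketch of computing that hyperplane and its margins with a linear `SVC` in current scikit-learn, on illustrative blob data. For a linear kernel, `coef_` and `intercept_` give w and b in w · x + b = 0, so in two dimensions the hyperplane is a line; the margin lines are where w · x + b = ±1. The resulting arrays can be passed to matplotlib's `plot`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=[[-2, -2], [2, 2]], random_state=6)
# a large C approximates the hard (maximum) margin classifier
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane
b = clf.intercept_[0]  # offset
xx = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
# solve w0 * x + w1 * y + b = c for y, with c = 0 (hyperplane) and c = +-1 (margins)
yy = -(w[0] * xx + b) / w[1]
yy_pos = -(w[0] * xx + b - 1) / w[1]
yy_neg = -(w[0] * xx + b + 1) / w[1]
print("margin width:", 2 / np.linalg.norm(w))
```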

Fast bindings for LibSVM in scikits.learn

LibSVM is a C++ library that implements several Support Vector Machine algorithms that are commonly used in machine learning. It is a fast library that has no dependencies, and most machine learning frameworks bind it in some way or another. LibSVM comes with a Python interface written in SWIG, but …