Low-level routines for Support Vector Machines

By Fabian Pedregosa.

Category: General, Python, scikit-learn

Wed 27 April 2011

I've been working lately on improving the low-level API of the libsvm bindings in scikit-learn. The goal is to provide an API that encourages efficient use of these libraries by expert users. These methods have lower overhead than the object-oriented interface because they are closer to the …

new get_blas_funcs in scipy.linalg

By Fabian Pedregosa.

Category: General, Python, scipy

Sat 23 April 2011

Some changes I made to the function scipy.linalg.get_blas_funcs() were merged today. The main enhancement is that get_blas_funcs() now also accepts a single string as input parameter and a dtype, so that fetching the BLAS function for a specific type becomes more natural. For example, fetching the gemm routine for …
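As a minimal sketch of the call style described above (the arrays and the choice of float32 are illustrative assumptions, not taken from the post), a single routine name plus a dtype returns the matching typed BLAS function:

```python
import numpy as np
from scipy.linalg import get_blas_funcs

# Illustrative single-precision inputs (assumed for this sketch).
a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
b = np.array([[5.0, 6.0], [7.0, 8.0]], dtype=np.float32)

# A single string and a dtype select the typed routine directly.
gemm = get_blas_funcs("gemm", dtype=np.float32)

# gemm computes alpha * (a @ b); here alpha = 1.
c = gemm(1.0, a, b)
```

The returned function records the BLAS type prefix it was resolved to in its `typecode` attribute, which is a quick way to check that the single-precision routine was picked.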

Locally linear embedding and sparse eigensolvers

By Fabian Pedregosa.

Category: General, Python, scikit-learn

Thu 21 April 2011

I've been working for some time on implementing a locally linear embedding algorithm for the upcoming manifold module in scikit-learn. While several implementations of this algorithm exist in Python, as far as I know none of them is able to use a sparse eigensolver in the last step of the …

scikits.learn is now part of pythonxy

By Fabian Pedregosa.

Category: General, Python, scikit-learn

Wed 20 April 2011

The guys behind pythonxy have been kind enough to add the latest scikit-learn as an additional plugin for their distribution. Having scikit-learn in both pythonxy and EPD will hopefully make it easier to use for Windows users. For now I will continue to make precompiled Windows binaries, but pythonxy …

Least squares with equality constraint

By Fabian Pedregosa.

Category: Python, Tecnología

Thu 14 April 2011

The following algorithm computes the least squares solution that minimizes ||Ax - b|| subject to the equality constraint Bx = d. It's a classic algorithm that can be implemented using only a QR decomposition and a least squares solver. This implementation uses numpy and scipy. It makes use of the new linalg.solve_triangular function …
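The excerpt is cut off before the code, so the following is only a sketch of the textbook null-space approach, assuming B has full row rank; it is not necessarily the post's implementation. The function name `lse` and the random test data are hypothetical. It uses exactly the ingredients named above: a QR decomposition of B^T, a triangular solve, and an unconstrained least squares solve.

```python
import numpy as np
from scipy import linalg

def lse(A, b, B, d):
    """Minimize ||Ax - b|| subject to Bx = d (B assumed full row rank)."""
    p = B.shape[0]
    # Full QR of B^T: Q is n x n, the top p x p block of R is upper triangular.
    Q, R = linalg.qr(B.T)
    # Bx = d becomes R^T y = d, where y holds the first p entries of Q^T x.
    y = linalg.solve_triangular(R[:p, :p], d, trans="T")
    AQ = A @ Q
    # The remaining entries z are free: solve an ordinary least squares problem.
    z, *_ = linalg.lstsq(AQ[:, p:], b - AQ[:, :p] @ y)
    return Q[:, :p] @ y + Q[:, p:] @ z

# Hypothetical data for illustration.
rng = np.random.RandomState(0)
A, b = rng.randn(6, 4), rng.randn(6)
B, d = rng.randn(2, 4), rng.randn(2)
x = lse(A, b, B, d)
```

The returned x satisfies the constraint up to rounding, which can be checked with `np.allclose(B @ x, d)`.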

A profiler for Python extensions

By Fabian Pedregosa.

Category: General, Python

Wed 06 April 2011

Profiling Python extensions has not been a pleasant experience for me, so I made my own package to do the job. Existing alternatives were either hard to use, forcing you to recompile with custom flags, like gprofile, or desperately slow, like valgrind/callgrind. The package I'll talk about is called …

scikit-learn coding sprint in Paris

By Fabian Pedregosa.

Category: General, scikit-learn

Sat 02 April 2011

Yesterday was the scikit-learn coding sprint in Paris. It was great to meet with old developers (Vincent Michel) and new ones: some of whom I was already familiar with from the mailing list while others came just to say hi and get familiar with the code. It was really great …

py3k in scikit-learn

By Fabian Pedregosa.

Category: General

Mon 28 March 2011

One thing I'd really like to see done in this Friday's scikit-learn sprint is to have full support for Python 3. There's a branch where the hard work has been done (porting C extensions, automatic 2to3 conversion, etc.), although joblib still has some bugs and no one has attempted to …

Computing the vector norm

By Fabian Pedregosa.

Category: misc
#linear algebra #norm #scipy

Tue 15 February 2011

Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10. Last week I discussed with Gael how we should compute the Euclidean norm of a vector a using SciPy. Two approaches suggest themselves, either calling scipy.linalg.norm …

Smells like hacker spirit

By Fabian Pedregosa.

Category: misc
#python #sklearn

Fri 11 February 2011

Last weekend I was at FOSDEM presenting scikits.learn (here are the slides I used at the Data Analytics Devroom). Kudos to Olivier Grisel and all the people who organized such a fun and authentic meeting!

New examples in scikits.learn 0.6

By Fabian Pedregosa.

Category: General, scikit-learn, Tecnología

Fri 31 December 2010

The latest release of scikits.learn comes with an awesome collection of examples. These are some of my favorites:

Faces recognition

This example by Olivier Grisel downloads a 58MB faces dataset from Labeled Faces in the Wild and performs PCA for feature extraction and SVC for classification, yielding …

Weighted samples for SVMs

By Fabian Pedregosa.

Category: sklearn, python

Mon 29 November 2010

Based on the work of libsvm-dense by Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu, I patched the libsvm distribution shipped with scikits.learn to allow setting weights for individual instances. The motivation behind this is to be able to force a classifier to focus its attention on …

Coming soon ...

By Fabian Pedregosa.

Category: scikit-learn, Tecnología

Wed 24 November 2010

Highlights for this release:

* New stochastic gradient descent module by Peter Prettenhofer
* Improved svm module: memory efficiency, automatic class weights.
* Wrap for liblinear's Multi-class SVC (option multi_class in LinearSVC)
* New features and performance improvements of text feature extraction.
* Improved sparse matrix support, both in main classes (GridSearch) as in sparse …

memory efficient bindings for libsvm

By Fabian Pedregosa.

Category: General, scikit-learn

Fri 19 November 2010

scikits.learn.svm now uses LibSVM-dense instead of LibSVM for some support vector machine related algorithms when input is a dense matrix. As a result, most of the copies associated with argument passing are avoided, reducing the memory footprint by 50% and making it several times smaller than that of the Python bindings that ship …

solve triangular matrices using scipy.linalg

By Fabian Pedregosa.

Category: scipy, Tecnología

Sat 30 October 2010

For some time now I've been missing a function in scipy that exploits the triangular structure of a matrix to efficiently solve the associated system, so I decided to implement it by binding the LAPACK method "trtrs", which also checks for singularities and is capable of handling several right-hand sides. Contrary …
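A minimal sketch of the resulting function, scipy.linalg.solve_triangular, solving a lower triangular system with two right-hand sides at once. The matrices are made up for illustration.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Illustrative lower triangular system (values assumed for this sketch).
L = np.array([[3.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

# Each column of B is a separate right-hand side.
B = np.array([[3.0, 6.0],
              [3.0, 6.0],
              [2.0, 4.0]])

# lower=True tells the solver to use only the lower triangle of L.
X = solve_triangular(L, B, lower=True)
```

Each column of X solves the system for the corresponding column of B, so `L @ X` reproduces B up to rounding.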

LARS algorithm

By Fabian Pedregosa.

Category: misc
#scikit-learn #sparse

Thu 30 September 2010

I've been working lately with Alexandre Gramfort coding the LARS algorithm in scikits.learn. This algorithm computes the solution to several general linear models used in machine learning: LAR, Lasso, Elasticnet and Forward Stagewise. Unlike the implementation by coordinate descent, the LARS algorithm gives the full coefficient path along the …
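To illustrate what "the full coefficient path" means, here is a small sketch using lars_path from the current sklearn package. That import path is an assumption, since the post predates the rename from scikits.learn, and the synthetic data is made up.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Synthetic regression problem with two informative features (assumed data).
rng = np.random.RandomState(0)
X = rng.randn(50, 5)
true_coef = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = X @ true_coef + 0.01 * rng.randn(50)

# method="lasso" returns the Lasso path; method="lar" gives plain LAR.
alphas, active, coefs = lars_path(X, y, method="lasso")

# coefs has one column per breakpoint of the path, one row per feature.
for alpha, coef in zip(alphas, coefs.T):
    print(f"alpha={alpha:.4f}  nonzero features={np.flatnonzero(coef).tolist()}")
```

Each column of `coefs` is the coefficient vector at one value of the regularization parameter in `alphas`, starting from the all-zero solution.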

Second scikits.learn coding sprint

By Fabian Pedregosa.

Category: scikit-learn

Sun 12 September 2010

Last week the second scikits.learn sprint took place in Paris. It was two days of insane activity (115 commits, 6 branches, 33 coffees) in which we did a lot of work, both implementing new algorithms and fixing or improving old ones. This includes: * sparse version of Lasso by coordinate …

Support for sparse matrices in scikits.learn

By Fabian Pedregosa.

Category: General

Mon 23 August 2010

I recently added support for sparse matrices (as defined in scipy.sparse) in some classifiers of scikits.learn. In those classes, the fit method will perform the algorithm without converting to a dense representation and will also store parameters in an efficient format. Right now, the only classes that implement …

Flags to debug python C extensions.

By Fabian Pedregosa.

Category: General

Wed 18 August 2010

I often find myself debugging Python C extensions from gdb, but usually some variables are hidden because of the aggressive optimizations that distutils sets by default. What I did not know is that you can prevent those optimizations by passing the flags -O0 -fno-inline to gcc in the keyword extra_compile_args (note: this will only …
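A sketch of where the keyword goes in a build script. It uses setuptools.Extension, which accepts the same extra_compile_args keyword as the distutils class mentioned above. The module name and source file are hypothetical placeholders.

```python
from setuptools import Extension

# Hypothetical extension module; "example_ext" and its source file are placeholders.
ext = Extension(
    "example_ext",
    sources=["example_ext.c"],
    # Passed verbatim to the compiler: disable optimization and inlining for gdb.
    extra_compile_args=["-O0", "-fno-inline"],
)

# In a real setup.py this object would then be handed to setup(), e.g.
#   setup(name="example_ext", ext_modules=[ext])
```

Because the flags are passed to the compiler as given, they only make sense for gcc-compatible toolchains.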

July in Paris

By Fabian Pedregosa.

Category: General

Fri 30 July 2010

One of the best things about spending the summer in Paris: its parks (here, with friends @ Parc Montsouris).

Support Vector machines with custom kernels using scikits.learn

By Fabian Pedregosa.

Category: General, scikit-learn, Tecnología

Thu 27 May 2010

It is now possible (using the development version as of May 2010) to use Support Vector Machines with custom kernels in scikits.learn. Using it couldn't be simpler: you just pass a callable (the kernel) to the class constructor. For example, a linear kernel would be implemented …
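The excerpt stops before the code, so this is only a sketch of the idea, written against the current sklearn package (an assumption: in 2010 the import path was scikits.learn). The helper name `linear_kernel` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel(X, Y):
    """Gram matrix of the linear kernel between the rows of X and the rows of Y."""
    return np.dot(X, Y.T)

# Toy, linearly separable data (assumed for illustration).
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

# The callable is passed directly as the kernel argument of the constructor.
clf = SVC(kernel=linear_kernel)
clf.fit(X, y)
pred = clf.predict(np.array([[0.5, 0.5], [2.5, 2.5]]))
```

The callable receives two arrays of samples and must return their kernel (Gram) matrix of shape (n_samples_X, n_samples_Y).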

Howto link against system-wide BLAS library using numpy.distutils

By Fabian Pedregosa.

Category: General

Thu 22 April 2010

If your numpy installation uses system-wide BLAS libraries (this will most likely be the case unless you installed it through prebuilt Windows binaries), you can retrieve this information at compile time to link Python modules to BLAS. The function get_info in numpy.distutils.system_info will return a dictionary that contains …

scikits.learn 0.2 release

By Fabian Pedregosa.

Category: General

Mon 22 March 2010

Today I released a new version of the scikits.learn library for machine learning. This new release includes the new libsvm bindings, Jake VanderPlas' BallTree algorithm for *fast* nearest neighbor queries in high dimension, etc. Here is the official announcement. As usual, it can be downloaded from sourceforge or from …

Plot the maximum margin hyperplane with scikits.learn

By Fabian Pedregosa.

Category: General, scikit-learn, Tecnología

Wed 17 March 2010

Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in. In the case of support vector machines, a data point is viewed as a p-dimensional vector (2-dimensional in this example), and we want …
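As a rough sketch of the setting described above (not the post's plotting code, which the excerpt omits), a linear SVC from the current sklearn package can be fit on 2-dimensional points and its separating hyperplane read back from the fitted attributes. The six data points are made up.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable classes in the plane (assumed data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# The hyperplane is w . x + b = 0; the margin boundaries are w . x + b = +/- 1.
w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)
```

The points stored in `support_vectors_` are the ones lying on the margin boundaries; they alone determine the hyperplane.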

Fast bindings for LibSVM in scikits.learn

By Fabian Pedregosa.

Category: General, scikit-learn, Tecnología

Tue 09 March 2010

LibSVM is a C++ library that implements several Support Vector Machine algorithms that are commonly used in machine learning. It is a fast library that has no dependencies, and most machine learning frameworks bind it in some way or another. LibSVM comes with a Python interface written in SWIG, but …