I've been working for some time on implementing a locally linear
embedding algorithm for the upcoming manifold module in scikit-learn.
While several implementations of this algorithm exist in Python, as far
as I know none of them is able to use a sparse eigensolver in the last
step of the …
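
The last step is where a sparse eigensolver pays off. Below is a minimal sketch of locally linear embedding built on our own assumptions: brute-force neighbor search, a hypothetical regularization constant `reg`, and shift-invert around a small negative `sigma`. It is not the scikit-learn implementation. The weight matrix W has only `n_neighbors` non-zeros per row, so M = (I - W)^T (I - W) is sparse too, and `scipy.sparse.linalg.eigsh` can find its bottom eigenvectors without a dense eigendecomposition. The brute-force neighbor search still builds a dense distance matrix, which is fine for small examples.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh


def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Minimal locally linear embedding; returns an (n_samples, n_components) array."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # 1. Brute-force k nearest neighbors (fine for small n).
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :n_neighbors]

    # 2. Reconstruction weights: each point as an affine combination of its neighbors.
    rows, cols, vals = [], [], []
    for i in range(n):
        Z = X[neighbors[i]] - X[i]       # neighbors centered on x_i
        C = Z @ Z.T                      # local Gram matrix (k x k), usually singular
        trace = np.trace(C)
        C += np.eye(n_neighbors) * (reg * trace if trace > 0 else reg)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        w /= w.sum()                     # weights sum to one
        rows.extend([i] * n_neighbors)
        cols.extend(neighbors[i])
        vals.extend(w)
    W = sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))

    # 3. Sparse eigenproblem on M = (I - W)^T (I - W).
    IW = sparse.identity(n, format="csr") - W
    M = (IW.T @ IW).tocsc()
    # Shift-invert around a slightly negative sigma targets the eigenvalues of M
    # closest to zero while keeping M - sigma*I non-singular (M is PSD).
    evals, evecs = eigsh(M, k=n_components + 1, sigma=-1e-6)
    order = np.argsort(evals)
    # Drop the bottom eigenvector (approximately constant, eigenvalue ~0).
    return evecs[:, order[1:]]
```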

The guys behind pythonxy have been kind enough to add the latest
scikit-learn as an additional plugin for their distribution. Having
scikit-learn in both pythonxy and EPD will hopefully make it
easier for Windows users to use. For now I will continue
to make precompiled Windows binaries, but pythonxy …

The following algorithm computes the least squares solution, min ||Ax - b||,
subject to the equality constraint Bx = d. It's a classic algorithm
that can be implemented using only a QR decomposition and a least
squares solver. This implementation uses numpy and scipy. It makes use
of the new linalg.solve_triangular function …
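
One classic way to do this with only a QR decomposition and a least squares solver is the null-space method, sketched below. This is our own illustration and may differ in detail from the implementation the text refers to. The function name `lse` is hypothetical, and B is assumed to have full row rank.

```python
import numpy as np
from scipy import linalg


def lse(A, b, B, d):
    """Solve min ||Ax - b|| subject to Bx = d (null-space method).

    Assumes B (p x n) has full row rank p < n.
    """
    p = B.shape[0]
    # QR of B^T: B^T = Q R, so B = R1^T Q1^T with R1 the top p x p block of R.
    Q, R = linalg.qr(B.T)
    Q1, Q2 = Q[:, :p], Q[:, p:]
    R1 = R[:p, :p]
    # Write x = Q1 y + Q2 z; the constraint Bx = d reduces to R1^T y = d.
    y = linalg.solve_triangular(R1, d, trans="T")
    # z is free: an ordinary least squares problem in the null space of B.
    z = linalg.lstsq(A @ Q2, b - A @ (Q1 @ y))[0]
    return Q1 @ y + Q2 @ z
```

Only the triangular solve and the unconstrained least squares problem remain once B^T has been factored, which is why `solve_triangular` is a natural fit here.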

Profiling Python extensions has not been a pleasant experience for me,
so I made my own package to do the job. Existing alternatives were
either hard to use, forcing you to recompile with custom flags (as with
gprof), or desperately slow, like valgrind/callgrind. The package I'll
talk about is called …

Yesterday was the scikit-learn coding sprint in Paris. It was great to
meet both long-standing developers (Vincent Michel) and new ones, some of
whom I already knew from the mailing list, while others came just
to say hi and get familiar with the code. It was really great …

One thing I'd really like to see done in this Friday's scikit-learn
sprint is full support for Python 3. There's a branch where
the hard work has been done (porting C extensions, automatic 2to3
conversion, etc.), although joblib still has some bugs and no one has
attempted to …

**Update: a fast and stable norm was added to scipy.linalg in August
2011 and will be available in scipy 0.10** Last week I discussed with
Gael how we should compute the Euclidean norm of a vector a using
SciPy. Two approaches suggest themselves: either calling
scipy.linalg.norm …
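
To make the stability question concrete, here is a small sketch (the function names are ours) that contrasts the naive sqrt(a . a) with a rescaled computation and with a direct call to the BLAS `nrm2` routine obtained through `scipy.linalg.get_blas_funcs`. For entries around 1e200 the naive version overflows even though the norm itself is representable.

```python
import numpy as np
from scipy.linalg import get_blas_funcs


def naive_norm(a):
    # sqrt(a . a): simple and fast, but a . a can overflow for large entries
    a = np.asarray(a, dtype=float)
    return np.sqrt(np.dot(a, a))


def scaled_norm(a):
    # Divide by max|a_i| first so the sum of squares stays in range; a
    # simplified form of the rescaling the reference BLAS nrm2 performs
    a = np.asarray(a, dtype=float)
    scale = np.max(np.abs(a)) if a.size else 0.0
    if scale == 0.0:
        return 0.0
    return scale * np.sqrt(np.dot(a / scale, a / scale))


def blas_norm(a):
    # Call the BLAS nrm2 routine matching the array's dtype (dnrm2 for float64)
    a = np.asarray(a, dtype=float)
    nrm2 = get_blas_funcs("nrm2", (a,))
    return nrm2(a)
```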

Last weekend I was at FOSDEM presenting scikits.learn (here are
the slides I used at the Data Analytics Devroom). Kudos to Olivier
Grisel and all the people who organized such a fun and authentic
meeting!

The latest release of scikits.learn comes with an awesome collection of
examples. These are some of my favorites:

Based on the work of libsvm-dense by Ming-Wei Chang, Hsuan-Tien Lin,
Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu I patched the libsvm
distribution shipped with scikits.learn to allow setting weights for
individual instances. The motivation behind this is to be able to force a
classifier to focus its attention on …

Highlights for this release:

* New stochastic gradient descent module by Peter Prettenhofer
* Improved svm module: memory efficiency, automatic class weights
* Wrapper for liblinear's multi-class SVC (option multi_class in LinearSVC)
* New features and performance improvements in text feature extraction
* Improved sparse matrix support, both in main classes (GridSearch) and in
  sparse …

scikits.learn.svm now uses LibSVM-dense instead of LibSVM for
some support vector machine related algorithms when the input is a dense
matrix. As a result, most of the copies associated with argument passing
are avoided, cutting the memory footprint by 50% and making it several
times smaller than that of the python bindings that ship …

For some time now I've been missing a function in scipy that exploits
the triangular structure of a matrix to efficiently solve the associated
system, so I decided to implement it by binding the LAPACK method
"trtrs", which also checks for singularities and is capable of handling
several right-hand sides. Contrary …
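
As an illustration of the resulting interface, assuming the `scipy.linalg.solve_triangular` signature of current SciPy releases, the sketch below solves an upper-triangular system with two right-hand sides at once, solves the transposed system through the `trans` argument, and shows a zero on the diagonal being reported as a `LinAlgError`.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Upper-triangular system with two right-hand sides (the columns of B).
U = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [8.0, 4.0]])

# Back-substitution only: U is already triangular, so no LU factorization is needed.
X = solve_triangular(U, B, lower=False)

# Solve U^T y = B without forming the transposed matrix explicitly.
Y = solve_triangular(U, B, trans="T")

# A zero on the diagonal is detected by trtrs and raised as LinAlgError.
S = np.array([[1.0, 2.0],
              [0.0, 0.0]])
try:
    solve_triangular(S, np.ones(2))
    singular_detected = False
except np.linalg.LinAlgError:
    singular_detected = True
```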

I've been working lately with Alexandre Gramfort coding the LARS
algorithm in scikits.learn. This algorithm computes the solution to
several general linear models used in machine learning: LAR, Lasso,
Elastic Net and Forward Stagewise. Unlike implementations based on
coordinate descent, the LARS algorithm gives the full coefficient path
along the …
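
LARS itself is somewhat involved, so as a simpler illustration of what a "full coefficient path" means, here is a sketch of incremental forward stagewise regression, one of the models listed above. This is not the LARS algorithm: it takes many tiny steps (of size `eps`, an assumed parameter) instead of LARS's exact piecewise-linear moves, and the function name is hypothetical.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=500):
    """Incremental forward stagewise regression; returns the coefficient path.

    Row k of the result holds the coefficients after k steps.
    Assumes the columns of X are standardized and y is centered.
    """
    n_features = X.shape[1]
    coef = np.zeros(n_features)
    path = [coef.copy()]
    residual = np.array(y, dtype=float)  # copy, so y is left untouched
    for _ in range(n_steps):
        corr = X.T @ residual
        j = np.argmax(np.abs(corr))      # feature most correlated with the residual
        step = eps * np.sign(corr[j])
        coef[j] += step                  # nudge that single coefficient
        residual -= step * X[:, j]
        path.append(coef.copy())
    return np.array(path)
```

Plotting each column of the returned array against the step index gives a picture of the whole path, which is what coordinate descent at a single penalty value does not provide.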

Last week the second scikits.learn sprint took place in Paris. It was
two days of insane activity (115 commits, 6 branches, 33 coffees) in
which we did a lot of work, both implementing new algorithms and fixing
or improving old ones. This includes:

* sparse version of Lasso by coordinate …

I recently added support for sparse matrices (as defined in
scipy.sparse) in some classifiers of scikits.learn. In those classes,
the fit method will perform the algorithm without converting to a dense
representation and will also store parameters in an efficient format.
Right now, the only classes that implement …
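
The key idea, never densifying the input, can be sketched with a toy solver (hypothetical, not one of the scikits.learn classes). If an algorithm only needs products such as X @ w and X.T @ r, the same code runs unchanged on a NumPy array or on a `scipy.sparse` CSR matrix.

```python
import numpy as np
from scipy import sparse


def least_squares_gd(X, y, lr=0.1, n_iter=200):
    """Gradient descent for min (1/2n)||Xw - y||^2, for dense or scipy.sparse X.

    Only matrix-vector products are used, so a sparse X is never densified.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)          # parameters stay a small dense vector
    for _ in range(n_iter):
        residual = X @ w - y
        grad = X.T @ residual / n_samples
        w -= lr * grad
    return w
```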

I often find myself debugging Python C extensions from gdb, but usually
some variables are hidden because of the aggressive optimizations that
distutils sets by default. What I did not know is that you can prevent
those optimizations by passing the flags -O0 -fno-inline to gcc through the
keyword extra_compile_args (note: this will only …
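
A minimal sketch of such a build configuration follows. The module and file names are placeholders. It uses `setuptools.Extension`, which accepts the same `extra_compile_args` keyword as the distutils class the text mentions (distutils itself is no longer in the standard library as of Python 3.12). Because these arguments are appended after the default flags, and GCC honors the last `-O` option it sees, `-O0` should override the default optimization level.

```python
from setuptools import Extension

# Hypothetical extension module, compiled without optimization or inlining
# so that gdb can inspect local variables.
debug_ext = Extension(
    "mypackage._fastmodule",
    sources=["mypackage/_fastmodule.c"],
    extra_compile_args=["-O0", "-fno-inline"],
)

# In setup.py this would then be passed on, e.g.:
# setup(name="mypackage", ext_modules=[debug_ext])
```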

One of the best things about spending summer in Paris: its parks (here,
with friends @ Parc Montsouris).

It is now possible (using the development version as of May 2010) to use
Support Vector Machines with custom kernels in scikits.learn. Using it
couldn't be simpler: you just pass a callable (the kernel) to the class
constructor. For example, a linear kernel would be implemented …
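
For illustration, here are two kernels written as plain NumPy callables that take two sample matrices and return their Gram matrix. This is a sketch under assumptions: the `gamma` value is arbitrary, and the exact calling convention of the 2010 development version may differ. In current scikit-learn such a callable would be passed as `svm.SVC(kernel=linear_kernel)`.

```python
import numpy as np


def linear_kernel(X, Y):
    """Linear kernel: matrix of inner products, shape (n_samples_X, n_samples_Y)."""
    return np.dot(X, np.transpose(Y))


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is illustrative."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    # Squared distances via ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 for rounding
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0))
```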

If your numpy installation uses system-wide BLAS libraries (this will
most likely be the case unless you installed it through prebuilt Windows
binaries), you can retrieve this information at compile time to link
Python modules to BLAS. The function get_info in
numpy.distutils.system_info will return a dictionary that contains …
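
Here is a minimal sketch of how that lookup might feed an extension build, assuming the `get_info(name, notfound_action)` function and the `"blas_opt"` section name from numpy.distutils. numpy.distutils is deprecated and unavailable on Python 3.12 and later, so the import is guarded. The helper name `blas_build_kwargs` is hypothetical.

```python
def blas_build_kwargs():
    """Return Extension keyword arguments describing numpy's BLAS, or {}.

    An empty dict means numpy.distutils is unavailable or no BLAS was found.
    """
    try:
        from numpy.distutils.system_info import get_info
    except ImportError:
        return {}
    info = get_info("blas_opt", 0)  # 0: return {} silently if BLAS is not found
    keys = ("libraries", "library_dirs", "include_dirs",
            "define_macros", "extra_compile_args", "extra_link_args")
    # Keep only the entries an Extension(...) call understands
    return {k: info[k] for k in keys if k in info}


# Hypothetical usage in setup.py:
# Extension("mymodule", sources=["mymodule.c"], **blas_build_kwargs())
```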

Today I released a new version of the scikits.learn library for
machine learning. This new release includes the new libsvm bindings,
Jake VanderPlas' BallTree algorithm for *fast* nearest neighbor
queries in high dimension, etc. Here is the official announcement. As
usual, it can be downloaded from SourceForge or from …

Suppose some given data points each belong to one of two classes, and
the goal is to decide which class a new data point will be in. In the
case of support vector machines, a data point is viewed as a
p-dimensional vector (2-dimensional in this example), and we want …
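
Here is a small sketch of the geometry, using our own illustration with a hand-picked hyperplane rather than a trained one. A hyperplane is given by a normal vector `w` and an offset `b`, and a point is classified by the sign of w . x + b. The geometric margin is the smallest distance from a correctly classified point to the hyperplane. Among separating hyperplanes, an SVM chooses the one that maximizes this margin.

```python
import numpy as np


def decision_function(X, w, b):
    """Signed score w . x + b; its sign gives the predicted class (+1 / -1)."""
    return X @ w + b


def geometric_margin(X, y, w, b):
    """Smallest value of y_i (w . x_i + b) / ||w|| over the data.

    Positive means every point lies on its correct side of the hyperplane.
    """
    return np.min(y * decision_function(X, w, b)) / np.linalg.norm(w)


# Four 2-dimensional points and a hand-picked hyperplane x1 + x2 = 0
X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0
```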

LibSVM is a C++ library that implements several Support Vector
Machine algorithms that are commonly used in machine learning. It is a
fast library that has no dependencies and most machine learning
frameworks bind it in some way or another. LibSVM comes with a Python
interface written in SWIG, but …

Yesterday we had an extremely productive coding sprint for
scikits.learn. The idea was to put people with common interests in a
room and make them work in a single codebase. Alexandre Gramfort and
Olivier Grisel worked on GLMNet, Bertrand Thirion and Gaël Varoquaux
worked on univariate feature selection …

Today I released the first public version of Scikit-Learn (release
notes). It's a python module implementing some machine learning
algorithms, and it's shaping up quite well.

For this release I did not want to make any incompatible changes, so most
changes are just bug fixes and updates. For the next …