the

making-of

Fabian Pedregosa

Data Science Meetup

Sheffield, April 2014

Outline

  • 1. Why scikit-learn
  • 2. Project Vision
  • 3. Key Features
  • 4. Current Development

1

Who

2

Day-to-day technicalities

3

Project conception

4

Project Vision

Generic machine learning library.

5

Project Vision

Easy to use machine learning library.

6

Project Vision

Community project

7

The beginning

8

The Example Gallery

9

Community

10

Welcome Newcomers

11

The API

12

JMLR paper

13

But also ...

Highlighting the need for more controlled code review

14

Scikit-learn today

15

Key features

Consistent API
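
One way to picture what a consistent API means in practice is the short sketch below. It is an illustration only, not taken from the talk: the dataset and the two estimators are arbitrary choices, and it assumes a reasonably recent scikit-learn release. The point is that two unrelated models are driven by the same fit / predict / score calls.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, chosen only for illustration.
X, y = load_iris(return_X_y=True)

# Two unrelated models that expose the same estimator interface.
estimators = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]

scores = {}
for est in estimators:
    est.fit(X, y)                 # same training call for every estimator
    predictions = est.predict(X)  # same prediction call
    scores[type(est).__name__] = est.score(X, y)  # accuracy on the training data

print(scores)
```

Because the loop never inspects which model it holds, swapping in another classifier should only require changing the list.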

16

Key features

Comprehensive

Clustering, Covariance Estimators, Matrix Decomposition, Ensemble Methods, Feature Extraction, Feature Selection, Gaussian Processes, Isotonic regression, Kernel Approximation, Semi-Supervised Learning, Linear Discriminant Analysis, Generalized Linear Models, Manifold Learning, Gaussian Mixture Models, Multiclass and multilabel classification, Naive Bayes, Nearest Neighbors, Neural network models, Cross decomposition (PLS), Quadratic Discriminant Analysis, Random projections, Support Vector Machines, Decision Trees

17

Key features

Healthy community-driven project

credit: F. Perez, A. Meurer

18

Key features

Fast (most methods, most use cases)

credit: Gilles Louppe

20

Key features

Effortless parallelization
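
A minimal sketch of what this can look like, assuming the `n_jobs` parameter that many scikit-learn estimators accept. The synthetic dataset, the choice of a random forest, and the number of workers are illustrative assumptions, not details from the talk.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, generated only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Identical models except for the number of workers used to build the trees.
serial = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=0).fit(X, y)
parallel = RandomForestClassifier(n_estimators=50, n_jobs=2, random_state=0).fit(X, y)

# With a fixed random_state, the worker count should not change the predictions.
same = np.array_equal(serial.predict(X), parallel.predict(X))
print(same)
```

The only change between the two models is one constructor argument; the calling code stays the same.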

21

Key features

Documentation

demo

22

Big Data

23

Work in Progress

23

Impact

Not only in academia!

24

Acknowledgment

To all the people who have contributed, and especially

(Granada coding sprint 2011)

25