The Painless Route in Python to Fast and Scalable Machine Learning

Victoriya Fedotova, Frank Schlimbach

Analytics, Big Data, Distributed Systems, Machine-Learning, Scientific Libraries (Numpy/Pandas/SciKit/...)


Python is the lingua franca for data analytics and machine learning. Its superior productivity makes it the preferred tool for prototyping. However, traditional Python packages are not necessarily designed to provide high performance and scalability for large datasets.
In this talk you will learn how to get close-to-native performance from Intel-optimized packages such as numpy, scipy, and scikit-learn. The next part of the talk focuses on achieving high performance and scalability at every scale, from multiple cores on a single machine to large clusters of workstations. We will demonstrate that Python can achieve the same performance and scalability as hand-tuned C++/MPI code:
- The Scalable Dataframe Compiler (SDC) compiles analytics code written with pandas/Python and scales it to bare-metal cluster performance. It compiles a subset of Python into efficient parallel binaries that use message passing for collective communication.
- daal4py, a convenient Python API for data analytics and machine learning primitives. Its interface is scikit-learn-like, and its MPI-based engine allows machine learning algorithms to scale to bare-metal cluster performance.
- You will also learn how to use SDC and daal4py together to build an end-to-end analytics pipeline that scales to clusters with only minimal code changes.
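One common way that message-passing engines scale an analytics algorithm is to have each process compute a partial result on its own block of data and then merge those partial results with a collective reduction. The snippet below is a minimal conceptual sketch of that pattern in plain Python, computing a global mean and variance. It assumes nothing about the internals of SDC or daal4py and uses neither library nor MPI: each list in `blocks` stands in for the data held by one hypothetical process, and `Partial`, `local_step`, `combine`, and `finalize` are illustrative names only.

```python
# Conceptual sketch only: no MPI, SDC, or daal4py APIs are used.
# Each list in `blocks` stands in for the data owned by one hypothetical rank.
from dataclasses import dataclass
from functools import reduce


@dataclass
class Partial:
    """Per-block partial result: element count, sum, and sum of squares."""
    n: int
    s: float
    ss: float


def local_step(block):
    # Work that each rank could do independently on its own data block.
    return Partial(len(block), sum(block), sum(x * x for x in block))


def combine(a, b):
    # Associative merge, so partial results can be reduced in any tree order.
    return Partial(a.n + b.n, a.s + b.s, a.ss + b.ss)


def finalize(p):
    # Global mean and population variance from the merged partial result.
    mean = p.s / p.n
    var = p.ss / p.n - mean * mean
    return mean, var


blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
partials = [local_step(b) for b in blocks]  # local computation on each "rank"
total = reduce(combine, partials)           # stands in for a collective reduce
mean, var = finalize(total)
print(mean, var)
```

Because `combine` is associative, the same merge could run as a tree-shaped reduction across many processes. The sum-of-squares formula is kept deliberately simple here; production libraries may use numerically more stable merge rules.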

Type: Talk (45 mins); Python level: Beginner; Domain level: Intermediate


Victoriya Fedotova

Intel

Victoriya is a leading software engineer working on the development and optimization of data analytics and machine learning algorithms in the Intel® oneAPI Data Analytics Library (oneDAL), a library for high-performance data science computations aimed at speeding up big data workloads.
Before joining the Intel® oneDAL team, she made a notable contribution to extending the Intel® Math Kernel Library with statistical components such as Summary Statistics and Data Fitting. Victoriya has more than 10 years of experience in software optimization.
She holds a master’s degree in Mathematics and Computer Science from Nizhny Novgorod State University.

Frank Schlimbach

Intel

Software Architect at Intel with an HPC background, now pathfinding to make Python technologies ready for HPC.