Diffprivlib: Privacy-preserving machine learning with Scikit-learn

Train machine learning models with differential privacy guarantees

Naoise Holohan

Data Privacy, Data Protection, Machine-Learning, Open-Source, Scientific Libraries (Numpy/Pandas/SciKit/...)

Data privacy is having an ever-increasing impact on the way data is stored, processed, accessed and utilised, as data protection regulations and their legal and ethical consequences take effect around the globe. Differential privacy, considered by many to be the strongest privacy guarantee currently available, gives robust, provable guarantees on protecting privacy, and allows tasks to be completed on data while guaranteeing the privacy of the individuals in that data. This naturally extends to machine learning, where training datasets can contain sensitive personal information that is vulnerable to privacy attacks on trained models.
By using differential privacy in the training process, a machine learning model can be trained to accurately represent the dataset at large without inadvertently revealing sensitive information about any individual. Diffprivlib is the first library of its kind to leverage the power of differential privacy with scikit-learn and NumPy, giving data scientists and researchers the tools to train accurate, portable models with robust, provable privacy guarantees built in.
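To make the idea of "representing the dataset at large without revealing an individual" concrete, here is a minimal sketch of the Laplace mechanism applied to a mean, using only the Python standard library. It is not taken from the talk or from diffprivlib: the function name `private_mean`, the example ages and the chosen bounds are all assumptions made for illustration, and the naive floating-point sampling shown here is not suitable for production use.

```python
import random


def private_mean(values, lower, upper, epsilon, rng):
    """Return an epsilon-differentially private estimate of the mean.

    Hypothetical helper for illustration only. Assumes the number of
    records is public and that `lower`/`upper` come from domain
    knowledge rather than from the data itself.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")

    # Clip each value so a single record has bounded influence.
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)

    # Replacing one record changes the clipped mean by at most this much.
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon

    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / n + noise


# Illustrative data: ages assumed to lie in [18, 90].
ages = [23, 35, 41, 29, 52, 60, 38, 47]
rng = random.Random(0)
noisy_mean = private_mean(ages, lower=18, upper=90, epsilon=0.5, rng=rng)
print(f"true mean: {sum(ages) / len(ages):.3f}, private mean: {noisy_mean:.3f}")
```

A smaller `epsilon` gives a stronger privacy guarantee but adds more noise; with more records the sensitivity shrinks, so the same guarantee costs less accuracy.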
In this talk, we will introduce attendees to the idea of differential privacy, why it is necessary in today's world, and how diffprivlib can be seamlessly integrated within existing scripts to protect your trained models from privacy vulnerabilities. Attendees will be expected to have a basic understanding of sklearn (i.e., how to initialise and fit a model, and how to predict with it). No knowledge of data privacy or differential privacy will be assumed or required.

Type: Talk (30 mins); Python level: Intermediate; Domain level: Beginner


Naoise Holohan

IBM Research Europe

Naoise Holohan is a research scientist in the AI, Privacy and Security Team at the Dublin Research Lab. Since joining IBM in 2017, he has been the team's differential privacy lead, contributing to the Data Privacy Toolkit and leading the development of the Differential Privacy Library, an open-source machine learning library for differential privacy. Prior to joining IBM, Naoise graduated with a PhD from Trinity College Dublin for his work on "Mathematical Foundations of Differential Privacy".