Mastering a data pipeline with Python: 6 years of learned lessons from mistakes

Robson Junior

Beginners Big Data Case Study Data Science Open-Source

Building data pipelines is a well-established task: a vast number of tools automate the work and help developers create data pipelines with a few clicks in the cloud. Such tools may solve simple or well-defined standard problems. This presentation demystifies years of experience and painful mistakes using Python as the core for creating reliable data pipelines and managing insanely large amounts of valuable data. We'll cover how each piece fits into the puzzle: data acquisition, ingestion, transformation, storage, workflow management, and serving. We'll also walk through best practices and possible issues, covering PySpark vs. Dask and Pandas, Airflow, and Apache Arrow as a new approach.

Type: Talk (45 mins); Python level: Beginner; Domain level: Beginner

Robson Junior

Microsoft

Robson is a developer deeply involved with software communities, especially the Python community. He has been organizing conferences and meetups since 2011, speaking at conferences about Python and cloud technologies since 2012, and speaking about data-related technologies since 2016. As an independent consultant, he also conducts on-demand architecture consulting and training sessions on data-related technologies.
