PyData Amsterdam 2024

Mastering Data Flow: Empower Your Projects with Prefect's Pipeline Magic
09-18, 13:30–15:00 (Europe/Amsterdam), Rokin Room - OBA Oosterdok

Embark on a transformative journey into the realm of data engineering with our 90-minute workshop dedicated to the recently released Prefect 3. In this hands-on session, participants will learn the ins and outs of building robust data pipelines using the latest features and enhancements of Prefect 3. From data ingestion to advanced analytics, attendees will gain practical experience and insights to elevate their data engineering skills.


Join us for an engaging workshop where we'll dive deep into the world of data engineering with Prefect 3. Throughout the session, participants will explore the following key topics:

  • Overview of Prefect and its core features
  • Understanding the Prefect ecosystem and its integration with popular data science tools
  • Setting up a Prefect environment: Installation, configuration, and project setup (a minimal example follows this list)
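
As a taste of the setup material, here is a minimal sketch of a Prefect 3 "hello world" flow, assuming only that Prefect has been installed (for example via `pip install prefect`):

```python
# Minimal Prefect 3 sketch: one task orchestrated by one flow.
from prefect import flow, task


@task
def say_hello(name: str) -> str:
    """A task is a unit of work that Prefect tracks, logs, and can retry."""
    return f"Hello, {name}!"


@flow(log_prints=True)
def hello_flow(name: str = "PyData Amsterdam"):
    """A flow orchestrates tasks and records each run with the Prefect API."""
    greeting = say_hello(name)
    print(greeting)


if __name__ == "__main__":
    hello_flow()
```

Running `prefect server start` in a separate terminal brings up the local Prefect UI, where the flow run and its logs appear.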

Building Data Pipelines:
- Data ingestion: Fetching data from various sources including RSS feeds, APIs, and databases
- Data transformation and manipulation using Prefect tasks and flows
- Data storage and persistence: Storing processed data in local databases such as MongoDB or a SQL database (illustrated in the sketch after this list)
- Integrating machine learning models for advanced data processing and analysis within Prefect workflows
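
To make these steps concrete, here is a hedged sketch of the kind of fetch, transform, and store pipeline the workshop works towards. The feed URL, database and collection names, and the `MONGODB_URI` environment variable are illustrative placeholders rather than the workshop's actual configuration, and it assumes `feedparser` and `pymongo` are installed alongside Prefect:

```python
# Sketch of an ingest -> transform -> store pipeline with Prefect 3.
# FEED_URL, the "workshop"/"articles" names, and MONGODB_URI are placeholders.
import os

import feedparser                # pip install feedparser
from pymongo import MongoClient  # pip install pymongo
from prefect import flow, task

FEED_URL = "https://example.com/feed.xml"  # hypothetical RSS feed


@task
def fetch_entries(url: str) -> list[dict]:
    """Ingest: pull raw entries from an RSS feed."""
    parsed = feedparser.parse(url)
    return [dict(entry) for entry in parsed.entries]


@task
def transform(entries: list[dict]) -> list[dict]:
    """Transform: keep only the fields we care about."""
    return [{"title": e.get("title"), "link": e.get("link")} for e in entries]


@task
def store(docs: list[dict]) -> int:
    """Persist: write the documents to MongoDB (local Docker or Atlas)."""
    if not docs:
        return 0
    client = MongoClient(os.environ["MONGODB_URI"])
    result = client["workshop"]["articles"].insert_many(docs)
    return len(result.inserted_ids)


@flow(log_prints=True)
def rss_pipeline(url: str = FEED_URL):
    entries = fetch_entries(url)
    docs = transform(entries)
    print(f"Stored {store(docs)} documents")


if __name__ == "__main__":
    rss_pipeline()
```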

Advanced Techniques and Best Practices:
- Implementing error handling and retry strategies for fault tolerance and reliability (see the sketch after this list)
- Sending real-time alerts and notifications based on pipeline analysis using Prefect's notification features
- Exploring Prefect's advanced features such as parallel execution, versioning, and dependency management
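
The sketch below shows what some of these techniques look like in Prefect code: task-level retries, a flow-level `on_failure` hook standing in for a real alert (the workshop itself may use Prefect's notification blocks or automations), and `.submit()` to run independent tasks concurrently. The sources, delays, and alert message are all illustrative:

```python
# Sketch of retries, a failure hook, and concurrent task execution in Prefect 3.
import random

from prefect import flow, task


@task(retries=3, retry_delay_seconds=10)
def flaky_fetch(source: str) -> int:
    """Retries make transient failures (timeouts, rate limits) survivable."""
    if random.random() < 0.3:
        raise RuntimeError(f"transient error while fetching {source}")
    return len(source)


def alert_on_failure(flow, flow_run, state):
    """Failure hook: in a real pipeline this could send a Slack or email
    alert; here it simply prints to stdout."""
    print(f"ALERT: flow run {flow_run.name} finished in state {state.name}")


@flow(log_prints=True, on_failure=[alert_on_failure])
def advanced_flow():
    sources = ["feeds", "apis", "databases"]
    # .submit() schedules each task on the flow's task runner so that
    # independent fetches can run concurrently.
    futures = [flaky_fetch.submit(s) for s in sources]
    totals = [f.result() for f in futures]
    print(f"Fetched {sum(totals)} characters in total")


if __name__ == "__main__":
    advanced_flow()
```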

Workshop Materials and Requirements:
Participants will have access to workshop materials, including code examples, instructions, and sample datasets, which will be provided in advance via GitHub. To ensure seamless participation, attendees are required to have Docker installed on their machines as we will be running services locally through Docker or utilising free cloud services for certain components.

By the end of the workshop, attendees will have gained a comprehensive understanding of Prefect 3 and its capabilities, empowering them to design, execute, and optimise data pipelines efficiently in real-world scenarios.

We invite you to join us on this exciting journey of mastering data flows with Prefect!

In advance of the workshop, please visit the GitHub repo here: https://github.com/Cadarn/PyData-Prefect-Workshop. Clone a copy of the repository and follow the setup instructions in the README file, including:
- Setting up a new Python environment with all the required modules
- Installing Docker and Docker Compose if necessary, then running the Docker Compose file so that local copies of the software images are downloaded in advance
- Creating a free MongoDB Atlas account and setting up a project/database for the workshop (a quick connection check is sketched below)
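
As a quick sanity check once the Atlas project exists, something like the following confirms the database is reachable. The `MONGODB_URI` environment variable is just an illustrative name; substitute whatever connection string your Atlas project (or local Docker instance) provides:

```python
# Quick check that the MongoDB instance (Atlas or local Docker) is reachable.
# MONGODB_URI is an illustrative environment variable name, not a workshop requirement.
import os

from pymongo import MongoClient  # pip install pymongo


def check_mongodb_connection() -> None:
    client = MongoClient(os.environ["MONGODB_URI"], serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # raises an exception if the server is unreachable
    print("MongoDB connection OK")


if __name__ == "__main__":
    check_mongodb_connection()
```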


I spent 10 years as an astrophysics researcher analysing high-energy data from space telescopes in the search for new objects in the universe and a better understanding of what we already knew to be out there. In 2015 I transitioned to data science, joining a smart-cities startup called HAL24K. Over the next 8 years, I built data science solutions that enabled city governments and suppliers to derive actionable intelligence from their data, making cities more efficient, better informed, and better at using their resources. During that time I built and led a team of 10 data scientists and helped the company spin out four new companies. In 2022, I joined ComplyAdvantage as a Senior Data Scientist working to combat financial crime and fraud.

I have been an active member of the PyData community since 2015 and founded PyData Southampton in 2023. I am also a long-time supporter of DataKind UK in their mission to bring pro-bono data science support to charities and NGOs in the third sector.

Senior Research Fellow at University of Southampton
Research Software Engineer for the Legacy Survey of Space and Time, UK