PyData Amsterdam 2024

Building a Data Platform from scratch
09-19, 12:05–12:40 (Europe/Amsterdam), Van Gogh

Ever wondered how to start from scratch, without any existing data infrastructure? In this talk, I will share my experience of building a data platform from scratch at a startup. This talk is intended for data (platform) engineers, data scientists, and anyone interested in building a scalable data platform in the cloud using open-source tools.

I will discuss the challenges faced in designing and implementing this platform, as well as the lessons learned along the way. We'll answer questions such as: Why build a data platform at a startup? Why pick open source over alternatives? How do you deploy data infrastructure on Kubernetes? How do you build the first data products?


The talk starts by answering the question: why build a data platform at a startup? After that, we'll dive into how it was built:

  • Designing a scalable data platform on GCP (5 mins)
  • Deploying open-source data infra (PeerDB, Airflow, Airbyte, Grafana) on Kubernetes with auto-scaling (5 mins)
  • Leveraging Airflow, Airbyte, and Spark on BigQuery to power data analytics and orchestrate ETL processing (10 mins)
  • Building data imports and exports using Python, Postgres, FastAPI, and Docker (10 mins)

By the end, you should have a good idea of what to consider when building a data platform, what the pitfalls are, and how to get value from it.

I write and speak about the learnings and challenges I face in the data world, from the perspective of having worked in various data roles (Data Science, ML & Data Engineering, Tech Lead).

Currently I'm working at Palm as Data & AI Lead.

Before this, I built the data platform from scratch at Solvimon, a startup tackling the entire billing ecosystem by building a modern and flexible platform.

In my previous role at Adyen, I led the initiative that resulted in their first end-to-end machine learning solution, which boosted payment conversion rates and generated hundreds of millions of euros in additional revenue for leading global merchants (e.g. Spotify, Microsoft, Meta).