09-19, 12:05–12:40 (Europe/Amsterdam), Van Gogh
Ever wondered how to start from scratch, without any existing data infrastructure? In this talk, I will share my experience of building a data platform from scratch at a startup. This talk is intended for data (platform) engineers, data scientists, and anyone interested in building a scalable data platform in the cloud using open-source tools.
I will discuss the challenges faced in designing and implementing this platform, as well as the lessons learned along the way. We'll answer questions such as: Why build a data platform at a startup? Why pick open source over alternatives? How do you deploy data infrastructure on Kubernetes? How do you build the first data products?
This talk will start by answering the question: Why build a data platform at a startup? After that, we'll dive into how it was built:
- Designing a scalable data platform on GCP (5 mins)
- Deploying open-source data infra (PeerDB, Airflow, Airbyte, Grafana) on Kubernetes with auto-scaling (5 mins)
- Leveraging Airflow, Airbyte, and Spark on BigQuery to power data analytics and orchestrate ETL processing (10 mins)
- Building data imports and exports using Python, Postgres, FastAPI, and Docker (10 mins)
In the end, you should have a good idea of the considerations when building a data platform, what the pitfalls are, and how to get value from it.
I write and speak about the learnings and challenges I face in the data world, from the perspective of having worked in various data roles (Data Science, ML & Data Engineering, Tech Lead).
Currently I'm working at Palm as Data & AI Lead.
Before this, I built the data platform from scratch at Solvimon, a startup tackling the entire billing ecosystem by building a modern and flexible platform.
In my previous role at Adyen, I led the initiative that resulted in their first end-to-end machine learning solution, which boosted payment conversion rates and generated hundreds of millions of euros in additional revenue for leading global merchants (e.g. Spotify, Microsoft, Meta).