PyData Amsterdam 2024

How I hacked UMAP and won at a plotting contest
09-20, 11:50–12:25 (Europe/Amsterdam), Escher

In this talk, I’ll share my journey of animating UMAP, a cutting-edge dimensionality reduction algorithm, by visualizing not just its final output but each intermediate step as well. I’ll explain why and how I modified UMAP’s source code, and demonstrate the use of Polars for data wrangling, Plotnine for visualization, and ffmpeg for animation. The result earned me a runner-up position in the 2024 Plotnine plotting contest.


Recently, I set out to deepen my understanding of UMAP, a cutting-edge dimensionality reduction algorithm. Its theoretical foundation, rooted in Riemannian geometry and algebraic topology, is fascinating, but I found the math a bit overwhelming. So I decided to take a different approach: instead of visualizing only the final output (a two-dimensional projection of a high-dimensional dataset), I wanted to visualize all the intermediate steps as well.

In this talk, I’ll take you through my journey of hacking UMAP’s source code, wrangling data with Polars, visualizing data with Plotnine, and finally animating the visualizations with ffmpeg. The result? A runner-up finish in the 2024 Plotnine plotting contest.
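
To give a flavor of the last step, here is a minimal sketch of stitching numbered PNG frames into a video with ffmpeg. It is not the code from the talk: the helper names, the frame naming pattern (frame_0000.png, frame_0001.png, ...), the frame rate, and the output filename are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(frames_dir, output="umap.mp4", fps=30):
    """Return the ffmpeg argument list that turns numbered PNG frames into an MP4.

    Assumes frames are named frame_0000.png, frame_0001.png, ...
    (a hypothetical naming scheme chosen for this sketch).
    """
    pattern = str(Path(frames_dir) / "frame_%04d.png")
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-framerate", str(fps),  # input frame rate: one PNG per video frame
        "-i", pattern,           # printf-style pattern for the image sequence
        "-c:v", "libx264",       # encode the video with H.264
        "-pix_fmt", "yuv420p",   # pixel format that most players can decode
        str(output),
    ]


def render(frames_dir, output="umap.mp4", fps=30):
    """Run ffmpeg if it is installed; return None without doing anything otherwise."""
    if shutil.which("ffmpeg") is None:
        return None
    return subprocess.run(build_ffmpeg_command(frames_dir, output, fps), check=True)
```

Calling render("frames") would then encode every frame saved in the frames directory, for example one plot per intermediate step. Note that H.264 with the yuv420p pixel format expects even frame dimensions, so the plots should be saved with an even width and height in pixels.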

No prior experience is needed; I’ll introduce each tool and concept as we go, ensuring everything is easy to follow. By the end, you’ll have a toolkit of tips and tricks to start visualizing algorithms yourself.

Jeroen Janssens, PhD, is a data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. He’s passionate about helping and teaching others to do the same.

Jeroen works as a Senior Machine Learning Engineer at Xomnia in Amsterdam. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and several startups in New York City.

He is the author of Data Science at the Command Line (O’Reilly, 2021) and Python Polars: The Definitive Guide (with Thijs Nieuwdorp; to be published by O’Reilly in January 2025). Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.

Website: https://jeroenjanssens.com