PyData Amsterdam 2024

Vincent D. Warmerdam

Vincent is a senior data professional and recovering consultant who has worked as an engineer, researcher, team lead, and educator. He's especially interested in understanding algorithmic systems so that their failures can be prevented. As such, he prefers simpler solutions that scale and worries more about data quality than about the number of tensors thrown at a problem. He's also well known for creating calmcode, as well as a small dozen open-source packages.

He's currently employed at probabl where he works together with scikit-learn core maintainers to improve the ecosystem of tooling.


Sessions

09-19, 10:35, 35 min
Run a benchmark they said. It will be fun they said.
Vincent D. Warmerdam

This is the story of a fun idea that turned into a huge benchmark before it turned into a rabbit hole.

I was trying to figure out reasonable default parameters for some of the components in the skrub library. To do that, I was looking for datasets with a permissive license that I could use for benchmarking. This is how I stumbled on some old Kaggle competitions whose datasets were still publicly available. So I should just run a simple benchmark, right?

That's where it all started. There were many lessons, and they will all be shared.

Rembrandt