PyData Amsterdam 2024

Understanding Polars Expressions when you're used to pandas
09-19, 10:35–11:10 (Europe/Amsterdam), Escher

When it comes to dataframes, pandas is the go-to library for many people. Yet Polars is taking the world by storm, and so many data practitioners are curious about trying it out. There is a learning curve though, as Polars introduces some concepts which pandas users might not be familiar with. This talk will be a deep dive into one of those concepts (expressions) and will focus on how you can understand them from a pandas perspective.


If you've taken a Polars tutorial, or read a Polars blog post before, you'll likely have seen Polars expressions (pl.col) everywhere. People are usually quick to gain an intuition for how they work - but it's often not clear to beginners how far they go. For example, pl.col('a').mean() is supported - but how about pl.col('a').filter(pl.col('b') > pl.col('c').mean()).max()? When can you combine an expression with over or group_by?

This presentation will give you a clear mental model to think about expressions with. It is specifically designed to be understandable for people coming from pandas.

The presentation will roughly follow this outline:
- Intro: why should you care about Polars?
- Super-fast Polars crash course
- Polars expressions: what are they?
- Let's try re-implementing Polars expressions...in pandas!
- Expressions in group-by / over contexts
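
As a rough illustration of the last two outline items, one way to picture an expression from a pandas perspective is as a function that takes a DataFrame and returns a Series (or a scalar, for an aggregation). The sketch below is only one possible framing and not necessarily the one used in the talk. The Expr class and the col helper are hypothetical names and are not part of pandas or Polars.

```python
import pandas as pd


class Expr:
    """Hypothetical stand-in for an expression: wraps a function DataFrame -> result."""

    def __init__(self, func):
        self._func = func

    def __call__(self, df):
        # Evaluating an expression means calling the wrapped function on a frame.
        return self._func(df)

    def __gt__(self, other):
        # Compare against another expression or against a plain value.
        return Expr(
            lambda df: self(df) > (other(df) if isinstance(other, Expr) else other)
        )

    def mean(self):
        return Expr(lambda df: self(df).mean())

    def max(self):
        return Expr(lambda df: self(df).max())

    def filter(self, predicate):
        # Keep only the rows where the predicate expression is True.
        return Expr(lambda df: self(df)[predicate(df)])


def col(name):
    """Hypothetical helper: an expression that selects one column."""
    return Expr(lambda df: df[name])


# Made-up data; "g" is a grouping column added purely for illustration.
df = pd.DataFrame(
    {
        "g": ["x", "x", "y", "y"],
        "a": [1, 5, 3, 8],
        "b": [4, 2, 9, 7],
        "c": [3, 3, 6, 4],
    }
)

expr = col("a").filter(col("b") > col("c").mean()).max()

# Whole-frame context: call the expression on the full DataFrame.
whole_frame = expr(df)

# Group-by context: call the very same expression on each group's sub-frame.
by_group = {key: expr(group) for key, group in df.groupby("g")}
```

Nothing is computed when expr is built; the work only happens when it is called on a frame. In this framing, that is why the same expression can be reused across the whole frame and within each group.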

Data practitioners - whether they are Polars novices or Polars experts - are expected to benefit and learn something new from the talk.

Marco is a core dev of pandas and Polars and works at Quansight Labs as a Senior Software Engineer. He also consults and trains clients professionally on Polars, wrote the first Polars Plugins Tutorial, and has taught Polars Plugins to clients.

He has a background in Mathematics and holds an MSc from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition (2nd place overall Q1).
