PyData Amsterdam 2024

From language to marketing: RNNs for data-driven multi-touch attribution at Booking.com
09-19, 13:40–14:30 (Europe/Amsterdam), Rembrandt

Are you a data scientist thinking about the efficacy of digital marketing channels and the link between marketing and a customer’s decision to buy? Or do you wonder about language models and their application to other sequence modelling tasks? This talk will introduce the field of marketing attribution modelling and show that language/sequence models (like attention-based RNNs) can be modified to function as flexible attribution models. In the course of the talk, we will use our work at Booking.com to illustrate the challenges involved in repurposing language models for attribution, especially in evaluating these models in the absence of “ground truth” signals.


Online marketing has become essential to the success of businesses (online or not), and the number of digital marketing platforms has grown over time. As a result, measuring the effectiveness of different digital marketing channels has become as important for the online marketer as the actual marketing content posted on those channels. In addition, online marketing has enabled businesses to approach and attract a much wider and more diverse customer base, which means that the efficacy of a marketing channel varies across customers. Consequently, cross channel-level marketing evaluation strategies (e.g. market mix modelling) or rule-based approaches (e.g. first/last touch attribution, uniform multi-touch attribution) can lead to biased effectiveness measurements, which in turn cause inefficient marketing spend in the future.
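
To make the rule-based baselines concrete, here is a minimal Python sketch of first-touch, last-touch, and uniform multi-touch attribution for a single converting journey. The function name `rule_based_attribution` and the example channel names are illustrative assumptions, not part of any Booking.com system.

```python
from collections import defaultdict


def rule_based_attribution(journey, rule="last_touch"):
    """Split one unit of conversion credit over the channels in a journey.

    `journey` is an ordered list of channel names (hypothetical example data);
    `rule` is one of "first_touch", "last_touch", or "uniform".
    """
    if not journey:
        raise ValueError("journey must contain at least one touchpoint")

    credit = defaultdict(float)
    if rule == "first_touch":
        # All credit goes to the channel that started the journey.
        credit[journey[0]] += 1.0
    elif rule == "last_touch":
        # All credit goes to the channel seen just before conversion.
        credit[journey[-1]] += 1.0
    elif rule == "uniform":
        # Every touchpoint gets an equal share; repeated channels accumulate.
        share = 1.0 / len(journey)
        for channel in journey:
            credit[channel] += share
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return dict(credit)


# Hypothetical journey: four touchpoints ending in a conversion.
journey = ["display", "search", "email", "search"]
first = rule_based_attribution(journey, "first_touch")
last = rule_based_attribution(journey, "last_touch")
uniform = rule_based_attribution(journey, "uniform")
```

Each rule assigns the same split to every journey with the same channel sequence, regardless of who the user is. That fixed behaviour is what a user- and channel-dependent, data-driven model is meant to relax.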

In this talk, we will present data-driven multi-touch attribution modelling as a way to measure the efficacy of online marketing in a user- and channel-dependent manner. Drawing on lessons from our work with attribution modelling at Booking.com, we will divide our talk into three main sections:

  1. Business and data context: We will begin with a general introduction and overview of data-driven multi-touch attribution modelling as an ML task without “ground truth” labels (approx. 20% of the time allotted for the talk).
  2. Model architecture and offline evaluation: We will frame multi-touch attribution as a language-like ML task for which attention-based RNNs can function as flexible models. We will follow the general idea proposed in Arava et al., 2018, with separate neural networks handling marketing- and user-related features, and show the pitfalls involved in treating attribution purely as a sentence/language task. We will introduce modifications (using PyTorch) to the original additive attention proposed for neural machine translation (Bahdanau et al., 2014) that make it better suited to the multi-touch attribution task. We will end this section with strategies for offline evaluation of attribution models, even in the absence of “ground truth” attribution labels (approx. 50% of the time allotted for the talk).
  3. Online experiment design: A convincing evaluation of real-world business impact at Booking.com requires a strong experimentation design. We will talk about the challenges involved in randomization and measurement approaches for online A/B testing of multi-touch attribution models (approx. 30% of the time allotted for the talk).
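
As a rough illustration of how attention can double as attribution, the sketch below computes additive-attention weights over the touchpoints of one journey and reads them as per-touchpoint credit. It is a simplified, dependency-free variant, not the model presented in the talk. It assumes the scoring form score_t = v · tanh(W h_t + b) with no decoder state, and the hidden states and parameters are untrained placeholder values chosen only for the example. The function name `additive_attention_weights` is hypothetical. In practice the same computation would be written with PyTorch tensors and learned parameters.

```python
import math


def additive_attention_weights(hidden_states, W, b, v):
    """Return softmax-normalised additive attention weights, one per touchpoint.

    score_t = v . tanh(W h_t + b); weights = softmax(scores over t).
    """
    scores = []
    for h in hidden_states:
        # Project the hidden state and squash it with tanh.
        projected = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        # Reduce the projection to a scalar score with the vector v.
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, projected)))

    # Numerically stable softmax over the touchpoints.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder RNN hidden states for a three-touchpoint journey (assumed values).
hidden_states = [[0.1, 0.3], [0.5, -0.2], [0.9, 0.4]]
W = [[1.0, 0.0], [0.0, 1.0]]  # untrained placeholder projection matrix
b = [0.0, 0.0]                # placeholder bias
v = [1.0, 1.0]                # placeholder scoring vector

weights = additive_attention_weights(hidden_states, W, b, v)
```

Because the weights are positive and sum to one, they can be read as fractional credit per touchpoint. Whether that reading is trustworthy is the evaluation problem the talk addresses.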

This talk will appeal to ML and data science practitioners across a wide spectrum of skill levels and interests in marketing, language and sequence modelling, and online experimentation. For beginners, the first half of the talk will introduce the concepts of marketing customer journeys and data-driven multi-touch attribution modelling, and show how they are applied in our work at Booking.com. For more experienced attendees, the talk will illustrate the similarities and differences between language modelling and attribution modelling, and the pitfalls to be aware of when using language models for the attribution task. The final section of the talk will provide real-life examples of the challenges involved in randomization and measurement approaches in A/B testing, which will be useful to any data science practitioner thinking about online experimentation.

I am a Senior Machine Learning Scientist at Booking.com based at the global headquarters in Amsterdam. My machine learning interests lie primarily in probabilistic/Bayesian modelling and currently I work on applying these ideas to attribution modelling. When not crunching statistics or building ML models, you can find me coasting on one of my bicycles or gazing at landscapes through a train window.

I’m a Data Scientist with 5 years of experience at Booking.com. My academic background is in Applied Mathematics. Prior to joining Booking.com, I worked in the telecommunications industry for 3 years.

Outside of work, I’m passionate about traveling and exploring new cultures with my family. Additionally, I enjoy playing tennis with my son – it's our favourite way to stay active and have fun together.

I’m excited to be a part of the PyData community and look forward to connecting with fellow data enthusiasts!