Duo, the Push, and the Bandits

Why am I terrified of a 2D owl?

May 23, 2022

Some people say that Duo, the light-green mascot of Duolingo, an app that helps you learn different languages by prodding you to practice, lives in a tree, much like a real spectacled owl. Then, there are those that whisper that Duo lives among us and watches us, wearing a backpack carrying all of our sins.

I personally believe that Duo mostly exists as a collection of bytes that originates in the shadowlands beyond the Gates of Hell. The distributed spirit of Duo peacefully roams the underworld and grows stronger by feeding on the energy of lost souls and cloud-based microservices until every day at 8 PM Eastern Time, driven by an incantation from Hades himself and a multi-armed bandit push notification strategy promoted to production in Scala, the devil’s own language, Duo reconstitutes into its full avian form and appears on my phone to guilt me into learning Italian.

Why am I doing this to myself?

After a year of focusing entirely on technical growth, I realized that I needed The Arts Back in my Life. One of the realizations that came after watching a bunch of Sorrentino movies was that I wanted to pick up Italian again. Once upon a time in high school, I took Latin for four years, and then decided that I had made a terrible mistake learning a language that has 30+ potential ways you could mess up the ending of a noun, not to mention that not a single person alive will be able to correct your pronunciation. So, I decided to do a linguistic pivot and started teaching myself Italian, motivated mostly by my extreme desire to understand Celentano.

Fast forward to this February, when I decided to pick up Italian again so I could learn a language that’s not mostly made up of JVM stacktrace errors.

Vicki @vboykis

If your Spark job generates a large enough JVM stack trace, a CVS pharmacy receipt gets printed in the real world.

Pushing me Into The Worst Decision I’ve ever Made in My Life

Starting Duolingo, something that more and more people have done over the past couple years if this is any indicator, is the worst decision I’ve ever made in my life. No matter what has happened in my day or what my energy level is, the Owl doesn’t care. The Owl is always ready.

Duolingo’s notifications encouraging you to use the app to practice are so timely, so on-point, so repetitive, and so passive-aggressive that they’ve become a meme in the industry over the past couple years.

The notifications became really fascinating to me, because they are one of the few push notifications that I actually have enabled. So I got to wondering, why are they so terrifying, and why are they so good, to the point that they’ve become a meme that even the company winkingly acknowledges?

Push it good

The answer lies in a combination of how they’re implemented technologically and how that implementation ties into the psychology behind the app’s intent. First, let’s talk about how push notifications work. A long time ago in a galaxy far away, ARPANET started. OK, maybe too far back. Let’s fast-forward a bit to 2003, when Research in Motion, the company behind the ill-fated BlackBerry phone, started sending notifications that would pop up on the device’s screen when people received an email. Steve Jobs took notice and Apple made the Apple Push Notification Service available in 2009, changing the mobile landscape.

The first push notifications were extremely basic, but it soon became clear that companies could engage consumers with notifications based on usage events, such as shopping cart abandonment in e-commerce or basic user inactivity over a given time period, and, once rich push notifications became an option, based on geotargeting or user browsing behavior.

Then, companies started varying the copy, or the content of the push notifications themselves, based on a number of heuristics like the age of the user on the platform (new user? returning?), the time of day, and even the language.

Today, many push notifications are based on surfacing personalized content so that users are more likely to engage with them. The push notifications that are content-dependent are often backed by machine learning approaches that figure out the optimal time and the optimal push to send. This is especially important so you don’t irritate users who have push notifications enabled. What many companies end up doing as a result is running experiments on the types of push notifications to send to different groups at different times.

Push it good, Duo

Duolingo has been running A/B tests for a long time. They’ve gotten really good at picking a group of correct messages. But previously, they were sending them at random.

Duolingo’s push notifications, at least as of this paper a few years ago, are based on a complex set of heuristics that involve modifying multi-armed bandits. Multi-armed bandits are a form of reinforcement learning, a branch of machine learning. Given several different paths, a bandit tries to balance exploring those paths against finding the ones that offer the highest reward and traversing just those, while controlling for any additional business logic that might be important.
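To make the explore-versus-exploit trade-off concrete, here is a minimal epsilon-greedy sketch in Python. This is the textbook version of a bandit, not Duolingo’s algorithm: the class, the template names, and the open rates are all made up for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Textbook epsilon-greedy bandit; NOT Duolingo's production algorithm."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.mean_reward = {arm: 0.0 for arm in self.arms}

    def choose(self):
        # Explore: with probability epsilon, try a random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: otherwise pick the arm with the best average reward so far.
        return max(self.arms, key=lambda arm: self.mean_reward[arm])

    def update(self, arm, reward):
        # Incremental running mean of the rewards seen for this arm.
        self.pulls[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.pulls[arm]


def simulate(true_open_rates, rounds=5000, epsilon=0.1, seed=0):
    """Send `rounds` simulated notifications; reward is 1.0 if the user opens."""
    bandit = EpsilonGreedyBandit(true_open_rates, epsilon=epsilon, seed=seed)
    env = random.Random(seed + 1)  # separate randomness for the fake users
    for _ in range(rounds):
        arm = bandit.choose()
        reward = 1.0 if env.random() < true_open_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


# Hypothetical templates and open rates, invented for this example.
bandit = simulate({"guilt_trip": 0.05, "friendly_nudge": 0.10, "streak_reminder": 0.40})
print(bandit.pulls)
```

Over enough rounds, most of the sends should end up going to the template with the highest open rate, with a small slice reserved for trying the others.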

What does this mean in terms of these push notifications? Sending a push notification and having the user open the app and complete the lesson is the ultimate goal (aka reward in this case), and here are the metrics Duolingo tracks on it:

Daily Active Users (DAUs): The number of distinct users in the bucket that opened and interacted with the Duolingo application on their device each day.
Total Lessons Completed: The total number of language lessons that users in the bucket completed.
DX Recurring Retention: The percentage of users who opened the Duolingo app on a given day who also did so X days later. This is further divided into new users (those whose accounts were created less than 48 hours before their first notification in the experiment) and existing users.
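The first and last of these metrics are easy to sketch in Python. The input shape here (a dict mapping a day number to the set of users who opened the app that day) and the user names are assumptions for illustration, not Duolingo’s actual schema.

```python
def daily_active_users(active_by_day, day):
    """Number of distinct users active on `day`."""
    return len(active_by_day.get(day, set()))


def recurring_retention(active_by_day, day, x):
    """Share of users active on `day` who were also active on `day + x`.

    Returns None when nobody was active on `day`, since retention is
    undefined for an empty cohort.
    """
    cohort = active_by_day.get(day, set())
    if not cohort:
        return None
    returned = cohort & active_by_day.get(day + x, set())
    return len(returned) / len(cohort)


# Hypothetical activity log: day number -> users who opened the app that day.
activity = {
    0: {"ana", "ben", "chiara", "dario"},
    1: {"ana", "chiara"},
    7: {"ana", "ben", "elena"},
}

print(daily_active_users(activity, 0))      # 4
print(recurring_retention(activity, 0, 1))  # 0.5 (ana and chiara came back)
print(recurring_retention(activity, 0, 7))  # 0.5 (ana and ben came back)
```

Note that elena shows up on day 7 but doesn’t count toward day-0 retention, because she wasn’t in the day-0 cohort to begin with.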

In the general case of multi-armed bandits, each arm (aka path to a completed lesson) is assumed to be available and the model alternates between optimizing on these paths and sending the one that works the best over and over again. This doesn’t work for Duolingo because language learners want to see a variety of push notifications in order to re-engage them.

For example, I get this one a lot and it makes me feel super guilty every time. At some point, I’m just going to ignore it instead of clicking through.

It would make sense for the bandit to optimize against sending it to me and penalize by frequency. Also, not every notification makes sense for every user:

For example, some templates reference a user’s “streak wager,” a game mechanic applicable to only a fraction of users on any given day. As such, each template has defined eligibility criteria to prevent the template from being sent to an inappropriate user.
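Those two constraints (don’t keep sending the same template, and only send templates the user is eligible for) can be sketched as a scoring step. This is a simplified illustration, not the algorithm from the paper: the exponential penalty, its parameters, the template names, and the scores are all assumptions.

```python
def recency_penalty(days_since_sent, strength=0.5, half_life_days=3.0):
    """Penalty that is largest right after a send and halves every half-life.

    Assumed form, for illustration only. `days_since_sent` is None when the
    user has never received the template, in which case nothing is deducted.
    """
    if days_since_sent is None:
        return 0.0
    return strength * 0.5 ** (days_since_sent / half_life_days)


def pick_template(base_scores, eligible, days_since_sent):
    """Pick the eligible template with the best penalized score, or None."""
    candidates = [t for t in base_scores if t in eligible]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda t: base_scores[t] - recency_penalty(days_since_sent.get(t)),
    )


# Hypothetical scores (for example, estimated open rates) per template.
base_scores = {"guilt_trip": 0.30, "streak_wager": 0.35, "friendly_nudge": 0.20}
eligible = {"guilt_trip", "friendly_nudge"}  # this user has no streak wager today

print(pick_template(base_scores, eligible, {"guilt_trip": 0}))   # friendly_nudge
print(pick_template(base_scores, eligible, {"guilt_trip": 30}))  # guilt_trip
```

The highest-scoring template, streak_wager, never gets picked because the user isn’t eligible for it, and guilt_trip only wins again once enough days have passed for its penalty to fade.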

What Duolingo did, then, was a bunch of work to make better use of historical data on push notification opens to figure out the best time and message to send with these notifications.

Why are the pushes so good?

So, it’s true that this new policy increased the engagement rate for push notifications. In fact, the paper notes that they saw an increase of 0.5% in daily active users as a result of the experiment (and, more importantly, new user retention, the lifeblood of any app, increased by over 2%). That’s nothing to sneeze at.

However, what’s even more interesting is that the authors note in the paper that, even before implementing fancy bandits, Duolingo’s notifications already worked well going into the A/B test:

The control group’s templates would be selected via the legacy algorithm (i.e., using a uniform random distribution over the eligible templates each round). Since the pools of templates and other aspects of the notification system had already been optimized through years of A/B tests, this provided a strong baseline to compare against.

This brought me back to an idea I’ve talked about before: tech can be very good and effective if you combine the goals of the tech with human instincts that are already built in. For example, one recommender system that many people liked in this Normcore discussion thread was Spotify’s, because the company’s goal of getting you to stay on the platform longer and find the music valuable is very much in line with the listener’s goal of finding new and relevant music.

The same is the case for Duolingo. Push notifications are meant to get me to engage with the app for the sake of the company’s bottom line. Everything in this paper tells us this is the case: multi-armed bandits require an enormous amount of infrastructure, as does collecting millions and millions of lines of log events, cleaning those events, making them available for machine learning, training the algorithms, then preparing the algorithms for A/B tests, running the A/B tests, interpreting the results, and promoting them to production.

Just look at how immense and involved the system for building these bandits is in comparison to the size of the lift. Check out this architecture, which involves at the very minimum Kinesis (a Kafka-like service), Spark, Elastic Beanstalk, CloudWatch, and a zoo of other AWS services to log data, train the model, run the model, and send the actual notification. There is no way the company would invest money in this infrastructure if building it didn’t also produce new DAUs, new completed lessons, and new revenue.

But what’s important is that, from a user perspective, the push notifications for Duolingo work with you in a way most push notifications don’t. Most push notifications make you anxious, and their goal is a boost in short-term engagement. But if you as a language learner don’t practice every day, you forget absolutely everything, which I learned when I took a month off Italian and, on day 31, logged on again and found I had completely forgotten the difference between la pasta and il pasto.

So, everything in Duolingo optimizes for you coming back, but also for you learning, and the push notification is here to help you create a habit that boosts not only their DAUs but also your connection to your language. This is the kind of good synergy that works really well when the incentives of user and app are aligned, and it’s not clear how much the machine learning helps here versus how much the underlying goal that’s already there provides a strong foundation.

If you don’t practice every day, you will never understand Celentano. So in that sense, Duolingo and I are both very interested in having Duo continue to bother me, which is what makes this dive into how these push notifications work so interesting. Speaking of which, it’s 9:28 PM again, which means I need to wrap up this newsletter and ritornare a Duo before it comes for me.

The Newsletter:

This newsletter’s M.O. is takes on tech news that are rooted in humanism, nuance, context, and a little fun. It goes out whenever I can get around to it. If you like it, forward it to friends!

The Author:

I’m a machine learning engineer. Most of my free time is spent wrangling two small kids, reading, and writing bad tweets. Find out more here or follow me on Twitter.
