If you’ve ever used YouTube’s autoplay feature, Amazon’s “More Like This”, anything on Netflix, or used Facebook’s News Feed, you’ve used one of the most common machine learning systems working to shape our online worlds today: recommender systems.
Also known as recsys in industry lingo, these systems are machine learning algorithms embedded in software that provide a set of content recommendations based on an individual user’s history of activity, aggregated across the entire user base.
Because there is so much content generated on most web platforms, recommender systems are constantly working to surface content that they think is relevant to you (aka what will make you stay on the platform longer).
The most common recommendation system in use today is collaborative filtering, which works something like this. I read Shoe Dog. You read Shoe Dog. We both say that we liked Shoe Dog. I also read Bad Blood, and I liked it. The recommender system already knows that you like the same stuff that I read, so it recommends Bad Blood to you to read, as well.
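To make that concrete, here’s a minimal, hand-rolled sketch of user-based collaborative filtering in Python. The users, books, and ratings are all made up for illustration; real systems work over millions of users and items and use far more sophisticated models.

```python
# A toy user-item matrix: rows are users, columns are books.
# 1 means "read and liked", 0 means "no signal yet". All data is invented.
import numpy as np

books = ["Shoe Dog", "Bad Blood", "Steve Jobs", "The Hobbit"]
ratings = np.array([
    [1, 1, 0, 0],  # me: liked Shoe Dog and Bad Blood
    [1, 0, 0, 0],  # you: liked Shoe Dog, nothing else yet
    [0, 0, 1, 1],  # a user with very different taste
])

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Score each book for "you" (row 1): weight every other user's likes
# by how similar their history is to yours.
you = ratings[1]
scores = np.zeros(len(books))
for other, row in enumerate(ratings):
    if other == 1:
        continue
    scores += cosine_sim(you, row) * row

scores[you > 0] = -np.inf  # don't re-recommend what you've already read
print(books[int(np.argmax(scores))])  # -> Bad Blood
```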
There’s another way to do recommenders, as well: content filtering. Aka, instead of looking at user activity, looking at the content itself. We know that Bad Blood is similar to Shoe Dog because both are best-selling books about American businesses, so a recommendation based on these two books might be something like Walter Isaacson’s biography of Steve Jobs. If you’re interested in diving deeper into the nitty-gritty of how recommender systems work, do a search for matrix factorization.
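And here’s the content-based flavor, with the same caveats: instead of comparing users to each other, you compare item features (hand-labeled genre tags, in this made-up example) against a profile built from what you’ve already liked. Matrix factorization, the thing to search for, learns that kind of item and user vector automatically from the ratings matrix instead of requiring hand-labeled features.

```python
# A made-up sketch of content-based filtering: recommend items whose *features*
# look like what you've already liked, ignoring other users entirely.
import numpy as np

# Invented feature tags: [business, biography, bestseller, fantasy]
items = {
    "Shoe Dog":   np.array([1, 1, 1, 0]),
    "Bad Blood":  np.array([1, 0, 1, 0]),
    "Steve Jobs": np.array([1, 1, 1, 0]),
    "The Hobbit": np.array([0, 0, 1, 1]),
}

liked = ["Shoe Dog", "Bad Blood"]
profile = sum(items[name] for name in liked)  # your "taste vector"

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

candidates = {name: cosine_sim(profile, vec)
              for name, vec in items.items() if name not in liked}
print(max(candidates, key=candidates.get))  # -> Steve Jobs
```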
Much of the consumer-facing internet runs on recommendation systems and on decisions about how to tune them.
As a recent example of their pervasiveness,
80 percent of movies watched on Netflix came from recommendations, and 60 percent of video clicks came from home page recommendations on YouTube.
There are whole academic conferences sponsored by companies like Hulu, Amazon, Medium, and Spotify that have a keen interest in improving recommendations.
This all seems like a great system. What’s not to love?
A lot.
Recommender systems today have two huge problems that are leading companies (sometimes at enormous pressure from the public) to rethink how they’re being used: technical bias, and business bias.
A prime example of a company that’s been in the spotlight for recommender misuse is YouTube. Let’s take a look at how the problems with recommender systems play out on the video site.
First is the issue of similar content, from a technical perspective. Recommender systems will feed you similar content. In the case of business books, that’s (mostly) fine, unless you care about variety. If you like business books, the system’s best bet for reducing error is to give you more of the same type of book. But what if you don’t want to keep reading business books? What if you want to read fantasy? Unless you’ve clicked on a fantasy title, the recommender system has no way to know that you like it. This is irritating for books, but a real problem for YouTube.
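You can see the lock-in as a toy feedback loop: if the system only ever serves the top-scoring genre, and only ever updates on the clicks it actually observes, it never learns anything about the genres it never shows you. This is a deliberately dumb sketch; real systems mix in some exploration, but the loop is the same.

```python
# Toy feedback loop: the recommender only learns from clicks on what it chose to show.
clicks = {"business": 1, "fantasy": 0, "history": 0}  # you clicked one business book once

def recommend():
    # Pure exploitation: always serve the genre with the most observed clicks.
    return max(clicks, key=clicks.get)

for _ in range(20):
    shown = recommend()
    # Suppose you'd happily click a fantasy title too -- but you're never shown one.
    if shown == "business":
        clicks[shown] += 1

print(clicks)  # {'business': 21, 'fantasy': 0, 'history': 0}
```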
Consider the story of Caleb Cain, as told in The New York Times:

The threats, Mr. Cain explained, came from right-wing trolls in response to a video he had posted on YouTube a few days earlier. In the video, he told the story of how, as a liberal college dropout struggling to find his place in the world, he had gotten sucked into a vortex of far-right politics on YouTube.
Over years of reporting on internet culture, I’ve heard countless versions of Mr. Cain’s story: an aimless young man — usually white, frequently interested in video games — visits YouTube looking for direction or distraction and is seduced by a community of far-right creators.
The common thread in many of these stories is YouTube and its recommendation algorithm, the software that determines which videos appear on users’ home pages and inside the “Up Next” sidebar next to a video that is playing. The algorithm is responsible for more than 70 percent of all time spent on the site.
The radicalization of young men is driven by a complex stew of emotional, economic and political elements, many having nothing to do with social media. But critics and independent researchers say YouTube has inadvertently created a dangerous on-ramp to extremism by combining two things: a business model that rewards provocative videos with exposure and advertising dollars, and an algorithm that guides users down personalized paths meant to keep them glued to their screens.
Here’s what happened: Caleb Cain at some point clicked on a video about alternate history or Nazis, and the recommender system picked up on it. Other people that clicked on that same video also watched similar content, so it kept recommending the same type of content, until Caleb became enveloped in an enormous filter bubble that his brain was simply unable to get out of. If you’re constantly told that red is blue, there’s no way you can get back to reasoning that red is red, as long as it looks like millions of other people think like you.
This is an even bigger problem for children’s content. In its eagerness to keep young eyes on the screen longer, the algorithm is now working backwards - influencing creators to create terrible, weird content for kids.
To begin: Kids’ YouTube is definitely and markedly weird. I’ve been aware of its weirdness for some time. Last year, there were a number of articles posted about the Surprise Egg craze. Surprise Eggs videos depict, often at excruciating length, the process of unwrapping Kinder and other egg toys. That’s it, but kids are captivated by them. There are thousands and thousands of these videos and thousands and thousands, if not millions, of children watching them.
On-demand video is catnip to both parents and to children, and thus to content creators and advertisers. Small children are mesmerised by these videos, whether it’s familiar characters and songs, or simply bright colours and soothing sounds. The length of many of these videos — one common video tactic is to assemble many nursery rhyme or cartoon episodes into hour+ compilations — and the way that length is marketed as part of the video’s appeal, points to the amount of time some kids are spending with them.
YouTube broadcasters have thus developed a huge number of tactics to draw parents’ and children’s attention to their videos, and the advertising revenues that accompany them.
This problem is so bad that I never allow my daughter to watch YouTube unsupervised, and when she does watch, I usually cast via Bluetooth from my phone to our TV screen, where I can see what’s going on. I’m not alone: Silicon Valley parents don’t let their kids watch, either. In order to fight this crap, I have effectively become my child’s own recsys.
Ok, so algorithms are stupid. What else is new? Garbage in, garbage out. Like most algorithms, recommender systems are driven by two things: the data you put into them and the results you’re hoping to get out. They don’t operate in a vacuum, separate from people. The people who work at YouTube are not bad people. They’re all smart - exceedingly so - and working on problems at scale. They write papers like these. They hold multiple PhDs.
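To make the “results you’re hoping to get out” part concrete: the same candidate videos get ranked completely differently depending on what the system is told to optimize. The titles and numbers below are invented, but the point stands: the objective, not the math, picks the winner.

```python
# Toy illustration: one set of candidates, two different objectives.
# All scores are invented for the sake of the example.
candidates = [
    # (title, predicted watch-time minutes, hypothetical "user well-being" score)
    ("calm explainer",      4.0, 0.9),
    ("provocative rant",   11.0, 0.2),
    ("nursery rhyme loop", 38.0, 0.1),
]

by_watch_time = max(candidates, key=lambda c: c[1])
by_well_being = max(candidates, key=lambda c: c[2])

print(by_watch_time[0])  # -> nursery rhyme loop
print(by_well_being[0])  # -> calm explainer
```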
So how, then, did these systems go so wrong? Can’t they just correct for the bias and randomize the recommendations a bit more?
The real problem is YouTube’s business model.
The way YouTube makes money is through advertising. It says so in Alphabet’s 10-K.
How we make money
The goal of our advertising business is to deliver relevant ads at just the right time and to give people useful commercial information, regardless of the device they’re using. We also provide advertisers with tools that help them better attribute and measure their advertising campaigns across screens. Our advertising solutions help millions of companies grow their businesses, and we offer a wide range of products across screens and formats. We generate revenues primarily by delivering both performance advertising and brand advertising.
If you browse around looking for YouTube in the document, you’ll find lots of statements like this:
As online advertising evolves, we continue to expand our product offerings which may affect our monetization.
As interactions between users and advertisers change and as online user behavior evolves, we continue to expand and evolve our product offerings to serve their changing needs. Over time, we expect our monetization trends to fluctuate. For example, we have seen an increase in YouTube engagement ads, which monetize at a lower rate than traditional desktop search ads. Additionally, we continue to see a shift to programmatic buying which presents opportunities for advertisers to connect with the right user, in the right moment, in the right context. Programmatic buying has a different monetization profile than traditional advertising buying on Google properties.
(There is a small subset of their revenue that comes from YouTube Premium, their subscription service. Side note: if you can justify it, I extremely recommend it, as it makes YouTube 150% more tolerable.)
What you’ll also find if you read about YouTube’s CEO, Susan Wojcicki, is that she’s from the advertising world, which is an enormous signal of what YouTube values in leadership:
Her tenure as C.E.O. wasn’t supposed to be dominated by pedophilia and attempted mass murder. When she got the job, in 2014, Ms. Wojcicki was hailed straightforwardly as the most powerful woman in advertising, someone who’d helped turn on the cash spigots in her time at Google and would presumably repeat the trick at YouTube. In the five years since, Ms. Wojcicki has introduced new forms of ads as well as subscription offerings for music, original content and the cord-cutting service YouTube TV. But somewhere along the line, her job became less about growth and more about toxic containment.
YouTube is THIRSTY for advertising money, at all times. Regardless of what users are doing on the platform, as long as it doesn’t impact advertising partnerships, it’s all above board. To wit, in 2018,
YouTube recently suspended advertising from Logan Paul’s YouTube channel after he shocked a rat with a Taser and joked on Twitter about ingesting Tide Pods, which are capsules containing laundry detergent. Weeks earlier, he had filmed himself next to the corpse of a Japanese suicide victim, a move that was widely criticized.
Paul responded with an apology tour, first with a short video, then with a longer video on suicide prevention. But then he returned with the Tide Pods tweet and the rat video.
His infractions count as two strikes, Wojcicki said during her appearance Monday at the CodeMedia industry conference. "We can’t just be pulling people off of our platform," she asserted.
However, when brands pull content from YouTube, YouTube changes its tone:
Following the exodus of some of its high-profile advertisers, Google has publicly apologized and pledged to give brands more control over where their ads appear.
The problem is, though, that since Google and Facebook control 90% of advertising online, advertisers always come back. They have to.
If the storyline is feeling a bit worn, that's because it is. Many major brands fled the Alphabet-owned platform due to content issues in 2017, but they’d largely returned before this latest scandal, drawn by its 1.8 billion users and attractive targeting tools.
In the case of AT&T, the timing of these new revelations is especially on-the-nose: The company had only just announced in January that it would start advertising again on YouTube after a two year hiatus spurred by having its ads play on videos featuring disturbing material like hate speech and violent extremism.
So really, the most important algorithm at YouTube isn’t collaborative filtering: it’s the endless loop between advertising money and cultural norms.
This puts Susan Wojcicki in an interesting position: she has to make advertisers happy, but also kind of not mess up the platform, but also still continue to push the recommendation system since that’s where views come from. So, she has to kind of low-key excuse some of the absolute atrocities on the platform to keep it relatively “open”, while at the same time putting on a happy face for users and an even happier song and dance for advertisers.
No recommendation system, no PhD, and no complete set of data can fix this business problem.
Art: Blam, Roy Lichtenstein 1962
What I’m reading lately:
How Instagram accidentally started recommending emojis
I just finished reading An Elegant Puzzle. Still mulling over it, but expect a review within the next several weeks.
This thread about open source maintenance.
This summary by Peter Norvig about some points Noam Chomsky made on NLP
SQL EVERYWHERE - a paper.
This essay about Odessa was so good I’m going to have to buy the book.
About the Author and Newsletter
I’m a data scientist in Philadelphia. This newsletter is about tech and everything around tech. Most of my free time is spent kid-wrangling, reading, and writing bad tweets. I also have longer opinions on things. Find out more here or follow me on Twitter.
If you like this newsletter, forward it to friends!