This week, I got an excellent question from a reader (keep them coming!).
Paul writes,
Following up on your two posts about recsys I can imagine that you receive a ton of replies aiming to point out the value of recsys and trying to refute your arguments. I would love to hear these other voices, or instead some kind of meta post contemplating your position on Netflix and what others say about it.
Surprisingly I haven’t received ANY replies arguing that the current state of recsys is good, so I’d love to open it up to readers.
So, what are the *best* recommender systems that you’ve encountered, either at work or in your online lives?
Mine is Pocket recommendations on the Firefox start page. I was super against them at first because it seemed like Pocket was collecting browsing data, and I'm still kind of hazy on how exactly they work WITHOUT storing the data. I'm also not a huge fan of the fact that they feature advertising content sometimes, but I always find something interesting to read: https://help.getpocket.com/article/1142-firefox-new-tab-recommendations-faq
I generally like Pocket recommendations. They're usually adjacent to my interests, and not something I might have discovered otherwise. And they're not just a copy of my twitter feed either.
I think there's a tendency to recognize when recommendations are wrong and blissfully stream/click/watch-on when they're good. I often groove out to an on-point Spotify weekly playlist or find myself 10 recommendations deep in the YouTube metalworking/woodworking/DIY subspace, but as soon as that weird conspiracy theory video pops up in the top ranking spot it's like a rumble strip in your brain that wakes you up to, "oh this is crap!". Music and YT clips have an advantage over long-form media in that the consequences of being wrong aren't as bad - 3 minutes lost to a song, versus the "only evening I've had with my wife to sit on the couch and watch TV in a week is gone" consequence.
But, let's face it, the best recommender is the endcap of the grocery checkout with a cool beverage on a hot day.
I think Spotify does a very nice job of making custom playlists and mixes for me. My only complaint is that it seems to be too effective and doesn't seem to value newness enough. This results in 1) Having favorite songs that I know come up often, but not remembering the artist or song name and 2) Hearing the same songs over and over, when sometimes I want to discover something new.
I'll chime in for Spotify as well; I have not actually made any playlists, but I've been listening for years and my Discover Weekly and Release Radar (for the newest stuff they have) are just exceptionally good.
Spotify in particular relies on you to curate playlists and create ones of your own to really maximize the value of its rec engines. That is, it looks at what is in your playlists, creates a form of music interest topo-map, and then finds other users who have some similar songs together but then pulls in other songs from their playlists that you haven't listened to. So if you don't have a lot of playlists then there are fewer opportunities to match newness.
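To make the playlist-matching idea in the comment above concrete, here is a minimal sketch. It is not Spotify's actual system: the function name `recommend_from_playlists`, the overlap-count scoring, and the toy song labels are all assumptions chosen for illustration. It scores each song you have not heard by how many of your songs share a playlist with it.

```python
from collections import Counter


def recommend_from_playlists(my_songs, other_playlists, top_n=3):
    """Rank unseen songs by how strongly their playlists overlap with mine.

    Hypothetical scoring rule: every playlist that shares songs with me
    'votes' for its other songs, weighted by the size of the overlap.
    """
    mine = set(my_songs)
    scores = Counter()
    for playlist in other_playlists:
        theirs = set(playlist)
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared taste signal from this playlist
        for song in theirs - mine:
            scores[song] += overlap
    # Highest score first; break ties alphabetically so output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]


my_songs = ["A", "B", "C"]
other_playlists = [
    ["A", "B", "D"],  # shares 2 songs with me, so D gets +2
    ["C", "D", "E"],  # shares 1 song with me, so D and E get +1 each
    ["X", "Y"],       # shares nothing, so it is ignored
]
recs = recommend_from_playlists(my_songs, other_playlists)
print(recs)
```

With fewer playlists to compare against, fewer candidate songs receive any score at all, which is one way to read the "fewer opportunities to match newness" remark.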
A subtle distinction worth raising: a Spotify user does not need to make playlists to receive good recs. They will “embed” you whether you curate lists or not, using other activity like song completions, saves and hearts. The article you linked explicitly omits this point, and it matches my experience working on embedding-based recommenders.
As for newness, in my experience Your Daily Mixes are designed to emphasize familiarity over novelty. For novelty you have Discover Weekly and Release Radar. The daily mixes seem to follow the “old-new-old” song “sandwich” format followed by radio DJs, where novel tracks are interspersed with your favorites.
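The “old-new-old” sandwich described above can be sketched as a simple interleaving rule. This is only a guess at the general shape, not how Daily Mixes are actually built: the function name `sandwich_mix`, the `favorites_per_novel` parameter, and the track labels are hypothetical.

```python
def sandwich_mix(favorites, novel, favorites_per_novel=2):
    """Interleave novel tracks so each one sits between familiar ones.

    After every `favorites_per_novel` favorites, one novel track is
    inserted, unless that would put a novel track at the very end.
    """
    mix = []
    novel_queue = list(novel)
    for i, track in enumerate(favorites, start=1):
        mix.append(track)
        is_last = i == len(favorites)
        if i % favorites_per_novel == 0 and novel_queue and not is_last:
            mix.append(novel_queue.pop(0))
    return mix


favorites = ["old1", "old2", "old3", "old4", "old5"]
novel = ["new1", "new2", "new3"]
mix = sandwich_mix(favorites, novel)
print(mix)
```

Raising `favorites_per_novel` pushes the mix toward familiarity, and lowering it pushes toward novelty, which is the trade-off the comment is pointing at.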
I had no idea, but that's great to know! Now I can create some playlists to subset my music preferences-- looking forward to seeing how this impacts the recsys. This brings up the point that I wish the features used in these systems were less black-boxy. Why does it have to be a secret, and wouldn't it be valuable for people to understand the system so they can influence it? I'd love to see some output like from Python's LIME <https://github.com/marcotcr/lime>
to be fair, they're pretty open about it: https://qz.com/571007/the-magic-that-makes-spotifys-discover-weekly-playlists-so-damn-good/
this is a bit of self promotion.... but .... the recommender i made for the Dutch public broadcasting ended up working very well. https://youtu.be/kYMfE9u-lMo?t=301
the idea behind the algorithm: we *force* people away from the status quo. and since there is no monetary incentive the broadcaster could actively pursue the idea of broadening everyone's perspective. the algorithm is very simple too.
I find that Amazon’s “people who bought this also bought” is helpful for books
Two notes that might help contextualize recommender system evaluation:
First, in my experience, folks *really* care about false positives (you recommended a thing I don't like), but rarely care about false negatives (you didn't recommend a thing I like). The universe of things people like is very very large! So it's impossible for folks to note their absence when evaluating recs.
Second, recsys have annoying failure modes and subtle success modes. If Netflix recommends "The Great Norwegian Fishing Show" and you're not an angler, you'll wish that spot went to another crime procedural you haven't watched yet. Conversely, if you're 12 songs into a Spotify playlist one day, you're not reflecting on recommendation quality. You just enjoy your music and go about your day!
All that said, I think in 2019 Spotify sets the bar for recommendation quality. They operate in the highly-commoditized streaming music space, and therefore must differentiate their product through excellent product design. I believe Discover Weekly, Your Daily Mixes and Release Radar help them differentiate by providing lists of songs that are generally appealing with few glaring misses. That's the most you can hope for from recsys nowadays.
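The asymmetry in the first note above, where false positives are noticed and false negatives are not, maps roughly onto the difference between precision and recall. Here is a small sketch under made-up data: the function name `precision_recall_at_k` and the show titles in the lists are illustrative assumptions, not taken from any real catalog or evaluation.

```python
def precision_recall_at_k(recommended, liked, k):
    """Return (precision, recall) for the top-k recommendations.

    Precision falls with every false positive (a recommended item the
    user dislikes). Recall falls with every false negative (a liked item
    that was never recommended).
    """
    top_k = recommended[:k]
    liked = set(liked)
    hits = sum(1 for item in top_k if item in liked)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall


recommended = ["crime_show_1", "fishing_show", "crime_show_2", "baking_show"]
liked = {"crime_show_1", "crime_show_2", "heist_film", "spy_film", "space_doc"}
precision, recall = precision_recall_at_k(recommended, liked, k=4)
print(precision, recall)
```

A user looking at these four slots sees the two misses directly, which is the precision side. The three liked titles that never appeared stay invisible to them, even though they are what drags recall down.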
I actually really like Apple Music's recommendations quite a bit - both for discovery of music I haven't heard before as well as old favorites I haven't heard in a while. I think it's much better than other music streaming services I've used.
That's pretty interesting. Is this part of their subscription service?
Yeah. Their iPhone app has a page called "For you" which has recently played music and then a number of sections based on stuff you've listened to (for example I have an "Indie" section, "Electronic", a section based on a playlist I recently listened to, etc). I may be in the "power user" category for this stuff. I like a ratio of 60/40 new stuff/things I've heard before.
I wonder about the degree to which editors curate For You. When I used Apple Music I also liked their recs, but they’ve marketed their service as one with greater human input.
Hmm, that's an interesting point. This reminds me of the way that Stitch Fix serves clothing choices for customers to stylists, with the stylist being the human touchpoint deciding which specific clothing is sent to a customer. Maybe the better recommendation systems are those that have human input.
Human-in-the-loop recsys are great. They solve a lot of the problems Vicki has identified in her recent newsletters, although you don't have the same economies of scale as with software-only systems.
Facebook recommends to me my ex-husband's wife; LinkedIn gives me bosses of jobs I have quit; YouTube has creepy right-wing videos or craziness; I don't need Spotify to do anything because I can look at my real friends' lists; Amazon tells me what somebody else bought with the thing I'm buying but I don't want it. There are no good recommenders. But I'm happy to keep it the way it is, because in order to become better, they would have to invade privacy even more and scrape even more data. So it's all good.
I enjoy looking at the recommendations as a data person primarily to try to reverse-engineer the thinking and data points that might have led to the recommendation. Sometimes, I am just mystified, generally by the simplicity of the "connection." For instance, I connected with my brother on LinkedIn. This popped up a list including many people with the same name as my brother as "People You May Know." It is not as though they have a club and meet regularly! His is a relatively common name, as is mine. The comment above struck me because I agree with the conclusion. (And I have my son recommend music to me.)
There was a company called Songza that had amazing music recommendations. It got bought by Google, and now that engine is used to power Google Music's recommendations, which I like a lot more than Spotify.
Vicki,
In my field of Information Security, the best recommenders are semi-closed peer groups, such as the Information Sharing and Analysis Centers (ISACs), specifically the Healthcare ISAC (H-ISAC) and the Research and Education Networks ISAC (REN-ISAC). Automatic systems do not take the high variety of data points in question into consideration. In addition, human nature, specifically in the healthcare field, is biased toward other similar organizations who may have had the same experiences. Therefore, semi-closed peer groups work best for the structures we have to work in.
Wouldn't LinkedIn's People You May Know be the best example?
I tried PYMK, but since I have more than 1.5K people in my network I rarely open it. It's not that I don't need a network; for me, real networking happens mostly in forums and at conferences.
Yea, I'm not saying it's actually good for networking necessarily, but I find it is good for showing me people I may know (i.e., I never have to actually search for someone directly)
Twitter's "Who to follow" is very reactive to your most recent activity, but it can be useful. It's 100% better than the recommender system that makes up their algorithmic timeline (which is terrible for a number of reasons).
Amazon's "People also shopped for" has always been one of the more useful recommender systems.
The recommender system that SHOULD exist, though, is a recommendation engine for citing research papers based on the text in your article.
I'm a big fan of Soundcloud's recommendations, in terms of what auto-plays after I've finished listening to a song, but I also appreciate friend-curated recommendations above all others... I have a couple collaborative playlists going on Spotify that have interested groups contributing to them, and they're ideal recommendation pots because they're semi-chaotic but also from a group with shared interests.
I've just cited the good cases for me; now let's discuss the bad cases:
Amazon: It was really good for books in the past, but for electronics it never met my expectations. The catalog is very large, yet it always recommended the TOP picks.
LinkedIn: As someone described below, it just recommends people who have nothing to exchange with me: no good job positions, no network. Just corporate fan-fic and self-entitlement.
Facebook: Mixed feelings here. When there is a political rally in my country, FB goes crazy recommending things I do not like (I'm libertarian) from the most extreme ends of the spectrum. But outside politics, I like the funny videos (especially pranks and comedy), and most of the time I just put the FB recommendations on my TV when I'm watching comedy videos.
Instagram: It was good in the past; now it's just entitlement and people faking. I really liked seeing simple things from my friends, such as cat pictures, baby pictures, real celebrations with spontaneous pics, food pictures, and short videos. Right now IG is all about people faking trips, IG models, heavily produced pics, and a huge scramble for likes.
IG is a real case where the platform changed the behavior of the content creators (people) and, with that, lowered the quality of the recommendations.
Twitter: At least for me, the least invasive one. The ads are more annoying to me than the ultra-left/right recommendations.
Personally, as a consumer, I have different expectations of RecSys across different products. E.g., I use Spotify for serendipity, especially Discover Weekly. In 2016 I decided not to listen to anything mainstream in the music style that I like (Heavy Metal). I've discovered several different bands and I'm satisfied with that.
YouTube is good but goes wild if you accidentally watch some trash (e.g. flat earth or conspiracies). To overcome that I created two accounts: one for academic and research purposes, where I just watch things related to my subscriptions (just going to the carousel), and another one for fun, where I assume that most of the time I'll receive something irrelevant (e.g. videos about gaming or pranks like epic5TV).
Netflix, at least for me, is the worst: limited catalog, low level of serendipity, and endless pushing of their own content (which, at least for me, is mostly not relevant). In the past (I canceled) I used Netflix to watch espionage movies (e.g. Bourne movies) and only movies (I never watch series because I do not see the point of staying with the same show for a long time, i.e. years). Netflix never understood that I just want to sit for 2 hours and watch something that will end after those 2 hours. Because of that I just use YouTube Movies, and that solved the problem.
Personally, I agree that recsys for music probably work best, and I enjoy Spotify's Discover Weekly most of the time. I can think of a few more, for instance recommending people on Twitter, Instagram, or LinkedIn.
From my professional experience, creating a performant recsys product is difficult. I believe the main reason for this is, at least in most cases, that it aims to optimize for personal taste, which is, on its own, a subjective, unstable concept that is very difficult to grasp.
What I would be most curious about is how others would evaluate the success of recsys in terms of metrics and performance, i.e. what is a good recsys supposed to achieve?
Medium's recommendations via email are quite good. Oddly, their recommendations on the app are garbage -- I don't know why they don't use the same system in both places...!