Note: This is the second part of however many I feel like in a series about data science consulting. Check out the first part, on selling data science, here.
Junior people who watch senior people at work often assume that what the seniors know is magic. I know I did when I started my first job straight out of college as an economic consultant (a fancy term for Excel engineer). A couple of months in, I had managed not to mess anything up, which was enough to earn me an invite to a client kickoff call with senior executives.
As I listened on mute, I was immediately impressed by how the senior executives knew exactly which questions to ask to get to the bottom of the issue, figure out the real business problem, and gain the client’s confidence, while I barely understood what they were talking about.
But, it turns out, after 10+ years in industry, there is no magic to figuring out what companies want. It’s all about repeated exposure to a given set of patterns.
Just as senior technical people have a list of tricks to cycle through when troubleshooting code (is it a typo? an error in the loop? an error in the system? an external dependency gone awry? an I/O error? did we write out log files to troubleshoot?), senior consultants know intuitively that companies really only have a few data problems:
Why are we losing customers?
How do we make some process go faster?
How can we make something cost less?
Let’s talk about the first one: why do customers leave or unsubscribe?
This problem is also known as churn, and I’ve encountered it at every single job and client I’ve ever worked for, including even companies that have near-monopolistic holds on their respective industries.
Churn is really, really important for organizations that rely on month-to-month revenue accrual, like most SaaS companies. But it’s just as important in retail, entertainment, finance, and on and on and on.
Companies always want to understand why they’re losing revenue, and how they can make more of it.
But, it’s also always one of the hardest things to measure, because what you’re really trying to pin down is the point where people become unhappy enough to leave.
How do you measure unhappiness? It’s kind of like that Rent song, Seasons of Love:
Five hundred twenty-five thousand six hundred minutes
Five hundred twenty-five thousand moments so dear
Five hundred twenty-five thousand six hundred minutes
How do you measure, measure a year?
In daylights, in sunsets
In midnights, in cups of coffee
In inches, in miles
In laughter, in strife
In five hundred twenty-five thousand six hundred minutes
How do you measure a year in the life?
You can try to measure churn by how many people leave the platform month-on-month. But that won’t tell you why. What you’re really looking for is the root cause. What makes people unhappy? Is it long wait times? A slow website? Bad support? Maybe they don’t like the colors of your buttons? Maybe your target audience is wrong? At which point does a customer go from being merely unhappy, to pressing the “Cancel” button? Where’s the tipping point?
Maybe your product doesn’t work like it’s supposed to (*cough* dongles.)
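The month-on-month count is the easy part. As a rough sketch, assuming you can pull, for each month, the number of subscribers at the start of the month and the number who cancelled during it (the class, field names, and numbers below are all made up for illustration), the usual customer churn rate is just cancellations divided by starting subscribers:

```python
from dataclasses import dataclass


@dataclass
class MonthlyCounts:
    """Hypothetical per-month counts pulled from a billing system."""
    month: str
    subscribers_at_start: int
    cancellations: int


def churn_rate(counts: MonthlyCounts) -> float:
    """Share of subscribers at the start of the month who cancelled during it."""
    if counts.subscribers_at_start == 0:
        return 0.0  # no customers, nothing to lose
    return counts.cancellations / counts.subscribers_at_start


# Illustrative numbers only, not real data.
history = [
    MonthlyCounts("2019-07", 10_000, 250),
    MonthlyCounts("2019-08", 10_400, 312),
    MonthlyCounts("2019-09", 10_350, 207),
]

for m in history:
    print(f"{m.month}: {churn_rate(m):.1%}")
```

Every company can compute this number. It tells you how many people pressed “Cancel,” and nothing at all about why.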
Actually, let’s take the example of Apple. Everyone talks about how terrible the dongle situation is. But how would Apple know that the dongle problem is the root cause of people switching from iPhones to Androids?
Sure, they could look at month-over-month sales before and after they introduced dongles with the iPhone 7. But that’s not a clear indication on its own. They’d also have to account for any number of other factors that could decrease consumer happiness.
What would make someone want to switch from an iPhone to Android?
Hardware defects on a phone by phone basis
Time series modeling of purchases weighted with seasonal effects
Unhappiness interactions with iOS (maybe regressions in released versions)
General consumer sentiment (aka how the US economy is doing)
Popularity of alternatives (aka other phones released by manufacturers like Samsung)
Application parity (how many apps are available for iOS versus Android)
Each of these factors, or features in data science speak, comes from analyzing some set of data. That data has to come from somewhere. The iOS data probably lives in some internal bug reporting system and is formatted as log data. The hardware defects could be either in a manufacturing reporting system or in some finance system that logs material returns. Consumer sentiment can come from government reporting of quarter-over-quarter GDP. Data on alternatives probably comes from some industry report. And application parity is probably a combination of internal and external sources. Which of these things, when combined, could lead to a good indicator of when a customer is most likely to cancel?
In addition to answering that very complicated question, all of that data needs to be standardized, normalized for seasonality and number of consumers, and built into a model that can compare apples to apples (heh.)
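To make the “combined” part concrete, here is a minimal sketch, assuming each source has already been exported into a simple month-keyed extract. All the names and numbers are hypothetical. Even in this toy version, the sources disagree on granularity (monthly versus quarterly) and coverage (a missing month), and someone has to decide how to line them up before any model ever sees the data:

```python
from typing import Optional

# Hypothetical extracts from the sources above; every number is made up.
ios_bug_reports = {"2019-07": 1_200, "2019-08": 1_950, "2019-09": 1_400}  # bug tracker, monthly
hardware_returns = {"2019-07": 310, "2019-09": 295}                      # returns system, August missing
gdp_growth_qoq = {"2019Q3": 0.5}                                          # government data, quarterly (%)
competitor_launches = {"2019-08": 2}                                      # industry report, sparse


def quarter_of(month: str) -> str:
    """Map 'YYYY-MM' to 'YYYYQn' so quarterly data can join monthly data."""
    year, mon = month.split("-")
    return f"{year}Q{(int(mon) - 1) // 3 + 1}"


def build_feature_row(month: str) -> dict[str, Optional[float]]:
    """Assemble one month's features; None marks a gap someone must decide how to fill."""
    return {
        "ios_bug_reports": ios_bug_reports.get(month),
        "hardware_returns": hardware_returns.get(month),
        "gdp_growth_qoq": gdp_growth_qoq.get(quarter_of(month)),
        # Assumption: no row in the report means no launches that month.
        "competitor_launches": competitor_launches.get(month, 0),
    }


features = {m: build_feature_row(m) for m in ["2019-07", "2019-08", "2019-09"]}
for month, row in features.items():
    print(month, row)
```

Whether a missing month of returns means zero returns or a broken export is exactly the kind of question that only someone inside the company can answer.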
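Here’s a minimal sketch of those two normalizations, assuming monthly cancellation and subscriber counts (made-up numbers covering a few months from two years). Dividing by subscribers handles the number of consumers. For seasonality it uses a crude ratio-to-average seasonal index, which I’m swapping in purely for illustration. Real work would want much more history and a proper method such as STL or X-13ARIMA-SEATS:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (month 'YYYY-MM', cancellations, active subscribers at start of month).
raw = [
    ("2018-01", 300, 10_000), ("2018-02", 200, 10_000),
    ("2018-03", 200, 10_000), ("2018-04", 200, 10_000),
    ("2019-01", 330, 11_000), ("2019-02", 220, 11_000),
    ("2019-03", 220, 11_000), ("2019-04", 330, 11_000),
]

# 1. Normalize for number of consumers: cancellations per active subscriber.
rates = {month: cancels / subs for month, cancels, subs in raw}

# 2. Seasonal index: average rate for each calendar month relative to the overall average.
by_calendar_month = defaultdict(list)
for month, rate in rates.items():
    by_calendar_month[month[5:]].append(rate)  # "2018-01" -> "01"
overall = mean(rates.values())
seasonal_index = {cm: mean(rs) / overall for cm, rs in by_calendar_month.items()}

# 3. Seasonally adjusted churn: divide out the typical effect of that calendar month.
adjusted = {month: rate / seasonal_index[month[5:]] for month, rate in rates.items()}

for month in rates:
    print(f"{month}: raw {rates[month]:.2%}, adjusted {adjusted[month]:.2%}")
```

In this toy data the January spike turns out to be seasonal (both Januaries adjust down to the same level as February and March), while April 2019 still stands out after adjustment. That’s the month you’d go digging into, and the digging is where the actual work starts.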
Even though the problem is the same, each company does have something very different: the way it collects and classifies the data around the problem. The building blocks of each company’s data processes are similar (data warehouse, maybe some sort of streaming solution, a dashboarding layer, reporting, a billing or CRM system, etc.).
But the process itself, the dataset, the questions asked, and the people involved in evaluating the metrics are unique to each company. Parts of the process can be automated, but putting together the solution is very much manual work, and it will never be replaced by AI.
These kinds of questions are why internal data science teams - and external consultants - exist.
But wouldn’t it be great if there was a single way to collect all that data, all the time, standardize it, and corral it? Then we could know the churn of every company in the world.
Luckily, this is exactly what Narrator, a new company founded by engineers from WeWork (of “We need Kafka” WeWork fame; BTW, just putting it out there that my Kafka hot take was really the only one you needed to understand what WeWork’s S-1 would be like), hopes to accomplish, according to TechCrunch.
The article states that the company will be working on creating a standard set of training data that every single company using Narrator will implement:
“We provide the equivalent of a data team for the price of an analyst,” explains Narrator co-founder and director of engineering Star. “Within the first month, our clients get an infinitely scalable data system.”
Led by chief executive officer Elsamadisi, a former senior data engineer at WeWork, the Narrator founding team is made up entirely of alums of the co-working giant. The building blocks of Narrator’s subscription-based data modeling tool were developed during Elsamadisi’s WeWork tenure, where he was tasked with making sense of the company’s disorganized trove of data.
Dussud gets something very right when he says,
“All companies are fundamentally the same when it comes to the kinds of data they want to understand about their business,” Narrator’s Dussud tells TechCrunch. “Every startup wants to know what’s my monthly recurring revenue, why are my customers churning or whatever the case may be. The only reason they have to go hire a data team and hire a business analyst is because the way that their data is structured is specific to that company.”
The business idea behind this thinking is very sound. Everyone wants to know churn.
But the path to get there is not easy, and here’s where it starts to go off the rails:
“If you start to imagine a world where, under the hood, the structure of the data at all companies is the same, you can now start reusing a lot of the things that in the past would actually be quite complicated,” said Star.
What Narrator, and every other data startup that has been working on this problem since time immemorial, is not taking into account is that data is wily.
Data will never do what you want.
You have to talk to the data and tame it. You have to corral it into data warehouses, normalize it, and remove all the misspellings and mistakes from manual entry and failed data unit tests. You need to standardize schemas, convert data formats, run unit tests, and do feature engineering. None of this can be automated and standardized across companies; it has to be done on a per-company basis.
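To give a flavor of what just one sliver of this looks like, here’s a minimal sketch: fuzzy-matching a hand-typed CRM field against known values with Python’s standard-library `difflib`, plus a tiny data unit test that flags what couldn’t be cleaned. The field, the canonical values, and the 0.75 cutoff are all hypothetical, and that’s the point: every company needs its own list, its own threshold, and its own decision about what “gold” was supposed to mean.

```python
import difflib
from typing import Optional

# Hypothetical canonical values for a manually entered "plan" field.
CANONICAL_PLANS = ["basic", "premium", "enterprise"]


def clean_plan(raw: str) -> Optional[str]:
    """Map a hand-typed plan name to a canonical value, or None if nothing is close enough."""
    value = raw.strip().lower()
    matches = difflib.get_close_matches(value, CANONICAL_PLANS, n=1, cutoff=0.75)
    return matches[0] if matches else None


def check_plans(rows: list[dict]) -> list[str]:
    """A tiny data unit test: report rows whose plan could not be cleaned."""
    return [
        f"row {i}: unrecognized plan {row['plan']!r}"
        for i, row in enumerate(rows)
        if clean_plan(row["plan"]) is None
    ]


rows = [{"plan": "Premium"}, {"plan": "premum"}, {"plan": " Enterprize "}, {"plan": "gold"}]
print([clean_plan(r["plan"]) for r in rows])
print(check_plans(rows))
```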
Just look at this list of data problems that people have worked on, all unique and miserable in their own way:
I am really putting my Russian pessimism out there, and far be it from me to Knock The Hustle, but there is no way Narrator can possibly work:
Churn is always the same.
Data is not, and can never be.
And not understanding that, or ignoring it, is why many companies and startups never understand what’s going on with their customers.
Art: Churning Woman, Mihaly Munkacsy 1873
What I’m reading lately:
I just finished Tommy Tomlinson’s book on weight in America, “The Elephant in the Room”, and it’s a very, very strong recommend on weight loss, in a very Normcore and reasonable way. He’s a very talented writer tackling a very important topic.
A profile of Geoff Hinton, including this killer paragraph:
Hinton has said that when he was growing up, his mother gave him two choices: “Be an academic or be a failure.” His family tree is branch-breakingly weighted with scientists. His great-great-grandfather was George Boole, founder of Boolean logic, familiar to anyone who has done a “Boolean search.” One of George Boole’s sons-in-law was Charles Howard Hinton, Geoffrey’s great-grandfather, a mathematician and sci-fi writer who coined the concept of a “tesseract” (a four-dimensional object we can see in the 3-D world as a cube—well known to all readers of the classic children’s novel A Wrinkle in Time), and who ended up in the U.S. after being run out of Victorian England for bigamy. His son, Geoffrey’s grandfather, settled in Mexico—so there is a Mexican Hinton branch. Geoffrey Hinton’s middle name is Everest—as in the geographer Everest, his great-great-grandmother’s uncle, namesake of the mountain—and his father’s cousin was Joan Hinton, a nuclear physicist who helped out on the Manhattan Project and lived in China during the Cultural Revolution. Her father invented the jungle gym.
This newsletter on data! If you’re in data and not subscribed, you should be.
The discussion around this tweet:
About the Author and Newsletter
I’m a data scientist in Philadelphia. This newsletter is about issues in tech that I’m not seeing covered in the media or blogs and want to read about. Most of my free time is spent wrangling a preschooler and a newborn, reading, and writing bad tweets. I also have longer opinions on things. Find out more here or follow me on Twitter.
If you like this newsletter, forward it to friends!