AI won't save healthcare

But Google sure is gonna try

Art: Jan Steen, Doctor’s Visit, 1661

Several years ago, when my firstborn was very young, I thought I was having a heart attack. 

I was watching my seven-month-old play on the rug next to me, when I felt my chest tighten in a new and frightening way. I ignored the pain for several minutes while hoping with every fiber of my being that I wasn’t experiencing symptoms of cardiac arrest.

At 29, I knew the chances of me having a heart attack were very slim. And yet, I was experiencing chest tightness, trouble breathing, and discomfort on my right side. Every medical show I’d ever watched, combined with what I’d learned from years of reading the news, told me cardiac arrest was America’s biggest killer, and that it happened quickly. So, I did the thing I very much didn’t want to do: I Googled my symptoms with a growing sense of panic.

I did a Google search for symptoms, ran through my list of care options, and ended up at the ER, where I waited for hours before figuring out via Google that I’d instead sprained a muscle. I left the ER late at night with a hefty bill and a bunch of stress, but having not received any care at all. 

Medical care in the United States is, on average, much more expensive (and often of worse quality) than almost anywhere else in the developed world (and also worse than it used to be in the United States itself).

For example, Tweets like this are not uncommon indicators of medical costs in the US: 

A Vox project which asked people to submit their ER bills for examination showed that prices can be high and extremely volatile across states and even within hospitals. 

At the end of my blog post, I lamented that I wasn’t sure why people weren’t working more on these hard problems: faster diagnoses, making medical billing clearer, and generally making the patient experience in America more convenient and cheaper instead of working on, say, snack startups. 

Over the past few days, a couple of news items have come out that have made me realize that the answer is this: in the American healthcare industry, there’s no financial incentive to help end users (patients) unless doing so also helps the institutions taking care of them (hospitals, insurance companies, etc.).

This is probably not news to anyone who’s immersed in healthcare, but it was sad for me to come to this realization.

The first piece of evidence is this extremely well-written, funny, and insightful essay on doing a medical tech startup.

The author’s business idea was to create a database of clinical trials so that when people Googled for, say, “headache”, the database could tell them with statistical accuracy which medicine was the best to take based on aggregating all the studies.  

If you’re among the 77% of Americans that Google their health problems, insipid answers like this won’t surprise you. But we should be surprised, because researchers carry out tens of thousands of clinical trials every year. And hundreds of clinical trials have examined the effectiveness of painkillers. So why can’t I Google those results?

And so in the year of our lord 2017 I had a Brilliant Startup Idea: use a structured database of clinical trials to provide simple, practical answers to common medical questions.

So he built that database. And then he realized he couldn’t make any money on it. First, he tried Ye Olde Advertising: 

But then I look at WebMD’s 10-Qs and start to spiral. Turns out the world’s biggest health website makes about $0.50/year per user. That is…not enough money to bootstrap GlacierMD. I’m pouring money into my rent, into my Egyptian contractors, into AWS—I need some cash soon.

Then, he decided to see if he could sell it to doctors:

Miraculously I find some doctors that are willing to talk to me. So I borrow my parents’ car and drive out to the burbs to meet a doctor I’ll call Susan.


Susan is a bit chatty (she’s a psychiatrist) but eventually I demo GlacierMD. I show her how you can filter studies based on the demographic data of the patient, how you can get treatment recommendations based on a preferred side effect profile, how you can generate a dose-response curve. She oohs and aahs at all the right points. By the end of the interview she’s practically drooling.

But then he realized that she didn’t want to buy it:

“Hmmmm,” she said, picking at her fingernails. “...Of course I always have the best interests of my patients in mind, but, you know, it’s not like they’ll pay more if I prescribe Lexapro instead of Zoloft. They won’t come back more often or refer more friends. So I’d sorta just be, like, donating this money if I paid you for this thing, right?”

I had literally nothing to say to that. It had been a bit of a working assumption of mine over the past few weeks that if you could improve the health of the patients then, you know, the doctors or the hospitals or whatever would pay for that. There was this giant thing called healthcare right, and its main purpose is improving health—trillions of dollars are spent trying to do this. So if I built a thing that improves health someone should pay me, right?

But it turns out that there is no incentive for doctors to provide better care, because they don’t get paid for better care. They get paid for seeing more patients. There’s no incentive to provide less care, unless you’re a fancy concierge medicine doctor who makes money off subscriptions as opposed to one-offs.  

Patient needs and healthcare industry economic incentives rarely align. This results in disastrous decisions for patients. I wrote about one of these situations in my earlier post about baby-friendly hospitals (NCT). 

What it means is that hospitals are now getting rid of nurseries that are currently a part of maternity wards. These nurseries are where babies who are just born go for a short amount of time. Usually, in these nurseries the babies are weighed, observed, given baths, and kept for a couple hours (until they wake up to feed) so exhausted moms can get some rest.

So, I started investigating why American hospitals were suddenly moving in this direction, and it all started to make a lot of sense.

The TL;DR is that increasingly cash-strapped hospitals are looking for things to drive down costs, and baby-friendly hospitals are a PR-friendly way to show new moms that hospitals care about them, while at the same time cutting expenses.

Ultimately, why are hospitals pushing baby-friendly and breastfeeding? Because it saves them (and the government, through Medicaid payments) money.

In theory, it sounds great. In reality, baby-friendly hospitals are, in general, a terrible idea for moms, as threads upon threads of recent new moms attest. My own experience the second time around was far from ideal.

So, it doesn’t make sense to start companies to improve quality of care in a way that doesn’t align with lowering costs for hospitals and health insurance companies, because they’re going to get strangled by the market. 

Where does this leave us? With horrible companies dominating the space. One of these companies is Epic (which does electronic medical records and billing software), which is so terrible that there is a whole New Yorker article about it.

And, because there is so much money in healthcare, any big company that spans multiple industries would be stupid not to try to get into it. For example, the author of the Epic article, Atul Gawande, who was previously a doctor and an (excellent!) writer, now leads a hazy healthcare venture backed by Amazon, among others.

This is why Facebook is doing stuff like this: 

And why Google recently restarted Google Health. Google Health was originally around in the 2010s to collect medical information that Google users could enter voluntarily (lol). After initiating a patent to serve ads in electronic medical record systems, Google shut down Health, as is common with most Google services that have been around longer than five milliseconds. 

Google Health was restarted in 2018 when David Feinberg, who previously worked as the CEO of (my home state of Pennsylvania's) Geisinger Health System, was tapped to lead a new initiative at the company.

According to CNBC’s Christina Farr, who first reported on the details of the transition, “Feinberg’s job will be figuring out how to organize Google’s fragmented health initiatives, which overlap among many different businesses.”

More recently, the company doubled down on Artificial Intelligence (AI) through Verily, an entity focused on precision medicine and disease-research projects. And last July, Google hired former Cleveland Clinic CEO Dr. Toby Cosgrove as executive adviser to its Cloud healthcare and life sciences division, areas of the company that don’t appear to overlap with Feinberg’s role.

While there is not an incentive for these companies to help patients beyond what money they can bring in the form of partnerships and ads, there is an incentive for selling big projects to big hospital and healthcare systems, because these projects pay off.  

For Google, one of these recent initiatives has centered around breast cancer research. If you were reading the tech news recently, you might have been excited to see the following headlines about a recent Google Health publication in the academic journal Nature that claims its system can detect breast cancer just as accurately as radiologists and doctors. A couple things about this gave me pause. First, if you’ve been reading Normcore long enough, you know that companies don’t often do things just out of the goodness of their own hearts: there’s always a secondary motivation (NCT). Second, machine learning systems are still often very rudimentary (NCT), and third, they rely on the judgment of people (NCT) to work well.

So what was the deal here? If you read Google Health’s statement on it, you’ll assume that it’s a pretty reasonable study. They say,

In collaboration with colleagues at DeepMind, Cancer Research UK Imperial Centre, Northwestern University and Royal Surrey County Hospital, we set out to see if artificial intelligence could support radiologists to spot the signs of breast cancer more accurately.

The model was trained and tuned on a representative data set comprised of de-identified mammograms from more than 76,000 women in the U.K. and more than 15,000 women in the U.S., to see if it could learn to spot signs of breast cancer in the scans. The model was then evaluated on a separate de-identified data set of more than 25,000 women in the U.K. and over 3,000 women in the U.S. In this evaluation, our system produced a 5.7 percent reduction of false positives in the U.S, and a 1.2 percent reduction in the U.K. It produced a 9.4 percent reduction in false negatives in the U.S., and a 2.7 percent reduction in the U.K.

A false positive means you tagged someone as having breast cancer when they don’t, and a false negative means someone who does have breast cancer was not tagged. The latter is more dangerous because it means the disease remains untreated, but the former can be just as stressful for someone waiting for days on the results of a follow-up test.
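If it helps to see those two kinds of errors as counts, here is a minimal sketch. The labels are made up for illustration (they are not from the study), and `confusion_counts` is a hypothetical helper name, not anything Google published.

```python
def confusion_counts(predicted, actual):
    """Tally outcomes for binary labels, where 1 = cancer and 0 = no cancer."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["tp"] += 1  # flagged, and really has cancer
        elif p == 1 and a == 0:
            counts["fp"] += 1  # false positive: flagged, but healthy
        elif p == 0 and a == 1:
            counts["fn"] += 1  # false negative: missed a real cancer
        else:
            counts["tn"] += 1  # not flagged, and healthy
    return counts


# Made-up example: six screenings, what was predicted vs. what turned out to be true.
predicted = [1, 0, 1, 0, 0, 1]
actual = [1, 0, 0, 1, 0, 1]

counts = confusion_counts(predicted, actual)
# Share of healthy patients who were wrongly flagged.
false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])
# Share of patients with cancer who were missed.
false_negative_rate = counts["fn"] / (counts["fn"] + counts["tp"])
```

When a study reports a "reduction in false positives," it is comparing a rate like the one above between two readers (here, the model and the humans) on the same set of patients.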

So there’s a deep learning neural network-based model that looks at pictures of mammography scans, sees if it can spot signs of cancer based on previously-labelled images, and then lets you know whether it thinks this new image has areas that look like they have cancer, as well. What it really spits out is a 0/1 binary answer: as in “this picture looks like cancer,” therefore, “you have cancer.”

These results were compared to human diagnosticians who have looked at hundreds of thousands of these scans. Then, the patients were examined down the line to see if they did have cancer. 

It all sounds super solid, right? Except for a couple things. 

First, there is the issue of the actual methodology and reporting. People much smarter than me cover it in this very long and detailed thread. The gist of it is that this is the wrong tool for the job and results in incorrect predictions, including over-diagnosing of cancers. 

To be fair, this issue is not limited to AI systems. It’s a symptom of mammography in general, but this particular application of AI will overindex on positive results because, as always, it’s better to have a false positive than a false negative. That is, it’s “better” to detect something that isn’t there than to miss something that is. 

Second, as I mentioned, what this study gives is a binary response. If you read the paper instead of the popular press around it (and lucky for you, it’s available from Nature TO RENT for $8.99), you’ll find some more nuance.

The statistical model itself generates a probability, which is then thresholded into a yes/no response:

The AI system consisted of an ensemble of three deep learning models, each operating on a different level of analysis (individual lesions, individual breasts and the full case). Each model produces a cancer risk score between 0 and 1 for the entire mammography case. The final prediction of the system was the mean of the predictions from the three independent models. The AI system natively produces a continuous score that represents the likelihood of cancer being present. To support comparisons with the predictions of human readers, we thresholded this score to produce analogous binary screening decisions.
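To make the averaging and thresholding described in that passage concrete, here is a rough sketch. The paper's code isn't public, so everything here is an assumption for illustration: the function names, the three example scores, and especially the 0.5 cutoff, which is a placeholder and not the study's actual operating point.

```python
from statistics import mean


def ensemble_score(lesion_score, breast_score, case_score):
    """Average the three per-model risk scores (each between 0 and 1)."""
    return mean([lesion_score, breast_score, case_score])


def screening_decision(score, threshold=0.5):
    """Collapse a continuous risk score into a binary flag.

    The 0.5 default is an illustrative assumption, not the paper's value.
    """
    return score >= threshold


# Hypothetical scores from the three models for one mammography case.
score = ensemble_score(0.62, 0.48, 0.55)
decision = screening_decision(score)

# The same score flips to "no flag" under a stricter, equally hypothetical cutoff.
stricter_decision = screening_decision(score, threshold=0.6)
```

The point of the sketch is that the yes/no answer depends entirely on where the cutoff sits: lower it and you catch more cancers but flag more healthy people, raise it and you do the opposite.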

So it initially produces the probability that “this image looks like it contains cancerous cells,” and then thresholds that into “it looks like cancer” or “it doesn’t.” Although this is, as they note, what humans also do, the system won’t behave like a human and explain an interpretation, go through the results with you, and allow you to dig into how it made the decision. It’s a neural network, and it has very low interpretability as to how it arrived at the decision, particularly since the study doesn’t reveal its source data or code:

So why even do this study at all when it’s hard to interpret, hard to replicate, and might not be the correct metric? 

This study is not actually meant to advance cancer research, or improve health outcomes (although I genuinely believe that the scientists working on this study believe it will do both, and it’s possible that both will be a side effect).  

What this study is, is an exercise in sales engineering. If you’re not familiar with sales engineering, you’ve led a blessed life. But basically what it means is that you have a company, and you want to convince other companies that your company can do the job. You go to a potential client and say, “Hey, my company can complete this task.” And the company says, “Well yeah that’s great, but this is a $2 million project and  you have no track record of doing this work. We’d like to truly believe that you can do it. Prove it.”

So you build a proof of concept, a small model, something that is not the entire scope of work, but that will help you make the sale and get started on the work once you win the contract. This is sales engineering. 

This study was released for the same reasons that the OpenAI work was (NCT): to showcase to decision makers in the healthcare industry what Google can do. It’s meant to display how Google Health and DeepMind can build these systems, and how, if you pay Google enough money, they’ll build a system like this for you. 

This is Google trying to make the sale. Except, since healthcare is big money and big companies, they also have to go big, by getting published in a serious, prestigious journal and partnering with multiple medical centers across the world. The Nature paper has almost 30 contributors, including Demis Hassabis, the CEO of DeepMind.

Not only does the sales engineering process include wowing people with star power and PR (this is part of the sales versus execution culture (NCT)), it also includes seemingly sharing a lot of info without sharing anything. For example, in addition to not sharing any of the code, they also can’t/don’t share some of the data:

They also have to make it seem like only they can build this system. 

As a contrast, check out this recent NYU study, which examined very similar things, but open-sourced all of their data and code. 

So, what do we have here? On the surface, if you read Google’s press release or a newspaper article on the issue, it seems like a promising foray into enabling, that word again, better health outcomes for women. 

But what this study really is, is a classic business move in the expensive, massive, and fierce United States healthcare industry, to prove that Google is not being left behind Amazon and Facebook, and can also do healthcare, and has the team to prove it.  

It’s true that they’re not trying to solve snacks like I complained about in my original blog post, but I don’t know that this is any better. 

What I’m reading lately:

  1. This thread about learning, very juicy:

  2. Julia’s fantastic post on ad tracking:

  3. The joys of walking

  4. “The Wonder Years aired from 1988 and 1993 and depicted the years between 1968 and 1973. When I watched the show, it felt like it was set in a time long ago. If a new Wonder Years premiered today, it would cover the years between 2000 and 2005.”

  5. The future of the web isn’t the web.

About the Newsletter

This newsletter is about issues in tech that I’m not seeing covered in the media or blogs and want to read about. It goes out once a week to free subscribers, and once more to paid subscribers. If you like it, forward it to friends!

Select previous free Normcore editions:
Keybase and the chaos of crypto · What’s up with Russia’s Internet · I spent $1 billion and all I got was this Rubik’s cube · Die Gedanken sind frei · Neural nets are just people · Le tweet, c’est moi · The curse of being big on the internet · How do you like THAT, Elon Musk? · Do we need tech management books? · Two Python Paths
Select previous paid Normcore editions:
Sidewalks for the internet · Imgur is bad now · Eric Schmidt and the great revolving door · No photos please · Deep thoughts of Cal Newport

About the Author:
I’m a data scientist in Philadelphia. Most of my free time is spent wrangling a preschooler and a baby, reading, and writing bad tweets. I also have longer opinions on things. Find out more here or follow me on Twitter.