Horses and Zebras (September 2017)

One of the very first things that young physicians are taught is that when you hear hoofbeats you should think of horses, not zebras. It’s great and very useful advice; aphorisms become aphorisms for a reason. If there are hundreds of horses out there for every zebra, you want your doctor to be really good at spotting horses. But what happens when a zebra shows up? Well, for one thing, that well-worn adage turns out to be worse than useless; it actively points the diagnostician in the wrong direction. And, as it happens, there are zebras all over the place.

Even though the absolute number of cases of any given rare illness (usually defined as a prevalence of fewer than 1 case per 1,500 people) can be vanishingly small, rare illnesses are, overall, reasonably common: estimates peg the frequency at somewhere between 6% and 8% of the population [1]. You almost certainly know someone afflicted with a zebra, even if you don’t know that you know. The problem is that these illnesses are, generally speaking, diagnostic nightmares. Patients regularly endure astounding lapses between symptom onset and diagnosis; in one survey nearly a quarter of respondents reported waiting more than five years before being correctly diagnosed [2]. The reasons behind this aren’t totally clear, but they almost certainly relate to doctors’ relentless pursuit of, and greater comfort with, horses.

Such profound diagnostic failure implies not only that we’re exposing patients to needless delay, cost and sickness, but also that corralling zebras might require something besides our usual set of diagnostic tools. The trouble is that, if a patient spends several years (not to mention astronomical sums) chasing the identity of her illness, it stands to reason that she’s been exposed to every diagnostic modality that could reasonably be expected to yield a solution (and probably quite a few that couldn’t). Between the plethora of exams and tests that these patients have already been subjected to, and digital libraries that have collated all of our medical knowledge into searchable databases, it’s fair to say that our struggles with zebras don’t stem from a dearth of useful information. These mysteries aren’t typically unlocked by some late-arriving silver bullet; they’re solved by someone synthesizing the available data and figuring out the answer. So maybe the smart way to approach a tricky diagnostic problem is simply to have lots of different doctors take a shot at it.

That, at least, is the animating insight behind a software firm called CrowdMed. Their platform, which was launched in 2013, aims to speed the diagnosis of rare illnesses by tapping into the intelligence of a wide network of “medical detectives”: a previously unconnected array of health professionals, med students and laypeople from around the world. The detectives each evaluate the patient’s case and suggest a diagnosis; the set of suggestions is then aggregated and ultimately presented to the patient as a list of possible diagnoses, ranked by likelihood of being correct. But the heart of CrowdMed’s pitch isn’t merely that they’ll collect second (and third and fourth) opinions; instead, they’re arguing that they can unravel vexing medical problems by getting a group of diverse diagnosticians to work in concert.


CrowdMed is seeking to leverage the problem-solving capacity of groups. This idea, that a group of people working together can realize a level of performance or intelligence that exceeds that of its smartest member, has a long intellectual history, as detailed in James Surowiecki’s 2005 book, The Wisdom of Crowds [3]. Though Surowiecki makes a reasonably compelling argument that crowdsourcing approaches are well suited to addressing a variety of problems, it’s obvious that not all groups are smart or capable of making good decisions. A mob of people reinforcing one another’s marauding isn’t smart, and a coterie of investors inflating a bubble isn’t making good choices. Surowiecki acknowledges this, and suggests that a group’s problem-solving capacity is tied to its adherence to a set of essential conditions.

Chief among these are diversity and independence. In diverse groups the members each have a unique perspective or individual “private information”; constituents of independent groups aren’t unduly influenced by one another. The smartest groups have both of these elements, and the dumbest have neither. Diversity is critical to ensuring that the group evaluates as many potential solutions as possible, whereas independence helps prevent individual errors from becoming systematically biased. Groups that lack independence and diversity are prone to fixating on an incorrect answer, and, worse yet, will often fail to consider the correct solution at all. The result can be as dramatic as the tulip mania of the 17th-century Netherlands (fueled by a group that was neither diverse nor independent), or as mundane as a mutual fund investing in the wrong stock. In either case, the group’s characteristics prevent it from taking advantage of its collective intelligence.

It’s not clear that CrowdMed’s cadre of sleuths is diverse or independent, at least not in the sense contemplated by Surowiecki. By design, there are essentially zero barriers to participating as a CrowdMed solver; the company encourages everyone, regardless of whether they have a medical background or training, to try to solve the cases. The result is a rather heterogeneous assortment of diagnosticians: 38% are male, 24% are from outside the United States and 42% aren’t involved in the medical industry [4]. The service is still relatively small (357 total case solvers in the period between 2013 and 2015), and it’s certainly reasonable to expect it to become more cosmopolitan and gender-balanced as it grows. Overall, CrowdMed profiles as a rather diverse group in terms of demographics.

But demographic diversity isn’t the same as informational diversity. The stipulations that Surowiecki discussed have nothing to do with whether the crowd is gender-balanced or multinational or trained in a particular field. Instead, the salient question is whether the various group members are able to contribute unique private information. But since the patient uploads all the relevant data and images, and all solvers have access to the same online resources, it’s more or less impossible for any of the solvers to possess what might be considered private information. The solvers are able to ask the patients specific questions, but even the records of those interactions are fully available to all other solvers. The private information necessary for diversity is structurally excluded from the CrowdMed platform.

To be fair, though, it’s possible that CrowdMed’s solvers achieve diversity not through an abundance of private information but by virtue of eccentric interpretations of the available facts; perhaps, if everyone puts a unique spin on the data, it can compensate for an absence of private information. In a way, CrowdMed seems to be counting on this: in 100% of its cases, doctors (more than one, usually) have taken a shot at a diagnosis and come up short. Given such a daunting starting point, it’s not totally unreasonable to suppose that an unconventional approach might be required to crack these cases. So the fact that 42% of CrowdMed’s solver base works outside the medical field would seem both to suit CrowdMed’s challenges and to assuage concerns about insufficient informational diversity.

However, though dependence on non-medical minds might be a boon to diversity, it has the potential to undercut CrowdMed’s operation in other ways. Surowiecki’s book (and the entire notion of crowd-based decision making) can easily be read as an indictment of reliance on expert advice. After all, crowdsourcing methods are beneficial only if they offer superior performance to a traditional non-crowd decision maker. But those comparisons are, critically, between a crowd and a single expert. Accordingly, Surowiecki and others caution us to limit our reliance on solitary, lone-wolf decision makers. That said, experts (people with enough knowledge and experience to make meaningful contributions to the crowd’s deliberations) remain indispensable. There’s no reason to believe that experts (doctors, in this case) are less valuable to crowds than laypeople, or that a crowd of laypeople is preferable to a crowd of experts. On the contrary, it’s clear that in this realm, diversity can only be beneficial when accompanied by expertise. If, for example, a significant portion of a crowd lacks medical expertise, and thus the capacity to contribute meaningful ideas to the discussion, not only can you be fairly certain that a correct solution won’t emerge from that section of the crowd, but you also risk bogging down the true experts by forcing them to sift through the output of the non-experts. This is precisely the quagmire that CrowdMed and its armada of detectives find themselves in. The overall consequence of CrowdMed’s problem-solving method is to rob the group of its potential advantages in efficiency and potentially to reduce the crowd’s intelligence to below that of its smartest members.

Despite that, CrowdMed’s process has been, at the very least, a qualified success. About half of its clients would recommend CrowdMed to a friend, and close to 60% reported that the process led them closer to a diagnosis; this certainly suggests that the process is providing customers with real value. But, unsurprisingly, that value is disproportionately attributable to the superior contributions of expert group members: solvers in the medical field are, on average, rated more than 20% better than their non-medical colleagues. The only salient difference between those two groups is their level of relevant expertise. Crowdsourcing, as it happens, isn’t that much different from traditional problem solving: either way, you need people who know what they’re talking about.


Surowiecki’s book opens with a famous anecdote. Francis Galton, the renowned statistician and legendary polymath, found himself at a country fair. For a small fee, the fairgoers were invited to guess the weight of an ox; whoever came closest to the mark would win the pot. Close to 800 people got in on the action. When the guesses were averaged, that figure ended up being only one pound off from the actual weight of the ox. The crowd had, collectively, made a nearly perfect prediction. Galton was duly impressed and the first seeds of the crowdsourcing revolution were planted.

The Galton story is a tidy illustration of the potential of the sort of crowdsourcing that CrowdMed is trying to tap into, but paradoxically it also hints at its limitations. While only a portion of the country fair crowd were genuine experts (butchers, farmers and the like), everyone who guessed had at least some rational basis to make a judgment. Each guesser, for instance, would have known his own weight and could use that to at least ballpark a guess for the ox. Close to half of CrowdMed’s sleuths, on the other hand, aren’t involved in the medical field at all; they can’t be expected to have even the basic knowledge needed to make a rudimentary diagnosis, let alone identify some esoteric disease that has evaded the detection of several previous doctors.

Perhaps more importantly, the crowd was able to successfully determine the weight of the ox in large part because, with such a wide range of independent guesses, the participants’ individual biases and errors essentially canceled one another out. The averaging systematically damped the outliers and pushed the estimate toward the fat middle of the bell curve.
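For readers who like to see the arithmetic, here is a rough Python sketch of that cancellation. It is a toy simulation under stated assumptions, not Galton’s data: the true weight of 1,200 pounds, the 75-pound spread of individual guesses, the 100-pound shared bias and the function name crowd_estimate are all hypothetical choices made for illustration.

```python
import random
import statistics


def crowd_estimate(true_value, n_guessers, noise_sd, shared_bias=0.0, seed=0):
    """Simulate a crowd's guesses; return (crowd mean, average individual error).

    Each guess is the true value, plus a bias shared by the whole crowd,
    plus that guesser's own independent random error.
    """
    rng = random.Random(seed)
    guesses = [
        true_value + shared_bias + rng.gauss(0, noise_sd)
        for _ in range(n_guessers)
    ]
    crowd_mean = statistics.fmean(guesses)
    avg_individual_error = statistics.fmean(
        [abs(g - true_value) for g in guesses]
    )
    return crowd_mean, avg_individual_error


TRUE_WEIGHT = 1200.0  # hypothetical ox weight in pounds, not Galton's figure

# Independent crowd: errors point in random directions and mostly cancel.
independent_mean, individual_error = crowd_estimate(TRUE_WEIGHT, 800, 75.0)

# Non-independent crowd: everyone shares the same 100-pound overestimate,
# which averaging cannot remove.
biased_mean, _ = crowd_estimate(TRUE_WEIGHT, 800, 75.0, shared_bias=100.0)

print(f"average individual miss: {individual_error:.1f} lb")
print(f"independent crowd miss:  {abs(independent_mean - TRUE_WEIGHT):.1f} lb")
print(f"biased crowd miss:       {abs(biased_mean - TRUE_WEIGHT):.1f} lb")
```

In this toy setup the independent crowd’s average should land far closer to the truth than a typical individual guess, while the crowd with a shared bias stays roughly 100 pounds off no matter how many people join in. That is the statistical reason independence matters.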

But rare diseases aren’t found in the middle of the bell curve. They are, by definition, outliers that ignore conventional wisdom and frustrate our usual approaches. And, because of that, their solutions don’t lend themselves to the aggregation of a sea of piecemeal contributions. They ask that one smart person take a step back, put it all together and recognize that it’s not an ox at all. It’s a zebra.


[1] Rode, Joachim. “Rare Diseases: understanding this Public Health Priority.” (2005).

[2] Faurisson, Francois. “EurordisCare2: Survey of diagnostic delays, 8 diseases, Europe.” (2004).

[3] Surowiecki, James. The Wisdom of Crowds. Anchor, 2005.

[4] Meyer, Ashley ND, Christopher A. Longhurst, and Hardeep Singh. “Crowdsourcing diagnosis for patients with undiagnosed illnesses: an evaluation of CrowdMed.” Journal of Medical Internet Research 18.1 (2016).
