Tuesday, January 06, 2015

The Problem of "Missing Heritability"

When the human genome was sequenced in 2003, the expectation was that scientists, armed with a powerful array of new genetic techniques, would very quickly identify the genetic correlates of things like schizophrenia, depression, autism, and alcoholism, which we supposedly "know" have a large genetic component. Extremely large, well-funded genome-wide association studies (GWAS) were begun. (See this PLoS paper to learn how such studies are conducted.) Surprisingly, the studies failed, for the most part, to converge on genes, gene copy number variants, SNPs (single nucleotide polymorphisms), gene rearrangements, mutations, or other peculiarities of allelic architecture whose presence could predict important diseases. When genetic loci of interest were found, the findings often didn't replicate in follow-up studies; or if they did, the odds ratios (a measure of how strongly a genetic feature is associated with a trait) fell far short of accounting for the known prevalence of those traits in target populations.
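
For readers who haven't met an odds ratio before, here is a minimal sketch of how one is computed from a simple case/control table. The function name and the counts are hypothetical, invented purely for illustration; they are not taken from any study.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Odds of carrying a variant among cases, divided by the odds among controls."""
    odds_in_cases = cases_with / cases_without
    odds_in_controls = controls_with / controls_without
    return odds_in_cases / odds_in_controls


# Hypothetical counts (not real data): 1,000 cases and 1,000 controls.
example = odds_ratio(cases_with=300, cases_without=700,
                     controls_with=250, controls_without=750)
print(f"odds ratio = {example:.2f}")
```

An odds ratio of 1.0 means the variant is equally common in both groups. With these made-up counts the result is about 1.29, the kind of modest figure that says little about whether any one carrier will develop the trait.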

If things like schizophrenia and alcoholism truly do have a strong genetic basis (as we've been told), and if they were to involve tractable numbers of genes (say, dozens or scores of genes, rather than hundreds or thousands), DNA studies of the GWAS kind should produce immediate, strong, recognizable genetic signatures of disease. The results should leap off the page. But they don't. Time and again, the very few candidate alleles that are found either occur in low numbers, or have low "penetrance" (low capacity to predict disease), or both, and quite often they are not confirmed by follow-up studies.

This has given rise to a major crisis in genetics, summed up in a paper called "Finding the missing heritability of complex diseases" that appeared in Nature in 2009. Scientists are desperate to explain the "missing heritability" of disorders we "know" are genetic.

It's assumed, by most scientists, that the failure of genome-wide association studies to find genetic explanations of complex diseases can be attributed to such studies' low power and resolution for genetic variations of modest effect. (So the quest has begun to increase the study size of future trials, in hopes of seeing more robust results.) In addition, there's a reasonable expectation that many complex diseases will be found to involve numerous genes, each contributing only a small effect to the total. It's also been suggested that some traits are dictated by extremely rare genetic features with high "penetrance." There's also an awareness that current laboratory techniques have low power to detect gene-gene interactions. What no scientist wants to say, however (and what none of the 24 co-authors of the Nature article cited above would say), is that maybe the genetic component(s) of schizophrenia, alcoholism, depression, etc. are simply vanishingly small to begin with. Yet that's exactly what the data are telling us. But we won't listen to the data because they don't fit our preconception of how the world should work.
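
To make the "low power" argument concrete, here is a rough sketch of how the chance of detecting a single variant depends on study size. Everything in it is an assumption chosen for illustration: a simple additive model, a one-degree-of-freedom test, the commonly used genome-wide significance threshold of 5e-8, and a hypothetical variant that explains 0.1% of the variation in a trait. The function name is mine, not anything from the studies discussed here.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(n_subjects, variance_explained, alpha=5e-8):
    """Approximate power of a two-sided, 1-df association test for one variant."""
    std_normal = NormalDist()
    # Critical value for a two-sided test at the chosen significance level.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Non-centrality parameter under the assumed additive model.
    ncp = n_subjects * variance_explained / (1 - variance_explained)
    shift = sqrt(ncp)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical variant explaining 0.1% of trait variation.
for n in (1_000, 10_000, 100_000):
    print(n, round(gwas_power(n, 0.001), 4))
```

Under these assumptions, a study of a thousand people has almost no chance of flagging such a variant, while a study of a hundred thousand almost certainly would. That is the logic behind the push for ever-larger cohorts; whether it justifies the push is the question this post is asking.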

We "know" that a person's height is largely controlled by genes. Studies going back almost a century have determined that body height is 80% to 90% heritable; no one seriously questions this fact. (Height is heritable—it "runs in families.") However, at least three large, modern genetic studies have been done to find "height genes"; the largest involved over 180,000 study subjects (and 291 co-authors). In all, some 180 genetic loci were identified that play a role in determining a person's height. But the 180 genomic features, put together, accounted for only 10% of observed variations in height. The rest appears to be environment.
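
The size of the gap is simple arithmetic, sketched below using only the figures quoted above. Treating the family-study percentage and the GWAS percentage as directly comparable is itself an assumption, and the variable and function names are mine.

```python
# Figures quoted in the paragraph above.
heritability_estimates = (0.80, 0.90)  # family-study estimates for height
explained_by_loci = 0.10               # share attributed to the 180 loci


def unexplained_gap(heritability, explained):
    """Share of trait variation claimed as heritable but not traced to known loci."""
    return heritability - explained


for h in heritability_estimates:
    gap = unexplained_gap(h, explained_by_loci)
    print(f"claimed heritability {h:.0%}: unexplained gap {gap:.0%}")
```

The gap comes out at 70 to 80 percentage points, which is what the phrase "missing heritability" refers to in the case of height.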
 
Are we now supposed to enlarge our "study population" from 180,000 to several million, in order to find the genetic explanation for body height, just because we "know" one exists?
 
At some point, don't we have to just admit "the data's the data"?

Shouldn't we be willing (at least provisionally) to entertain the idea that maybe the twin studies and the data showing that certain things "run in families" (pre-GWAS-era data, mostly) are in need of reevaluation? Shouldn't we at least consider the idea that prior studies misjudged the importance of uncontrolled-for environmental variables? Now that we have powerful DNA-analytic techniques for investigating heritability, and the techniques aren't giving us the results we want, must we "increase the size of the microscope" to make the results look bigger? Or shouldn't we just accept what the data are telling us (as painful as that may be)?

In a future post, I'll talk about some of the GWAS results for alcoholism. Stay tuned.

This post is, in part, derived from material for a forthcoming book on mental illness I'm writing. Please come back often to find out how to get free sample chapters.