Showing posts with label genome-wide association. Show all posts

Thursday, January 08, 2015

Where Are the Alcoholism Genes?

Alcohol abuse is the third leading cause of preventable death in the U.S. (behind smoking and obesity) creating a burden to the U.S. economy of about $200 billion a year. Data from twin studies (and other studies) have long suggested a strong genetic component to alcoholism. It's widely claimed that the heritability of alcoholism is about 50%; this number is supported by the National Institute on Alcohol Abuse and Alcoholism, for example, as well as at least one meta-analysis of twin and adoption studies. (For a dissenting meta-analysis, see this paper.)

No one expects that alcoholism will turn out to be a simple matter of inheriting one or two "alcoholism genes" (although at one time, this was in fact the hope). Instead, it's expected that multiple genetic features probably interact to produce the propensity to develop alcohol dependence. Even so, if alcoholism is as heritable as people say it is, and if it involves a tractable number of genes (dozens or scores, rather than hundreds or thousands), candidate genes should be fairly easy to discover using the powerful new tools of DNA analysis. Candidate SNPs (single nucleotide polymorphisms) should jump right off the page and hit you in the face. If gene copy number variants are involved, it should be readily apparent. If genes are involved at all, it should be pretty obvious pretty fast.

That's not what we find.

Genome-wide association studies (GWAS) are able to account for only a small proportion of the "known" heritability of alcoholism. This "small proportion of heritability" problem is true also for GWAS investigations of things like body height (a trait that's even more heritable than alcoholism), where only 10% of the variability of human body height can be accounted for by the more than 100 genetic features thought to control it. In fact, the general failure of GWAS to account for heritable conditions is so widespread, papers have been written about it.

The general idea behind GWAS is the "common disease, common variant" hypothesis, which says that common diseases are attributable to potentially many allelic variants (genetic differences) present in a sizable percentage of the population. In GWAS, several hundred thousand to more than a million single nucleotide polymorphisms (SNPs) are assayed, per person, times hundreds (or often, thousands) of individuals, and statistics for the expected (versus actual) rate of occurrence of various SNPs are calculated. SNPs that appear to "travel together" down through the generations are assessed for linkage disequilibrium to determine if they are associated more often than normal population statistics would predict. The data sets generated in GWAS are enormous and the statistical calculations challenging. For a description of the technique, see this article; also see this piece for criticisms of the technique (and a spirited rebuttal of criticisms).
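
To make the per-SNP statistics a little more concrete, here is a minimal sketch of the kind of test that gets run at each marker: a chi-square comparison of allele counts in cases versus controls. The function name and the allele counts are made up for illustration (they don't come from any of the studies discussed here), and real GWAS pipelines add quality control, ancestry adjustment, and other refinements this sketch ignores.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are alternate/reference allele counts.
    Returns (chi2, p_value).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of margins)
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_case * row_ctrl * col_alt * col_ref
    )
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is seen 60 times in 100 case
# alleles and 40 times in 100 control alleles.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(f"chi2 = {chi2:.1f}, p = {p:.4f}")  # chi2 = 8.0, p is about 0.0047
```

A p-value like that looks impressive for a single test, but it is nowhere near genome-wide significance once the same test has been repeated at hundreds of thousands of SNPs.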

One of the best-known GWAS alcoholism studies is the so-called German study, by Treutlein et al. (2009). This study (which was likely statistically underpowered, as the authors admit) included 1460 study subjects with DSM-IV alcohol dependence and 2332 control individuals. The study claimed to have detected and replicated evidence for association of 15 markers with alcoholism, although only 9 SNPs were in genes (the rest were in what used to be called "junk DNA") and only two met criteria for genome-wide significance. The two SNPs that met criteria for genome-wide significance were colocated (5000 base-pairs apart) in chromosomal region 2q35 (the long arm of chromosome 2).

Less well known is the 2012 GWAS conducted on two samples (N=1721 and N=1113) of Korean men by Baik et al., which found that a dozen SNPs on chromosome 12 had genome-wide significant associations with alcohol consumption. Most of the SNPs were in intronic regions of genes C12orf51, CCDC63, MYL2, OAS3, CUX2, and RPH3A. Three of these successfully replicated in the second test arm. These locations were not the same as the ones found in the German study (or in the studies below). On the other hand, it's not clear whether results applicable to Korean men will be applicable to non-Korean men, or to women. Also, this was not a case-control study, but an imputation-based study using an Asian subset of HapMap baseline data.

Another study, often called the SAGE study, by Bierut et al. (2010), looked at 1897 alcohol-abuse subjects and 1932 controls. Here, 15 SNPs of interest were identified, but only one SNP (identified as rs13160562) was among the 15 identified in the German study, and it did not reach genome-wide significance. Said the researchers: "Our top 15 association signals were not replicated in the independent datasets nor did the present study replicate the two genome-wide significant results reported by Treutlein and colleagues." (One of the independent datasets was a large family-based study of 258 families with more than 2,000 genotyped individuals, while the second was a study of alcohol-dependent men and community-based comparison subjects of German descent.)

In 2012, Heath et al. looked at 8754 individuals (2062 of whom were alcohol dependent) from the Australian Twin Registry. It's hard to interpret the Heath writeup, but it appears some of the study population were twins and some were parents of the twins, while all of the controls were non-alcohol-dependent twins. (So imagine that. A twin study that's also a GWAS.) No findings reached genome-wide significance (p = 8.4×10⁻⁸ for this study). Of the top 400 SNPs, 65% had effect sizes of less than 0.25% of the variance (when ranked by p-value for association with heavy drinking), and only 7.5% had effect sizes greater than 0.3%, with a median effect size of 0.23%. So in other words, the main finding was one of small effect sizes. The Heath writeup was quite clear on this point, stating: "The primary conclusion from these analyses is that, as for many other complex phenotypes (e.g. body-mass index), effect sizes for the contribution of individual genetic variants to differences in heaviness of alcohol consumption and alcoholism risk are small, perhaps accounting for as little as one-tenth of one percent of the variance."

The fact that Heath et al. found 30 SNPs with effect size 0.3% or more is not terribly helpful from an alcoholism-prediction standpoint, because while in theory those 30 SNPs, put together, might account for 10% of phenotypic variance (if their effects are additive), in reality it's unlikely you'd inherit all 30 variants together (and it's not clear whether you'd have to inherit one copy of each, or two). But even if you did, they would account for only about 10% of the variance in alcoholism risk, not the 50% that's so often claimed.
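
For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch using only the figures quoted above from Heath et al. The variable names are mine, and the simple summation assumes the SNP effects are purely additive and independent, which is a best-case assumption rather than something the study established.

```python
# Figures quoted in this post from Heath et al.
TOP_SNPS = 400              # number of top-ranked SNPs examined
SHARE_ABOVE_CUTOFF = 0.075  # 7.5% had effect sizes greater than 0.3%
EFFECT_SIZE_PCT = 0.3       # percent of variance per SNP (lower bound)

# How many SNPs clear the 0.3% cutoff
n_snps = round(TOP_SNPS * SHARE_ABOVE_CUTOFF)

# Best case: effects simply add up (assumption: additive, independent)
combined_pct = n_snps * EFFECT_SIZE_PCT

print(f"{n_snps} SNPs x {EFFECT_SIZE_PCT}% = {combined_pct:.1f}% of variance")
```

The total comes to 9% at the 0.3% floor, which is where the "roughly 10%" figure comes from, since some of those SNPs have slightly larger effects.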

The Heath group also reported: "No SNPs or tagging SNPs confirmed in the German Alcoholism GWAS study were replicated in our analyses, nor were any SNPs or tagging SNPs identified as the most strongly associated SNPs in the SAGE study." In other words, the German results didn't replicate in the Heath study, nor did the results of Bierut et al.

Edenberg et al. (2010), in a case-control study involving 1399 people, found somewhat more encouraging results (arguably). Although no single SNP met genome-wide criteria for significance, the researchers found that 14 out of 140 top SNPs from the German study "were significant in our primary analysis of alcohol dependence" (no p-value stated). So in some sense, the German results replicated for Edenberg (although it's by no means clear that Edenberg's 14 SNPs were among the German group's top 15 SNPs; read the study carefully; it just says 14 of the top 140). Also, using a variety of types of evidence, Edenberg et al. argue that a cluster of 6 genes on chromosome 11 are associated with alcohol dependence, but no effect sizes were given.

It should be noted that genome-wide significance (mentioned several times above) is a fairly stringent criterion. The need for it arises from the fact that any time you look at frequency numbers for a million data points, some are statistically likely to jump out (producing false positives) unless you correct for the sheer number of data points.
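
A quick sketch shows why the correction matters. I'm assuming a conventional significance level of 0.05 and a simple Bonferroni correction (dividing by the number of tests); individual studies may use different corrections, so treat the numbers as illustrative rather than as the thresholds any particular paper used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of chance 'hits' if every test used alpha uncorrected."""
    return alpha * n_tests


ALPHA = 0.05         # assumed conventional significance level
N_SNPS = 1_000_000   # "a million data points", as in the text

print(f"Uncorrected: about {expected_false_positives(ALPHA, N_SNPS):,.0f} false positives expected")
print(f"Corrected per-SNP threshold: {bonferroni_threshold(ALPHA, N_SNPS):.1e}")
```

With a million SNPs, an uncorrected 0.05 cutoff would be expected to flag about 50,000 SNPs by chance alone, while the corrected per-SNP threshold works out to 5×10⁻⁸.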

My reading of the studies is that no smoking guns were found; some of the SNPs are suggestive of a relationship to alcohol dependence, but I wouldn't exactly call the results strong, because in general, the studies all found different SNPs in different genes (with only a couple of exceptions), and many of the SNPs were not in genes. (Some of those that were, were in introns.) The studies were not terribly robust in terms of replication (either within studies or between them). Certainly, any hope of finding an "alcoholism gene" (singular) is dashed. In fact, it appears unlikely that any gene, of any kind, will be found that can explain more than a few tenths of a percent of your likelihood to have gotten "alcoholism" from your parents or grandparents.

How, then, is alcoholism heritable? We know it runs in families, yes. But that doesn't mean it's more than a few percent genetic. Lots of things that aren't genetic run in families: poverty, political beliefs, the Tooth Fairy myth, etc. (An affirmative answer to the question “Did you have your back rubbed?” has been shown to be 92% heritable for males and 21% heritable for females.)

Environment, in all likelihood, is the great multiplier, the amplifier that brings out the weak signal in DNA; that's what future research will probably end up telling us. That's what the research tells us now.

I'm in the process of finishing a book on mental illness. It's an evidence-based book, part science, part memoir. (It's not as technical as this post, though! It's meant for non-technical readers.) For more info, sign up for the mailing list, and return here often. Thanks!

Tuesday, January 06, 2015

The Problem of "Missing Heritability"

When the human genome was sequenced in 2003, the expectation was that scientists, armed with a powerful array of new genetic techniques, would very quickly identify the genetic correlates of things like schizophrenia, depression, autism, and alcoholism, which we supposedly "know" have a large genetic component. Extremely large, well-funded genome-wide association studies (GWAS) were begun. (See this PLoS paper to learn how such studies are conducted.) Surprisingly, the studies failed, for the most part, to converge on genes, gene copy number variants, SNPs (single nucleotide polymorphisms), gene rearrangements, mutations, or other peculiarities of allelic architecture whose presence could predict important diseases. When genetic loci of interest were found, the findings often didn't replicate in followup studies; or if they did, the relative odds ratios (a measure of the ability of genetic features to predict the prevalence of a trait) fell far short of explaining the known magnitudes of various traits in target populations.

If things like schizophrenia and alcoholism truly do have a strong genetic basis (as we've been told), and if they were to involve tractable numbers of genes (say, dozens or scores of genes, rather than hundreds or thousands), DNA studies of the GWAS kind should produce immediate, strong, recognizable genetic signatures of disease. The results should leap off the page. But they don't. Time and again, the very few candidate alleles that are found are either found in low numbers, or have low "penetrance" (low capacity to predict disease), or both, and quite often the candidate alleles that are found are not confirmed by followup studies.

This has given rise to a major crisis in genetics, summed up in a paper called "Finding the missing heritability of complex diseases" that appeared in Nature in 2009. Scientists are desperate to explain the "missing heritability" of disorders we "know" are genetic.

It's assumed, by most scientists, that the failure of genome-wide association studies to find genetic explanations of complex diseases can be attributed to such studies' low power and resolution for genetic variations of modest effect. (So the quest has begun to increase the study size of future trials, in hopes of seeing more robust results.) In addition, there's a reasonable expectation that many complex diseases will be found to involve numerous genes, each contributing only a small effect to the total. It's also been suggested that some traits are dictated by extremely rare genetic features with high "penetrance." There's also an awareness that current laboratory techniques have low power to detect gene-gene interactions. What no scientist wants to say, however (and what none of the 24 co-authors of the Nature article, above, would say), is that maybe the genetic component(s) of schizophrenia, alcoholism, depression, etc. are simply vanishingly small to begin with. Yet that's exactly what the data are telling us. But we won't listen to the data because they don't fit our preconception of how the world should work.

We "know" that a person's height is largely controlled by genes. Studies going back almost a century have determined that body height is 80% to 90% heritable; no one seriously questions this fact. (Height is heritable—it "runs in families.") However, at least three large, modern genetic studies have been done to find "height genes"; the largest involved over 180,000 study subjects (and 291 co-authors). In all, some 180 genetic loci were identified that play a role in determining a person's height. But the 180 genomic features, put together, accounted for only 10% of observed variations in height. The rest appears to be environment.
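
The size of the gap is easy to put in numbers. This sketch uses only the figures above; the function name is mine, and it assumes the "10% of observed variations" and the "80% to 90% heritable" figures can both be read as fractions of the same total variation in height, which is a simplification.

```python
def missing_heritability(h2, explained):
    """Return (unexplained portion of heritability, share of heritability explained).

    h2:        heritability estimate as a fraction of total variance
    explained: fraction of total variance accounted for by identified loci
    """
    missing = h2 - explained
    share_explained = explained / h2
    return missing, share_explained


EXPLAINED_BY_LOCI = 0.10  # the 180 loci, per the text

for h2 in (0.80, 0.90):   # heritability range quoted in the text
    missing, share = missing_heritability(h2, EXPLAINED_BY_LOCI)
    print(f"h2 = {h2:.0%}: missing = {missing:.0%}, "
          f"loci explain {share:.1%} of the heritable part")
```

On either end of the range, the identified loci cover only about an eighth (or less) of the variation that twin and family studies attribute to genes.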
 
Are we now supposed to enlarge our "study population" from 180,000 to several million, in order to find the genetic explanation for body height, just because we "know" one exists?
 
At some point, don't we have to just admit "the data's the data"?

Shouldn't we be willing (at least provisionally) to entertain the idea that maybe the twin studies and the data showing that certain things "run in families" (pre-GWAS-era data, mostly) are in need of reevaluation? Shouldn't we at least consider the idea that prior studies misjudged the importance of uncontrolled-for environmental variables? Now that we have powerful DNA-analytic techniques for investigating heritability, and the techniques aren't giving us the results we want, must we "increase the size of the microscope" to make the results look bigger? Or shouldn't we just accept what the data are telling us (as painful as that may be)?

In a future post, I'll talk about some of the GWAS results for alcoholism. Stay tuned.

This post is, in part, derived from material for a forthcoming book on mental illness I'm writing. Please come back often to find out how to get free sample chapters.