Thursday, January 08, 2015

Where Are the Alcoholism Genes?

Alcohol abuse is the third leading cause of preventable death in the U.S. (behind smoking and obesity), and it costs the U.S. economy about $200 billion a year. Data from twin studies (and other studies) have long suggested a strong genetic component to alcoholism. It's widely claimed that the heritability of alcoholism is about 50%; this number is supported by the National Institute on Alcohol Abuse and Alcoholism, for example, as well as by at least one meta-analysis of twin and adoption studies. (For a dissenting meta-analysis, see this paper.)

No one expects that alcoholism will turn out to be a simple matter of inheriting one or two "alcoholism genes" (although at one time, this was in fact the hope). Instead, it's expected that multiple genetic features probably interact to produce the propensity to develop alcohol dependence. Even so, if alcoholism is as heritable as people say it is, and if it involves a tractable number of genes (dozens or scores, rather than hundreds or thousands), candidate genes should be fairly easy to discover using the powerful new tools of DNA analysis. Candidate SNPs (single nucleotide polymorphisms) should jump right off the page and hit you in the face. If gene copy number variants are involved, it should be readily apparent. If genes are involved at all, it should be pretty obvious pretty fast.

That's not what we find.

Genome-wide association studies (GWAS) are able to account for only a small proportion of the "known" heritability of alcoholism. The same "small proportion of heritability" problem shows up in GWAS investigations of things like body height (a trait that's even more heritable than alcoholism), where only 10% of the variability of human body height can be accounted for by the more than 100 genetic features thought to control it. In fact, the general failure of GWAS to account for heritable conditions is so widespread that papers have been written about it.

The general idea behind GWAS is the "common disease, common variant" hypothesis, which says that common diseases are attributable to potentially many allelic variants (genetic differences) present in a sizable percentage of the population. In GWAS, several hundred thousand to more than a million single nucleotide polymorphisms (SNPs) are assayed, per person, times hundreds (or often, thousands) of individuals, and statistics for the expected (versus actual) rate of occurrence of various SNPs are calculated. SNPs that appear to "travel together" down through the generations are assessed for linkage disequilibrium to determine if they are associated more often than normal population statistics would predict. The data sets generated in GWAS are enormous and the statistical calculations challenging. For a description of the technique, see this article; also see this piece for criticisms of the technique (and a spirited rebuttal of criticisms).
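To make the "expected versus actual" idea concrete, here is a minimal sketch of the kind of test that gets run for a single SNP: a chi-square test on a 2x2 table of allele counts in cases versus controls. It is only an illustration, not the pipeline any of the studies below used (real GWAS software also handles genotype quality, population structure, and so on). The function name and the allele counts are made up for the example.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are the alternate and reference allele.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the allele were unrelated to case status
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 1000 cases and 1000 controls, two alleles per person.
# The alternate allele frequency is 23% in cases and 20% in controls.
chi2, p = allelic_chi_square(case_alt=460, case_ref=1540, ctrl_alt=400, ctrl_ref=1600)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

With these invented counts, a three-point difference in allele frequency gives a p-value that would pass an ordinary 0.05 cutoff but falls many orders of magnitude short of genome-wide significance. Repeat the same test for every SNP on the chip and you have the core of a GWAS.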

One of the best known GWAS alcoholism studies is the so-called German study, by Treutlein et al. (2009). This study (which was likely statistically underpowered, as the authors admit) included 1460 study subjects with DSM-IV alcohol dependence and 2332 control individuals. The study claimed to have detected and replicated evidence for association of 15 markers with alcoholism, although only 9 SNPs were in genes (the rest were in what used to be called "junk DNA") and only two met criteria for genome-wide significance. The two SNPs that met criteria for genome-wide significance were colocated (5000 base-pairs apart) in chromosomal region 2q35 (the long arm of chromosome 2).

Less well known is the 2012 GWAS conducted on two samples (N=1721 and N=1113) of Korean men by Baik et al., which found that a dozen SNPs on chromosome 12 had genome-wide significant associations with alcohol consumption. Most of the SNPs were in intronic regions of genes C12orf51, CCDC63, MYL2, OAS3, CUX2, and RPH3A. Three of these successfully replicated in the second test arm. These locations were not the same as the ones found in the German study (nor the studies below). On the other hand, it's not clear whether results applicable to Korean men will be applicable to non-Korean men, or women. Also, this was not a case-control study, but an imputation-based study using an Asian subset of HapMap baseline data.

Another study, often called the SAGE study, by Bierut et al. (2010), looked at 1897 alcohol-abuse subjects and 1932 controls. Here, 15 SNPs of interest were identified, but only one SNP (identified as rs13160562) was among the 15 identified in the German study, and it did not reach genome-wide significance. Said the researchers: "Our top 15 association signals were not replicated in the independent datasets nor did the present study replicate the two genome-wide significant results reported by Treutlein and colleagues." (One of the independent datasets was a large family-based study of 258 families with more than 2,000 genotyped individuals, while the second was a study of alcohol-dependent men and community-based comparison subjects of German descent.)

In 2012, Heath et al. looked at 8754 individuals (2062 of whom were alcohol dependent) from the Australian Twin Registry. It's hard to interpret the Heath writeup, but it appears some of the study population were twins and some were parents of the twins, while all of the controls were non-alcohol-dependent twins. (So imagine that. A twin study that's also a GWAS.) No findings reached genome-wide significance (p = 8.4×10⁻⁸ for this study). Of the top 400 SNPs, 65% had effect sizes of less than 0.25% of the variance (when ranked by p-value for association with heavy drinking), and only 7.5% had effect sizes greater than 0.3%, with a median effect size of 0.23%. So in other words, the main finding was one of small effect sizes. The Heath writeup was quite clear on this point, stating: "The primary conclusion from these analyses is that, as for many other complex phenotypes (e.g. body-mass index), effect sizes for the contribution of individual genetic variants to differences in heaviness of alcohol consumption and alcoholism risk are small, perhaps accounting for as little as one-tenth of one percent of the variance."

The fact that Heath et al. found 30 SNPs with effect size 0.3% or more is not terribly helpful from an alcoholism-prediction standpoint, because while in theory those 30 SNPs, put together, might account for 10% of phenotypic variance (if their effects are additive), in reality it's unlikely you'd inherit all 30 defects together (and it's not clear whether you'd have to inherit one copy of each, or two). But even if you did, they would account for roughly 10% of the variation in risk, not the 50% that's so often claimed.
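The back-of-envelope arithmetic behind that figure can be sketched in a few lines. This assumes the effects are strictly additive and uses 0.3% as a floor for each SNP, so it is a rough lower-end estimate rather than anything reported by Heath et al.; the variable names are mine.

```python
# Figures quoted above from the Heath et al. writeup
top_snps = 400                  # SNPs ranked by p-value
share_above_threshold = 0.075   # 7.5% had effect sizes greater than 0.3%
per_snp_variance = 0.003        # 0.3% of variance, used here as a floor

# Assumption: effects simply add up, with no interaction between SNPs
n_snps = round(top_snps * share_above_threshold)
combined_variance = n_snps * per_snp_variance

claimed_heritability = 0.50     # the widely quoted figure
unexplained = claimed_heritability - combined_variance

print(f"{n_snps} SNPs x {per_snp_variance:.1%} = {combined_variance:.1%} of variance")
print(f"Gap to the claimed {claimed_heritability:.0%}: {unexplained:.1%}")
```

At the 0.3% floor the total comes out a bit under 10%; since some of the 30 SNPs had somewhat larger effects, "about 10%" is a fair round number. Either way, it is nowhere near 50%.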

The Heath group also reported: "No SNPs or tagging SNPs confirmed in the German Alcoholism GWAS study were replicated in our analyses, nor were any SNPs or tagging SNPs identified as the most strongly associated SNPs in the SAGE study." In other words, the German results didn't replicate in the Heath study, nor did the results of Bierut et al.

Edenberg et al. (2010), in a case-control study involving 1399 people, found somewhat more encouraging results (arguably). Although no single SNP met genome-wide criteria for significance, the researchers found that 14 out of 140 top SNPs from the German study "were significant in our primary analysis of alcohol dependence" (no p-value stated). So in some sense, the German results replicated for Edenberg (although it's by no means clear that Edenberg's 14 SNPs were among the German group's top 15 SNPs; read the study carefully; it just says 14 of the top 140). Also, using a variety of types of evidence, Edenberg et al. argue that a cluster of 6 genes on chromosome 11 is associated with alcohol dependence, but no effect sizes were given.

It should be noted that genome-wide significance (mentioned several times above) is a fairly stringent criterion. The need for it arises from the fact that any time you look at frequency numbers for a million data points, some are statistically likely to jump out (producing false positives) unless you correct for the sheer number of data points.
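Here is a minimal sketch of why the correction matters, using the simplest approach (a Bonferroni correction, which divides the usual cutoff by the number of tests). The one-million-SNP figure is a round number for illustration, and I'm not claiming any of the studies above used exactly this method; the function names are mine.

```python
def expected_false_positives(alpha, n_tests):
    """Tests expected to pass the cutoff by chance alone if no SNP has any real effect."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests

def implied_number_of_tests(alpha, threshold):
    """Work backward from a reported threshold, assuming a plain Bonferroni correction."""
    return alpha / threshold

alpha = 0.05
n_snps = 1_000_000  # round illustrative figure, not taken from any one study

print(f"Uncorrected: about {expected_false_positives(alpha, n_snps):,.0f} chance hits")
print(f"Corrected cutoff: p < {bonferroni_threshold(alpha, n_snps):.1e}")
print(f"Tests implied by a 8.4e-08 cutoff: {implied_number_of_tests(alpha, 8.4e-8):,.0f}")
```

Without correction, a million tests at the usual 0.05 cutoff would hand you tens of thousands of spurious "hits." The last line works backward from the threshold quoted for the Heath study; it only shows what a plain Bonferroni correction would imply, since I don't know how that threshold was actually derived.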

My reading of the studies is that no smoking pistols were found; some of the SNPs are suggestive of a relationship to alcohol dependence, but I wouldn't exactly call the results strong, because in general, the studies all found different SNPs in different genes (with only a couple of exceptions), and many of the SNPs were not in genes. (Some of those that were, were in introns.) The studies were not terribly robust in terms of replication (either within studies or between them). Certainly, any hope of finding an "alcoholism gene" (singular) is dashed. In fact, it appears unlikely that any gene, of any kind, will be found that can explain more than a few tenths of a percent of your likelihood to have gotten "alcoholism" from your parents or grandparents.

How, then, is alcoholism heritable? We know it runs in families, yes. But that doesn't mean it's more than a few percent genetic. Lots of things that aren't genetic run in families: poverty, political beliefs, the Tooth Fairy myth, etc. (An affirmative answer to the question “Did you have your back rubbed?” has been shown to be 92% heritable for males and 21% heritable for females.)

Environment, in all likelihood, is the great multiplier, the amplifier that brings out the weak signal in DNA; that's what future research will probably end up telling us. That's what the research tells us now.

I'm in the process of finishing a book on mental illness. It's an evidence-based book, part science, part memoir. (It's not as technical as this post, though! It's meant for non-technical readers.) For more info, sign up for the mailing list, and return here often. Thanks!