Statistical and Structural Replication

In the first paper I wrote about GWAS, in 2012, I reviewed a paper about GWAS of height by Weedon et al. After correcting for population stratification, Weedon et al. identified a handful of SNPs with genome-wide significant correlations with height (news at the time), and declared, “This means that the associations are likely to reflect true biological effects on height.” (p. 580) I noted that they never paused to say what they meant by a “true biological effect”, and suggested that they were confusing two things: statistical significance, which involves testing the null hypothesis that an association between SNP and phenotype was the result of sampling error, and hypotheses about ill-defined “true biological effects,” which, whatever they might be, couldn’t be confirmed by null hypothesis significance testing. True biological effects are structural hypotheses about the nature of the relationship between the SNP and the phenotype; statistical significance may be a necessary condition for structural significance, but it is certainly not a sufficient one.

This distinction has never been fully resolved in the GWAS literature, and we worked through it again last week, after I gave a talk at APS about heritability. I went through my usual arguments about the limitations of the heritability concept, including the idea that the heritability of a trait (itself a problem, because traits don’t “have” fixed heritabilities) doesn’t tell us anything about the likelihood of success for gene-finding. Michel Nivard demurred.

I responded with a question:

At that point an argument about replication broke out. Michel pointed out that a recent study of depression had replicated most of the genome-wide significant hits from an earlier study. Steve Pittelli objected, saying that they weren’t really replications, because they had been tested at a much less stringent probability level than the original. Steve was then taken to task for not understanding the math behind Type I error-correction in replication. Tempers flared.

The two sides were talking past each other because one side was talking about statistical replication, and the other about structural replication. For statistical replication, it makes sense to conduct discovery at genome-wide significance, and then, once the field of possible hypotheses has been reduced, to conduct replication analyses at ordinary significance levels. This is fine, but it must be borne in mind that only a very limited hypothesis is being tested: whether the association between the SNP and the phenotype is different from zero for reasons other than sampling error. Although I couldn’t find the information in Howard et al., presumably the effect sizes of the “replicated” SNPs were considerably smaller than they were in the original GWAS, because of the winner’s curse, but that doesn’t matter if all you care about is statistical replication.
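To make the winner's curse concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the number of SNPs, the spread of true effects, the standard errors, and the function name `simulate_winners_curse` do not come from Howard et al. or any real GWAS. It selects SNPs that pass a genome-wide threshold in a simulated discovery sample (|z| > 5.45, roughly p < 5e-8), then checks them in an independent replication sample at an ordinary threshold in the same direction.

```python
import random
import statistics


def simulate_winners_curse(n_snps=20000, true_sd=0.01, se_discovery=0.004,
                           se_replication=0.004, z_discovery=5.45,
                           z_replication=1.96, seed=1):
    """Toy model: every SNP has a small true effect; each study adds noise.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    discovery_sizes = []    # |estimate| in discovery, genome-wide hits only
    replication_sizes = []  # replication estimate, signed toward the discovery direction
    n_replicated = 0

    for _ in range(n_snps):
        beta = rng.gauss(0.0, true_sd)                 # true effect
        b_disc = beta + rng.gauss(0.0, se_discovery)   # discovery estimate
        if abs(b_disc) / se_discovery < z_discovery:
            continue                                   # not a genome-wide hit
        b_repl = beta + rng.gauss(0.0, se_replication)  # independent replication
        sign = 1.0 if b_disc > 0 else -1.0
        discovery_sizes.append(abs(b_disc))
        replication_sizes.append(sign * b_repl)
        # "Replicated" here means: same direction and z > z_replication
        if sign * b_repl / se_replication > z_replication:
            n_replicated += 1

    return {
        "n_hits": len(discovery_sizes),
        "mean_discovery": statistics.mean(discovery_sizes),
        "mean_replication": statistics.mean(replication_sizes),
        "replication_rate": n_replicated / len(discovery_sizes),
    }


if __name__ == "__main__":
    print(simulate_winners_curse())
```

Under these assumptions most hits pass the replication test even though their average estimated effect shrinks, which is the sense in which statistical replication can succeed while saying nothing about whether the lead SNPs stayed in the lead.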

Pittelli wants more, and although I agree with him that there are more important hypotheses to think about, I part ways about how the case should be argued. Steve is always arguing that SNP associations are “false positives,” by which I think he means that the observed SNP correlations don’t really exist, that they would go away if the GWAS people didn’t spin them into existence. I think more or less the exact opposite. Based on a combination of the First and Fourth laws of behavior genetics, the Meehlian crud factor, and modern thinking about omnigenics, I am quite happy with the possibility that all SNPs have statistically significant associations with behavioral phenotypes, if you make the discovery sample big enough. It’s a lot like the old discussions about twin studies, where opponents of hereditarian conclusions about them thought they had to deny the results of the twin studies themselves, the very idea that rMZ > rDZ, based on the equal environments assumption and the like. That was a losing battle: rMZ is greater than rDZ, for reasons that can be very broadly characterized as genetic. The hill to die on involves the question of what heritability means, not the heritability itself.
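The claim that any nonzero association becomes significant in a big enough sample can be sketched with a back-of-the-envelope calculation. The function below is a hypothetical helper, not anything from the GWAS literature. It uses the Fisher z approximation to estimate the sample size at which an observed correlation of a given size would just reach p < .05 (two-sided), and the correlation values in the loop are made up for illustration.

```python
import math


def n_needed_for_significance(r, z_crit=1.96):
    """Approximate N at which an observed correlation r reaches |z| >= z_crit.

    Uses the Fisher transformation: atanh(r) * sqrt(N - 3) is roughly
    standard normal under the null of zero correlation.
    """
    if r == 0:
        raise ValueError("a correlation of exactly zero never reaches significance")
    return math.ceil((z_crit / math.atanh(abs(r))) ** 2 + 3)


if __name__ == "__main__":
    # Illustrative correlations only; not taken from any study.
    for r in (0.1, 0.01, 0.001):
        print(r, n_needed_for_significance(r))
```

The required N grows roughly with 1/r², so each tenfold drop in the correlation costs about a hundredfold increase in sample size, which is expensive but never impossible.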

There is a lesson here for the GWAS community as well.  Simply establishing tiny, uninterpreted, but “significant” associations between SNPs and phenotypes will not lead anywhere interesting. Sometime in the future, some wag will declare the idea that all SNPs are associated with all complex outcomes, “The Fifteenth Law of Behavior Genetics.” The kind of GWAS that has recently evolved, the kind that leads to that foolishness about dog ownership, has an advantage that I long ago identified in twin studies: it can’t fail. If all you want to do is show that dog ownership is heritable, that there are SNPs with significant correlations that are statistically replicable, that it has genetic correlations with other phenotypes, then you are in luck, because it will work every time. Go ahead, do cat ownership, do golden retriever ownership, knock yourself out. But science that works every time is pointless science.

By way of contrast, consider FTO, a gene that was identified with GWAS, and which appears to play an important role in obesity. FTO and the SNPs associated with it are more than just statistically replicable. The FTO-related SNPs aren’t just p<.05 every time a GWAS is run, they are the lead SNPs every time the GWAS is run. They are the lead SNPs in other populations and in other species. That is structural replication, and it is what Steve Pittelli, not unreasonably, is looking for. It seems to me that the proper course for GWAS science is proceeding from statistical replicability to structural replicability, but doing that involves the possibility that it won’t work, that statistical replicability is all there is. In another slide from my APS talk, I (artlessly) made the point that this is what has happened for personality: GWAS has shown, as it must, that personality is heritable and associated with SNPs, but there is no underlying genetic structure.

This is GWAS telling us that there is nothing there, that the null hypothesis is true, and that it is time to move on. I got teased for this and I could have found a nicer way to say it, but it is important to remember that meaningful science is required to run into dead ends occasionally, even most of the time.

There is another way to think about GWAS replicability that is worth mentioning, at the level of a polygenic score rather than individual SNPs. When Howard et al. re-ran the depression GWAS, they generated a new Manhattan plot with a new distribution of effect sizes for all the SNPs. The individual-level statistical replication involved the finding that the previously genome-wide significant SNPs made it past p<.05 the second time around, but as I have noted, that hypothesis takes a pass on whether their previous role as lead SNPs at the front of the pack replicated. In fact they appear to have faded considerably: in sample sizes this large, p<.05 significant SNPs can have almost unimaginably small effects. So a more general question is how well the effect sizes of the SNPs replicated across studies. If GWAS were purely random in the Pittelli sense, the answer would be zero; we would be re-rolling the dice every time. But an out-of-sample polygenic score is essentially a measure of the replicability of SNP effect sizes, whether the same ones come out on the (relatively) large side both times. And the answer is very much that the glass is half (or maybe a tenth) full. Polygenic scores do replicate, at a level that is sometimes interesting but often not terribly prepossessing.
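The contrast between re-rolling the dice and partial replication can be sketched by correlating SNP effect estimates across two simulated studies. This is a simplified stand-in for an out-of-sample polygenic score, not an actual one: it ignores genotypes, linkage disequilibrium, and weighting. The function names and all parameter values are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def effect_size_replication(n_snps=20000, true_sd=0.002, se=0.004, seed=7):
    """Correlation of per-SNP effect estimates between two independent studies.

    true_sd = 0 models the "purely random" case: estimates are noise only.
    All values are hypothetical.
    """
    rng = random.Random(seed)
    study1, study2 = [], []
    for _ in range(n_snps):
        beta = rng.gauss(0.0, true_sd)            # shared true effect
        study1.append(beta + rng.gauss(0.0, se))  # estimate in study 1
        study2.append(beta + rng.gauss(0.0, se))  # estimate in study 2
    return pearson(study1, study2)


if __name__ == "__main__":
    print("pure noise:   ", effect_size_replication(true_sd=0.0))
    print("small effects:", effect_size_replication(true_sd=0.002))
```

In this toy model the expected cross-study correlation is true_sd² / (true_sd² + se²), so it is zero when there is no signal and modest but clearly nonzero when true effects are small relative to the noise.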

Eric Turkheimer

Eric Turkheimer is the Project Leader for the Genetics and Human Agency Project. Eric is a clinical psychologist and behavioral geneticist. For thirty years he has been involved in empirical and theoretical investigations of the implications of genetics for the genesis of complex human behavior. Current projects include understanding the interaction between socioeconomic status and the heritability of intelligence, and philosophical analysis of the ethical status of work that purports to demonstrate biologically based differences in behavior among racial groups.
