26 Mar Smith-Woolley, Plomin et al. on Genetics and Educational Outcomes
Emily Smith-Woolley, along with Robert Plomin and his team, just published an article entitled “Differences in exam performance between pupils attending selective and non-selective schools mirror the genetic differences between them.” The title of the article is, in my opinion, seriously misleading. I have to say that I am very disappointed that such a prestigious team of researchers would choose to emphasize the most genetically deterministic aspects of a study that in fact has practically nothing to do with genetics. Let me explain.
90% of the article consists of an essentially sociological analysis of the effects of schooling on cognitive outcomes. This is the part of the study that has nothing to do with genetics. (I will get to the “genetic” part below.) Children are not assigned at random to different kinds of schools; there is an elaborate selection process based on their family’s background and their own prior level of cognitive performance. So it is reasonable to ask whether school type makes a discernible difference in cognitive performance conditional on all the background variables, but anyone who has ever tried to do this kind of thing knows that isolating an effect of schools in a complex interconnected network of causes and effects, using nothing but linear statistical control, is a profoundly difficult exercise.
It is no surprise that children are admitted into selective schools on the basis of family SES and prior cognitive performance. When you control statistically for those variables the differences in school outcome mostly disappear, which is an interesting enough finding, but you have to note several cautions.
- Although there is nothing wrong with using multiple regression to explore relations in complex networks of inter-correlated variables, described as such it probably would not have been published in Nature.
- Analysis of covariance, which is essentially what they are doing, is simply not capable of isolating causes in situations like these, as any second-year graduate student in sociology could tell you.
- Because the authors think they are doing “genomics,” not circa 1975 sequential multiple regression, they don’t bother to do the elementary statistical things that you are supposed to do in conducting an analysis of this kind, like testing for interactions between the covariates (the background variables) and the main predictor (school type). This is the first thing you learn about ANCOVA: explaining between-group differences in terms of within-group relations doesn’t make sense unless the within-group regressions are parallel.
- Most important, there is nothing seriously “genetic” about this analysis. The study does not use twins or family members. (I know about the GPS; hang on.) All it does is regress out family background effects from the school effects. This is where the title of the article is so misleading. Because things like socioeconomic status and childhood achievement are partially heritable, the authors refer to them as “heritable selection factors,” thus justifying the phrase in the title about school effects mirroring the “genetic differences” among the children. But of course, everything is heritable, so they could say this about any and every background factor they chose to partial out. And even more crucially, none of the covariates are perfectly heritable, or even close, and the rest of the variance is, just as accurately, environmental. So the title of the paper could have said “mirror the environmental differences between them” and been every bit as accurate.
- “Mirror” is a way of saying “is correlated with” while implying that just maybe you are talking about cause. This kind of slippery language has unfortunately become standard practice in modern behavioral genetics.
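For readers who have not run one, here is a minimal sketch of the parallel-slopes check mentioned in the list above. It compares an ANCOVA model with a model that adds a group-by-covariate interaction and returns the F statistic for the added term. Everything in it is an assumption for illustration: the function names `rss` and `slope_homogeneity_F`, the two-group coding, and the synthetic data, none of which come from the paper.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def slope_homogeneity_F(y, covariate, group):
    """F statistic for the group-by-covariate interaction (two groups, one covariate)."""
    n = len(y)
    ones = np.ones(n)
    # ANCOVA model: intercept, group dummy, covariate (parallel slopes assumed).
    X_parallel = np.column_stack([ones, group, covariate])
    # Extended model: adds group x covariate, so the slopes may differ by group.
    X_interact = np.column_stack([ones, group, covariate, group * covariate])
    rss0 = rss(X_parallel, y)
    rss1 = rss(X_interact, y)
    df_num = 1                            # one extra parameter
    df_den = n - X_interact.shape[1]      # residual degrees of freedom
    return ((rss0 - rss1) / df_num) / (rss1 / df_den)

# Synthetic data only; these are NOT the study's data.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n).astype(float)  # hypothetical: 0 = non-selective, 1 = selective
prior = rng.normal(size=n)                        # hypothetical prior-achievement covariate
noise = rng.normal(size=n)

# Case 1: same slope in both groups. Case 2: steeper slope in group 1.
y_parallel = 0.5 * prior + 0.2 * group + noise
y_nonparallel = (0.5 + 0.4 * group) * prior + 0.2 * group + noise

F_parallel = slope_homogeneity_F(y_parallel, prior, group)
F_nonparallel = slope_homogeneity_F(y_nonparallel, prior, group)
```

A large F in the second case says the within-group regressions are not parallel, in which case a single covariate-adjusted group difference has no clear meaning. With three school types and several covariates the same comparison would need one interaction column per group dummy and covariate.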
Obviously I have avoided the headline analysis in the report: they also have a genome-wide polygenic score (GPS) for educational attainment, show that it differs among the school types, and use it as a covariate in the ANCOVA. Isn’t that what justifies their assertion of “genetic differences” among children attending different kinds of schools? No, if anything it does the opposite. The GPS, which Plomin always refers to as a “game-changer,” changes pretty much nothing in the analysis.
The relation between the GPS and school type is difficult to assess at first, because school type is represented as a categorical variable, while all the covariates are continuous. The mean differences for GPS among the school types certainly got my attention when I first looked at them. But here is a quiz: what is the size of the relationship between school type and the GPS, expressed as a correlation? The answer is .12. Every analysis of educational attainment (EA) that has ever been conducted has shown that it is positively correlated with indicators of social and economic well-being. Now we know that it is also very modestly correlated with going to private school. This is not news.
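One common way to put a categorical predictor and a continuous score on a correlation scale is the correlation ratio, eta: the square root of the between-group sum of squares over the total sum of squares. The sketch below is a hedged illustration of that idea, not a reconstruction of how the .12 was obtained. The function name `correlation_ratio` and the toy scores are made up. The last two lines simply square .12 to show how little variance a correlation of that size accounts for.

```python
from statistics import fmean

def correlation_ratio(values_by_group):
    """Eta: sqrt(between-group SS / total SS) for a continuous score split by category."""
    all_values = [v for vals in values_by_group.values() for v in vals]
    grand_mean = fmean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(vals) * (fmean(vals) - grand_mean) ** 2
        for vals in values_by_group.values()
    )
    return (ss_between / ss_total) ** 0.5

# Made-up toy scores, chosen only so the arithmetic is easy to follow.
toy = {
    "non-selective": [-1.0, 0.0, 1.0],
    "grammar": [0.0, 1.0, 2.0],
    "private": [0.0, 1.0, 2.0],
}
eta = correlation_ratio(toy)  # between SS = 2, total SS = 8, so eta = 0.5

# A correlation of .12 corresponds to r squared = 0.0144, about 1.4% of the variance.
r = 0.12
variance_explained = r ** 2
```

Group means that look well separated on a plot can still correspond to a small eta when the spread within each group is large, which is the situation described above.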
Moreover, when the GPS is included in the ANCOVA it plays almost no role at all. School type starts out explaining 7% of the variance in cognitive performance. Including the GPS as a covariate reduces this to 6%; including all the other phenotypic covariates, especially prior cognitive performance, reduces it to under 1%. Once again, the authors don’t bother to conduct any kind of formal assessment of variable importance in a complex sequential regression, because they don’t think of themselves as doing that kind of thing, but if they did, the GPS would almost certainly be at the bottom of the list. It just doesn’t make any difference.
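The logic of that sequence can be sketched as an incremental R-squared calculation: how much variance school type explains on its own, then after a weak covariate is entered, then after a strong one is entered too. The code below is an illustration under stated assumptions. The helper names `r_squared` and `unique_r2`, the variable names, and the simulated data are all hypothetical, and the simulated effect sizes are not the paper's 7%, 6% and under 1%.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (an intercept column is added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)

def unique_r2(y, predictor, covariates=None):
    """Increment in R^2 from adding `predictor` after `covariates` are entered."""
    if covariates is None:
        return r_squared(predictor, y)
    full = np.column_stack([covariates, predictor])
    return r_squared(full, y) - r_squared(covariates, y)

# Simulated data only; NOT the study's data or effect sizes.
rng = np.random.default_rng(1)
n = 5000
prior = rng.normal(size=n)                      # hypothetical prior achievement
gps = 0.3 * prior + rng.normal(size=n)          # hypothetical score, weakly related to prior
# Selection into the "selective" school depends on prior achievement plus noise.
school = (prior + 0.5 * rng.normal(size=n) > 1.0).astype(float)
# By construction the outcome depends on prior achievement, not on school.
outcome = 0.7 * prior + 0.7 * rng.normal(size=n)

raw = unique_r2(outcome, school)
after_gps = unique_r2(outcome, school, gps)
after_all = unique_r2(outcome, school, np.column_stack([gps, prior]))
```

In this simulation the weak covariate trims the school effect slightly and the strong one removes nearly all of it, the same qualitative pattern as in the paragraph above. Ranking the covariates by how much each one changes the school-type increment is one simple, if crude, form of the variable-importance assessment that the paper does not report.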
The part of the paper I am most unhappy with is the theoretical interpretation, starting with the title. It is not a “genetic” analysis except in the weakest sense of the word, and it could just as easily be called “environmental.” The big takeaway line from the paper goes out of its way to promote a genetically determined interpretation of their results:
Although finding genetic differences between state non-selective, grammar and private school students may initially seem surprising, when we consider the heritable traits that selection is based on, this difference is less unexpected. Put another way, students with higher polygenic score for years of education have, on average, higher cognitive ability, better grades and come from families with higher SES, and these students are subsequently more likely to be accepted into selective schools. This results in a system in which children are intentionally phenotypically selected, but unintentionally genetically selected.
This could have been lifted right out of The Bell Curve. Don’t get me wrong: if the authors think their data support the hypothesis that socioeconomic educational differences are simply the result of pre-existing genetic differences among the students assigned to different schools, that is their right. But I would prefer that they explicitly defended the idea, instead of letting it float implicitly in the background. The data they report here do nothing to actually make the case in one direction or the other. All they show is what we already knew: kids are exposed to educational opportunities on the basis of a complex set of socioeconomic and cognitive phenotypes, all of which are the result of both “genes” and “environment” in the classic biometric sense. When you control for all these background variables the association between school type and cognitive outcome mostly goes away, which is interesting, but given the non-experimental research design, not very conclusive. There is nothing in the paper to push one’s thinking in either a genetic or environmental direction.
There is very little new in this paper, and nothing to be alarmed about.