Against Decile Analysis

One of the interesting characteristics of GWAS reporting is how regimented it is. The format of old-fashioned social science research was made up fresh by the investigators every time, and part of the fun of reading it was in the creativity of the designs, models and reporting procedures that were applied. That created a lot of problems, because all of those creative flourishes were actually investigator degrees of freedom, and they were used, usually without conscious bad intent, to make results look bigger or more novel than they really were.

In many ways, GWAS and its surrounding methodologies are a reaction to the replication crisis. It is well known that the investigator has been taken out of the loop of hypothesis generation about SNPs and genes, but there have also been profound changes in the way results are reported. Papers about GWAS are all exactly the same, reporting the same kinds of findings in the same format using the same analyses. Some of them are actually created automatically by bots. As I said above, I know there are good reasons for this. I don’t object, though I suspect that strict adherence to these unwritten rules will create new problems while solving older ones.

Anyway, one part of the standard analysis that I don’t think is very useful is what has come to be called decile analysis. It is used in the evaluation of polygenic score correlations, which are, of course, usually pretty small. But the samples are big, and that allows the investigator to see what happens at the extremes: to say, for example, that the correlation between the polygenic score and depression may be only .15, but the odds of depression in the top decile are four times the odds in the lowest decile.

That sounds a lot better, but as I have said previously, the decile analysis isn’t actually adding anything to the correlation. It isn’t as though the investigator is saying, our correlation is only .15, but this particular correlation of .15 happens to have particularly useful properties at the extremes that will make it useful for screening people or whatever. What they are reporting is just a general property of small correlations: given large enough sample sizes, looking at the extremes will produce what appear to be large differences even though the magnitude of the relationship is small. If you had a paper-and-pencil measure of delinquent behavior in adolescents, and found that it was correlated .15 with actual delinquent behavior, it wouldn’t be considered useful to observe that comparing adolescents in the top and bottom deciles produces what appear to be meaningful differences in delinquency. A peer reviewer, I think, would say, “Your validity coefficient is .15; leave it there.”
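A quick simulation makes the point. This is a sketch of my own (any seed and base rate would do, and none of the numbers come from a published score): a score correlated .15 with a liability, with “depression” defined as the top 15% of that liability, produces a top-versus-bottom-decile odds ratio of around 2.5 to 3, even though r never budges from .15.

set.seed(1)
n <- 100000
score <- rnorm(n)                                   # standardized polygenic score
liab  <- .15 * score + sqrt(1 - .15^2) * rnorm(n)   # latent liability, r = .15
dep   <- liab > qnorm(.85)                          # "depression": top 15% of liability
decile <- cut(score, quantile(score, 0:10/10), labels = FALSE, include.lowest = TRUE)
odds  <- function(p) p / (1 - p)
odds(mean(dep[decile == 10])) / odds(mean(dep[decile == 1]))  # roughly 2.5 to 3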

I have said most of that before. What I want to add now is that there is an obvious old-fashioned alternative to decile analysis that does a much better job representing the predictive properties of small correlations: the standard error of prediction (SEP), along with the confidence intervals it produces.

When using a linear regression to make a prediction about an individual, the SEP quantifies the average amount of error in predicting Y from X:

$$ s_{y \cdot x} = \sqrt{\frac{SS_{res}}{n - p}} $$

where SS_res is the residual sum of squares, n is the sample size and p is the number of parameters in the regression, typically two (slope and intercept).
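The SEP is nothing exotic; it is the “residual standard error” that R reports for any regression. A quick check, on simulated data of my own:

set.seed(2)
x <- rnorm(1000)                                         # standardized predictor
y <- 100 + 15 * (.3 * x + sqrt(1 - .3^2) * rnorm(1000))  # IQ-scaled outcome, r about .3
fit <- lm(y ~ x)
sqrt(sum(residuals(fit)^2) / (1000 - 2))   # SS_res / (n - p), with p = 2
summary(fit)$sigma                         # the same number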

In putting confidence intervals around actual predictions, there are two sources of error that can be treated separately. First, there is uncertainty about the location of the regression line. This is a function of sample size, obviously, but also a function of how far from the mean of X one is predicting. We are most certain about the location of the line at the mean, but because we are uncertain about the slope, the total uncertainty gets greater the further away we get. That is why confidence intervals around regression lines have that characteristic concave shape.

The second reason there is uncertainty in regression predictions is error in the regression itself, that is, scatter of the points around the regression line. This error is not a function of sample size (the regression is imperfect even in the population) but is a function of R².

Accordingly, there are two formulas for confidence intervals around predictions. The first locates the regression line itself, and is sometimes referred to as the standard error of the mean of Y for a given X. What is the 95% CI around the average weight of all men who are six feet tall? What is the 95% CI around the average IQ of children with a polygenic score 1 SD above the mean? That CI is given by:

$$ \hat{Y} \pm t \, s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2}} $$

where Y-hat is the predicted Y at the given X, t is the critical value of the t distribution (made concrete in the example below), and s_{y·x} is the SEP.
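In R, predict() computes this interval directly. A sketch, reusing the simulated fit from the check above (note that predict() uses the two-sided t quantile, so at level = .95 it comes out slightly wider than the MeanCI function below, which uses qt(.95, n - 2)):

predict(fit, newdata = data.frame(x = qnorm(.95)),
        interval = "confidence", level = .95)   # CI for the mean of Y at the 95th percentile of x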

OK, let’s start working some examples. Let’s imagine a standardized polygenic score in a sample of n=1,000 for predicting IQ, which we will give a mean of 100 and an SD of 15, just to keep things on a familiar scale. We will work the example for two versions of the score: an optimistic one that predicts IQ at r=.3, and a less optimistic one that predicts it at r=.1.

Here is an R function to calculate it:

MeanCI <- function(r, p, n) {
  # p does double duty: the percentile of the score and the level of the t quantile
  sep <- sqrt(((1 - r^2) * 225 * n) / (n - 2))   # SEP, approximating SS_y as 225n
  (100 + 15 * r * qnorm(p)) +                    # predicted IQ at the pth percentile
    c(-1, 1) * qt(p, n - 2) * sep * sqrt(1/n + qnorm(p)^2 / (n - 1))
}

> MeanCI(.3,.95,1000)
[1] 105.9658 108.8379
> MeanCI(.1,.95,1000)
[1] 100.9695 103.9651

Pretty good, right? With r=.3 we can count on a good 7 points on average, and even at r=.1 we get a reliable couple of points. Wouldn’t it be useful to society to know that?

The problem is that these CIs are for the *mean* of many people with that extreme polygenic score. Contrast them with the CIs for the expected IQ of a single individual at the 95th percentile. Here is the formula:

$$ \hat{Y} \pm t \, s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2}} $$

Now, instead of just locating the regression line, we are making a prediction about an individual, which includes the (very large) error inherent in the fact that the regression isn’t very strong.
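Here is the corresponding R function, a sketch consistent with MeanCI above: just add 1 under the radical (in lm() terms, this is what predict(..., interval = "prediction") computes).

IndivCI <- function(r, p, n) {
  sep <- sqrt(((1 - r^2) * 225 * n) / (n - 2))   # same SEP as in MeanCI
  (100 + 15 * r * qnorm(p)) +
    c(-1, 1) * qt(p, n - 2) * sep * sqrt(1 + 1/n + qnorm(p)^2 / (n - 1))  # the extra 1
}

Here are the results: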

> IndivCI(.3,.95,1000)
[1]  83.77634 131.02734
> IndivCI(.1,.95,1000)
[1]  77.82517 127.10939

Yikes! Would you want your kid evaluated for a school curriculum on the basis of a confidence interval like that? You can see that the score is still doing something: the CIs extend further above 100 than they do below. Here is what I conclude from this example.

  1. Any imagined use of a polygenic score for some collective purpose sets up a conflict of interest between the authority that is administering the scores and the individuals to whom they are applied. A school could increase the average IQ of its pupils by screening them with polygenic scores, but only at the cost of being radically unfair to the individual students. (This isn’t the only problem.)
  2. “Decile analysis” doesn’t make any of this clear. It is, to begin with, an instance of the first type of analysis: a way of locating the regression line at the extremes. It doesn’t take into account the unpredictability of individuals, and therefore gives an extremely biased view of how the score would perform in the real world.
  3. The fact that polygenic scores are estimated using amazing genomic technology does not excuse them from basic textbook psychometrics. They are, once and for all, correlation coefficients, and should be treated as such.
  4. CIs on the mean and individual prediction should be reported for all risk scores in the future.

Eric Turkheimer
ent3c@virginia.edu

Eric Turkheimer is the Project Leader for the Genetics and Human Agency Project. Eric is a clinical psychologist and behavioral geneticist. For thirty years he has been involved in empirical and theoretical investigations of the implications of genetics for the genesis of complex human behavior. Current projects include understanding the interaction between socioeconomic status and the heritability of intelligence, and philosophical analysis of the ethical status of work that purports to demonstrate biologically based differences in behavior among racial groups.

1 Comment
  • RCB
    Posted at 13:19h, 23 July

    It’s simple: decile comparisons provide additional information. Telling me that a correlation is 0.15 does not tell me the magnitude of the differences between people with different polygenic scores. It also does not capture potential nonlinearities in the underlying relationship. Is the correlation driven mostly by a slope in the middle that tapers off near the ends? Or the opposite?
