Say we have two large GWAS, for diseases A and B, and at a particular SNP the p values are 1e-7, 1e-4. What can we say from this?

– It looks like the SNP is associated with disease A, though it does not quite reach GW-significance

– It may also be associated with disease B, but the evidence is not so good.

– It is likely to be genome-wide significant for A OR B; but…

It is tempting to use one p-value to inform the other; the likely association with disease A should raise our suspicions about association with disease B. However, the information shared is not symmetric; even if we account for the mutual low P-values, there is better evidence for association with disease A than with disease B. And what if the diseases are completely unrelated? The low p values are more likely to be just coincidence.

One elegant way to approach this problem is to analyse the ‘conditional false discovery rate’ (cFDR), developed by Andreassen et al. in 2013. The idea behind this is to only consider a subset of SNPs with p values for disease B less than some threshold, and assess association for A amongst only these SNPs, using a procedure similar to Benjamini-Hochberg or Storey’s Q. Ultimately, it estimates:

Pr(H0(a)|Pa<pa,Pb<pb)

that is, the posterior probability that a SNP is not associated with phenotype A (H0(a)) given cutoffs on the p values (Pa, Pb) for phenotypes A and B.

This tests the hypothesis of association with only one disease at a time, while incorporating information from the other phenotype, and accounting for the overall similarity between the two phenotypes. This fundamentally simple idea allows us to tidily address the above problem.
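To make the estimate concrete, here is a minimal sketch of the usual empirical estimator, assuming the two p values are independent at null SNPs (that is, no shared controls). Under that assumption Pr(Pa<pa | H0(a), Pb<pb) is just pa, so the cFDR is bounded by pa divided by the proportion of SNPs with Pa<pa amongst those with Pb<pb. The function name and toy p values are illustrative only and are not taken from our software.

```python
def cfdr(p_a, p_b, i):
    """Empirical conditional FDR for SNP i: pa / Pr(Pa <= pa | Pb <= pb).

    Assumes p values for A and B are independent at null SNPs.
    """
    pa, pb = p_a[i], p_b[i]
    # p values for A amongst SNPs whose p value for B is at least as small as SNP i's
    subset = [a for a, b in zip(p_a, p_b) if b <= pb]
    # Proportion of that subset with a p value for A at least as small as SNP i's
    # (the subset always contains SNP i itself, so this is never zero)
    frac = sum(1 for a in subset if a <= pa) / len(subset)
    return min(1.0, pa / frac)


# Hypothetical p values for four SNPs
p_a = [0.01, 0.5, 0.9, 0.2]
p_b = [0.01, 0.02, 0.8, 0.6]
print([cfdr(p_a, p_b, i) for i in range(len(p_a))])
```

With genome-wide data the subset sizes are large, so the proportions are much more stable than in this toy example.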

One shortcoming of the original technique was that it required that the two p values be distributed independently at null SNPs. This implied that the two GWAS in question had to share no controls. We extended the technique to allow controls to be shared between studies, increasing the potential power (ie, larger control groups) and applicability (only needing summary statistics and knowledge of control overlap) of the technique.

The major part of this work was developing a way to adjust the p value for A to an ‘expected quantile’ which accounted for the shared controls. Exactly how much to adjust it depends on what the true allelic difference for the SNP is for phenotype B, which we call ‘eta’. Essentially, the shared controls ‘pull’ the observed effect size for A towards eta. Obviously, we do not know eta, but if we consider it to be a random variable, we can get a handle on its distribution by looking at the observed effect sizes for phenotype B – a neat instance of the empirical Bayes technique.

On the advice of a reviewer, we investigated what happens if we wrongly estimate the distribution of eta (not much), and proved some inequalities bounding the expected quantile.

We noticed a difficulty with the technique in terms of limiting the overall false discovery rate amongst ‘discovered’ SNPs. We compute the cFDR ‘at’ a SNP as the probability that a SNP with smaller p values is null. This is analogous to Storey’s Q value, or the Benjamini-Hochberg technique – but there is an important caveat. If we declare non-null all SNPs with Storey’s Q less than some value alpha, the total false discovery rate is less than alpha. This, unfortunately, does not hold for the cFDR!

We found an aesthetically pleasing geometry-based bound on the overall FDR, but it is inefficient and far from optimal.

This makes it quite hard to decide on a cutoff for significance. We settled on ‘the smallest cFDR amongst SNPs with p value for phenotype A of less than 5e-8’. However, by doing this, we make redundant any direct comparison with the number of SNPs ‘discovered’ by p-value alone.

So how well did it work?

It worked OK.

We applied the technique to ten autoimmune diseases, and were able to declare non-null a few SNPs with p values greater than 5e-8. We also found a few new SNP-disease associations, although we obviously cannot stand by these until they are replicated (which we are not looking to do). Generally, using the technique bolstered the number of SNPs discovered by about 10% in similar phenotypes, albeit without being as sure about the overall false discovery rate.

Hopefully, we can start to use the technique to look at ‘how’ similar two phenotypes are; since the cFDR inherently uses this information, perhaps we can recover it. We would also like to move ‘backward’ and start looking at disease subtypes rather than separate diseases.

Thanks for reading!

Haldane’s Sieve: Olly Burren writes about our latest preprint on arXiv, a method for relating GWAS summary statistics to functionally defined gene sets which doesn’t require access to raw genotyping data.

This is what you expect a typical SNP to look like:

There are three clearly distinguishable clusters representing the three possible genotypes at that locus (**TT**, **TC** and **CC**).

But some SNPs look like this:

What is going on here?

Well, the last plot is a SNP from a complex genomic region known as KIR. Genes in this region exhibit great allelic copy number diversity. Consequently, SNP probes in this region can bind to several copies of a particular allele, which leads to noisy multi-cluster signals, such as the one pictured here. Moreover, since little is known about the KIR region, the SNP probes may not always bind to the expected locations within those genes, which further complicates the interpretation of the signal.

Wouldn’t it be nice if we could make some sense out of these SNPs to utilise this data for large scale association studies?

One naive approach would be to assume that there is a straight one-to-one mapping between SNP clusters and single gene copy numbers. This idea is good for detecting regions of common copy number variation but from our experience in the KIR region, the clusters are hard to distinguish and cannot be mapped to copy number variation of a single gene. In order to explain what is really going on, we need to resort to a different technology.

Ideally, we would like to fully sequence the KIR region in a large number of individuals. But because of great sequence similarity in this region, very long reads would be required for correct assembly. However, we have a more targeted, cheaper and readily available technology at our disposal for measuring copy number variation: quantitative Polymerase Chain Reaction (qPCR).

The idea is simple and, we found, can work remarkably well: first do qPCR in a subset of samples, then use supervised classification to link qPCR copy numbers to SNP patterns.
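As a rough illustration of the two steps, the sketch below trains a nearest-centroid classifier on the samples that have qPCR calls and uses it to predict a copy number group for a sample with SNP signal only. Nearest-centroid is a stand-in chosen for brevity, not necessarily the classifier used in the paper, and the signal values and labels are made up.

```python
from collections import defaultdict
from math import dist


def fit_centroids(signals, copy_numbers):
    """Mean SNP signal (x, y) for each qPCR copy-number label."""
    groups = defaultdict(list)
    for point, label in zip(signals, copy_numbers):
        groups[label].append(point)
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in groups.items()
    }


def predict(centroids, point):
    """Assign a sample to the copy-number group with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical two-channel SNP signals for samples that also have qPCR calls
signals = [(0.1, 0.9), (0.2, 1.0), (0.5, 0.5), (0.6, 0.4), (0.9, 0.1)]
qpcr = ["0-2", "0-2", "1-1", "1-1", "2-0"]

centroids = fit_centroids(signals, qpcr)
print(predict(centroids, (0.85, 0.2)))  # a sample with SNP data but no qPCR
```

In practice one would also want a measure of uncertainty for each call, so that ambiguous samples between clusters can be flagged rather than forced into a group.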

For example, if we do qPCR for the *KIR3DL1/3DS1* genes on a subset of samples for the above SNP, we get:

This is the approach we developed in our recently published BMC Genomics paper and applied to testing *KIR3DL1/3DS1* copy number association with T1D.

Notice, however, that certain qPCR samples lie within the wrong SNP copy number cluster. For example, samples with a qPCR copy number of 0-2 lie in the SNP cluster 1-1. Here, we attribute the error to imperfect linkage disequilibrium between the tagging SNP and target genes: this SNP does not in fact lie in the *KIR3DL1* or *KIR3DS1* genes but in the neighbouring gene KIR2DL4\*005, an allele which undergoes copy number variation along with *KIR3DL1/3DS1*.

This idea of imputing KIR genes from tagging SNPs in the region is something that other groups are researching. We know from attending ASHG 2013 of the ambitious ongoing work by Gil McVean and collaborators (poster 1919W) at Oxford to extend this approach to all KIR genes. We are very interested in seeing the outcomes of their research (or, for that matter, that of anyone else who is imputing KIR copy number from SNP data).

In the immediate future (until long read sequencing becomes sufficiently cheap), we would like to see similar hybrid qPCR/SNP approaches applied more widely to leverage existing SNP datasets, so that non-genotypable regions like KIR can be assessed more thoroughly and with sufficient power.

We hope that our work might inspire you to revisit your GWAS SNP data and carefully select samples on which to do qPCR, to conduct similar analyses for regions of common copy number variation. We would recommend preferentially selecting samples for qPCR from smaller SNP clouds, since these are likely to correlate with rarer copy number groups (for example the 3-0 group above). This could achieve a better prediction rate with a smaller number of samples (as we suggest in Figure 4 of our paper).

In particular, it would be great to see adoption of this approach in KIR association studies which have so far been hindered by embarrassingly small sample sizes (especially when large case-control ImmunoChip cohorts are already available).

In this study, we were interested in investigating if T1D patients showed evidence of an exacerbated IFN response, by measuring the global transcriptional profile of blood cells by microarray. One major challenge was to integrate a large amount of transcriptional data into a quantitative metric of IFN responses. For that we identified a set of IFN-inducible genes based on timecourse data from stimulated cells, and applied a principal components (PC) analysis. Between the T1D low and high expression groups and between the control groups, clear batch effects were observed, which could not be removed by various normalisation methods. We were unable to use the standard PC correction as this also removed evidence for the IFN signature within batches. Instead, we projected the T1D cases and controls onto the first PC that explained over 60% of variation in the homogeneous group – SLE samples – to circumvent batch effects. This first PC was then defined as the quantitative measure of an underlying IFN signature.
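The projection step can be sketched as follows: estimate the first PC in the reference group only, then score every other sample against that fixed axis. This is a toy pure-Python version using power iteration, with two made-up "genes" per sample; the function names and data are hypothetical and the real analysis involved many more genes and proper preprocessing.

```python
def first_pc(reference):
    """Mean and leading principal axis of the reference samples (one row each)."""
    n, d = len(reference), len(reference[0])
    mean = [sum(row[j] for row in reference) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in reference]
    # Sample covariance matrix of the reference group
    cov = [[sum(r[j] * r[k] for r in centred) / (n - 1) for k in range(d)]
           for j in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    # Assumes non-zero variance and a start vector not orthogonal to that axis.
    v = [1.0] * d
    for _ in range(200):
        w = [sum(cov[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def score(sample, mean, axis):
    """Project one sample onto the reference PC, centring on the reference mean."""
    return sum((x - m) * a for x, m, a in zip(sample, mean, axis))


# Hypothetical expression of two IFN-inducible genes in the reference (SLE) group
sle = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
mean, axis = first_pc(sle)

# Samples from other batches are scored on the same axis, never used to define it
print(score([3.5, 7.2], mean, axis), score([1.2, 2.1], mean, axis))
```

Because the axis is fixed by the reference group, batch differences amongst the projected samples cannot change the definition of the signature, although they can still shift the scores themselves.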

In comparison with SLE patients, we found that established T1D patients cannot be clearly clustered according to the expression of this IFN signature. However, we were also able to characterise the IFN signature in a large prospective birth cohort of children at high risk of developing T1D (BABYDIET), with longitudinal expression measurements. Linear mixed models were fitted to allow for within-subject correlations. Interestingly, in this cohort, we found evidence for an increased expression of IFN-inducible genes before the development of T1D-specific autoantibodies which correlated temporally with parental reports of recent viral infection.

The relationship of IFN gene expression with future autoantibody detection was replicated in a completely independent study from a Finnish cohort, in a co-submitted manuscript (link?), which strongly supports the hypothesis that increased IFN signalling is a risk factor for the development of the first autoimmune events in T1D.

This study was only possible through collaboration with Ezio Bonifacio and Annette Ziegler who are responsible for the unique BABYDIET cohort of children at risk from type 1 diabetes, followed longitudinally from birth. We thank those children and their families for their participation in this study which continues to reveal information about the earliest events preceding type 1 diabetes.

We just advertised a new position (two years, in the first instance), at postdoc or senior postdoc level. This is an opportunity to develop and apply **statistics** to support the DIL’s aim of understanding the mechanism through which genetic variation can influence risk of type 1 diabetes. We use extensive molecular biological phenotyping both of healthy individuals who carry genetic susceptibility variants and, within the context of intervention trials, of individuals with new onset diabetes. We are located in the Cambridge Biomedical Research Campus and have strong collaborative links with the MRC Biostatistics Unit under its recently appointed Director, Professor Sylvia Richardson.

See bit.ly/1966vUB for more details. Informal enquiries are welcome and may be addressed to me <chris.wallace@cimr.cam.ac.uk>, or to John Todd <john.todd@cimr.cam.ac.uk>.

I have been following the debate about open peer review: not just reviewers for traditional journals signing their reviews, but the idea of community-sourced peer review: you publish your paper when you are happy with it, other scientists comment and point out weaknesses, you revise it appropriately and publicly. This all sounds like a great idea, and is part of why this paper was first published on arXiv. I really care about the appropriate use of statistical methods for colocalisation and think the topic of data integration is an important one. Having the paper on arXiv has been useful for sharing my paper with others, and for giving a reference URL in talks. But, although some people have told me they read it, and I know some are using the software, no one has given me any criticism of the paper itself.

In parallel, I had submitted it to a traditional journal. The model there is that reviewers get asked by editors to spend some time reading a manuscript and comment in detail. This system has been subject to plenty of valid criticism for the delays introduced, and the often contrary reviewer 3, but the comments from one of the reviewers for this paper were fantastic: detailed, critical about areas where I had not been clear, and suggesting some important additional things to explore. I know, often, reviewers’ comments can seem pedantic, but occasionally you find a reviewer like this, who takes the time to read carefully everything you write, and spots holes and ways to improve it.

I don’t know who the reviewer is. I suspect s/he is a statistician working in or familiar with genetics, but perhaps not in the specific area tackled by my paper. I don’t know whether, under the community-sourced open peer review model, someone like this would have even read my paper. As a reviewer, I have to base my decision to review a paper on the fact that the editor thought I would know something about it and my reading of the abstract. Sometimes, the abstract doesn’t match the paper that well, and I realise that it will require two weeks of wading through treacle, or doing some unanticipated background reading in order for me to be able to give a fair and helpful review. Under a community sourced model, I would be able to see the paper in its entirety and would probably say no in some cases where I would have accepted based on the abstract alone.

There are some good arguments here about how reviewers could be motivated to submit reviews in an open peer review system, but I worry that it may cause us all (as reviewers) to focus on a smaller area of science: that most closely related to our work and which we feel most able to comment on. The reviewer who is less intimately involved and can give an outside perspective is often very useful. I don’t fully understand how the community-sourced peer review model will work. I like it as an ideal, but I worry a little about the practical reality of its implementation.

Power calculations are great. I really like biologists who want to do power calculations without me having to prod them with pointy sticks. But the appropriate time for a power calculation is when a study is designed. They address the question “how big a sample do I need to have a good chance of detecting an effect of a size I believe may exist?” or, alternatively, “if I can collect this many samples, do I have a good chance of detecting an effect of a size I believe may exist?”

If the power is low, then the proposed study is unlikely to reveal anything useful, and effort needs to be put into accessing more samples. But once a study has been completed and analysed, retrospective, post-hoc power calculations should not be done. Ever. ^{1}

There are two kinds of post-hoc power calculations for null studies. One uses the effect size estimated from the data to calculate the “observed power”. As the observed power has a one-to-one relationship with the p value from any study, it should be clear this can add no information to any analysis. The other kind asks “what is the smallest effect size I had power to detect with the sample I collected?”. Ignoring that this question should have been asked before the study began, it is now meaningless. If you didn’t find association, does that mean you can rule out the chance that such an effect size exists? Of course not! If you estimated the effect size associated with 80% power, as is common, your study could well be in the 20%. How could you tell?
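The one-to-one relationship is easy to see for a simple case. The sketch below assumes a two-sided z-test at significance level alpha and computes the “observed power” using nothing but the p value; the function name is mine, and other tests will give different formulae but the same kind of dependence.

```python
from statistics import NormalDist


def observed_power(p, alpha=0.05):
    """Post-hoc 'observed power' of a two-sided z-test, from its p value alone."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p / 2)       # |z| statistic implied by the p value
    z_crit = z.inv_cdf(1 - alpha / 2)  # rejection threshold at level alpha
    # Power if the true effect were exactly equal to the observed one
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.2, 0.5):
    print(p, round(observed_power(p), 3))
```

No sample size or effect size appears anywhere in the calculation. A result sitting exactly at p = alpha has observed power of almost exactly one half, and anything less significant has less, so “low observed power” simply restates that the p value was large.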

Instead, the data that have been so carefully collected should be used to infer which possible underlying effect sizes can be declared unlikely. The confidence interval is a good place to start. Its definition can seem a little convoluted, as it is based on the frequentist notion of repeated sampling. If you repeated the experiment 100 times, and constructed a 95% confidence interval in the same manner each time, you could expect that in 95 of your 100 experiments the interval would include the true value of the parameter in the population. So, given that any given study is more likely to be in the 95% rather than the 5%, it is reasonable to conclude that the true value of the parameter is unlikely to lie outside the estimated confidence interval. If 5% seems too big, you could always construct a 99% confidence interval.
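As a small worked example, here is a sketch of a normal-approximation (Wald) interval computed from an estimate and its standard error. The numbers are invented: a log odds ratio of 0.05 with standard error 0.04 from a hypothetical null study.

```python
from math import exp
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Wald interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical log odds ratio and standard error
lo95, hi95 = confidence_interval(0.05, 0.04)
lo99, hi99 = confidence_interval(0.05, 0.04, level=0.99)

# Report on the odds ratio scale
print(round(exp(lo95), 3), round(exp(hi95), 3))
print(round(exp(lo99), 3), round(exp(hi99), 3))
```

In this made-up case both intervals include an odds ratio of 1 but exclude an odds ratio of 1.2, which is a far more useful statement about what the study can and cannot rule out than any retrospective power figure.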

Reading around a little for this post, I found one absolutely fantastic reference ^{2}. This blog post could have just said “Read Hoenig and Heisey to understand why post-hoc power calculations shouldn’t be performed”.

^{1} NB I don’t mean that power calculations addressing the sample size needed to replicate the study (detect association again in an independent sample) should be avoided, as they are clearly prospective. I mean it is meaningless to ask “what is the power of this study I have just performed”.

^{2} John M. Hoenig and Dennis M. Heisey. The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 2001, 55(1):19-24.