Adaptation of a Bayesian conditional false discovery rate

We have a new paper out on bioRxiv, and in press in PLOS Genetics, on leveraging GWAS for similar phenotypes.

Say we have two large GWAS, for diseases A and B, and at a particular SNP the p values are 1e-7 and 1e-4 respectively. What can we say from this?
– It looks like the SNP is associated with disease A, though it does not quite reach genome-wide significance.
– It may also be associated with disease B, but the evidence is not so good.
– It is likely to be genome-wide significant for A OR B; but…

It is tempting to use one p value to inform the other; the likely association with disease A should raise our suspicions about association with disease B. However, the information shared is not symmetric; even if we account for both p values being low, there is better evidence for association with disease A than with disease B. And what if the diseases are completely unrelated? Then the low p values are more likely to be just coincidence.

One elegant way to approach this problem is to analyse the ‘conditional false discovery rate’ (cFDR), developed by Andreassen et al in 2013. The idea behind this is to consider only the subset of SNPs with p values for disease B less than some threshold, and to assess association with A amongst only these SNPs, using a procedure similar to Benjamini-Hochberg or Storey’s Q. Ultimately, it estimates:


    cFDR(pa | pb) = Pr(H0(a) | Pa ≤ pa, Pb ≤ pb)

that is, the posterior probability that a SNP is not associated with phenotype A (H0(a)) given cutoffs (pa, pb) on the p values (Pa, Pb) for phenotypes A and B.
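As a rough illustration, this probability can be bounded empirically from the two lists of p values. The sketch below is not the code used in the paper; the function name `cfdr_estimate` and the toy p values are invented here. It uses the bound pa × #{Pb ≤ pb} / #{Pa ≤ pa, Pb ≤ pb}, which assumes that, at null SNPs, p values for A are uniform and independent of those for B.

```python
def cfdr_estimate(p_a, p_b, pa_cut, pb_cut):
    """Empirical upper bound on Pr(H0(a) | Pa <= pa_cut, Pb <= pb_cut).

    Assumes null p values for A are uniform and independent of p values for B.
    """
    # SNPs passing the cutoff on phenotype B
    n_b = sum(1 for b in p_b if b <= pb_cut)
    # SNPs passing both cutoffs
    n_ab = sum(1 for a, b in zip(p_a, p_b) if a <= pa_cut and b <= pb_cut)
    if n_ab == 0:
        return 1.0  # nothing observed in the region, so no evidence against the null
    return min(1.0, pa_cut * n_b / n_ab)


# Toy data (hypothetical): four SNPs with p values for A and B
toy_p_a = [0.001, 0.2, 0.5, 0.9]
toy_p_b = [0.01, 0.02, 0.6, 0.8]

conditioned = cfdr_estimate(toy_p_a, toy_p_b, pa_cut=0.001, pb_cut=0.05)
unconditioned = cfdr_estimate(toy_p_a, toy_p_b, pa_cut=0.001, pb_cut=1.0)
```

With `pb_cut=1.0` every SNP passes the condition on B and the estimate reduces to an ordinary unconditional FDR bound; restricting to SNPs with low p values for B lowers the estimate in this toy example.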

This tests the hypothesis of association with only one disease at a time, while incorporating information from the other phenotype, and accounting for the overall similarity between the two phenotypes. This fundamentally simple idea allows us to tidily address the above problem.

One shortcoming of the original technique was that it required the two p values to be distributed independently at null SNPs. This implied that the two GWAS in question could not share any controls. We extended the technique to allow controls to be shared between studies, increasing both its potential power (i.e., larger control groups) and its applicability (it needs only summary statistics and knowledge of the control overlap).
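To see why shared controls break independence at null SNPs, here is a small simulation. It is a toy model made up for this post, not the method from the paper: allele frequencies are drawn from a normal approximation, every SNP is null for both diseases, and the names `simulate_null_z` and `pearson` are hypothetical.

```python
import math
import random


def simulate_null_z(n_snps, n_a, n_b, n_ctrl, shared, maf=0.3, seed=1):
    """Z scores for two case-control studies at SNPs that are null for both."""
    rng = random.Random(seed)
    var1 = maf * (1 - maf) / 2  # allele-frequency variance per individual (2 alleles)

    def sample_freq(n):
        # Normal approximation to the sample allele frequency in n individuals
        return rng.gauss(maf, math.sqrt(var1 / n))

    z_a, z_b = [], []
    for _ in range(n_snps):
        f_a, f_b = sample_freq(n_a), sample_freq(n_b)
        f_ctrl_a = sample_freq(n_ctrl)
        # Either reuse the same controls for study B or draw a separate set
        f_ctrl_b = f_ctrl_a if shared else sample_freq(n_ctrl)
        z_a.append((f_a - f_ctrl_a) / math.sqrt(var1 * (1 / n_a + 1 / n_ctrl)))
        z_b.append((f_b - f_ctrl_b) / math.sqrt(var1 * (1 / n_b + 1 / n_ctrl)))
    return z_a, z_b


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Equal numbers of cases and controls in both studies (arbitrary choice)
r_shared = pearson(*simulate_null_z(20000, 2000, 2000, 2000, shared=True))
r_separate = pearson(*simulate_null_z(20000, 2000, 2000, 2000, shared=False))
```

Under this toy model the z scores should be correlated at about 0.5 when the controls are shared and at about 0 when they are separate, so low p values for B are no longer uninformative about A at null SNPs.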

The major part of this work was developing a way to adjust the p value for A to an ‘expected quantile’ which accounted for the shared controls. Exactly how much to adjust it depends on what the true allelic difference for the SNP is for phenotype B, which we call ‘eta’. Essentially, the shared controls ‘pull’ the observed effect size for A towards eta. Obviously, we do not know eta, but if we consider it to be a random variable, we can get a handle on its distribution by looking at the observed effect sizes for phenotype B – a neat instance of the empirical Bayes technique.

On the advice of a reviewer, we investigated what happens if we wrongly estimate the distribution of eta (not much), and proved some inequalities bounding the expected quantile.

We noticed a difficulty with the technique in terms of limiting the overall false discovery rate amongst ‘discovered’ SNPs. We compute the cFDR ‘at’ a SNP as the probability that a SNP with smaller p values is null. This is analogous to Storey’s Q value, or the Benjamini-Hochberg technique – but there is an important caveat. If we declare non-null all SNPs with Storey’s Q less than some value alpha, the total false discovery rate is less than alpha. This, unfortunately, does not hold for the cFDR!
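For comparison, here is a minimal sketch of the ordinary (unconditional) Benjamini-Hochberg adjustment, which corresponds to Storey’s Q with the null proportion taken as 1. The function name `bh_q_values` is made up for illustration. Declaring non-null every SNP whose adjusted value is at most alpha controls the overall FDR at alpha under the usual assumptions; it is this guarantee that does not carry over to the cFDR.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Toy example (hypothetical p values)
q = bh_q_values([0.01, 0.04, 0.03, 0.5])
```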

We found an aesthetically pleasing, geometry-based bound on the overall FDR, but it is inefficient and not optimal.

This makes it quite hard to decide on a cutoff for significance. We settled on ‘the smallest cFDR amongst SNPs with p value for phenotype A of less than 5e-8’. However, by doing this, we make redundant any direct comparison with the number of SNPs ‘discovered’ by p-value alone.
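The cutoff rule can be written down in a few lines. This is only a sketch: the function names and the toy values are invented, and it assumes that a SNP is declared non-null when its cFDR is at or below the cutoff.

```python
GENOME_WIDE = 5e-8


def cfdr_cutoff(p_a, cfdr, gw=GENOME_WIDE):
    """Smallest cFDR amongst SNPs whose p value for A is below gw."""
    gw_cfdr = [c for p, c in zip(p_a, cfdr) if p < gw]
    return min(gw_cfdr) if gw_cfdr else None


def declared_snps(p_a, cfdr, gw=GENOME_WIDE):
    """Indices of SNPs with cFDR at or below the cutoff (assumed rule)."""
    cut = cfdr_cutoff(p_a, cfdr, gw)
    if cut is None:
        return []
    return [i for i, c in enumerate(cfdr) if c <= cut]


# Toy values (hypothetical): SNPs 0 and 1 are genome-wide significant for A
toy_p = [1e-9, 3e-8, 2e-7, 0.01]
toy_cfdr = [2e-4, 5e-4, 1e-4, 0.3]

cutoff = cfdr_cutoff(toy_p, toy_cfdr)
hits = declared_snps(toy_p, toy_cfdr)
```

In this toy example SNP 2 is declared although its p value exceeds 5e-8, while SNP 1 is genome-wide significant but not declared, which shows why the two sets of ‘discoveries’ are awkward to compare directly.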

So how well did it work?

It worked OK.

We applied the technique to ten autoimmune diseases, and were able to declare non-null a few SNPs with p values greater than 5e-8. We also found a few new SNP-disease associations, although we obviously cannot stand by these until they are replicated (which we are not looking to do). Generally, using the technique bolstered the number of SNPs discovered by about 10% in similar phenotypes, albeit without being as sure about the overall false discovery rate.

Hopefully, we can start to use the technique to look at ‘how’ similar two phenotypes are; since the cFDR inherently uses this information, perhaps we can recover it. We would also like to move ‘backward’ and start looking at disease subtypes rather than separate diseases.

Thanks for reading!


Our paper

Original cFDR paper


Author post: A hybrid SNP/qPCR approach for large scale association testing in KIR

SNP arrays are a great way of cheaply genotyping a large number of individuals for genome-wide association studies.

This is what you expect a typical SNP to look like:


There are three clearly distinguishable clusters representing the three possible genotypes at that locus (TT, TC and CC).

But some SNPs look like this:


What is going on here?


Increased IFN signalling is a risk factor for the development of the first autoimmune events in T1D

Our group has a manuscript just out in Diabetes in which we have investigated the role of type 1 interferon signalling in the pathogenesis of the autoimmune disease T1D. The work was led by Ricardo Ferreira and Hui Guo. Type 1 interferon (IFN) signalling is an evolutionarily conserved biological pathway that plays a major role in the defense against viral infections. Every mammal expresses IFN genes, and birds, amphibians and fish also express functionally homologous molecules. However, a side effect of IFN responses is that they can cause bystander tissue damage and can lead to the activation of an autoimmune response. In fact, in humans, chronically activated IFN signalling has recently been implicated in the aetiology of several systemic autoimmune diseases such as systemic lupus erythematosus (SLE) or vasculitis. Importantly, in T1D, genetic evidence from genome-wide association studies has pointed to an important role for this biological pathway, including the identification of IFIH1, a major sensor of viral infections, as a susceptibility gene.



We’re hiring!

We just advertised a new position (two years, in the first instance), at postdoc or senior postdoc level.  This is an opportunity to develop and apply to support the DIL’s aim of understanding the mechanism through which genetic variation can influence risk of type 1 diabetes.  We use extensive molecular biological phenotyping both of healthy individuals who carry genetic susceptibility variants and, within the context of intervention trials, of individuals with new onset diabetes. We are located in the Cambridge Biomedical Research Campus and have strong collaborative links with the MRC Biostatistics Unit under its recently appointed Director, Professor Sylvia Richardson. 

Informal enquiries are welcome and may be addressed to me or to John Todd.