Another colocalisation paper

I am a co-author on another paper about colocalisation posted on arXiv. It’s a novel approach, using Bayesian inference based on Approximate Bayes Factors derived from p values, which makes colocalisation testing much more practical given that data are often not as open access as claimed. My co-author, Vincent Plagnol, has written a nice post about it on Haldane’s Sieve. The software to conduct these tests is in the coloc package, v2.0 of which is now available on CRAN.
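To give a flavour of how a Bayes factor can be derived from a p value, here is a minimal sketch using Wakefield's approximate Bayes factor, a standard construction. Whether it matches the paper's calculation in every detail is an assumption, and it is not the coloc package's code. The function name, the `var_beta` argument (the sampling variance of the effect estimate, which must be supplied or estimated separately) and the default `prior_sd` of 0.15 are all hypothetical choices made for illustration.

```python
from math import log
from statistics import NormalDist


def log_abf_from_p(p, var_beta, prior_sd=0.15):
    """Log approximate Bayes factor (association vs. no association).

    p         two-sided p value for a single variant, in (0, 1]
    var_beta  sampling variance of the effect estimate (assumed known)
    prior_sd  prior standard deviation of the true effect (illustrative default)
    """
    # Recover |z| from the two-sided p value; using the lower tail
    # avoids losing precision when p is very small.
    z = -NormalDist().inv_cdf(p / 2)
    # Shrinkage factor: share of total variance attributed to the prior.
    r = prior_sd ** 2 / (prior_sd ** 2 + var_beta)
    # Wakefield's approximation, on the log scale.
    return 0.5 * (log(1 - r) + r * z ** 2)


if __name__ == "__main__":
    for p in (0.5, 1e-3, 1e-8):
        print(p, round(log_abf_from_p(p, var_beta=0.01), 2))
```

The point of the sketch is that only a p value and a variance are needed per variant, not individual-level genotype data.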


Some thoughts on open and community sourced peer review

We have revised our paper on arXiv detailing some work on colocalisation analysis, a method to determine whether two traits share a common causal variant.

I have been following the debate about open peer review: not just reviewers for traditional journals signing their reviews, but also community-sourced peer review, in which you publish your paper when you are happy with it, other scientists comment and point out weaknesses, and you revise it appropriately and publicly. This all sounds like a great idea, and is part of why this paper was first published on arXiv. I really care about the appropriate use of statistical methods for colocalisation and think the topic of data integration is an important one. Having the paper on arXiv has been useful for sharing my paper with others, and for giving a reference URL in talks. But although some people have told me they read it, and I know some are using the software, no one has given me any criticism of the paper itself.


Post-hoc power calculations

Someone who had done a candidate gene study which uncovered no evidence for association asked me whether he should perform a power calculation. Yes, candidate gene studies have been widely and justifiably criticised, mostly because of small sample sizes and over-interpretation of results, but in this particular case, a candidate gene study wasn’t so bad – it’s a great biological candidate, impossible to genotype using GWAS chips, and he had a sample size close to an order of magnitude larger than previous studies. But, a post-hoc power calculation? I may have had a slightly over-dramatic reaction.

Power calculations are great. I really like biologists who want to do power calculations without me having to prod them with pointy sticks. But the appropriate time for a power calculation is when a study is designed. They address the question “how big a sample do I need to have a good chance of detecting an effect of a size I believe may exist?” or, alternatively, “if I can collect this many samples, do I have a good chance of detecting an effect of a size I believe may exist?”
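Both design-stage questions can be sketched in a few lines. The example below is a rough illustration only: it assumes a two-sided, two-sample z-test on a standardised mean difference with equal group sizes, which is a simplification of a real genetic association design. The function names and the defaults (5% significance level, 80% power) are hypothetical choices, not anything prescribed here.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power to detect a standardised mean difference."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def sample_size_two_sample(effect_size, power=0.8, alpha=0.05):
    """Approximate samples per group needed to reach the target power."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_power = _Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # "How big a sample do I need?"
    print(sample_size_two_sample(effect_size=0.5))
    # "If I can collect this many samples, do I have a good chance?"
    print(round(power_two_sample(effect_size=0.5, n_per_group=40), 3))
```

Both functions take the believed effect size as an input chosen before any data are seen, which is what makes the calculation a design tool.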


Why post on arXiv

I’ve just posted my first paper on arXiv.  Why?  Well, all the cool kids are doing it 🙂  But mainly because I’ve thought quite a lot about its subject, I’ve finished the paper, I’m excited about the results, I want to talk about it NOW, not in 6 months or whenever it gets through reviews and possibly (multiple?) rejections.  It’s also a field I know others are working in, and by posting to arXiv before it gets published I am ensuring I don’t get scooped.  This argument seems odd in the world of biology, where people can hang onto results until papers are accepted for fear someone else is going to copy their experiments, but it’s true. No one can scoop a result published on arXiv, because once the paper is there, the idea is published, albeit in preprint format, with authors and publication date and everything.

There are other arguments for posting on arXiv of course, including making sure work is open access, but as I am submitting to an open access journal, for me it is mainly about immediacy.


New paper on colocalisation testing

We have a new paper on arXiv detailing some work on colocalisation analysis, a method to determine whether two traits share a common causal variant. This is of interest in autoimmune disease genetics as the associated loci of so many autoimmune diseases overlap 1, but, for some genes, it appears the causal variants are distinct. It is also relevant for integrating disease association and eQTL data, to understand whether association of a disease to a particular locus is mediated by a variant’s effect on expression of a specific gene, possibly in a specific tissue.
