My background is in maths and stats, and after a stint as a statistical programmer I did a PhD at the London School of Hygiene and Tropical Medicine in the genetics of susceptibility to leprosy. My interest had always been in biology, though I was dissuaded from pursuing this at school, so I now work with statistics and computers to analyse large datasets generated by biologists. I have made a conscious choice to base myself within biology labs so that I understand the biological questions of interest, collaborate in experimental design and produce analyses that are relevant, as I believe it is all too easy for statisticians in statistics departments to do beautiful statistics that miss the biological goals. I now work in the Cambridge Institute for Medical Research and recently became head of the DIL stats group.
I am interested in integrating public genomic datasets to better understand the biological pathways underlying susceptibility to human disease. I am also interested in applying technologies in new ways to understand the effects of genetic variation, environment or treatment on gene expression or methylation in the small numbers of samples that may be available in the early dose-finding studies now underway into new treatments for type 1 diabetes.
I am interested in exploring how the internet can help with scientific communication, both to other scientists and to the interested lay public, and this blog is part of that exploration. I also post snippets of R code I don’t want to lose and updates on my CRAN packages to another blog, and have a more static official page.
I am funded by the Wellcome Trust.
My first degree was in Psychology and Applied Maths at Sydney University. Afterwards I worked for 7 years as a statistician for a neuroscience technology company linked with the university. This post included clinical trials analysis, development of markers and cognitive tests, and multivariate methods for EEG and MRI, along with academic publication. In 2011 I moved to England and completed the MPhil in Computational Biology at Cambridge. Although there was a lot of overlap with the work I’d done in neuroscience, there was also a huge amount to learn, and the course was full and intense. John Todd from the DIL gave a talk for our cohort and I was very impressed by him and the Cambridge Bioresource database, which is used by the DIL for a range of human genetics research projects.
I arranged for my MPhil research placement to be with John Todd and Corina Shtir at the DIL, on quality control for copy number variation detection using SNP arrays. I initially intended to go back to biotechnology after the MPhil, but I found I was very excited to be learning so much about both biology and statistics in the formal education environment. From several alternatives I chose to continue at the DIL and commence a PhD in Medical Genetics with John Todd and Sylvia Richardson as supervisors. I really value the range of large datasets available at the DIL, and the daily exposure to talented biologists and statisticians within the group and linked to the group.
So far I am continuing my work from my MPhil placement, implementing the CNV method as an R package – ‘plumbCNV’ – to be submitted soon. After that I hope to be working with Bayesian algorithms for fine mapping of causal SNPs within type 1 diabetes genes. Other areas of interest include the microbiome, and a potential future project might involve an investigation of the influence of the human genome on metabolites mediated by microbiota.
I am on a three-year MRC-funded PhD in Medical Genetics (2011-2014) at the DIL stats group under the supervision of Chris Wallace (first supervisor) and Anna Petrunkina-Harrison (second supervisor). I primarily research and develop computational methods for clustering of noisy data sets and the statistical techniques to analyse these. My focus is on flow cytometry, which generates large amounts of multidimensional cell phenotypes from which we seek to identify different types of cells based on the expression of extra- and intracellular markers. Due to the plasticity of cells (they are frequently in transition from one cell type to another) and the poor signal-to-noise ratio of some of these markers (particularly intracellular ones), flow cytometric data is often very noisy and downstream analysis needs to allow for uncertainty, especially when dealing with rare cell populations (for example regulatory T cells).
My academic path started with an MEng in Computer Science from University College London (2003-2007), which was followed by an MSci in Bioinformatics from Imperial College (2008-2009) after working as a Python programmer at Gambit Research (2008-2009). Before starting my PhD I worked at the European Bioinformatics Institute as a tools and production engineer at UniProt (2009-2011).
My background is in mathematics: I have a BA and MMath from the University of Cambridge. I am now enrolled on the Wellcome Trust PhD Programme in Mathematical Genomics and Medicine under the supervision of Chris Wallace (my first supervisor), Sylvia Richardson and John Todd.
I am currently working on methods to test genetic data for evidence of common causal variants between complex traits such as autoimmune diseases, in the case of a common control dataset. My algorithms are implemented in an R package, “colocCommonControl”, which can be found on GitHub. In the future, I intend to work on enrichment analysis of disease-associated SNPs, and also on the use of Bayesian networks to analyze biomarker data from adaptive clinical trials.