What
Modern DNA sequencing technology makes it possible to accurately and affordably determine an individual's genome. However, predicting the effect (if any) of individual genomic variants on phenotypic traits (e.g. disease risk) remains difficult, especially when the variants are carried by few individuals.
In this project, I will develop a new statistical method for identifying variants that affect gene expression. The method is based on a new modelling approach that shares information across variants and is thereby expected to increase the power to detect rare-variant associations.
I will apply the model to large, publicly available datasets to identify new mutations that affect gene expression and leverage these signals to further our understanding of gene expression regulation.
Why
A key approach to studying biological systems in both basic and applied research is to search for associations between genomic variation and phenotypic traits. However, while the genomic revolution has provided the data foundation for performing such studies, it remains difficult to identify phenotypic associations with rare variants using current statistical methods. Because every individual carries a large number of rare variants, and because basic evolutionary theory predicts that such variants are more likely than common ones to have substantial phenotypic effects, new statistical approaches are needed.
How
The model will be developed and applied iteratively: starting from a simple model, I will alternate between modifying the model and applying it to a real dataset reserved for testing. By continuously studying the model's performance on real data, I expect to be able both to guide the modelling process and to determine when the model is accurate enough for large-scale application to real datasets to identify new variants. This approach will ensure that new biological results are obtained as early as possible in the project.
SSR
The genomics revolution has sparked immense interest in using genomic data to inform diagnosis and treatment decisions, in the hope that this will both improve treatment and reduce costs, and thereby help counter the rising cost of healthcare. However, translating the millions of variants in a patient's genome into clinically relevant information remains a major bottleneck for realising this promise of "precision medicine". I expect that the model developed in this project will be a major aid in prioritising variants for clinical interpretation and will thereby help facilitate the transition into the precision medicine era.