Haplotype-aware analyses of RNA sequencing data using a pantranscriptome reference

Name of grant recipient

Jonas Andreas Sibbesen

Amount

DKK 1,181,935

Year

2021

Grant type

Reintegration Fellowships

What?

One of the most important resources for analyzing human sequencing data is the reference genome. It serves as a crucial coordinate system for genomic variation and functional elements, such as genes. For instance, by comparing RNA sequencing (RNA-seq) reads to the sequence of the reference genome, one can efficiently quantify the expression of annotated genes in a sample. The reference, however, represents only a single haplotype and thus does not capture the diversity of the human population. In this project, I will explore the use of a reference for transcriptomic analyses that consists of many genomes instead of only one. More specifically, I will develop computational methods that use a pantranscriptome reference to reduce bias and improve analyses of RNA-seq data.

Why?

One of the major limitations of the human reference genome is that it represents only a single haplotype. Because of this, it is generally easier to analyze sequencing data from individuals whose genomes are closer to the reference. This is called reference bias, and it can lead researchers to miss important variants affecting gene expression. The bias is particularly problematic in complex regions of the human genome, such as in and around segmental duplications or in highly polymorphic regions. Improving expression analyses in these regions matters because many of them contain genes with key functions, including immune response and brain development, and some have been implicated in disease.

How?

One way to mitigate reference bias is to use a reference that consists of many genomes instead of only one. This is known as a pangenome reference. While important work has already been done to demonstrate the usefulness of pangenomes for genomic analyses, less attention has been given to their use in transcriptomics. In this project, I will develop computational methods that use a human pantranscriptome reference for haplotype-aware RNA-seq analysis. A pantranscriptome reference integrates both genomic and transcriptomic information as a set of haplotype-specific transcripts. The development will focus on using this reference structure to improve inference of transcript expression in duplicated genes using both short-read and long-read RNA-seq data.
