Profile Explorer for Roadmap Epigenomics data is a web application for browsing some of the results obtained from the Roadmap Epigenomics consortium. We are not affiliated with the Roadmap Epigenomics consortium, and this analysis is not endorsed by Roadmap Epigenomics.
Many resources allow exploration of Roadmap Epigenomics results one region at a time, such as the genome browser available through the Roadmap Epigenomics data portal. Through this app we propose two other complementary analyses:
Scripts corresponding to the steps described below can be found in this repository: github.com/gdevailly/perepigenomicsAnalysis
For WGBS, we downloaded bigwig files of fractional methylation and read coverage from the Roadmap Epigenomics data portal. Histone and DNAse1 data were downloaded as consolidated, not subsampled, tagAlign files from the data portal. The Gencode human annotation version 29 (main annotation file) was downloaded from the Gencode website as a gff3 file. Reads from RNA-seq data were retrieved from the European Nucleotide Archive using this table as a reference.
The PEREpigenomics app is divided into 5 tabs:
The 'Explore' tab allows browsing and downloading many thousands of plots linking epigenetic marks to gene expression features. The first option, '1- Order by:', lets users choose whether to select the epigenetic assay before or after the cell type of interest. It does not change the list of available plots. The second option, '2- Focus on:', lets the user focus on 3 different gene features, corresponding to 4 different orderings. 'Transcription Start Sites (by gene TPM)' centers the plots on gene starts and sorts genes from high expression at the top to no expression at the bottom. 'Transcription Termination Sites (by gene TPM)' centers the plots on gene ends and sorts genes from high expression at the top to no expression at the bottom. 'Middle exons (by exon expression)' centers the plots on the starts of the middle exons (neither first nor last exons) of protein-coding genes, with highly expressed exons at the top and unexpressed exons at the bottom. 'Middle exons (by inclusion ratio)' centers the plots on the starts of the middle exons of protein-coding genes, with included exons at the top and excluded exons at the bottom. The third option, '3- Choose an assay', selects the epigenetic mark of interest. The fourth option, '4- Choose a cell type', selects the cell type of interest. Option 1 swaps the order of options 3 and 4. Option 4 shows only the data available for the choice made in option 3. If the feature selected in option 2 is either 'Transcription Start Sites' or 'Transcription Termination Sites', the fifth option, '5- Choose a gene category', restricts the plot to genes belonging to a defined category: the main Gencode gene types, or short (<1 kb), long (>3 kb) and intermediate-size genes.
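As an illustration of the gene size categories and the expression-based ordering described for the 'Explore' tab, the following Python sketch assigns genes to the short (<1 kb), intermediate and long (>3 kb) classes and sorts them from high to no expression. It is only a sketch: the function names, the record layout and the example values are hypothetical, counting genes of exactly 1 kb or 3 kb as intermediate is an assumption, and the scripts actually used by the app may proceed differently.

```python
from typing import List, Tuple

# Hypothetical record layout: (gene_id, gene_length_bp, tpm)
Gene = Tuple[str, int, float]


def size_category(length_bp: int) -> str:
    """Assign a gene to a size class using the thresholds quoted in the text.

    Assumption: genes of exactly 1 kb or 3 kb count as intermediate.
    """
    if length_bp < 1000:
        return "short"
    if length_bp > 3000:
        return "long"
    return "intermediate"


def order_for_plot(genes: List[Gene], category: str) -> List[Gene]:
    """Keep genes of one size class, sorted from highest to lowest TPM."""
    kept = [g for g in genes if size_category(g[1]) == category]
    return sorted(kept, key=lambda g: g[2], reverse=True)


# Made-up example values, for illustration only.
example_genes = [
    ("geneA", 800, 12.5),
    ("geneB", 5200, 0.0),
    ("geneC", 4100, 30.2),
    ("geneD", 2000, 3.1),
]
long_genes = order_for_plot(example_genes, "long")
print([g[0] for g in long_genes])  # geneC (30.2 TPM) is listed before geneB (0 TPM)
```

Sorting in descending TPM order mirrors the plot layout, where highly expressed genes sit at the top and unexpressed genes at the bottom.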
The plot can be read as follows:
The 'Compare' tab displays two different plots side by side, a surprisingly powerful way to explore this dataset. The 'Plot 1' panel controls the plot on the left of the screen, and the 'Plot 2' panel controls the plot on the right. This tab is best viewed on a large screen. Note that the control panels in this tab can be moved by holding the mouse button down and dragging them.
While the 'Explore' and 'Compare' tabs display all the genes within one cell type, the 'Correlate' tab takes the complementary perspective of comparing the same gene across cell types. The 'Correlate' tab is subdivided into two sub-tabs:
The 'Gene by Gene' sub-tab displays scatter plots of gene expression levels (or, for exons, exon expression levels or exon inclusion ratios) versus the amount of epigenetic mark at TSS, TTS or middle exon starts. The first option, '1- Summarise mark at:', controls the window over which the epigenetic mark is summarised: TSS (±500 bp), TTS (±500 bp) or middle exons (±100 bp). Middle exons can be sorted by expression level or inclusion ratio. The second option selects the epigenetic mark of interest. In the third option, '3- Search for a gene', users search for their gene of interest, either by its HUGO gene symbol or by its Ensembl identifier. A fourth option then appears, listing the genes that match the search terms. Exons are searched in the same way as genes, with the exact exon then identified by its genomic coordinates (in hg19). Once the gene or exon is selected, two interactive scatter plots appear: one for the epigenetic mark of interest, the other for the matching control assay (WGBS coverage for WGBS data, Inputs for DNAse1 and ChIP-seq assays). Linear regression statistics for the epigenetic mark of interest are displayed below the plot. Pop-ups display the cell code corresponding to each cell type, as defined by the Roadmap Epigenomics consortium.
The 'Across all genes' sub-tab displays the distribution of regression slopes according to the linear regression coefficient R² for all the genes. The first option, '1- Summarise mark at:', controls the window over which the epigenetic mark is summarised: TSS (±500 bp), TTS (±500 bp) or middle exons (±100 bp). Middle exons can be sorted by expression level or inclusion ratio. The second option selects the epigenetic mark of interest. The third option restricts the analysis to genes belonging to a defined category: the main Gencode gene types, or short (<1 kb), long (>3 kb) and intermediate-size genes. This option is not available for middle exons, as all middle exons considered here come from protein-coding genes.
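To make the per-gene linear regression statistics of the 'Correlate' tab more concrete, here is a minimal Python sketch of an ordinary least-squares fit that returns a slope, an intercept and R² for one gene across several cell types. The function name, the choice of the epigenetic mark as the x variable and expression as the y variable, and all numeric values are assumptions made for illustration; the app's own scripts may compute these statistics differently.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x is constant; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = 0.0 if syy == 0 else (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up values: one gene measured in five hypothetical cell types.
mark_signal = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed x: mark summarised around the TSS
expression = [2.1, 3.9, 6.2, 7.8, 10.1]   # assumed y: expression level of the gene
slope, intercept, r2 = linear_fit(mark_signal, expression)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

Repeating such a fit for every gene would give one (slope, R²) pair per gene, which is the kind of summary the 'Across all genes' view presents as a distribution.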
This table, which is fully searchable and downloadable, lists the profiles available in the 'Explore' and 'Compare' tabs.