Overview

The OnCorr (Oncology Correlation) portal provides pan-cancer and tissue-specific mRNA-protein correlations from three public multi-omic cancer datasets.

Rationale

Gene expression is frequently used as a proxy for protein abundance in cancer. However, most gene expression measurements are only moderately correlated with protein abundance. Several factors can influence mRNA-protein correlations, including post-transcriptional regulation and measurement variability introduced by differences in sample processing, platform-specific biases and data normalisation.

Here, we calculated mRNA-protein correlations for gene-protein pairs measured in three large pan-cancer datasets:

Dataset | Tissues included | Number of proteins
ProCan-DepMapSanger | Bladder, Bone, Breast, Central Nervous System, Cervix, Esophagus, Haematopoietic and Lymphoid, Head and Neck, Kidney, Large Intestine, Liver, Lung, Ovary, Pancreas, Peripheral Nervous System, Skin, Soft Tissue, Stomach, Thyroid | 6,692
CPTAC | Breast, Central Nervous System, Endometrium, Head and Neck, Large Intestine, Lung, Pancreas, Ovary | 14,465
CCLE | Bladder, Breast, Central Nervous System, Endometrium, Esophagus, Haematopoietic and Lymphoid, Kidney, Large Intestine, Liver, Lung, Pancreas, Skin, Stomach | 10,437

Basic tutorial and panel overview

1. Search for your protein of interest

Use the dropdown menu to select a specific gene or protein. Proteins can be searched by their HGNC-approved gene symbol.

Only a single protein can be searched at a time.

Users can also select a focus dataset and a tissue of interest from the dropdown menus.

By default, the homepage displays the mRNA-protein correlation for EGFR in the ProCan-DepMapSanger dataset.

2. Navigating your results

For each search, the page shows several statistics and plots describing the mRNA-protein correlation.

The top bar of the home page shows key metrics for the protein of interest.

Median Correlation: The median Spearman’s correlation between mRNA and protein levels for the protein of interest across all samples in the selected dataset. Correlations are only shown for proteins measured in at least 10 samples.

Protein Reproducibility Rank: The reproducibility rank of the protein of interest, as defined by Upadhya et al. (2022). That study developed a rank that quantifies how accurately a protein has been measured across proteomic studies.

Median Correlation in Tissue: The median Spearman’s correlation between mRNA and protein levels for the protein of interest in the selected tissue. It is only calculated for tissues with at least 10 measurements of the protein of interest.
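
Both top-bar correlations apply the same 10-sample rule. The sketch below shows one way such a rule could be applied for a single protein; it is illustrative only. It assumes hypothetical per-sample records of (tissue, mRNA value, protein value), treats None or NaN as a missing measurement, and counts only samples where both values are present. How the portal aggregates values into a median is not described here, so the sketch reports plain Spearman correlations for the whole dataset and for each tissue that passes the threshold.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

MIN_SAMPLES = 10  # minimum number of paired measurements, as stated above

# Hypothetical per-sample record: (tissue, mRNA value, protein value).
# None or NaN marks a missing measurement (an assumption of this sketch).
Record = Tuple[str, Optional[float], Optional[float]]
Pair = Tuple[float, float]


def _present(v: Optional[float]) -> bool:
    return v is not None and not math.isnan(v)


def _average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for pos, v in enumerate(ordered):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 + 1 for v in values]


def thresholded_spearman(pairs: List[Pair],
                         min_samples: int = MIN_SAMPLES) -> Optional[float]:
    """Spearman's rho over complete pairs, or None if there are too few."""
    if len(pairs) < min_samples:
        return None  # below the threshold, no value is reported
    rx = _average_ranks([m for m, _ in pairs])
    ry = _average_ranks([p for _, p in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")


def dataset_and_tissue_correlations(
        records: Iterable[Record]) -> Tuple[Optional[float], Dict[str, float]]:
    """Dataset-wide rho plus rho for each tissue that meets the threshold."""
    all_pairs: List[Pair] = []
    by_tissue: Dict[str, List[Pair]] = defaultdict(list)
    for tissue, m, p in records:
        if _present(m) and _present(p):  # keep only complete pairs
            all_pairs.append((m, p))
            by_tissue[tissue].append((m, p))
    per_tissue: Dict[str, float] = {}
    for tissue, pairs in by_tissue.items():
        rho = thresholded_spearman(pairs)
        if rho is not None:
            per_tissue[tissue] = rho
    return thresholded_spearman(all_pairs), per_tissue
```

For example, a protein with 12 complete Lung samples and only 5 Skin samples would, under this rule, get a tissue-level value for Lung but not for Skin, while the dataset-wide value would use all 17 samples.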

The correlation distribution plot shows the distribution of mRNA-protein correlations for the protein of interest across tissues. Each dataset is shown: ProCan-DepMapSanger (pink solid line), CPTAC (blue solid line) and CCLE (green solid line). The background distribution (solid grey line) shows the mRNA-protein correlations of all gene-protein pairs across all tissues in the ProCan-DepMapSanger dataset. Only tissues with data from more than 10 samples are shown in the plot.
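
The tissue filter and the background comparison can be sketched as follows. The function names, the per-tissue sample counts, and the use of quartiles to summarise the background are assumptions for illustration; the portal's plot draws the full distributions rather than a numeric summary.

```python
import statistics
from typing import Dict, List

# The plot text says tissues need data from more than 10 samples.
MIN_TISSUE_SAMPLES = 10


def tissues_to_plot(sample_counts: Dict[str, int],
                    min_samples: int = MIN_TISSUE_SAMPLES) -> List[str]:
    """Alphabetical list of tissues with more than min_samples samples."""
    return sorted(t for t, n in sample_counts.items() if n > min_samples)


def summarise_background(correlations: List[float]) -> Dict[str, float]:
    """Lower quartile, median and upper quartile of background correlations."""
    q1, _, q3 = statistics.quantiles(correlations, n=4)
    return {"q1": q1, "median": statistics.median(correlations), "q3": q3}


counts = {"Lung": 25, "Skin": 10, "Bone": 11}
print(tissues_to_plot(counts))  # ['Bone', 'Lung']; Skin has exactly 10 samples
```

Note the strict "more than 10" comparison here, which follows the wording of the plot description; the top-bar metrics are described with "at least 10".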

3. Saving your results

All plots can be downloaded as .png files by clicking the download icon at the top right of each plot.