The OnCorr (Oncology Correlation) portal provides pan-cancer and tissue-specific mRNA-protein correlations from three public multi-omic cancer datasets.
Gene expression is frequently used as a proxy for protein abundance in cancer. However, most gene expression measurements are only moderately correlated with protein abundances. Several factors can influence mRNA-protein correlations, including post-transcriptional regulation and measurement variability introduced by differences in sample processing, platform-specific biases and data normalisation.
Here, we computed mRNA-protein correlations for gene-protein pairs measured in three large pan-cancer datasets:
| Dataset | Tissues Included | # Proteins |
|---|---|---|
| ProCan-DepMapSanger | Bladder, Bone, Breast, Central Nervous System, Cervix, Esophagus, Haematopoietic and Lymphoid, Head and Neck, Kidney, Large Intestine, Liver, Lung, Ovary, Pancreas, Peripheral Nervous System, Skin, Soft Tissue, Stomach, Thyroid | 6,692 |
| CPTAC | Breast, Central Nervous System, Endometrium, Head and Neck, Large Intestine, Lung, Pancreas, Ovary | 14,465 |
| CCLE | Bladder, Breast, Central Nervous System, Endometrium, Esophagus, Haematopoietic and Lymphoid, Kidney, Large Intestine, Liver, Lung, Pancreas, Skin, Stomach | 10,437 |
1. Search for your protein of interest
Use the dropdown menu to select a specific gene or protein. Proteins can be searched using their HGNC-approved gene symbol.
Only a single protein can be searched at a time.
Users can also select their preferred focus dataset and tissue of interest using the dropdown menu.
By default, the homepage displays the mRNA-protein correlation for EGFR in the ProCan-DepMapSanger dataset.
2. Navigating your results
For each search, the page will show several statistics and plots relating to the mRNA-protein correlation.
The top bar of the home page shows key metrics for the protein of interest.
Median Correlation: This shows the median mRNA-protein Spearman’s correlation of the protein of interest across all samples measured in the selected dataset. Only proteins that have measurements in at least 10 samples have their correlations shown.
Protein Reproducibility Rank: This shows the protein reproducibility rank of the protein of interest, taken from Upadhya et al. (2022). That publication describes the development of a reproducibility rank that quantifies the accuracy with which a protein has been measured across proteomic studies.
Median Correlation in Tissue: This shows the median mRNA-protein correlation of the protein of interest in the selected tissue. This value is calculated using Spearman’s correlation for tissues with at least 10 measurements of the protein of interest.
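As an illustration of the tissue-level metric described above, the following is a minimal sketch of a Spearman correlation computed per tissue, keeping only tissues with at least 10 measurements. It is not the portal's own code: the function names, the `(tissue, mRNA, protein)` tuple layout and the toy values are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean

MIN_SAMPLES = 10  # threshold stated in the text ("at least 10")


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined when one variable is constant
    return cov / sqrt(vx * vy)


def tissue_correlations(samples):
    """samples: (tissue, mrna, protein) tuples for one gene-protein pair."""
    by_tissue = {}
    for tissue, mrna, protein in samples:
        mrna_values, protein_values = by_tissue.setdefault(tissue, ([], []))
        mrna_values.append(mrna)
        protein_values.append(protein)
    result = {}
    for tissue, (mrna_values, protein_values) in by_tissue.items():
        if len(mrna_values) >= MIN_SAMPLES:
            r = spearman(mrna_values, protein_values)
            if r is not None:
                result[tissue] = r
    return result


# Toy data (hypothetical values): 10 lung samples, 3 skin samples.
toy = [("Lung", float(i), float(i) ** 2) for i in range(1, 11)]
toy += [("Skin", 1.0, 2.0), ("Skin", 2.0, 1.0), ("Skin", 3.0, 3.0)]
per_tissue = tissue_correlations(toy)
```

In this toy input the skin tissue has only three samples, so it is dropped before any correlation is computed, which mirrors how tissues below the threshold have no value shown.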
This plot shows the distribution of mRNA-protein correlations for the protein of interest across tissues. Each dataset is shown as a solid line: ProCan-DepMapSanger (pink), CPTAC (blue) and CCLE (green). The background (solid grey line) indicates the mRNA-protein correlations from all gene-protein pairs across all tissues in the ProCan-DepMapSanger dataset. Only tissues with data from more than 10 samples are shown in the plot.
Heatmap (top right plot) This plot shows the mRNA-protein correlation in each tissue and dataset for the protein of interest. Grey denotes a missing mRNA-protein correlation for that tissue type, either because the sample size was too small (10 or fewer samples) or because that tissue or gene-protein pair was not measured in the dataset of interest.
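The masking rule for the heatmap can be sketched as follows. This is only an illustration under stated assumptions: the `cells` mapping, the function name and the example values are hypothetical, and `None` stands in for a grey cell.

```python
MAX_MASKED_SAMPLES = 10  # cells with 10 or fewer samples are shown in grey


def heatmap_grid(cells, tissues, datasets):
    """Build a tissue x dataset grid of correlations.

    cells maps (tissue, dataset) -> (correlation, n_samples).
    A cell is None when the pair was not measured or has too few samples.
    """
    grid = []
    for tissue in tissues:
        row = []
        for dataset in datasets:
            entry = cells.get((tissue, dataset))
            if entry is None or entry[1] <= MAX_MASKED_SAMPLES:
                row.append(None)  # grey: missing or too few samples
            else:
                row.append(entry[0])
        grid.append(row)
    return grid


# Hypothetical values for illustration only.
example_cells = {
    ("Lung", "CPTAC"): (0.62, 110),
    ("Lung", "CCLE"): (0.48, 10),   # masked: exactly 10 samples
    ("Breast", "CPTAC"): (0.55, 95),
    # ("Breast", "CCLE") is absent: treated as not measured
}
grid = heatmap_grid(example_cells, ["Lung", "Breast"], ["CPTAC", "CCLE"])
```

Keeping the sample count alongside each correlation lets the same rule cover both causes of a grey cell, a missing measurement and a small sample size.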
Correlation between RNA and protein (bottom right plot) This plot shows the correlation between RNA expression and protein abundance of the protein of interest for each sample in the selected dataset. The plot reports the Spearman’s rank correlation coefficient (r).
Correlation and Protein Reproducibility Rank (bottom middle plot) This plot shows the relationship between the mRNA-protein correlation (x-axis) of all gene-protein pairs in each dataset and the protein reproducibility rank (y-axis) derived from Upadhya et al. (2022). The red triangle indicates the protein of interest. Proteins with a high protein reproducibility rank but a low mRNA-protein correlation may be unreliable targets when identified from gene expression alone within a precision oncology pipeline.
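To make the interpretation above concrete, here is a minimal sketch that flags gene-protein pairs with a high reproducibility rank but a low mRNA-protein correlation. The portal does not define numeric cut-offs, so the thresholds, the function name, the gene labels and the values below are all hypothetical.

```python
def flag_discordant(proteins, min_rank, max_correlation):
    """Return symbols with rank >= min_rank and correlation <= max_correlation.

    proteins: iterable of (symbol, reproducibility_rank, correlation).
    Both thresholds are chosen by the user; none is defined by the portal.
    """
    return [
        symbol
        for symbol, rank, correlation in proteins
        if rank >= min_rank and correlation <= max_correlation
    ]


# Hypothetical values for illustration only.
example = [
    ("GENE_A", 0.9, 0.10),  # well measured, weakly correlated: flagged
    ("GENE_B", 0.9, 0.70),  # well measured, strongly correlated
    ("GENE_C", 0.2, 0.05),  # poorly measured, so low correlation is ambiguous
]
flagged = flag_discordant(example, min_rank=0.8, max_correlation=0.2)
```

Only pairs that pass both conditions are returned, since a low correlation for a poorly reproduced protein could reflect measurement noise rather than biology.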
Correlation between RNA and protein – tissue: This plot shows the correlation between RNA expression and protein abundance of the protein of interest for each sample in the selected tissue of a dataset. The plot also reports the Spearman’s rank correlation coefficient (r) and p-value (p).
3. Saving your results
All plots can be downloaded as .png files by clicking on the download icon on the top right side of each plot.