
Introduction to Proteogenomics


Proteogenomics combines genomics, transcriptomics, and proteomics to better understand each of these areas.

  • Genomics is the study of the DNA sequence of an organism. Genomics alone can’t tell us whether the DNA in question is transcribed into RNA.

  • Transcriptomics is the study of RNA transcripts, including mRNA and non-coding RNAs. Transcriptomics can’t tell us whether the mRNA is translated to a functional protein.

  • Proteomics examines the protein content of a cell. This includes small peptides, post-translational modifications, protein stability and degradation, and protein–protein interactions.

By combining all three disciplines, scientists can gain a much deeper understanding of each constituent and of how their interplay affects the cell and its environment.

Proteogenomics can take multiple forms. For example, it can be used to confirm findings from next-generation sequencing (NGS) of DNA or RNA. Combining NGS data with proteomics data can also uncover relationships between the genome and the proteome, and pairing nucleic acid sequencing with proteomics provides insights into protein–protein interactions. Proteogenomics is a powerful new approach to understanding mechanisms of disease and uncovering novel targets for diagnosis and treatment.

Methods Used in Proteogenomics

GWAS

Genomic and transcriptomic data are commonly acquired by some form of next-generation sequencing (NGS). The advent of NGS has enabled the sequencing of thousands of genomes within a relatively short time (months), which has made it possible to conduct genome-wide association studies (GWAS).

GWAS screen thousands of genetic variants across the genomes of individuals with and without a specific phenotype, often a disease. Comparing the two groups identifies variants that are associated with the phenotype and may point to causative differences. As of 2021, more than 5,700 GWAS had been conducted, leading to insights such as novel drug targets for Crohn’s disease and the identification of PTPN22 as a marker for autoimmune disease risk.
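The case-versus-control comparison can be illustrated for a single variant with a basic allelic chi-square test. This is a minimal sketch, not a full GWAS pipeline: the function name and the allele counts are hypothetical, and real studies also adjust for covariates such as ancestry and test many variants at once. The 5e-8 cutoff is the conventional genome-wide significance threshold.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts for one variant (two alleles per person).
chi2, p_value, odds_ratio = allelic_association(
    case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800
)

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff when testing many variants
print(f"chi2={chi2:.2f}, p={p_value:.2e}, odds ratio={odds_ratio:.2f}")
print("genome-wide significant:", p_value < GENOME_WIDE_THRESHOLD)
```

In this made-up example the variant is clearly enriched in cases, yet it does not pass the genome-wide threshold, which shows why GWAS need large cohorts.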

That said, GWAS has its limitations. Many phenotypes are caused by a multitude of variants, each making only a small contribution. In these cases, GWAS results often offer no clear direction in terms of health outcomes. In addition, a change in the DNA is only relevant if it leads to a change in the downstream molecule that does the work, such as the RNA or the protein.

Nonetheless, GWAS is a powerful technique that continues to be widely used. Psomagen is currently conducting whole-genome sequencing (WGS) for the Global Parkinson’s Genetics Program (GP2) as part of a GWAS to facilitate the development and deployment of therapeutics for Parkinson’s disease.

TWAS

NGS has also given us the power to sequence the sum of all the RNAs from an individual sample (bulk RNA-seq), and even the individual RNAs from a single cell (scRNA-seq). In GWAS, RNA-seq is often used to confirm the relationship identified between a gene and a phenotype.

Transcriptome-wide association studies (TWAS) can also be used as an independent tool, and they typically provide higher gene resolution than GWAS. TWAS is not a perfect solution either, because it’s necessary to account for gene regulation and epigenetic factors.

Proteomics

Proteomics is the study of the entire proteome of an organism or cell. In a sense, proteomics is the study of the molecules that do the vast majority of the work. After all, proteins are the end product of the flow from gene to RNA to protein.

Proteomics was originally limited to two-dimensional gel electrophoresis, but many new methods and improvements in existing methods have led to a wealth of proteomic data becoming available.

Immunoassay

One familiar example of an immunoassay is the at-home COVID-19 test. Immunoassays use antibodies that bind a small region of the target protein (the epitope) and elicit a signal. In some COVID-19 tests, that signal is a colored line, but immunoassay readouts range from colorimetric and radioactive signals to qPCR and NGS.

For example, Infinity Biosciences offers an immunoassay that consists of a panel of antigens (whole human proteome) that’s used to screen a subject’s blood for reactive antibodies using an NGS readout. In contrast, Olink proteomics assays use a panel of antibodies to screen biofluids to detect and quantify the proteins that are present. Of course, immunoassays can only identify proteins if the antibodies against them are available. With antibody-based methods, the entire proteome is currently out of reach.

Mass spectrometry

In contrast, mass spectrometry (MS) can be used to analyze the complete proteome in a sample. MS separates ions by their mass-to-charge ratio (m/z). Many different types of MS are used in proteomics analysis; the following is a general description of shotgun proteomics.

Targeted MS has long been used to identify post-translational modifications such as glycosylation or phosphorylation in purified proteins. In shotgun proteomics, samples containing protein are digested into peptides, and that mixture of fragments is analyzed by MS to generate a highly specific profile. This profile can then be compared against the information in known databases.
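The digestion and m/z steps can be sketched in a few lines. The example below assumes trypsin as the protease (the text does not name one) and uses the common simplified rule of cleaving after K or R unless the next residue is P. The function names and the input sequence are hypothetical; the residue masses are standard monoisotopic values, and modifications and missed cleavages are ignored.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.007276  # mass of each charge-carrying proton

def tryptic_digest(protein):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def mz(peptide, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge

# Hypothetical protein fragment used only for illustration.
for pep in tryptic_digest("AAKGGRPLKC"):
    print(pep, round(peptide_mass(pep), 4), round(mz(pep, 2), 4))
```

A database search works in the opposite direction: observed m/z values are matched against peptide masses predicted from known sequences in the same way.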

Although MS is a great technology, it's not always necessary or affordable to screen the entire proteome to make a significant scientific or clinical impact.

Overview of Proteogenomics

Proteogenomics applications

Imagine conducting a genome-wide association study (GWAS) in hopes of identifying novel genetic variants that contribute to the onset of a disease. The dataset consists of both healthy and affected individuals, and you identify a set of genetic variants that are significantly more common in the affected population than in the healthy subjects. If the protein encoded by an implicated gene is shown to play a causal role in the disease, you have identified a new drug target to prevent or treat that disease.

The next step is to demonstrate a link between the protein’s levels and the disease phenotype, for example through Mendelian randomization, which uses genetic variants as proxies for protein levels. These associations have to meet stringent criteria (e.g., no evidence of pleiotropy), but when they do, the new drug targets are much more likely to succeed in clinical trials. Figure 1 shows the steps in a Mendelian randomization study that led to the discovery of 64 novel proteins with causal roles in diseases such as schizophrenia and cardiovascular disease.


Figure 1. Mendelian randomization process from a study that led to the discovery of 64 novel proteins with causal roles in many diseases.
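As a rough illustration of the arithmetic behind Mendelian randomization, the sketch below computes a Wald ratio, one common single-variant estimator. It is not necessarily the method used in the study in Figure 1. The function name and all effect sizes are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the variant's effect on the protein.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant Mendelian randomization estimate (Wald ratio).

    beta_exposure: effect of the variant on protein level
    beta_outcome:  effect of the same variant on disease risk (log odds)
    se_outcome:    standard error of beta_outcome
    """
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known exactly.
    std_error = se_outcome / abs(beta_exposure)
    z_score = estimate / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return estimate, std_error, p_value

# Hypothetical summary statistics for one variant near a protein-coding gene.
estimate, std_error, p_value = wald_ratio(
    beta_exposure=0.5,  # SD change in protein level per allele
    beta_outcome=0.1,   # change in log odds of disease per allele
    se_outcome=0.02,
)
print(f"effect per SD of protein: {estimate:.2f} (SE {std_error:.2f}), p={p_value:.1e}")
```

The estimate is only meaningful if the variant affects the disease solely through the protein, which is why the absence of pleiotropy matters.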


Proteogenomics has also proven useful in disease research, including oncology. Protein-coding sequences are associated with cancer development, but because of cellular control mechanisms, we cannot be sure that a given DNA sequence will lead to an expressed protein. In one oncology study, researchers proposed a proteogenomic approach to obtain protein-level evidence of cancer-causing genomic variants. In the next installment of this series, we will take a deep dive into proteogenomic discoveries in oncology.