What Is Mendelian Randomization, and Why Might You Need It?

Improvements in omics technologies have made it possible to generate massive genetic datasets. With these developments, researchers can identify associations between genetic variation and phenotypic differences. However, biases such as confounding and pleiotropy (single nucleotide variants, or SNVs, that affect multiple pathways) can lead to false conclusions.

In light of these challenges, Mendelian randomization (MR) is a valuable tool for strengthening the causal interpretation of association study results. Developed by epidemiologists, MR uses genetic variation to test whether a modifiable exposure has a causal effect on an outcome. When its assumptions hold, it reduces the confounding, reverse causation, and other biases that have been problematic in traditional observational studies.

In this article, we will explain the principles, assumptions, and benefits of Mendelian randomization. We will also discuss some great examples of MR being used to infer causality.

What Is Mendelian Randomization?

To understand Mendelian randomization, we first have to understand the law of segregation discovered by Gregor Mendel. In his experiments on pea plants, Mendel showed that the two alleles (alternative forms of a gene) that a parent carries separate during gamete formation, so each gamete randomly receives one of them. The key word here is “randomly,” and it serves as the basis for Mendelian randomization. The figure below depicts how each parent contributes one allele to each child.

[Figure: Recessive gene inheritance]

Alleles undergo random segregation when gametes are formed: a parent's two alleles separate, so half of that parent's gametes carry one allele and half carry the other.
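The random segregation described above can be mimicked with a small Python simulation. This is an illustrative sketch: the genotype strings and function names are made up, and it models a single gene with two alleles. Crossing two heterozygous (Aa) parents should give offspring genotypes close to the 1:2:1 ratio Mendel described.

```python
import random
from collections import Counter


def make_gamete(genotype: str, rng: random.Random) -> str:
    """Law of segregation: a gamete receives one of the parent's two alleles,
    each with probability 1/2."""
    return rng.choice(genotype)


def offspring_genotype(mother: str, father: str, rng: random.Random) -> str:
    # One allele from each parent; sorting makes "aA" and "Aa" the same genotype
    return "".join(sorted(make_gamete(mother, rng) + make_gamete(father, rng)))


def simulate_cross(mother: str, father: str, n: int, seed: int = 1) -> dict:
    """Return the observed frequency of each offspring genotype over n children."""
    rng = random.Random(seed)
    counts = Counter(offspring_genotype(mother, father, rng) for _ in range(n))
    return {g: c / n for g, c in sorted(counts.items())}


freqs = simulate_cross("Aa", "Aa", n=100_000)
print(freqs)  # roughly 0.25 AA, 0.5 Aa, 0.25 aa
```

Because every child draws its alleles independently, genotype is assigned by chance rather than by anything the child later does or experiences; this is the property MR borrows.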

MR estimates the causal effect of a risk factor (exposure) on an outcome, together with a statistical measure of confidence in that estimate. For the estimate to be valid, the genetic variant must influence the outcome only through the risk factor/exposure. This is a fundamental requirement for treating the genetic variant as an instrumental variable (IV). An IV is associated with the exposure of interest and is related to the outcome only via that exposure, not through any other path.
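For a single variant, the classic IV estimator is the Wald ratio: the variant's association with the outcome divided by its association with the exposure. The sketch below assumes these regression coefficients are already available as summary statistics; the function name and the numbers are hypothetical, and the standard error is the common first-order approximation that ignores uncertainty in the variant–exposure association.

```python
import math


def wald_ratio(beta_gx: float, beta_gy: float, se_gy: float) -> tuple:
    """Single-variant MR estimate of the causal effect of X on Y.

    beta_gx: association of the variant G with the exposure X
    beta_gy: association of the variant G with the outcome Y
    se_gy:   standard error of beta_gy
    Returns (estimate, first-order standard error).
    """
    if math.isclose(beta_gx, 0.0, abs_tol=1e-12):
        raise ValueError("variant is not associated with the exposure; not a usable instrument")
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se


# Hypothetical summary statistics for one SNP (illustrative numbers only)
est, se = wald_ratio(beta_gx=0.20, beta_gy=0.05, se_gy=0.01)
print(f"causal estimate = {est:.2f} (SE {se:.2f})")
```

Dividing by beta_gx also shows why weak instruments are a problem: a small denominator inflates both the estimate and its uncertainty.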

In MR, instrumental variables are usually common single nucleotide polymorphisms (SNPs), because genetic variants often meet the criteria for IVs. Genotypes are fixed at conception, so the outcome cannot alter them, which protects against reverse causation. MR can be used to investigate modifiable exposures that affect health risk, and it has become more popular with the release of many large, publicly available genome-wide association studies (GWAS).

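Several statistical tests, measures, and study designs are commonly used in MR studies, including:

  • Hausman test (compares the IV estimate with the ordinary least squares estimate)

  • Partial F statistic (measures instrument strength)

  • R² (proportion of variance in the exposure explained by the instruments)

  • Two-sample MR (a design that takes variant–exposure and variant–outcome associations from separate samples)

  • MR-Egger (a regression method that allows for directional pleiotropy)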
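For two-sample MR with several variants, the inverse-variance weighted (IVW) estimate and MR-Egger can both be computed from per-variant summary statistics. The sketch below returns point estimates only (no standard errors or tests), uses plain Python rather than any MR package, and runs on made-up numbers in which every variant carries the same pleiotropic shift. In that situation the IVW estimate is biased, while the MR-Egger slope recovers the effect of 0.5 and the intercept picks up the shift.

```python
def ivw_estimate(bx, by, se_y):
    """Fixed-effect inverse-variance weighted (IVW) estimate: weighted regression
    of by on bx through the origin, with weights 1 / se_y**2."""
    w = [1 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den


def mr_egger(bx, by, se_y):
    """MR-Egger point estimates: weighted regression of by on bx WITH an intercept.

    Each variant is first oriented so that its exposure association is positive.
    The slope is the causal estimate; an intercept away from zero suggests
    directional (unbalanced) pleiotropy. Standard errors are omitted here."""
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    slope = sxy / sxx
    return my - slope * mx, slope  # (intercept, slope)


# Made-up summary statistics for five SNPs, generated as by = 0.02 + 0.5 * bx:
# a true causal effect of 0.5 plus a constant pleiotropic shift of 0.02.
bx = [0.1, 0.2, 0.3, 0.4, 0.5]
by = [0.07, 0.12, 0.17, 0.22, 0.27]
se_y = [0.01] * 5

print("IVW:", round(ivw_estimate(bx, by, se_y), 3))
intercept, slope = mr_egger(bx, by, se_y)
print("MR-Egger intercept:", round(intercept, 3), "slope:", round(slope, 3))
```

In a two-sample design, bx would come from an exposure GWAS and by and se_y from a separate outcome GWAS.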

Key Principles and Assumptions of Mendelian Randomization

MR can use single or multiple genetic variants as IVs. However, these analyses are most reliable when they are based on large sample sizes and the variant(s) explain a substantial proportion of the variance in the exposure.
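Instrument strength is often summarized with the first-stage F statistic, which can be approximated from the variance explained (R²), the sample size n, and the number of instruments k as F = (R²/k) / ((1 − R²)/(n − k − 1)). A common rule of thumb treats F < 10 as a weak instrument. The sketch below uses hypothetical values of R² and n (and an illustrative function name) to show why large samples matter: the same variant goes from weak to strong as n grows.

```python
def first_stage_f(r2: float, n: int, k: int) -> float:
    """Approximate first-stage F statistic for k instruments that explain a
    proportion r2 of the exposure variance in a sample of size n."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    if k < 1 or n <= k + 1:
        raise ValueError("need k >= 1 and n > k + 1")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical: one SNP explaining 1% of the variance in the exposure
for n in (500, 5_000, 50_000):
    f = first_stage_f(r2=0.01, n=n, k=1)
    print(n, round(f, 1), "weak" if f < 10 else "adequate")
```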

It is essential that the variant(s) influence the outcome only through this single risk factor. If they affect the outcome via other biological pathways (horizontal pleiotropy), they are not suitable as instrumental variables.

To use MR for causal inference, some key criteria must be met:

  • The genetic variant (G) must be strongly associated with the risk/exposure being studied.

  • The genetic variant must not share common causes/confounders (C, U) with the outcome (Y).

  • The genetic variant must impact the outcome (Y) solely via its effect on the exposure (X).

[Figure: Mendelian randomization diagram]

Directed acyclic graph for Mendelian randomization analysis. The genetic variant (G) is associated with the exposure of interest (X); there are no confounders (C, U) of the association between genetic variant (G) and outcome (Y); and the genetic variant (G) does not affect the outcome (Y) except through its effect on the exposure (X).
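To make the graph concrete, here is a minimal simulation sketch in Python (standard library only). It assumes linear effects, a single biallelic variant with allele frequency 0.3, and an unmeasured confounder U; all coefficients and the helper names `simulate_dag` and `cov` are illustrative, not taken from any MR package. Because U drives both X and Y, the observational regression of Y on X is pulled away from the true effect of 0.3, whereas the ratio of the G–Y to G–X covariances should land close to 0.3.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_dag(n=200_000, beta=0.3, seed=7):
    """Simulate G -> X -> Y with an unmeasured confounder U of X and Y.

    beta is the true causal effect of X on Y; every coefficient is hypothetical."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0/1/2
        ui = rng.gauss(0, 1)                               # confounder, independent of G
        xi = 0.5 * gi + ui + rng.gauss(0, 1)               # exposure
        yi = beta * xi + ui + rng.gauss(0, 1)              # outcome
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate_dag()
observational = cov(x, y) / cov(x, x)  # regression slope of Y on X, confounded by U
mr_estimate = cov(g, y) / cov(g, x)    # Wald ratio using G as the instrument
print(f"true effect 0.30 | observational {observational:.2f} | MR {mr_estimate:.2f}")
```

The MR estimate works here only because the simulation builds in all three assumptions: G affects X, G is independent of U, and G reaches Y only through X.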

Violating any of these assumptions can bias the results:

  • If the genetic variant is not robustly associated with the risk/exposure, it will not provide reliable statistical evidence of causation (weak instrument bias).

  • If the variant is associated with potential confounders, it violates the definition of an instrumental variable.

  • If the variant affects the outcome through pathways other than the risk/exposure, it violates the definition of an instrumental variable and should not be used.
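The last violation can be illustrated by adding a direct path from G to Y. In this hypothetical linear setup (effect sizes are invented and `wald_ratio_sim` is an illustrative name), the Wald ratio converges to beta + alpha/gamma rather than beta, so a direct effect alpha shifts the estimate even though the instrument is strong and unconfounded.

```python
import random


def wald_ratio_sim(alpha, n=200_000, beta=0.3, gamma=0.5, seed=11):
    """Simulated Wald ratio cov(G, Y) / cov(G, X) when G may also affect Y directly.

    alpha: direct G -> Y effect (horizontal pleiotropy); alpha = 0 means the
           exclusion restriction holds.
    gamma: G -> X effect; beta: true X -> Y effect. All values are hypothetical."""
    rng = random.Random(seed)
    sg = sx = sy = sgx = sgy = 0.0
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0/1/2
        u = rng.gauss(0, 1)                               # confounder of X and Y
        x = gamma * g + u + rng.gauss(0, 1)
        y = beta * x + alpha * g + u + rng.gauss(0, 1)    # alpha * g bypasses X
        sg += g
        sx += x
        sy += y
        sgx += g * x
        sgy += g * y
    # The (n - 1) denominators of the two covariances cancel in the ratio
    return (sgy - sg * sy / n) / (sgx - sg * sx / n)


for alpha in (0.0, 0.1):
    expected = 0.3 + alpha / 0.5  # beta + alpha / gamma
    print(f"alpha={alpha}: Wald ratio {wald_ratio_sim(alpha):.2f}, expected about {expected:.2f}")
```

Methods such as MR-Egger were developed to detect and partly correct this kind of bias when many variants are available.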



Benefits of Mendelian Randomization

MR offers several key benefits. It reduces the confounding and reverse causation that can bias observational studies, it supports causal inference, and it approximates some of the benefits of a randomized controlled trial at a fraction of the cost.

[Figure: Mendelian randomization vs. randomized controlled trial]

This overview compares Mendelian randomization (MR) with randomized controlled trials (RCTs). In MR, randomization comes from the random allocation of alleles at conception, whereas in RCTs, subjects are randomly assigned to a control (placebo) or treatment group.

This conceptualization was originally based on between-sibling variation, where the allocation of alleles is random and does not depend on population-level variation. Inference from MR in this way relies on the assumption of gene–environment equivalence: a change in the exposure caused by genetic variation has the same effect on the outcome as a change in that exposure caused by environmental factors.

Impacts of Mendelian Randomization

With consortia such as the UK Biobank, FinnGen, and the China Kadoorie Biobank amassing genetic and proteomic data, MR has become common in peer-reviewed literature across many areas of health and disease. Combining GWAS and proteomics has been described as transformative for human medicine. A recent Cell Genomics publication using MR identified novel druggable targets for asthma and uncovered a TLR1-IL27 asthma axis.

Many disease research areas have benefited from MR. NCI has released a proteogenomic data set of >1,000 tumors to facilitate new discoveries in cancer detection and treatment. In cardiovascular disease (CVD), a leading global cause of death, researchers investigated 79 loci for potential causality in CVD and identified causal mechanisms for more than 25 trans-acting loci. The SCALLOP Consortium data set has led to the identification of 283 cis- and trans-pQTLs (protein quantitative trait loci) putatively causative for behavioral traits and psychiatric disorders.

Mendelian randomization is a robust tool for inferring causation as long as its key assumptions are met. The availability of large genomic and proteomic data sets has allowed researchers to use MR to generate a large number of new discoveries that are revolutionizing human health and medicine.