GWAS Basics: Applications, Methods, and Future Uses

Written by Psomagen | Apr 9, 2024 9:30:51 PM

What Is Population Genomics? 

Population genomics is a large-scale approach to genomic research that studies whole populations of individuals rather than single genomes. In many cases, the goal is to create a genetic atlas for future research. Resources like the NHGRI-EBI GWAS Catalog and the Genome Aggregation Database (gnomAD) are examples of population genomics data.

Often, population genomics projects are conducted before the specific research they will support is known. Many individuals' genomes are sequenced to build a reference dataset for future studies. These data are important tools for genome-wide association studies (GWAS).

GWAS compare “healthy” control genomes to the genomes of individuals with a specific disease. This enables researchers to identify genetic variants associated with that disease. By linking genetic variations with observable traits, researchers can develop targeted treatments and better understand disease pathogenesis.
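
To make the case-control comparison concrete, here is a minimal Python sketch of one simple way to test a single variant: a chi-square test on allele counts. It is an illustration under stated assumptions, not the method of any particular study. The function names and all counts are hypothetical, and real GWAS pipelines typically use regression models that adjust for covariates such as ancestry.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is greater than zero.
    Returns (chi_square_statistic, p_value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [case_alt + case_ref, control_alt + control_ref]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)


# Hypothetical allele counts for one variant (invented for illustration).
chi2, p_value = allelic_chi_square(300, 700, 200, 800)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
print(f"odds ratio = {odds_ratio(300, 700, 200, 800):.2f}")
```

A large chi-square statistic (and a small p-value) suggests the allele is more common in one group than the other. Because a GWAS repeats a test like this across a very large number of variants, results need multiple-testing correction before any variant is called significant.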

What Can GWAS Detect?

By analyzing GWAS data, researchers can pinpoint typical signatures of their targeted disease that are not present in the “healthy” genome. These signatures can include:

  • Single Nucleotide Variants (SNVs). SNVs are changes to a single base pair. When an SNV is present in more than 1% of the population, it is called a single nucleotide polymorphism (SNP). SNVs are implicated in inflammatory diseases such as celiac disease, atherosclerosis, and ulcerative colitis.
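
The 1% cutoff can be sketched in a few lines of Python. This is a simplified illustration, not a standard tool: it assumes diploid genotypes coded as the number of alternate alleles each person carries (0, 1, or 2), and it measures "present in the population" as alternate allele frequency, which is one common convention. The function names and genotype data are hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Fraction of all sampled chromosomes that carry the alternate allele.

    genotypes: list of alternate-allele counts (0, 1, or 2), one per diploid person.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    return sum(genotypes) / (2 * len(genotypes))


def classify_variant(genotypes, threshold=0.01):
    """Label a variant 'SNP' when its frequency exceeds the threshold (1% here)."""
    freq = alt_allele_frequency(genotypes)
    if freq == 0:
        return "not observed"
    return "SNP" if freq > threshold else "rare SNV"


# Hypothetical cohorts (invented for illustration).
common = [0] * 95 + [1] * 5   # 5 alternate alleles among 200 chromosomes
rare = [0] * 999 + [1]        # 1 alternate allele among 2,000 chromosomes
print(classify_variant(common), classify_variant(rare))
```

In practice, a frequency estimated from a small cohort is noisy, which is one reason large reference datasets matter for separating common variants from rare ones.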


Uses for GWAS Data

Population genomics is an important tool in disease research. By finding genetic variants that appear in affected populations but not in healthy populations, researchers can identify biomarkers of disease and potential therapeutic targets. Linking variants to function is a critical goal for GWAS projects.

With large-scale sequencing technologies, it is becoming possible to identify the genetic markers of diseases with multiple causal factors. In Alzheimer’s disease, many genetic variants each contribute to a person’s likelihood of developing the disease. In 2019, there were 40 known susceptibility loci associated with Alzheimer’s. A 2020 study with a larger sample size identified seven novel loci and offered additional insights into how microglia and immune cells are implicated in disease development.

With larger reference datasets and studies with larger sample sizes, researchers are able to map the causes of complex diseases. 

Methods for Population Genomics

Many omics technologies have been applied to GWAS. As sequencing becomes more affordable and covers more of the central dogma, from DNA to RNA to protein, population-level studies become more accessible and valuable. Common technologies used in GWAS include:

  • Microarray Analysis. Microarray analysis is a multiplex genomics approach that targets and quantifies specific genes. By hybridizing sample nucleic acid to a chip, it produces genotyping data on a genome-wide scale. This is a cost-effective way to detect common disease-associated SNPs, and it has been in use for many years. In a 2003 study out of Virginia Commonwealth University, for example, microarrays were used to identify SNPs linked to loss of heterozygosity in prostate tumors.

  • Whole Genome Sequencing. WGS can detect both known and unknown SNPs, including common, low-frequency, and rare variants. WGS isn’t just for human health: GWAS projects on cattle, pigs, and even tilapia have provided insights on animal health, physical traits, and reproductive qualities.

  • Low-Pass Whole Genome Sequencing. Low-pass WGS explores the whole genome at a reduced depth and with simplified library preparation. This technique provides a high-level genome screen. It is useful for high-throughput population screenings and polygenic risk score calculations. 

  • Long-Read Sequencing. Long reads make it possible to map difficult genome regions, like repetitive regions too long for traditional short-read technologies to span. With PacBio long-read technology, it is now possible to produce individual DNA or RNA reads up to 20,000 bases long. As a result, researchers can reliably map large structural variants.

  • Transcriptome-Wide Association Studies (TWAS). Until recently, genome reference datasets were the main focus of population genomics. However, a new wealth of transcriptomic reference datasets (and even proteomic reference datasets) makes it possible to conduct population-level studies of the transcriptome or proteome. When these data are combined with GWAS data, researchers can strengthen the evidence for gene-disease links.
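
The polygenic risk score calculation mentioned under low-pass WGS can be sketched as a weighted sum: each variant's risk-allele dosage is multiplied by an effect size taken from a prior GWAS, and the products are added up. The Python below is a minimal sketch under that assumption. The function name, variant IDs, effect sizes, and dosages are all hypothetical, and it leaves out steps a real pipeline would need, such as variant selection and ancestry-aware calibration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of (effect size x risk-allele dosage) over the scored variants.

    dosages: variant ID -> risk-allele dosage for one person (0 to 2;
             may be fractional when genotypes are imputed).
    effect_sizes: variant ID -> per-allele effect size from a prior GWAS.
    """
    missing = set(effect_sizes) - set(dosages)
    if missing:
        raise KeyError(f"no dosage for variants: {sorted(missing)}")
    return sum(effect_sizes[v] * dosages[v] for v in effect_sizes)


# Hypothetical inputs (invented for illustration, not from any study).
effect_sizes = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(person, effect_sizes)
print(f"polygenic risk score = {score:.2f}")
```

A raw score like this only becomes meaningful when compared against the distribution of scores in a reference population, which is where population-level datasets come in.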

Diversity in Genomics

In 2019, nearly 80% of GWAS data came from participants of European ancestry. The vast majority of that 80% came from only three countries: the United States, the United Kingdom, and Iceland. This creates an unequal clinical treatment environment, where precision medicine based on GWAS data is more accurate for people of European ancestry than it is for others.

The disparity becomes clear when a diverse set of genomes is used for GWAS projects. A 2021 GWAS lipid study used sequencing data from roughly 1.65 million individuals. This included 350,000 individuals of non-European ancestry, roughly a fifth of the cohort.

This study made a case for increased diversity in population genomics. The group’s results uncovered significant loci not identified in European subjects. These included 15 loci unique to admixed African or African participants, 6 to East Asian participants, 6 to Hispanic participants, and 1 to South Asian participants.

The limited availability of genetic data in some populations has a negative impact on personalized medicine and the development of genetic tests. Efforts like the GenomeAsia 100K Project and Dr. Taras Oleksyk’s work in Ukraine (profiled in a recent blog post) are key examples of studies working to close this gap in population genomics.

GWAS is an important tool for the future of biomedical research and personalized medicine. Unraveling genetic associations with diseases and phenotypes provides valuable information for drug development and diagnostics. With improved technology and growing datasets, population-level data will become more accessible and valuable in many disciplines of research.