Advantages over Short-Read Sequencing in Biomedical Research
Long-read sequencing has enabled in-depth exploration of the human genome, with wide implications for the future of health and disease. In this blog post, we discuss the advantages of long-read sequencing over short-read sequencing in biomedical research. With a more comprehensive view of the genome, long-read sequencing is making major strides in human biology.
Long-Read vs. Short-Read Sequencing
Short-read sequencing typically breaks the genome into small fragments before sequencing. These DNA fragments are at most a few hundred bases long. The resulting short reads are then aligned to a reference genome, or assembled, to determine where each one belongs in the larger genetic sequence.
By contrast, long-read sequencing technology can read tens of thousands of bases, and in some cases hundreds of thousands, from a single DNA molecule. Long reads have traditionally been less accurate than short reads. However, newer technologies are making long reads a more accurate and affordable option for researchers.
Resolving Complex Genomic Regions
Short-read sequencing struggles with accurately mapping and assembling repetitive and structurally complex genomic regions due to the limited read length. In contrast, long-read sequencing offers extended read lengths. This enables researchers to resolve these challenging regions with higher accuracy.
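The effect of read length on mapping can be illustrated with a toy calculation. The sketch below is not a real aligner: it assumes error-free reads and treats a read as unambiguously placeable only if its exact sequence occurs once in a made-up genome string. The names `unique_fraction` and `toy_genome` are hypothetical and exist only for this illustration.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of error-free reads of length read_len that occur exactly once in genome."""
    if read_len <= 0 or read_len > len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    # Every possible read of this length, taken as a sliding window over the genome.
    windows = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(windows)
    unique = sum(1 for w in windows if counts[w] == 1)
    return unique / len(windows)


# Made-up sequence (an assumption): unique flanks around a tandem repeat of ten CAG units.
toy_genome = "ACGTTGCA" + "CAG" * 10 + "TTGACCGA"

for read_len in (4, 12, 40):
    print(read_len, round(unique_fraction(toy_genome, read_len), 2))
```

In this toy case, reads that fall entirely inside the repeat tract match several positions, while reads long enough to reach the unique flanking sequence on either side can be placed in only one position.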
This advantage is critical in studying disease-associated regions rich in repeats, such as tandem repeats and segmental duplications. Accuracy in these regions provides a better understanding of their functional implications in health and disease.
Characterizing Structural Variations
Structural variations (SVs), including insertions, deletions, inversions, and translocations, are essential contributors to genetic diversity and disease. While short-read sequencing provides high coverage and accuracy across regions of low complexity, it often fails to accurately capture and characterize large SVs.
Long-read sequencing, with its ability to generate much longer reads, offers a superior approach for detecting and characterizing SVs. This enables a comprehensive analysis of the genomic structural landscape. Researchers can uncover novel disease-associated SVs and understand their impact on phenotype and disease progression.
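One way a long read can reveal an insertion or deletion is through its alignment: a large gap shows up directly in the CIGAR string of a SAM/BAM record. The following minimal sketch scans a CIGAR string for large insertions and deletions. The function name `large_indels`, the example CIGAR string, and the 50-base cutoff (a commonly used convention for calling a variant "structural") are assumptions for illustration. Dedicated SV callers use considerably more evidence, such as split alignments and support from multiple reads.

```python
import re

# SAM CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def large_indels(cigar: str, ref_start: int, min_size: int = 50):
    """Return (kind, reference_position, length) for each I/D operation of at least min_size bases."""
    events = []
    ref_pos = ref_start
    for length_text, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length_text)
        if op == "I" and length >= min_size:
            events.append(("INS", ref_pos, length))
        elif op == "D" and length >= min_size:
            events.append(("DEL", ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return events


# Hypothetical alignment of one long read, starting at reference position 0.
print(large_indels("1000M120D500M5D20M60I300M", ref_start=0))
```

The 5-base deletion in the example is skipped because it falls below the size cutoff, while the 120-base deletion and the 60-base insertion are reported with their reference positions.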
Using PacBio long-read technology, Australian researchers successfully allele-typed the CYP2D6 gene, which encodes an enzyme responsible for metabolizing many pharmaceuticals. Before long-read sequencing, the many kinds of variation in this gene made it difficult to type correctly. Long-read sequencing, however, successfully navigated the CNVs, SNPs, InDels, SVs, and gene fusions involved.
This study sorted individuals into four categories based on their CYP2D6 alleles and their resulting ability or inability to metabolize drugs. A lack of screening before opiate use has even led to multiple deaths associated with the "hypermetabolizer" phenotype. Screening patients before prescribing medications can make drugs safer and more effective.
Unraveling Transcriptomic Complexity
Short-read sequencing cannot always characterize transcriptomic complexity, including alternative splicing events, isoform diversity, and fusion transcripts.
Long-read sequencing provides a more complete picture of the transcriptome by capturing full-length transcripts. This allows for the identification of alternative splicing events and isoform variations. This comprehensive analysis helps in understanding gene regulation, deciphering disease mechanisms, and identifying potential therapeutic targets.
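Because a full-length read covers every exon of a transcript, isoforms can be told apart by the chain of introns each read skips. The sketch below groups reads by that intron chain. It assumes each read has already been aligned and reduced to a list of (start, end) exon blocks; the coordinates and the names `intron_chain` and `count_isoforms` are hypothetical. Real isoform tools also handle alignment noise near splice sites, which this example ignores.

```python
from collections import Counter


def intron_chain(exons):
    """Return the introns of a transcript as (exon_end, next_exon_start) pairs."""
    ordered = sorted(exons)
    return tuple((ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1))


def count_isoforms(reads):
    """Count reads per distinct intron chain."""
    return Counter(intron_chain(exons) for exons in reads)


# Hypothetical full-length reads from one gene, as lists of (start, end) exon blocks.
reads = [
    [(100, 200), (300, 400), (500, 600)],  # all three exons
    [(105, 200), (300, 400), (500, 590)],  # same splicing, slightly different read ends
    [(100, 200), (500, 600)],              # middle exon skipped
]

for chain, n in count_isoforms(reads).items():
    print(n, chain)
```

Grouping by intron chain rather than by exact read coordinates means the first two reads count as the same isoform even though their ends differ, while the exon-skipping read forms its own group.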
Decoding Repetitive Elements
Repetitive elements make up a significant portion of the human genome. These regions play crucial roles in gene regulation, genome stability, and disease development. Short-read sequencing struggles to accurately map reads to repetitive regions. This leads to fragmented genome assemblies and an incomplete understanding of their functional relevance.
Long-read sequencing can span repetitive elements. Researchers can use long reads to obtain more complete and accurate reconstructions of repetitive regions. This enables a more comprehensive analysis of repetitive elements and sheds light on their role in genome evolution, disease etiology, and phenotypic variation.
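When a single read covers a repeat together with the unique sequence on both sides, the repeat can be sized directly from that read. The sketch below counts tandem repeat units between two flanking sequences. It assumes an error-free read and exact flank matches, and the flank sequences, repeat unit, and function name `count_repeat_units` are all made up for illustration.

```python
def count_repeat_units(read, left_flank, right_flank, unit):
    """Count copies of unit between the two flanks, or return None if the read does not span them."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    tract = read[start:end]
    copies, remainder = divmod(len(tract), len(unit))
    # Only accept a tract made of whole, uninterrupted copies of the unit.
    if remainder != 0 or tract != unit * copies:
        return None
    return copies


# Hypothetical sequences: a long read spanning the repeat, and a short read inside it.
long_read = "GGATCC" + "CAG" * 25 + "TTAGGC"
short_read = "CAG" * 10

print(count_repeat_units(long_read, "GGATCC", "TTAGGC", "CAG"))
print(count_repeat_units(short_read, "GGATCC", "TTAGGC", "CAG"))
```

The short read lies entirely within the repeat, so it carries no flank to anchor it and the function returns None, whereas the spanning read yields a direct count.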
In 2018, researchers used nanopore technology to develop a protocol for generating ultra-long reads. Using this protocol, they produced a genome assembly for the GM12878 cell line and detected large structural variants and epigenetic modifications.
The long-read data alone covered over 85% of the genome, before any short-read data were added. Data collected from this study also closed gaps in the existing reference genome. These results suggest that Oxford Nanopore or similar portable technologies could be useful in future point-of-care settings.
Studying Epigenetic Modifications
Epigenetic modifications, such as DNA methylation and histone modifications, are key regulators of gene expression. They play a vital role in development, disease, and cellular processes. Short-read sequencing provides insights into epigenetic modifications at specific loci, but it lacks the ability to read modified bases directly.
Long-read sequencing offers an advantage by enabling the direct detection and mapping of epigenetic modifications. This allows for a better understanding of the regulatory mechanisms underlying disease processes and the identification of potential epigenetic biomarkers.
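Once modified bases have been called on individual reads, a common summary is the fraction of reads showing methylation at each site. The sketch below aggregates such calls. The input format, a list of (position, is_methylated) pairs, is a simplification assumed for this example and not the output format of any particular tool; `methylation_fraction` and the sample positions are likewise hypothetical.

```python
from collections import defaultdict


def methylation_fraction(calls, min_coverage=1):
    """Aggregate per-read calls into {position: fraction of reads called methylated}."""
    totals = defaultdict(lambda: [0, 0])  # position -> [methylated reads, total reads]
    for position, is_methylated in calls:
        totals[position][1] += 1
        if is_methylated:
            totals[position][0] += 1
    # Sites with fewer than min_coverage reads are left out of the result.
    return {pos: m / n for pos, (m, n) in sorted(totals.items()) if n >= min_coverage}


# Hypothetical calls: (CpG position, methylated?) pairs pooled from several reads.
calls = [(1050, True), (1050, True), (1050, False), (1050, True), (2210, False), (2210, False)]
print(methylation_fraction(calls))
```

Raising `min_coverage` drops sites supported by too few reads, which is one simple way to avoid reporting a fraction based on very little evidence.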
Long-read sequencing is revolutionizing biomedical research. It has already made major advancements in resolving complex genomic regions, characterizing structural variations, deciphering repetitive elements, unraveling transcriptomic diversity, and studying epigenetic modifications.
This technology has significantly advanced our understanding of human biology, disease mechanisms, and precision medicine. As long-read sequencing technologies continue to evolve and become more accessible, their impact on biomedical research is poised to grow. These advancements will pave the way for groundbreaking discoveries and improved patient care.
Psomagen has recently added long-read sequencing platforms to its labs. The PacBio Revio offers enhanced resolution and accuracy for long-read projects.