1. Introduction
An individual’s DNA is the blueprint of that individual’s traits. Although the Human Genome Project was completed in 2003 [1], multiple genomic projects to determine the genetic makeup of other organisms are still ongoing [1]. Nucleotide sequencing holds the key to completely identifying the order of nucleotides in any organism’s genome [2]. Determining the complete genetic makeup of an organism helps identify single-nucleotide polymorphisms (SNPs) and other genetic variations that can lead to various diseases, including cardiovascular, neurological, infectious, and autoimmune disorders as well as cancers [3].
Mutations within both coding and non-coding regions can result in abnormal protein production, which can cause multiple pathologies [4]. Throughout history, mutations have also been a driving force of evolution. Mutations also give rise to pathogenic species and strains [5] and should be studied extensively for the betterment of humankind and the environment [6]. Comparing the genetic makeup of a healthy individual with that of a diseased one aids in identifying various ailments, including genetic disorders [7]; such comparisons can also yield diagnostic markers that help determine the prognosis of a disease [8] and can distinguish between numerous pathogenic microorganisms [9]. Knowledge of an individual’s nucleotide sequence also enables effective, enhanced personalized medical care [10]. Sanger sequencing was previously the most popular method for decoding a given DNA sequence [11]. Together with the Maxam-Gilbert method [11], it formed the basis of first-generation sequencing [11]. However, these traditional methods were complicated and cumbersome for whole genome sequencing (WGS) of any organism [12]. Furthermore, low throughput, large starting-sample requirements, long analysis times, and an inability to detect genetic variability across parallel samples make Sanger sequencing expensive and unreliable for large-scale sequencing studies [13]. With the rise in pandemics, rare genetic disorders, climate change, and environmental pollution, it has become necessary to invest in novel nucleic acid sequencing methods that offer better accuracy and affordability. Because traditional methods cannot efficiently handle large-scale genome projects, owing to limited scalability, low throughput, and long turnaround times, developing new sequencing technologies is critical for advancing genomic research and its applications. High throughput sequencing (HTS), with its massively parallel design, speed, and accuracy, has revolutionized whole genome sequencing of any organism [14]. Because its applications also extend to clinical and translational research, agriculture, and environmental health, HTS is now becoming a benchmark in the new era of genomics and molecular biology research [15].
High throughput sequencing encompasses next-generation sequencing methods, driven by a high demand for low-cost sequencing [16]. It achieves high throughput through massively parallel sequencing, processing millions of sequences simultaneously, and provides low-cost, effective sequencing of DNA and its modifications. Many such techniques have been developed, and they have transformed the field of genome sequencing. The ultimate goals of genome sequencing are to enable preventive and personalized medicine for curing diseases [17], to identify and differentiate pathogenic species, strains, and superbugs, to support the diagnosis of communicable and non-communicable diseases, and to help create a viable environment and future [18]. This review details the advent of high-throughput sequencing, how it functions, its limitations, and how its future applications in multiple fields could transform our present into a better future, with the promise of superior healthcare, healthier lifestyles, and a sustainable environment.
3. Applications
In the past few years, next generation sequencing (NGS) has changed the paradigm of genome sequencing. It has become a critical tool in medical research, diagnostics, therapeutics, and environmental and evolutionary studies [65]. Researchers worldwide have used it to address a wide variety of biological problems. NGS was first employed to detect single nucleotide variations (SNVs), insertions and deletions (indels), and structural variations in DNA sequences [65]. With advances in technology, NGS has come a long way and is now widely used in many different fields.
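As a rough illustration of the SNV detection mentioned above, the sketch below piles up bases from reads and reports positions where a non-reference base dominates. It assumes reads are already aligned to the reference without gaps; the function name `call_snvs` and its thresholds are hypothetical, and production variant callers also model indels, base qualities, and mapping errors, which this sketch ignores.

```python
from collections import Counter


def call_snvs(reference, reads, min_depth=3, min_fraction=0.5):
    """Report SNVs from reads given as (start, sequence) pairs.

    Hypothetical helper: reads are assumed to be aligned to `reference`
    without gaps, so indels and base qualities are ignored.
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to make a call
        # Most frequent non-reference base at this position, if any.
        alt, alt_count = max(
            ((b, c) for b, c in counts.items() if b != reference[pos]),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if alt is not None and alt_count / depth >= min_fraction:
            calls.append((pos, reference[pos], alt, alt_count / depth))
    return calls


reads = [(0, "ACGTA"), (0, "ACCTA"), (1, "CCTAC"), (2, "CTACG")]
print(call_snvs("ACGTACGT", reads))  # [(2, 'G', 'C', 0.75)]
```

Three of the four reads covering position 2 carry C instead of the reference G, so that position is reported with an allele fraction of 0.75.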
In many complex medical conditions, gene therapies are opening new avenues for treatment. Severe combined immunodeficiency (SCID) was the very first hereditary disorder for which gene therapy was found to be effective [66]. Because SCID is caused by mutations in specific genes, characterizing these mutations and their downstream pathology through NGS has supported the development of effective gene therapies for SCID [67, 68].
Whole genome sequencing (WGS) is one of the most comprehensive ways to examine the entire genome of an organism. Genetic information obtained via WGS has made it possible to monitor outbreaks of infectious diseases, recognize genetic diseases, and describe the mutations that lead to cancer [69]. The Advancing Real-Time Infection Control Network (ARTIC), an international collaboration involving organizations such as the Wellcome Sanger Institute and researchers around the world, developed a WGS approach for SARS-CoV-2 using Oxford Nanopore sequencing technology [70]. Viral genome sequencing also sparked immunoinformatics-based vaccine research against SARS-CoV-2, which contributed to the development of COVID-19 vaccines including mRNA, viral vector, and protein subunit vaccines [71].
Similarly, the 1000 Bull Genomes Project was initiated to analyse the genomes of various cattle breeds worldwide in order to enhance productivity and control animal disease. Through HTS, this project has identified not only genetic variants that can improve milk production and disease resistance through selective breeding but also mutations that could be lethal to livestock [72]. Building on this success, the investigators plan to extend the same analysis to other farm animals to enhance farmers’ livelihoods [73].
The Centers for Disease Control and Prevention (CDC) in the United States incorporates pathogen genomics into all of its infectious disease programs [74]. Pathogen genomics entails sequencing the DNA or RNA of pathogens to better understand their genetics. This is critical for understanding pathogen evolution, detecting epidemics, comprehending transmission patterns, and designing targeted therapies and vaccines. It represents a major move towards a more accurate and successful approach to public health management and disease prevention, one that requires high throughput sequencing techniques [75].
The introduction of next-generation sequencing techniques has made metagenomic sequencing a feasible method for clinical evaluations [76]. Metagenomic high throughput sequencing (mHTS) detects and identifies various pathogenic and non-pathogenic microorganisms of animal and plant origin [77]. In leaf samples from grapevine plants, high-throughput sequencing identified seven viruses and two viroids [78]. Identifying grapevine pathogens and their interactions with the host plant supports the development of innovative control strategies and improved practices over time [79]. This can increase fruit yield and aid selective breeding of disease-resistant crops. Gene sequencing has also demonstrated the ability of cold-resistant bacteria to break down petroleum products in the Arctic [80], offering significant new details about how these bacteria adapt and degrade hydrocarbons [81]. This discovery is highly relevant to addressing environmental problems arising from petroleum spills, which are a major concern for marine flora and fauna [82].
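The core idea behind k-mer-based metagenomic read classification can be sketched briefly: each reference genome contributes its k-mers to an index, and a read is assigned to the taxon that shares the most taxon-specific k-mers with it. This is a simplified sketch assuming tiny, hypothetical reference sequences (`virus_A`, `virus_B`); real classifiers rely on large curated databases and handle k-mers shared between taxa more carefully than simply ignoring them.

```python
from collections import Counter


def build_kmer_index(references, k):
    """Map every k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(taxon)
    return index


def classify_read(read, index, k):
    """Assign a read to the taxon with the most taxon-specific k-mer hits."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k], set())
        if len(taxa) == 1:  # skip k-mers shared by several taxa
            votes[next(iter(taxa))] += 1
    return votes.most_common(1)[0][0] if votes else "unclassified"


# Hypothetical toy references standing in for real genome databases.
refs = {"virus_A": "ATGCGTACGTTAGC", "virus_B": "TTGACCGGTAACGT"}
idx = build_kmer_index(refs, k=4)
for read in ["GCGTACGT", "CCGGTAAC", "NNNNNNNN"]:
    print(read, classify_read(read, idx, k=4))
```

Here the k-mer `ACGT` occurs in both references, so it casts no vote; the remaining k-mers are enough to separate the two hypothetical viruses, and a read with no matching k-mers stays unclassified.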
The Cancer Genome Atlas (TCGA), a publicly funded initiative, and the Genome India Project (GIP) use next-generation sequencing and microarray-based high-throughput methods to study genetic abnormalities in various types of cancer [83]. TCGA has discovered many cancer driver genes that promote tumour development and cancer progression [84]. This groundbreaking cancer genomics program has molecularly characterized more than 20,000 primary cancers and matched normal samples spanning 33 cancer types [85]. The Genome India Project also seeks to enable personalized medicine by using patients’ genomes to predict and manage illnesses. For instance, differences between the genomes of South Asians and Africans could account for the observation that heart disease is more prevalent among South Asians, whereas African populations are more often affected by strokes [86].
The high throughput, accuracy, and short turnaround time of NGS have made this technology well suited to advancing precision medicine. Precision medicine uses knowledge of an individual’s genomic details to tailor treatment to that individual’s needs. High-throughput sequencing creates datasets that support drug development and patient stratification for better treatment [87]. NGS has made oncological care more precise and personalized by using genetic data to customize treatment plans based on the unique features of each patient’s cancer. NGS is expected to become increasingly important in the era of precision medicine as the technology develops and sequencing prices fall [88]. Trastuzumab, bevacizumab, cetuximab, and panitumumab are among the well-known precision medications for metastatic colorectal cancer whose development and use have been aided by NGS [89].
NGS has been of tremendous help in understanding our environment. The bacterium Deinococcus radiodurans was among the early free-living organisms to undergo whole genome sequencing [90]. Whole genome sequencing revealed how this bacterium resists DNA damage from ionizing radiation, ultraviolet radiation, oxidizing agents, and electrophilic mutagens [91]. It was found to harbour two copies of DNA repair enzymes, which enable it to survive hostile environments rich in ionizing radiation [92]. This finding has led researchers to consider using Deinococcus radiodurans to treat wastewater contaminated with radioactive uranium, which could help address a major environmental concern arising from nuclear waste [92].
Healthcare-associated infections are continuously rising. Overuse of antibiotics and their release into the environment have led to the rise of many multidrug-resistant strains, also known as “superbugs”. These pathogens not only pose formidable health challenges to humans but can also cause life-threatening conditions in animals and livestock. Whole genome sequencing has now revealed genetic variations that can transform a bacterium into a superbug. Examples of such superbugs include S. aureus [93], V. cholerae, C. difficile, K. pneumoniae, P. aeruginosa, and P. mirabilis [94, 95]. NGS is now being used extensively to identify other superbugs so that proactive measures can be taken to prevent future outbreaks [94].
Rather than examining variations in a limited number of genes, whole exome sequencing (WES) enables the detection of variations across the protein-coding regions of all genes. Because most known disease-causing mutations occur in exons, WES through NGS is a promising method for identifying potential disease-causing mutations. Exome sequencing has identified rare variants in the KRT82 gene in cases of the autoimmune disease alopecia areata [96]. WES was also used to identify variants in the SLC34A1 gene in Chinese Han children with urolithiasis, a condition that can cause renal dysfunction [97]. This approach has enabled early detection of the disease in various cohorts, which can ultimately support timely interventions [98]. Next-generation sequencing techniques also help investigate the structures of different RNAs, providing insights into the structure-function relationships of RNAs in various disorders [99, 100]. NGS coupled with mutational profiling has been used to determine the structures of telomerase RNA from T. brucei, the parasite that causes trypanosomiasis [101], SARS-CoV-2 genomic RNA [102, 103], Dengue virus [104], Chikungunya virus [105], the Plasmodium falciparum parasite [106], rotavirus [107], and various disease-associated long non-coding RNAs and miRNAs such as miR-675 [108], MALAT1 [109], and HOTAIR [110].
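Because WES targets exons, a typical downstream step restricts variant calls to exonic intervals. The sketch below assumes hypothetical, non-overlapping exon coordinates per chromosome in a 0-based, half-open convention, and uses binary search (Python's standard `bisect` module) to test whether each variant position falls inside an exon. The function names are illustrative, not part of any published pipeline.

```python
import bisect


def build_exon_index(exons):
    """Sort (start, end) exons; assumes 0-based, half-open, non-overlapping."""
    exons = sorted(exons)
    return [start for start, _ in exons], exons


def in_exon(position, index):
    """Binary-search for the last exon starting at or before `position`."""
    starts, exons = index
    i = bisect.bisect_right(starts, position) - 1
    return i >= 0 and exons[i][0] <= position < exons[i][1]


def filter_exonic(variants, exons_by_chrom):
    """Keep (chrom, pos, ref, alt) variants that fall inside an exon."""
    indexes = {chrom: build_exon_index(e) for chrom, e in exons_by_chrom.items()}
    return [
        v for v in variants
        if v[0] in indexes and in_exon(v[1], indexes[v[0]])
    ]


# Hypothetical exon coordinates and variant calls.
exons = {"chr1": [(500, 650), (100, 200)]}
variants = [
    ("chr1", 150, "A", "G"),  # inside the first exon
    ("chr1", 300, "C", "T"),  # intronic
    ("chr1", 649, "G", "A"),  # last base of the second exon
    ("chr1", 650, "T", "C"),  # just past the half-open end
    ("chr2", 120, "A", "T"),  # no exons listed for chr2
]
print(filter_exonic(variants, exons))
```

Only the variants at positions 150 and 649 on chr1 are kept; the half-open convention excludes position 650, and chromosomes with no listed exons are dropped.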
Cell identity, function, and behaviour are significantly influenced by various epigenetic regulators, which can regulate cellular processes and control gene expression without altering an organism’s DNA sequence [111]. Nucleotide modifications are among the epigenetic signatures present throughout the genome [112]. They not only regulate cellular processes over time but can also mask an invading pathogen from the host immune response, making the pathogen stealthier [112]. Identifying these epigenetic signatures will be pivotal in determining the underlying processes that drive virulence, pathogenicity, and disease development [113]. Targeted insertion of promoters sequencing (TIP-seq) [114] and chromatin immunoprecipitation (ChIP) combined with serial analysis of gene expression (ChIP-SAGE) [115], with paired-end ditag sequencing (ChIP-PET) [116], or with next-generation sequencing (ChIP-seq) [117] are among the leading high-throughput tools for identifying epigenetic signatures on DNA [118]. The assay for transposase-accessible chromatin coupled with high-throughput sequencing (ATAC-seq) is a popular way to identify accessible chromatin regions involved in gene expression [119]. Recently, Oxford Nanopore sequencing has been used to identify modifications present on RNA [120]. These modifications act as epigenetic regulators that control gene expression in many organisms, including humans [121], bacteria [122], and viruses [122]. In fact, viral RNA modifications mimic those of host RNA, masking viral RNA from the host immune response [123].
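In ATAC-seq, the Tn5 transposase inserts sequencing adapters into accessible DNA, so read ends mark insertion sites. A commonly used convention shifts plus-strand read starts by +4 bp and minus-strand read starts by -5 bp to centre them on the insertion event. The sketch below applies that shift and counts insertions in fixed-size bins; the input format (5' end position plus strand), bin size, and cutoff are assumptions, and this is not a substitute for a statistical peak caller.

```python
from collections import Counter


def tn5_insertion_sites(reads):
    """Shift (five_prime_position, strand) read ends to Tn5 insertion sites.

    Applies the commonly used +4 bp (plus strand) / -5 bp (minus strand) offset.
    """
    return [pos + 4 if strand == "+" else pos - 5 for pos, strand in reads]


def accessible_bins(reads, bin_size=100, min_insertions=3):
    """Return start coordinates of bins with at least `min_insertions` sites."""
    counts = Counter(site // bin_size for site in tn5_insertion_sites(reads))
    return sorted(b * bin_size for b, n in counts.items() if n >= min_insertions)


# Hypothetical read ends: four cluster near position 1,000, one is isolated.
reads = [(1000, "+"), (1010, "+"), (1050, "-"), (1090, "-"), (5000, "+")]
print(tn5_insertion_sites(reads))  # [1004, 1014, 1045, 1085, 5004]
print(accessible_bins(reads))      # [1000]
```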
Single-cell sequencing is among the most advanced methods for deciphering the variability and complexity of DNA and RNA transcripts within individual cells [124]. It is used to acquire information about gene expression or mutations spatially and temporally at single-cell resolution. In diseases such as cancer, the DNA of each cancerous cell can be sequenced to reveal mutations or genetic variations that alter its normal physiology. These techniques have made it possible to characterize transcriptomes at single-cell resolution across millions of individual cells [125].
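Many droplet-based single-cell protocols tag each captured transcript with a cell barcode and a unique molecular identifier (UMI), so reads that share both tags and map to the same gene are treated as PCR copies of one molecule. A minimal counting sketch, assuming reads have already been reduced to hypothetical `(cell_barcode, umi, gene)` records and ignoring the UMI sequencing-error correction that real tools perform:

```python
from collections import defaultdict


def count_umis(records):
    """Build a cell -> gene -> molecule count table.

    Each record is (cell_barcode, umi, gene). Reads sharing all three values
    are treated as PCR duplicates of one molecule and counted once.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for cell, _umi, gene in set(records):
        counts[cell][gene] += 1
    return {cell: dict(genes) for cell, genes in counts.items()}


# Hypothetical barcodes and UMIs; the first two records are PCR duplicates.
records = [
    ("AAACCTG", "GTTACA", "GAPDH"),
    ("AAACCTG", "GTTACA", "GAPDH"),
    ("AAACCTG", "CCGATA", "GAPDH"),
    ("AAACCTG", "TTAGGC", "ACTB"),
    ("TTTGGAC", "GTTACA", "GAPDH"),
]
print(count_umis(records))
```

The duplicated record collapses to a single molecule, giving two GAPDH molecules and one ACTB molecule for the first cell. The same UMI seen in a different cell is counted separately.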
NGS is also employed for cell-free DNA (cfDNA) sequencing, which involves laboratory analysis of free, non-cellular DNA from biological samples such as plasma, urine, and cerebrospinal fluid (CSF) [126]. The main goal of this technique is to search biological fluids for genomic variants and biomarkers linked to various genetic medical conditions. With the help of high-throughput sequencing, cfDNA sequencing has been used to diagnose tumours as well as neurological and cardiovascular disorders [127]. It is becoming a crucial tool for providing efficient, targeted therapies, recognizing particular genetic modifications, determining prognosis, and tracking treatment efficacy over time [128].
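A key quantity when searching cfDNA for variants is the variant allele fraction (VAF), the proportion of reads at a site that carry the variant allele; variants derived from a small tumour fraction in plasma can therefore appear at low VAF. The sketch below computes VAF and flags sites that pass two thresholds. The threshold values and site names are hypothetical and chosen only for illustration, not recommended clinical cutoffs.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def flag_variants(sites, min_alt_reads=3, min_vaf=0.005):
    """Return (name, VAF) for sites passing both hypothetical thresholds.

    `sites` holds (name, alt_reads, total_reads) tuples.
    """
    flagged = []
    for name, alt, depth in sites:
        vaf = variant_allele_fraction(alt, depth)
        if alt >= min_alt_reads and vaf >= min_vaf:
            flagged.append((name, round(vaf, 4)))
    return flagged


# Hypothetical deep-sequencing counts from a plasma sample.
sites = [("site_A", 12, 2000), ("site_B", 2, 3000), ("site_C", 5, 10000)]
print(flag_variants(sites))  # [('site_A', 0.006)]
```

Requiring both a minimum number of supporting reads and a minimum fraction guards against calling a variant from one or two stray reads at very deep coverage.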
Figure 1.
Various high throughput sequencing methods. High throughput sequencing is mainly divided into second, third and fourth generation methods. Illumina, Roche 454 and SOLiD sequencing are second generation sequencing methods. Third generation sequencing methods include PacBio, Helicos sequencing and nanopore-based sequencing techniques. Massive Parallel Spatially Resolved Sequencing and Single Cell In Situ Transcriptomics are fourth generation high throughput sequencing techniques.
Figure 2.
Steps involved in Illumina sequencing. Library generation involves fragmentation of genomic DNA and attachment of adaptors to both ends of the fragments. The library fragments bind to the flow cell and undergo bridge amplification to generate clusters. Sequencing is initiated by the addition of fluorescently labeled nucleotides. The resulting reads are demultiplexed and aligned to a reference sequence.
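The demultiplexing step in the Figure 2 caption can be sketched as matching each read's index (barcode) sequence against the known sample indexes while tolerating a small number of mismatches. The sample names, index sequences, and one-mismatch tolerance below are assumptions for illustration; reads that match no sample, or more than one, are set aside as undetermined.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Sort (index_read, sequence) pairs into per-sample bins.

    Reads whose index matches no sample, or more than one, are kept aside
    as "undetermined".
    """
    bins = {name: [] for name in sample_indexes}
    bins["undetermined"] = []
    for index_read, seq in reads:
        hits = [
            name for name, idx in sample_indexes.items()
            if len(idx) == len(index_read)
            and hamming(idx, index_read) <= max_mismatches
        ]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(seq)
    return bins


# Hypothetical 6-base sample indexes and reads.
samples = {"sample1": "ACGTAC", "sample2": "TGCATG"}
reads = [
    ("ACGTAC", "READ1"),  # exact match to sample1
    ("ACGTAA", "READ2"),  # one mismatch, still sample1
    ("TGCATG", "READ3"),
    ("GGGGGG", "READ4"),  # matches neither index
]
print(demultiplex(reads, samples))
```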
Figure 3.
Steps involved in Roche 454 sequencing. Different adapters are attached to the DNA fragments, one of which (adapter B) is biotinylated. The library is then immobilized onto streptavidin-coated beads. The immobilized libraries undergo emulsion PCR to produce multiple DNA copies, after which the beads are loaded into a PicoTiterPlate for sequencing.
Figure 4.
Steps involved in SOLiD sequencing. Step 1: Libraries are amplified on beads via emulsion PCR, and the beads are immobilized on a glass slide. Step 2: Fluorescently labeled 8-mer oligonucleotide probes containing a phosphorothioate linkage between the 5th and 6th nucleotides are added to initiate sequencing. Step 3: When the first two nucleotides of a probe pair with the template strand, the probe is ligated and its fluorescence is recorded; the phosphorothioate linkage is then cleaved, removing the dye before the next ligation cycle.
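Because each SOLiD probe interrogates two adjacent bases, the platform reports every dinucleotide as one of four colors (two-base encoding). With the assumed codes A=0, C=1, G=2, T=3, the color of a dinucleotide equals the bitwise XOR of the two codes, so a read can be decoded once its first base is known. A minimal sketch of encoding and decoding:

```python
# Two-bit codes for each base; a dinucleotide's color is the XOR of its codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq):
    """Translate a base sequence into one color (0-3) per adjacent base pair."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild bases from a known first base and its color sequence."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGCA")
print(colors)                      # [3, 1, 0, 3, 1]
print(decode_colors("A", colors))  # ATGGCA
```

Since each base is derived from the previous one, a single miscalled color changes every downstream base when the read is decoded naively: `decode_colors("A", [3, 2, 0, 3, 1])` yields `"ATCCGT"` instead of `"ATGGCA"`.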
Figure 5.
Steps involved in Ion Torrent sequencing. 1. DNA to be sequenced is obtained. 2. Libraries are prepared by attaching adapters to both ends of the DNA fragments. 3. The prepared genomic libraries are bound to beads. 4. The library-loaded beads are added to an ion-sensitive chip to initiate sequencing. 5. During nucleotide incorporation, the release of H+ ions causes pH changes. 6. Based on the pH signals, specialized software produces a sequence of base calls.
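The final base-calling step of Ion Torrent sequencing can be pictured as reading a series of nucleotide flows: the chip is exposed to one nucleotide at a time in a fixed, repeating order, and the pH-derived signal in each flow is roughly proportional to the number of bases incorporated. The sketch assumes an already normalized signal and a hypothetical flow order of `TACG`; rounding the signal estimates homopolymer length, which is also where this chemistry tends to make errors.

```python
def call_bases(flow_signals, flow_order="TACG"):
    """Convert normalized per-flow signals into a base sequence.

    Flow i delivers nucleotide flow_order[i % len(flow_order)]. The signal is
    assumed proportional to the number of bases incorporated, so rounding it
    estimates the homopolymer length (0 means no incorporation).
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)


# Hypothetical normalized signals for eight flows (T, A, C, G, T, A, C, G).
signals = [1.02, 0.05, 2.10, 0.96, 0.00, 1.10, 0.03, 2.90]
print(call_bases(signals))  # TCCGAGGG
```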
Figure 6.
Steps involved in PacBio sequencing. Hairpin adapters are ligated to both ends of DNA fragments, forming circular DNA templates. In a zero-mode waveguide (ZMW) chamber, DNA polymerase incorporates nucleotides in real time. The resulting fluorescent signals produce raw data, which are further processed to obtain the sample sequence.
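Because the PacBio template in Figure 6 is circular, the polymerase can pass around the same molecule several times, producing multiple subreads of one insert that can be combined into a more accurate consensus. The sketch below uses a simple per-position majority vote and assumes the subreads have already been aligned to equal length. Real circular consensus tools use alignment-aware models, since PacBio raw-read errors are dominated by insertions and deletions.

```python
from collections import Counter


def consensus(subreads):
    """Majority-vote consensus over pre-aligned subreads of equal length."""
    if not subreads or len({len(s) for s in subreads}) != 1:
        raise ValueError("expected one or more subreads of identical length")
    # zip(*subreads) walks the subreads column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*subreads)
    )


# Hypothetical passes over one insert, several containing a substitution error.
passes = ["ACGTTAG", "ACGATAG", "ACCTTAG", "ACGTTCG", "ACGTTAG"]
print(consensus(passes))  # ACGTTAG
```

Each error appears in only one pass, so the majority at every position recovers the true insert sequence.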
Figure 7.
Steps involved in Helicos sequencing. 1. Sample DNA is fragmented. 2. The fragmented DNA is polyadenylated. 3. The template strands attach to the flow cell by hybridizing to poly-T oligonucleotides already bound to the flow cell. 4. Sequencing starts with the addition of fluorescently labeled nucleotides. 5. Imaging data allow reconstruction of the DNA sequence of the sample template.
Figure 8.
Oxford Nanopore sequencing technique. Genomic libraries prepared from the target DNA are loaded into a nanopore sequencer. The ionic current flowing through the nanopore is disrupted as each DNA strand passes through it. These disturbances in the electrical signal are recorded, and specialized software converts them into base calls.
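A classic first step in turning the nanopore current trace of Figure 8 into bases is event detection: segmenting the signal wherever the current level shifts as the strand moves through the pore. The sketch below uses a hypothetical fixed jump threshold (in pA) between consecutive samples and reports the mean current of each segment. Modern basecallers apply neural networks directly to the raw signal, so this only illustrates the idea of reading level changes.

```python
from statistics import mean


def segment_events(current, min_jump=5.0):
    """Split a current trace (pA) into events at abrupt level changes.

    Returns the mean current of each event, rounded to two decimals.
    """
    if not current:
        return []
    events, segment = [], [current[0]]
    for previous, value in zip(current, current[1:]):
        if abs(value - previous) >= min_jump:
            # Large step: close the current event and start a new one.
            events.append(round(mean(segment), 2))
            segment = []
        segment.append(value)
    events.append(round(mean(segment), 2))
    return events


# Hypothetical trace with three distinct current levels.
trace = [80.1, 79.9, 80.0, 95.2, 95.0, 94.8, 70.3, 70.5]
print(segment_events(trace))  # [80.0, 95.0, 70.4]
```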
Figure 9.
Top and side views of different nanopores. a) Heptameric α-hemolysin toxin from Staphylococcus aureus. b) Octameric MspA porin from Mycobacterium smegmatis. c) Dodecameric connector channel from the bacteriophage phi29 packaging motor.
Figure 10.
Applications of high throughput sequencing. High throughput sequencing has potential future applications in various fields, such as environmental risk management, diagnostics and biomarkers, precision therapeutics, forensics, virus screening, and evolutionary genomics.