
Biolecture.org β


Sequencing Methods

45,041 bytes added, 17:35, 1 December 2018
<h1>Sanger sequencing</h1>

<p>From Wikipedia, the free encyclopedia</p>

<p><a href="https://en.wikipedia.org/wiki/File:Sanger-sequencing.svg"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/Sanger-sequencing.svg/390px-Sanger-sequencing.svg.png" style="height:268px; width:390px" /></a></p>

<p><strong>Sanger sequencing</strong>&nbsp;is a method of&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_sequencing" title="DNA sequencing">DNA sequencing</a>&nbsp;first commercialized by&nbsp;<a href="https://en.wikipedia.org/wiki/Applied_Biosystems" title="Applied Biosystems">Applied Biosystems</a>, based on the selective incorporation of chain-terminating&nbsp;<a href="https://en.wikipedia.org/wiki/Dideoxynucleotide" title="Dideoxynucleotide">dideoxynucleotides</a>&nbsp;by&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_polymerase" title="DNA polymerase">DNA polymerase</a>&nbsp;during&nbsp;<a href="https://en.wikipedia.org/wiki/In_vitro" title="In vitro">in vitro</a>&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_replication" title="DNA replication">DNA replication</a>.<sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-Sanger75-1">[1]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-Sanger1977-2">[2]</a></sup>&nbsp;Developed by&nbsp;<a href="https://en.wikipedia.org/wiki/Frederick_Sanger" title="Frederick Sanger">Frederick Sanger</a>&nbsp;and colleagues in 1977, it was the most widely used sequencing method for approximately 40 years. More recently, high-volume Sanger sequencing has been supplanted by&nbsp;<a href="https://en.wikipedia.org/wiki/Dna_sequencing#Next-generation_methods" title="Dna sequencing">&quot;Next-Gen&quot;</a>&nbsp;sequencing methods, especially for large-scale, automated&nbsp;<a href="https://en.wikipedia.org/wiki/Genome" title="Genome">genome</a>&nbsp;analyses. However, the Sanger method remains in wide use for smaller-scale projects, for validation of Next-Gen results, and for obtaining especially long contiguous DNA sequence reads (&gt; 500&nbsp;<a href="https://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide">nucleotides</a>).</p>


<h2>Method</h2>

<p>The classical chain-termination method requires a single-stranded DNA template, a DNA&nbsp;<a href="https://en.wikipedia.org/wiki/Primer_(molecular_biology)" title="Primer (molecular biology)">primer</a>, a&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_polymerase" title="DNA polymerase">DNA polymerase</a>, normal deoxynucleoside triphosphates (dNTPs), and modified dideoxynucleoside triphosphates (ddNTPs), which terminate DNA strand elongation. These chain-terminating nucleotides lack a 3&#39;-<a href="https://en.wikipedia.org/wiki/Hydroxyl" title="Hydroxyl">OH</a>&nbsp;group required for the formation of a&nbsp;<a href="https://en.wikipedia.org/wiki/Phosphodiester_bond" title="Phosphodiester bond">phosphodiester bond</a>&nbsp;between two nucleotides, causing DNA polymerase to cease extension of DNA when a modified ddNTP is incorporated. The ddNTPs may be radioactively or&nbsp;<a href="https://en.wikipedia.org/wiki/Fluorescence" title="Fluorescence">fluorescently</a>&nbsp;labelled for detection in automated sequencing machines.</p>

<p>The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard&nbsp;<a href="https://en.wikipedia.org/wiki/Deoxynucleotides-triphosphate" title="Deoxynucleotides-triphosphate">deoxynucleotides</a>&nbsp;(dATP, dGTP, dCTP and dTTP) and the DNA polymerase. Only one of the four&nbsp;<a href="https://en.wikipedia.org/wiki/Dideoxynucleotides" title="Dideoxynucleotides">dideoxynucleotides</a>&nbsp;(ddATP, ddGTP, ddCTP, or ddTTP) is added to each reaction, while the other nucleotides are ordinary ones; four separate reactions are therefore needed to test all four ddNTPs. The dideoxynucleotide concentration should be approximately 100-fold lower than that of the corresponding deoxynucleotide (e.g. 0.005&nbsp;mM ddTTP&nbsp;: 0.5&nbsp;mM dTTP), so that enough terminated fragments are produced while some strands still extend across the complete sequence.<sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-Sanger1977-2">[2]</a></sup>&nbsp;Following rounds of template DNA extension from the bound primer, the resulting DNA fragments are heat&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_denaturation" title="DNA denaturation">denatured</a>&nbsp;and separated by size using&nbsp;<a href="https://en.wikipedia.org/wiki/Gel_electrophoresis" title="Gel electrophoresis">gel electrophoresis</a>. This is frequently performed using a denaturing&nbsp;<a href="https://en.wikipedia.org/wiki/Polyacrylamide_gel" title="Polyacrylamide gel">polyacrylamide</a>-urea gel with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). In the original 1977 publication,<sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-Sanger1977-2">[2]</a></sup>&nbsp;the formation of base-paired loops of ssDNA caused serious difficulty in resolving bands at some locations.
The DNA bands may then be visualized by&nbsp;<a href="https://en.wikipedia.org/wiki/Autoradiography" title="Autoradiography">autoradiography</a>&nbsp;or UV light, and the DNA sequence can be read directly off the&nbsp;<a href="https://en.wikipedia.org/wiki/Radiography" title="Radiography">X-ray film</a>&nbsp;or gel image.</p>
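<p>The role of the roughly 100-fold ratio can be made concrete with a simple probability model. It assumes, as a simplification, that the polymerase incorporates ddTTP and dTTP with equal efficiency (real polymerases discriminate against ddNTPs), so each time the growing strand reaches a position calling for T it stops with probability p = [ddTTP] / ([ddTTP] + [dTTP]). The function names are illustrative only.</p>

```python
def termination_probability(dd_conc_mM: float, d_conc_mM: float) -> float:
    """Chance that synthesis stops at one position calling for this base,
    assuming ddNTP and dNTP are incorporated with equal efficiency."""
    return dd_conc_mM / (dd_conc_mM + d_conc_mM)


def fraction_reaching(n_sites: int, p: float) -> float:
    """Fraction of strands that pass n_sites positions calling for this base
    without being terminated (geometric survival)."""
    return (1.0 - p) ** n_sites


# The 0.005 mM ddTTP : 0.5 mM dTTP example from the text.
p = termination_probability(0.005, 0.5)
print(f"termination probability per T site: {p:.4f}")
for n in (10, 50, 100, 200):
    print(f"strands still extending after {n:>3} T sites: {fraction_reaching(n, p):.1%}")
```

<p>Under this model a little over a third of strands are still extending after 100 T sites, so terminated fragments are spread along a long stretch of template; a much higher ddNTP ratio would concentrate fragments close to the primer.</p>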

<p><a href="https://en.wikipedia.org/wiki/File:Sequencing.jpg"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/c/cb/Sequencing.jpg" style="height:332px; width:160px" /></a></p>

<p>Part of a radioactively labelled sequencing gel</p>

<p>In the image above, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes, from bottom to top, are then used to read the DNA sequence.</p>
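<p>The bottom-to-top reading can be sketched as follows, assuming each lane has been reduced to the set of fragment lengths (nucleotides added after the primer) at which a band appears. Shorter fragments migrate farther, so sorting every band by length and reporting the lane it came from gives the newly synthesized strand in 5'-to-3' order, which is complementary to the template. The helper <code>simulate_lanes</code> and the band data are hypothetical, and real gels have compressions and faint bands that this ignores.</p>

```python
from typing import Dict, Set


def simulate_lanes(synthesized: str) -> Dict[str, Set[int]]:
    """Band lengths each ddNTP lane would show for a synthesized strand.

    A fragment of length i ends with the i-th base of the new strand,
    so it appears in the lane for that base's ddNTP.
    """
    lanes: Dict[str, Set[int]] = {"A": set(), "C": set(), "G": set(), "T": set()}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].add(length)
    return lanes


def read_ladder(lanes: Dict[str, Set[int]]) -> str:
    """Read the synthesized strand from band lengths in lanes A, C, G, T.

    Bottom to top on the gel is shortest to longest fragment, so sorting
    bands by length gives the new strand in 5'-to-3' order.
    """
    band_to_base: Dict[int, str] = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in band_to_base:
                # The same length in two lanes cannot be resolved.
                raise ValueError(f"bands in two lanes at length {length}")
            band_to_base[length] = base
    return "".join(band_to_base[n] for n in sorted(band_to_base))


lanes = {"A": {2, 5}, "C": {3}, "G": {1}, "T": {4, 6}}
print(read_ladder(lanes))  # GACTAT
```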

<p><a href="https://en.wikipedia.org/wiki/File:DNA_Sequencin_3_labeling_methods.jpg"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/df/DNA_Sequencin_3_labeling_methods.jpg/220px-DNA_Sequencin_3_labeling_methods.jpg" style="height:293px; width:220px" /></a></p>

<p>DNA fragments are labelled with a radioactive or fluorescent tag on the primer (1), in the new DNA strand with a labelled dNTP (2), or with a labelled ddNTP (3).</p>

<p>Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for&nbsp;<a href="https://en.wikipedia.org/wiki/Radioisotopic_labelling" title="Radioisotopic labelling">radiolabelling</a>, or using a primer labeled at the 5&#39; end with a&nbsp;<a href="https://en.wikipedia.org/wiki/Fluorescence" title="Fluorescence">fluorescent</a>&nbsp;dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. Later,&nbsp;<a href="https://en.wikipedia.org/wiki/Leroy_Hood" title="Leroy Hood">Leroy Hood</a>&nbsp;and coworkers<sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-3">[3]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-4">[4]</a></sup>&nbsp;developed fluorescently labeled ddNTPs and primers, which set the stage for automated, high-throughput DNA sequencing.</p>

<p><a href="https://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3d/Radioactive_Fluorescent_Seq.jpg/220px-Radioactive_Fluorescent_Seq.jpg" style="height:313px; width:220px" /></a></p>

<p>Sequence ladder by radioactive sequencing compared to fluorescent peaks</p>

<p>Chain-termination methods have greatly simplified DNA sequencing. For example, chain-termination-based kits are commercially available that contain the reagents needed for sequencing, pre-aliquoted and ready to use. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.</p>

<h3>Dye-terminator sequencing</h3>

<p><a href="https://en.wikipedia.org/wiki/File:CE_Basic.jpg"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fe/CE_Basic.jpg/220px-CE_Basic.jpg" style="height:133px; width:220px" /></a></p>

<p>Capillary electrophoresis</p>

<p><em>Dye-terminator sequencing</em>&nbsp;utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each of which emits light at a different&nbsp;<a href="https://en.wikipedia.org/wiki/Wavelength" title="Wavelength">wavelength</a>.</p>
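<p>Because all four terminators run in one capillary, the base is inferred from the colour of each peak rather than from a lane. The sketch below assumes each peak has already been reduced to four channel intensities in a fixed, hypothetical order and calls the brightest channel, emitting 'N' when the top two channels are too close. The 1.5 ratio and the intensity values are illustrative; real base callers also correct for spectral overlap between dyes and dye-dependent mobility shifts, which this omits.</p>

```python
from typing import List, Sequence

# Hypothetical channel order: one detection channel per dye-labelled ddNTP.
CHANNELS = ("A", "C", "G", "T")


def call_peak(intensities: Sequence[float], min_ratio: float = 1.5) -> str:
    """Call the base whose dye channel is brightest for one peak.

    Returns 'N' when there is no signal, or when the brightest channel is
    less than min_ratio times the runner-up (an ambiguous mixed peak).
    """
    ranked = sorted(zip(intensities, CHANNELS), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    if top <= 0 or (second > 0 and top / second < min_ratio):
        return "N"
    return base


def call_trace(peaks: List[Sequence[float]]) -> str:
    """Call bases for peaks already ordered by migration time
    (the shortest fragment reaches the detector first)."""
    return "".join(call_peak(p) for p in peaks)


peaks = [
    [900, 50, 40, 30],   # clear A
    [20, 30, 800, 60],   # clear G
    [400, 350, 20, 10],  # A and C too close to call
    [10, 20, 30, 700],   # clear T
]
print(call_trace(peaks))  # AGNT
```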

<p>Because it is faster and more convenient, dye-terminator sequencing is now the mainstay of automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace&nbsp;<a href="https://en.wikipedia.org/wiki/Chromatogram" title="Chromatogram">chromatogram</a>&nbsp;after&nbsp;<a href="https://en.wikipedia.org/wiki/Capillary_electrophoresis" title="Capillary electrophoresis">capillary electrophoresis</a>.</p>

<p>This problem has been addressed with the use of modified DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating &quot;dye blobs&quot;. The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, was used for the vast majority of sequencing projects until the introduction of Next Generation Sequencing.</p>

<h3>Automation and sample preparation</h3>

<p><a href="https://en.wikipedia.org/wiki/File:Sanger_sequencing_read_display.png"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/Sanger_sequencing_read_display.png/220px-Sanger_sequencing_read_display.png" style="height:51px; width:220px" /></a></p>

<p>View of the start of an example dye-terminator read</p>

<p>Automated DNA-sequencing instruments (<a href="https://en.wikipedia.org/wiki/DNA_sequencers" title="DNA sequencers">DNA sequencers</a>) can sequence up to 384 DNA samples in a single batch, and batch runs may occur up to 24 times a day. DNA sequencers separate strands by size (or length) using&nbsp;<a href="https://en.wikipedia.org/wiki/Capillary_electrophoresis" title="Capillary electrophoresis">capillary electrophoresis</a>, detect and record dye fluorescence, and output data as fluorescent peak trace&nbsp;<a href="https://en.wikipedia.org/wiki/Chromatogram" title="Chromatogram">chromatograms</a>. Sequencing reactions (<a href="https://en.wikipedia.org/wiki/Thermocycler" title="Thermocycler">thermocycling</a>&nbsp;and labelling), cleanup and re-suspension of samples in a&nbsp;<a href="https://en.wikipedia.org/wiki/Buffer_solution" title="Buffer solution">buffer solution</a>&nbsp;are performed separately, before loading samples onto the sequencer. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks, which are generally located at the ends of the sequence. The accuracy of such algorithms is inferior to visual examination by a human operator, but is adequate for automated processing of large sequence data sets.</p>
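<p>One commonly described way to trim low-quality ends is a Mott-style algorithm: convert each base's quality score to an error probability, score each base as a cutoff minus that probability, and keep the contiguous stretch with the highest total score (a maximum-subarray scan). The sketch assumes Phred-scaled qualities and an illustrative cutoff of 0.05; it shows the idea and is not the exact algorithm of any particular package.</p>

```python
from typing import List, Tuple


def mott_trim(quals: List[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """Return (start, end) of the best stretch to keep, end exclusive.

    Bases with error probability below the cutoff add to the running score;
    worse bases subtract. Returns (0, 0) if no base is worth keeping.
    """
    best_start, best_end, best_score = 0, 0, 0.0
    run_start, run_score = 0, 0.0
    for i, q in enumerate(quals):
        error_prob = 10 ** (-q / 10)  # Phred scale: Q = -10 * log10(P)
        run_score += cutoff - error_prob
        if run_score <= 0:
            # A stretch ending here can only lower the total: restart after it.
            run_start, run_score = i + 1, 0.0
        elif run_score > best_score:
            best_start, best_end, best_score = run_start, i + 1, run_score
    return best_start, best_end


quals = [5, 8, 12, 30, 35, 38, 40, 36, 30, 10, 6, 4]
start, end = mott_trim(quals)
print(start, end)  # 3 9
```

<p>Here the poor first three and last three bases are trimmed and the high-quality middle (qualities 30 to 40) is kept.</p>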

<h3>Challenges</h3>

<p>Common challenges of DNA sequencing with the Sanger method include poor quality in the first 15-40 bases of the sequence due to primer binding<sup>[<em><a href="https://en.wikipedia.org/wiki/Wikipedia:Citation_needed" title="Wikipedia:Citation needed">citation needed</a></em>]</sup>&nbsp;and deteriorating quality of sequencing traces after 700-900 bases. Base calling software such as&nbsp;<a href="https://en.wikipedia.org/wiki/Phred_base_calling" title="Phred base calling">Phred</a>&nbsp;typically provides an estimate of quality to aid in trimming of low-quality regions of sequences.<sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-urlPhred_-_Quality_Base_Calling-5">[5]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-urlBase-calling_for_next-generation_sequencing_platforms_%E2%80%94_Brief_Bioinform-6">[6]</a></sup></p>
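<p>Phred reports base quality on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the base call is wrong; Q20 means a 1 in 100 chance of error and Q30 a 1 in 1,000 chance. The helper names below are illustrative. Summing the per-base error probabilities gives the expected number of miscalled bases in a read, which is one simple way to judge whether a region is worth trimming.</p>

```python
import math
from typing import Iterable


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for an error probability 0 < P <= 1."""
    return -10 * math.log10(p)


def expected_errors(quals: Iterable[float]) -> float:
    """Expected number of miscalled bases: the sum of per-base error probabilities."""
    return sum(phred_to_error(q) for q in quals)


print(phred_to_error(20))  # 0.01
print(f"{expected_errors([30] * 700):.2f}")  # 0.70
```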

<p>In cases where DNA fragments are&nbsp;<a href="https://en.wikipedia.org/wiki/Cloned" title="Cloned">cloned</a>&nbsp;before sequencing, the resulting sequence may contain parts of the&nbsp;<a href="https://en.wikipedia.org/wiki/Cloning_vector" title="Cloning vector">cloning vector</a>. In contrast,&nbsp;<a href="https://en.wikipedia.org/wiki/PCR" title="PCR">PCR</a>-based cloning and next-generation sequencing technologies based on&nbsp;<a href="https://en.wikipedia.org/wiki/Pyrosequencing" title="Pyrosequencing">pyrosequencing</a>&nbsp;often avoid using cloning vectors. Recently, one-step Sanger sequencing (combined amplification and sequencing) methods such as Ampliseq and SeqSharp have been developed that allow rapid sequencing of target genes without cloning or prior amplification.<sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-7">[7]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Sanger_sequencing#cite_note-8">[8]</a></sup></p>

<p>Current methods can directly sequence only relatively short (300-1000&nbsp;<a href="https://en.wikipedia.org/wiki/Nucleotides" title="Nucleotides">nucleotides</a>&nbsp;long) DNA fragments in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide.</p>


<h1>Shotgun sequencing</h1>

<p>From Wikipedia, the free encyclopedia</p>

<p>In&nbsp;<a href="https://en.wikipedia.org/wiki/Genetics" title="Genetics">genetics</a>,&nbsp;<strong>shotgun sequencing</strong>&nbsp;is a method used for&nbsp;<a href="https://en.wikipedia.org/wiki/Sequencing" title="Sequencing">sequencing</a>&nbsp;long&nbsp;<a href="https://en.wikipedia.org/wiki/DNA" title="DNA">DNA</a>&nbsp;strands. It is named by analogy with the rapidly expanding, quasi-random firing pattern of a&nbsp;<a href="https://en.wikipedia.org/wiki/Shotgun" title="Shotgun">shotgun</a>.</p>

<p>The&nbsp;<a href="https://en.wikipedia.org/wiki/Sanger_sequencing#Method" title="Sanger sequencing">chain termination method</a>&nbsp;of&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_sequencing" title="DNA sequencing">DNA sequencing</a>&nbsp;(&quot;Sanger sequencing&quot;) can only be used for short DNA strands of 100 to 1000&nbsp;<a href="https://en.wikipedia.org/wiki/Base_pair" title="Base pair">base pairs</a>. Due to this size limit, longer sequences are subdivided into smaller fragments that can be sequenced separately, and these sequences are&nbsp;<a href="https://en.wikipedia.org/wiki/Sequence_assembly" title="Sequence assembly">assembled</a>&nbsp;to give the overall sequence.</p>

<p>There are two principal methods for this fragmentation and sequencing process.&nbsp;<a href="https://en.wikipedia.org/wiki/Primer_walking" title="Primer walking">Primer walking</a>&nbsp;(or &quot;chromosome walking&quot;) progresses through the entire strand piece by piece, whereas shotgun sequencing is a faster but more complex process that uses random fragments.</p>

<p>In shotgun sequencing,<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-Staden-1">[1]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-2">[2]</a></sup>&nbsp;DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain&nbsp;<em>reads</em>. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-Staden-1">[1]</a></sup></p>
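<p>The assembly step can be illustrated with a toy greedy algorithm that repeatedly merges the pair of reads with the longest suffix-to-prefix overlap. This is a sketch under idealized assumptions (error-free reads from one strand, no repeats, a hypothetical minimum overlap of 3 bases); production assemblers use overlap or de Bruijn graphs and must handle sequencing errors and repeats.</p>

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no pair overlaps; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained entirely within another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads from the hypothetical sequence ATGCGTACGTTAGCAT.
print(greedy_assemble(["TTAGCAT", "ATGCGTAC", "GTACGTTAG"]))  # ['ATGCGTACGTTAGCAT']
```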

<p>Shotgun sequencing was one of the precursor technologies that enabled&nbsp;<a href="https://en.wikipedia.org/wiki/Full_genome_sequencing" title="Full genome sequencing">full genome sequencing</a>.</p>


<h2>Hierarchical shotgun sequencing</h2>

<p><a href="https://en.wikipedia.org/wiki/File:Whole_genome_shotgun_sequencing_versus_Hierarchical_shotgun_sequencing.png"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Whole_genome_shotgun_sequencing_versus_Hierarchical_shotgun_sequencing.png/220px-Whole_genome_shotgun_sequencing_versus_Hierarchical_shotgun_sequencing.png" style="height:220px; width:220px" /></a></p>

<p>In whole genome shotgun sequencing (top), the entire genome is sheared randomly into small fragments (appropriately sized for sequencing) and then reassembled. In hierarchical shotgun sequencing (bottom), the genome is first broken into larger segments. After the order of these segments is deduced, they are further sheared into fragments appropriately sized for sequencing.</p>

<p>Although shotgun sequencing can in theory be applied to a genome of any size, its direct application to the sequencing of large genomes (for instance, the&nbsp;<a href="https://en.wikipedia.org/wiki/Human_genome" title="Human genome">human genome</a>) was limited until the late 1990s, when technological advances made it practical to handle the vast quantities of complex data involved in the process.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-genome_sequencing-13">[13]</a></sup>&nbsp;Historically, full-genome shotgun sequencing was believed to be limited both by the sheer size of large genomes and by the complexity added by their high percentage of repetitive DNA (greater than 50% for the human genome).<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-venter-14">[14]</a></sup>&nbsp;It was not widely accepted that a full-genome shotgun sequence of a large genome would provide reliable data. For these reasons, other strategies that lowered the computational load of sequence assembly had to be used before shotgun sequencing was performed.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-venter-14">[14]</a></sup>&nbsp;In hierarchical sequencing, also known as top-down sequencing, a low-resolution&nbsp;<a href="https://en.wikipedia.org/wiki/Gene_mapping#Physical_Mapping" title="Gene mapping">physical map</a>&nbsp;of the genome is made prior to actual sequencing. From this map, a minimal number of fragments that cover the entire chromosome is selected for sequencing.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-textbook-15">[15]</a></sup>&nbsp;This minimizes the amount of high-throughput sequencing and assembly required.</p>

<p>The amplified genome is first sheared into larger pieces (50-200&nbsp;kb) and cloned into a bacterial host using&nbsp;<a href="https://en.wikipedia.org/wiki/Bacterial_artificial_chromosome" title="Bacterial artificial chromosome">BACs</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/P1-derived_artificial_chromosome" title="P1-derived artificial chromosome">PACs</a>. Because multiple genome copies have been sheared at random, the fragments contained in these clones have different ends, and with enough coverage, finding a&nbsp;<strong>scaffold</strong>&nbsp;of&nbsp;<a href="https://en.wikipedia.org/wiki/Contig#BAC_contigs" title="Contig">BAC contigs</a>&nbsp;that covers the entire genome is theoretically possible. This scaffold is called a&nbsp;<strong>tiling path</strong>.</p>
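<p>"Enough coverage" can be estimated with the classic Lander-Waterman model, which assumes fragments start at uniformly random positions: mean coverage is c = N·L/G for N fragments of length L on a genome of length G, and the expected fraction of the genome covered by no fragment is about e^(-c). The sketch below applies that idealized model; the 150 kb insert size, 3 Gb genome, and 10x target are illustrative numbers, and real libraries deviate from the model because of repeats and cloning bias.</p>

```python
import math


def mean_coverage(n_fragments: int, fragment_len: float, genome_len: float) -> float:
    """Average number of fragments covering a base: c = N * L / G."""
    return n_fragments * fragment_len / genome_len


def fraction_uncovered(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases no fragment covers."""
    return math.exp(-coverage)


def fragments_needed(target_coverage: float, fragment_len: float, genome_len: float) -> int:
    """Fragments required to reach a target mean coverage, rounded up."""
    return math.ceil(target_coverage * genome_len / fragment_len)


# Illustrative numbers: 150 kb inserts, a 3 Gb genome, 10x target coverage.
n = fragments_needed(10, 150_000, 3_000_000_000)
print(n, mean_coverage(n, 150_000, 3_000_000_000), fraction_uncovered(10))
```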

<p><a href="https://en.wikipedia.org/wiki/File:Tiling_path.png"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Tiling_path.png/220px-Tiling_path.png" style="height:138px; width:220px" /></a></p>

<p>A BAC contig that covers the entire genomic area of interest makes up the tiling path.</p>

<p>Once a tiling path has been found, the BACs that form this path are sheared at random into smaller fragments and can be sequenced using the shotgun method on a smaller scale.</p>

<p>Although the full sequences of the BAC contigs are not known, their orientations relative to one another are known. There are several methods for deducing this order and selecting the BACs that make up a tiling path. The general strategy involves identifying the positions of the clones relative to one another and then selecting the least number of clones required to form a contiguous scaffold that covers the entire area of interest. The order of the clones is deduced by determining the way in which they overlap.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-genome_map-16">[16]</a></sup>&nbsp;Overlapping clones can be identified in several ways. A small radioactively or chemically labeled probe containing a&nbsp;<a href="https://en.wikipedia.org/wiki/Sequence-tagged_site" title="Sequence-tagged site">sequence-tagged site</a>&nbsp;(STS) can be hybridized onto a microarray upon which the clones are printed.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-genome_map-16">[16]</a></sup>&nbsp;In this way, all the clones that contain a particular sequence in the genome are identified. The end of one of these clones can then be sequenced to yield a new probe and the process repeated in a method called chromosome walking.</p>
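<p>Once clone positions are known, choosing the fewest clones that still span the region is the classic interval-cover problem, which a greedy scan solves with the minimum number of clones: starting from the left edge, repeatedly pick the clone that begins within the covered stretch and reaches farthest. The sketch assumes each clone has already been reduced to (start, end) map coordinates, which is a simplification; the clone names and kb coordinates are hypothetical.</p>

```python
from typing import List, Optional, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) in kb, end exclusive


def tiling_path(clones: List[Clone], region_start: int, region_end: int) -> Optional[List[str]]:
    """Fewest clones whose intervals together cover [region_start, region_end).

    Returns None if the clones leave a gap. Each chosen clone must overlap
    or abut the stretch already covered, so the result is contiguous.
    """
    by_start = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best: Optional[Clone] = None
        # Among clones starting inside the covered stretch, keep the one reaching farthest.
        while i < len(by_start) and by_start[i][1] <= covered:
            if best is None or by_start[i][2] > best[2]:
                best = by_start[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # gap: no clone extends the covered stretch
        path.append(best[0])
        covered = best[2]
    return path


clones = [("BAC1", 0, 150), ("BAC2", 100, 260), ("BAC3", 120, 200),
          ("BAC4", 250, 400), ("BAC5", 380, 500)]
print(tiling_path(clones, 0, 500))  # ['BAC1', 'BAC2', 'BAC4', 'BAC5']
```

<p>BAC3 is skipped because BAC2 starts inside the covered stretch and reaches farther, so BAC3 would add nothing to the tiling path.</p>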

<p>Alternatively, the BAC&nbsp;<a href="https://en.wikipedia.org/wiki/BAC_library#Genomic_libraries" title="BAC library">library</a>&nbsp;can be restriction-digested. Two clones that have several fragment sizes in common are inferred to overlap because they contain multiple similarly spaced restriction sites in common.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-genome_map-16">[16]</a></sup>&nbsp;This method of genomic mapping is called restriction fingerprinting because it identifies a set of restriction sites contained in each clone. Once the overlap between the clones has been found and their order relative to the genome known, a scaffold of a minimal subset of these contigs that covers the entire genome is shotgun-sequenced.<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-textbook-15">[15]</a></sup></p>
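<p>Restriction fingerprint comparison can be sketched as counting how many fragment sizes two clones share, allowing for gel sizing error. The tolerance (0.05 kb) and the minimum of three shared bands are hypothetical parameters chosen for illustration; practical fingerprint-comparison software scores overlaps statistically rather than with a fixed count.</p>

```python
from typing import List


def shared_bands(a: List[float], b: List[float], tolerance: float) -> int:
    """Count fragments in a that pair with a distinct fragment in b of similar size.

    Sizes are in kb; each band in either clone is used at most once.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a_sorted) and j < len(b_sorted):
        diff = a_sorted[i] - b_sorted[j]
        if abs(diff) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # this band in a is too small to match anything left in b
        else:
            j += 1  # this band in b is too small to match anything left in a


    return matches


def likely_overlap(a: List[float], b: List[float],
                   tolerance: float = 0.05, min_shared: int = 3) -> bool:
    """Infer that two clones overlap when they share at least min_shared fragment sizes."""
    return shared_bands(a, b, tolerance) >= min_shared


clone1 = [1.2, 2.5, 3.1, 4.8, 6.0]
clone2 = [2.52, 3.08, 4.83, 7.5]
print(shared_bands(clone1, clone2, 0.05), likely_overlap(clone1, clone2))  # 3 True
```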

<p>Because it involves first creating a low-resolution map of the genome, hierarchical shotgun sequencing is slower than whole-genome shotgun sequencing, but it relies less heavily on computer algorithms. The extensive BAC library creation and tiling path selection it requires also make it labor-intensive. Now that the technology is available and the reliability of the data has been demonstrated,<sup><a href="https://en.wikipedia.org/wiki/Shotgun_sequencing#cite_note-venter-14">[14]</a></sup>&nbsp;the speed and cost efficiency of whole-genome shotgun sequencing have made it the primary method for genome sequencing.</p>


<h1>Illumina dye sequencing</h1>

<p>From Wikipedia, the free encyclopedia</p>

<p><a href="https://en.wikipedia.org/wiki/File:Cluster_Generation.png"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/65/Cluster_Generation.png/440px-Cluster_Generation.png" style="height:352px; width:440px" /></a></p>

<p>The DNA attaches to the flow cell via complementary sequences. The strand bends over and attaches to a second oligo, forming a bridge. A polymerase synthesizes the reverse strand. The two strands release and straighten, and each forms a new bridge (bridge amplification). The result is a cluster of clonal forward and reverse DNA strands.</p>

<p><strong>Illumina dye sequencing</strong>&nbsp;is a technique used to determine the series of base pairs in&nbsp;<a href="https://en.wikipedia.org/wiki/DNA" title="DNA">DNA</a>, also known as&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_sequencing" title="DNA sequencing">DNA sequencing</a>. The reversible terminator chemistry concept was invented by Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-1">[1]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-2">[2]</a></sup>&nbsp;The method was developed by&nbsp;<a href="https://en.wikipedia.org/wiki/Shankar_Balasubramanian" title="Shankar Balasubramanian">Shankar Balasubramanian</a>&nbsp;and&nbsp;<a href="https://en.wikipedia.org/wiki/David_Klenerman" title="David Klenerman">David Klenerman</a>&nbsp;of Cambridge University,<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-3">[3]</a></sup>&nbsp;who subsequently founded Solexa, a company later acquired by&nbsp;<a href="https://en.wikipedia.org/wiki/Illumina_(company)" title="Illumina (company)">Illumina</a>. This sequencing method is based on reversible dye-terminators that enable the identification of single bases as they are introduced into DNA strands.
Its applications include whole-<a href="https://en.wikipedia.org/wiki/Genome" title="Genome">genome</a>&nbsp;and region sequencing,&nbsp;<a href="https://en.wikipedia.org/wiki/Transcriptome" title="Transcriptome">transcriptome</a>&nbsp;analysis,&nbsp;<a href="https://en.wikipedia.org/wiki/Metagenomics" title="Metagenomics">metagenomics</a>, small&nbsp;<a href="https://en.wikipedia.org/wiki/RNA" title="RNA">RNA</a>&nbsp;discovery,&nbsp;<a href="https://en.wikipedia.org/wiki/Methylation" title="Methylation">methylation</a>&nbsp;profiling, and genome-wide&nbsp;<a href="https://en.wikipedia.org/wiki/Protein" title="Protein">protein</a>-<a href="https://en.wikipedia.org/wiki/Nucleic_acid" title="Nucleic acid">nucleic acid</a>&nbsp;interaction analysis.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Weiss-4">[4]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Meyer-5">[5]</a></sup></p>


<h2>Procedure</h2>

<h3>Tagmentation</h3>

<p>The first step after DNA purification is tagmentation. Enzymes called transposases randomly cut the DNA into short segments (&quot;tags&quot;), and adapters are added on either side of the cut points. Fragments that do not receive adapters are washed away.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Illumina,_Inc_2013-6">[6]</a></sup></p>

<p><a href="https://en.wikipedia.org/wiki/File:DNA_Processing_Preparation.png"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/7b/DNA_Processing_Preparation.png/330px-DNA_Processing_Preparation.png" style="height:248px; width:330px" /></a></p>

<p>Double-stranded DNA is cleaved by transposomes. The cut ends are repaired and adapters, indices, primer binding sites, and terminal sites are added to each strand of the DNA. Image based in part on Illumina&#39;s sequencing video<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Illumina,_Inc_2013-6">[6]</a></sup></p>

<h3>Reduced cycle amplification</h3>

<p>The next step is called reduced cycle amplification. During this step, primer-binding sequences, indices, and terminal sequences are added. Indices are usually six base pairs long and are used during DNA sequence analysis to identify samples; they allow up to 96 different samples to be run together. During analysis, the computer groups all reads with the same index together.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Feng-7">[7]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Illumina,_Inc_Multiplex-8">[8]</a></sup>&nbsp;The terminal sequences are used for attaching the DNA strand to the flow cell. Illumina uses a &quot;sequence by synthesis&quot; approach.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Illumina,_Inc_Multiplex-8">[8]</a></sup>&nbsp;This process takes place inside an acrylamide-coated glass flow cell.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Quail-9">[9]</a></sup>&nbsp;The flow cell has oligonucleotides (short nucleotide sequences) coating its bottom, and they serve to hold the DNA strands in place during sequencing. The oligos match the two kinds of terminal sequences added to the DNA during reduced cycle amplification. As the DNA enters the flow cell, one of the adapters attaches to a complementary oligo.</p>
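<p>Grouping reads by index (demultiplexing) can be sketched as follows. The sketch assumes the index bases of each read are already available as a string, and it assigns a read to a sample only if its index is within one mismatch of exactly one expected index; the sample names, six-base index sequences, and the one-mismatch rule are hypothetical choices for illustration, not the behaviour of any specific Illumina software.</p>

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: List[Tuple[str, str]], samples: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Group (index, sequence) pairs by sample.

    A read goes to a sample when its index is within max_mismatches of that
    sample's index and of no other; everything else is 'undetermined'.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for index, seq in reads:
        hits = [name for name, expected in samples.items()
                if len(index) == len(expected)
                and hamming(index, expected) <= max_mismatches]
        groups[hits[0] if len(hits) == 1 else "undetermined"].append(seq)
    return dict(groups)


samples = {"sample1": "ACGTAC", "sample2": "TGCATG"}
reads = [("ACGTAC", "AAAA"), ("ACGTAA", "CCCC"), ("TGCATG", "GGGG"), ("NNNNNN", "TTTT")]
print(demultiplex(reads, samples))
```

<p>Allowing one mismatch is only unambiguous if every pair of sample indices differs at three or more positions; otherwise a single error could make a read look like it belongs to two samples.</p>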

<p><a href="https://en.wikipedia.org/wiki/File:Oligonucleotide_chains_in_Flow_Cell.png"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/54/Oligonucleotide_chains_in_Flow_Cell.png/240px-Oligonucleotide_chains_in_Flow_Cell.png" style="height:180px; width:240px" /></a></p>

<p>Millions of oligos line the bottom of each flow cell lane.</p>

<h3>Bridge amplification</h3>

<p>Once attached, cluster generation can begin. The goal is to create hundreds of identical strands of DNA. Some will be the forward strand; the rest, the reverse. Clusters are generated through bridge amplification. Polymerases move along a strand of DNA, creating its complementary strand. The original strand is washed away, leaving only the reverse strand. At the top of the reverse strand there is an adapter sequence. The DNA strand bends and attaches to the oligo that is complementary to the top adapter sequence. Polymerases attach to the reverse strand, and its complementary strand (which is identical to the original) is made. The now double stranded DNA is denatured so that each strand can separately attach to an oligonucleotide sequence anchored to the flow cell. One will be the reverse strand; the other, the forward. This process is called bridge amplification, and it happens for thousands of clusters all over the flow cell at once.</p>

<h3>Clonal amplification</h3>

<p>Over and over again, DNA strands will bend and attach to oligos. Polymerases will synthesize a new strand to create a double-stranded segment, and that will be denatured so that all of the DNA strands in one area are from a single source (clonal amplification). Clonal amplification is important for quality control purposes. If a strand is found to have an odd sequence, then scientists can check the reverse strand to make sure that it has the complement of the same oddity. The forward and reverse strands act as checks to guard against artifacts. Because Illumina sequencing uses polymerases, base substitution errors have been observed,<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Morozova-10">[10]</a></sup>&nbsp;especially at the 3&#39; end.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Jeon-11">[11]</a></sup>&nbsp;Paired-end reads combined with cluster generation can confirm that an error took place. The reverse and forward strands should be complementary to each other, all reverse reads should match each other, and all forward reads should match each other. If a read is not similar enough to the reads it should be a clone of, an error may have occurred. A minimum threshold of 97% similarity has been used in some labs&#39; analyses.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Jeon-11">[11]</a></sup></p>
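<p>The similarity check can be sketched as comparing each read in a group that should be clones of one another against the group's majority consensus and flagging those below the 97% threshold mentioned above. Measuring identity as matching positions over read length assumes ungapped, equal-length reads, which is a simplification (insertions and deletions would need an alignment); the function names and example reads are illustrative.</p>

```python
from collections import Counter
from typing import List


def consensus(reads: List[str]) -> str:
    """Majority base at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length reads agree."""
    if len(a) != len(b) or not a:
        raise ValueError("reads must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def flag_outliers(reads: List[str], threshold: float = 0.97) -> List[int]:
    """Indices of reads whose identity to the group consensus is below threshold."""
    ref = consensus(reads)
    return [i for i, r in enumerate(reads) if identity(r, ref) < threshold]


base = "ACGT" * 25                # a 100-nt example read
one_error = "T" + base[1:]        # 99% identical
four_errors = "TTTTT" + base[5:]  # 96% identical
reads = [base, base, base, one_error, four_errors]
print(flag_outliers(reads))  # [4]
```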

<h3>Sequence by synthesis</h3>

<p>At the end of clonal amplification, all of the reverse strands are washed off the flow cell, leaving only forward strands. Primers attach to the forward strands, and a polymerase adds fluorescently tagged nucleotides to the growing DNA strand. Only one base is added per round, because every nucleotide carries a reversible terminator that prevents multiple additions in one round; after imaging, the terminator and dye are removed so that the next base can be added. In four-colour chemistry, each of the four bases has a unique emission, and after each round the machine records which base was added. Starting with the launch of the NextSeq and later the MiniSeq, Illumina introduced a two-colour sequencing chemistry, in which each nucleotide is identified by one of the two colours (red or green), by no colour (&quot;black&quot;), or by both colours (appearing orange, a mixture of red and green).</p>
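<p>Two-colour base calling can be sketched as a lookup from (red, green) on/off states to bases. The specific colour-to-base assignment below (A = both, C = red, T = green, G = dark) is an assumption based on how Illumina&#39;s two-channel chemistry is commonly described, not something stated in this article; the intensities and the fixed 0.5 threshold are likewise hypothetical.</p>

```python
# Two-colour base calling. Assumptions: intensities are normalised to 0..1,
# one fixed threshold separates "on" from "off", and the colour-to-base
# mapping below is the commonly described Illumina two-channel scheme.

TWO_COLOUR_MAP = {
    (True, True): "A",    # red + green (appears orange)
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # no signal ("black")
}


def call_base(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from a cluster's red and green intensities in one cycle."""
    return TWO_COLOUR_MAP[(red >= threshold, green >= threshold)]


def call_read(cycles: list[tuple[float, float]], threshold: float = 0.5) -> str:
    """Call a read from per-cycle (red, green) intensities, one base per cycle."""
    return "".join(call_base(red, green, threshold) for red, green in cycles)


if __name__ == "__main__":
    intensities = [(0.9, 0.8), (0.9, 0.1), (0.05, 0.7), (0.1, 0.1)]  # hypothetical
    print(call_read(intensities))  # ACTG
```

<p>Because one base is encoded as the absence of signal, a cluster whose signal has faded is indistinguishable from a run of that base under this sketch.</p>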

<p><a href="https://en.wikipedia.org/wiki/File:Sequence_By_Synthesis.png"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/Sequence_By_Synthesis.png/220px-Sequence_By_Synthesis.png" style="height:220px; width:220px" /></a></p>

<p>Tagged nucleotides are added in order to the DNA strand. Each of the four nucleotides has an identifying label that can be excited to emit a characteristic wavelength. A computer records all of the emissions, and base calls are made from these data.</p>

<p>Once the DNA strand has been read, the newly synthesized strand is washed away. Next, the index 1 primer anneals, the index 1 sequence is synthesized and read, and the product is washed away. The strand then forms a bridge again, with its 3&#39; end attaching to an oligo on the flow cell. The index 2 primer anneals, the index 2 sequence is synthesized and read, and the product is washed away.</p>

<p>A polymerase then synthesizes the complementary strand along the arched strand. The two strands are separated, and the 3&#39; end of each is blocked. The forward strand is washed away, and the process of sequencing by synthesis repeats for the reverse strand.</p>
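<p>The relationship between the two reads of a pair can be illustrated with a toy model, assuming error-free reads of equal length and the usual inward-facing paired-end orientation: read 1 starts at one end of the insert on the forward strand, and read 2 starts at the other end on the reverse strand. The insert sequence and read length are hypothetical.</p>

```python
# Toy paired-end model. Assumptions: error-free reads of equal length and
# inward-facing pairs, so read 2 is read from the opposite end of the insert
# on the reverse strand.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(insert: str, read_length: int) -> tuple[str, str]:
    """Return (read 1, read 2) for one insert."""
    read1 = insert[:read_length]                      # forward strand, from one end
    read2 = reverse_complement(insert)[:read_length]  # reverse strand, from the other end
    return read1, read2


if __name__ == "__main__":
    insert = "AACCGGTTAC"  # hypothetical insert
    r1, r2 = paired_end_reads(insert, read_length=4)
    print(r1, r2)                  # AACC GTAA
    print(reverse_complement(r2))  # TTAC, the last 4 bases of the insert
```

<p>Reverse-complementing read 2 recovers the far end of the insert, which is what lets the two reads of a pair be placed relative to each other.</p>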

<h3>Data analysis</h3>

<p>Sequencing occurs for millions of clusters at once, and each cluster has ~1,000 identical copies of a DNA insert.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Morozova-10">[10]</a></sup>&nbsp;The sequence data are analyzed by finding reads with overlapping regions and merging them into longer stretches of contiguous sequence, called&nbsp;<a href="https://en.wikipedia.org/wiki/Contig" title="Contig">contigs</a>. If a reference sequence is known, the contigs are then compared to it to identify variants.</p>
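<p>Overlap-based contig building can be sketched with a greedy merge: repeatedly join the two sequences with the longest suffix/prefix overlap. This is a simplification assuming error-free reads from one strand and a hand-picked minimum overlap; production assemblers typically use overlap or de Bruijn graphs and tolerate errors. The reads are hypothetical.</p>

```python
# Greedy overlap assembly. Assumptions: error-free reads from one strand and a
# hand-picked minimum overlap; this is a teaching sketch, not a real assembler.


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the largest overlap; return the contigs."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: what remains are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["TGCAAT", "ATGGCG", "GCGTGC"]  # hypothetical, deliberately out of order
    print(greedy_assemble(reads))  # ['ATGGCGTGCAAT']
```

<p>If a reference were available, the resulting contig would then be aligned to it to look for variants.</p>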

<p>This piecemeal process lets scientists reconstruct the complete sequence even though no unfragmented molecule was ever read. However, because Illumina reads are short<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Jeon-11">[11]</a></sup>&nbsp;(HiSeq sequencing can produce reads around 90 bp long<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Feng-7">[7]</a></sup>), resolving short tandem repeat regions can be difficult.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Feng-7">[7]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Morozova-10">[10]</a></sup>&nbsp;Also, in de novo sequencing, where no reference exists, repeated regions can make sequence assembly very difficult.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Morozova-10">[10]</a></sup>&nbsp;Further sources of error include base substitutions by inaccurate polymerases (especially at the 3&#39; end of reads<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Jeon-11">[11]</a></sup>), chimeric sequences, and PCR bias, all of which can contribute to an incorrect sequence.<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Jeon-11">[11]</a></sup></p>
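<p>The repeat problem can be made concrete with a small experiment, assuming idealized reads: every error-free substring of length k is sampled exactly once. The two hypothetical genomes below differ only in the order of the segments between three copies of a repeat, yet produce identical sets of 3-base reads; 4-base reads, long enough to span a repeat copy plus one unique base on each side, tell them apart.</p>

```python
# Repeat ambiguity with idealised reads. Assumption: every error-free substring
# of length k is sampled once; the two genomes are hypothetical.
from collections import Counter


def read_spectrum(genome: str, k: int) -> Counter:
    """Multiset of all length-k reads that can be drawn from a genome."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


if __name__ == "__main__":
    repeat = "GG"
    # Same pieces, but the segments between the repeat copies are swapped.
    genome_1 = "A" + repeat + "T" + repeat + "C" + repeat + "A"  # AGGTGGCGGA
    genome_2 = "A" + repeat + "C" + repeat + "T" + repeat + "A"  # AGGCGGTGGA
    for k in (3, 4):
        same = read_spectrum(genome_1, k) == read_spectrum(genome_2, k)
        print(k, same)  # "3 True", then "4 False"
```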

<h2>Comparison with other sequencing methods</h2>

<p>This technique offers several advantages over traditional sequencing methods such as&nbsp;<a href="https://en.wikipedia.org/wiki/Sanger_sequencing" title="Sanger sequencing">Sanger sequencing</a>. Because Illumina dye sequencing is automated, many strands can be sequenced in parallel and sequencing data obtained quickly. Additionally, this method uses only&nbsp;<a href="https://en.wikipedia.org/wiki/DNA_polymerase" title="DNA polymerase">DNA polymerase</a>, as opposed to the multiple, expensive&nbsp;<a href="https://en.wikipedia.org/wiki/Enzymes" title="Enzymes">enzymes</a>&nbsp;required by some other sequencing techniques (e.g.&nbsp;<a href="https://en.wikipedia.org/wiki/Pyrosequencing" title="Pyrosequencing">pyrosequencing</a>).<sup><a href="https://en.wikipedia.org/wiki/Illumina_dye_sequencing#cite_note-Pettersson-12">[12]</a></sup></p>


<h1>Nanopore sequencing</h1>

<p>From Wikipedia, the free encyclopedia</p>

<p><a href="https://en.wikipedia.org/wiki/File:Transport_of_Alpha-Hemolysin_and_dsDNA_Complex_Towards_a_Solid_State_Nanopore.svg"><img alt="" src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Transport_of_Alpha-Hemolysin_and_dsDNA_Complex_Towards_a_Solid_State_Nanopore.svg/234px-Transport_of_Alpha-Hemolysin_and_dsDNA_Complex_Towards_a_Solid_State_Nanopore.svg.png" style="height:198px; width:234px" /></a></p>

<p>Left: the complex formed between alpha-hemolysin and dsDNA, linked through an oligomer. Right: movement of this complex toward a solid-state nanopore channel, shown in two steps, (I) and (II). Once the complex is inserted into the nanopore, the alpha-hemolysin protein functions within the newly formed hybrid biological/solid-state nanopore system.</p>

<p><strong>Nanopore sequencing</strong>&nbsp;is a third-generation<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-1">[1]</a></sup>&nbsp;approach used in the&nbsp;<a href="https://en.wikipedia.org/wiki/Sequencing" title="Sequencing">sequencing</a>&nbsp;of&nbsp;<a href="https://en.wikipedia.org/wiki/Biopolymer" title="Biopolymer">biopolymers</a>, specifically&nbsp;<a href="https://en.wikipedia.org/wiki/Polynucleotide" title="Polynucleotide">polynucleotides</a>&nbsp;in the form of&nbsp;<a href="https://en.wikipedia.org/wiki/DNA" title="DNA">DNA</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/RNA" title="RNA">RNA</a>.</p>

<p>Using nanopore sequencing, a single molecule of DNA or RNA can be sequenced without the need for&nbsp;<a href="https://en.wikipedia.org/wiki/PCR" title="PCR">PCR</a>&nbsp;amplification or chemical labeling of the sample, whereas every previously developed sequencing approach requires at least one of these steps. Nanopore sequencing has the potential to offer relatively low-cost&nbsp;<a href="https://en.wikipedia.org/wiki/Genotyping" title="Genotyping">genotyping</a>, high portability for testing, and rapid processing of samples, with results displayed in real time. Publications on the method describe its use in rapid identification of viral pathogens,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-2">[2]</a></sup>&nbsp;monitoring of Ebola,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-3">[3]</a></sup>&nbsp;environmental monitoring,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-4">[4]</a></sup>&nbsp;food safety monitoring, human genome sequencing,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-5">[5]</a></sup>&nbsp;plant genome sequencing,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-6">[6]</a></sup>&nbsp;monitoring of&nbsp;<a href="https://en.wikipedia.org/wiki/Antimicrobial_resistance" title="Antimicrobial resistance">antibiotic resistance</a>,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-7">[7]</a></sup>&nbsp;haplotyping<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-8">[8]</a></sup>&nbsp;and other applications.</p>

<h2>Principles for detection and base identification</h2>

<p>Nanopore sequencing uses&nbsp;<a href="https://en.wikipedia.org/wiki/Electrophoresis" title="Electrophoresis">electrophoresis</a>&nbsp;to transport an unknown sample through an&nbsp;<a href="https://en.wikipedia.org/wiki/Nanopore" title="Nanopore">orifice of 10<sup>&minus;9</sup>&nbsp;meters in diameter</a>. A nanopore system always contains an&nbsp;<a href="https://en.wikipedia.org/wiki/Electrolyte" title="Electrolyte">electrolytic</a>&nbsp;solution; when a constant&nbsp;<a href="https://en.wikipedia.org/wiki/Electric_field" title="Electric field">electric field</a>&nbsp;is applied, an&nbsp;<a href="https://en.wikipedia.org/wiki/Electric_current" title="Electric current">electric current</a>&nbsp;can be observed in the system. The magnitude of the electric&nbsp;<a href="https://en.wikipedia.org/wiki/Current_density" title="Current density">current density</a>&nbsp;across a nanopore surface depends on the nanopore&#39;s dimensions and on the composition of the DNA or RNA occupying the nanopore. Sequencing is possible because samples close enough to a nanopore cause characteristic changes in the current density across its surface. The total charge flowing through a nanopore channel between times t<sub>1</sub>&nbsp;and t<sub>2</sub>&nbsp;equals the time integral, over that interval, of the current density flux through the nanopore&#39;s cross-sectional surface, that is, the surface integral of the current density projected onto the surface&#39;s unit normal.</p>
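<p>Written as an equation, the charge relation described above takes the form below. The symbols are labels chosen here, not taken from the cited sources: S is the nanopore cross-section, <strong>J</strong> the current density, <strong>n&#770;</strong> the unit normal to S, and I(t) the ionic current through the pore.</p>

```latex
Q = \int_{t_1}^{t_2} \iint_{S} \mathbf{J}(\mathbf{r}, t) \cdot \hat{\mathbf{n}} \, \mathrm{d}A \, \mathrm{d}t
  = \int_{t_1}^{t_2} I(t) \, \mathrm{d}t,
\qquad
I(t) = \iint_{S} \mathbf{J}(\mathbf{r}, t) \cdot \hat{\mathbf{n}} \, \mathrm{d}A
```

<p>A molecule occupying the pore typically lowers I(t) while it is present, so the reduction in Q relative to an open pore over the same interval reflects the blockade that the measurement relies on.</p>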

<h2>Challenges</h2>

<p>One challenge for the &#39;strand sequencing&#39; method was improving its resolution enough to detect single bases. In early papers, a nucleotide needed to be repeated about 100 times in succession to produce a measurable characteristic change. This low resolution arises because the DNA strand moves through the nanopore rapidly, at 1 to 5&nbsp;&mu;s per base, which makes recording difficult and prone to background noise and prevents single-nucleotide resolution. The problem is being tackled either by improving the recording technology or by controlling the speed of the DNA strand through various protein engineering strategies. Oxford Nanopore employs a &#39;kmer approach&#39;, analyzing more than one base at any one time, so that stretches of DNA are interrogated repeatedly as the strand moves through the nanopore one base at a time.<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-28">[28]</a></sup>&nbsp;Various techniques, including algorithmic ones, have been used to improve the performance of the MinION technology since it was first made available to users.<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-29">[29]</a></sup>&nbsp;More recently, single-base effects arising from secondary structure or from released mononucleotides have been shown.<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-30">[30]</a></sup><sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-31">[31]</a></sup></p>
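<p>The k-mer idea can be illustrated by listing which bases sit in the pore at each step. This sketch assumes that each current level depends on exactly k consecutive bases and that the strand advances one base per step; k = 5 and the sequence are hypothetical choices. Away from the ends of the strand, each base contributes to k successive current levels, which is the repeated interrogation described above.</p>

```python
# k-mer view of a nanopore signal. Assumptions: each current level depends on
# exactly k consecutive bases in the pore, and the strand advances one base per
# step. k = 5 and the sequence are hypothetical.


def kmer_windows(seq: str, k: int) -> list[str]:
    """Successive k-mers occupying the pore as the strand moves one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def times_interrogated(seq: str, k: int) -> list[int]:
    """Number of windows (current levels) to which each base position contributes."""
    counts = [0] * len(seq)
    for start in range(len(seq) - k + 1):
        for pos in range(start, start + k):
            counts[pos] += 1
    return counts


if __name__ == "__main__":
    strand = "ACGTACGTAC"
    print(kmer_windows(strand, 5))
    # ['ACGTA', 'CGTAC', 'GTACG', 'TACGT', 'ACGTA', 'CGTAC']
    print(times_interrogated(strand, 5))  # [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
```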

<p>Professor Hagan Bayley proposed in 2010 that creating two recognition sites within an alpha-hemolysin pore may confer advantages in base recognition.<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-pmid=20014084-32">[32]</a></sup></p>

<p>One challenge for the &#39;exonuclease approach&#39;,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-33">[33]</a></sup>&nbsp;in which a processive enzyme feeds individual bases into the nanopore in the correct order, is integrating the exonuclease with the nanopore detection system. In particular,<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-Rusk_244%E2%80%93245-34">[34]</a></sup>&nbsp;when an exonuclease hydrolyzes the phosphodiester bonds between nucleotides in DNA, the released nucleotide is not guaranteed to move directly into a nearby&nbsp;<a href="https://en.wikipedia.org/wiki/Hemolysin" title="Hemolysin">alpha-hemolysin nanopore</a>. One idea is to attach the exonuclease to the nanopore, perhaps through&nbsp;<a href="https://en.wikipedia.org/wiki/Biotinylation" title="Biotinylation">biotinylation</a>&nbsp;to the&nbsp;<a href="https://en.wikipedia.org/wiki/Beta_barrel" title="Beta barrel">beta barrel</a>&nbsp;hemolysin.<sup><a href="https://en.wikipedia.org/wiki/Nanopore_sequencing#cite_note-Rusk_244%E2%80%93245-34">[34]</a></sup>&nbsp;The central pore of the protein may be lined with charged residues arranged so that positive and negative charges appear on opposite sides of the pore. However, this arrangement mainly aids discrimination between nucleotides and does not guide them along any particular path.</p>
