Open main menu

Biolecture.org β

Changes

Detailed explanation of DNA sequencing

64 bytes removed, 13:45, 28 February 2008
no edit summary
<p>The term <strong>DNA sequencing</strong> encompasses biochemical methods for determining the order of the nucleotide bases, adenine, guanine, cytosine, and thymine, in a DNA oligonucleotide. The sequence of DNA constitutes the heritable genetic information in nuclei, plasmids, mitochondria, and chloroplasts that forms the basis for the developmental programs of all living organisms. Determining the DNA sequence is therefore useful in basic research studying fundamental biological processes, as well as in applied fields such as diagnostic or forensic research. The advent of DNA sequencing has significantly accelerated biological research and discovery. The rapid speed of sequencing attainable with modern DNA sequencing technology has been instrumental in the large-scale sequencing of the human genome, in the Human Genome Project. Related projects, often by scientific collaboration across continents, have generated the complete DNA sequences of many animal, plant, and microbial genomes.</p>
<div class="thumb tright">
<div class="thumbinner" style="WIDTH: 502px"><img class="thumbimage" height="106" alt="DNA Sequence Trace" width="500" border="0" src="http://upload.wikimedia.org/wikipedia/en/thumb/8/89/Mutation_Surveyor_Trace.jpg/500px-Mutation_Surveyor_Trace.jpg" width="500" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
DNA Sequence Trace</div>
</div>
</div>
<script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]></script>
<p>&nbsp;</p>
<h2><span class="mw-headline">Early methods</span></h2>
<h2><span class="mw-headline">Chain-termination methods</span></h2>
<div class="thumb tright">
<div class="thumbinner" style="WIDTH: 162px"><img class="thumbimage" height="332" alt="Part of a radioactively labelled sequencing gel" width="160" border="0" src="http://upload.wikimedia.org/wikipedia/commons/c/cb/Sequencing.jpg" width="160" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
Part of a radioactively labelled sequencing gel</div>
</div>
<p>The newly synthesized and labeled DNA fragments are heat denatured, and separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel. Each of the four DNA synthesis reactions is run in one of four individual lanes (lanes A, T, G, C); the DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be directly read off the X-ray film or gel image. In the image on the right, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The terminal nucleotide base can be identified according to which dideoxynucleotide was added in the reaction giving that band. The relative positions of the different bands among the four lanes are then used to read (from bottom to top) the DNA sequence as indicated.</p>
<div class="thumb tleft">
<div class="thumbinner" style="WIDTH: 182px"><img class="thumbimage" height="240" alt="DNA fragments can be labeled by using a radioactive or fluorescent tag on the primer (1), in the new DNA strand with a labeled dNTP, or with a labeled ddNTP. (click to expand)" width="180" border="0" src="http://upload.wikimedia.org/wikipedia/en/thumb/d/df/DNA_Sequencin_3_labeling_methods.jpg/180px-DNA_Sequencin_3_labeling_methods.jpg" width="180" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
DNA fragments can be labeled by using a radioactive or fluorescent tag on the primer (1), in the new DNA strand with a labeled dNTP, or with a labeled ddNTP. (click to expand)</div>
</div>
<p>There are some technical variations of chain-termination sequencing. In one method, the DNA fragments are tagged with nucleotides containing radioactive phosphorus for radiolabelling. Alternatively, a primer labeled at the 5&rsquo; end with a fluorescent dye is used for the tagging. Four separate reactions are still required, but DNA fragments with dye labels can be read using an optical system, facilitating faster and more economical analysis and automation. This approach is known as 'dye-primer sequencing'. The later development by L Hood and coworkers<sup class="reference" id="_ref-9">[10]</sup><sup class="reference" id="_ref-10">[11]</sup> of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.</p>
<div class="thumb tright">
<div class="thumbinner" style="WIDTH: 182px"><img class="thumbimage" height="256" alt="Sequence ladder by radioactive sequencing compared to fluorescent peaks (click to expand)" width="180" border="0" src="http://upload.wikimedia.org/wikipedia/en/thumb/3/3d/Radioactive_Fluorescent_Seq.jpg/180px-Radioactive_Fluorescent_Seq.jpg" width="180" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
Sequence ladder by radioactive sequencing compared to fluorescent peaks (click to expand)</div>
</div>
<h3><span class="mw-headline">Dye-terminator sequencing</span></h3>
<div class="thumb tleft">
<div class="thumbinner" style="WIDTH: 182px"><img class="thumbimage" height="109" alt="Capillary electrophoresis (click to expand)" width="180" border="0" src="http://upload.wikimedia.org/wikipedia/en/thumb/f/fe/CE_Basic.jpg/180px-CE_Basic.jpg" width="180" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
Capillary electrophoresis (click to expand)</div>
</div>
<h3><span class="mw-headline">Automation and sample preparation</span></h3>
<div class="thumb tright">
<div class="thumbinner" style="WIDTH: 182px"><img class="thumbimage" height="42" alt="View of the start of an example dye-terminator read (click to expand)" width="180" border="0" src="http://upload.wikimedia.org/wikipedia/commons/thumb/4/44/Sanger_sequencing_read_display.gif/180px-Sanger_sequencing_read_display.gif" width="180" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
View of the start of an example dye-terminator read (click to expand)</div>
</div>
<p>Current methods can directly sequence only relatively short (300-1000 nucleotides long) DNA fragments in a single reaction. [2]. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide. Limitations on ddNTP incorporation were largely solved by Tabor at Harvard Medical, Carl Fuller at USB biochemicals, and their coworkers<sup class="reference" id="_ref-Reeve.2CFuller_0">[12]</sup>.</p>
<div class="thumb tleft">
<div class="thumbinner" style="WIDTH: 182px"><img class="thumbimage" height="240" alt="Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping regions.(click to expand)" width="180" border="0" src="http://upload.wikimedia.org/wikipedia/en/thumb/6/60/DNA_Sequencing_gDNA_libraries.jpg/180px-DNA_Sequencing_gDNA_libraries.jpg" width="180" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping regions.(click to expand)</div>
</div>
<p>Large-scale sequencing aims at sequencing very long DNA fragments. Even relatively small bacterial genomes contain millions of nucleotides, and the human chromosome 1 alone contains about 246 million bases. Therefore, some approaches consist of cutting (with restriction enzymes) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA is cloned into a DNA vector, usually a bacterial plasmid, and amplified in <em>Escherichia coli</em>. The amplified DNA can then be purified from the bacterial cells (a disadvantage of bacterial clones for sequencing is that some DNA sequences may be inherently <em>un-clonable</em> in some or all available bacterial strains, due to deleterious effect of the cloned sequence on the host bacterium or other effects). These short DNA fragments purified from individual bacterial colonies are then individually and completely sequenced and assembled electronically into one long, contiguous sequence by identifying 100%-identical overlapping sequences between them (shotgun sequencing). This method does not require any pre-existing information about the sequence of the DNA and is often referred to as <em>de novo</em> sequencing. Gaps in the assembled sequence may be filled by Primer walking, often with sub-cloning steps (or transposon-based sequencing depending on the size of the remaining region to be sequenced). These strategies all involve taking many small <em>reads</em> of the DNA by one of the above methods and subsequently assembling them into a contiguous sequence. The different strategies have different tradeoffs in speed and accuracy; the shotgun method is the most practical for sequencing large genomes, but its assembly process is complex and potentially error-prone - particularly in the presence of sequence repeats. Because of this, the assembly of the human genome is not literally complete &mdash; the repetitive sequences of the centromeres, telomeres, and some other parts of chromosomes result in gaps in the genome assembly. Despite having only 93% of the full genome assembled, the Human Genome Project was declared complete because their definition of human genome sequencing was limited to euchromatic sequence (99% complete at the time), excluding these intractable repetitive regions.<sup class="reference" id="_ref-11">[13]</sup></p>
<div class="thumb tright">
<div class="thumbinner" style="WIDTH: 182px"><img class="thumbimage" height="247" alt="Resequencing steps. Sample prep: Extraction of nucleic acid. Template prep: Amplification and preparation of a small region of the target region. Sequencing steps. (click to expand)" width="180" border="0" src="http://upload.wikimedia.org/wikipedia/en/thumb/a/a1/Sequencing_workflow.jpg/180px-Sequencing_workflow.jpg" width="180" border="0" />
<div class="thumbcaption">
<div class="magnify"><img height="11" alt="" width="15" src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" width="15" /></div>
Resequencing steps. Sample prep: Extraction of nucleic acid. Template prep: Amplification and preparation of a small region of the target region. Sequencing steps. (click to expand)</div>
</div>
<h2><span class="mw-headline">Major landmarks in DNA sequencing</span></h2>
<ul>
<li>1953 Discovery of the structure of the DNA double helix.</li>
</ul>
<ul>
<li>1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.</li>
</ul>
<ul>
<li>1975 The first complete DNA genome to be sequenced is that of bacteriophage &phi;X174</li>
</ul>
<ul>
<li>1977 Allan Maxam and Walter Gilbert publish &quot;DNA sequencing by chemical degradation&quot; [3]. <font color="#810081">Fred Sanger</font>, independently, publishes &quot;DNA sequencing by enzymatic synthesis&quot;.</li>
</ul>
<ul>
<li>1980 Fred Sanger and Wally Gilbert receive the Nobel Prize in Chemistry</li>
</ul>
<ul>
<li>1982 Genbank starts as a public repository of DNA sequences.
<ul>
<li>Andre Marion and Sam Eletr from Hewlett Packard start Applied Biosystems in May, which comes to dominate automated sequencing.</li> <li>Akiyoshi Wada proposes automated sequencing and gets support to build robots with help from Hitachi.</li>
</ul>
</li>
</ul>
<ul>
<li>1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.</li>
</ul>
<ul>
<li>1985 Kary Mullis and colleagues develop the polymerase chain reaction, a technique to replicate small fragments of DNA</li>
</ul>
<ul>
<li>1986 Leroy E. Hood's laboratory at the California Institute of Technology and Smith announce the first semi-automated DNA sequencing machine.</li>
</ul>
<ul>
<li>1987 Applied Biosystems markets first automated sequencing machine, the model ABI 370.
<ul>
<li>Walter Gilbert leaves the U.S. National Research Council genome panel to start Genome Corp., with the goal of sequencing and commercializing the data.</li>
</ul>
</li>
<li>1990 The U.S. National Institutes of Health (NIS) begins large-scale sequencing trials on <em>Mycoplasma capricolum</em>, <em>Escherichia coli</em>, <em>Caenorhabditis elegans</em>, and <em>Saccharomyces cerevisiae</em> (at 75 cents (US)/base).
<ul>
<li>Lipman, Myers publish the BLAST algorithm for aligning sequences.</li> <li>Barry Karger (January<sup class="reference" id="_ref-25">[30]</sup>), Lloyd Smith (August<sup class="reference" id="_ref-26">[31]</sup>), and Norman Dovichi (September<sup class="reference" id="_ref-27">[32]</sup>) publish on capillary electrophoresis.</li>
</ul>
</li>
<li>1991 Craig Venter develops strategy to find expressed genes with ESTs (Expressed Sequence Tags).
<ul>
<li>Uberbacher develops GRAIL, a gene-prediction program.</li>
</ul>
</li>
<li>1992 Craig Venter leaves NIH to set up The Institute for Genomic Research (TIGR).
<ul>
<li>William Haseltine heads Human Genome Sciences, to commercialize TIGR products.</li> <li>Wellcome Trust begins participation in the Human Genome Project.</li> <li>Simon et al. develop BACs (Bacterial Artificial Chromosomes) for cloning.</li>
<li>First chromosome physical maps published:
<ul>
<li>Page et al. - Y chromosome<sup class="reference" id="_ref-28">[33]</sup>;</li> <li>Cohen et al. chromosome 21<sup class="reference" id="_ref-29">[34]</sup>.</li> <li>Lander - complete mouse genetic map<sup class="reference" id="_ref-30">[35]</sup>;</li> <li>Weissenbach - complete human genetic map<sup class="reference" id="_ref-31">[36]</sup>.</li>
</ul>
</li>
<li>1993 Wellcome Trust and MRC open Sanger Centre, near Cambridge, UK.
<ul>
<li>The GenBank database migrates from Los Alamos (DOE) to NCBI (NIH).</li>
</ul>
</li>
<li>1995 Venter, Fraser and Smith publish first sequence of free-living organism, <em>Haemophilus influenzae</em> (genome size of 1.8 Mb).
<ul>
<li>Richard Mathies et al. publish on sequencing dyes (PNAS, May)<sup class="reference" id="_ref-32">[37]</sup>.</li> <li>Michael Reeve and Carl Fuller, thermostable polymerase for sequencing<sup class="reference" id="_ref-Reeve.2CFuller_1">[12]</sup>.</li>
</ul>
</li>
<li>1996 International HGP partners agree to release sequence data into public databases within 24 hours.
<ul>
<li>International consortium releases genome sequence of yeast <em>S. cerevisiae</em> (genome size of 12.1 Mb).</li> <li>Yoshihide Hayashizaki's at RIKEN completes the first set of full-length mouse cDNAs.</li> <li>ABI introduces a capillary electrophoresis system, the ABI310 sequence analyzer.</li>
</ul>
</li>
</ul>
<ul>
<li>1997 Blattner, Plunkett et al. publish the sequence of E. coli (genome size of 5 Mb)<sup class="reference" id="_ref-33">[38]</sup></li>
</ul>
<ul>
<li>1998 Phil Green and Brent Ewing of Washington University publish <code>&ldquo;phred&rdquo;</code> for interpreting sequencer data (in use since &lsquo;95)<sup class="reference" id="_ref-34">[39]</sup>.
<ul>
<li>Venter starts new company &ldquo;Celera&rdquo;; &ldquo;will sequence HG in 3 yrs for $300m.&rdquo;</li> <li>Applied Biosystems introduces the 3700 capillary sequencing machine.</li> <li>Wellcome Trust doubles support for the HGP to $330 million for 1/3 of the sequencing.</li> <li>NIH &amp; DOE goal: &quot;working draft&quot; of the human genome by 2001.</li> <li>Sulston, Waterston et al finish sequence of <em>C. elegans</em> (genome size of 97Mb)<sup class="reference" id="_ref-35">[40]</sup>.</li>
</ul>
</li>
<li>1999 NIH moves up completion date for rough draft, to spring 2000.
<ul>
<li>NIH launches the mouse genome sequencing project.</li> <li>First sequence of human chromosome 22 published<sup class="reference" id="_ref-36">[41]</sup>.</li>
</ul>
</li>
<li>2000 Celera and collaborators sequence fruit fly <em>Drosophila melanogaster</em> (genome size of 180Mb) - validation of Venter's shotgun method. HGP and Celera debate issues related to data release.
<ul>
<li>HGP consortium publishes sequence of chromosome 21.<sup class="reference" id="_ref-37">[42]</sup></li> <li>HGP &amp; Celera jointly announce working drafts of HG sequence, promise joint publication.</li> <li>Estimates for the number of genes in the human genome range from 35,000 to 120,000. International consortium completes first plant sequence, <em>Arabidopsis thaliana</em> (genome size of 125 Mb).</li>
</ul>
</li>
<li>2001 HGP consortium publishes Human Genome Sequence draft in Nature (15 Feb)<sup class="reference" id="_ref-38">[43]</sup>.
<ul>
<li>Celera publishes the Human Genome sequence<sup class="reference" id="_ref-39">[44]</sup>.</li>
</ul>
</li>
</ul>
<ul>
<li>2005 420,000 VariantSEQr human resequencing primer sequences published on new NCBI Probe database.</li>
</ul>
<ul>
<li>2007 For the first time, a set of closely related species (12 Drosophilidae) are sequenced, launching the era of phylogenomics.
<ul>
<li>Craig Venter publishes his full diploid genome: the first human genome to be sequenced completely.</li>
</ul>
</li>
<h2><span class="mw-headline">See also</span></h2>
<ul>
<li>Sequencing</li> <li>Genome project - how entire genomes are assembled from these short sequences.</li> <li>Applied Biosystems - provided most of the chemistry and equipment for the genome projects. Next-generation technology for very high data generation rates.</li> <li>454 Life Sciences - company specializing in high-throughput DNA sequencing using a sequencing-by-synthesis approach.</li> <li>Illumina (company) - Advancing genetic analysis one billion bases at a time; whole genome sequencing.</li> <li>Joint Genome Institute - sequencing center from the US Department of Energy whose mission is to provide integrated high-throughput sequencing and computational analysis to enable genomic-scale/systems-based scientific approaches to DOE-relevant challenges in energy and the environment.</li> <li>DNA field-effect transistor</li> <li>For a description of the basic technology for tagging DNA with high Z atoms for direct imaging using transmission electron microsopy [TEM] and sequencing strands of &gt;10,000 bp per image captured, see High Z tagging technology.</li>
</ul>
<p>&nbsp;</p>
<div class="references-small" style="-moz-column-count: 2">
<ol class="references">
<li id="_note-0"><strong><a title="" href="http://en.wikipedia.org/wiki/DNA_sequencing#_ref-0">^< <li id="_note-0"><strong><a title="" href="http://en.wikipedia.org/wiki/DNA_sequencing#_ref-0">^<
</ol>
</div>
<h2><span class="mw-headline">External links</span></h2>
<ul>
<li><a class="external text" title="http://www.millegen.com" rel="nofollow" href="http://www.millegen.com/" rel="nofollow">Millegen DNA Sequencing platform</a></li> <li><a class="external text" title="http://www.jgi.doe.gov/education/how/how30minflash.html" rel="nofollow" href="http://www.jgi.doe.gov/education/how/how30minflash.html" rel="nofollow">DNA Sequencing: Dye Terminator Animation</a></li> <li><a class="external text" title="http://www.genomics.xprize.org" rel="nofollow" href="http://www.genomics.xprize.org/" rel="nofollow">Archon Genomics X PRIZE</a> - $10 million competition for fast and inexpensive sequencing technology</li>
</ul>