Difference between revisions of "Personal genomics, bioinformatics, and variomics"

From Biolecture.org
imported>Sskimb
imported>Kangho11
 
(15 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
<font size="4">&nbsp;</font>
 
<font size="4">&nbsp;</font>
<div align="left"><span style="FONT-SIZE: 13.5pt">Personal genomics, bioinformatics, and variomics</span><span style="FONT-SIZE: 9pt">&nbsp;<br />
+
<div align="left"><span style="font-size: 13.5pt;">Personal genomics, bioinformatics, and variomics</span><span style="font-size: 9pt;">&nbsp;<br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Jong Bhak, Ho Ghang, Rohit Reja,&nbsp;and Sangsoo Kim</span></strong><span style="FONT-SIZE: 9pt"><br />
+
</span><strong><span style="font-size: 9pt;">Jong Bhak<sup>1</sup>, Ho Ghang<sup>1</sup>, Rohit Reja<sup>1</sup>,&nbsp;and Sangsoo Kim<sup>2</sup>*</span></strong><span style="font-size: 9pt;"><br />
 
<br />
 
<br />
KOBIC (Korean Bioinformation Center), KRIBB, Daejeon, Korea. Dept. of Bioinformatics, Soongsil Univ., Seoul, Korea.<br />
+
<strong><sup>1</sup></strong>KOBIC (Korean Bioinformation Center), KRIBB, Daejeon 305-806, Korea. <strong><sup>2</sup></strong>Dept. of Bioinformatics, Soongsil Univ., Seoul 156-743, Korea.<br />
 
<br />
 
<br />
Correspondence: </span><span style="FONT-SIZE: 9pt"><a href="mailto:jongbhak@yahoo.com"><font color="#0000ff">jongbhak@yahoo.com</font></a>, sskimb@ssu.ac.kr</span></div>
+
*Correspondence to: E-mail &nbsp;<a href="mailto:sskimb@ssu.ac.kr"><font color="#000080">sskimb@ssu.ac.kr</font></a> Tel +82-2-820-0457 Fax +82-2-824-4383<br />
<div align="left"><span style="FONT-SIZE: 9pt"><br />
 
</span><strong><span style="FONT-SIZE: 9pt">Abstract</span></strong><span style="FONT-SIZE: 9pt"><br />
 
There are at least five complete genome sequences available&nbsp;in 2008. It is known that there are over 15,000,000 genetic variants called SNPs in&nbsp;the dbSNP&nbsp;database. The cost of a full genome sequencing in 2009&nbsp;will be&nbsp;claimed to be less than $5000 USD.&nbsp;The genomics era has arrived in 2008. This review introduces&nbsp;technologies, bioinformatics,&nbsp;genomics visions, and variomics projects. Variomics is&nbsp;the&nbsp;study of the total genetic variation in an individual and&nbsp;populations.&nbsp;Research on&nbsp;genetic variation is the most&nbsp;valuable among many genomics research branches.&nbsp;Genomics and variomics projects will change biology and the society so dramatically that biology will become an everyday technology as personal computers and the internet. 'BioRevolution' is the term that can adequately describe this change.<br />
 
 
<br />
 
<br />
 +
Running title: Genomics revolution achieved by cheap sequencing for common people<br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Introduction</span></strong><span style="FONT-SIZE: 9pt"><br />
+
</span><span style="font-size: 9pt;"> </span><strong><span style="font-size: 9pt;"><br />
Since the launch of the Human Genome Project (HGP)&nbsp;in 1990 by NIH of USA, researchers have been developing faster DNA sequencers </span><span style="FONT-SIZE: 9pt">(Shendure, Mitra et al. 2004; Chan 2005; Metzker 2005; Gupta 2008; Mardis 2008)</span><span style="FONT-SIZE: 9pt">. HGP was said to be led by James Watson who modeled DNA in Cambridge, UK in 1953. In 2003, the International Human Genome Sequencing Consortium held a press conference to announce the completion of the human genome </span><span style="FONT-SIZE: 9pt">(IHGSC 2004)</span><span style="FONT-SIZE: 9pt">. In 2008, after 55 years, his complete genome sequence was publicized by using 454 DNA sequencers developed by a company </span><span style="FONT-SIZE: 9pt">(Wheeler, Srinivasan et al. 2008)</span><span style="FONT-SIZE: 9pt">.&nbsp;In 2007, Craig Venter of former Celera founder published his own personal genome in PLoS Biology </span><span style="FONT-SIZE: 9pt">(Levy, Sutton et al. 2007)</span><span style="FONT-SIZE: 9pt">.&nbsp;We are entering the personalized biology era with the advent of next generation sequencing technologies.<br />
+
</span></strong><span style="font-size: 9pt;"><strong>Abstract</strong><br />
 +
As of 2008, at least five complete genome sequences are available. The dbSNP database lists over 15,000,000 genetic variants called SNPs. The cost of full genome sequencing in 2009 is claimed to be less than $5,000 USD. The genomics era arrived in 2008. This review introduces the technologies, bioinformatics, genomics visions, and variomics projects behind it. Variomics is the study of the total genetic variation in individuals and populations. Research on genetic variation is the most valuable of the many branches of genomics research. Genomics and variomics projects will change biology and society so dramatically that biology will become an everyday technology, like personal computers and the internet. 'BioRevolution' is a term that adequately describes this change.<br />
 +
&nbsp;<br />
 +
<strong>Introduction</strong><br />
 +
Since the launch of the Human Genome Project (HGP) in 1990 by the NIH in the USA, researchers have been developing faster DNA sequencers (Chan, 2005; Gupta, 2008; Mardis, 2008; Metzker, 2005; Shendure et al., 2004). The HGP is said to have been led by James Watson, who modeled DNA in Cambridge, UK in 1953. In 2003, the International Human Genome Sequencing Consortium held a press conference to announce the completion of the human genome (IHGSC, 2004). In 2008, 55 years after the DNA model, Watson's complete genome sequence was published; it was produced with 454 DNA sequencers developed by a company rather than a research institute (Wheeler et al., 2008). In 2007, Craig Venter, the founder of Celera, published his own personal genome in PLoS Biology (Levy et al., 2007). We are entering the era of personalized biology with the advent of next generation sequencing technologies.<br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">DNA sequencing</span></strong><span style="FONT-SIZE: 9pt"><br />
+
<strong>DNA sequencing</strong><br />
The first breakthrough in genome sequencing came from Watson's&nbsp;colleague in Cambridge, Fred Sanger. In 1977, Sanger and his team produced the first useful DNA sequencing method and publicized the first complete genome </span><span style="FONT-SIZE: 9pt">(Sanger, Air et al. 1977)</span><span style="FONT-SIZE: 9pt">. It was a tiny virus genome known as phi X 174. Soon after phi X 174, he published the first complete organelle genome which was mitochondrion </span><span style="FONT-SIZE: 9pt">(Anderson, Bankier et al. 1981)</span><span style="FONT-SIZE: 9pt">. By 1998, researchers in the US evaluated multiplex genome sequencing technologies and were aware that one person's whole genome could be sequenced in one day using contemporary technologies. George Church was the Ph.D. student of Walter Gilbert who received a Nobel Prize with Sanger for developing a sequencing method. Gilbert's method was not used much. However, his colleague Church kept developing sequencing methods. One of them is based on Polony idea </span><span style="FONT-SIZE: 9pt">(Porreca, Shendure et al. 2006)</span><span style="FONT-SIZE: 9pt">. This technology is used by KNOME Inc. that is a full genome sequencing company. Genome sequencing technology&nbsp;is moving forward to the level as computer CPUs are universally used. DNA sequencing is one of the most important industrial technologies in biology due to its perpetual use and new applications in the future.&nbsp;<br />
+
The first breakthrough in genome sequencing came from Watson's colleague, Fred Sanger, in Cambridge, UK. In 1977, Sanger and his team produced the first practical DNA sequencing method and published the first complete genome (Sanger et al., 1977). It was a tiny virus genome known as phi X 174. Soon afterwards, his group published the first complete organelle genome, that of a mitochondrion (Anderson et al., 1981). By 1998, researchers in the US had evaluated multiplex genome sequencing technologies and were aware that one person's whole genome could be sequenced in a day using contemporary technologies. George Church was a Ph.D. student of Walter Gilbert, who received a Nobel Prize with Sanger for developing a sequencing method. Gilbert's method was not widely used. However, his student Church continued to develop sequencing methods. One of them is based on the Polony idea (Porreca et al., 2006). This technology is used by KNOME Inc., a full genome sequencing company. Along with KNOME, other companies, such as Complete Genomics, are now producing DNA sequences cheaply and at unprecedented capacity. Sequencing speed is increasing manyfold each year, much faster than the improvement cycle of semiconductor chips in the computer industry. Genome sequencing is also becoming an everyday technology, as universally used as computer CPUs. Experts predict that within five years everyone in developed nations will be able to have his or her own genome information. Because of its far-reaching consequences in medicine, health, biology, nanotechnology, and information technology, DNA sequencing will become the most important industrial technology of the coming decades. &nbsp;<br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Personal Genomics</span></strong><span style="FONT-SIZE: 9pt"><br />
+
<strong>Personal Genomics</strong><br />
In 2009, genome sequencing technologies will achieve one person's whole genome per day in terms of DNA fragments sequenced. Personal genomics is a new term that utilizes such fast sequencers. In 2008, the cost for one personal genome is less than $300,000 USD. If the cost goes down below $1,000 USD, the impact of personal genomics is predicted to be the largest ever in biology&nbsp;on common people's life.&nbsp;PGP (Personal Genome Project) is a project to sequence as many people as possible with low costs </span><span style="FONT-SIZE: 9pt">(Church 2005)</span><span style="FONT-SIZE: 9pt">. Google Inc. and Church group are working together to sequence 100,000 people's genetic regions of DNA. In Saudi Arabia, the government is planning to sequence 100 Arabic people. In Europe, there are various groups of people and nations who have been genotyping the populations. Especially, Iceland has been successful in that effort by utilizing their well-kept genealogical data encompassing 100,000s people. In Asia, Jeongsun Seo of Seoul National University has been working on East Asia Genome Project in the past years. His group collected thousands of samples from Mongolian tribes with a gigantic genealogical tree among them
+
In 2009, genome sequencing technologies will achieve one person's whole genome per day in terms of DNA fragments sequenced. Personal genomics is a new term for the field that utilizes such fast sequencers. In 2008, the cost of one personal genome is less than $350,000 USD. If the cost falls below $1,000 USD, the impact of personal genomics on common people's lives is predicted to be the largest ever in biology. Reflecting this technological advancement is the PGP (Personal Genome Project), a project to sequence as many people as possible at the lowest possible cost (Church, 2005). At present, Google, Inc. and the Church group are working together to sequence genetic regions of DNA from 100,000 people. In Saudi Arabia, the government is planning to sequence the genomes of 100 Arabic people. In Europe, various groups and nations have been genotyping their populations. Iceland has been especially successful in that effort by utilizing its well-kept genealogical data encompassing hundreds of thousands of people. In Asia, Jeongsun Seo of Seoul National University has been working on the East Asia Genome Project during the past several years. His group has collected thousands of samples from Mongolian tribes, with an extremely large genealogical tree among them (Park et al., 2008; Sung et al., 2008). Seo is said to be sequencing at least 100 Korean genomes in collaboration with Church and Green Cross, Inc. of Korea. The aim of Seo's genome project is to produce a resource for East Asians. He is presently sequencing at least two Korean people. In China, Beijing Genome Institute has been successful in terms of sequencing. Their first achievement came from a plant genome, rice. After rice, they launched a 100 Han Chinese genome sequencing project. In Nov. 2008, they published their first Chinese genome in the journal Nature.
In Dec. 2008, another Korean group, Lee Gilyeo Cancer and Diabetes Institute (LCDI) and Korean Bioinformation Center (KOBIC), made a Korean genome sequence public. The genome was sequenced with a Solexa paired-end sequencer, and the comparative genomics analyses and SNP data were uploaded as a public resource. Analyzing the 7.8x Korean genome took only one week on 150 computer CPUs: mapping DNA fragments to a reference genome, generating new SNP information, comparing it with other individual genomes, and mapping it to 1,600 already known phenotype records from the public literature.<br />
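The "7.8x" figure above is a sequencing depth. As a minimal sketch of how such a depth is commonly estimated, assuming the simple relation depth = (number of reads × read length) / genome size, the following Python function may help. The function name and the example inputs are hypothetical illustrations and are not taken from the LCDI/KOBIC analysis.<br />

```python
def mean_coverage(read_count: int, read_length: int, genome_size: int) -> float:
    """Estimate mean sequencing depth as total sequenced bases / genome size.

    This is the simple textbook relation; it ignores unmapped reads,
    duplicates, and uneven coverage (an assumption of this sketch).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    if read_count < 0 or read_length < 0:
        raise ValueError("read_count and read_length must be non-negative")
    return read_count * read_length / genome_size


if __name__ == "__main__":
    # Hypothetical toy numbers: 100 reads of 10 bases over a 500-base genome.
    print(mean_coverage(read_count=100, read_length=10, genome_size=500))
```

A depth estimated this way is only an average; individual regions of a genome can be covered far more or far less deeply.<br />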
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">(</font><a href="http://www.macrogen.co.kr/eng/macrogen/state.jsp"><font color="#0000ff" size="2">http://www.macrogen.co.kr/eng/macrogen/state.jsp</font></a><font size="2">)</font></div>
+
&nbsp;<br />
</span><span style="FONT-SIZE: 9pt">(Park et al. 2008; Sung et al. 2008)</span><span style="FONT-SIZE: 9pt">. Seo is planning on sequencing at least 100 Korean genomes in collaboration with Church and Green Cross Inc. of Korea. The aim of Seo's genome project is to produce a resource for the East Asians as well as Koreans. He is presently sequencing at least two Korean people. In China, Beijing Genome Institute has been successful in terms of sequencing. Their first achievement came from a plant genome, rice. After rice, they launched a 99 Han Chinese genome sequencing project. In Nov. 2008, they published their first Chinese genome in a magazine, Nature. In Dec. 2008, another Korean group Lee Gilyeo Cancer and Diabetes Institute and Korean Bioinformation Center (KOBIC) made a Korean genome sequence public. The genome was sequenced by Solexa paired-end sequencer and comparative genomics analyses and SNP data were uploaded as a public resource for everyone.&nbsp;<br />
+
<strong>Genome Revolution </strong><br />
 +
These public genome data, alongside the previously published genomes of Craig Venter and James Watson, mark the point at which full genome sequences are no longer solely in the academic domain. Anyone who has the money and the will can sequence human genomes. This 'genomic revolution' will eventually lead to the 'BioRevolution', in which the most essential human information is completely mapped and publicly available. This is revolutionary because humans can now engineer themselves using a map or blueprint, without relying directly on the trial and error of conventional evolution. Evolution has thus moved to a conscious level: we are in effect designing evolution using computers. &nbsp;<br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Genome revolution&nbsp;</span></strong><strong><span style="FONT-SIZE: 9pt"><br />
+
<strong>Genomes and Personalized Medicine</strong><br />
</span></strong><span style="FONT-SIZE: 9pt">These public genome data alongside previously known Craig Venter's and James Watson's mark that full genome sequences are not in academic domain anymore. Anyone who has money and will can sequence human genomes. This 'genomic revolution' will eventually lead to the 'BioRevolution' in terms of making the most essential human information completely mapped and publically available. These are revolutionary because humans can now engineer themselves with a map or a blue print not directly relying on trial and error style conventional evolutionary methods. This indicates that evolution went into a conscious level of driving evolution. It is almost designing the evolution using computers.&nbsp;<br />
+
The 'BioRevolution', in which scientists use genomic information to engineer all kinds of biological processes, including evolution itself, will bring us personalized medicine. The essence of personalized medicine is that enzymes in our tissues, such as cytochrome P450, differ distinctly among individuals and populations, so certain drugs produce different responses in different individuals. <br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Genomes and personalized medicine</span></strong><span style="FONT-SIZE: 9pt"><br />
+
<strong>Cytochrome p450 family example</strong><br />
The consequences of 'BioRevolution' where genomic information is utilized by scientists to engineers all kinds of biological processes including evolution itself will bring us the personalized medicine. The essence of personalized medicine is that enzymes in our tissues such as cytochrome P450 have distinct differences among individuals and populations. Certain drugs produce different responses in individuals.&nbsp;<br />
+
The cytochrome P450 (CYP) family of liver enzymes is responsible for breaking down more than 30 different classes of drugs during Phase I of drug metabolism. Structural and SNP variations of the genes that code for these enzymes can influence their ability to metabolize certain drugs. Based upon this, a population can be categorized into four major types of drug metabolizers: <br />
 +
&bull;&nbsp;Extensive metabolizers: Individuals who can be given the normal drug dosage. <br />
 +
&bull;&nbsp;Intermediate metabolizers: Individuals who metabolize drugs at a slower than normal rate. <br />
 +
&bull;&nbsp;Poor metabolizers: Individuals with poor metabolizing rates. Drugs may accumulate and cause serious adverse effects. <br />
 +
&bull;&nbsp;Ultra metabolizers: Individuals with metabolizing rates even faster than those of extensive metabolizers. They may experience no effect of drug activity. <br />
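The four categories above can be sketched as a small lookup in Python. This is an illustration only: the relative metabolic rate (1.0 meaning a normal rate) and the numeric cut-offs are hypothetical assumptions made for this sketch. They do not come from this review or from any clinical guideline and must not be used for dosing.<br />

```python
from enum import Enum


class Metabolizer(Enum):
    POOR = "poor"
    INTERMEDIATE = "intermediate"
    EXTENSIVE = "extensive"
    ULTRA = "ultra"


# Hypothetical cut-offs on a relative metabolic rate (1.0 = normal rate).
# They exist only to make the four categories concrete.
POOR_BELOW = 0.25
INTERMEDIATE_BELOW = 0.75
EXTENSIVE_UP_TO = 1.5


def classify(relative_rate: float) -> Metabolizer:
    """Map a relative metabolic rate to one of the four metabolizer types."""
    if relative_rate < 0:
        raise ValueError("relative_rate must be non-negative")
    if relative_rate < POOR_BELOW:
        return Metabolizer.POOR
    if relative_rate < INTERMEDIATE_BELOW:
        return Metabolizer.INTERMEDIATE
    if relative_rate <= EXTENSIVE_UP_TO:
        return Metabolizer.EXTENSIVE
    return Metabolizer.ULTRA


if __name__ == "__main__":
    for rate in (0.1, 0.5, 1.0, 3.0):
        print(rate, classify(rate).value)
```

In practice the category is inferred from genotype rather than from a single measured rate; the single number here is a simplification.<br />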
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Cytochrome p450 family example</span></strong><span style="FONT-SIZE: 9pt"><br />
+
In early 2005, the US FDA cleared the AmpliChip&reg; CYP450 Test, which measures variations in two genes of the CYP450 enzyme system: CYP2D6 and CYP2C19. The Roche AmpliChip CYP450 Test is intended to identify a patient's CYP2D6 and CYP2C19 genotype from genomic DNA extracted from a whole blood sample. Information about CYP2D6 and CYP2C19 genotype may be used as an aid to clinicians in determining therapeutic strategy and treatment dose for therapeutics that are metabolized by the CYP2D6 or CYP2C19 gene product.<br />
The cytochrome P450 (CYP) family of liver enzymes&nbsp;are responsible for breaking down more than 30 different classes of drugs during phase I of drug metabolism. Structural and SNP variations of the&nbsp;genes that code for these enzymes can influence their ability to metabolize certain drugs. Based upon this, a population can be categorized into four major types of drug metabolizers: </span></div>
+
&nbsp;<br />
<ul type="disc">
+
<strong>Variomics</strong><br />
     <li style="TEXT-ALIGN: left"><span style="FONT-SIZE: 9pt">Extensive metabolizers: The&nbsp;individuals that can be administered with normal drug dosage </span></li>
+
The most important scientific data to come out of personal genomes are the precise sequence differences among individuals. Such differences are of many types. There are structural differences among chromosomes. There can be insertions and deletions of DNA segments. Certain fragments appear as repeats in genomes. Mapping all these genetic variations can be briefly termed 'variomics'. A variome is the totality of genetic variation found in an individual, a population, or a species. Among all the variations we know, the most common is the single nucleotide polymorphism (SNP). In Korea, mapping the variome has been pursued relatively early, and several groups are mapping genetic variations. KOBIC runs several very early, if not the earliest in the world, variome servers: http://variome.net and http://variomics.net. Along with SNP variation, copy number variation (CNV) is also important. Some recent reports tell us that CNVs can be as variable as, or even more variable than, SNPs, which are simple DNA base changes in populations. Yeun-Jun Chung of the Catholic University of Korea has been mapping CNVs among Korean people (Kim et al., 2008).<br />
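As a minimal sketch of the simplest kind of variation described above, the single-base difference, the following Python function compares two sequences of equal length that are assumed to be already aligned. The function name and the toy sequences are hypothetical. Real variant calling works on mapped reads with quality scores and also handles insertions, deletions, and copy number changes, none of which this sketch attempts.<br />

```python
from typing import List, Tuple

BASES = set("ACGT")


def find_single_base_differences(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are pre-aligned and of equal length; positions
    where either base is not A, C, G, or T (for example N) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for position, (ref_base, sample_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base in BASES and sample_base in BASES and ref_base != sample_base:
            differences.append((position, ref_base, sample_base))
    return differences


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    print(find_single_base_differences("ACGTAC", "ACCTAT"))
    # prints [(3, 'G', 'C'), (6, 'C', 'T')]
```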
     <li style="TEXT-ALIGN: left"><span style="FONT-SIZE: 9pt">Intermediate metabolizers&nbsp;: The individuals that metabolizes drug with a rate slower than the normal rate. </span></li>
+
<br />
     <li style="TEXT-ALIGN: left"><span style="FONT-SIZE: 9pt">Poor metabolizers: The individuals with poor metabolizing rate. Drugs make accumulate and cause serious adverse effects. </span></li>
+
<strong>Human Variome Project (HVP)</strong><br />
     <li style="TEXT-ALIGN: left"><span style="FONT-SIZE: 9pt">Ultra metabolizers: Individuals with metabolizing rate faster than extensive metabolizers. They may experience no effect of drug activity. </span></li>
+
HVP was launched in 2006 as an international collaboration headed by Richard Cotton (http://humanvariomeproject.org) (Ring et al., 2006). HVP aims to bring clinicians who have been working on rare diseases together with molecular biologists and bioinformaticians. Their goal is to link medical information with genotype information. Succinctly, this process is called genotype to phenotype mapping. As several full human genome sequences are already available, mapping phenotypes to full genomes will be the major challenge of biology in the next 20 years. <br />
 +
<br />
 +
<strong>Asian Variome Project (AVP)</strong><br />
 +
In association with eIMBL, A-IMBN, and HVP, a variome project to map the Asian population variome was launched in 2008. This was a group effort by Korean researchers who have been interested in genome sequences, SNPs, and CNVs. They have formed the KOrean VAriome Consortium (KOVAC: http://variome.kr) and support AVP as one of its first projects. eIMBL, a virtual laboratory network of Asia that links key biology groups and is modeled after EMBL, acquired $80,000 USD in 2008 to support AVP. eIMBL aims to establish a virtual bioinformatics center in the Asia Pacific region that will link many bioinformation processing scientists in Asia.<br />
 +
<br />
 +
<strong>Construction of Reference Genomes for the world</strong><br />
 +
Sanger Center, EBI, NCBI, and the University of Washington Genome Center have formed a consortium to produce a reference genome (http://referencegenome.org). A reference is the most fundamental of all standards, so providing an accurate reference genome to biologists is an important task. The first reference genome by the above consortium is based on Caucasian genomes. Because of the extent of SNPs and CNVs, it is necessary to construct reference genomes for diverse ethnic groups. In Korea, the reference standard genome project began in 2006 and produced the first draft for Koreans in November 2008, using a male donor. Through bioinformatic analysis, the Korean researchers at LCDI and KOBIC found good justification for any nation to launch large scale genome projects to map population diversity. Even populations as close as Koreans and Chinese showed a large number of SNP differences.<br />
 +
<br />
 +
<strong>Bioinformatics for Personal Genomes and Variomes</strong><br />
 +
Bioinformatics is the key to personal genome projects and variome projects. Bioinformatics is not merely a set of tools but a scientific discipline. It regards life as a gigantic information processing phenomenon and works to map its components and to model the emerging networks of those components. Bioinformatics in 2008 is driving biology into an information science. Most biology research projects produce massive amounts of data that cannot be processed by hand. Nearly all biological research outcomes in the next five years will include some form of high throughput data such as genome sequences, microarray data, proteome analyses, SNPs, epigenome chips, and large scale phenotype mapping. Bioinformatics tools for genomics and variomics can be found in various internet resources. There are several bioinformatics hubs such as NCBI (National Center for Biotechnology Information), EBI (European Bioinformatics Institute), DDBJ (DNA Data Bank of Japan), and KOBIC. Some others are: Bioinformatics Organization (http://Bioinformatics.Org), EMBnet (http://www.embnet.org/), and The International Society for Computational Biology (http://iscb.org). <br />
 +
<br />
 +
The following are major bioinformatics journals:<br />
 +
<br />
 +
</span>
 +
<ul>
 +
    <li><span style="font-size: 9pt;">Algorithms in Molecular Biology (http://www.almob.org/) </span></li>
 +
     <li><span style="font-size: 9pt;">Bioinformatics (http://bioinformatics.oxfordjournals.org/) </span></li>
 +
    <li><span style="font-size: 9pt;">BMC Bioinformatics (http://www.biomedcentral.com/bmcbioinformatics) </span></li>
 +
    <li><span style="font-size: 9pt;">Briefings in Bioinformatics (http://bib.oxfordjournals.org/) </span></li>
 +
     <li><span style="font-size: 9pt;">Genome Research (http://genome.cshlp.org/) </span></li>
 +
    <li><span style="font-size: 9pt;">Genomics and Informatics (http://www.genominfo.org) </span></li>
 +
    <li><span style="font-size: 9pt;">The International Journal of Biostatistics (http://www.bepress.com/ijb/) </span></li>
 +
     <li><span style="font-size: 9pt;">Journal of Computational Biology (http://www.liebertpub.com/Products/Product.aspx?pid=31&amp;AspxAutoDetectCookieSupport=1) </span></li>
 +
    <li><span style="font-size: 9pt;">Cancer Informatics (http://www.la-press.com/journal.php?pa=description&amp;journal_id=10) </span></li>
 +
     <li><span style="font-size: 9pt;">Molecular Systems Biology (http://www.nature.com/msb/index.html) </span></li>
 +
    <li><span style="font-size: 9pt;">PLoS Computational Biology (http://www.ploscompbiol.org/home.action) </span></li>
 +
    <li><span style="font-size: 9pt;">International Journal of Bioinformatics Research and Applications (http://www.inderscience.com/browse/index.php?journalcode=ijbra) </span></li>
 
</ul>
 
</ul>
<div align="left"><span style="FONT-SIZE: 9pt"><br />
+
<span style="font-size: 9pt;"><br />
</span><strong><span style="FONT-SIZE: 9pt">Variomics</span></strong><span style="FONT-SIZE: 9pt"><br />
+
<strong>Sequencing DNA, Metagenomics, and Ecogenomics</strong><br />
The most important scientific data out of personal genomes are the precise sequence differences among individuals. Such differences have many types.&nbsp;There are structural differences between chromosomes. There can be insertions and deletions of DNA segments. There are certain fragments that appear as repeats in genomes. Mapping all these structural genetic variations can be briefly termed as 'variomics'. A variome is the totality of genetic variation found in an individual, a population, and a species. Among all the variations we know,&nbsp;the most common one is single nucleotide polymorphisms (SNP).&nbsp;In Korea, mapping the variome has been pursued relatively early and there are several groups who are mapping the genetic variations. KOBIC has several very early stage, if not the earliest in the world, variome servers; <a href="http://variome.net/"><font color="#800080">http://variome.net</font></a> and <a href="http://variomics.net/"><font color="#0000ff">http://variomics.net</font></a>. Along with SNP variation, the copy number variation (CNV) is also important. Some recent reports tell us that CNVs can be as variable as or even more variable than SNPs that are simple DNA base changes in populations. Yeun-Jun Chung of Catholic University of Korea has been mapping CNVs among Korean people (Kim et al. 2008).</span></div>
+
Next generation sequencing methods will not only map genomes; they will also be used to map the environment. To humans, the environment can mean the various microbial, plant, and animal interactions around us. Microbial interaction is especially critical to our health. Gut bacteria are a natural environment within us. Metagenomics is a methodology that sequences the whole set of microbes in a sample, such as those in our digestive tract. Researchers are realizing that the human genome is complemented by such environmental genomes. A new term, 'ecogenomics', is now used to describe these concepts. Metagenomics and ecogenomics map the variations of environmental genetic factors.<br />
<div align="left"><span style="FONT-SIZE: 9pt">In early 2005, the US FDA cleared the AmpliChip<sup>&reg;</sup> CYP450 Test, which measures variations in two genes of the CYP450 enzyme system: CYP2D6 and CYP2C19. The Roche AmpliChip CYP450 Test is intended to identify a patient's CYP2D6 and CYP2C19 genotype from genomic DNA extracted from a whole blood sample. Information about CYP2D6 and CYP2C19 genotype may be used as an aid to clinicians in determining therapeutic strategy and treatment dose for therapeutics that are metabolized by the CYP2D6 or CYP2C19 gene product.<br />
 
 
<br />
 
<br />
</span></div>
+
<strong>Mapping Expression using DNA sequencing</strong><br />
<div align="left"><strong><span style="FONT-SIZE: 9pt">Human Variome Project (HVP)</span></strong></div>
+
DNA sequencing technologies were mostly used for mapping genotypes. However, they are now also used to map RNA expression levels in cells. Cells produce various types of RNA. Among them, mRNA, which encodes proteins, is the main target of expression studies. In the past, microarrays and DNA chips were used to measure expression levels. They are less accurate and require many bioinformatic adjustments before they produce reliable expression data. New sequencing technologies can measure expression levels much more accurately. By sequencing the RNAs, we can quantify expression levels while knowing the RNA sequences precisely. Sequencing technologies will restructure expression analyses in the future. <br />
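As a minimal sketch of how sequenced RNA fragments can be turned into comparable expression values, the Python function below scales per-gene read counts to reads per million sequenced reads. This normalization is a common convention chosen here for illustration; the review does not prescribe a method, and the gene names and counts are hypothetical.<br />

```python
from typing import Dict


def reads_per_million(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw per-gene read counts to reads per million total reads.

    This corrects only for sequencing depth; it does not correct for
    gene length or other biases (an assumption of this sketch).
    """
    if any(count < 0 for count in read_counts.values()):
        raise ValueError("read counts must be non-negative")
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("at least one read is required")
    return {
        gene: count * 1_000_000 / total_reads
        for gene, count in read_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for two made-up genes.
    print(reads_per_million({"geneA": 750, "geneB": 250}))
```

Because the values are scaled by the total number of reads, samples sequenced to different depths can be compared on the same scale.<br />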
<p align="left"><span style="FONT-SIZE: 9pt">As an international collaboration, headed by Richard Cotton, HVP was launched in 2006 (<a href="http://humanvariomeproject.org/"><font color="#0000ff">http://humanvariomeproject.org</font></a>) </span><span style="FONT-SIZE: 9pt">(Ring, Kwok et al. 2006)</span><span style="FONT-SIZE: 9pt">. HVP aims to&nbsp;make clinicians who have been working on rare diseases, to work together with molecular biologists and bioinformaticians. Their goal is to link medical information with genotype information. Succinctly this process is called genotype to phenotype mapping. As several full human genome sequences are already available, mapping phenotypes to the full genomes will be the major challenge of biology in the next 20 years.&nbsp;<br />
 
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Asian Variome Project (AVP)</span></strong><span style="FONT-SIZE: 9pt"><br />
+
<strong>Linking Genome information On-line </strong><br />
Alongside and with the associations of eIMBL, A-IMBN, and HVP, a variome project that tries to map Asian population variome was launched in 2008. This was a group effort of Korean researchers who have been interested in genome sequences,&nbsp;SNPs, and CNVs. They have formed a Korean Variome Consortium (KOVAC: <a href="http://variome.kr/"><font color="#0000ff">http://variome.kr</font></a>) and supported AVP as one of the first projects. eIMBL that is the virtual laboratory network of Asia linking key biology groups modeled after EMBL has acquired $80,000 USD in 2008 to support AVP. eIMBL aims to establish a virtual bioinformatics center in Asia Pacific region that links many bioinformation processing scientists in Asia.<br />
+
Sequencing a genome is basically the production of data, whereas analyzing the whole genome takes human minds networking their hypotheses, proofs, and discoveries; that is, genomics is a scientific endeavor beyond mechanical sequencing. Therefore, a worldwide effort is required to link all the genome information for proper management and utilization. The internet is the best infrastructure for genome information exchange. Bioinformatics resources should be available as freely as possible to all nations, including underdeveloped and developing ones. Genome sequencing and associated analyses should be provided free of charge in certain instances with the support of local governments and international organizations. For maximum efficiency, an adequate data and information license should also be required. Some researchers propose an openfree sharing of bioinformatics analysis tools, as well as of genome sequences (under proper permission). One such movement is Free Genomics (http://freegenomics.org). <br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Bioinformatics for personal genomes and variomes</span></strong><span style="FONT-SIZE: 9pt"><br />
+
The following are on-line genomics sites.<br />
Bioinformatics is the key in personal genome projects and variome projects. Bioinformatics is not a set of tools but it is a proper scientific discipline. It regards life as a gigantic information processing phenomenon and tries to map its components and to model the emerging networks of the components. Bioinformatics in 2008 is driving biology into an information science. Most biology researches are now with massive amount of data that cannot be processed by hand. Nearly all the biological research outcomes in the next&nbsp;five years will have some form of high throughput data such as genome sequences, microarray data, proteome analyses, SNPs, epigenome chips, and large scale phenotype mapping. Bioinformatics tools in genomics and variomics can be found from various internet resources. There are various bioinformatics hubs such as NCBI (National Center for Biotechnology Information), EBI (European Bioinformatics Institute), DDBJ (Databank of Japan), and KOBIC.&nbsp;Some&nbsp;others are: Bioinformatics Organization (<a href="http://bioinformatics.org/"><font color="#0000ff">http://Bioinformatics.Org</font></a>), EMBnet (<a href="http://www.embnet.org/"><font color="#0000ff">http://www.embnet.org/</font></a>), and&nbsp;The International Society for Computational Biology (<a href="http://iscb.org/"><font color="#0000ff">http://iscb.org</font></a>). The following are major bioinformatics journals:<br />
 
 
<br />
 
<br />
Algorithms in Molecular Biology (http://www.almob.org/)<br />
+
</span>
Bioinformatics (http://bioinformatics.oxfordjournals.org/)<br />
+
<ul>
BMC Bioinformatics (http://www.biomedcentral.com/bmcbioinformatics)<br />
+
    <li><span style="font-size: 9pt;">Genomics portal: http://genomics.org</span></li>
Briefings in Bioinformatics (http://bib.oxfordjournals.org/)<br />
+
    <li><span style="font-size: 9pt;">Personal Genome Project: http://personalgenomes.org</span></li>
Genome Research&nbsp;(http://genome.cshlp.org/)<br />
+
    <li><span style="font-size: 9pt;">openfree Genomics Project: http://personalgenome.net</span></li>
Genomics and Informatics (<a href="http://kogo.or.kr/"><font color="#0000ff">http://kogo.or.kr</font></a>)<br />
+
    <li><span style="font-size: 9pt;">Personal Genome sequencing company: http://www.knome.com</span></li>
The International Journal of Biostatistics (http://www.bepress.com/ijb/)<br />
+
    <li><span style="font-size: 9pt;">Personal Genome SNP typing: http://decodeme.com</span></li>
Journal of Computational Biology (http://www.liebertpub.com/Products/Product.aspx?pid=31&amp;AspxAutoDetectCookieSupport=1)<br />
+
    <li><span style="font-size: 9pt;">Google's Personal Genome Typing: http://23andme.com</span></li>
Cancer Informatics (http://www.la-press.com/journal.php?pa=description&amp;journal_id=10)<br />
+
    <li><span style="font-size: 9pt;">The Sanger Centre: http://sanger.ac.uk</span></li>
Molecular Systems Biology (http://www.nature.com/msb/index.html<br />
+
    <li><span style="font-size: 9pt;">General Omics site: http://omics.org</span></li>
PLoS Computational Biology (http://www.ploscompbiol.org/home.action)<br />
+
    <li><span style="font-size: 9pt;">Korean Genome Data Site: http://koreagenome.org</span></li>
International Journal of Bioinformatics Research and Applications (http://www.inderscience.com/browse/index.php?journalcode=ijbra)<br />
+
    <li><span style="font-size: 9pt;">Korean Bioinformation Center: http://kobic.kr</span></li>
<br />
+
</ul>
</span><strong><span style="FONT-SIZE: 9pt">Sequencing DNA, Metagenomics, and Ecogenomics</span></strong><span style="FONT-SIZE: 9pt"><br />
+
<span style="font-size: 9pt;"><br />
Next generation sequencing methods are not only mapping genomes. They can be used to map the environment. It is called ecogenomics. Environment to humans can be various microbial, plant, and animal interactions around us. Especially, microbial interaction is critical to our health. Gut bacteria are natural environment to us. Metagenomics is a methodology that sequences the whole set of microbes in our food tract. Researchers are realizing that human genome is complemented by such environmental genomes. A new term, 'ecogenomics' is now used to describe these concepts. Metagenomics and ecogenomics are for mapping the variation of environmental genetic factors.<br />
+
<strong>Conclusion</strong><br />
 +
We have examined the current trends in genomics and variomics. In 2009 and onwards, personal genome projects will produce an unprecedented amount of biological data. New bioinformatics technologies will be required to handle them. New sequencing technologies will drive the next decades of biology and transform medical practices. Fast sequencing brought us interesting and unexpected applications such as metagenomics and ecogenomics. <br />
 
<br />
 
<br />
 +
<strong>Acknowledgements </strong><br />
 +
SK was supported by Soongsil University Research Fund. JB, GH, and RR were supported by KRIBB/KOBIC fund from the MEST of Korea. The authors thank Maryana Bhak for editing the manuscript.<br />
 
<br />
 
<br />
</span><strong><span style="FONT-SIZE: 9pt">Mapping expression using DNA sequencing</span></strong><span style="FONT-SIZE: 9pt"><br />
+
<strong>References</strong><br />
DNA sequencing technology&nbsp;used to be&nbsp;for mapping genotypes. However, they are now used to map expression levels in cells. Cells produce various RNAs. mRNA is the most abundant and important. In the past, microarray and DNA chips were used for measuring expression levels. They are not accurate and it takes many bioinformatic adjustments before it becomes reliable expression data. New sequencing technologies can measure expression levels much more accurately. By sequencing the RNAs, we can now quantify the mRNA levels by precisely knowing the RNA sequences. Sequencing technologies will restructure the expression analyses in the future.</span></p>
+
Anderson, S., A. T. Bankier, B. G. Barrell, M. H. de Bruijn, A. R. Coulson, J. Drouin, I. C. Eperon, D. P. Nierlich, B. A. Roe, F. Sanger, P. H. Schreier, A. J. Smith, R. Staden, and I. G. Young. (1981). Sequence and organization of the human mitochondrial genome. Nature 290:457-65.<br />
<div><strong><font size="2">Conclusion</font></strong><br />
+
Chan, E. Y. (2005). Advances in sequencing technology. Mutat Res 573:13-40.<br />
<font size="2">In 2009 and onwards, personal genome projects will produce unprecedented amount of biological data. New bioinformatics technologies will be required to handle them. New sequencing technologies will drive the next decades of biology and transform the medical practices in hospitals within the next decades. Fast sequencing unexpectedly brought us interesting applications such as metagenomics and ecogenomics. </font><font size="2">We have examined the current trends in genomics and variomics.</font><br />
+
Church, G. M. (2005). The personal genome project. Mol Syst Biol 1:2005 0030.<br />
 +
Gupta, P. K. (2008). Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol 26:602-11.<br />
 +
IHGSC. (2004). Finishing the euchromatic sequence of the human genome. Nature 431:931-45.<br />
 +
Kim, T.-M., S.-H. Yim, and Y. Chung. (2008). Copy Number Variations in the Human Genome: Potential Source for Individual Diversity and Disease Association Studies. Genomics &amp; Informatics 6(1):1-7.<br />
 +
Levy, S., G. Sutton, P. C. Ng, L. Feuk, A. L. Halpern, B. P. Walenz, N. Axelrod, J. Huang, E. F. Kirkness, G. Denisov, Y. Lin, J. R. MacDonald, A. W. Pang, M. Shago, T. B. Stockwell, A. Tsiamouri, V. Bafna, V. Bansal, S. A. Kravitz, D. A. Busam, K. Y. Beeson, T. C. McIntosh, K. A. Remington, J. F. Abril, J. Gill, J. Borman, Y. H. Rogers, M. E. Frazier, S. W. Scherer, R. L. Strausberg, and J. C. Venter. (2007). The diploid genome sequence of an individual human. PLoS Biol 5:e254.<br />
 +
Mardis, E. R. (2008). The impact of next-generation sequencing technology on genetics. Trends Genet 24:133-41.<br />
 +
Metzker, M. L. (2005). Emerging technologies in DNA sequencing. Genome Res 15:1767-76.<br />
 +
Park, H., K. J-H., S.-I. Cho, J. Sung, H.-L. Kim, Y. S. Ju, G. Bayasgalan, M.-K. Lee, and J.-S. Seo. (2008). Genome-wide Linkage Study for Plasma HDL Cholesterol Level in an Isolated Population of Mongolia. Genomics &amp; Informatics 6(1):8-13.<br />
 +
Porreca, G. J., J. Shendure, and G. M. Church. (2006). Polony DNA sequencing. Curr Protoc Mol Biol Chapter 7:Unit 7 8.<br />
 +
Ring, H. Z., P. Y. Kwok, and R. G. Cotton. (2006). Human Variome Project: an international collaboration to catalogue human genetic variation. Pharmacogenomics 7:969-72.<br />
 +
Sanger, F., G. M. Air, B. G. Barrell, N. L. Brown, A. R. Coulson, C. A. Fiddes, C. A. Hutchison, P. M. Slocombe, and M. Smith. (1977). Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265:687-95.<br />
 +
Shendure, J., R. D. Mitra, C. Varma, and G. M. Church. (2004). Advanced sequencing technologies: methods and goals. Nat Rev Genet 5:335-44.<br />
 +
Sung, J., M. K. Lee, and J.-S. Seo. (2008). Inbreeding Coefficients in Two Isolated Mongolian Populations - GENDISCAN Study. Genomics &amp; Informatics 6(1).<br />
 +
Wheeler, D. A., M. Srinivasan, M. Egholm, Y. Shen, L. Chen, A. McGuire, W. He, Y. J. Chen, V. Makhijani, G. T. Roth, X. Gomes, K. Tartaro, F. Niazi, C. L. Turcotte, G. P. Irzyk, J. R. Lupski, C. Chinault, X. Z. Song, Y. Liu, Y. Yuan, L. Nazareth, X. Qin, D. M. Muzny, M. Margulies, G. M. Weinstock, R. A. Gibbs, and J. M. Rothberg. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature 452:872-6.<br />
 
<br />
 
<br />
</div>
+
</span><strong><span style="font-size: 9pt;"><br />
<div><font size="2"><strong>References</strong></font></div>
+
</span></strong></div>
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">IHGSC (2004). Finishing the euchromatic sequence of the human genome. <em>Nature</em> <strong>431</strong>(7011), 931-45.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Anderson, S., A. T. Bankier, et al. (1981). Sequence and organization of the human mitochondrial genome. <em>Nature</em> <strong>290</strong>(5806), 457-65.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Chan, E. Y. (2005). Advances in sequencing technology. <em>Mutat Res</em> <strong>573</strong>(1-2), 13-40.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Church, G. M. (2005). The personal genome project. <em>Mol Syst Biol</em> <strong>1,</strong> 2005.0030.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Gupta, P. K. (2008). Single-molecule DNA sequencing technologies for future genomics research. <em>Trends Biotechnol</em> <strong>26</strong>(11), 602-11.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Levy, S., G. Sutton, et al. (2007). The diploid genome sequence of an individual human. <em>PLoS Biol</em> <strong>5</strong>(10), e254.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Mardis, E. R. (2008). The impact of next-generation sequencing technology on genetics. <em>Trends Genet</em> <strong>24</strong>(3), 133-41.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Metzker, M. L. (2005). Emerging technologies in DNA sequencing. <em>Genome Res</em> <strong>15</strong>(12), 1767-76.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Porreca, G. J., J. Shendure, et al. (2006). Polony DNA sequencing. <em>Curr Protoc Mol Biol</em> <strong>Chapter 7</strong>: Unit 7 8.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Ring, H. Z., P. Y. Kwok, et al. (2006). Human Variome Project: an international collaboration to catalogue human genetic variation. <em>Pharmacogenomics</em> <strong>7</strong>(7), 969-72.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Sanger, F., G. M. Air, et al. (1977). Nucleotide sequence of bacteriophage phi X174 DNA. <em>Nature</em> <strong>265</strong>(5596), 687-95.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Shendure, J., R. D. Mitra, et al. (2004). Advanced sequencing technologies: methods and goals. <u>Nat</u> <em>Rev Genet</em> <strong>5</strong>(5), 335-44.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt"><font size="2">Wheeler, D. A., M. Srinivasan, et al. (2008). The complete genome of an individual by massively parallel DNA sequencing. <em>Nature</em> <strong>452</strong>(7189), 872-6.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt">
 
<font color="#ff6600">Park, H. et al. (2008). Genome-wide Linkage Study for Plasma HDL Cholesterol Level in an Isolated Population of Mongolia. Genomics &amp; Informatics 6(1): 8-13.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt">
 
<font color="#ff6600">
 
Sung, J. et al. (2008). &quot;Inbreeding Coefficients in Two Isolated Mongolian Populations - GENDISCAN Study.' Genomics &amp; Informatics 6(1): 14-17.</font></div>
 
<div style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -36pt">
 
<font color="#ff6600">Kim, T-M. et al. (2008). &quot;Copy Number Variations in the Human Genome: Potential Source for Individual Diversity and Disease Association Studies.&quot; 6(1): 1-7.</font></div>
 

Latest revision as of 05:16, 19 December 2008

 

Personal genomics, bioinformatics, and variomics 


Jong Bhak1, Ho Ghang1, Rohit Reja1, and Sangsoo Kim2*

1KOBIC (Korean Bioinformation Center), KRIBB, Daejeon 305-806, Korea. 2Dept. of Bioinformatics, Soongsil Univ., Seoul 156-743, Korea.

*Correspondence to: E-mail sskimb@ssu.ac.kr, Tel +82-2-820-0457, Fax +82-2-824-4383


Running title: Genomics revolution achieved by cheap sequencing for common people


Abstract
In 2008, at least five complete human genome sequences are available. Over 15,000,000 genetic variants, called SNPs, are recorded in the dbSNP database. The cost of full genome sequencing in 2009 is claimed to be less than $5,000 USD. The genomics era has arrived in 2008. This review introduces technologies, bioinformatics, genomics visions, and variomics projects. Variomics is the study of the total genetic variation in an individual and in populations. Research on genetic variation is the most valuable among the many branches of genomics research. Genomics and variomics projects will change biology and society so dramatically that biology will become an everyday technology like personal computers and the internet. 'BioRevolution' is the term that can adequately describe this change.
 
Introduction
Since the launch of the Human Genome Project (HGP) in 1990 by the NIH of the USA, researchers have been developing faster DNA sequencers (Chan, 2005; Gupta, 2008; Mardis, 2008; Metzker, 2005; Shendure et al., 2004). The HGP was initially led by James Watson, who modeled DNA in Cambridge, UK in 1953. In 2003, the International Human Genome Sequencing Consortium held a press conference to announce the completion of the human genome (IHGSC, 2004). In 2008, 55 years after the DNA model, Watson's complete genome sequence was published; it was sequenced using 454 DNA sequencers developed by a company rather than a research institute (Wheeler et al., 2008). In 2007, Craig Venter, the founder of Celera, published his own personal genome in PLoS Biology (Levy et al., 2007). We are entering the personalized biology era with the advent of next generation sequencing technologies.

DNA sequencing
The first breakthrough in genome sequencing came from Watson's colleague, Fred Sanger, in Cambridge, UK. In 1977, Sanger and his team produced the first useful DNA sequencing method and published the first complete genome (Sanger et al., 1977). It was a tiny virus genome known as phi X 174. Soon after phi X 174, his group published the first complete organelle genome, that of the human mitochondrion (Anderson et al., 1981). By 1998, researchers in the US had evaluated multiplex genome sequencing technologies and were aware that one person's whole genome could be sequenced in a day using contemporary technologies. George Church was a Ph.D. student of Walter Gilbert, who received a Nobel Prize with Sanger for developing a sequencing method. Gilbert's method was not widely used. However, his colleague Church continued to develop sequencing methods. One of them is based on the Polony idea (Porreca et al., 2006). This technology is used by KNOME Inc., a full genome sequencing company. Along with KNOME, other companies, such as Complete Genomics, are now producing DNA sequences cheaply and at an unprecedented capacity. The speed of sequencing is advancing many fold per year, much faster than the cycle of semiconductor chips in the computer industry. Genome sequencing is also becoming an everyday technology, as universally used as computer CPUs. Experts predict that in five years' time everyone in developed nations will be able to have his or her own genome information. Due to its far reaching consequences in medicine, health, biology, nanotechnology, and information technology, DNA sequencing will become the most important industrial technology of the coming decades.

Personal Genomics
In 2009, genome sequencing technologies will achieve a throughput of one person's whole genome per day in terms of DNA fragments sequenced. Personal genomics is a new field that builds on such fast sequencers. In 2008, the cost of one personal genome is less than $350,000 USD. If the cost falls below $1,000 USD, the impact of personal genomics on common people's lives is predicted to be the largest ever in biology. The PGP (Personal Genome Project), a project to sequence as many people as possible at the lowest possible cost, reflects this technological advance in society (Church, 2005). At present, Google, Inc. and the Church group are working together to sequence genetic regions of DNA from 100,000 people. In Saudi Arabia, the government is planning to sequence the genomes of 100 Arab people. In Europe, various groups and nations have been genotyping their populations. Iceland has been especially successful in that effort by utilizing its well-kept genealogical data encompassing hundreds of thousands of people. In Asia, Jeongsun Seo of Seoul National University has been working on the East Asia Genome Project for the past several years. His group has collected thousands of samples from Mongolian tribes linked by an extremely large genealogical tree (Park et al., 2008; Sung et al., 2008). Seo is said to be sequencing at least 100 Korean genomes in collaboration with Church and Green Cross, Inc. of Korea. The aim of Seo's genome project is to produce a resource for East Asians. He is presently sequencing at least two Korean people. In China, the Beijing Genome Institute has been successful in terms of sequencing. Their first achievement was a plant genome, rice. After rice, they launched a project to sequence 100 Han Chinese genomes. In Nov. 2008, they published their first Chinese genome in the journal Nature.
In Dec. 2008, another Korean group, the Lee Gilyeo Cancer and Diabetes Institute (LCDI) and the Korean Bioinformation Center (KOBIC), made a Korean genome sequence public. The genome was sequenced with a Solexa paired-end sequencer, and comparative genomics analyses and SNP data were uploaded as a public resource. It took only one week, using 150 computer CPUs, to analyze the 7.8x Korean genome: mapping DNA fragments to a reference genome, generating new SNP information, comparing it with other individual genomes, and mapping it to 1,600 items of already known phenotype information from the public literature.
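
The "7.8x" figure is a sequencing depth: the total number of sequenced bases divided by the length of the genome. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the genome size and read length in the usage example are illustrative assumptions, not values reported for the Korean genome.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose combined length reaches the target depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    total_bases = target_coverage * genome_size
    return math.ceil(total_bases / read_length)


if __name__ == "__main__":
    # Assumed values for illustration only: a 3 Gb genome and 36-base reads.
    genome_size = 3_000_000_000
    read_length = 36
    n = reads_needed(7.8, read_length, genome_size)
    print(f"reads needed for about 7.8x: {n}")
    print(f"resulting depth: {mean_coverage(n, read_length, genome_size):.2f}x")
```

The depth is only an average; actual coverage varies along the genome, so some regions are covered by more fragments and others by fewer.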
 
Genome Revolution
These public genome data, alongside the previously published genomes of Craig Venter and James Watson, show that full genome sequences are no longer solely in the academic domain. Anyone who has the money and the will can sequence human genomes. This 'genomic revolution' will eventually lead to the 'BioRevolution', in which the most essential human information is completely mapped and publicly available. This is revolutionary because humans can now engineer themselves with a map or a blueprint, without relying directly on the trial and error of conventional evolution. This indicates that evolution has moved to a conscious, self-directed level. We are in effect designing evolution using computers.

Genomes and Personalized Medicine
The 'BioRevolution', in which scientists utilize genomic information to engineer all kinds of biological processes, including evolution itself, will bring us personalized medicine. The essence of personalized medicine is that enzymes in our tissues, such as cytochrome P450, differ distinctly among individuals and populations. As a result, certain drugs produce different responses in different individuals.

Cytochrome P450 family example
The cytochrome P450 (CYP) family of liver enzymes is responsible for breaking down more than 30 different classes of drugs during Phase I of drug metabolism. Structural and SNP variations in the genes that code for these enzymes can influence their ability to metabolize certain drugs. On this basis, a population can be categorized into four major types of drug metabolizers:
  • Extensive metabolizers: Individuals who can be given the normal drug dosage.
  • Intermediate metabolizers: Individuals who metabolize drugs at a slower than normal rate.
  • Poor metabolizers: Individuals with poor metabolizing rates. Drugs may accumulate and cause serious adverse effects.
  • Ultra metabolizers: Individuals with metabolizing rates even faster than those of extensive metabolizers. They may experience no effect from the drug.

In early 2005, the US FDA cleared the AmpliChip® CYP450 Test, which measures variations in two genes of the CYP450 enzyme system: CYP2D6 and CYP2C19. The Roche AmpliChip CYP450 Test is intended to identify a patient's CYP2D6 and CYP2C19 genotype from genomic DNA extracted from a whole blood sample. Information about CYP2D6 and CYP2C19 genotype may be used as an aid to clinicians in determining therapeutic strategy and treatment dose for therapeutics that are metabolized by the CYP2D6 or CYP2C19 gene product.
 
Variomics
The most important scientific data to come out of personal genomes are the precise sequence differences among individuals. Such differences are of many types. There are structural differences among chromosomes. There can be insertions and deletions of DNA segments. Certain fragments appear as repeats in genomes. Mapping all these genetic variations can be briefly termed 'variomics'. A variome is the totality of genetic variation found in an individual, a population, or a species. Among all the variations we know, the most common is the single nucleotide polymorphism (SNP). In Korea, variome mapping started relatively early, and several groups are mapping genetic variations. KOBIC runs several variome servers that are among the earliest, if not the earliest, in the world: http://variome.net and http://variomics.net. Along with SNP variation, copy number variation (CNV) is also important. Some recent reports tell us that CNVs can be as variable in populations as SNPs, which are simple DNA base changes, or even more variable. Yeun-Jun Chung of the Catholic University of Korea has been mapping CNVs among Korean people (Kim et al., 2008).
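
To make the notion of a SNP concrete, the sketch below compares a sample sequence with a reference and lists the single-base differences. It is a toy illustration and not the method used by any of the servers mentioned above: it assumes the two sequences are already aligned to equal length with '-' marking gaps, and the function name and example sequences are made up.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol in the pre-aligned input


def compare_aligned(reference: str, sample: str) -> Tuple[List[Tuple[int, str, str]], int]:
    """Return (snps, indel_columns) for two pre-aligned sequences.

    snps holds (1-based alignment column, reference base, sample base)
    for every column where both sequences have a base and the bases differ.
    indel_columns counts columns where exactly one sequence has a gap.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    snps: List[Tuple[int, str, str]] = []
    indel_columns = 0
    pairs = zip(reference.upper(), sample.upper())
    for column, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base == GAP or alt_base == GAP:
            if ref_base != alt_base:
                indel_columns += 1  # insertion or deletion, not a SNP
            continue
        if ref_base != alt_base:
            snps.append((column, ref_base, alt_base))
    return snps, indel_columns


if __name__ == "__main__":
    # Made-up sequences: one substitution at column 3, one deleted base at column 7.
    snps, indels = compare_aligned("ACGTACGT", "ACCTAC-T")
    print(snps, indels)
```

Real variant calling works on millions of short reads with sequencing errors and needs statistical support for each call; this sketch only shows what is being counted.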

Human Variome Project (HVP)
The HVP was launched in 2006 as an international collaboration headed by Richard Cotton (http://humanvariomeproject.org) (Ring et al., 2006). HVP aims to bring clinicians who have been working on rare diseases together with molecular biologists and bioinformaticians. Their goal is to link medical information with genotype information. Succinctly, this process is called genotype to phenotype mapping. As several full human genome sequences are already available, mapping phenotypes to full genomes will be the major challenge of biology in the next 20 years.

Asian Variome Project (AVP)
In association with eIMBL, A-IMBN, and HVP, a variome project that is working to map the Asian population variome was launched in 2008. This was a group effort by Korean researchers who have been interested in genome sequences, SNPs, and CNVs. They have formed the KOrean VAriome Consortium (KOVAC: http://variome.kr) and support AVP as one of its first projects. eIMBL, the virtual laboratory network of Asia that links key biology groups and is modeled after EMBL, acquired $80,000 USD in 2008 to support AVP. eIMBL aims to establish a virtual bioinformatics center in the Asia Pacific region that will link many bioinformation processing scientists in Asia.

Construction of Reference Genomes for the world
Sanger Center, EBI, NCBI, and the University of Washington Genome Center have formed a consortium to produce a reference genome (http://referencegenome.org). A reference standard is the most fundamental of all standards, so providing an accurate reference genome to biologists is an important task. The first reference genome by the above consortium is based on Caucasian genomes. Due to the extent of SNPs and CNVs, it is necessary to construct reference genomes for diverse ethnic groups. In Korea, the reference standard genome project began in 2006 and produced the first draft for Koreans in November 2008, using a male donor. Through bioinformatic analysis, the Korean researchers at LCDI and KOBIC found good justification for any nation to launch large scale genome projects to map population diversity. Even populations as close as Koreans and Chinese showed a large number of SNP differences.

Bioinformatics for Personal Genomes and Variomes
Bioinformatics is the key to personal genome projects and variome projects. Bioinformatics is not merely a set of tools but a scientific discipline. It regards life as a gigantic information processing phenomenon and works to map its components and to model the emerging networks of the components. Bioinformatics in 2008 is driving biology into an information science. Most biology research projects produce massive amounts of data that cannot be processed by hand. Nearly all biological research outcomes in the next five years will include some form of high throughput data such as genome sequences, microarray data, proteome analyses, SNPs, epigenome chips, and large scale phenotype mapping. Bioinformatics tools for genomics and variomics can be found through various internet resources. There are several bioinformatics hubs such as NCBI (National Center for Biotechnology Information), EBI (European Bioinformatics Institute), DDBJ (DNA Data Bank of Japan), and KOBIC. Some others are: Bioinformatics Organization (http://Bioinformatics.Org), EMBnet (http://www.embnet.org/), and The International Society for Computational Biology (http://iscb.org).

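The following are major bioinformatics journals:
  • Algorithms in Molecular Biology (http://www.almob.org/)
  • Bioinformatics (http://bioinformatics.oxfordjournals.org/)
  • BMC Bioinformatics (http://www.biomedcentral.com/bmcbioinformatics)
  • Briefings in Bioinformatics (http://bib.oxfordjournals.org/)
  • Genome Research (http://genome.cshlp.org/)
  • Genomics and Informatics (http://kogo.or.kr)
  • The International Journal of Biostatistics (http://www.bepress.com/ijb/)
  • Journal of Computational Biology (http://www.liebertpub.com/Products/Product.aspx?pid=31&AspxAutoDetectCookieSupport=1)
  • Cancer Informatics (http://www.la-press.com/journal.php?pa=description&journal_id=10)
  • Molecular Systems Biology (http://www.nature.com/msb/index.html)
  • PLoS Computational Biology (http://www.ploscompbiol.org/home.action)
  • International Journal of Bioinformatics Research and Applications (http://www.inderscience.com/browse/index.php?journalcode=ijbra)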

Sequencing DNA, Metagenomics, and Ecogenomics
Next generation sequencing methods will not only map genomes. They will also be used to map the environment. To humans, the environment can mean the various microbial, plant, and animal interactions around us. Microbial interaction is especially critical to our health. Gut bacteria are a natural environment within us. Metagenomics is a methodology that sequences the whole set of microbes in an environment such as our digestive tract. Researchers are realizing that the human genome is complemented by such environmental genomes. A new term, 'ecogenomics', is now used to describe these concepts. Metagenomics and ecogenomics are for mapping the variations of environmental genetic factors.

Mapping Expression using DNA sequencing
DNA sequencing technologies were once used mostly for mapping genotypes. However, they are now also used to map RNA expression levels in cells. Cells produce various types of RNA; mRNA, although far less abundant than ribosomal RNA, is the most important for expression analysis. In the past, microarrays and DNA chips were used to measure expression levels. They are less accurate and require many bioinformatic adjustments before producing reliable expression data. New sequencing technologies can measure expression levels much more accurately. By sequencing the RNAs, we can now quantify expression levels by counting precisely identified RNA sequences. Sequencing technologies will restructure expression analyses in the future.
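
One common way to turn sequenced RNA fragments into an expression level is to count the reads that map to each gene and normalize by gene length and by the total number of mapped reads. The sketch below uses the RPKM measure (reads per kilobase of transcript per million mapped reads), which is an assumption chosen for illustration and is not named in this review; the function name and the counts in the usage example are likewise made up.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    rpkm = gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Illustrative numbers only: 500 reads on a 2,000-base transcript
    # in a run with 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
```

Dividing by gene length keeps long genes from appearing more highly expressed simply because they yield more fragments, and dividing by the total read count makes runs of different sizes comparable.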

Linking Genome information On-line
Sequencing a genome is basically the production of data, whereas analyzing the whole genome takes human minds networking their hypotheses, proofs, and discoveries; that is, genomics is a scientific endeavor beyond mechanical sequencing. Therefore, a worldwide effort is required to link all the genome information for proper management and utilization. The internet is the best infrastructure for genome information exchange. Bioinformatics resources should be available as freely as possible to all nations, including underdeveloped and developing ones. Genome sequencing and associated analyses should be provided free of charge in certain instances with the support of local governments and international organizations. For maximum efficiency, an adequate data and information license should also be required. Some researchers propose an openfree sharing of bioinformatics analysis tools, as well as of genome sequences (under proper permission). One such movement is Free Genomics (http://freegenomics.org).

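The following are on-line genomics sites:
  • Genomics portal: http://genomics.org
  • Personal Genome Project: http://personalgenomes.org
  • openfree Genomics Project: http://personalgenome.net
  • Personal Genome sequencing company: http://www.knome.com
  • Personal Genome SNP typing: http://decodeme.com
  • Google's Personal Genome Typing: http://23andme.com
  • The Sanger Centre: http://sanger.ac.uk
  • General Omics site: http://omics.org
  • Korean Genome Data Site: http://koreagenome.org
  • Korean Bioinformation Center: http://kobic.kr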

Conclusion
We have examined the current trends in genomics and variomics. In 2009 and onwards, personal genome projects will produce an unprecedented amount of biological data. New bioinformatics technologies will be required to handle them. New sequencing technologies will drive the next decades of biology and transform medical practices. Fast sequencing brought us interesting and unexpected applications such as metagenomics and ecogenomics.

Acknowledgements
SK was supported by Soongsil University Research Fund. JB, GH, and RR were supported by KRIBB/KOBIC fund from the MEST of Korea. The authors thank Maryana Bhak for editing the manuscript.

References
Anderson, S., A. T. Bankier, B. G. Barrell, M. H. de Bruijn, A. R. Coulson, J. Drouin, I. C. Eperon, D. P. Nierlich, B. A. Roe, F. Sanger, P. H. Schreier, A. J. Smith, R. Staden, and I. G. Young. (1981). Sequence and organization of the human mitochondrial genome. Nature 290:457-65.
Chan, E. Y. (2005). Advances in sequencing technology. Mutat Res 573:13-40.
Church, G. M. (2005). The personal genome project. Mol Syst Biol 1:2005.0030.
Gupta, P. K. (2008). Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol 26:602-11.
IHGSC. (2004). Finishing the euchromatic sequence of the human genome. Nature 431:931-45.
Kim, T.-M., S.-H. Yim, and Y. Chung. (2008). Copy Number Variations in the Human Genome: Potential Source for Individual Diversity and Disease Association Studies. Genomics & Informatics 6(1):1-7.
Levy, S., G. Sutton, P. C. Ng, L. Feuk, A. L. Halpern, B. P. Walenz, N. Axelrod, J. Huang, E. F. Kirkness, G. Denisov, Y. Lin, J. R. MacDonald, A. W. Pang, M. Shago, T. B. Stockwell, A. Tsiamouri, V. Bafna, V. Bansal, S. A. Kravitz, D. A. Busam, K. Y. Beeson, T. C. McIntosh, K. A. Remington, J. F. Abril, J. Gill, J. Borman, Y. H. Rogers, M. E. Frazier, S. W. Scherer, R. L. Strausberg, and J. C. Venter. (2007). The diploid genome sequence of an individual human. PLoS Biol 5:e254.
Mardis, E. R. (2008). The impact of next-generation sequencing technology on genetics. Trends Genet 24:133-41.
Metzker, M. L. (2005). Emerging technologies in DNA sequencing. Genome Res 15:1767-76.
Park, H., J.-H. Kim, S.-I. Cho, J. Sung, H.-L. Kim, Y. S. Ju, G. Bayasgalan, M.-K. Lee, and J.-S. Seo. (2008). Genome-wide Linkage Study for Plasma HDL Cholesterol Level in an Isolated Population of Mongolia. Genomics & Informatics 6(1):8-13.
Porreca, G. J., J. Shendure, and G. M. Church. (2006). Polony DNA sequencing. Curr Protoc Mol Biol Chapter 7:Unit 7.8.
Ring, H. Z., P. Y. Kwok, and R. G. Cotton. (2006). Human Variome Project: an international collaboration to catalogue human genetic variation. Pharmacogenomics 7:969-72.
Sanger, F., G. M. Air, B. G. Barrell, N. L. Brown, A. R. Coulson, C. A. Fiddes, C. A. Hutchison, P. M. Slocombe, and M. Smith. (1977). Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265:687-95.
Shendure, J., R. D. Mitra, C. Varma, and G. M. Church. (2004). Advanced sequencing technologies: methods and goals. Nat Rev Genet 5:335-44.
Sung, J., M. K. Lee, and J.-S. Seo. (2008). Inbreeding Coefficients in Two Isolated Mongolian Populations - GENDISCAN Study. Genomics & Informatics 6(1).
Wheeler, D. A., M. Srinivasan, M. Egholm, Y. Shen, L. Chen, A. McGuire, W. He, Y. J. Chen, V. Makhijani, G. T. Roth, X. Gomes, K. Tartaro, F. Niazi, C. L. Turcotte, G. P. Irzyk, J. R. Lupski, C. Chinault, X. Z. Song, Y. Liu, Y. Yuan, L. Nazareth, X. Qin, D. M. Muzny, M. Margulies, G. M. Weinstock, R. A. Gibbs, and J. M. Rothberg. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature 452:872-6.