2. Genomics in past and future

From Biolecture.org

Genomics in Past and Future

Youngkwang Jung (정영광)

Abstract

In the past, there were two main events in the history of genomics. The first is that the helical structure of DNA was determined by James Watson and Francis Crick in 1953. The second is that the concept of the computer (the Turing machine) was proposed by Alan Turing in 1936.

In the future, it will be possible to predict the phenome, which is decided by the genome and the envirome, with better techniques and more advanced computer AI (artificial intelligence), and 5th generation sequencing will be developed using helicase as a model.

 

Introduction

In the bioinformatics class, the professor asked the following questions about genomics:

 

1) Define genomics in your own way after doing research on what genomes are and how we study them.

2) What is the origin of genomics?

3) History of genomics?

4) The future of genomics?

5) What is the relationship with other omics?

6) How can we engineer genomes?

 

I'll discuss questions 1, 2, 3, and 4 in this paper.

 

History of genomics

I have organized the main events in the history of genomics below.

 

1936 A. M. Turing - The theoretical concept of the computer (Turing machine) is developed.

1952 F. Sanger - The amino acid sequence of insulin is completely determined.

1953 James Watson and Francis Crick - The structure of DNA (antiparallel helical structure) is determined.

1957 Francis Crick - The central dogma is proposed.

1961 Marshall Nirenberg - The concept of the triplet code is developed.

1967 W. M. Fitch and E. Margoliash - The concept of the phylogenetic tree is introduced.

1973 Brookhaven National Laboratory - The concept of a protein database is created.

1974 Vint Cerf and Robert Kahn - The concept of the internet is developed.

1975 Edwin Southern - Southern hybridization is developed.

1977 Frederick Sanger and Walter Gilbert - The first DNA sequencing methods are developed (recognized with the 1980 Nobel Prize in Chemistry).

1990 US Government - The Human Genome Project is started.

2001 International Human Genome Sequencing Consortium and Celera Genomics - The draft human genome sequence is published.

 

History of Sequencing

1st

First generation sequencing is Sanger sequencing. Its main feature is chain termination by ddNTPs, which lack the hydroxyl group (-OH) at the 3' position of the sugar, so the chain cannot be extended any further. It also uses radioisotope labeling and gel electrophoresis to read the order of bases from the sizes of the DNA fragments.
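The read-out logic of chain termination can be sketched as a toy model. This is only an illustration under simplifying assumptions (every stop position yields exactly one fragment, and the terminal base of each fragment is known from its label); the function names are hypothetical and not part of any library.

```python
# Toy model of Sanger chain termination (illustrative only, not a lab protocol).

def chain_terminated_fragments(synthesized: str) -> list[tuple[int, str]]:
    """Return (fragment_length, terminal_base) for every possible stop position.

    Each fragment ends where a ddNTP was incorporated, so its last base
    is known from the label and its length is measured on the gel.
    """
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Read the sequence by ordering fragments from shortest to longest."""
    return "".join(base for _, base in sorted(fragments))


fragments = chain_terminated_fragments("GATTACA")
fragments.reverse()  # input order does not matter; sorting by size recovers it
print(read_gel(fragments))  # GATTACA
```

Sorting by fragment length plays the role of gel electrophoresis here: the shortest fragment gives the first base and the longest gives the last.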

 

2nd

It is called next generation sequencing (NGS) because it came after Sanger sequencing. 2nd generation sequencing breaks the whole genome into many fragments by shotgun sequencing, which is where the concept of depth comes from. Examples of 2nd generation sequencing are pyrosequencing and Illumina sequencing. Because many fragments are sequenced at the same time and assembled by computer, it is a massively parallel method and is about 100 times faster than Sanger sequencing.
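The concept of depth can be made concrete with the usual estimate depth = N × L / G, where N is the number of reads, L is the read length, and G is the genome size. The sketch below is a minimal illustration; the function name and the run parameters are hypothetical example values, not figures from a real experiment.

```python
def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: depth = N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 600 million reads of 150 bases over a 3-billion-base genome.
print(mean_depth(600_000_000, 150, 3_000_000_000))  # 30.0
```

A depth of 30 means that, on average, each base of the genome is covered by 30 independent fragments.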

 

3rd

The main characteristic of 3rd generation sequencing is single-molecule sequencing, which does not break DNA into many short fragments as shotgun sequencing does. One example of 3rd generation sequencing is Pacific Biosciences SMRT (single molecule, real-time) sequencing, which reads a single-stranded DNA molecule and uses a different fluorescent dye for each base type.

 

4th

Fourth generation sequencing is post-light sequencing, which no longer uses optical detection. One example is nanopore sequencing, which measures changes in the electrical current flowing through a nanopore in a membrane as DNA passes through it.

 

 

The origin of genomics

There are two revolutionary events in the history of genomics. The first is the determination of the DNA structure by James Watson and Francis Crick. It rapidly accelerated DNA research and led to the central dogma, which states that DNA carries the genetic information for proteins, and this theory became the theoretical basis of sequencing for genomic analysis.

The second is the concept of the computer (Turing machine), proposed by Alan Turing in 1936. The computer was developed from this concept, and because of the computer, genome projects can analyze data on a larger scale and process it more efficiently.

 

Therefore, I think that the origin of genomics is the discovery of the DNA structure and the development of the computer.

 

The definition of genomics

A genome sequencing project is generally composed of 3 parts: sequencing, annotation, and comparison and analysis. Sequencing finds the order of bases in the target sample. Annotation identifies which parts of the sequence are genes and what their functions are. Comparison and analysis compare the target gene with other genes from previous data and find the similarities or differences between them.
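The three parts can be illustrated with a toy sketch. It makes strong simplifying assumptions: the sequencing result is just a given string, annotation is reduced to finding ATG...stop open reading frames on the forward strand, and comparison is a position-by-position identity without alignment. The function names and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def annotate_orfs(seq: str) -> list[tuple[int, int]]:
    """Annotation (toy): (start, end) of ATG...stop reading frames, forward strand only."""
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the same frame until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


def percent_identity(a: str, b: str) -> float:
    """Comparison (toy): share of matching positions between equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Sequencing: in this sketch the read is simply given as a string.
sample = "CCATGAAATGACC"
reference_gene = "ATGAAGTGA"  # hypothetical gene from "previous data"

orfs = annotate_orfs(sample)
print(orfs)  # [(2, 11)]
start, end = orfs[0]
print(round(percent_identity(sample[start:end], reference_gene), 1))  # 88.9
```

Real projects use dedicated gene-prediction and alignment tools for these stages; the sketch only shows how the output of each part feeds the next.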

Genomics is like arranging books in a library. When we organize the books in a library, we first need to read them to tell them apart, and then we classify them by labeling. Finally, we collect books with a similar genre or content in one area. Genomics is the same: reading is sequencing, labeling is annotation, and collecting is comparison.

 

Therefore, I think that genomics is just arranging books that hold biological information in a library.

 

The future of genomics

 

Trend

Genomics identifies the phenome, which is the set of phenotypes arising from the genome (genetic factors) and the envirome (environmental factors). Because many genes and environmental factors are involved in a single phenotype, it is difficult to predict the phenome.

However, in the future it will be possible to compute the phenome from genome and envirome data with better techniques and more advanced computer AI (artificial intelligence), and to suggest a personal plan to prevent or compensate for its expression in individuals.

 

5th sequencing

I designed the successor to 4th generation sequencing: 5th generation sequencing. The 5th generation should use a single molecule rather than shotgun fragments (the feature of the 3rd), should be post-light sequencing that does not use optical detection (the feature of the 4th), and should be more efficient than the previous versions.

I thought about the helicase. DNA consists of bases that are joined by hydrogen bonds. Adenine and thymine form two hydrogen bonds, and guanine and cytosine form three. Helicase is the enzyme that breaks the hydrogen bonds between bases in DNA. I took helicase as the model for a nanosensor. Because A-T pairs have two hydrogen bonds and G-C pairs have three, A-T pairs are easier to break. When the energy the nanosensor needs to break the bond is relatively low, it can predict an A-T base pair, and when the energy is relatively high, a G-C base pair. In this way, it can determine whether a base pair is A-T or G-C. (first step)

Helicase is a hexamer with an empty channel in the center through which DNA passes. I also thought about creating an electric field in the center of the nanosensor. Because bases are classified as pyrimidines or purines according to size (single ring or double ring), the nanosensor can identify whether a base is a pyrimidine or a purine by measuring the difference in electrical charge according to base size. (second step)

Therefore, we can determine the base sequence by combining the first step and the second step. This is my design for 5th generation sequencing.
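The way the two steps combine can be sketched as a lookup table. This is only an illustration of the logic, under the assumptions that the sensor reports clean labels for each position and that the second step measures the base on the strand passing through the sensor. The labels "weak", "strong", "purine", and "pyrimidine" and the function name are hypothetical.

```python
# Step 1 signal: energy needed to open the base pair
#   "weak"   = two hydrogen bonds (A-T), "strong" = three hydrogen bonds (G-C)
# Step 2 signal: size of the base on the strand being read
#   "purine" = double ring (A, G), "pyrimidine" = single ring (T, C)
BASE_TABLE = {
    ("weak", "purine"): "A",
    ("weak", "pyrimidine"): "T",
    ("strong", "purine"): "G",
    ("strong", "pyrimidine"): "C",
}


def call_bases(signals: list[tuple[str, str]]) -> str:
    """Combine the first-step and second-step signals into a base sequence."""
    return "".join(BASE_TABLE[signal] for signal in signals)


signals = [
    ("weak", "purine"),
    ("strong", "pyrimidine"),
    ("weak", "pyrimidine"),
    ("strong", "purine"),
]
print(call_bases(signals))  # ACTG
```

Neither step alone is enough: the first step only separates A-T from G-C, and the second only separates purines from pyrimidines, but the pair of signals is unique for each of the four bases.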

 

Conclusion

To answer questions 1, 2, 3, and 4 in one sentence each:

Genomics is just arranging books that hold biological information in a library.

The origin of genomics is the discovery of the DNA structure and the development of the computer, and these are the main events in the history of genomics.

In the future, the phenome arising from the complex genome and envirome can be predicted, and 5th generation sequencing will be developed using helicase as a model.

 


Youngkwang_Jung_-_bioinformatics