Week5 Korean Genome Project

From Biolecture.org

Genomics

Week5 Essay

Korean Genome Project, KPGP, KOREF

After reading the paper "Ethnically relevant consensus Korean reference genome towards personal reference genomes"

20141497

Chaeeun Lee

To begin with, the purpose of the Korean Genome Project is to sequence the genomes of every living being in Korea. With these data, researchers can analyze the diversity of Korean genes and their associated phenotypes. The project is meaningful in that it deals with the genome of a specific ethnic group, Koreans, and it may uncover disease-associated structural variations (SVs) or historical evidence, such as traces of the historical inflow of foreigners. In addition, a Korean reference genome can be built from the large amount of Korean genome data.

After reading the Korean Genome Project research paper, "Ethnically relevant consensus Korean reference genome towards personal reference genomes", I realized that my knowledge of genomics is still insufficient. So I decided to look up some background information and write it down in my own words to help me read the paper.

 

What is Sanger sequencing?

Sanger sequencing is one of the best-known sequencing methods in genomics. It was developed early on but is still widely used for sequencing short stretches of DNA. It is based on chain termination: during DNA replication, the polymerase occasionally incorporates a dideoxynucleotide (ddNTP) instead of a normal dNTP. A ddNTP lacks the 3'-OH group needed to attach the next nucleotide, so the strand cannot be elongated any further and synthesis stops there. This produces many DNA fragments of various lengths. The ddNTPs are radioactively or fluorescently labeled for detection, so the sequence can be read after the fragments are separated by length with gel electrophoresis.
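To check my understanding of chain termination, I wrote a small simulation. It is only a toy sketch under my own assumptions: the template sequence, the function names, and the 10% chance of picking up a ddNTP at each position are all made up for illustration, and a real experiment reads fluorescent peaks rather than a Python dictionary.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sanger_fragments(template, n_molecules=2000, dd_fraction=0.1, seed=0):
    """Simulate chain termination on many copies of one template.

    Returns a list of (fragment_length, labeled_terminal_base) pairs.
    dd_fraction is an assumed probability that a ddNTP is incorporated
    at any given position instead of a normal dNTP.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(template, start=1):
            new_base = COMPLEMENT[base]  # base added opposite the template
            if rng.random() < dd_fraction:
                # A ddNTP was incorporated: the strand stops here and
                # carries the label of its last base.
                fragments.append((position, new_base))
                break
        # Molecules that never pick up a ddNTP stay unlabeled and are ignored.
    return fragments

def read_gel(fragments):
    """Order fragments by length, as electrophoresis does, and read the labels."""
    label_by_length = {}
    for length, label in fragments:
        label_by_length[length] = label
    return "".join(label_by_length[n] for n in sorted(label_by_length))

template = "TACGGATCCA"  # hypothetical template strand
synthesized = read_gel(sanger_fragments(template))
print(synthesized)
```

Reading the labels from the shortest fragment to the longest gives the newly synthesized strand, which is the complement of the template.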

 

What are short reads and long reads?

The basic difference is, literally, the length of the sequenced fragment. A 'short read' is usually some dozens to a few hundred base pairs long. Although short reads from shotgun sequencing can be aligned quickly, it is hard to resolve repeats with them. Short reads are also getting longer as the technology develops.

Long reads are roughly a hundred times longer than short reads. They allow better reconstruction of the sequence because a single read can span a region that would otherwise have to be pieced together from many overlapping short reads. However, they involve a lot of data, and it is complex to process them accurately with algorithms. Pacific Biosciences (PacBio®), a third-generation sequencing company, developed sequencing techniques and advanced analytics that achieve longer reads than other technologies. Finally, both kinds of reads have to be assembled, for example by "de novo assembly".
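The repeat problem is easier to see with a small example. The sketch below is my own illustration, not something from the paper: the toy genome, the repeat, and the read lengths of 4 and 12 are invented. It counts how many reads occur at more than one position in the genome, because such reads cannot be placed uniquely.

```python
from collections import Counter

def all_reads(genome, read_length):
    """Every error-free read of the given length, one per start position."""
    return [genome[i:i + read_length]
            for i in range(len(genome) - read_length + 1)]

def ambiguous_fraction(genome, read_length):
    """Fraction of reads whose sequence occurs at more than one position."""
    reads = all_reads(genome, read_length)
    counts = Counter(reads)
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)

# Hypothetical genome: the same 10-base repeat appears twice,
# separated and flanked by unique sequence.
repeat = "ACGGTCATGC"
genome = "TTGCA" + repeat + "GGATC" + repeat + "CATGA"

short_fraction = ambiguous_fraction(genome, 4)    # shorter than the repeat
long_fraction = ambiguous_fraction(genome, 12)    # longer than the repeat
print(short_fraction, long_fraction)
```

Reads shorter than the repeat can fall entirely inside it and then match both copies, while reads longer than the repeat always include some unique flanking sequence. Real reads also contain sequencing errors, which this sketch ignores.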

 

What is "de novo assembly"?

De novo assembly reconstructs longer sequences from reads without using a reference genome. Applied to RNA reads, it can build a "transcriptome", which is the set of all messenger RNA molecules, so it is useful when studying non-model organisms. Moreover, assembling a transcriptome is cheaper and easier than building a genome. However, it is done by software, so I think programming skill and the choice of a proper algorithm are important. In the case of human sequencing, de novo assembly is good for finding variations that cannot be found when reads are aligned to a reference.
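To see why the algorithm matters, I tried a very simple greedy approach: repeatedly merge the two sequences with the longest suffix-prefix overlap. This is only a minimal sketch under my own assumptions (the three reads and the minimum overlap of 3 are invented, and the reads are error-free). Real assemblers are far more sophisticated and typically use overlap graphs or de Bruijn graphs.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_length are treated as no overlap (0).
    """
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_length=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_length)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

# Hypothetical reads covering one short sequence, given out of order.
reads = ["TACCTGA", "ATGGCGT", "GCGTACC"]
contigs = greedy_assemble(reads)
print(contigs)
```

No reference sequence is used anywhere: the result comes only from the overlaps between reads. With repeats, the greedy choice can join the wrong reads, which is one reason the choice of algorithm is important.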

 

References

https://www.biostars.org/p/5660/

http://www.genengnews.com/insight-and-intelligence/the-long-and-the-short-of-dna-sequencing/77899725/

https://en.wikipedia.org/wiki/Sequence_assembly

https://en.wikipedia.org/wiki/De_novo_transcriptome_assembly

http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1935.html

http://www.genehunters.co.kr/Training/01_3_4.htm

https://en.wikipedia.org/wiki/Transcriptome