Week5 Korean Genome Project
Revision as of 23:39, 3 October 2016
Genomics
Week5 Essay
Korean Genome Project, KPGP, KOREF
After reading the paper “Ethnically relevant consensus Korean reference genome towards personal reference genomes”
20141497
Chaeeun Lee
To begin with, the purpose of the Korean Genome Project is to sequence the genomes of as many people living in Korea as possible. This makes it possible to analyze the genetic diversity of Koreans and the associated phenotypes. The project is meaningful in that it focuses on the genome of a specific ethnic group, Koreans, and it may uncover disease-associated structural variations (SVs) or historical evidence, such as traces of the historical inflow of foreigners. In addition, a Korean reference genome can be built from the large amount of Korean genome data.
After reading the Korean Genome Project research paper, “Ethnically relevant consensus Korean reference genome towards personal reference genomes”, I realized that my knowledge of genomics is still insufficient. So I decided to look up some background information and summarize it in my own words to help me read the paper successfully.
What is Sanger sequencing?
Sanger sequencing is one of the best-known sequencing methods in genomics. It was developed in the 1970s but is still widely used for sequencing short DNA fragments. It is based on chain termination: during DNA replication, the polymerase occasionally incorporates a dideoxynucleotide (ddNTP) instead of a normal dNTP. A ddNTP lacks the 3'-OH group needed to attach the next nucleotide, so the strand cannot be elongated and synthesis stops there. The result is a large number of DNA fragments of different lengths, each ending with a ddNTP. Because the ddNTPs are radioactively or fluorescently labeled, the last base of each fragment can be detected, and the sequence can be read after the fragments are separated by size with gel electrophoresis.
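To make the chain-termination idea concrete, here is a toy Python simulation. It is only a sketch under simplifying assumptions, not a model of the real chemistry: the template is written 3'→5' so the new strand is simply its base-by-base complement, every position has the same hypothetical chance `p_dd` of receiving a ddNTP, and "electrophoresis" is just sorting fragments by length. The names `extend_once` and `sanger_read` are made up for illustration.

```python
import random
from typing import Dict, Optional

# Watson-Crick base pairing used to build the newly synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_once(template: str, rng: random.Random, p_dd: float) -> Optional[str]:
    """Simulate one polymerase extension along `template` (written 3'->5').

    At each position a labeled ddNTP is incorporated with probability `p_dd`
    (a hypothetical toy parameter), which terminates the chain; otherwise a
    normal dNTP is added and synthesis continues.
    """
    strand = []
    for base in template:
        strand.append(COMPLEMENT[base])
        if rng.random() < p_dd:
            return "".join(strand)  # ddNTP added: the chain cannot be elongated
    return None  # reached the end of the template without termination

def sanger_read(template: str, copies: int = 5000, p_dd: float = 0.1, seed: int = 0) -> str:
    """Collect terminated fragments from many template copies and read them by size."""
    rng = random.Random(seed)
    # Fragments of the same length always end in the same labeled base.
    last_base_by_length: Dict[int, str] = {}
    for _ in range(copies):
        fragment = extend_once(template, rng, p_dd)
        if fragment is not None:
            last_base_by_length[len(fragment)] = fragment[-1]
    # "Electrophoresis": read the labels from the shortest fragment to the longest.
    return "".join(last_base_by_length[n] for n in sorted(last_base_by_length))

print(sanger_read("TACGGATC"))  # ATGCCTAG
```

With enough template copies every fragment length shows up, so reading the labels from shortest to longest recovers the complementary strand.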
What are short reads and long reads?
The basic difference is, literally, the length of the sequence. A short read is typically tens to a few hundred base pairs long. Short reads can be produced and aligned quickly in shotgun sequencing, but they are hard to place in repetitive regions, because a read that lies entirely inside a repeat matches every copy of that repeat equally well. Short reads have become longer as the technology has developed.
Long reads can be tens to hundreds of times longer than short reads. Because a long read can span a repeat together with its unique flanking sequence, it reduces ambiguous overlaps and identifies the sequence more reliably. However, long-read data are large, and processing them accurately requires complex algorithms. Pacific Biosciences (PacBio), a third-generation sequencing company, developed sequencing techniques and analytics that achieve longer reads than other technologies. Finally, both types of reads can be used for “de novo assembly”.
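The repeat problem can be illustrated with a toy exact-matching example in Python. The reference below is a made-up sequence containing two copies of `GATTACA`, and `exact_match_positions` is a hypothetical helper; real aligners tolerate mismatches and use indexes rather than `str.find`, so this only shows why read length matters.

```python
from typing import List

def exact_match_positions(reference: str, read: str) -> List[int]:
    """Return every 0-based offset where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return positions

# Hypothetical reference: unique flanks around two copies of the repeat "GATTACA".
reference = "CCGT" + "GATTACA" + "TTAG" + "GATTACA" + "AACG"

short_read = "GATTACA"        # lies entirely inside the repeat
long_read = reference[2:18]   # spans the first repeat plus unique flanking bases

print(exact_match_positions(reference, short_read))  # [4, 15]
print(exact_match_positions(reference, long_read))   # [2]
```

The short read fits both copies of the repeat equally well, so its true origin is ambiguous, while the longer read that includes unique flanking bases maps to a single position.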
What is “de novo assembly”?
De novo assembly reconstructs a sequence from reads without using a reference genome. For example, de novo transcriptome assembly builds a “transcriptome”, the set of all messenger RNA molecules, without a reference genome, which makes it useful for studying non-model organisms. It is also cheaper and easier than assembling a whole genome. Because assembly is done entirely by software, I think programming skill and the choice of a proper algorithm are important. In human sequencing, de novo assembly is good for finding variations that cannot be found when reads are aligned to a reference.
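As a rough illustration of what assembly software does, here is a toy greedy overlap-merge assembler in Python. It is a simplified stand-in, not the method used in the paper: it assumes error-free reads from a single strand, repeatedly merges the pair of reads with the longest suffix-prefix overlap, and stops when no overlap of at least `min_len` (a hypothetical threshold) remains. Real assemblers typically use overlap graphs or de Bruijn graphs and must also handle sequencing errors and repeats.

```python
from itertools import permutations
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if that overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge reads greedily by largest overlap; return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            k = overlap(contigs[i], contigs[j], min_len)
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlap left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)

print(greedy_assemble(["TTACGG", "CGGATC", "ATCCAG"]))  # ['TTACGGATCCAG']
```

Greedy merging is easy to follow, but it can join reads incorrectly in repetitive regions, which is one reason longer reads help assembly.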
Reference
https://www.biostars.org/p/5660/
https://en.wikipedia.org/wiki/Sequence_assembly
https://en.wikipedia.org/wiki/De_novo_transcriptome_assembly
http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1935.html
http://www.genehunters.co.kr/Training/01_3_4.htm
https://en.wikipedia.org/wiki/Transcriptome