Self Evaluation Paper

From Biolecture.org

Self-Evaluation on “Genomics” course

KyungHyun Cho (20141581)

 

1. Abstract

 In this paper, I evaluate my own participation in the “Genomics” class over the whole semester. Self-evaluation is a unique feature of the Genomics class: there was no exam at all, and “self-studying” was encouraged instead.

 Unlike the traditional “teaching-learning” system, in which instruction flows in one direction only (from lecturer to students), “self-studying” is a study method that emphasizes raising questions and studying individually. In other words, “self-studying” can be defined as the process of constantly asking questions such as "Why is this happening?" or simply "What is this?" and trying to find answers using various references. This paper presents not only the questions I had about genomics but also the process by which I tried to answer them. Furthermore, it proposes an A+ grade for “Genomics”, supported by reasonable data.

 

2. Introduction

Genomics is a scientific field focusing on genomes. It studies the structure, function, evolution, mapping, and editing of genomes. A genome is the complete set of DNA (deoxyribonucleic acid) of an organism; DNA is the molecule that carries the genetic information used in the growth, development, functioning and reproduction of all known living organisms. As in other scientific fields, it is important to find interesting topics in genomics and to come up with ideas for addressing them. Various topics such as "Epigenomics" and "Aging and genomics" were introduced in the “Genomics” class. Based on these, I raised questions on two topics in genomics.

Cetaceans have evolved changes in the glutathione metabolism pathway (a KEGG pathway, referred to below as the KEGG pathway) that help them resist reactive oxygen species (ROS), which are generated under hypoxic conditions [1]. Glutathione, a well-known antioxidant that prevents damage by ROS, is produced through this pathway. Cetaceans are often exposed to hypoxic conditions when they dive, so antioxidant molecules are central to their resistance to ROS. Looking at the KEGG pathway in detail, I noted that it is an ATP-consuming process [2]. This raised the question: “Is the evolved KEGG pathway in cetaceans really effective for producing antioxidants even though it consumes the core energy molecule ATP?” Answering it required not only basic knowledge of ROS and glutathione but also consideration of the genomic differences between cetaceans and other species.

The development of sequencing methods has accelerated as knowledge of DNA sequences has become more and more important in genomics [3,4]. As a result, DNA sequences can be analyzed more precisely and accurately than before. Alignment has in turn become a key method in genomics. Although a wide variety of alignment methods already exist [5,6], developing my own alignment method was worthwhile because writing the code would improve my creativity and thinking ability. I therefore wrote an alignment program in MATLAB.

The Genomics course uses a special grading method, self-evaluation. In this paper it is based on three kinds of data: attendance (10%), self-studying (70%) and the scientific paper (20%).

 

3. Principles and Methods

 3.1 Resistance to ROS in Cetaceans

3.1.a ROS

 ROS are chemically reactive species containing oxygen, such as peroxides, superoxide, the hydroxyl radical, singlet oxygen and alpha-oxygen (Figure 1) [7]. The hydroxyl radical is extremely reactive and immediately removes electrons from any molecule in its path, turning that molecule into a free radical and thus propagating a chain reaction. However, hydrogen peroxide is more damaging to DNA than the hydroxyl radical, since its lower reactivity gives the molecule enough time to travel into the nucleus of the cell, where it reacts with macromolecules such as DNA.

Figure 1. Free Radical Toxicity

 

3.1.b Antioxidant: Glutathione

Glutathione is a well-known antioxidant molecule (Figure 2) [8]. It is a tripeptide that contains cysteine. Glutathione has antioxidant properties because the thiol group of its cysteine moiety is a reducing agent that can be reversibly oxidized and reduced. In cells, glutathione is maintained in the reduced form by the enzyme glutathione reductase. It in turn reduces other metabolites and enzyme systems, such as ascorbate in the glutathione-ascorbate cycle and glutathione peroxidases, and it also reacts directly with oxidants.

Figure 2. Glutathione(GSH). (a) Chemical structure of GSH. (b) Mechanism of GSH as antioxidant.

 

 3.2 DNA Sequence Alignment

3.2.a DNA Sequencing Method: Sanger

 Before turning to the Sanger method, how are nucleotides added to an existing DNA strand? DNA polymerase catalyzes a condensation reaction between the 3’-OH of the growing strand and the 5’-phosphate of the incoming dNTP, forming a phosphodiester bond. Sanger, however, found that ddNTPs prevent further elongation because they have a hydrogen atom instead of a hydroxyl group at the 3’ carbon. Their incorporation therefore terminates polymerization (Figure 3).

Figure 3. Comparing the chemical structure of dNTP and ddNTP

In practice, a reaction mixture is prepared with primers, the DNA template, DNA polymerase, dNTPs for elongation and fluorochrome-labelled ddNTPs for termination. The primer is elongated until a ddNTP is incorporated and the chain terminates. The DNA fragments are then separated by gel electrophoresis and analyzed by computer using laser detection of the fluorochromes (Figure 4).

However, the Sanger method has limitations such as its high cost (around $2,400 per 1 Mbp) and limited read length (up to several hundred bp per run). Therefore, many other sequencing methods have been developed.
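The read-out logic of chain termination can be illustrated with a toy Python model. This is only a sketch and not part of the original work: the function names are hypothetical, and it assumes that a terminated fragment exists for every position and that electrophoresis separates fragments perfectly by length.

```python
def terminated_fragments(new_strand):
    """Return (length, terminal_base) for every chain-terminated fragment.

    Toy assumption: a labelled ddNTP stops synthesis once at every
    position of the newly synthesized strand.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_sequence(fragments):
    """Sort fragments by length (the role of the gel) and read the
    fluorescent label of each one from shortest to longest."""
    return "".join(base for _, base in sorted(fragments))


fragments = terminated_fragments("ATGTC")
# The order in which fragments are supplied does not matter.
print(read_sequence(fragments[::-1]))  # ATGTC
```

Each fragment length identifies a position and each label identifies the base at that position, which is why sorting by length recovers the sequence.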

 

Figure 4. Overall Sanger method

 

3.2.b DNA Sequencing Method: NGS

 There are several other sequencing methods, such as shotgun sequencing and SMRT (Single Molecule, Real Time) sequencing. The shotgun method was designed for analyzing DNA sequences longer than 1,000 base pairs, up to and including entire chromosomes. This method requires the target DNA to be broken into random fragments. After the individual fragments have been sequenced, the sequences can be reassembled on the basis of their overlapping regions (Figure 5).

SMRT sequencing is based on the sequencing-by-synthesis approach. The DNA is synthesized in zero-mode waveguides (ZMWs), small well-like containers with the capturing tools located at the bottom of the well. Sequencing is performed with an unmodified polymerase (attached to the bottom of the ZMW) and fluorescently labelled nucleotides flowing freely in the solution. The wells are constructed so that only fluorescence occurring at the bottom of the well is detected. The fluorescent label is detached from the nucleotide upon its incorporation into the DNA strand, leaving an unmodified DNA strand (Figure 6) [9].
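Reassembly from overlapping regions can be sketched in Python as follows. This is a minimal illustration and not a real assembler: the two reads are hypothetical, the function names are my own, and only an exact suffix-prefix overlap between two reads is considered.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a, b, min_len=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


# Two hypothetical reads taken from the sequence 'ATGTTTGGCA'.
print(merge("ATGTTTG", "TTGGCA"))  # ATGTTTGGCA
```

Real assemblers must also handle sequencing errors, repeats and reads from both strands, which this sketch ignores.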

 

Figure 5. Overall Shotgun method

 

Figure 6. SMRT sequencing method. A. SMRTbell (gray) diffuses into a ZMW, and the adaptor binds to a polymerase immobilized at the bottom. B. Each of the four nucleotides is labeled with a different fluorescent dye (indicated in red, yellow, green, and blue, respectively for G, C, T, and A) so that they have distinct emission spectrums. As a nucleotide is held in the detection volume by the polymerase, a light pulse is produced that identifies the base. (1) A fluorescently-labeled nucleotide associates with the template in the active site of the polymerase. (2) The fluorescence output of the color corresponding to the incorporated base (yellow for base C as an example here) is elevated. (3) The dye-linker-pyrophosphate product is cleaved from the nucleotide and diffuses out of the ZMW, ending the fluorescence pulse. (4) The polymerase translocates to the next position. (5) The next nucleotide associates with the template in the active site of the polymerase, initiating the next fluorescence pulse, which corresponds to base A here.

 

3.2.c Alignment method

In genome research, the most important step is sequencing the genome. The sequence contains a great deal of information about function, structure and even evolution. Aligning sequences is therefore important as well: by comparing sequences between species, we can analyze which sequence differences give rise to different phenotypes. If all sequences were short and similar, alignment could be done without any program or computer. However, real sequences are extremely long and diverse, so advanced alignment methods are required.

 

3.3 Self-Evaluation

Self-evaluation was carried out using three kinds of data: attendance, self-studying and the scientific paper (Table 1). The grading ranges are: A+ (95-100), A0 (90-94.9), A- (85-89.9), B+ (80-84.9), B0 (75-79.9), B- (70-74.9), C+ (65-69.9), C0 (60-64.9), C- (55-59.9), D+ (50-54.9), D0 (45-49.9), D- (40-44.9) and F (below 40).

 Table 1. Self-evaluation data table

Data for evaluation: Detailed explanation

Attendance (10%)

Class time is a good opportunity to become more interested in certain topics, which in turn encourages self-studying.

- If absent fewer than 4 times, full credit is given.

- If absent more than 4 times, no credit is given.

Self-studying (total 70%)

Raise question (20%)

Raising an interesting question is important in the self-studying process: creativity and thinking skills are required.

- If an interesting question is raised, full credit is given.

Research (20%)

After raising a question, do research to answer it: suggest a good idea and carry out the research (papers or experiments).

- If the idea is appropriate and critical, 10% credit is given.

- If proper and critical research has been done, 10% credit is given.

Result (15%)

Evaluate the results that come from the research.

- If the results include reasonable and effective data to answer the question, 15% credit is given.

Discussion (15%)

Discussion and critical thinking are an important part.

- If the discussion has been done well, 15% credit is given.

Scientific Paper (20%)

Write a scientific paper about the self-studying and self-evaluation: how the question was raised, what idea was used, what the results indicate and how the self-evaluation was done.

- If a good scientific paper (detailed and informative) has been written, 20% credit is given.

4. Result

4.1 Resistance to ROS in cetaceans

4.1.a Evolved KEGG pathway in cetaceans

Glutathione is a well-known antioxidant that prevents damage to important cellular components by ROS. Seven genes of the glutathione metabolism pathway (GPX2, ODC1, GSR, GGT6, GGT7, GCLC and ANPEP) showed cetacean-specific amino acid changes; these changes were present in the four minke whales, a fin whale, two bottlenose dolphins and a porpoise (Figure 7) [1]. Increased expression of GSR has been reported to improve the antioxidant capacity of cells [10].

 

 Figure 7. Evolved KEGG pathway in cetaceans

In addition, the peroxiredoxin (PRDX) gene family, whose products eliminate peroxides and other ROS, is also greatly expanded in cetaceans (Figure 8). This also implies that cetaceans may resist ROS much better than other species [1].

  Figure 8. Gene expression of PRDX in species

4.1.b Test effects of KEGG pathway

 To test whether the genes of the KEGG glutathione metabolism pathway help cells resist ROS, the researchers measured glutathione levels. Cultured kidney cells from the Atlantic spotted dolphin (Stenella frontalis) showed an increased ratio of reduced glutathione to glutathione disulfide when subjected to hypoxic or oxidative stress (Figure 9) [1].

 

Figure 9. Test effects of KEGG pathway: Spotted dolphin kidney Sp1k cells

 

4.2 DNA Sequence Alignment

   4.2.a Code in MATLAB and simple simulation

 I wrote my own code for DNA sequence alignment in MATLAB (Figure 10-18). First, I used “fprintf” to display which parts of the code the user has to edit, namely the sequences of interest. The function ‘nt2int’ converts a nucleotide sequence to the corresponding integers: A to 1, C to 2, G to 3 and T to 4; in this code, 0 represents a gap. For example, if the reference DNA sequence is ‘CTG’ and the sample DNA sequence is ‘ACTG’, then ref = [2, 4, 3] and sam = [1, 2, 4, 3] (Figure 10).

 

Figure 10. Type the DNA sequences and read it as integers.
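For readers without MATLAB, the encoding step can be sketched in Python. This is an illustrative re-expression, not the code shown in Figure 10; the dictionary `NT_TO_INT` and the function `encode` are hypothetical names standing in for ‘nt2int’.

```python
# Integer codes used in the text; 0 is reserved for a gap (no base).
NT_TO_INT = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode(seq):
    """Convert a nucleotide string into a list of integer codes."""
    return [NT_TO_INT[base] for base in seq.upper()]


ref = encode("CTG")
sam = encode("ACTG")
print(ref, sam)  # [2, 4, 3] [1, 2, 4, 3]
```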

 The next step is as follows (Figure 11):

  • To align the sequences, both must first be made the same length.
  • First, suppose that the sample is longer than the reference (line 26).
  • Calculate the difference between the two lengths (line 28).
  • Create a matrix ‘nobase’ of that length in which all elements are zero (line 29); 0 means a gap (no base).
  • For example, if the reference sequence is ‘CTG’, then ref_length is 3.
  • If the sample sequence is ‘ACTG’, then sam_length is 4.
  • In this example, length_difference is 1, so nobase = [0].
  • Then append it to ref: mod_ref = [2,4,3,0] and mod_sam = sam = [1,2,4,3]. Now both have the same size.

 Figure 11. Match the sequence lengths as equal
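The length-matching step can be sketched in Python as below. This is not the MATLAB code of Figure 11: `match_lengths` is a hypothetical name, and the branch that pads the sample when it is the shorter sequence is my assumption, since the text only walks through the case where the sample is longer.

```python
def match_lengths(ref, sam):
    """Pad the shorter integer sequence with zeros (gaps) at its end."""
    diff = len(sam) - len(ref)
    if diff > 0:
        # Sample is longer: append 'nobase' zeros to the reference.
        ref = ref + [0] * diff
    elif diff < 0:
        # Assumed symmetric case: append zeros to the sample.
        sam = sam + [0] * (-diff)
    return ref, sam


mod_ref, mod_sam = match_lengths([2, 4, 3], [1, 2, 4, 3])
print(mod_ref, mod_sam)  # [2, 4, 3, 0] [1, 2, 4, 3]
```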

 Then, I rearranged the sample sequence to perform the alignment and calculated the quality of each alignment (Figure 12):

  • To align the sequences, “for” is used twice.
  • First “for”: moves the frame of the sample sequence (rearrangement).
  • Second “for”: performs the alignment and calculates its quality (score).
  • Alignment method: create a matrix “aligned” consisting of 1s and 0s and calculate the quality from “aligned”.
  • If the bases at a given position are the same, put 1 at that position of “aligned”. If they are different, put 0 at that position.
  • Quality = sum of all elements of “aligned” divided by the length of the (length-matched) sample sequence.
  • In the previous example, mod_ref = [2,4,3,0] and mod_sam = sam = [1,2,4,3]; the first “for” rearranges mod_sam into move_sam = [1,2,4,3], [3,1,2,4], [4,3,1,2] and [2,4,3,1] in turn.
  • The corresponding “aligned” matrices are [0,0,0,0], [0,0,0,0], [0,0,0,0] and [1,1,1,0].
  • The qualities are 0, 0, 0 and 0.75, respectively.

Figure 12. Rearrangement of sample sequence, Alignment, Calculating quality of alignment
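The two nested loops can be sketched in Python as follows. This is a re-expression of the procedure described in the text, not the MATLAB code of Figure 12; the function name `shift_scores` is hypothetical, and a circular right shift is assumed because it reproduces the move_sam values listed in the example.

```python
def shift_scores(mod_ref, mod_sam):
    """Score every circular shift of the sample against the reference.

    Returns a list of (move_sam, aligned, quality) tuples, one per shift.
    """
    n = len(mod_sam)
    results = []
    for k in range(n):  # first loop: move the frame by k positions
        move_sam = mod_sam[n - k:] + mod_sam[:n - k]
        # second loop: 1 where the codes agree, 0 where they differ
        aligned = [1 if r == s else 0 for r, s in zip(mod_ref, move_sam)]
        results.append((move_sam, aligned, sum(aligned) / n))
    return results


for move_sam, aligned, quality in shift_scores([2, 4, 3, 0], [1, 2, 4, 3]):
    print(move_sam, aligned, quality)
# last line printed: [2, 4, 3, 1] [1, 1, 1, 0] 0.75
```

Because only whole-sequence shifts are scored, this approach cannot open a gap inside a sequence.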

 Then, I set a quality range and show a simple example of the results within that range (Figure 13, 14):

  • The possible alignments are displayed as in Figure 14 (only those with quality >= 0.2).
  • In this case, the reference sequence is ‘ATGTTTGGCA’ and the sample sequence is ‘ATGTC’.
  • The odd-numbered lines show the reference sequence for each quality.
  • The even-numbered lines show the sample sequence for each quality.
  • For example, the first and second lines have a quality of 0.4, the third and fourth lines a quality of 0.2, and the fifth and sixth lines a quality of 0.3 (Figure 14).

 

 

Figure 13. Set the range of quality and displaying the result

 

Figure 14. Example of result in the range I set (quality >= 0.2).

 

 

Next, I found the highest quality and show an example of the result at the highest quality (Figure 15, 16):

  • The most likely alignment is displayed as in Figure 16 (probably the best-aligned one).
  • In this case, the reference sequence is ‘ATGTTTGGCA’ and the sample sequence is ‘ATGTC’.
  • The first line shows the reference sequence at the highest quality.
  • The second line shows the sample sequence at the highest quality (Figure 16).

Figure 15. Find the highest quality

 

Figure 16. Example of result at the highest quality (0.4).
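The quality filter and the search for the highest quality can be sketched together in Python. This is an illustrative re-expression rather than the MATLAB code of Figures 13 and 15; all names are hypothetical. It assumes zero padding of the shorter sequence, circular right shifts of the sample, and division by the padded length, which is consistent with the qualities 0.4, 0.2 and 0.3 reported above.

```python
CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def qualities(ref, sam):
    """Quality of each circular shift of the sample against the reference."""
    r = [CODES[b] for b in ref]
    s = [CODES[b] for b in sam]
    n = max(len(r), len(s))
    r += [0] * (n - len(r))  # 0 = gap padding
    s += [0] * (n - len(s))
    out = []
    for k in range(n):
        moved = s[n - k:] + s[:n - k]
        out.append(sum(a == b for a, b in zip(r, moved)) / n)
    return out


q = qualities("ATGTTTGGCA", "ATGTC")
# Keep only the shifts whose quality is in the chosen range.
kept = [(k, v) for k, v in enumerate(q) if v >= 0.2]
# The shift with the highest quality is taken as the best alignment.
best = max(range(len(q)), key=q.__getitem__)
print(kept)           # [(0, 0.4), (2, 0.2), (4, 0.3)]
print(best, q[best])  # 0 0.4
```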

 

    4.2.b Comparison with other method - Benchling

 I compared my program with the well-known alignment program on the Benchling site. The results were similar (Figure 17, 18) [11].
 

Figure 17. Alignment ‘ATGTC’ to ‘ATGTTTGGCA’ using Benchling program

 

Figure 18. Alignment ‘ATGTC’ to ‘ATGTTTGGCA’ using my program

 

4.2.c Simulation with longer sequence

 I tested my program on a longer sequence, using part of the ‘lacI’ gene of E. coli as the reference. For the sample, I deleted a stretch of sequence within this part of ‘lacI’ (at 50-66 bp). Benchling produced a very accurate alignment (Figure 19). With my method, however, the alignment with the highest quality (0.5900) was wrong, while an alignment with a lower quality (0.3200) was the correct one (Figure 20).

Figure 19. Alignment of partial lacI gene in E.coli and its mutation using Benchling program

Figure 20. Alignment of partial lacI gene in E.coli and its mutation using my program

 

 4.3 Self-Evaluation

4.3.a Attendance (10%)

 According to the UNIST attendance system, I was absent three times from the Genomics class (Figure 21) [12]. Weeks 14, 15 and 16 are excluded because class was cancelled. Since I was absent fewer than four times, I gave myself full credit for attendance.

Figure 21. Attendance system for Genomics class

4.3.b Self-studying – Raise question (20%)

 I raised two critical questions for self-studying: resistance to ROS in cetaceans and DNA sequence alignment. For the first topic, I learned about ROS and antioxidants, which I had not known about before. It also gave me an opportunity to think about metabolic pathways. In addition, I studied how genomic differences between cetaceans and other species lead to differences in resistance to ROS.

With the second topic, I learned about several sequencing methods and why alignment is important. It was also a good opportunity to learn MATLAB programming.

Both questions are critical, interesting and useful, so I gave myself full credit.

 

4.3.c Self-studying – Research (20%)

I designed the research for both topics, which improved my critical thinking and organizational skills. My ability to search for papers and analyze data improved greatly during the literature research. Moreover, through a lot of trial and error in writing code, my creativity and coding skills improved.

The ideas for both topics were appropriate and critical (as described in the Principles and Methods section). Proper and critical literature research and MATLAB coding were carried out, so I gave myself full credit.

 

4.3.d Self-studying – Result (15%)

 Reasonable and effective data were presented to answer the questions for both topics. The data are clear and made credible by references, and they are explained clearly and in detail.

 In detail, the evolved KEGG pathway and the test of the effect of the KEGG-pathway genes give a rough answer to the question “Is the KEGG pathway really an effective way to produce antioxidants?” However, there were no data on ATP at all, so it was not possible to discuss whether the pathway is effective even though it consumes ATP. For the second topic, DNA sequences were successfully aligned using the code I wrote. Therefore, I gave myself 13% out of the 15% available for the result part.

 

4.3.e Self-studying – Discussion (15%)

 Using the results I obtained, a reasonable, clear and detailed analysis was carried out. In particular, the limitations and insufficient parts of the results were discussed through comparison with other data.

 In detail, for the first topic, whether the evolved KEGG pathway in cetaceans is really effective for producing antioxidants was discussed well. However, the discussion of whether the pathway remains efficient given its ATP consumption was not successful. The DNA sequence alignment, on the other hand, was discussed well. As a result, I gave myself 13% out of the 15% available for this part.

 

4.3.f Scientific-paper (20%)

 A good scientific paper ought to be well organized, present an interesting story or idea, and use scientific data or evidence. This paper shows how the questions were raised, what ideas were used, what the results indicate and how the self-evaluation was done. It is well organized and presents interesting questions and proper data with scientific and critical ideas, so I gave myself full credit.

 

 

5. Discussion

 5.1 Resistance to ROS in cetaceans

According to the results, the KEGG glutathione metabolism pathway appears to be effective in cetaceans. ROS are extremely toxic, and cetaceans spend their lives in the sea, where diving repeatedly exposes them to hypoxic conditions. Under such conditions, the changes in the genes and amino acids of the KEGG pathway that arose during evolution are likely to be important. In conclusion, even though the overall pathway consumes two ATP molecules per cycle, the KEGG pathway plays an important role in cetaceans.

 

 5.2 DNA Sequence Alignment

I created a DNA sequence alignment program using MATLAB. With short, simple sequences, my program works well compared with the Benchling alignment system. Its visualization is simpler but less detailed. For longer sequences, my program is less accurate and also takes longer to run (I tried sequences of up to 200 bp; it works, but takes a few minutes). Also, as mentioned previously, the alignment with the highest quality can be wrong while an alignment with a lower quality (0.3200 in the lacI test) is right. Therefore, setting an appropriate quality range is important in my method. Because of its lack of accuracy for long sequences, it is not yet a suitable method for whole-genome alignment, and further work to increase its accuracy is required.

 

 5.3 Self-Evaluation

According to the results, I gave myself 96% of full credit in total:

  • 10% in Attendance
  • 20% in Self-studying: Raise question
  • 20% in Self-studying: Research
  • 13% in Self-studying: Result
  • 13% in Self-studying: Discussion
  • 20% in Scientific-paper

   This result (96%) falls in the A+ range (95-100). Therefore, I propose an A+ grade for the Genomics class.
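The arithmetic behind this total and the mapping to a letter grade can be checked with a short Python sketch. The names are my own; the scores come from the list above and the grade boundaries from the grading ranges in Section 3.3.

```python
# Credit awarded in each category (percent of the course total).
SCORES = {
    "Attendance": 10,
    "Raise question": 20,
    "Research": 20,
    "Result": 13,
    "Discussion": 13,
    "Scientific paper": 20,
}

# Lower bound of each grade band, highest first.
GRADES = [
    (95, "A+"), (90, "A0"), (85, "A-"),
    (80, "B+"), (75, "B0"), (70, "B-"),
    (65, "C+"), (60, "C0"), (55, "C-"),
    (50, "D+"), (45, "D0"), (40, "D-"),
]


def letter(total):
    """Return the letter grade for a percentage total."""
    for lower, grade in GRADES:
        if total >= lower:
            return grade
    return "F"


total = sum(SCORES.values())
print(total, letter(total))  # 96 A+
```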

 

 

 

 

6. Reference

 

1. Hyung.S.Y., Yun.S.C., Xuanmin.G., Sung.G.K., Jae.Y.J., Sun.S.C., … Jung.H.L. Minke Whale genome and aquatic adaptation in cetaceans. Nature Genetics 46, 88-92 (2013).

 2. http://mpmp.huji.ac.il/maps/gluth_met.html - Glutathione metabolism

 3. F. Sanger, S. Nicklen and A. R. Coulson. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467 (1977).

 4. Shawn E. Levy and Richard M. Myers - Advancements in Next-Generation Sequencing (2016)

 5. Shyi.M.C., Chung.H.L., and Shi.J.C. Multiple DNA Sequence Alignment Based on Genetic Algorithms and Divide-and-Conquer Techniques. International Journal of Applied Science and Engineering. 3, 2: 89-100 (2005).

 6. David W. Mount, Bioinformatics – Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press.

 7. James P. Kehrer and Lars-Oliver Klotz. Free radicals and related reactive species as mediators of tissue injury and disease: implications for Health. Critical Reviews in Toxicology (2015).

8. Chad Kerksick and Darryn Willoughby. The Antioxidant Role of Glutathione and N-AcetylCysteine Supplements and Exercise-Induced Oxidative Stress. J Int Soc Sports Nutr. 2(2): 38–44 (2005).

9. Anthony Rhoads and Kin Fai Au. PacBio Sequencing and Its Applications. Genomics, Proteomics & Bioinformatics, Volume 13, Issue 5, Pages 278-289 (2015).

 10. Christine H. Foyer, Nadège Souriau, Sophie Perret, Maud Lelandais, Karl-Josef Kunert, Christophe Pruvost, and Lise Jouanin. Overexpression of Glutathione Reductase but Not Glutathione Synthetase Leads to Increases in Antioxidant Capacity and Resistance to Photoinhibition in Poplar Trees. Plant Physiol. 109: 1047-1057 (1995).

 11. https://benchling.com/ - Sequence Alignment Program

 12. https://attend.unist.ac.kr – Attendance checking site