WCJung Script pt1

From Biolecture.org

Slide 1

I will cover the next section, on how a pan genome is constructed. I would divide this part into three steps: first, preparing and sequencing the strain samples; second, assembling and annotating the sequencing data; and finally, clustering the genes to see the distribution of core, accessory, and strain-specific genes.

 

Slide 2

When you are preparing for bacterial sequencing, how would you pick the species and strains? For prokaryotes, there are obviously several model organisms, such as E. coli, that readily come to mind. In turn, we can easily imagine homogeneous colonies from the many different subspecies or strains that exist within the species E. coli. You can purify DNA samples from each of the colonies.

 

Slide 3

How about this figure? Do you see the difference? Let's imagine a mixture of bacterial organisms of different strains or even species, which could exist in nature or be made artificially. I want to emphasize the possibility of many different resolutions in terms of the phenotypes that emerge from different gene sets in the genome. Or we can even imagine working at the scope of a single cell, if possible.

 

Slide 4

The next step is what we learned in class. We use NGS to obtain the sequence data of each sample, and we assemble or map the sequences with the help of already reported databases.

 

Slide 5

Let me say that A, B, and C each stand for a colony that gives a distinct DNA sample. We would sequence them.

 

Slide 6

And we would get the same number of distinct genomes, one for each DNA sample.

 

Slide 7

With the genome sequences in hand, we then annotate every part of each genome with reported information on gene elements such as CDSs, amino acid sequences, expression regulatory elements, and so on, as you can see at the bottom of the figure. Finally, we can now compose a kind of pan genome from this data.

 

Slide 8

The most important question here is how we integrate these genomes into a pan genome.

As mentioned earlier, we can divide a pan genome into core, accessory, and strain- or species-specific genes. Simply speaking, we sort every gene of the genomes into clusters of the pan genome.

 

Slide 9

The core genome means the set of genes shared by all of the different strains in a group of interest. For this reason, we need to know which genes are the same, or shared. We look for orthologous genes.

 

Slide 10

For example, gene a, gene a prime, and gene a double prime are orthologous to each other. We gather these genes into cluster a. Likewise, we do the same for genes b, c, and d. If every single genome has a gene in cluster a, we can call it a core gene cluster. If some of the strains, but not all of them, have a gene from another cluster, we call that an accessory gene cluster. Then what about a single gene for which we cannot find an orthologous gene in any other genome? We keep that gene in a distinct cluster, which we can call a species- or strain-specific gene cluster.
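The sorting rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_clusters`, the genome labels A, B, and C, and the presence pattern of clusters b, c, and d are hypothetical and are not taken from the slide.

```python
def classify_clusters(clusters, n_genomes):
    """Label each ortholog cluster by how many genomes contain a member.

    clusters: dict mapping a cluster name to the set of genomes
              that have at least one gene in that cluster.
    n_genomes: total number of genomes being compared.
    """
    categories = {}
    for name, genomes in clusters.items():
        present = len(set(genomes))
        if present == 0:
            raise ValueError(f"cluster {name!r} has no member genes")
        if present == n_genomes:
            categories[name] = "core"             # found in every genome
        elif present == 1:
            categories[name] = "strain-specific"  # found in one genome only
        else:
            categories[name] = "accessory"        # found in some, not all
    return categories


# Hypothetical presence pattern for three genomes A, B, and C.
clusters = {
    "a": {"A", "B", "C"},  # genes a, a', a'' are orthologs
    "b": {"A", "B"},
    "c": {"B", "C"},
    "d": {"C"},
}
result = classify_clusters(clusters, n_genomes=3)
print(result)
```

With this made-up input, cluster a comes out as core, clusters b and c as accessory, and cluster d as strain-specific. Real tools work from the ortholog clusters they compute themselves, so this only shows the final labeling step.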

 

Slide 11

There are diverse clustering tools and databases for pan genome analysis, like these. You can easily find them in published journals.

However, there are still some similarities between them, regarding the method they use.

 

Slide 12

As I said, one important thing is how we find out whether genes are related. I think these are quite universal methods for finding homologous genes, which can be either paralogs or orthologs.

Reciprocal best hit refers to a kind of result that comes from an ordinary mapping algorithm. When you look up a gene from a sample of interest in a different genome or database, you are searching for the gene that scores best under the mapping algorithm. When the gene you found is then looked up in reverse against the genome of the original sample, the gene of interest may in turn be its best hit. Such cases are called reciprocal best hits.
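As a minimal sketch of this reciprocal check, assuming the mapping scores have already been computed in both directions: the function names, gene names, and score values below are hypothetical, and producing the scores themselves (the search step) is outside this example.

```python
def best_hits(scores):
    """For each query gene, return the subject gene with the highest score.

    scores: dict mapping a query gene to a dict of {subject gene: score}.
    Ties are broken by whichever subject max() sees first.
    """
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene in A, gene in B) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: genes of genome A searched against genome B ...
a_vs_b = {
    "a": {"a_prime": 950.0, "b_prime": 40.0},
    "b": {"b_prime": 800.0},
    "x": {"a_prime": 300.0},
}
# ... and genes of genome B searched back against genome A.
b_vs_a = {
    "a_prime": {"a": 945.0, "x": 310.0},
    "b_prime": {"b": 790.0, "a": 35.0},
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here gene x finds a_prime as its best hit, but a_prime's best hit in the reverse search is a, so x is not paired. Only a with a_prime and b with b_prime are kept as reciprocal best hits.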

For the scoring, either the nucleotide sequence of the CDS or the amino acid sequence is used.

For unannotated proteins, we can refer to protein databases of reported proteins from other organisms. COG is one such integrated database of orthologous gene groups.

 

Slide 13

Let's carry on to the next section, the discussion. The first two subjects are about what we can infer from the concepts of pan genome and horizontal gene transfer. For the last subject, we will discuss how they can be merged into a kind of dynamics of the prokaryotic genome.

 

Slide 14

If you just search an integrated journal website, you will easily come across many articles and papers about pan genomes. Recently, various studies have looked at bacterial lifestyle, such as habitat, behavior, and diet, and also at pathogenicity, using pan genome analysis of species other than E. coli. We can distinguish which genes make up such phenotypes even within a single species.

 

Slide 15

Horizontal gene transfer in prokaryotes is a relatively better-known idea compared to the pan genome. This is not the paper itself, but I could find an article reported in 1989 about HGT of a penicillin resistance-related gene between strains of S. pneumoniae.

 

Slide 16

It is obvious that horizontal gene transfer has a certain impact on the dynamics of the bacterial pan genome. We think that how much impact it has, and how fast it acts, are interesting points for further research. From this perspective, we are planning to select a recently reported bacterial study and to suggest some methods for further analytical research on this point, if possible.