The Practical Use of Systems Biology: The K-12 E. coli Strain
Heeun Kwon (권희운), Kangseok Lee (이강석) [School of Nano-Bioscience and Chemical Engineering, UNIST]
Since Robert Hooke coined the term "cell" in 1665 for the structures he observed in cork, biology, a subject studied since antiquity, has advanced rapidly: modern research methodology was established in the 17th and 18th centuries, and the field has since reached the level of the gene. As biology and genomics developed, it became necessary to investigate and understand the structure and dynamics of cells and organisms as a whole rather than their isolated parts. This need was recognized long ago, and early genomics already took a systemic approach, but the elements required to characterize a system were lacking. With better experimental tools and improved software and analytical methods now available, biology is moving beyond collecting information toward understanding how pieces of information interact and what roles they play: a comprehensive understanding of genes, the study of complex biological systems, and systemic approaches based on genetic information [1-3]. Just as genomics differentiated into sequence genomics and functional genomics, this research direction is associated with systems biology. In other words, systems biology is used to meet the challenges of the post-genomic era [4]. System-level understanding is a recurring theme in biological science, and technical advances such as genome sequencing and high-throughput measurement now make it possible to collect comprehensive data on system performance and on the underlying molecules [5]. Future biology will be transformed by spatiotemporal analysis techniques combined with mathematical and computational modeling. Compared with genomics and molecular biology alone, systems biology will grow in importance because, by drawing on such statistics and information, it offers nothing more and nothing less than a clear way of thinking [6].
Introduction: System Informatics & Networks
Systems biology deals with components at several levels: individual molecules (signal transduction, metabolic networks, etc.), assemblies of interacting complexes, the many physical factors that drive organism development (genes, mRNA, the associated proteins and protein complexes), cells, tissues and organs, and even all the life forms in an ecological community. The approach of systems biology is 'understanding at the system level', which requires a change of perspective about what to look for in the research process. An understanding of genes and proteins remains important, since the field is grounded in genomics, but systems biology differs in focusing on the structure and dynamics of the system.
A system is not a simple arrangement of genes and proteins, so its characteristics cannot be fully analyzed by tabulating or plotting the correlations between them. Such a table is only a static roadmap; what matters is to know the dynamic processes of the system, why they occur, and how to control them. To study these aspects, systems biology focuses on four things that set it apart from conventional genomics. The emphasis is on understanding the structure and purpose of the system rather than on cataloguing every gene and protein. The first is system structure, which includes gene interaction networks, biochemical pathways, and the mechanisms that modulate the physical properties of intracellular and multicellular structures. The second is system dynamics: how the system behaves under varying conditions, studied through metabolic analysis, sensitivity analysis, and dynamic analysis, and by identifying the essential steps underlying a particular behavior. The third is the control method: mechanisms that systematically regulate the state of the cell can be exploited to minimize malfunctions. The fourth is the design method: strategies for building and modifying biological systems with desired properties can be devised from definite design principles and simulation. Progress in these four areas benefits from advances in computational science, genetics, and measurement technology, and from integrating new discoveries with existing knowledge.
The choice of analytical method in systems biology depends on how much biological knowledge about the system is available for modeling. Steady-state analysis requires no rate constants and can be carried out on the network structure alone [7]. Traditional stability analysis and sensitivity analysis apply when some information about the rate constants is available; they provide insight into how system behavior changes as stimuli and rate constants are varied. Bifurcation analysis couples a dynamic simulator with analytical tools to give a detailed picture of dynamic behavior. Such analyses are essential for dynamic systems and are already used in many biological simulation studies [8, 9].
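The point that steady-state analysis needs only the network structure can be illustrated with a small sketch. The network below is a hypothetical toy example (it does not come from any of the cited studies), and the function names are illustrative. At steady state no metabolite accumulates, so the flux vector v must satisfy S·v = 0, where S is the stoichiometric matrix; no rate constant appears anywhere.

```python
from fractions import Fraction

# Hypothetical toy network (an assumption for illustration only):
#   R1: -> A    R2: A -> B    R3: A -> C    R4: B ->    R5: C ->
# Rows are metabolites (A, B, C); columns are reactions R1..R5.
S = [
    [1, -1, -1,  0,  0],  # A
    [0,  1,  0, -1,  0],  # B
    [0,  0,  1,  0, -1],  # C
]


def net_production(S, v):
    """Return S·v, the net production rate of each metabolite."""
    return [sum(Fraction(s) * Fraction(x) for s, x in zip(row, v)) for row in S]


def is_steady_state(S, v):
    """A flux vector is a steady state when no metabolite accumulates."""
    return all(rate == 0 for rate in net_production(S, v))


def steady_state_fluxes(uptake, share_to_b):
    """Fluxes implied by the structure, given the uptake flux and the
    fraction of A that is routed to B (the rest goes to C)."""
    v2 = Fraction(uptake) * Fraction(share_to_b)
    v3 = Fraction(uptake) - v2
    return [Fraction(uptake), v2, v3, v2, v3]


if __name__ == "__main__":
    v = steady_state_fluxes(10, Fraction(3, 10))
    print([str(x) for x in v], is_steady_state(S, v))
```

Exact fractions are used so that the steady-state check is not affected by floating-point rounding. Real flux balance analysis adds an objective function and flux bounds and solves a linear program, which this sketch leaves out.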
Network models are generally built from a series of experiments that identify specific interactions, together with large-scale comparative studies. Several efforts have produced large, comprehensive databases of gene regulatory and biochemical networks. Many network structures are already known, and databases such as STKE, KEGG, and EcoCyc provide useful information about them [10, 11]. The functional properties of a network and its structure can be understood through its many regulatory circuits. Classifying and comparing circuits across different design patterns can reveal how the design patterns of regulatory circuits have been modified and conserved. These studies can advance further once a 'periodic table' and evolutionary families of regulatory circuits have been established through additional investigation.
Systems biology
Performing systems biology experiments in microorganisms has several advantages.
(i) Decades of genetic and biochemical work have provided deep biological insight, and the molecular biology techniques developed along the way allow precise experimental manipulation.
(ii) Microorganisms grow on inexpensive media, so well-controlled experimental material can be supplied in sufficient quantity.
(iii) Many of the organisms studied are infectious pathogens of humans, plants, and animals, so the results can be applied both environmentally and pharmaceutically.
Robustness is an essential characteristic of biological systems [12], and understanding the fundamental mechanisms and principles by which biological systems achieve it is necessary for a deep, system-level understanding of biology. A robust system shows three features: adaptation, parameter insensitivity, and graceful degradation. Adaptation is the ability to cope with environmental change; parameter insensitivity is the relative insensitivity of the system to specific kinetic parameters; and graceful degradation means that, after damage, function declines gradually rather than failing catastrophically. In engineered systems, robustness is achieved through system control, for example negative feedback and feed-forward control, together with three further properties. Redundancy means that multiple components with the same function are present. Structural stability means that the system is designed with intrinsic mechanisms that promote stability. Modularity means that subsystems are functionally insulated from one another, so that damage to one module does not spread to the whole system.
These engineering approaches also apply to biological systems. Redundancy is seen at the gene level, in the regulation of the cell cycle and circadian rhythms, and at the circuit level, in the alternative metabolic pathways of E. coli. Structural stability provides insensitivity to parameter changes in the network responsible for segment formation in Drosophila, and modularity is exploited at various scales, from the cell itself to interacting signal transduction cascades.
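How negative feedback yields parameter insensitivity can be shown with a minimal numerical sketch. It assumes a deliberately simple linear model that is not taken from the cited literature: a process with gain `gain` produces an output y from an input signal, and under feedback the process sees the input minus `feedback` times the output, so y = gain · (signal − feedback · y). The function names are illustrative.

```python
def open_loop_output(gain, signal):
    """Output of the process without feedback: y = gain * signal."""
    return gain * signal


def closed_loop_output(gain, signal, feedback):
    """Output with negative feedback.

    Solves y = gain * (signal - feedback * y) for y.
    """
    return gain * signal / (1.0 + gain * feedback)


def relative_change(before, after):
    """Fractional change of a quantity, used here as a sensitivity measure."""
    return abs(after - before) / abs(before)


if __name__ == "__main__":
    # Perturb a kinetic parameter: the gain doubles from 50 to 100.
    open_shift = relative_change(open_loop_output(50.0, 1.0),
                                 open_loop_output(100.0, 1.0))
    closed_shift = relative_change(closed_loop_output(50.0, 1.0, 1.0),
                                   closed_loop_output(100.0, 1.0, 1.0))
    print(f"open loop:   output changes by {open_shift:.1%}")
    print(f"closed loop: output changes by {closed_shift:.1%}")
```

In this toy model, doubling the gain doubles the open-loop output, whereas the closed-loop output shifts by only about one percent. That is the sense in which feedback makes a system relatively insensitive to a specific kinetic parameter.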
System-level analysis of organisms requires high-quality, comprehensive data. Large-scale projects are under way; one of them, the Alliance for Cellular Signaling, is making numerous measurements in order to build a model of cell stimulation [13]. Modeling should be considered at an early stage, so that measurement bottlenecks for the final model, and data that are useless for building it, can be identified.
Application of Systems Biology: K-12 E. coli Strain
E. coli has played a central role in most microbiological research. With genetic resources continuing to grow, E. coli is extensively studied and easy to handle experimentally, which makes it well suited to systematic investigation of the relationship between the protein composition of a microorganism and the roles of those proteins [14].
Even though it is the most studied microorganism, only 54% of the protein-coding gene products of the K-12 laboratory strain currently have experimental evidence of a biological function. The remaining genes are unnamed and either have only homology-based role assignments or no apparent function. Some of these are functional "orphans", which remain uncharacterized in part because their mutant phenotypes are too subtle or their sequence identity to known genes is limited. More sensitive analytical methods are needed for them. [※ In this paragraph, "orphan" is distinct from "ORFan", a term for genes found only in a single species or in closely related species.]
Integrated system-level analyses of eukaryotes (yeast, worm, and fly) have been ongoing, but such analyses remain rare for prokaryotes, including E. coli [15-17]. To address this gap, high-quality maps of functional interactions inferred by genomic context (GC) methods were combined with physical interactions (PI) derived from E. coli proteomics. The results show that previously uncharacterized bacterial proteins are components of functionally coherent modules and members of multiprotein complexes linked to well-known biological processes. These associations can be verified experimentally and are observed across prokaryotic phyla (in other microorganisms where the system is similar, or only in E. coli strains where it is not).
In this paper, we review the various ways of studying orphans and annotated genes in systems biology, centering on physical interactions (PI) and genomic context (GC).
The Range of E. coli Protein Annotations
Because the definition of the functional features of E. coli has historically been guided largely by scientific interests and technical considerations, some bias is expected in the coverage and depth of existing biological information as reflected in current gene annotations. To evaluate the degree to which the physiological functions of the 4,225 putative protein-coding sequences of E. coli K-12 are currently characterized, the authors of [14] examined the extent of literature reference records curated in the UniProt annotation system [23]. After excluding PubMed references to genomic mapping studies, the average total number of papers associated with each E. coli K-12 protein is surprisingly limited (Figure 1A), with many proteins apparently still uncited.
As the next step, they examined E. coli K-12 (substrains W3110 and MG1655) gene annotations in the public databases MultiFun, EcoCyc, and RefSeq [24,30,31]. Because W3110 is commonly used for high-throughput studies, they devoted the bulk of their subsequent analysis to this substrain. Nevertheless, they cross-mapped the corresponding gene accessions between the substrains and compiled an inclusive set of functional annotations accordingly (Table S1). Overall, they found that 2,794 (66%) of E. coli's proteins had experimentally derived annotations in the MultiFun multifunction schema, proper mnemonic names [32], or literature documentation linking them to a well-defined pathway or multiprotein complex in EcoCyc (Figure 1B). The remaining 1,431 proteins (about 34%) are currently functionally uncharacterized. Of these, 446 (31%) have at least one putative molecular function assigned on the basis of sequence in the Clusters of Orthologous Groups (COGs) of proteins catalog [33].
Characteristics of Functional Orphans in E. coli
The genes lacking annotation appear to be translated into bona fide proteins, as their transcripts were not significantly less stable (p = 0.36) than those of annotated genes (Figure 1C). Nonetheless, they clearly differ from annotated genes in their biophysical attributes and evolutionary scope. Only 21 orphans (about 1.5%) are essential for viability under standard laboratory conditions, in contrast with the 280 annotated genes (10%) previously considered essential. The orphans are also clearly less abundant at both the protein and transcript levels, which is their major characteristic. In addition, they tend to encode smaller proteins with fewer domain assignments (44% of orphans versus 74% of annotated proteins) according to the SUPERFAMILY database [34]. Using a maximum E-value cutoff of 1 × 10⁻⁶ for BLAST bidirectional best hits (BDBHs), orphans also have fewer orthologs overall in a nonredundant dataset filtered at 90% similarity based on the frequency of shared orthologs (Figure 1G), with an average of 0.22 compared with 0.48 for annotated genes. However, broader sequence comparisons against currently available metagenomes (Figure 1H) indicated that orphan homologs (one-way BLAST hits) are distributed across diverse environments (PLS2, S2). Taken together, these observations suggest that the orphans are functionally more important than their lack of annotation implies.
Difficulty of Researching Orphans in the K-12 E. coli Strain
In Figure 1-A, the X axis shows the number of research papers on the E. coli K-12 strain and the Y axis shows the number of similar papers (how many papers there are in one field). Notably, the zero point of the X axis, that is, genes that no one has studied, accounts for about 30%. The E. coli K-12 strain has 4,225 genes, of which 1,431 turned out to be orphans.
Panel B divides the annotated genes by function in a Venn diagram and shows graphically the remaining 1,431 orphans, about 34% of the total.
C compares orphans and annotated genes by mRNA degradation rate, measured as half-life. The graph tests whether the degradation rate explains why the orphans have not been studied: it had previously been assumed that orphan mRNAs degrade too quickly for their functions to be studied experimentally. However, the half-lives of the two mRNA samples do not differ significantly (p=0.38). Half-life therefore cannot explain why the orphans have gone unstudied. [※ A p value greater than 0.05 is interpreted as no significant difference between the samples; a p value smaller than 0.05 indicates a significant difference.]
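The way a p value separates "different" from "not different" samples can be sketched with a permutation test. This is only an illustration: the text does not state which statistical test produced the p values in Figure 1, the permutation test is swapped in here because it needs no distributional assumptions, and the numbers are invented rather than measured half-lives. The function names are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means.

    Group labels are shuffled repeatedly; the p value is the fraction of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented values standing in for two groups of measurements.
    group_1 = [1, 2, 3, 4, 5, 6, 7, 8]
    group_2 = [11, 12, 13, 14, 15, 16, 17, 18]
    p = permutation_p_value(group_1, group_2)
    print("different" if p < 0.05 else "no significant difference", p)
```

With the 0.05 threshold described above, a result such as p = 0.38 falls in the "no significant difference" branch, which is how panel C is read.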
Graph D compares the two samples by average mRNA expression level. Unlike in C, the p value is smaller than 0.05, so the two samples differ. This suggests that the orphans have been studied less effectively because they are expressed at lower levels than annotated genes.
E compares the two samples by codon adaptation index (CAI). The p value again indicates a difference, which supports the result of A above. The underlying idea is that the properties of a protein, such as its functional group, vary with the codons that encode it. [※ CAI is a widely used measure of codon usage bias. Unlike other measures of codon usage, CAI scores a protein-coding sequence relative to a reference set of genes.]
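The CAI can be illustrated with a minimal sketch. It follows the usual definition: each codon receives a relative adaptiveness w, its frequency in a reference gene set divided by the frequency of the most used synonymous codon, and the CAI of a gene is the geometric mean of w over its codons. The codon table below is a small subset of the standard genetic code, and the reference sequences are invented for illustration; neither comes from the study discussed here.

```python
import math
from collections import Counter

# Toy subset of the standard genetic code (three amino acids only).
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["TTT", "TTC"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS.items() for c in codons}


def split_codons(seq):
    """Split a coding sequence into complete codons, dropping any remainder."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """w(codon) = count of codon / count of the most used synonymous codon."""
    counts = Counter(
        c for s in reference_seqs for c in split_codons(s) if c in CODON_TO_AA
    )
    weights = {}
    for codons in SYNONYMOUS.values():
        best = max(counts[c] for c in codons)
        for c in codons:
            # Codons never seen in the reference are left unscored here;
            # real implementations usually apply a pseudo-count instead.
            if best > 0 and counts[c] > 0:
                weights[c] = counts[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of the relative adaptiveness over scorable codons."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    reference = ["AAAAAAAAAAAG", "GAAGAAGAG", "TTCTTCTTT"]  # invented
    w = relative_adaptiveness(reference)
    print(cai("AAAGAATTC", w), cai("AAGGAGTTT", w))
```

A gene built only from the preferred codons of the reference set scores 1, and a gene that uses rarer synonymous codons scores lower. This is why CAI is commonly read as a proxy for how well a gene is adapted to high expression.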
F also has a p value smaller than 0.05, so the two samples differ. F considers size, to check whether the orphans are too small to study. The result indicates that the small size of the orphans also hampers experiments.
G tests the hypothesis that genes found only in E. coli, and not in other organisms, may be orphans because their functions are hard to determine. Genes found in more organisms tend to have similar functions and are easier to study, whereas some orphans exist only in E. coli, which may be why they remain orphans. Compared with genes shared among bacteria, more of the orphans are restricted to E. coli.
In summary, A demonstrates the difficulty of orphan research, B to F compare genes within the bacterium, and G compares across species. To support the point made in A and to find its cause, mRNA half-life, expression level, CAI, gene size, and functional similarity were examined. This kind of comprehensive investigation illustrates some aspects of a typical systems biology study.
Overall Analysis of the K-12 System
Figure 2. Generation and Integration of Physical and Functional Networks and Orphan Function Prediction
Figure 2 outlines the systems biology study based on PI and GC that is discussed in this paper. Figure 2-a shows the construction of a physical network based on protein copurification and detection.
Figure 2-b shows the integration of four genomic context methods.
The first method is gene fusion, which indicates functional similarity [35,36]; the second is similarity of phylogenetic profiles [33,37-38]; the third is evolutionary conservation of gene order, that is, the direction in which proteins are expressed [39-41]; and the fourth is measurement of intergenic distances, which are shorter the more similar the functions are [42-44]. Figure 2-c shows the integration of the PI and GC probabilistic networks and the function prediction based on Figure 2-a or Figure 2-b. For this, the group used "StepPLR", a newly designed method based on the topology of the integrated network.
Physical Interaction (PI)
Large-scale Sequential Peptide Affinity (SPA) tagging allows E. coli protein complexes to be purified efficiently and characterized by mass spectrometry [20]. The group used two complementary techniques to detect physical interactions between proteins: gel-based MALDI peptide mass fingerprinting and gel-free LC-MS shotgun sequencing. Next, they combined the MALDI and LC-MS scores into a single PI network using a previously established procedure for integrating probabilistic networks. Finally, they filtered the network at a confidence score cutoff of 0.75 and clustered it using MCL.
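The combine-then-filter step can be sketched as follows. The text does not specify the "previously established procedure", so this sketch substitutes a noisy-OR rule as a stand-in assumption: two independent probabilistic scores p1 and p2 are merged as 1 − (1 − p1)(1 − p2). Only the 0.75 cutoff comes from the text. The protein names, scores, and function names are hypothetical.

```python
def combine_scores(p_maldi, p_lcms):
    """Noisy-OR combination (an assumed stand-in, not the study's procedure).

    Gives the probability that at least one of two independent lines of
    evidence for an interaction is correct.
    """
    return 1.0 - (1.0 - p_maldi) * (1.0 - p_lcms)


def build_pi_network(maldi, lcms, cutoff=0.75):
    """Merge two score tables and keep pairs at or above the cutoff.

    Both tables map a (protein, protein) tuple to a score in [0, 1].
    A pair missing from one table contributes a score of 0 from it.
    """
    network = {}
    for pair in set(maldi) | set(lcms):
        score = combine_scores(maldi.get(pair, 0.0), lcms.get(pair, 0.0))
        if score >= cutoff:
            network[pair] = score
    return network


if __name__ == "__main__":
    # Hypothetical interaction scores for illustration.
    maldi = {("P1", "P2"): 0.6, ("P2", "P3"): 0.3}
    lcms = {("P1", "P2"): 0.5, ("P3", "P4"): 0.8}
    for pair, score in sorted(build_pi_network(maldi, lcms).items()):
        print(pair, round(score, 3))
```

In this toy input, the pair supported by both techniques passes the cutoff even though neither single score reaches 0.75. This illustrates why complementary detection methods are combined before filtering.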
Genomic Context (GC)
Computational methods were applied to identify a network of high-confidence pairwise functional interactions for all E. coli proteins, including those not detectable in the PI network. Four methods were used, which fall into two types. The first type predicts functional interactions among E. coli proteins from gene fusions and from the similarity of phylogenetic profiles. The second type detects the natural chromosomal association of bacterial genes in operons through evolutionary conservation of gene order and measurement of intergenic distances.
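Of the four methods, the phylogenetic profile approach is the easiest to illustrate. The sketch below assumes that each gene is described by the set of genomes in which a homolog is present, and that two genes with very similar presence/absence patterns are predicted to be functionally linked. The Jaccard index and the 0.8 threshold are illustrative choices, not the scoring used in the study, and the gene and genome names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two sets of genomes (0 = disjoint, 1 = identical)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_links(profiles, threshold=0.8):
    """Return (gene, gene, score) for every pair with similar profiles.

    `profiles` maps a gene name to the set of genomes containing a homolog.
    """
    genes = sorted(profiles)
    links = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            score = jaccard(profiles[g1], profiles[g2])
            if score >= threshold:
                links.append((g1, g2, score))
    return links


if __name__ == "__main__":
    # Hypothetical presence/absence data across five genomes.
    profiles = {
        "geneA": {"genome1", "genome2", "genome3", "genome5"},
        "geneB": {"genome1", "genome2", "genome3", "genome5"},
        "geneC": {"genome4"},
    }
    print(predict_links(profiles))
```

Genes that are gained and lost together across genomes, such as geneA and geneB in this toy input, are the ones this method links. A gene with an unrelated distribution, such as geneC, stays unconnected.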
Clustering of Networks
Protein clusters were derived with MCL [40] from three different networks (Figure 2): (1) the PI network (generating protein complexes); (2) the unified GC network (generating functional modules); and (3) the function prediction/annotation profiles derived from the integration of the PI and GC networks (generating functional neighborhoods). The core idea of MCL is to simulate random walks among the proteins (nodes) within each network in order to delimit regions of high flux, taking into account the connectivity and weight of the interaction edges. In this work, edge weights correspond to the likelihood of pairwise protein interactions in each network. To tune the granularity of the delimited clusters, the global MCL inflation parameter was optimized in each case by adjusting the mass fraction of clusters and the efficiency of partitions (Protocol S4). As described previously, the individual clusters were then assessed for functional homogeneity on the basis of COG annotations (Protocol S4). Cohesiveness is measured as the homogeneity of a chosen behavior within a cluster. For genes, the behavior can be either a molecular function or a biological process. A cluster is homogeneous when all of its genes belong to a single behavioral group; the metric then returns 0, indicating the best cohesiveness.
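The random-walk idea behind MCL can be made concrete with a compact sketch of its two alternating steps. Expansion (squaring the column-stochastic matrix) lets flow spread along paths, and inflation (raising entries to a power and renormalizing) strengthens strong flows and weakens weak ones. This is a simplified illustration, not the MCL program used in the study. The pruning threshold, the fixed iteration count, and the way clusters are read off (as groups of nodes still connected by nonzero flow) are simplifying assumptions, and the example graph is hypothetical.

```python
def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def clusters_from_matrix(m):
    """Group nodes that are still connected by nonzero flow (union-find)."""
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > 0.0:
                parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


def mcl(weights, inflation=2.0, iterations=50, prune=1e-6):
    """Simplified Markov clustering on a symmetric weight matrix."""
    n = len(weights)
    # Self-loops keep every column non-empty and damp oscillations.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along longer walks
        # inflation: favor strong flow; negligible entries are pruned to zero
        m = [[x ** inflation if x > prune else 0.0 for x in row] for row in m]
        m = normalize_columns(m)
    return clusters_from_matrix(m)


if __name__ == "__main__":
    # Hypothetical network: two tight triangles joined by one weak edge.
    w = [[0.0] * 6 for _ in range(6)]
    for a, b, p in [(0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.9),
                    (3, 4, 0.9), (3, 5, 0.9), (4, 5, 0.9), (2, 3, 0.1)]:
        w[a][b] = w[b][a] = p
    print(mcl(w))
```

Raising `inflation` produces finer clusters and lowering it produces coarser ones, which is the granularity knob that the text describes tuning for each network.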
Discussion
Systems analysis offers many methods and benefits, but realizing them requires going beyond current experimental practice.
A complete system-level analysis of biological regulation requires accurate measurement and large-scale information processing. Research on biological systems should therefore move from superficial studies of topological interactions to studies that incorporate mRNA, protein, and metabolic information together with interaction data.
The most feasible application of systems biology research is mechanism-based drug screening built on models of cell regulation that focus on specific signal transduction cascades and molecules. Such models can help to identify feedback mechanisms that offset the effects of a drug and to predict its effects at the system level. It may even be possible to use a multiple-drug system to guide malfunctioning cells to a desired state with minimal side effects.
Although systems biology is at an early stage, its potential benefits are enormous in both practical and scientific terms. Biology is expanding from the molecular level to the system level. This shift should make complicated biological regulatory systems easier to understand and provides a crucial opportunity to apply this knowledge in practice.
References
[3] Kitano H (2002) Systems biology: a brief overview. Science 295: 1662–1664.
[4] Kitano H (2001) Foundations of systems biology. Cambridge, MA: MIT Press.
[5] Wiener N (1948) Cybernetics; or control and communication in the animal and the machine.
[6] May RM (2004) Uses and abuses of mathematics in biology. Science 303: 790.
[14] Hu P, Janga SC, Babu M, Díaz-Mejía JJ, Butland G, Yang W, ... Emili A (2009) Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol 7(4): e1000096.
[29] Díaz-Mejía JJ, Babu M, Emili A (2009) Computational and experimental approaches to chart the Escherichia coli cell-envelope-associated proteome and interactome. FEMS Microbiol Rev 33: 66–97.
[30] Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501–504.
[31] Serres MH, Goswami S, Riley M (2004) GenProtEC: an updated and improved analysis of functions of Escherichia coli K-12 proteins. Nucleic Acids Res 32: D300–302.
[32] Rudd KE (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev 62: 985–1019.
[33] Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278: 631–637.
[34] Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J (2004) The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res 32: D235–239.
[35] Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature 402: 86–90.
[36] Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, et al. (1999) Detecting protein function and protein-protein interactions from genome sequences. Science 285: 751–753.
[37] Gaasterland T, Ragan MA (1998) Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. Microb Comp Genomics 3: 199–217.
[38] Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A 96: 4285–4288.
[39] Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23: 324–328.
[40] Janga SC, Moreno-Hagelsieb G (2004) Conservation of adjacency as evidence of paralogous operons. Nucleic Acids Res 32: 5392–5397.
[41] Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A 96: 2896–2901.
[42] Janga SC, Collado-Vides J, Moreno-Hagelsieb G (2005) Nebulon: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons. Nucleic Acids Res 33: 2521–2530.
[43] Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, et al. (2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res 30: 2212–2223.
[44] Snel B, Bork P, Huynen MA (2002) The identification of functional modules from the genomic association of genes. Proc Natl Acad Sci U S A 99: 5890–5895.