The Practical Use of Systems Biology: K-12 E. coli Strain

From Biolecture.org

The practical use of systems biology, centered on the K-12 E. coli strain


 

HeeUn Kwon (권희운), Kangseok Lee (이강석)  [UNIST, School of Nano-Bioscience and Chemical Engineering]

 

Since Robert Hooke gave the name "cell" to the structures he observed in cork in 1665, biology, a subject studied since antiquity, has developed rapidly: modern research methodology was established in the 17th and 18th centuries, and the field has since reached the level of the gene. As biology and genomics advanced, it became necessary to investigate and understand the structure and dynamics of whole cells and organisms rather than their isolated parts. This need was recognized long ago, and early genomics already took a systemic approach, but the elements required to understand the characteristics of a system were lacking. With better experimental tools, improved software, and new analytical methods, biology has recently moved beyond collecting information toward understanding how pieces of information interact and what roles they play: an overall understanding of genes, the study of complex biosystems, and systemic approaches that use genetic information [1-3]. Just as genomics differentiated into sequence genomics and functional genomics, this direction of research belongs to systems biology. In other words, systems biology is the approach used to move forward in the post-genomic era [4]. System-level understanding is a long-standing theme in biological science; what has improved is the technical ability to collect comprehensive, high-quality data on important molecules through approaches such as genome sequencing and high-throughput measurement [5]. Future biology will be transformed by spatiotemporal analytic techniques combined with mathematical and computational modeling. Beyond what genomics and molecular biology offer on their own, systems biology will grow in importance by drawing on these statistics and this information to provide nothing more, and nothing less, than a clear statistical way of thinking [6].

 

 

Introduction: System Informatics & Networks

Systems biology deals with components at several levels: individual molecules (signal transduction, metabolic networks, etc.), assemblies of interacting complexes, the many physical factors that drive the development of an organism (genes, mRNAs, and the related proteins and protein complexes), cells, tissues, and organs, and even all the life forms of an ecological community. The approach of systems biology is, in short, "understanding at the system level", and it requires a change of perspective about what to look at in the research process. Because it builds on genomics, systems biology still treats the understanding of genes and proteins as important, but it differs in focusing on the structure and dynamics of the system.

A system is not a simple arrangement of genes and proteins, so its characteristics cannot be fully analyzed by tabulating or plotting correlations between genes and proteins. Such a table is only a static roadmap; what matters is to know the dynamic processes of the system, why they occur, and how to control them. To study these aspects, systems biology focuses on four things that set it apart from existing genomics. The emphasis is on understanding the structure and purpose of the system rather than on cataloguing every gene and protein. The first is system structure, which includes gene interaction networks, biochemical pathways, and the mechanisms that modulate physical properties within cells and multicellular structures. The second is system dynamics: how the system behaves under varying conditions, studied through metabolic analysis, sensitivity analysis, and dynamic analysis, and through identifying the fundamental, essential steps behind a particular behavior. The third is the control method: mechanisms that systematically regulate the state of the cell can be exploited to minimize malfunctions. The fourth is the design method: strategies for building and modifying biological systems with desired properties can be devised from definite design principles and simulation. Progress in these four areas depends on a better understanding of computational science, genetics, and measurement technology, and on integrating new discoveries with existing knowledge.

The choice of method for analyzing a biological system depends on how much biological knowledge is available for modeling it. Steady-state analysis requires no rate constants and can be carried out from the network structure alone [7]. Traditional stability analysis and sensitivity analysis apply when some information about steady-state rate constants is available; they give insight into how system behavior changes when stimuli and rate constants are perturbed. Bifurcation analysis uses a dynamic simulator as its analytical tool and provides a detailed picture of dynamic behavior. Such analyses are essential for dynamic systems and have already been used in many biological simulation studies [8, 9].
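
The point that steady-state analysis needs only the network can be illustrated with a stoichiometric matrix S: a flux vector v is a steady state when S·v = 0, and no rate constants appear anywhere. The following is a minimal sketch on a hypothetical three-reaction toy network; the matrix, the flux values, and the helper names are illustrative assumptions and are not taken from [7].

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> excretion.
# Rows are metabolites (A, B); columns are reactions (v_in, v_ab, v_out).
S = [
    [1, -1, 0],   # A: produced by v_in, consumed by v_ab
    [0, 1, -1],   # B: produced by v_ab, consumed by v_out
]

def net_production(stoich, fluxes):
    """Return S . v, the net production rate of each metabolite."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in stoich]

def is_steady_state(stoich, fluxes, tol=1e-9):
    """A flux vector is a steady state when every metabolite is balanced."""
    return all(abs(x) <= tol for x in net_production(stoich, fluxes))

balanced = [2.0, 2.0, 2.0]     # every metabolite is made as fast as it is used
unbalanced = [2.0, 1.0, 1.0]   # A accumulates: made at 2.0, consumed at 1.0

print(is_steady_state(S, balanced))    # True
print(is_steady_state(S, unbalanced))  # False
```

Only the topology of the network (which reaction produces or consumes which metabolite) enters the check, which is why this kind of analysis is possible before any kinetic parameters have been measured.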

Building a network model generally involves series of experiments that identify specific interactions, together with large-scale comparative studies. Some efforts have produced large, comprehensive databases of gene regulatory and biochemical networks. Many network structures are now known, and these databases provide genuinely useful information about them; examples include STKE, KEGG, and EcoCyc [10, 11]. The functional properties of a network and its structure can be understood in terms of its many regulatory circuits. Classifying and comparing circuits across design patterns can show how those patterns have been modified and conserved. Such studies will become possible once a "periodic table" of regulatory circuits and their evolutionary families have been worked out.

 

Systems biology

Experimental systems biology in microorganisms has several advantages.

(i) Decades of genetic and biochemical work have provided deep biological insight, and molecular biology techniques have been developed that allow precise experimental manipulation.

(ii) Microorganisms grow on inexpensive media, so well-controlled experimental material can be supplied in sufficient quantity.

(iii) Many of the organisms studied are infectious pathogens of humans, plants, and animals, so the results can be applied both environmentally and pharmaceutically.

Robustness is an essential characteristic of biological systems [12], and understanding the fundamental mechanisms and principles by which biological systems achieve robustness is essential to a deep, system-level understanding of biology. A robust system shows three features: adaptation, parameter insensitivity, and graceful degradation. Adaptation is the ability to cope with environmental change, and parameter insensitivity is the relative insensitivity of the system to specific kinetic parameters. Graceful degradation is the property that, after damage, system function declines slowly rather than failing catastrophically. In engineered systems, robustness is achieved in several ways. One is system control, such as negative-feedback and feed-forward control. Another is redundancy, in which multiple components with the same function are provided. A third is structural stability, in which intrinsic mechanisms are built in to promote stability. The last is modularity, in which subsystems are functionally insulated so that the failure of one module does not spread to the whole system.

These engineering approaches also appear in biological systems. Redundancy is seen at the gene level in the regulation of the cell cycle and circadian rhythms, and at the circuit level in the alternative metabolic pathways of E. coli. Structural stability makes the segment formation network of Drosophila insensitive to parameter changes, and modularity is exploited at scales ranging from the cell itself to the interactions of signal transduction cascades.

System-level analysis of an organism requires comprehensive, high-quality data. Large-scale projects are under way; one of them, the Alliance for Cellular Signaling, is making extensive measurements in order to build models of cellular signaling [13]. Modeling should be considered early in such projects, so that measurement bottlenecks that would limit the final model, and data that are useless for building it, can be identified in advance.

 

Application of Systems biology: K-12 E. coli strain

E. coli has played a central role in most studies of microorganisms. Now that genetic resources are growing continuously, E. coli, which has been studied extensively and is easy to handle experimentally, is very well suited to systematic investigation of the relationship between a microorganism's protein complement and the roles of those proteins [14].

Although E. coli is the most studied microorganism, only 54% of the protein-coding gene products of the K-12 laboratory strain currently have experimental evidence indicating a biological function. The remaining genes have no name, and appear either to have functions inferred only by homology or to have no known function. Some of them are functional "orphans": they have remained uncharacterized partly because their mutants show only weak phenotypes, or because they have limited sequence similarity to known genes. More sensitive analytical methods are needed for these genes. [※ In this paragraph, "orphan" is distinct from "ORFan", a term for genes found only in a single species or in closely related species.]

Integrated system-level analyses have been carried out for eukaryotes (yeast, worm, and fly), but they remain rare for prokaryotes, including E. coli [15-17]. To address this gap, E. coli proteomics, which measures physical interactions (PI), was combined with inference from genomic context (GC) methods to produce high-quality maps of functional interactions. The results show that many previously uncharacterized bacterial proteins are components of functionally linked modules and members of multiprotein complexes connected to well-known biological processes. These associations can be verified experimentally, and they are observed across prokaryotic phyla (in similar systems in other microorganisms, or, where no similar system exists, in E. coli strains).

In this paper, we review various ways of studying orphans and annotated genes in systems biology, centering on physical interactions (PI) and genomic context (GC).

 

The Range of E. coli Protein Annotations

Because the functional characterization of E. coli has historically been guided largely by scientific interest and technical considerations, some bias is expected in the coverage and depth of existing biological information, as reflected in current gene annotations. To evaluate how well the physiological functions of the 4,225 putative protein-coding sequences of E. coli K-12 are currently characterized, the authors of the study [14] examined the literature reference records curated in the UniProt annotation system [23]. After excluding PubMed references to genome-mapping studies, the average number of papers associated with each E. coli K-12 protein is surprisingly small (Figure 1A), and many proteins apparently remain uncited.

Next, they examined the gene annotations of E. coli K-12 (substrains W3110 and MG1655) in the public databases MultiFun, EcoCyc, and RefSeq [24,30,31]. Because W3110 is commonly used for high-throughput studies, they devoted the bulk of their subsequent analysis to this substrain. Nevertheless, they cross-mapped the corresponding gene accessions between the substrains and compiled an inclusive set of functional annotations (Table S1). Overall, 2,794 (66%) of E. coli's proteins had experimentally derived annotations in the MultiFun multifunction schema, proper mnemonic names [32], or literature documentation linking them to a well-defined pathway or multiprotein complex in EcoCyc (Figure 1B). The remaining 1,431 proteins (about 34%) are currently functionally uncharacterized. Of these, 446 (31%) have at least one putative molecular function defined on the basis of sequence in the Clusters of Orthologous Groups (COGs) of proteins catalog [33].
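
The counts quoted in this paragraph can be checked against one another with a few lines of arithmetic. This is only a consistency check on the numbers given in the text; the variable names and the `pct` helper are introduced here for illustration.

```python
# Counts as quoted in the text for E. coli K-12.
total_proteins = 4225      # putative protein-coding sequences
annotated = 2794           # proteins with names or documented functions
orphans = 1431             # functionally uncharacterized proteins
orphans_with_cog = 446     # orphans with a putative COG-based function

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to a whole number."""
    return round(100 * part / whole)

# The two groups should account for every protein.
assert annotated + orphans == total_proteins

print(pct(annotated, total_proteins))     # 66
print(pct(orphans, total_proteins))       # 34
print(pct(orphans_with_cog, orphans))     # 31
```

Note that the 31% is a fraction of the 1,431 orphans, not of the whole proteome.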

 

Characteristics of Functional Orphans in E. coli

 

Figure 1. Annotated and Functional Orphan Genes of the E. coli K-12 Reference Strain

The genes lacking annotation appear to be translated into bona fide proteins, as their transcripts were not significantly less stable (p = 0.36) than those of annotated genes (Figure 1C). Nonetheless, the orphans clearly differ from annotated genes in their biophysical attributes and evolutionary scope. Only 21 orphans (about 1.5%) are required for viability under standard laboratory conditions, in contrast with the 280 annotated genes (10%) previously considered essential. The orphans are also significantly less abundant at both the protein and transcript levels, which is one of their major characteristics. In addition, they tend to encode smaller proteins with fewer domain assignments (44%, compared with 74% of annotated proteins) according to the SUPERFAMILY database [34]. Using a maximum-score E-value cutoff of 1 × 10⁻⁶ for BLAST bidirectional best hits (BDBHs), orphans also have fewer orthologs overall in a nonredundant dataset filtered at 90% similarity based on the frequency of shared orthologs (Figure 1G), with an average of 0.22 as compared with 0.48 for annotated genes. However, broader sequence comparisons against currently available metagenomes (Figure 1H) indicated that orphan homologs (one-way BLAST hits) are distributed across diverse environmental conditions (PLS2, S2). Taken together, these observations suggest that the orphans are functionally more important than their current annotations imply.

 

Difficulty of studying orphans in the K-12 E. coli strain

In Figure 1-A, the X axis shows the number of research papers on the E. coli K-12 strain and the Y axis shows the number of similar papers (how many papers fall in one field). Notably, the zero point of the X axis, that is, the orphans, genes that no one has studied, accounts for about 30%. The E. coli K-12 strain has 4,225 genes, of which 1,431 turned out to be orphans.

In panel B, the annotated genes are divided by function in a Venn diagram, which also shows graphically the 1,431 orphans that make up the remaining one-third or so.

Panel C compares orphans and known genes by mRNA decay rate, measured as half-life. It tests whether decay rate explains why the orphans have not been studied: it had previously been assumed that orphan mRNAs decay too quickly for their functions to be studied experimentally. However, the half-lives of the two groups of mRNA do not differ significantly (p=0.38). Half-life therefore cannot explain why the orphans have gone unstudied. [※ A p-value above 0.05 is conventionally taken to mean that no significant difference was detected. A p-value below 0.05 is taken to mean that the two samples differ significantly.]
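
The logic of comparing two groups with a p-value can be made concrete with a small example. The text does not say which statistical test produced the p-values in Figure 1, so the permutation test below is an assumption chosen for simplicity, and the half-life values are hypothetical. It asks how often a random relabeling of the pooled measurements gives a difference in group means at least as large as the observed one.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(sample_a, sample_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the measurements
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical mRNA half-lives (minutes) for two groups of transcripts.
orphan_half_lives = [4, 6, 5, 7, 5, 6]
annotated_half_lives = [5, 6, 4, 7, 6, 6]

p = permutation_p_value(orphan_half_lives, annotated_half_lives)
verdict = "significant difference" if p < 0.05 else "no significant difference"
print("p =", round(p, 3), "->", verdict)
```

The 0.05 threshold in the last lines is the same convention as in the note above; it is a convention, not a property of the data.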

Panel D compares the two groups by average mRNA expression level. Here, unlike in C, the p-value is below 0.05, so the two groups differ. This suggests that orphans have been hard to study partly because they are expressed at lower levels than annotated genes.

Panel E compares the two groups by codon adaptation index (CAI). The p-value again indicates a difference, which supports the result in A. The comparison relies on the fact that proteins of different functional groups and properties differ in the codons that encode them. [※ CAI is a widely used measure of codon usage bias. Unlike other codon usage measures, CAI scores how closely the codon usage of a protein-coding sequence matches that of a reference set of genes.]
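
The CAI described in the note can be sketched as follows. Each codon gets a relative adaptiveness w, its frequency in the reference set divided by the frequency of the most used synonymous codon, and the CAI of a gene is the geometric mean of w over its codons. The codon table here is deliberately tiny (two amino acids), the sequences are hypothetical, and skipping codons that never occur in the reference set is a simplifying assumption.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny synonymous-codon table (two amino acids).
SYNONYMS = {
    "K": ["AAA", "AAG"],  # lysine
    "E": ["GAA", "GAG"],  # glutamate
}

def codons(seq):
    """Split a coding sequence into consecutive codons, ignoring a ragged tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def relative_adaptiveness(reference_seqs):
    """w = codon count / count of the most used synonymous codon."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    weights = {}
    for family in SYNONYMS.values():
        best = max(counts[c] for c in family)
        for c in family:
            if counts[c] > 0:  # assumption: unseen codons are skipped
                weights[c] = counts[c] / best
    return weights

def cai(seq, weights):
    """Geometric mean of the relative adaptiveness over the scored codons."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Hypothetical reference set: AAA and GAA are used three times as often
# as their synonyms AAG and GAG.
reference = ["".join(["AAA"] * 3 + ["AAG"] + ["GAA"] * 3 + ["GAG"])]
w = relative_adaptiveness(reference)

print(cai("AAAGAA", w))  # uses only preferred codons
print(cai("AAGGAG", w))  # uses only rare codons
```

A gene built only from the preferred codons scores 1, and a gene built only from the rarer codons scores lower, which is the sense in which CAI measures agreement with the reference set.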

Panel F also has a p-value below 0.05, so the two groups differ. F considers size, to check whether the orphans are too small to study. The result indicates that the small size of the orphans also hinders experiments.

Panel G tests the hypothesis that genes found only in E. coli, and not in other organisms, tend to be orphans because their functions are hard to determine. Genes found in many organisms tend to have similar functions across them and are easier to study, whereas some orphans exist only in E. coli, which may be why they remain orphans. In comparisons with other bacterial genomes, orphans are more often restricted to E. coli.

In summary, panel A demonstrates how little the orphans have actually been studied, panels B~F make comparisons within the bacterium, and comparisons with other species are also made. To support the observation in A and to find its cause, mRNA half-life, expression level, CAI, gene size, and functional similarity were examined. This kind of comprehensive study illustrates some aspects of typical systems biology.

 

Overall analysis of K-12 system

Figure 2. Generation and Integration of Physical and Functional Networks and Orphan Function Prediction

Figure 2 outlines the systems biology study based on PI and GC that is reviewed in this paper. Figure 2-a is a scheme for constructing a physical network based on protein copurification and detection. Figure 2-b is a scheme for integrating four genomic context methods.

The first method is gene fusion, which indicates functional similarity [35,36]; the second is similarity of phylogenetic profiles [33,37-38]; the third is evolutionary conservation of gene order, that is, the order and direction in which genes are expressed [39-41]; and the fourth is intergenic distance, where the closer two genes are, the more similar their functions are likely to be [42-44]. Figure 2-c is a scheme for integrating the PI and GC probabilistic networks and for predicting function based on Figure 2-a and Figure 2-b. For this, the group used "StepPLR", a newly designed method based on integrated network topology.

Physical Interaction (PI) 

Large-scale Sequential Peptide Affinity (SPA) tagging allows E. coli protein complexes to be purified efficiently and characterized by mass spectrometry [20]. The group used two complementary techniques, gel-based MALDI peptide mass fingerprinting and gel-free LCMS shotgun sequencing, to detect physical interactions between proteins. Next, they combined the MALDI and LCMS scores into a single PI network using a previously established procedure for integrating probabilistic networks. Finally, they filtered the network at a confidence cutoff score of 0.75 and clustered it using MCL.
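
The combine-then-filter step can be sketched as below. The text does not give the formula of the "previously established procedure", so the noisy-OR rule used here (the probability that at least one of two independent lines of evidence is correct) is an assumption standing in for it. The protein names and scores are hypothetical; only the 0.75 cutoff comes from the text, and treating it as inclusive is also an assumption.

```python
def combine_scores(maldi, lcms):
    """Noisy-OR of two probabilities, assuming independent evidence."""
    return 1.0 - (1.0 - maldi) * (1.0 - lcms)

def build_pi_network(maldi_scores, lcms_scores, cutoff=0.75):
    """Merge two scored edge lists and keep edges at or above the cutoff."""
    network = {}
    for pair in set(maldi_scores) | set(lcms_scores):
        # A pair missing from one method contributes a score of 0 there.
        score = combine_scores(maldi_scores.get(pair, 0.0),
                               lcms_scores.get(pair, 0.0))
        if score >= cutoff:
            network[pair] = score
    return network

# Hypothetical interaction probabilities for pairs of proteins A, B, C.
maldi = {("A", "B"): 0.6, ("A", "C"): 0.3}
lcms = {("A", "B"): 0.5, ("B", "C"): 0.8}

pi_network = build_pi_network(maldi, lcms)
for pair in sorted(pi_network):
    print(pair, round(pi_network[pair], 2))
```

In this toy case the A-B pair passes the cutoff only because both methods support it, which is the practical benefit of integrating complementary techniques.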

Genomic Context (GC)

Computational methods were applied to identify a network of high-confidence pairwise functional interactions for all E. coli proteins, including those not detected in the PI network. Four methods were used, which fall into two types. The first type predicts functional interactions among E. coli proteins from gene fusions and from similarity of phylogenetic profiles. The second type detects the natural chromosomal association of bacterial genes in operons, from evolutionary conservation of gene order and from intergenic distances.
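
As an illustration of the first type of method, phylogenetic profile similarity can be sketched with presence/absence vectors: two genes that occur in the same set of genomes are predicted to be functionally linked. The Jaccard index used here is one common similarity measure and is an assumption, since the text does not name the measure; the gene names and profiles are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Shared presences divided by presences in either profile."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

# Hypothetical profiles: 1 means the gene has a homolog in that genome.
# Each position corresponds to one of six reference genomes.
profiles = {
    "geneX": [1, 1, 0, 1, 0, 1],
    "geneY": [1, 1, 0, 1, 0, 0],
    "geneZ": [0, 0, 1, 0, 1, 0],
}

print(jaccard(profiles["geneX"], profiles["geneY"]))  # 0.75
print(jaccard(profiles["geneX"], profiles["geneZ"]))  # 0.0
```

Here geneX and geneY co-occur in three of the four genomes that contain either, so they would be a candidate functional pair, while geneX and geneZ never co-occur.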

Clustering of networks

Protein clusters were derived with MCL from three different networks [40] (Figure 2): (1) the PI network (generating protein complexes); (2) the unified GC network (generating functional modules); and (3) the function prediction/annotation profiles derived from the integration of the PI and GC networks (generating functional neighborhoods). The core idea of MCL is to simulate random walks among the proteins (nodes) within each network in order to delimit regions with high flux, taking into account the connectivity and weight of the interaction edges. In this work, edge weights correspond to the likelihood of pairwise protein interactions in each network. To tune the granularity of the delimited clusters, the global MCL inflation parameter was optimized in each case by adjusting the mass fraction of clusters and the efficiency of partitions (Protocol S4). As described previously, the members of the resulting clusters were assessed for functional homogeneity based on COG annotations (Protocol S4). Cohesiveness is measured as the homogeneity of a chosen behavior within a cluster. For genes, the behavior can be either a molecular function or a biological process. A cluster is said to be homogeneous when all of its genes belong to a single behavioral group, in which case the metric returns 0, indicating the best cohesiveness.
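
The random-walk idea behind MCL can be sketched in a few functions. The algorithm alternates expansion (squaring the column-stochastic matrix, which lets flow spread along paths) with inflation (raising entries to a power and renormalizing, which strengthens strong flows and weakens weak ones). This is a minimal pure-Python sketch, not the implementation used in the study; the toy network, the default inflation value of 2.0, the added self-loops, and the thresholds are illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                m[i][j] /= col_sum
    return m

def expand(m):
    """Expansion: matrix squaring, i.e. two steps of the random walk."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    """Inflation: elementwise power followed by column normalization."""
    return normalize_columns([[x ** r for x in row] for row in m])

def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a weighted undirected network given as an adjacency matrix."""
    n = len(weights)
    # Self-loops are commonly added so that walks can stay in place.
    m = [[float(weights[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that still holds flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy network: two triangles (0-1-2 and 3-4-5) joined by one weak edge.
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(bridged))
```

A larger inflation value generally yields more, smaller clusters, which is why the text describes tuning this single global parameter to control cluster granularity.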

 

Discussion

System analysis has many methods and much value, but it must aim to go beyond the current experimental tradition, because a complete system analysis of biological regulation requires accurate measurement and large-scale information processing. Research on biological systems should progress from the study of superficial topological interactions to work that incorporates mRNA, protein, and metabolic information together with interactions.

The most feasible application of systems biology research is mechanism-based drug screening, aimed at cell regulation and focused on specific molecules and signal transduction cascades. Such a model can help identify feedback mechanisms that offset a drug's effect, and can predict effects at the system level. It may also become possible to use multiple-drug systems to guide a malfunctioning cell into a desired state with minimal side effects.

Although systems biology is at an early stage, its potential benefits are enormous in both practical and scientific terms. The field of biology is expanding from the molecular level to the system level. This should make complex biological regulatory systems easier to understand and provide crucial opportunities to apply this knowledge in practice.

 

 

 

References

[1] P. Hieter, M. Boguski, Functional genomics: it's all how you read it, Science, 278 (1997) 601-602.

[2] T. Ideker, T. Galitski, L. Hood, A new approach to decoding life: systems biology, Annual review of genomics and human genetics, 2 (2001) 343-372.

[3] H. Kitano, Systems biology: a brief overview, Science, 295 (2002) 1662-1664.

[4] H. Kitano, Foundations of systems biology, MIT press Cambridge, MA, 2001.

[5] N. Wiener, Cybernetics; or control and communication in the animal and the machine, (1948).

[6] R.M. May, Uses and abuses of mathematics in biology, Science, 303 (2004) 790.

[7] J.S. Edwards, R.U. Ibarra, B.O. Palsson, In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data, Nature biotechnology, 19 (2001) 125-130.

[8] M.T. Borisuk, J.J. Tyson, Bifurcation analysis of a model of mitotic control in frog eggs, Journal of theoretical biology, 195 (1998) 69-85.

[9] K.C. Chen, A. Csikasz-Nagy, B. Gyorffy, J. Val, B. Novak, J.J. Tyson, Kinetic analysis of a molecular model of the budding yeast cell cycle, Molecular biology of the cell, 11 (2000) 369-391.

[10] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proceedings of the National Academy of Sciences, 95 (1998) 14863-14868.

[11] S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P.O. Brown, I. Herskowitz, The transcriptional program of sporulation in budding yeast, Science, 282 (1998) 699-705.

[12] M.E. Csete, J.C. Doyle, Reverse engineering of biological complexity, Science, 295 (2002) 1664.

[13] A.G. Gilman, M.I. Simon, H.R. Bourne, B.A. Harris, R. Long, E.M. Ross, J.T. Stull, R. Taussig, A.P. Arkin, M.H. Cobb, Overview of the alliance for cellular signaling, Nature, 420 (2002) 703-706.

[14] Hu, P., Janga, S. C., Babu, M., Díaz-Mejía, J. J., Butland, G., Yang, W., ... & Emili, A. (2009). Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS biology, 7(4), e1000096.

[15] M. Campillos, C. Von Mering, L.J. Jensen, P. Bork, Identification and analysis of evolutionarily cohesive functional modules in protein networks, Genome research, 16 (2006) 374-382.

[16] N. Slonim, O. Elemento, S. Tavazoie, Ab initio genotype–phenotype association reveals intrinsic modularity in genetic networks, Molecular systems biology, 2 (2006).

[17] S. Yellaboina, K. Goyal, S.C. Mande, Inferring genome-wide functional linkages in E. coli by combining improved genome context methods: comparison with high-throughput experimental data, Genome research, 17 (2007) 527-535.

[18] B.A. Shoemaker, A.R. Panchenko, Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners, PLoS computational biology, 3 (2007) e43.

[19] M. Arifuzzaman, M. Maeda, A. Itoh, K. Nishikata, C. Takita, R. Saito, T. Ara, K. Nakahigashi, H.C. Huang, A. Hirai, Large-scale identification of protein–protein interaction of Escherichia coli K-12, Genome research, 16 (2006) 686-691.

[20] G. Butland, J.M. Peregrín-Alvarez, J. Li, W. Yang, X. Yang, V. Canadien, A. Starostine, D. Richards, B. Beattie, N. Krogan, Interaction network containing conserved and essential protein complexes in Escherichia coli, Nature, 433 (2005) 531-537.

[21] J. Vlasblom, S. Wu, S. Pu, M. Superina, G. Liu, C. Orsi, S.J. Wodak, GenePro: a Cytoscape plug-in for advanced visualization and analysis of interaction networks, Bioinformatics, 22 (2006) 2178-2179.

[22] J. Sabina, N. Dover, L.J. Templeton, D.R. Smulski, D. Söll, R.A. LaRossa, Interfering with different steps of protein synthesis explored by transcriptional profiling of Escherichia coli K-12, Journal of bacteriology, 185 (2003) 6158-6170.

[23] Z. Yao, W.L. Ruzzo, A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data, BMC bioinformatics, 7 (2006) S11.

[24] I.M. Keseler, J. Collado-Vides, S. Gama-Castro, J. Ingraham, S. Paley, I.T. Paulsen, M. Peralta-Gil, P.D. Karp, EcoCyc: a comprehensive database resource for Escherichia coli, Nucleic acids research, 33 (2005) D334-D337.

[25] E. Hahn, P. Wild, U. Hermanns, P. Sebbel, R. Glockshuber, M. Häner, N. Taschner, P. Burkhard, U. Aebi, S.A. Müller, Exploring the 3D Molecular Architecture of Escherichia coli Type 1 Pili, Journal of molecular biology, 323 (2002) 845-857.

[26] R. Fronzes, H. Remaut, G. Waksman, Architectures and biogenesis of non-flagellar protein appendages in Gram-negative bacteria, The EMBO journal, 27 (2008) 2271-2280.

[27] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, Gene Ontology: tool for the unification of biology, Nature genetics, 25 (2000) 25.

[28] E. Camon, M. Magrane, D. Barrell, V. Lee, E. Dimmer, J. Maslen, D. Binns, N. Harte, R. Lopez, R. Apweiler, The Gene Ontology annotation (GOA) database: sharing knowledge in Uniprot with Gene Ontology, Nucleic acids research, 32 (2004) D262-D266.

[29] J.J. Díaz‐Mejía, M. Babu, A. Emili, Computational and experimental approaches to chart the Escherichia coli cell‐envelope‐associated proteome and interactome, FEMS microbiology reviews, 33 (2009) 66-97.

[30] Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501–504.

[31] Serres MH, Goswami S, Riley M (2004) GenProtEC: an updated and improved analysis of functions of Escherichia coli K-12 proteins. Nucleic Acids Res 32: D300–302.

[32] Rudd KE (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev 62: 985–1019.

[33] Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278: 631–637.

[34] Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J (2004) The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res 32: D235–239.

[35] Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature 402: 86–90.

[36] Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, et al. (1999) Detecting protein function and protein-protein interactions from genome sequences. Science 285: 751–753. 

[37] Gaasterland T, Ragan MA (1998) Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. Microb Comp Genomics 3: 199–217.

[38] Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A 96: 4285–4288.

[39] Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23: 324–328.

[40] Janga SC, Moreno-Hagelsieb G (2004) Conservation of adjacency as evidence of paralogous operons. Nucleic Acids Res 32: 5392–5397.

[41] Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A 96: 2896–2901.

[42] Janga SC, Collado-Vides J, Moreno-Hagelsieb G (2005) Nebulon: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons. Nucleic Acids Res 33: 2521–2530.

[43] Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, et al. (2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res 30: 2212–2223.

[44] Snel B, Bork P, Huynen MA (2002) The identification of functional modules from the genomic association of genes. Proc Natl Acad Sci USA 99: 5890–5895.

[45] Hu, P., Janga, S. C., Babu, M., Díaz-Mejía, J. J., Butland, G., Yang, W., ... & Emili, A. (2009). Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS biology, 7(4), e1000096.