Difference between revisions of "Could several genomes can be overlapped?"
Latest revision as of 00:37, 7 December 2018
Overlapping Gene
An overlapping gene is a gene whose expressible nucleotide sequence partially overlaps with the expressible nucleotide sequence of another gene. In this way, a nucleotide sequence may contribute to the function of one or more gene products. Overprinting refers to a type of overlap in which all or part of the sequence of one gene is read in an alternate reading frame from another gene at the same locus. Overprinting has been hypothesized as a mechanism for de novo emergence of new genes from existing sequences, either older genes or previously non-coding regions of the genome. Overprinted genes are particularly common features of the genomic organization of viruses, likely because overprinting greatly increases the number of potential expressible genes from a small set of viral genetic information.
Classification
Genes may overlap in a variety of ways and can be classified by their positions relative to each other.
- Unidirectional or tandem overlap: the 3' end of one gene overlaps with the 5' end of another gene on the same strand. This arrangement can be symbolized with the notation → → where arrows indicate the reading frame from start to end.
- Convergent or end-on overlap: the 3' ends of the two genes overlap on opposite strands. This can be written as → ←.
- Divergent or tail-on overlap: the 5' ends of the two genes overlap on opposite strands. This can be written as ← →.
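The positional classes above can be sketched in Python, assuming each gene is described by forward-strand coordinates and a strand sign. The `Gene` type, the `classify_overlap` function, and the extra "nested" label (for one gene lying entirely inside the other, a case the three classes above do not describe) are hypothetical names chosen for illustration, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # 0-based, inclusive, measured on the forward strand
    end: int     # exclusive, measured on the forward strand
    strand: str  # "+" (read left to right) or "-" (read right to left)


def classify_overlap(a: Gene, b: Gene) -> str:
    """Label the overlap between two genes by relative position (illustrative)."""
    # No shared nucleotides at all.
    if min(a.end, b.end) <= max(a.start, b.start):
        return "none"
    # One gene sits entirely inside the other; label is an assumption of this sketch.
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "nested"
    # Same strand: the 3' end of one gene overlaps the 5' end of the other (-> ->).
    if a.strand == b.strand:
        return "unidirectional"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # "+" gene on the left: the two 3' ends overlap (-> <-).
    if plus.start < minus.start:
        return "convergent"
    # "-" gene on the left: the two 5' ends overlap (<- ->).
    return "divergent"


print(classify_overlap(Gene(0, 100, "+"), Gene(90, 200, "-")))
```

Half-open coordinates are used so that the overlap length is simply `min(a.end, b.end) - max(a.start, b.start)`.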
Overlapping genes can also be classified by phases, which describe their relative reading frames:
- In-phase overlap occurs when the shared sequences use the same reading frame. This is also known as "phase 0". Unidirectional genes with phase 0 overlap are not considered distinct genes, but rather as alternative start sites of the same gene.
- Out-of-phase overlaps occur when the shared sequences use different reading frames. This can occur in "phase 1" or "phase 2", depending on whether the reading frames are offset by 1 or 2 nucleotides. Because a codon is three nucleotides long, an offset of three nucleotides is an in-phase, phase 0 frame.
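For two genes on the same strand, the phase described above reduces to the difference between their start positions modulo 3. The following is a minimal sketch under that assumption; `overlap_phase` is a hypothetical helper, and which gene is taken as the reference (and therefore whether an offset is reported as phase 1 or phase 2) is a convention of this example rather than something the text fixes. Opposite-strand overlaps need a strand-aware definition and are not handled here.

```python
def overlap_phase(start_a: int, start_b: int) -> int:
    """Reading-frame offset of gene B relative to gene A (same strand only).

    start_a and start_b are the positions of the first nucleotide of each
    gene's first codon. The result is 0 (in-phase), 1, or 2.
    """
    # Python's % always returns a non-negative result for a positive modulus,
    # so the order of the two starts does not produce negative phases.
    return (start_b - start_a) % 3


for offset in range(5):
    print(offset, overlap_phase(0, offset))
```

An offset of 3 returns 0, matching the statement that a three-nucleotide shift lands back in the same frame.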
Evolution
Overlapping genes are particularly common in rapidly evolving genomes, such as those of viruses, bacteria, and mitochondria. They may originate in three ways:
- By extension of an existing open reading frame (ORF) downstream into a contiguous gene due to the loss of a stop codon;
- By extension of an existing ORF upstream into a contiguous gene due to loss of an initiation codon;
- By generation of a novel ORF within an existing one due to a point mutation.
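The first route, extension downstream after loss of a stop codon, can be illustrated with a toy sequence. This is a sketch only: the 22-nucleotide sequence is invented for the example, `orf_end` is a hypothetical helper, and the stop codons are those of the standard genetic code (some genomes, such as mitochondrial ones, use different codes).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def orf_end(seq: str, start: int):
    """Return the index just past the first in-frame stop codon, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


# Invented toy sequence: gene 1 starts at 0, gene 2 starts at 10 (a different frame).
seq = "ATGAAATAGCATGGCTAACTGA"
gene1_start, gene2_start = 0, 10

# Before the mutation the two ORFs do not overlap: gene 1 ends before gene 2 starts.
print(orf_end(seq, gene1_start), orf_end(seq, gene2_start))

# A point mutation turns gene 1's stop codon TAG into TAC (position 8, G to C).
mutant = seq[:8] + "C" + seq[9:]

# Gene 1 now reads through to the next in-frame stop, inside gene 2.
end1 = orf_end(mutant, gene1_start)
end2 = orf_end(mutant, gene2_start)
overlap = min(end1, end2) - max(gene1_start, gene2_start)
print(end1, end2, overlap)
```

In this toy case gene 1 grows from 9 to 18 nucleotides and shares 8 nucleotides with gene 2, read in a different frame.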
The use of the same nucleotide sequence to encode multiple genes may provide an evolutionary advantage due to a reduction in genome size and due to the opportunity for transcriptional and translational co-regulation of the overlapping genes. Gene overlaps introduce novel evolutionary constraints on the sequences of the overlap regions.
Origins of new genes
In 1977, Pierre-Paul Grassé proposed that one of the genes in the pair could have originated de novo by mutations to introduce novel ORFs in alternate reading frames; he described the mechanism as overprinting. It was later substantiated by Susumu Ohno, who identified a candidate gene that may have arisen by this mechanism. Some de novo genes originating in this way may not remain overlapping, but subfunctionalize following gene duplication, contributing to the prevalence of orphan genes. Which member of an overlapping gene pair is younger can be identified bioinformatically either by a more restricted phylogenetic distribution, or by less optimized codon usage. Younger members of the pair tend to have higher intrinsic structural disorder than older members, but the older members are also more disordered than other proteins, presumably as a way of alleviating the increased evolutionary constraints posed by overlap. Overlaps are more likely to originate in proteins that already have high disorder.
Taxonomic distribution
Overlapping genes in the bacteriophage ΦX174 genome. There are 11 genes in this genome (A, A*, B-H, J, K). Genes B, K, E overlap with genes A, C, D.
Overlapping genes occur in all domains of life, though with varying frequencies. They are especially common in viral genomes.
Viruses
The existence of overlapping genes was first identified in viruses; the first DNA genome ever sequenced, of the bacteriophage ΦX174, contained several examples. Overlapping genes are particularly common in viral genomes. Some studies attribute this observation to selective pressure toward small genome sizes mediated by the physical constraints of packaging the genome in a viral capsid, particularly one of icosahedral geometry. However, other studies dispute this conclusion and argue that the distribution of overlaps in viral genomes is more likely to reflect overprinting as the evolutionary origin of overlapping viral genes. Overprinting is a common source of de novo genes in viruses.
Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins which are not essential to viral proliferation, but contribute to pathogenicity. Overprinted proteins often have unusual amino acid distributions and high levels of intrinsic disorder. In some cases overprinted proteins do have well-defined, but novel, three-dimensional structures; one example is the RNA silencing suppressor p19 found in Tombusviruses, which has both a novel protein fold and a novel binding mode in recognizing siRNAs.
Prokaryotes
Estimates of gene overlap in bacterial genomes typically find that around one third of bacterial genes are overlapped, though usually only by a few base pairs. Most studies of overlap in bacterial genomes find evidence that overlap serves a function in gene regulation, permitting the overlapped genes to be transcriptionally and translationally co-regulated. In prokaryotic genomes, unidirectional overlaps are most common, possibly due to the tendency of adjacent prokaryotic genes to share orientation. Among unidirectional overlaps, long overlaps are more commonly read with a one-nucleotide offset in reading frame (i.e., phase 1) and short overlaps are more commonly read in phase 2. Long overlaps of greater than 60 base pairs are more common for convergent genes; however, putative long overlaps have very high rates of misannotation. Robustly validated examples of long overlaps in bacterial genomes are rare; in the well-studied model organism Escherichia coli, only four gene pairs are well validated as having long, overprinted overlaps.
Eukaryotes
Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated and thus identifying genuine overlaps is relatively challenging. However, examples of validated gene overlaps have been documented in a variety of eukaryotic organisms, including mammals such as mice and humans. Eukaryotes differ from prokaryotes in distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite or antiparallel-strand overlaps are more common in eukaryotes. Among the opposite-strand overlaps, convergent orientation is most common. Most studies of eukaryotic gene overlap have found that overlapping genes are extensively subject to genomic reorganization even in closely related species, and thus the presence of an overlap is not always well conserved. Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated de novo in a given eukaryotic lineage.
Reference
1. "Overlapping gene", from Wikipedia