Identification of ohnologs from whole genome duplication

Whole genome duplications (WGDs) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 million years ago (Figure 1).

   

Figure 1. Whole genome duplications in evolution. Whole genome duplications have occurred repeatedly in the course of eukaryote evolution (Singh et al. PLoS Comput Biol 2015).

   
 
 
Paralogs retained from WGD, also termed "ohnologs" after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and to be frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to a better understanding of the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome, or within the human genome itself (Figure 2). These approaches are thus limited by lineage-specific rearrangements in individual genomes.
 
 

 

 

Figure 2. Evolution after WGD and identification of ohnologs using content-based synteny comparison. The genomes of three lineages sharing a common ancestor are shown. Orthologs and paralogs are depicted in the same color. The WGD lineage (A) underwent whole genome duplication (B) followed by non-functionalization (C) and genome rearrangements (D), leading to the current intragenomic content-based synteny (I). By contrast, the two outgroup genomes without WGD (E, G) experienced lineage-specific genome rearrangements (F, H), leading to a 1-to-2 content-based synteny pattern with the WGD lineage (J, K). Note that some ohnolog pairs (D) are only identified by one of the two outgroups (J or K) due to lineage-specific rearrangements (Singh et al. PLoS Comput Biol 2015).
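The 1-to-2 content-based synteny pattern described above can be illustrated with a small Python sketch. It is a toy illustration under stated assumptions, not the pipeline used in the cited papers: the function name, the `window` and `min_support` parameters, and the input layout (a single outgroup chromosome, a precomputed ortholog map) are all hypothetical. A pair of genes in the WGD genome is reported as a candidate ohnolog pair when both are orthologous to the same outgroup gene and at least one neighbouring outgroup gene has orthologs close to each of the two copies.

```python
def candidate_ohnologs(outgroup_order, wgd_positions, orthologs,
                       window=2, min_support=1):
    """Toy content-based synteny scan (hypothetical helper).

    outgroup_order: outgroup gene ids in chromosomal order (one chromosome).
    wgd_positions:  WGD-genome gene id -> (chromosome, gene index).
    orthologs:      outgroup gene id -> list of WGD-genome orthologs.
    Returns {(gene_a, gene_b): number of supporting outgroup neighbours}.
    """
    def near(gene_a, gene_b):
        # Two genes are "near" if on the same chromosome within `window` genes.
        chrom_a, idx_a = wgd_positions[gene_a]
        chrom_b, idx_b = wgd_positions[gene_b]
        return chrom_a == chrom_b and abs(idx_a - idx_b) <= window

    pairs = {}
    for i, og in enumerate(outgroup_order):
        copies = orthologs.get(og, [])
        # Outgroup genes within `window` positions on either side of `og`.
        neighbours = (outgroup_order[max(0, i - window):i]
                      + outgroup_order[i + 1:i + 1 + window])
        for x in range(len(copies)):
            for y in range(x + 1, len(copies)):
                a, b = copies[x], copies[y]
                if near(a, b):
                    continue  # adjacent copies look like local duplicates
                # A neighbour supports the pair if it has an ortholog
                # close to copy a and an ortholog close to copy b.
                support = sum(
                    1 for nb in neighbours
                    if any(near(a, c) for c in orthologs.get(nb, []))
                    and any(near(b, c) for c in orthologs.get(nb, []))
                )
                if support >= min_support:
                    pairs[tuple(sorted((a, b)))] = support
    return pairs


# Toy genomes: the outgroup has genes o1-o2-o3; the WGD genome kept two
# copies of the o1 and o3 orthologs but lost the second o2 ortholog.
outgroup_order = ["o1", "o2", "o3"]
wgd_positions = {"A1": ("chr1", 0), "B1": ("chr1", 1), "C1": ("chr1", 2),
                 "A2": ("chr2", 0), "C2": ("chr2", 1)}
orthologs = {"o1": ["A1", "A2"], "o2": ["B1"], "o3": ["C1", "C2"]}
result = candidate_ohnologs(outgroup_order, wgd_positions, orthologs)
```

With a single outgroup, any rearrangement that moves a supporting neighbour out of the window removes the evidence for the pair, which is the limitation noted in the text.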

   
 
 
By contrast, we have implemented the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between multiple vertebrates and invertebrate outgroups (Figure 3). Such a synteny comparison across multiple genomes has been shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage-specific genome rearrangements. It also makes it possible to identify ohnologs in multiple species simultaneously. This requires, however, accounting for the phylogenetically biased sampling of the included species, which is done by computing a weighted confidence q-score for each included species (Singh & Isambert, Nucleic Acids Res 2020, Online Supplementary Methods). With this approach, we identify 2R-ohnologs in 27 vertebrates, including 4 teleost fish, as well as additional 3R-ohnologs in these 4 teleost fish, which experienced a third round of WGD (Figure 3).
 
 

 

 

Figure 3. Schematic tree for the 27 vertebrates, including 4 teleost fish, and 5 invertebrate outgroup organisms. Vertebrates analysed for 2R-WGD are in orange, and teleost fish species analysed for 3R-WGD are underlined. Outgroup species used to identify 2R- and 3R-ohnologs are highlighted (Singh & Isambert, Nucleic Acids Res 2020).
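The idea of weighting evidence to correct for phylogenetically biased species sampling can be sketched as follows. This is only an illustration: the exact q-score definition is given in the Online Supplementary Methods of the cited paper, and the weighted geometric mean used here, as well as the function name, species labels and weight values, are assumptions made for the example. The sketch down-weights closely related species so that several near-identical genomes do not count as independent evidence.

```python
import math


def combined_q_score(q_by_species, weights):
    """Weighted geometric mean of per-species q-scores (assumed formula).

    q_by_species: species name -> q-score in (0, 1], where small values
                  mean the synteny is unlikely to have arisen by chance.
    weights:      species name -> positive weight, smaller for species
                  that are closely related to other sampled species.
    """
    total_weight = sum(weights[s] for s in q_by_species)
    # Work in log space; clamp q to avoid log(0) for vanishing scores.
    log_q = sum(weights[s] * math.log(max(q_by_species[s], 1e-300))
                for s in q_by_species)
    return math.exp(log_q / total_weight)


# Hypothetical example: two closely related species share one unit of
# weight, while a distant species carries one unit by itself.
q_scores = {"species_a": 0.01, "species_b": 0.01, "species_c": 1.0}
weights = {"species_a": 0.5, "species_b": 0.5, "species_c": 1.0}
q_weighted = combined_q_score(q_scores, weights)

# With equal weights the two related species dominate the result.
q_unweighted = combined_q_score(q_scores, {s: 1.0 for s in q_scores})
```

In this toy case the weighted score is less extreme than the unweighted one, because the two related species are not treated as two independent confirmations.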

   
 
 
Methodological details can be found in the associated papers (Singh et al. PLoS Comput Biol 2015 and Nucleic Acids Res 2020) and on the open-access online server, OHNOLOGS. There, ohnolog gene families can be browsed and downloaded for three statistical confidence levels, or recompiled for specific, user-defined significance criteria (Figure 4). Given the importance of WGD for the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases. A summary of the contents of the ohnolog database is shown in Figure 5.
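Recompiling families for a user-defined significance criterion can be pictured with the short sketch below. It does not reproduce the server's file format or its procedure: the tuple layout, the gene identifiers, the q-score values and the cutoffs are all placeholders, and grouping pairs into families as connected components is an assumption made for illustration.

```python
def filter_pairs(scored_pairs, max_q):
    """Keep (gene_a, gene_b) for pairs whose q-score is below max_q."""
    return [(a, b) for a, b, q in scored_pairs if q < max_q]


def families_from_pairs(pairs):
    """Group pairs into families as connected components (assumption)."""
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for gene in parent:
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in groups.values())


# Placeholder pairs with invented q-scores (not real database entries).
scored_pairs = [("g1", "g2", 0.0005), ("g2", "g3", 0.02), ("g4", "g5", 0.008)]

stricter = families_from_pairs(filter_pairs(scored_pairs, max_q=0.01))
looser = families_from_pairs(filter_pairs(scored_pairs, max_q=0.05))
```

Relaxing the cutoff in this toy example adds the g2-g3 pair, which merges g3 into the first family; a stricter cutoff yields fewer and smaller families.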
 
 

 

 

Figure 4. Navigating the OHNOLOGS database. (A) Screenshot of the search page. (B) Result page for a keyword search of 'rat sarcoma viral oncogene', showing the matching genes in human. (C) Ohnolog family page for the HRAS gene in the human genome. (D) From the family page, users can navigate to ortholog families in other vertebrates, e.g. zebrafish HRASA. (E) Ohnolog pair page for the NRAS gene in zebrafish. (F) Browse/Download page for zebrafish, showing both 2R- and 3R-ohnolog pairs and families for all three criteria (Singh & Isambert, Nucleic Acids Res 2020).

 
 
 

 

 

Figure 5. Description of the ohnolog genes, pairs and families in the database. (A) Number of retained individual 2R-ohnolog genes, pairs and families in all 27 vertebrates. Bars represent the numbers from the intermediate criterion. Ohnologs from the strict and relaxed criteria are indicated by dots. (B) Number of retained individual 3R-ohnolog genes, pairs and families in the four teleost fish species. Bars represent the numbers from the intermediate criterion. Ohnologs from the strict and relaxed criteria are indicated by dots. (C) Size of the 2R-ohnolog families from the intermediate criterion in vertebrates. Note that the vast majority of families are of size 2, 3 or 4. (D) Sizes of the 3R-ohnolog families from the intermediate criterion in the teleost fish, which hardly exceed size two. (E) The 2R-ohnologs are significantly more likely to retain 3R-ohnologs than the genome average. The retention of 3R-ohnologs is even higher for the 2R-ohnologs that belong to families of size 3 or 4, and for 2R-ohnologs conserved in all 27 vertebrates. All P-values are <1e-41, Chi-square test. Family counts are from the intermediate criterion (Singh & Isambert, Nucleic Acids Res 2020).
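The kind of Chi-square comparison mentioned in panel (E) can be sketched on a 2x2 contingency table. The counts below are invented toy numbers, not values from the database, and the function name is hypothetical. The sketch uses the standard Pearson statistic without continuity correction, and the fact that for one degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test on a 2x2 table (no continuity correction).

    a, b: 2R-ohnologs with / without a retained 3R-ohnolog.
    c, d: other genes with / without a retained 3R-ohnolog.
    Returns (chi2 statistic, p-value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the Chi-square distribution with 1 dof.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy counts for illustration only.
chi2, p_value = chi_square_2x2(a=30, b=70, c=10, d=90)
```

Here 30% of the toy "2R-ohnologs" retain a 3R copy against 10% of the other genes, which gives a statistic of 12.5 and a p-value well below 0.001.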

   
 
 
 

Related Publications

 

Singh PP, Isambert H: Ohnologs v2: a comprehensive resource for the genes retained from whole genome duplication in vertebrates.
Nucleic Acids Res 48(D1):D724-D730 (2020).
Pubmed | DOI | pdf | supp | bioRxiv | Ohnologs server
[recommended by F1000Prime]

Arbabian A, Iftinca M, Altier C, Singh PP, Isambert H, Coscoy S: Mutations in calmodulin-binding domains of TRPV4/6 channels confer invasive properties to colon adenocarcinoma cells.
Channels 14(1):101-109 (2020).
Pubmed | DOI | pdf | supp

Singh PP, Arora J, Isambert H: Identification of ohnolog genes originating from whole genome duplication in early vertebrates, based on synteny comparison across multiple genomes.
PLoS Comput Biol 11(7):e1004394 (2015).
Pubmed | PLOS | pdf | supp | Ohnologs server

Singh PP, Affeldt S, Cascone I, Selimoglu R, Camonis J, Isambert H: On the expansion of "dangerous" gene repertoires by whole genome duplications in early vertebrates.
Cell Rep 2(5):1387-1398 (2012).
Pubmed | CellPress | pdf | supp
[featured by Le Point, Biofutur, CNRS]