3.2 Genome annotation
Approximately 357.31 Mb, or 43.23% of the assembled genome, was identified as repetitive elements, including transposable elements, simple sequence repeats (SSRs), and unknown elements. Class I and Class II transposable elements accounted for 23.34% and 12.59% of the assembled genome, respectively, and SSRs accounted for 0.59% (Table 1).
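As a quick consistency check on these figures, the sketch below derives the assembly size implied by the repeat content and the share of repeats left over after the three itemized classes. All variable names are illustrative. Treating the remainder as the unknown elements is an assumption, since Table 1 may list further categories not itemized in the text.

```python
# Values reported in the text; names are illustrative, not from the authors' pipeline.
repeat_mb = 357.31    # total repetitive sequence, Mb
repeat_pct = 43.23    # repeats as % of the assembled genome
class1_pct = 23.34    # Class I transposable elements, % of genome
class2_pct = 12.59    # Class II transposable elements, % of genome
ssr_pct = 0.59        # SSRs, % of genome

# Assembly size implied by the repeat length and its percentage.
implied_assembly_mb = repeat_mb / (repeat_pct / 100)

# Repeat share not covered by the three itemized classes
# (assumed here to be unknown or otherwise unitemized elements).
remainder_pct = repeat_pct - (class1_pct + class2_pct + ssr_pct)

print(f"Implied assembly size: {implied_assembly_mb:.1f} Mb")
print(f"Repeats outside itemized classes: {remainder_pct:.2f}% of genome")
```

Under these inputs the implied assembly size is about 826.5 Mb, which can be compared against the assembly statistics reported elsewhere in the paper.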
De novo, homology-based, and transcript-based methods were used to predict gene models. The predictions were integrated using the EVM software, yielding 36,738 non-redundant genes (Table S4). The average exon length per gene was 2 kb (Figure 1c), the average exon number per gene was 8.11, and the mean CDS length was 1,492.64 bp, indicating relatively good consistency of the genome assembly. A total of 1,829 pseudogenes, 450 miRNAs, 5,616 rRNAs, and 3,280 tRNAs were annotated in the male O. bidens genome (Table 2). A total of 30,922 genes were functionally annotated using the GO, KEGG, KOG, TrEMBL, and NR databases, representing 84.17% of the predicted genes (Figure 2a). According to the GO analysis, the annotated genes were assigned to cellular components, molecular functions, and biological processes (Figure 2b). Of the annotated genes in the male O. bidens genome assembly, 5,462 had orthologs in four species: D. rerio, L. rohita, C. carpio, and A. grahami (Figure 2c).
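The annotation rate follows directly from the two gene counts above. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Counts reported in the text; names are illustrative.
predicted_genes = 36738   # non-redundant gene models after integration
annotated_genes = 30922   # genes with a hit in GO, KEGG, KOG, TrEMBL, or NR

# Share of predicted genes with a functional annotation.
annotated_pct = 100 * annotated_genes / predicted_genes

# Genes left without any functional annotation.
unannotated_genes = predicted_genes - annotated_genes

print(f"Functionally annotated: {annotated_pct:.2f}%")
print(f"Without annotation: {unannotated_genes}")
```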
3.3 Genome phylogeny, expansion, and contraction of gene families
In total, 4,350 single-copy orthologs and 18,271 gene families were identified in the assembled genome of male O. bidens by clustering homologous genes with those of S. rhinocerous, A. grahami, L. rohita, D. rerio, O. latipes, T. rubripes, C. carpio, and G. aculeatus (Figure 3a). A phylogenetic tree was constructed using these single-copy orthologs (Figure 3b). Together with the calibration times, the results showed that O. bidens diverged from D. rerio approximately 89.69 Mya, from L. rohita approximately 57.77 Mya, and from A. grahami approximately 29.74 Mya, indicating rapid differentiation among these species.
A total of 496 gene families were significantly expanded and 305 were significantly contracted in the male O. bidens genome (p < 0.05; Figure 3b). These gene families were mainly associated with 60 GO terms (Table S5; Figure S2), such as metabolic processes (biological process), organelles (cellular component), and catalytic activity (molecular function). The expanded and contracted gene families were enriched in 76 KEGG pathways (Table S6; Figure S3), including the calcium signaling pathway, ABC transporters, the GnRH signaling pathway, melanogenesis, and adrenergic signaling in cardiomyocytes.