3.1 Genome assembly and integrity assessment
Using 90.47 Gb of Illumina data, the genome size of male O. bidens was estimated to be 813.1 Mb based on the 17-mer peak at a depth of 111.3 × coverage (Figure S1; Table S2). The 99.7 Gb of data obtained from Nanopore sequencing corresponded to 122.6-fold coverage of the genome (Table S2). Low-quality reads and adapter sequences were removed from the raw data, giving 95.1 Gb of clean reads for subsequent genome assembly. After correction and polishing of the clean Nanopore reads, a 992.9 Mb assembly was obtained for the male O. bidens genome, comprising 1,373 contigs with a contig N50 of 5.2 Mb. The assembled genome was larger than the estimated size because of the high heterozygosity rate of 0.58% (Figure S1), as previously described (Xiao et al., 2019). The GC content of the whole genome was 37.9%. The mapping rate of clean Illumina short reads to the assembled genome was 99.5%. Of the 458 genes in the eukaryotic CEG database, 456 (99.6%) were present in the assembly, and 97.5% of BUSCO genes in the vertebrata_odb10 database were found complete in the O. bidens genome (Table S3). Together, these results indicate a complete, high-quality genome assembly.
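The coverage and completeness figures above follow from simple ratios, and the short Python sketch below reproduces them as a consistency check. It assumes that fold coverage is defined as sequenced bases divided by the k-mer-estimated genome size and that 1 Gb = 1,000 Mb; the function and variable names are illustrative and are not part of the original analysis pipeline.

```python
def fold_coverage(sequenced_gb: float, genome_size_mb: float) -> float:
    """Return sequencing depth as sequenced bases / genome size.

    Assumes 1 Gb = 1,000 Mb (decimal units), which matches the
    rounding of the figures reported in the text.
    """
    return sequenced_gb * 1000.0 / genome_size_mb


# Genome size estimated from the 17-mer peak, as reported in the text.
ESTIMATED_GENOME_MB = 813.1

illumina_depth = fold_coverage(90.47, ESTIMATED_GENOME_MB)
nanopore_depth = fold_coverage(99.7, ESTIMATED_GENOME_MB)

# CEG completeness: genes found in the assembly / genes in the database.
ceg_completeness = 100.0 * 456 / 458

print(f"Illumina depth:   {illumina_depth:.1f}x")
print(f"Nanopore depth:   {nanopore_depth:.1f}x")
print(f"CEG completeness: {ceg_completeness:.1f}%")
```

Rounded to one decimal place, the three values agree with the 111.3 ×, 122.6-fold, and 99.6% reported above.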
Hi-C library sequencing produced 55.96 Gb of raw reads, approximately 68.8 × coverage of the genome, which were used to construct a chromosome-level assembly (Table S2). A total of 218.09 million read pairs (64.4%) were uniquely mapped to the Nanopore draft genome. Of these, 54.15 million read pairs (24.83%) provided valid interaction information for chromosome construction. Using the valid Hi-C information, 1,864 contigs (approximately 992.91 Mb) were produced and further clustered into 1,461 scaffolds anchored on the 38 chromosomes. The boundaries between chromosomes were clear, and every chromosome showed strong internal interactions (Figure 1a), consistent with the karyotype of the male hook snout carp (Figure 1b). The contig and scaffold N50 values reached 2.85 Mb and 19.44 Mb, respectively (Table S1). The final chromosome-level genome was 886.81 Mb, representing 89.31% of the Nanopore genome sequences.
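For readers less familiar with the N50 statistic and the Hi-C ratios quoted above, the following Python sketch illustrates one common way to compute them. It assumes the usual N50 definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly) and uses a toy list of lengths, not the actual O. bidens contigs; all names are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated
    until at least half of the total length is covered; the length
    of the sequence that crosses that threshold is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths in Mb, not the real contigs).
toy_n50 = n50([2, 3, 4, 5, 6])

# Ratios reported in the text, recomputed from the stated counts and sizes.
valid_pair_pct = 100.0 * 54.15 / 218.09   # valid pairs / uniquely mapped pairs
anchored_pct = 100.0 * 886.81 / 992.91    # anchored size / draft assembly size

print(f"Toy N50: {toy_n50}")
print(f"Valid Hi-C pairs: {valid_pair_pct:.2f}%")
print(f"Anchored fraction: {anchored_pct:.2f}%")
```

The recomputed percentages match the reported 24.83% and 89.31%, which suggests that the valid-pair rate is expressed relative to the uniquely mapped read pairs and the anchoring rate relative to the 992.91 Mb draft assembly.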