3.1 Genome assembly and integrity assessment
Using 90.47 Gb of Illumina data, the genome size of male O. bidens was
estimated to be 813.1 Mb based on the 17-mer peak, corresponding to a
sequencing depth of 111.3× (Figure S1; Table S2). The 99.7 Gb of data
obtained from Nanopore sequencing represented approximately 122.6-fold
coverage of the genome (Table S2). Low-quality reads and adapter
sequences were removed from the raw data, yielding 95.1 Gb of clean
reads for subsequent genome assembly.
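The coverage values above follow from dividing the amount of sequence data by the estimated genome size, and a k-mer estimate divides the total number of k-mers by the depth of the main k-mer peak. The Python sketch below illustrates both calculations under simplifying assumptions: the k-mer histogram is a made-up toy example, the `min_depth` cutoff for discarding low-depth error k-mers is a hypothetical parameter, and the model ignores the heterozygous peak that dedicated tools such as GenomeScope account for. It is not the pipeline used in this study.

```python
"""Illustrative genome-size and coverage arithmetic (toy sketch, not the study's pipeline)."""


def kmer_genome_size(histogram: dict[int, int], min_depth: int) -> float:
    """Estimate genome size as total k-mers divided by the main peak depth.

    `histogram` maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below the hypothetical `min_depth` cutoff are treated as
    sequencing-error k-mers and excluded from both the total and the peak search.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    total_kmers = sum(d * n for d, n in kept.items())
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    return total_kmers / peak_depth


def coverage_fold(data_bases: float, genome_size_bases: float) -> float:
    """Sequencing depth = total sequenced bases / genome size."""
    return data_bases / genome_size_bases


if __name__ == "__main__":
    # Toy histogram (hypothetical values): error k-mers at depths 1-2, main peak at depth 10.
    toy = {1: 5000, 2: 800, 9: 100, 10: 400, 11: 100}
    print(kmer_genome_size(toy, min_depth=3))  # 600.0

    genome = 813.1e6  # estimated genome size reported in the text (bases)
    print(round(coverage_fold(90.47e9, genome), 1))  # Illumina: 111.3
    print(round(coverage_fold(99.7e9, genome), 1))   # Nanopore: 122.6
```

With the reported values, the Illumina and Nanopore depths evaluate to about 111.3× and 122.6×, matching the text. These are base-level depths; the depth of a k-mer peak is somewhat lower, by a factor of (L − k + 1)/L for reads of length L.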
After the Nanopore clean reads were corrected and polished, a 992.9 Mb
assembly was obtained for the male O. bidens genome, comprising 1,373
contigs with a contig N50 of 5.2 Mb. The assembled genome was larger
than the estimated size, likely because of the relatively high
heterozygosity (0.58%; Figure S1), as previously described (Xiao et
al., 2019). The GC content of the whole genome was 37.9%. The mapping
rate of clean Illumina short reads to the assembly was 99.5%. Of the
458 genes in the core eukaryotic gene (CEG) database, 456 (99.6%) were
present in the assembly, and 97.5% of BUSCO genes were identified as
complete in the O. bidens genome using the vertebrata_odb10 database
(Table S3). Together, these results indicate a highly complete,
high-quality genome assembly.
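Contig N50 and gene-set completeness are the two headline metrics in this paragraph. The Python sketch below shows how an N50 is conventionally computed (sort contigs from longest to shortest and report the length at which the running total first reaches half the assembly size) and how a completeness percentage such as the CEG figure is derived. The contig lengths in the example are invented for illustration, and BUSCO's own classification into complete, fragmented and missing genes is done by the BUSCO software, which this sketch does not reproduce.

```python
"""Toy N50 and completeness calculations (illustrative sketch only)."""


def n50(lengths: list[int]) -> int:
    """Return N50: sort lengths in descending order and return the length
    at which the running total first reaches at least half the total."""
    if not lengths:
        raise ValueError("empty length list")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-empty, non-negative input")


def percent_present(found: int, total: int) -> float:
    """Completeness as a percentage of the reference gene set."""
    return 100.0 * found / total


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical contig lengths (Mb)
    print(n50(toy_contigs))  # 8
    print(round(percent_present(456, 458), 1))  # 99.6
```

In the toy example the total is 54 Mb; the three longest contigs (10 + 9 + 8 = 27 Mb) reach half of it, so the N50 is 8 Mb.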
Hi-C library sequencing generated 55.96 Gb of raw reads, approximately
68.8× coverage of the genome, which were used to construct a
chromosome-level assembly (Table S2). A total of 218.09 million read
pairs (64.4%) were uniquely mapped to the Nanopore draft genome, of
which 54.15 million read pairs (24.83%) provided valid interaction
information for chromosome construction. Using the valid Hi-C
interactions, 1,864 contigs (approximately 992.91 Mb) were obtained and
further clustered into 1,461 scaffolds anchored on 38 chromosomes. The
boundaries between chromosomes were clear, and each chromosome showed
strong intra-chromosomal interactions (Figure 1a); the 38 chromosomes
were consistent with the karyotype of the male hook snout carp (Figure
1b). The contig and scaffold N50 values reached 2.85 Mb and 19.44 Mb,
respectively (Table S1). The final chromosome-level genome was 886.81
Mb, representing 89.31% of the Nanopore assembly.
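The Hi-C percentages in this paragraph are simple ratios, and recomputing them makes their denominators explicit: the 24.83% of valid pairs is relative to the 218.09 million uniquely mapped pairs rather than to all sequenced pairs, the 89.31% anchoring rate is relative to the 992.91 Mb Nanopore assembly, and the 68.8× Hi-C coverage is the raw data divided by the 813.1 Mb k-mer estimate. The Python sketch below recomputes these values from the reported numbers; the implied total number of read pairs is a back-calculation from the rounded 64.4% figure and is therefore only approximate.

```python
"""Recompute Hi-C summary ratios from values reported in the text (sketch)."""


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    unique_pairs_m = 218.09  # uniquely mapped read pairs (millions)
    valid_pairs_m = 54.15    # valid interaction pairs (millions)
    unique_rate = 0.644      # reported unique-mapping rate (rounded)
    anchored_mb = 886.81     # chromosome-level assembly (Mb)
    draft_mb = 992.91        # Nanopore draft assembly (Mb)
    hic_data_gb = 55.96      # Hi-C raw data (Gb)
    genome_est_mb = 813.1    # k-mer genome size estimate (Mb)

    print(round(percent(valid_pairs_m, unique_pairs_m), 2))  # 24.83
    print(round(percent(anchored_mb, draft_mb), 2))          # 89.31
    print(round(hic_data_gb * 1000 / genome_est_mb, 1))      # 68.8
    # Back-calculated total read pairs (millions); approximate because 64.4% is rounded.
    print(round(unique_pairs_m / unique_rate, 1))
```

The back-calculation suggests roughly 339 million read pairs in total, so valid interaction pairs make up about 16% of all sequenced pairs.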