2.3. Data processing and full-length transcript functional annotation
The raw data (subreads) were filtered and corrected to generate circular consensus sequences (CCSs), from which adaptors, barcodes, poly(A) tails, and chimeric sequences were removed; the sequences were then polished with isoseq3 (Abdel-Ghany et al. 2016) to obtain isoform sequences. Low-quality isoforms were discarded using a minimum of 2 passes and a minimum predicted accuracy of 99%, and the remaining high-quality isoforms were clustered into unigenes with CD-HIT (Fu et al. 2012) at a sequence identity threshold of 98%. The completeness of the unigene set was assessed with BUSCO using the arthropoda_odb9 database (Simao et al. 2015). Unigenes were functionally annotated with DIAMOND (E-value < 1e-5) against six databases: non-redundant protein sequences (NR), euKaryotic Orthologous Groups (KOG), Gene Ontology (GO), Swiss-Prot, evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Bairoch et al. 2010; Kanehisa et al. 2000). Protein families were assigned with the HMMER 3.1 package (http://hmmer.org/download.html) against the Pfam database (Mistry et al. 2021).
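The following Python snippet shows how the numeric cut-offs in this section act as filters. It is a minimal sketch on toy records, not the authors' pipeline. The `Read` record type, its field names, and the helper functions are hypothetical. Only the thresholds come from the text: at least 2 passes, a predicted accuracy of at least 99%, and an E-value below 1e-5. The sketch assumes DIAMOND hits were written in the default 12-column tabular format (`--outfmt 6`). It keeps the highest-bitscore hit per query, which is a common convention for choosing a representative annotation, but the text does not say it was used here.

```python
import csv
import io
from dataclasses import dataclass

# Thresholds stated in the methods text
MIN_PASSES = 2          # minimum number of passes
MIN_ACCURACY = 0.99     # minimum predicted accuracy (99%)
MAX_EVALUE = 1e-5       # annotation hits must have E-value strictly below this

# Default column order of DIAMOND/BLAST tabular output (--outfmt 6)
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Read:
    """Hypothetical per-sequence quality record (field names are assumptions)."""
    read_id: str
    num_passes: int
    predicted_accuracy: float


def is_high_quality(read: Read) -> bool:
    """Keep a sequence only if it meets both the pass and accuracy thresholds."""
    return read.num_passes >= MIN_PASSES and read.predicted_accuracy >= MIN_ACCURACY


def best_hits(tabular_text: str) -> dict[str, tuple[str, float, float]]:
    """Return, per query, the highest-bitscore hit whose E-value is < MAX_EVALUE."""
    best: dict[str, tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue >= MAX_EVALUE:
            continue  # fails the E-value cut-off
        query = rec["qseqid"]
        if query not in best or bitscore > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bitscore)
    return best


if __name__ == "__main__":
    # Toy data for illustration only
    reads = [Read("r1", 5, 0.995), Read("r2", 1, 0.999), Read("r3", 3, 0.98)]
    print([r.read_id for r in reads if is_high_quality(r)])

    hits = (
        "u1\tsp|P1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200.0\n"
        "u1\tsp|P2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t150.0\n"
        "u2\tsp|P3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30.0\n"
    )
    print(best_hits(hits))
```

On the toy data, only `r1` passes both quality thresholds. Unigene `u2` receives no annotation because its only hit has an E-value of 0.01.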