2.3. Data processing and full-length transcript functional
annotation
The raw data (subreads) were filtered and corrected to obtain circular
consensus sequences, from which the adaptors, barcodes, poly(A) tails, and
chimeras were removed; these were then polished using isoseq3 software
(Abdel-Ghany et al. 2016) to obtain
isoform sequences. Low-quality isoform sequences were removed using the
thresholds min passes = 2 and min predicted accuracy = 99%, and the
remaining high-quality isoform sequences were clustered into unigene
sequences via CD-HIT (Fu et al. 2012) at a sequence identity threshold
of 98%. The completeness of the unigene sequences was assessed by BUSCO
with the arthropoda_odb9 database (Simao et al. 2015). The unigenes
were then functionally annotated using DIAMOND software (E-value
< 1e-5) against six databases: non-redundant
sequences (NR), eukaryotic ortholog groups (KOG), Gene Ontology (GO),
Swiss-Prot, Evolutionary Genealogy of Genes Non-supervised Orthologous
Groups (eggNOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
(Bairoch et al. 2010;
Kanehisa et al. 2000). The protein
families were assigned by the HMMER 3.1 package
(http://hmmer.org/download.html) with the Pfam database
(Mistry et al. 2021).
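
The E-value cutoff used for functional annotation can be illustrated with a minimal sketch. It assumes the alignments are available in the standard 12-column BLAST-style tabular format (as produced by DIAMOND with --outfmt 6) and that a single best hit per unigene, ranked by bit score, is retained; the best-hit rule, the function name best_hits, and the example identifiers are illustrative assumptions and are not described in the text above.

```python
import csv
import io

# Standard 12 columns of BLAST-style tabular output (DIAMOND --outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query_id: row} for hits with evalue < max_evalue.

    When a query has several passing hits, the one with the highest
    bit score is kept (an assumption made for this sketch).
    """
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        if float(row["evalue"]) >= max_evalue:
            continue  # fails the E-value < 1e-5 criterion
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Hypothetical alignment rows, for illustration only.
EXAMPLE = (
    "unigene_1\thypothetical_B\t70.0\t150\t45\t0\t1\t150\t1\t150\t1e-20\t120.0\n"
    "unigene_1\thypothetical_A\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350.0\n"
    "unigene_2\thypothetical_C\t40.0\t60\t36\t0\t1\t60\t1\t60\t1e-3\t35.0\n"
    "unigene_3\thypothetical_D\t55.0\t80\t36\t0\t1\t80\t1\t80\t1e-5\t50.0\n"
)

hits = best_hits(io.StringIO(EXAMPLE))
for query, row in hits.items():
    print(query, row["sseqid"], row["evalue"], row["bitscore"])
```

In this example only unigene_1 is retained: unigene_2 exceeds the cutoff, and unigene_3 has an E-value equal to 1e-5 and is excluded because the criterion is a strict inequality.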