Here we provide the gene annotations for stickleback (Gasterosteus aculeatus). We provide these both for convenience and because some of the functional annotations of genes/proteins are removed when we prepare the annotations for upload to ENA. We also provide the FASTA files for the assemblies we have made.

We annotated the genome assemblies using a pre-release version of the EBP-Nor genome annotation pipeline (https://github.com/ebp-nor/GenomeAnnotation). First, the AGAT (https://zenodo.org/record/7255559) scripts agat_sp_keep_longest_isoform.pl and agat_sp_extract_sequences.pl were used on the zebrafish GRCz11 genome assembly and annotation to generate one protein (the longest isoform) per gene. Miniprot (Li, 2023) was used to align these proteins to the curated assemblies. UniProtKB/Swiss-Prot release 2025_03 (The UniProt Consortium, 2023) and the Actinopterygii part of OrthoDB v11 (Kuznetsov et al., 2022) were also aligned separately to the assemblies. Red (Girgis, 2015) was run via redmask (https://github.com/nextgenusfs/redmask) on the assemblies to mask repetitive regions. GALBA (Brůna et al., 2023; Buchfink et al., 2015; Hoff and Stanke, 2018; Li, 2023; Stanke et al., 2006) was run in miniprot mode with the GRCz11 proteins on the masked assemblies. The funannotate-runEVM.py script from Funannotate was used to run EVidenceModeler (Haas et al., 2008) on the alignments of the GRCz11 proteins, the UniProtKB/Swiss-Prot proteins and the Actinopterygii proteins, together with the genes predicted by GALBA. The resulting predicted proteins were compared to the protein repeats that Funannotate distributes using DIAMOND blastp, and the predicted genes were filtered based on this comparison using AGAT. The filtered proteins were compared to UniProtKB/Swiss-Prot release 2025_03 using DIAMOND (Buchfink et al., 2015) blastp to find gene names, and InterProScan was used to discover functional domains.
AGAT's agat_sp_manage_functional_annotation.pl was used to attach the gene names and functional annotations to the predicted genes.

Files provided here and their descriptions:
fGasAcu404.1.hap1.fa.gz - genome assembly of stickleback (hap1)
fGasAcu404.1.hap2.fa.gz - genome assembly of stickleback (hap2)
fGasAcu404.1.hap1.gff.gz - genome annotation of stickleback (hap1)
fGasAcu404.1.hap2.gff.gz - genome annotation of stickleback (hap2)
fGasAcu404.1.hap1.proteins.fa.gz - predicted proteins of stickleback (hap1)
fGasAcu404.1.hap2.proteins.fa.gz - predicted proteins of stickleback (hap2)
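
As a quick check after downloading, the gzip-compressed files can be read without unpacking them first. The following is a minimal Python sketch, not part of the annotation pipeline. It assumes the .gff.gz files follow the standard nine-column, tab-separated GFF3 layout and that the .fa.gz files are ordinary FASTA. The function names count_feature_types and fasta_lengths are hypothetical helpers introduced here for illustration.

```python
import gzip
from collections import Counter


def count_feature_types(gff_path):
    """Count feature types (column 3) in a gzip-compressed GFF3 file."""
    counts = Counter()
    with gzip.open(gff_path, "rt") as handle:
        for line in handle:
            # Skip comments, directives and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            # Assumption: a feature line has nine tab-separated columns.
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts


def fasta_lengths(fasta_path):
    """Return {sequence ID: length} for a gzip-compressed FASTA file."""
    lengths = {}
    current_id = None
    with gzip.open(fasta_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The ID is the header text up to the first whitespace.
                parts = line[1:].split()
                current_id = parts[0] if parts else ""
                lengths[current_id] = 0
            elif line and current_id is not None:
                lengths[current_id] += len(line)
    return lengths
```

For example, count_feature_types("fGasAcu404.1.hap1.gff.gz") returns a Counter keyed by feature type, and len(fasta_lengths("fGasAcu404.1.hap1.proteins.fa.gz")) gives the number of predicted proteins. Comparing that number with the count of protein-coding transcript features in the matching GFF file is one simple way to confirm that a download is complete.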