Original article: The in vivo RNA structurome of the malaria parasite Plasmodium falciparum, a protozoan with an A/U-rich transcriptome. Dumetz F*, Enright AJ*, Zhao J, Kwok CK, Merrick CJ. PLoS ONE, 2022, 17(9): e0270863, doi: 10.1371/journal.pone.0270863

Original reads are available at ENA project PRJEB44384.

Description of the data set

The files named combined_overlap contain the dot-bracket string obtained by averaging both replicates.

How to read .ct.struct.summary files

This file format stores a compact summary of a predicted nucleic acid secondary structure for a single sequence. It combines global folding metrics with base-by-base structural annotations, which makes it useful for downstream analyses, visualization, and motif/structure interpretation. This README is written for files generated by StructureFold.

Overview

Each record is represented by five lines:

1. Header line with sequence ID and global structure statistics
2. Sequence line containing the full nucleotide sequence
3. Dot-bracket line representing the predicted secondary structure
4. Structural context line with a per-nucleotide annotation
5. Numeric annotation line with additional position-specific structural labels

All annotation lines are position-matched to the sequence, so character n in each line corresponds to nucleotide n in the sequence.

Example

    mal_mito_1:mRNA LEN:786 DELTAG:-287.4 AMFE:-36.5648854961832 MFEDEN:-6.37974554707379 MAXSTRUCTS:9 FREE:5 STEM:516 HAIRPIN:82 MULTI:91 BULGE:92
    ATGTTTACGGCACATT...
    .(((((.((((.............(((((((((...
    fsssssissssmmmmmmmmmmmmmssssssss...
    000000011110000000000000222222222...

File structure

1. Header line

The first line contains the sequence identifier followed by summary statistics describing the predicted structure.

Example:

    mal_mito_1:mRNA LEN:786 DELTAG:-287.4 AMFE:-36.5648854961832 MFEDEN:-6.37974554707379 MAXSTRUCTS:9 FREE:5 STEM:516 HAIRPIN:82 MULTI:91 BULGE:92

Fields

>mal_mito_1:mRNA
    Sequence identifier.
LEN:786
    Total sequence length in nucleotides.
DELTAG:-287.4
    Predicted minimum free energy (ΔG) of the structure. More negative values generally indicate a more thermodynamically stable predicted fold.
AMFE:-36.5648854961832
    Adjusted minimum free energy, normalized to sequence length so that sequences of different sizes can be compared. In this example the value equals 100 × DELTAG / LEN.
MFEDEN:-6.37974554707379
    Minimum free energy density, another normalized measure of structural stability.
MAXSTRUCTS:9
    Maximum number of structures considered or summarized for this sequence.
FREE:5
    Number of nucleotides classified as free or unpaired in the external region.
STEM:516
    Number of nucleotides located in paired stem regions.
HAIRPIN:82
    Number of nucleotides located in hairpin loops.
MULTI:91
    Number of nucleotides located in multibranch loops.
BULGE:92
    Number of nucleotides located in bulges or internal loop-like regions.

2. Sequence line

The second line contains the full nucleotide sequence used for the structure prediction.

Example:

    ATGTTTACGGCACATT...

3. Dot-bracket structure line

The third line uses standard dot-bracket notation to describe the predicted secondary structure.

Example:

    .(((((.((((.............(((((((((...

Symbols

    ( = nucleotide paired with a downstream nucleotide
    ) = nucleotide paired with an upstream nucleotide
    . = unpaired nucleotide

This line has the same length as the sequence line.

4. Structural context annotation line

The fourth line provides a per-base annotation of structural context.

Example:

    fsssssissssmmmmmmmmmmmmmssssssss...

Each character corresponds to one nucleotide in the sequence and assigns it to a structural category. Common interpretations include:

    s = stem
    h = hairpin loop
    m = multiloop
    b or i = bulge or internal loop
    f = free/external unpaired region
    t = terminal region

Note: the exact letter definitions may depend on the software or pipeline that generated the file.

5. Numeric annotation line

The fifth line contains a per-nucleotide structural element label.
Example:

    000000011110000000000000222222222...

For StructureFold-generated .ct.struct.summary files, this line appears to identify which specific structural element each nucleotide belongs to, rather than giving a quantitative score. Read together with the structural context line, the digit seems to number elements of the same type: in the example, the three successive stems carry the labels 0, 1 and 2.

Coordinate consistency

All lines after the header are positionally aligned: nucleotide 1 in the sequence corresponds to character 1 in the dot-bracket structure, character 1 in the structural context line, and character 1 in the numeric annotation line. This makes it straightforward to map sequence motifs, variants, or experimentally probed positions onto predicted structural features.
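
The five-line layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of StructureFold: the names StructRecord, parse_header, parse_records and element_runs are hypothetical, and the code assumes that header fields are whitespace-separated KEY:VALUE tokens with numeric values, that the sequence identifier contains no spaces, and that blank lines between records can be ignored. A leading ">" on the header is tolerated, because the field list shows one while the example header does not.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class StructRecord:
    """One five-line record (hypothetical container, not a StructureFold type)."""
    seq_id: str
    stats: Dict[str, float]
    sequence: str
    dot_bracket: str
    context: str
    labels: str


def parse_header(line: str) -> Tuple[str, Dict[str, float]]:
    """Split a header into the sequence ID and a dict of KEY -> numeric value."""
    tokens = line.strip().lstrip(">").split()
    seq_id, stats = tokens[0], {}
    for token in tokens[1:]:
        # Assumption: every statistic is written as KEY:VALUE with a numeric VALUE.
        key, _, value = token.rpartition(":")
        stats[key] = float(value)
    return seq_id, stats


def parse_records(lines: Iterable[str]) -> Iterator[StructRecord]:
    """Yield records from an iterable of lines, five non-blank lines per record."""
    rows = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    if len(rows) % 5:
        raise ValueError("expected a multiple of five non-blank lines")
    for i in range(0, len(rows), 5):
        header, seq, db, ctx, lab = rows[i:i + 5]
        seq_id, stats = parse_header(header)
        # All four annotation lines must be position-matched to the sequence.
        if not (len(seq) == len(db) == len(ctx) == len(lab)):
            raise ValueError(f"{seq_id}: lines differ in length")
        if "LEN" in stats and int(stats["LEN"]) != len(seq):
            raise ValueError(f"{seq_id}: LEN does not match the sequence length")
        yield StructRecord(seq_id, stats, seq, db, ctx, lab)


def element_runs(rec: StructRecord) -> List[Tuple[str, int, int]]:
    """Group consecutive positions sharing the same context letter and label.

    Returns (name, start, end) tuples with 1-based inclusive coordinates,
    where name is the context letter followed by the numeric label, e.g. "s1".
    """
    runs, pos = [], 1
    for (kind, label), group in groupby(zip(rec.context, rec.labels)):
        n = sum(1 for _ in group)
        runs.append((kind + label, pos, pos + n - 1))
        pos += n
    return runs
```

Passing an open file handle to parse_records yields one record at a time, and element_runs then gives the coordinates needed to map a motif or probed position onto a stem or loop. In the example header the FREE, STEM, HAIRPIN, MULTI and BULGE counts sum to LEN (5 + 516 + 82 + 91 + 92 = 786), which could serve as a further sanity check.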