nf-core/rnaseq: nf-core/rnaseq v2.0 - Titanium Tiger

2020 · 0 citations · Journal Article · Green Open Access

Authors

Harshil Patel · The Francis Crick Institute

Phil Ewels · Science for Life Laboratory

Alexander Peltzer · Boehringer Ingelheim (Egypt)

Rickard Hammarén

Olga Botvinnik · Všeobecná Uverová Banka

Denis Moreno

Pranathi Vemuri · Chan Zuckerberg Biohub San Francisco

Gregor Sturm

Abstract

[2.0] - 2020-11-12

Major enhancements

Pipeline has been re-implemented in Nextflow DSL2
All software containers are now exclusively obtained from Biocontainers
Added a separate workflow to download FastQ files via SRA, ENA or GEO ids and to auto-create the input samplesheet (<code>ENA FTP</code>; see <code>--public_data_ids</code> parameter)
Added and refined a Groovy <code>lib/</code> of functions that include the automatic rendering of parameters defined in the JSON schema for the help and summary log information
Replace edgeR with DESeq2 for the generation of PCA and heatmaps (also included in the MultiQC report)
Creation of bigWig coverage files using BEDTools and bedGraphToBigWig
[#70] - Added new genome mapping and quantification route with RSEM via the <code>--aligner star_rsem</code> parameter
[#72] - Samples skipped due to low alignment reported in the MultiQC report
[#73, #435] - UMI barcode support
[#91] - Ability to concatenate multiple runs of the same samples via the input samplesheet
[#123] - The primary input for the pipeline has changed from <code>--reads</code> glob to samplesheet <code>--input</code>. See usage docs.
[#197] - Samples failing strand-specificity checks reported in the MultiQC report
[#227] - Removal of ribosomal RNA via SortMeRNA
[#419] - Add <code>--additional_fasta</code> parameter to provide ERCC spike-ins, transgenes such as GFP or CAR-T as additional sequences to align to

Other enhancements & fixes

Updated pipeline template to nf-core/tools <code>1.11</code>
Optimise MultiQC configuration for faster run-time on huge sample numbers
Add information about SILVA licensing when removing rRNA to <code>usage.md</code>
Fixed ANSI colours for pipeline summary, added summary logs of alignment results
[#281] - Add nag to cite the pipeline in summary
[#302] - Fixed MDS plot axis labels
[#338] - Add option for turning on/off STAR command line option (--sjdbGTFfile)
[#344] - Added multi-core TrimGalore support
[#351] - Fixes missing Qualimap parameter <code>-p</code>
[#353] - Fixes an issue where MultiQC fails to run with <code>--skip_biotype_qc</code> option
[#357] - Fixes broken links
[#362] - Fix error with gzipped annotation file
[#384] - Changed SortMeRNA reference dbs path to use stable URLs (v4.2.0)
[#396] - Deterministic mapping for STAR aligner
[#412] - Fix Qualimap not being passed the correct strand-specificity parameter
[#413] - Fix STAR unmapped reads not being output
[#434] - Fix typo reported for work-dir
[#437] - FastQC uses correct number of threads now
[#440] - Fixed issue where featureCounts process fails when setting <code>--fc_count_type</code> to gene
[#452] - Fix <code>--gff</code> input bug
[#345] - Fixes label name in FastQC process
[#391] - Make publishDir mode configurable
[#431] - Update AWS GitHub actions workflow with organization level secrets
[#435] - Fix a bug where gzipped references were not extracted when <code>--additional_fasta</code> was not specified
[#435] - Fix a bug where merging of RSEM output would fail if only one FastQ file was provided as input
[#435] - Correct RSEM output name (was saving counts but calling them TPMs; now saving both, properly labelled)
[#436] - Fix a bug where the RSEM reference could not be built
[#458] - Fix <code>TMP_DIR</code> for process MarkDuplicates and Qualimap

Parameters

Updated

Old parameter | New parameter
<code>--reads</code> | <code>--input</code>
<code>--igenomesIgnore</code> | <code>--igenomes_ignore</code>
<code>--removeRiboRNA</code> | <code>--remove_ribo_rna</code>
<code>--rRNA_database_manifest</code> | <code>--ribo_database_manifest</code>
<code>--save_nonrRNA_reads</code> | <code>--save_non_ribo_reads</code>
<code>--saveAlignedIntermediates</code> | <code>--save_align_intermeds</code>
<code>--saveReference</code> | <code>--save_reference</code>
<code>--saveTrimmed</code> | <code>--save_trimmed</code>
<code>--saveUnaligned</code> | <code>--save_unaligned</code>
<code>--skipAlignment</code> | <code>--skip_alignment</code>
<code>--skipBiotypeQC</code> | <code>--skip_biotype_qc</code>
<code>--skipDupRadar</code> | <code>--skip_dupradar</code>
<code>--skipFastQC</code> | <code>--skip_fastqc</code>
<code>--skipMultiQC</code> | <code>--skip_multiqc</code>
<code>--skipPreseq</code> | <code>--skip_preseq</code>
<code>--skipQC</code> | <code>--skip_qc</code>
<code>--skipQualimap</code> | <code>--skip_qualimap</code>
<code>--skipRseQC</code> | <code>--skip_rseqc</code>
<code>--skipTrimming</code> | <code>--skip_trimming</code>
<code>--stringTieIgnoreGTF</code> | <code>--stringtie_ignore_gtf</code>

Added

<code>--additional_fasta</code> - FASTA file to concatenate to genome FASTA file e.g. containing spike-in sequences
<code>--deseq2_vst</code> - Use vst transformation instead of rlog with DESeq2
<code>--enable_conda</code> - Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter
<code>--min_mapped_reads</code> - Minimum percentage of uniquely mapped reads below which samples are removed from further processing
<code>--multiqc_title</code> - MultiQC report title. Printed as page header, used for filename if not otherwise specified
<code>--public_data_ids</code> - File containing SRA/ENA/GEO identifiers one per line in order to download their associated FastQ files
<code>--publish_dir_mode</code> - Method used to save pipeline results to output directory
<code>--rsem_index</code> - Path to directory or tar.gz archive for pre-built RSEM index
<code>--rseqc_modules</code> - Specify the RSeQC modules to run
<code>--save_merged_fastq</code> - Save FastQ files after merging re-sequenced libraries in the results directory
<code>--save_umi_intermeds</code> - If this option is specified, intermediate FastQ and BAM files produced by UMI-tools are also saved in the results directory
<code>--skip_bigwig</code> - Skip bigWig file creation
<code>--skip_deseq2_qc</code> - Skip DESeq2 PCA and heatmap plotting
<code>--skip_featurecounts</code> - Skip featureCounts
<code>--skip_markduplicates</code> - Skip Picard MarkDuplicates step
<code>--skip_sra_fastq_download</code> - Only download metadata for public data database ids and don't download the FastQ files
<code>--skip_stringtie</code> - Skip StringTie
<code>--star_ignore_sjdbgtf</code> - See #338
<code>--umitools_bc_pattern</code> - The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the UMI
<code>--umitools_extract_method</code> - UMI pattern to use. Can be either 'string' (default) or 'regex'
<code>--with_umi</code> - Enable UMI-based read deduplication

Removed

<code>--awsqueue</code> can now be provided via nf-core/configs if using AWS
<code>--awsregion</code> can now be provided via nf-core/configs if using AWS
<code>--compressedReference</code> now auto-detected
<code>--markdup_java_options</code> in favour of updating centrally on nf-core/modules
<code>--project</code> parameter from old NGI template
<code>--readPaths</code> is not required since these are provided from the input samplesheet
<code>--sampleLevel</code> not required
<code>--singleEnd</code> is now auto-detected from the input samplesheet
<code>--skipEdgeR</code> QC is now performed by DESeq2 instead
<code>--star_memory</code> in favour of updating centrally on nf-core/modules if required
Strandedness is now specified at the sample level via the input samplesheet, which replaces <code>--forwardStranded</code>, <code>--reverseStranded</code>, <code>--unStranded</code> and <code>--pico</code>

Software dependencies

Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.

Dependency | Old version | New version
<code>bioconductor-dupradar</code> | 1.14.0 | 1.18.0
<code>bioconductor-summarizedexperiment</code> | 1.14.0 | 1.18.1
<code>bioconductor-tximeta</code> | 1.2.2 | 1.6.3
<code>fastqc</code> | 0.11.8 | 0.11.9
<code>gffread</code> | 0.11.4 | 0.12.1
<code>hisat2</code> | 2.1.0 | 2.2.0
<code>multiqc</code> | 1.7 | 1.9
<code>picard</code> | 2.21.1 | 2.23.8
<code>qualimap</code> | 2.2.2c | 2.2.2d
<code>r-base</code> | 3.6.1 | 4.0.3
<code>salmon</code> | 0.14.2 | 1.3.0
<code>samtools</code> | 1.9 | 1.10
<code>sortmerna</code> | 2.1b | 4.2.0
<code>stringtie</code> | 2.0 | 2.1.4
<code>subread</code> | 1.6.4 | 2.0.1
<code>trim-galore</code> | 0.6.4 | 0.6.6
<code>bedtools</code> | - | 2.29.2
<code>bioconductor-biocparallel</code> | - | 1.22.0
<code>bioconductor-complexheatmap</code> | - | 2.4.2
<code>bioconductor-deseq2</code> | - | 1.28.0
<code>bioconductor-tximport</code> | - | 1.16.0
<code>perl</code> | - | 5.26.2
<code>python</code> | - | 3.8.3
<code>r-ggplot2</code> | - | 3.3.2
<code>r-optparse</code> | - | 1.6.6
<code>r-pheatmap</code> | - | 1.0.12
<code>r-rcolorbrewer</code> | - | 1.1_2
<code>rsem</code> | - | 1.3.3
<code>ucsc-bedgraphtobigwig</code> | - | 377
<code>umi_tools</code> | - | 1.0.1
<code>bioconductor-edger</code> | - | -
<code>deeptools</code> | - | -
<code>matplotlib</code> | - | -
<code>r-data.table</code> | - | -
<code>r-gplots</code> | - | -
<code>r-markdown</code> | - | -

NB: Dependency has been updated if both old and new version information is present.
NB: Dependency has been added if just the new version information is present.
NB: Dependency has been removed if version information isn't present.
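
The renames listed under "Parameters" can be applied mechanically to an old command line. The following is a minimal Python sketch, not part of the pipeline: the RENAMED and REMOVED tables are copied from the release notes, while migrate_args is a hypothetical helper name. It assumes flags are passed as separate tokens. Note that <code>--reads</code> to <code>--input</code> is more than a rename, since the value changes from a glob to a samplesheet and has to be replaced by hand.

```python
# Old -> new parameter names, copied from the "Updated" list in the release notes.
RENAMED = {
    "--reads": "--input",  # value also changes: glob -> samplesheet
    "--igenomesIgnore": "--igenomes_ignore",
    "--removeRiboRNA": "--remove_ribo_rna",
    "--rRNA_database_manifest": "--ribo_database_manifest",
    "--save_nonrRNA_reads": "--save_non_ribo_reads",
    "--saveAlignedIntermediates": "--save_align_intermeds",
    "--saveReference": "--save_reference",
    "--saveTrimmed": "--save_trimmed",
    "--saveUnaligned": "--save_unaligned",
    "--skipAlignment": "--skip_alignment",
    "--skipBiotypeQC": "--skip_biotype_qc",
    "--skipDupRadar": "--skip_dupradar",
    "--skipFastQC": "--skip_fastqc",
    "--skipMultiQC": "--skip_multiqc",
    "--skipPreseq": "--skip_preseq",
    "--skipQC": "--skip_qc",
    "--skipQualimap": "--skip_qualimap",
    "--skipRseQC": "--skip_rseqc",
    "--skipTrimming": "--skip_trimming",
    "--stringTieIgnoreGTF": "--stringtie_ignore_gtf",
}

# Parameters listed under "Removed" in the release notes.
REMOVED = {
    "--awsqueue", "--awsregion", "--compressedReference",
    "--markdup_java_options", "--project", "--readPaths", "--sampleLevel",
    "--singleEnd", "--skipEdgeR", "--star_memory", "--forwardStranded",
    "--reverseStranded", "--unStranded", "--pico",
}


def migrate_args(args):
    """Return (new_args, warnings) for a list of pre-2.0 command-line tokens.

    Hypothetical helper: renamed flags are rewritten; removed flags are left
    in place and reported, because deleting them safely may also require
    deleting a value that follows them.
    """
    new_args, warnings = [], []
    for token in args:
        if token in RENAMED:
            new_args.append(RENAMED[token])
        else:
            if token in REMOVED:
                warnings.append(f"{token} was removed in v2.0")
            new_args.append(token)
    return new_args, warnings


if __name__ == "__main__":
    new, notes = migrate_args(["--skipTrimming", "--saveReference", "--singleEnd"])
    print(" ".join(new))
    for note in notes:
        print("warning:", note)
```

Removed flags are only reported because several of them (for example the strandedness flags) now correspond to samplesheet columns, so they cannot be migrated by editing the command line alone.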
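
The three NB rules for reading the dependency table amount to a small decision procedure. As a sketch in Python, assuming a missing version is written as "-" as in the table; classify_dependency is an illustrative name, and the sample rows are taken from the table itself.

```python
def classify_dependency(old_version, new_version, missing="-"):
    """Classify one table row according to the NB rules in the release notes."""
    has_old = old_version != missing
    has_new = new_version != missing
    if has_old and has_new:
        return "updated"
    if has_new:
        return "added"
    if not has_old:
        return "removed"
    # An old version without a new one is not covered by the NB rules.
    return "unspecified"


if __name__ == "__main__":
    rows = [
        ("salmon", "0.14.2", "1.3.0"),
        ("rsem", "-", "1.3.3"),
        ("bioconductor-edger", "-", "-"),
    ]
    for name, old, new in rows:
        print(name, classify_dependency(old, new))
```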
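
The description of <code>--umitools_bc_pattern</code> says that 'NNNNNN' marks the first 6 nucleotides of a read as the UMI. The Python sketch below illustrates only that statement. It is not how UMI-tools is implemented, split_umi is a hypothetical function name, and it deliberately rejects any pattern that is not made entirely of 'N' characters. Quality scores and the 'regex' extraction method are out of scope.

```python
def split_umi(read_sequence, bc_pattern="NNNNNN"):
    """Split a read into (umi, remainder) for an all-'N' barcode pattern.

    Illustration only: each 'N' in the pattern claims one leading base of
    the read for the UMI.
    """
    if set(bc_pattern) != {"N"}:
        raise ValueError("this sketch only handles patterns made entirely of 'N'")
    umi_length = len(bc_pattern)
    if len(read_sequence) < umi_length:
        raise ValueError("read is shorter than the barcode pattern")
    return read_sequence[:umi_length], read_sequence[umi_length:]


if __name__ == "__main__":
    umi, rest = split_umi("ACGTACGGATTACA")
    print(umi, rest)
```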

Topics & Keywords

Single-cell and spatial transcriptomics · Genomics and Phylogenetic Studies · CRISPR and Genetic Engineering

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.4270402
