Examples

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie

This protocol is based on the following Nature Protocols paper:

Software Parameter
hisat2 -p {ThreadN} --dta -x {HISAT2_HG38} -1 {InputFile:1} -2 {InputFile:2} -S {EXP}.sam
samtools sort -@ {ThreadN} -o {EXP}.sorted.bam {EXP}.sam
stringtie -p {ThreadN} -G {GENCODE_HG38} -o {EXP}.gtf -l {EXP} {EXP}.sorted.bam

Reference settings

Reference Name    Reference Value
HISAT2_HG38       Path to the hg38 genome index for hisat2 (the index basename that -x expects), generated by hisat2-build
GENCODE_HG38      Path to a gene annotation file, e.g. /mnt/biodata/gencode_hg38_v23.gtf
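
The HISAT2 index referenced above has to exist before the protocol runs. A minimal sketch of building it, assuming Bash, four threads, and hypothetical paths for the genome FASTA and the index folder (none of these values come from the protocol itself). The script only runs hisat2-build when the tool and the FASTA file are present; otherwise it prints the command.

```shell
#!/usr/bin/env bash
set -euo pipefail

GENOME_FASTA=/mnt/biodata/hg38.fa      # hypothetical location of the hg38 FASTA
INDEX_DIR=/mnt/biodata/hisat2_hg38     # hypothetical folder for the index files
INDEX_BASE=$INDEX_DIR/genome           # basename later passed to hisat2 -x

# hisat2-build takes the reference FASTA followed by the index basename.
build_cmd=(hisat2-build -p 4 "$GENOME_FASTA" "$INDEX_BASE")

if command -v hisat2-build >/dev/null 2>&1 && [ -f "$GENOME_FASTA" ]; then
  mkdir -p "$INDEX_DIR"
  "${build_cmd[@]}"
else
  # Tool or genome not available here: show what would be run.
  echo "dry run: ${build_cmd[*]}"
fi
```

With these assumed paths, HISAT2_HG38 would be set to /mnt/biodata/hisat2_hg38/genome.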

Job Initial

Input files:

{Uploaded:ERR033015_1.fastq};{Uploaded:ERR033015_2.fastq};
Or /mnt/biodata/samples/ERR033015_1.fastq;/mnt/biodata/samples/ERR033015_2.fastq

Job parameter:

EXP=ERR033015;
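
To see what the three steps look like once the placeholders are filled in, the following dry run expands them by hand. It is a sketch under assumptions: Bash, four threads, and a hypothetical index path; it only prints the commands and is not necessarily how the placeholders are substituted internally.

```shell
#!/usr/bin/env bash
set -euo pipefail

ThreadN=4                                        # assumed thread count
HISAT2_HG38=/mnt/biodata/hisat2_hg38/genome      # hypothetical index basename
GENCODE_HG38=/mnt/biodata/gencode_hg38_v23.gtf   # example path from the reference table
INPUT_FILES='/mnt/biodata/samples/ERR033015_1.fastq;/mnt/biodata/samples/ERR033015_2.fastq'
JOB_PARAMETER='EXP=ERR033015;'

# Split the ';'-separated input list into {InputFile:1} and {InputFile:2}.
IFS=';' read -r INPUT_1 INPUT_2 _ <<< "$INPUT_FILES"

# Extract the value of EXP from the 'EXP=value;' job parameter string.
EXP=${JOB_PARAMETER#EXP=}
EXP=${EXP%;}

commands=(
  "hisat2 -p $ThreadN --dta -x $HISAT2_HG38 -1 $INPUT_1 -2 $INPUT_2 -S ${EXP}.sam"
  "samtools sort -@ $ThreadN -o ${EXP}.sorted.bam ${EXP}.sam"
  "stringtie -p $ThreadN -G $GENCODE_HG38 -o ${EXP}.gtf -l $EXP ${EXP}.sorted.bam"
)

# Print one expanded command per line.
printf '%s\n' "${commands[@]}"
```

Each step reads the file written by the previous one: hisat2 writes ERR033015.sam, samtools sorts it into ERR033015.sorted.bam, and stringtie assembles transcripts from the sorted BAM.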

Calling variants in RNA-seq (STAR-GATK)

This protocol is based on the GATK Best Practices provided by the Broad Institute:

Software Parameter
STAR --genomeDir {STAR_HG38} --readFilesIn {InputFile:1} {InputFile:2} --runThreadN {ThreadN}
STAR --runMode genomeGenerate --genomeDir 2pass --genomeFastaFiles {HG38} --sjdbFileChrStartEnd SJ.out.tab --sjdbOverhang 75 --runThreadN {ThreadN}
STAR --genomeDir 2pass --readFilesIn {InputFile:1} {InputFile:2} --runThreadN {ThreadN}
java -jar {picard} AddOrReplaceReadGroups I=Aligned.out.sam O=rg_added_sorted.bam SO=coordinate RGID={RGID} RGLB={RGLB} RGPL={RGPL} RGPU={RGPU} RGSM={RGSM}
java -jar {picard} MarkDuplicates I=rg_added_sorted.bam O=dedupped.bam CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M=output.metrics
java -jar {gatk} -T SplitNCigarReads -R {HG38} -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
java -jar {gatk} -T HaplotypeCaller -R {HG38} -I split.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -stand_emit_conf 20.0 -o output.vcf
java -jar {gatk} -T VariantFiltration -R {HG38} -V output.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o output.hard.filtered.vcf

Reference settings

Reference Name    Reference Value
STAR_HG38         Folder that stores the hg38 genome index for STAR, generated with STAR --runMode genomeGenerate
HG38              Path to a reference genome file, e.g. /mnt/biodata/hg38.fa
picard            Path to picard.jar, e.g. /mnt/biosoftware/picard.jar
gatk              Path to GenomeAnalysisTK.jar, e.g. /mnt/biosoftware/GenomeAnalysisTK.jar
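
The first-pass STAR index has to be generated once before the protocol can run. A minimal sketch, assuming Bash, four threads, and a hypothetical index folder; the genome path is the example from the table above. STAR is only invoked when it is installed and the FASTA file exists, otherwise the command is printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

HG38=/mnt/biodata/hg38.fa            # example genome path from the reference table
STAR_HG38=/mnt/biodata/star_hg38     # hypothetical folder for the STAR index
ThreadN=4                            # assumed thread count

index_cmd=(STAR --runMode genomeGenerate
           --genomeDir "$STAR_HG38"
           --genomeFastaFiles "$HG38"
           --runThreadN "$ThreadN")

if command -v STAR >/dev/null 2>&1 && [ -f "$HG38" ]; then
  # Create the folder up front; some STAR versions expect it to exist.
  mkdir -p "$STAR_HG38"
  "${index_cmd[@]}"
else
  # Tool or genome not available here: show what would be run.
  echo "dry run: ${index_cmd[*]}"
fi
```

The same consideration may apply to the 2pass folder used by the second step of the protocol: depending on the STAR version, it might need to be created before the step runs.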

Job Initial

Input files:

{Uploaded:ERR033015_1.fastq};{Uploaded:ERR033015_2.fastq};
Or /mnt/biodata/samples/ERR033015_1.fastq;/mnt/biodata/samples/ERR033015_2.fastq

Job parameter:

RGID=4;RGLB=lib1;RGPL=illumina;RGPU=unit1;RGSM=20;
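
The job parameter string is a list of KEY=VALUE pairs separated by semicolons, and each key fills the placeholder of the same name. The sketch below illustrates that mapping for the AddOrReplaceReadGroups step. The expand function is a hypothetical helper written for this example, not part of any tool, and the picard path is the example value from the reference table.

```shell
#!/usr/bin/env bash
set -euo pipefail

JOB_PARAMETER='RGID=4;RGLB=lib1;RGPL=illumina;RGPU=unit1;RGSM=20;'
template='java -jar {picard} AddOrReplaceReadGroups I=Aligned.out.sam O=rg_added_sorted.bam SO=coordinate RGID={RGID} RGLB={RGLB} RGPL={RGPL} RGPU={RGPU} RGSM={RGSM}'

# expand TEXT PARAMS: replace every {KEY} in TEXT with the VALUE
# given by the matching 'KEY=VALUE;' pair in PARAMS.
expand() {
  local text=$1 params=$2 pair key value placeholder
  local -a pairs
  IFS=';' read -r -a pairs <<< "$params"
  for pair in "${pairs[@]}"; do
    [ -n "$pair" ] || continue          # skip empty fields
    key=${pair%%=*}
    value=${pair#*=}
    placeholder="{$key}"
    text=${text//"$placeholder"/$value} # quoted pattern is matched literally
  done
  printf '%s\n' "$text"
}

# Reference settings and job parameters are merged into one list here.
expanded=$(expand "$template" "picard=/mnt/biosoftware/picard.jar;$JOB_PARAMETER")
printf '%s\n' "$expanded"
```

Placeholders without a matching key are left untouched, which makes a missing parameter easy to spot in the printed command.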