Examples
Transcript-level expression analysis of RNA-seq experiments with HISAT and StringTie
This protocol is based on the following Nature Protocols paper:
Software | Parameter |
---|---|
hisat2 | -p {ThreadN} --dta -x {HISAT2_HG38} -1 {InputFile:1} -2 {InputFile:2} -S {EXP}.sam |
samtools | sort -@ {ThreadN} -o {EXP}.sorted.bam {EXP}.sam |
stringtie | -p {ThreadN} -G {GENCODE_HG38} -o {EXP}.gtf -l {EXP} {EXP}.sorted.bam |
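Each row in the protocol table above is a parameter template. Placeholders such as {ThreadN}, {EXP} or {HISAT2_HG38} are filled in from reference settings and job parameters, and {InputFile:N} refers to the N-th input file. The following Python sketch shows one way such templates could be expanded into concrete command lines. It assumes {InputFile:N} is 1-based and that every other placeholder is a simple name lookup. The functions `render` and `render_protocol` and the example values (thread count, index path) are hypothetical and are not the tool's actual implementation.

```python
import re

# Steps copied from the protocol table: (software, parameter template).
PROTOCOL = [
    ("hisat2", "-p {ThreadN} --dta -x {HISAT2_HG38} -1 {InputFile:1} -2 {InputFile:2} -S {EXP}.sam"),
    ("samtools", "sort -@ {ThreadN} -o {EXP}.sorted.bam {EXP}.sam"),
    ("stringtie", "-p {ThreadN} -G {GENCODE_HG38} -o {EXP}.gtf -l {EXP} {EXP}.sorted.bam"),
]

# Matches {Name} or {Name:index}, e.g. {ThreadN} or {InputFile:1}.
PLACEHOLDER = re.compile(r"\{(\w+)(?::(\d+))?\}")


def render(template, values, input_files):
    """Replace the placeholders in one parameter template.

    Assumption: {InputFile:N} is the N-th (1-based) input file; every other
    {Name} is looked up in `values`. Unknown names raise KeyError.
    """
    def substitute(match):
        name, index = match.group(1), match.group(2)
        if name == "InputFile" and index is not None:
            return input_files[int(index) - 1]
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


def render_protocol(protocol, values, input_files):
    """Expand every step into a full command line: '<software> <parameters>'."""
    return [f"{software} {render(params, values, input_files)}" for software, params in protocol]


if __name__ == "__main__":
    values = {
        "ThreadN": 8,                                      # hypothetical thread count
        "HISAT2_HG38": "/mnt/biodata/hisat2_hg38/genome",  # hypothetical index basename
        "GENCODE_HG38": "/mnt/biodata/gencode_hg38_v23.gtf",
        "EXP": "ERR033015",
    }
    inputs = ["ERR033015_1.fastq", "ERR033015_2.fastq"]
    for command in render_protocol(PROTOCOL, values, inputs):
        print(command)
```

The same substitution covers both protocols on this page. Only the table of steps and the dictionary of values change.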
Reference settings
Reference Name | Reference Value |
---|---|
HISAT2_HG38 | Path prefix (basename) of the hg38 genome index for hisat2, generated by hisat2-build; hisat2 -x expects the index basename, not just the folder |
GENCODE_HG38 | Path to a gene annotation file in GTF format, e.g. /mnt/biodata/gencode_hg38_v23.gtf |
Job Initial
Input files:
{Uploaded:ERR033015_1.fastq};{Uploaded:ERR033015_2.fastq};
Or /mnt/biodata/samples/ERR033015_1.fastq;/mnt/biodata/samples/ERR033015_2.fastq
Job parameter:
EXP=ERR033015;
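Both job-initialization fields are semicolon-separated lists: input files (either {Uploaded:name} tokens or absolute paths) and KEY=value job parameters, each with an optional trailing semicolon. The Python sketch below parses them. It assumes the first listed file is the one referenced as {InputFile:1}, the second as {InputFile:2}, and so on. The upload directory is a hypothetical parameter, because the page does not say where uploaded files are stored.

```python
import posixpath
import re

# Matches an uploaded-file token such as {Uploaded:ERR033015_1.fastq}.
UPLOADED = re.compile(r"^\{Uploaded:(.+)\}$")


def parse_input_files(spec, upload_dir):
    """Split a semicolon-separated input list into file paths.

    Empty items (e.g. after the trailing ';') are skipped. {Uploaded:name}
    tokens are resolved against `upload_dir` (hypothetical); plain paths are
    returned unchanged.
    """
    files = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        match = UPLOADED.match(item)
        files.append(posixpath.join(upload_dir, match.group(1)) if match else item)
    return files


def parse_job_parameters(spec):
    """Turn 'KEY=value;KEY2=value2;' into a dict of strings."""
    params = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got {item!r}")
        params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    print(parse_input_files("{Uploaded:ERR033015_1.fastq};{Uploaded:ERR033015_2.fastq};", "/uploads"))
    print(parse_job_parameters("EXP=ERR033015;"))
```

For this job, parse_job_parameters returns {"EXP": "ERR033015"}, so every {EXP} in the protocol becomes ERR033015 (e.g. the hisat2 output ERR033015.sam).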
Calling variants in RNA-seq (STAR-GATK)
This protocol follows the GATK Best Practices for RNA-seq variant calling published by the Broad Institute:
Software | Parameter |
---|---|
STAR | --genomeDir {STAR_HG38} --readFilesIn {InputFile:1} {InputFile:2} --runThreadN {ThreadN} |
STAR | --runMode genomeGenerate --genomeDir 2pass --genomeFastaFiles {HG38} --sjdbFileChrStartEnd SJ.out.tab --sjdbOverhang 75 --runThreadN {ThreadN} |
STAR | --genomeDir 2pass --readFilesIn {InputFile:1} {InputFile:2} --runThreadN {ThreadN} |
java | -jar {picard} AddOrReplaceReadGroups I=Aligned.out.sam O=rg_added_sorted.bam SO=coordinate RGID={RGID} RGLB={RGLB} RGPL={RGPL} RGPU={RGPU} RGSM={RGSM} |
java | -jar {picard} MarkDuplicates I=rg_added_sorted.bam O=dedupped.bam CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M=output.metrics |
java | -jar {gatk} -T SplitNCigarReads -R {HG38} -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS |
java | -jar {gatk} -T HaplotypeCaller -R {HG38} -I split.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -stand_emit_conf 20.0 -o output.vcf |
java | -jar {gatk} -T VariantFiltration -R {HG38} -V output.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o output.hard.filtered.vcf |
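The VariantFiltration step marks failing sites in the FILTER column instead of removing them. A site gets the filter name FS when FS > 30.0 and QD when QD < 2.0. With -cluster 3 -window 35, it also gets a cluster flag (SnpCluster in GATK 3) when 3 SNPs fall within a 35-bp window. All other sites are marked PASS. The Python sketch below mimics that logic on toy records. It is a simplified approximation for illustration, not GATK's implementation. The `Site` class and `hard_filter` function are hypothetical, and the inclusive window comparison is an assumption. Real FS and QD values come from the INFO field of output.vcf.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    chrom: str
    pos: int
    fs: float  # FisherStrand (FS) annotation
    qd: float  # QualByDepth (QD) annotation
    filters: list = field(default_factory=list)


def hard_filter(sites, window=35, cluster=3):
    """Assign FILTER values in the spirit of the VariantFiltration step (simplified)."""
    # Expression filters: -filterName FS -filter "FS > 30.0", -filterName QD -filter "QD < 2.0".
    for site in sites:
        if site.fs > 30.0:
            site.filters.append("FS")
        if site.qd < 2.0:
            site.filters.append("QD")

    # Cluster filter (approximation): flag every site in any run of `cluster`
    # consecutive sites on one chromosome whose first and last positions are
    # at most `window` bases apart.
    ordered = sorted(sites, key=lambda s: (s.chrom, s.pos))
    for i in range(len(ordered) - cluster + 1):
        group = ordered[i:i + cluster]
        same_chrom = all(s.chrom == group[0].chrom for s in group)
        if same_chrom and group[-1].pos - group[0].pos <= window:
            for s in group:
                if "SnpCluster" not in s.filters:
                    s.filters.append("SnpCluster")

    # Sites with no failed filter are reported as PASS.
    return {(s.chrom, s.pos): (";".join(s.filters) or "PASS") for s in sites}
```

Because these are hard filters on fixed thresholds, a site only needs to fail one expression to be marked. Downstream tools can then keep only PASS records.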
Reference settings
Reference Name | Reference Value |
---|---|
STAR_HG38 | Directory containing the hg38 genome index for STAR, generated with STAR --runMode genomeGenerate |
HG38 | Path to the reference genome FASTA file, e.g. /mnt/biodata/hg38.fa |
picard | Path to picard.jar, e.g. /mnt/biosoftware/picard.jar |
gatk | Path to GenomeAnalysisTK.jar, e.g. /mnt/biosoftware/GenomeAnalysisTK.jar |
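The second STAR pass rebuilds the index with --sjdbOverhang 75. The STAR manual recommends setting this value to the maximum read (mate) length minus 1, so 75 fits reads of up to 76 bp. The Python sketch below derives the value from your own FASTQ files. It assumes uncompressed or gzip-compressed FASTQ with standard 4-line records and inspects only the first `max_reads` records. The function names are hypothetical helpers, not part of STAR.

```python
import gzip
from itertools import islice


def max_read_length(lines, max_reads=100_000):
    """Longest sequence among the first `max_reads` FASTQ records in `lines`."""
    longest = 0
    # FASTQ records are 4 lines; the sequence is the 2nd line of each record.
    for line_number, line in enumerate(islice(lines, max_reads * 4)):
        if line_number % 4 == 1:
            longest = max(longest, len(line.rstrip("\n")))
    return longest


def fastq_max_read_length(path, max_reads=100_000):
    """Open a plain or .gz FASTQ file and return its (sampled) maximum read length."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return max_read_length(handle, max_reads)


def sjdb_overhang(*fastq_paths):
    """Recommended --sjdbOverhang: maximum read length minus 1 across all mates."""
    return max(fastq_max_read_length(p) for p in fastq_paths) - 1
```

For example, sjdb_overhang("ERR033015_1.fastq", "ERR033015_2.fastq") returns the value to substitute for 75 if your reads have a different length.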
Job Initial
Input files:
{Uploaded:ERR033015_1.fastq};{Uploaded:ERR033015_2.fastq};
Or /mnt/biodata/samples/ERR033015_1.fastq;/mnt/biodata/samples/ERR033015_2.fastq
Job parameter:
RGID=4;RGLB=lib1;RGPL=illumina;RGPU=unit1;RGSM=20;
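The RG* job parameters feed Picard's AddOrReplaceReadGroups step, which records them as a read group (@RG) line in the BAM header using the SAM tags ID, LB, PL, PU and SM. GATK takes the sample name for output.vcf from SM. The Python sketch below shows the header line these parameters correspond to. It is illustrative only: `read_group_header` is a hypothetical helper, and Picard may order the fields differently in the header it writes.

```python
# Job parameter keys mapped to SAM @RG tags.
RG_TAGS = [("RGID", "ID"), ("RGLB", "LB"), ("RGPL", "PL"), ("RGPU", "PU"), ("RGSM", "SM")]


def parse_job_parameters(spec):
    """Parse 'KEY=value;KEY2=value2;' into a dict, skipping empty items."""
    return dict(item.split("=", 1) for item in spec.split(";") if item.strip())


def read_group_header(params):
    """Tab-separated @RG header line for the RG* parameters that are present."""
    fields = ["@RG"] + [f"{tag}:{params[key]}" for key, tag in RG_TAGS if key in params]
    return "\t".join(fields)


if __name__ == "__main__":
    params = parse_job_parameters("RGID=4;RGLB=lib1;RGPL=illumina;RGPU=unit1;RGSM=20;")
    print(read_group_header(params))
```

With the job parameters above, the sketch prints the tab-separated line @RG ID:4 LB:lib1 PL:illumina PU:unit1 SM:20, so the sample column in output.vcf would be named 20.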