Create a new history and import the following data:
The main functions of FastQC are:
The following links show the results after using 'fastqc':
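One of FastQC's best-known reports is the per-base sequence quality plot. The sketch below reproduces only the mean-quality line of that plot. It assumes 4-line FASTQ records with Phred+33 quality encoding, and the helper names are hypothetical (this is not FastQC's code).

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def per_base_mean_quality(records, offset=33):
    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
    scores_by_pos = []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(scores_by_pos):
                scores_by_pos.append([])
            scores_by_pos[i].append(ord(ch) - offset)
    return [mean(s) for s in scores_by_pos]


fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "I5+"]
print(per_base_mean_quality(parse_fastq(fastq)))
```

Reads of different lengths are handled by averaging only the reads that reach each position.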
fqCleaner v.0.4.1
USAGE :
fqCleaner.sh [options]
where 'options' are :
-f FASTQ formatted input file name (mandatory option)
-r when using paired-ends data, this option allows inputting the
second file (i.e. reverse reads)
-q quality score threshold (default: 20); all bases with quality
score below this threshold are considered as non-confident
-l minimum required length for a read (default: 30)
-p minimum percent of bases (default: 80) that must have a quality
score higher than the fixed threshold (set by option -q)
-s a sequence of tasks to be iteratively performed, each being set by
one of the following letters:
Q: each read is trimmed off to remove non-confident bases at
5' and 3' ends (quality score threshold is set with option
-q); all short reads are discarded (minimum read length is
set with option -l)
F: each read containing too few confident bases is filtered
out (the minimum percentage of confident bases is set with
option -p)
A: each artefactual read is filtered out
C: contaminant oligonucleotide sequences are trimmed off when
occurring in either 5' or 3' ends of each read; all short
reads are discarded (minimum read length is set with
option -l); user contaminant sequences can be set with
option -c
D: all duplicated single- or paired-ends reads are removed
d: (only for paired-ends data) all duplicated reads within
each input file are removed
N: the number of (remaining) reads is displayed
(default sequence of tasks: NQNCNFNANDN)
-c sequence file containing contaminant nucleotide sequences (one
per line) to be trimmed off during step 'C' (this option doesn't work yet on Galaxy)
-x single-end: output file name; paired-ends: forward read output
file name (default: ..fq )
-y paired-ends data only: reverse read output file name (default:
..fq)
-z paired-ends data only: single read output file name (default:
..sgl.fq)
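To make tasks Q and F concrete, here is a simplified sketch of what they do to a single read with the default thresholds (-q 20, -l 30, -p 80). It is not fqCleaner's implementation: it assumes Phred+33 qualities, treats a base as confident when its score is at least the threshold, and skips the C, A and D tasks that run between Q and F in the default NQNCNFNANDN sequence. Function names are hypothetical.

```python
def trim_low_quality_ends(seq, qual, q_threshold=20, offset=33):
    """Task Q (sketch): remove non-confident bases from the 5' and 3' ends."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < q_threshold:
        start += 1
    while end > start and scores[end - 1] < q_threshold:
        end -= 1
    return seq[start:end], qual[start:end]


def passes_percent_filter(qual, q_threshold=20, min_percent=80, offset=33):
    """Task F (sketch): keep a read only if enough of its bases are confident."""
    if not qual:
        return False
    confident = sum(1 for c in qual if ord(c) - offset >= q_threshold)
    return 100 * confident / len(qual) >= min_percent


def clean_read(seq, qual, q=20, l=30, p=80):
    """Apply Q (trim, then length filter) and F; return None if the read is discarded."""
    seq, qual = trim_low_quality_ends(seq, qual, q)
    if len(seq) < l:
        return None
    if not passes_percent_filter(qual, q, p):
        return None
    return seq, qual
```

For example, a 40 bp read whose first and last bases have quality 2 comes back as 38 bp, while a read with 10 low-quality bases in the middle (75% confident) is dropped by F.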
In this example, we keep all default parameters.
The following links show the results after using 'fqCleaner':
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.5.9-r16
Contact: Heng Li
Usage: bwa [options]
Command: index index sequences in the FASTA format
aln gapped/ungapped alignment
samse generate alignment (single ended)
sampe generate alignment (paired ended)
bwasw BWA-SW for long queries
fa2pac convert FASTA to PAC format
pac2bwt generate BWT from PAC
pac2bwtgen alternative algorithm for generating BWT
bwtupdate update .bwt to the new format
pac_rev generate reverse PAC
bwt2sa generate SA from BWT and Occ
pac2cspac convert PAC to color-space PAC
stdsw standard SW/NW alignment
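The `index` command builds a Burrows-Wheeler transform (BWT) of the reference, which lets `aln` find exact matches by backward search. The toy sketch below shows that idea on a tiny string: it builds the suffix array naively and counts occurrences without the occurrence-table checkpoints, sampled suffix array, or mismatch/gap handling a real aligner needs. The text is assumed to end with a unique `$` sentinel.

```python
def bwt_via_suffix_array(text):
    """Return (suffix array, BWT) of text, which must end with a unique '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the final '$'
    return sa, bwt


def backward_search_count(bwt, pattern):
    """Count occurrences of pattern in the original text via FM-index backward search."""
    # C[c] = number of characters in the text that sort before c
    C = {}
    for i, c in enumerate(sorted(bwt)):
        C.setdefault(c, i)

    def occ(c, k):
        # occurrences of c in bwt[:k]; a real index precomputes these counts
        return bwt[:k].count(c)

    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


sa, bwt = bwt_via_suffix_array("banana$")
print(bwt, backward_search_count(bwt, "ana"))
```

The final interval `[lo, hi)` indexes into the suffix array, so `sa[lo:hi]` gives the match positions.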
--- In case of paired-end: bwa sampe ---
Usage: bwa sampe [options]
Options: -a INT maximum insert size [500]
-o INT maximum occurrences for one end [100000]
-n INT maximum hits to output for paired reads [3]
-N INT maximum hits to output for discordant pairs [10]
-c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
-f FILE sam file to output results to [stdout]
-r STR read group header line such as `@RG\tID:foo\tSM:bar' [null]
-P preload index into memory (for base-space reads only)
-s disable Smith-Waterman for the unmapped mate
-A disable insert size estimate (force -s)
Notes: 1. For SOLiD reads, corresponds R3 reads and to F3.
2. For reads shorter than 30bp, applying a smaller -o is recommended
to get a sensible speed at the cost of pairing accuracy.
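To illustrate what `-a` (maximum insert size) controls when `sampe` pairs mates, the sketch below checks whether two mate alignments form a plausible pair: opposite strands, forward mate on the left (FR orientation), and an outer insert size no larger than 500. This is a simplification under stated assumptions; `bwa sampe` also estimates the insert-size distribution from the data (unless `-A` is given) and rescues unmapped mates with Smith-Waterman. The `Hit` type is hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    pos: int       # 0-based leftmost reference coordinate of the alignment
    length: int    # number of reference bases covered by the alignment
    reverse: bool  # True if the mate aligned to the reverse strand


def insert_size(a: Hit, b: Hit) -> int:
    """Outer distance spanned by the two mates on the reference."""
    start = min(a.pos, b.pos)
    end = max(a.pos + a.length, b.pos + b.length)
    return end - start


def is_proper_pair(a: Hit, b: Hit, max_insert: int = 500) -> bool:
    """FR orientation (forward mate left of reverse mate) within max_insert."""
    if a.reverse == b.reverse:
        return False
    fwd, rev = (b, a) if a.reverse else (a, b)
    if fwd.pos > rev.pos:
        return False
    return insert_size(a, b) <= max_insert
```

For instance, mates at positions 100 (forward) and 300 (reverse), each 50 bp long, span 250 bp and pass; moving the reverse mate to 700 gives 650 bp and fails.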
The following link shows the result after using 'BWA':
This tool uses the SAMTools toolkit to produce an indexed BAM file based on a sorted input SAM file.
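BAM indexing requires coordinate-sorted input, and the BAI index groups alignments into bins of a hierarchical binning scheme. The function below is a Python translation of the `reg2bin` routine given in the SAM format specification; it returns the smallest bin that fully contains a 0-based, half-open interval `[beg, end)`. It only shows how bins are assigned, not how the index file itself is written.

```python
def reg2bin(beg: int, end: int) -> int:
    """Smallest BAI bin containing the 0-based half-open interval [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:   # fits in one 16 kb bin
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:   # 128 kb bin
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:   # 1 Mb bin
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:   # 8 Mb bin
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:   # 64 Mb bin
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0                     # root bin spanning 512 Mb


print(reg2bin(0, 100), reg2bin(16384, 16484), reg2bin(0, 20000))
```

Short alignments land in the finest (16 kb) level, so a region query only needs to inspect a handful of bins.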
The following link shows the result after using 'SAM-to-BAM':
'mpileup' is one of the SAMtools tools. It collects summary information from the input BAM files, computes the likelihood of the data given each possible genotype, and stores the likelihoods in BCF format (via link).
Usage: samtools mpileup [options] in1.bam [in2.bam [...]]
Input options:
-6 assume the quality is in the Illumina-1.3+ encoding
-A count anomalous read pairs
-B disable BAQ computation
-b FILE list of input BAM files [null]
-C INT parameter for adjusting mapQ; 0 to disable [0]
-d INT max per-BAM depth to avoid excessive memory usage [250]
-E extended BAQ for higher sensitivity but lower specificity
-f FILE faidx indexed reference sequence file [null]
-G FILE exclude read groups listed in FILE [null]
-l FILE list of positions (chr pos) or regions (BED) [null]
-M INT cap mapping quality at INT [60]
-r STR region in which pileup is generated [null]
-R ignore RG tags
-q INT skip alignments with mapQ smaller than INT [0]
-Q INT skip bases with baseQ/BAQ smaller than INT [13]
Output options:
-D output per-sample DP in BCF (require -g/-u)
-g generate BCF output (genotype likelihoods)
-O output base positions on reads (disabled by -g/-u)
-s output mapping quality (disabled by -g/-u)
-S output per-sample strand bias P-value in BCF (require -g/-u)
-u generate uncompress BCF output
SNP/INDEL genotype likelihoods options (effective with `-g' or `-u'):
-e INT Phred-scaled gap extension seq error probability [20]
-F FLOAT minimum fraction of gapped reads for candidates [0.002]
-h INT coefficient for homopolymer errors [100]
-I do not perform indel calling
-L INT max per-sample depth for INDEL calling [250]
-m INT minimum gapped reads for indel candidates [1]
-o INT Phred-scaled gap open sequencing error probability [40]
-P STR comma separated list of platforms for indels [all]
Notes: Assuming diploid individuals.
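The genotype likelihoods that `mpileup -g` stores can be illustrated with a deliberately simplified diploid model. In this sketch, each base is an independent observation with error probability e = 10^(-Q/10), and an erroneous base is assumed to be equally likely to be any of the other three nucleotides. The real SAMtools model also applies BAQ, caps by mapping quality, and handles indels, so the numbers will differ. Results are Phred-scaled (PL) and normalized so the best genotype is 0.

```python
import math


def genotype_likelihoods(bases, quals, ref, alt):
    """PL values for the diploid genotypes [ref/ref, ref/alt, alt/alt] (simplified model)."""
    def p_base_given_allele(b, allele, q):
        e = 10 ** (-q / 10)
        return 1 - e if b == allele else e / 3

    log_liks = []
    for a1, a2 in [(ref, ref), (ref, alt), (alt, alt)]:
        ll = 0.0
        for b, q in zip(bases, quals):
            # each chromosome copy is sampled with probability 1/2
            p = 0.5 * p_base_given_allele(b, a1, q) + 0.5 * p_base_given_allele(b, a2, q)
            ll += math.log10(p)
        log_liks.append(ll)
    pl = [-10 * ll for ll in log_liks]
    best = min(pl)
    return [round(x - best) for x in pl]


print(genotype_likelihoods("AAGG", [30] * 4, "A", "G"))
```

Four reads split evenly between A and G give the heterozygous genotype a PL of 0, while four A reads favour ref/ref.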
The following link shows the result after using 'mpileup':
This step converts the pileup-format file to VCF format (via link) so that the SNP information is easier to read.
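As a sketch of what a VCF record looks like, the helper below formats one site with a `GT:PL` sample column, choosing the genotype with the lowest PL. The column layout follows the VCF 4.1 specification, but the sample name, the empty QUAL/FILTER/INFO fields and the helper itself are hypothetical; the Galaxy 'Pileup to VCF' tool may fill these columns differently.

```python
VCF_HEADER = [
    "##fileformat=VCFv4.1",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype likelihoods">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",  # hypothetical sample name
]


def to_vcf_line(chrom, pos, ref, alt, pl):
    """One VCF data line; pos is 1-based and pl holds [ref/ref, ref/alt, alt/alt]."""
    gt = ["0/0", "0/1", "1/1"][pl.index(min(pl))]
    sample = f"{gt}:{','.join(str(x) for x in pl)}"
    # ID, QUAL, FILTER and INFO are left as '.' (missing) in this sketch
    fields = [chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT:PL", sample]
    return "\t".join(fields)


print("\n".join(VCF_HEADER + [to_vcf_line("chr1", 1234, "A", "G", [57, 0, 40])]))
```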
The following link shows the result after using 'Pileup to VCF':
Notice: The GATK2 SNP calling step is independent of the SAMtools SNP calling step.
Many downstream analysis tools (such as GATK) require BAM datasets to contain read groups. Even if you are not going to use GATK, setting read groups correctly from the start will simplify your life greatly.
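Conceptually, adding or replacing read groups means two changes: an `@RG` header line (with fields such as `ID`, `SM`, `LB`, `PL`) and an `RG:Z:` tag on every alignment pointing at that ID. The sketch below does this on plain SAM text lines only. It is not the Picard tool's implementation, and the default library and platform values are hypothetical placeholders.

```python
def add_or_replace_read_group(sam_lines, rg_id, sample, library="lib1", platform="ILLUMINA"):
    """Replace all @RG header lines with one read group and tag every alignment with it."""
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}\tLB:{library}\tPL:{platform}"
    out, rg_written = [], False
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if not line.startswith("@RG"):
                out.append(line)  # keep @HD, @SQ, @PG, ... headers unchanged
            continue
        if not rg_written:
            out.append(rg_header)  # place the new @RG after the other headers
            rg_written = True
        fields = line.split("\t")
        # keep the 11 mandatory columns, drop any old RG tag, append the new one
        tags = [t for t in fields[11:] if not t.startswith("RG:Z:")]
        out.append("\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"]))
    if not rg_written:
        out.append(rg_header)  # header-only input still gets the read group
    return out
```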
The following link shows the result after using 'Add or Replace Groups':
Notice: This step depends on Step 5'.
A GATK tool to emit intervals for the Local Indel Realigner to target for cleaning. Ignores 454 reads, MQ0 reads, and reads with consecutive indel operators in the CIGAR string.
For more information about this tool, please see the documentation (via link).
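A much simplified picture of how target intervals arise: collect indel positions from the CIGAR strings of reads, skip MQ0 reads and reads with consecutive indel operators (as described above), pad each site by a window and merge overlaps. The 10 bp window and the input tuple layout are hypothetical; the real tool also considers mismatch clusters, known indels and the sequencing platform.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_sites(pos, cigar):
    """Reference spans of I/D operations, or None if two indel operators are adjacent."""
    sites, ref_pos, prev_indel = [], pos, False
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        is_indel = op in "ID"
        if is_indel and prev_indel:
            return None  # consecutive indel operators: ignore this read
        if op == "I":
            sites.append((ref_pos, ref_pos))
        elif op == "D":
            sites.append((ref_pos, ref_pos + n - 1))
        if op in "MDN=X":  # operators that consume reference bases
            ref_pos += n
        prev_indel = is_indel
    return sites


def realign_targets(reads, window=10):
    """Merge padded indel sites from (pos, mapq, cigar) reads into target intervals."""
    sites = []
    for pos, mapq, cigar in reads:
        if mapq == 0:
            continue  # MQ0 reads are ignored
        found = indel_sites(pos, cigar)
        if found:
            sites.extend(found)
    intervals = []
    for start, end in sorted(sites):
        start, end = start - window, end + window
        if intervals and start <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])
    return [tuple(iv) for iv in intervals]
```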
The following link shows the result after using 'Realigner Target Creator'.
Notice: This step depends on Step 6'.
A GATK tool to perform local realignment of reads based on misalignments due to the presence of indels. Unlike most mappers, this walker uses the full alignment context to determine whether an appropriate alternate reference (i.e. indel) exists and updates SAM Records accordingly.
For more information about this tool, please see the documentation (via link).
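The core question local realignment asks is whether reads fit an alternate haplotype containing an indel better than they fit the reference. The toy sketch below scores each read by the sum of base qualities at mismatching positions (its best ungapped placement) and compares the totals. GATK's IndelRealigner builds its candidate consensuses from the reads and known indels and only realigns when the improvement passes a LOD threshold; none of that is modeled here, and the function names are hypothetical.

```python
def mismatch_quality_sum(read, quals, haplotype, offset):
    """Sum of base qualities at mismatches when the read is placed at offset."""
    return sum(q for i, (b, q) in enumerate(zip(read, quals)) if haplotype[offset + i] != b)


def best_placement(read, quals, haplotype):
    """Lowest mismatch-quality sum over all ungapped placements (read must fit)."""
    return min(
        mismatch_quality_sum(read, quals, haplotype, off)
        for off in range(len(haplotype) - len(read) + 1)
    )


def prefer_alternate(reads, reference, alternate):
    """Return (alternate is better, reference score, alternate score) over all reads."""
    ref_score = sum(best_placement(r, q, reference) for r, q in reads)
    alt_score = sum(best_placement(r, q, alternate) for r, q in reads)
    return alt_score < ref_score, ref_score, alt_score


# The alternate haplotype carries a 3 bp deletion of GGG
print(prefer_alternate([("CCCTTT", [30] * 6)], "AAACCCGGGTTT", "AAACCCTTT"))
```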
The following link shows the result after using 'indel realigner'.
Notice: This step depends on Step 5'.
A GATK variant caller that unifies the approaches of several disparate callers. It works for single-sample and multi-sample data, and the user can choose from several incorporated calculation models.
For more information about this tool, please see the documentation (via link).
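A Bayesian caller such as the Unified Genotyper combines genotype likelihoods with a prior. The sketch below uses a heterozygosity-based diploid prior (P(het) = θ, P(hom-var) = θ/2, P(hom-ref) = 1 - 3θ/2) with θ = 0.001 assumed, and normalizes to posteriors. It illustrates the general calculation only, not GATK's exact models.

```python
import math


def genotype_posteriors(log10_likelihoods, heterozygosity=0.001):
    """Posterior P(G | data) for diploid genotypes [hom-ref, het, hom-var]."""
    theta = heterozygosity
    priors = [1 - 1.5 * theta, theta, theta / 2]  # sums to 1
    # combine in log10 space, then subtract the maximum for numerical stability
    log_post = [ll + math.log10(p) for ll, p in zip(log10_likelihoods, priors)]
    m = max(log_post)
    unnorm = [10 ** (x - m) for x in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# log10 likelihoods for two A and two G reads at Q30 (ref A, alt G)
print(genotype_posteriors([-6.955, -1.205, -13.908]))
```

With no evidence (equal likelihoods) the posteriors simply equal the priors; strong heterozygous evidence outweighs the small het prior.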
The following links show the results after using 'Unified genotyper'.
Notice: This step is independent of the MAPPING, MPILEUP and GATK steps.
This step doesn't need the genome reference (FASTA file); only the cleaned reads are required.
This tool assembles the cleaned reads without genome reference and outputs contig sequences in fasta format. It can also output scaffolding annotation in a GFF output format file.
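Short-read de novo assemblers commonly work on a de Bruijn graph. Whether clc_assembler's internals match this is not stated here, so treat the following as a generic illustration. The sketch splits reads into k-mers, links overlapping (k-1)-mers, and walks unbranched paths to emit contigs (unitigs). It does no error correction, ignores reverse complements, and skips isolated cycles; the k value in the example is arbitrary.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds one edge."""
    graph, indeg = defaultdict(set), defaultdict(int)
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    for kmer in kmers:
        a, b = kmer[:-1], kmer[1:]
        graph[a].add(b)
        indeg[b] += 1
        indeg.setdefault(a, 0)
    return graph, indeg


def assemble_contigs(reads, k):
    """Emit maximal non-branching paths (unitigs) as contig sequences."""
    graph, indeg = build_de_bruijn(reads, k)
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(graph[n]) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior nodes are reached by walking from a branch/end node
        for nxt in graph[node]:
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return sorted(contigs)


print(assemble_contigs(["ACGTTG", "GTTGCA"], 4))
```

Two overlapping reads collapse into one contig, while a branching position (for example reads ACGTA and ACGTC with k = 3) splits the output into separate contigs at the branch.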
The following links show the results after using 'clc_assembler'.
History --> Setting --> Extract Workflow --> choose the steps that you want to have in your workflow --> create workflow
Go to the Workflow section, where you can find your pipeline in the list.
The following link shows the created workflow:
yluo