Galaxy | Pasteur | Published Workflow | Metavisitor: Workflow for Use Case 2-1

Metavisitor: Workflow for Use Case 2-1

Annotation: DOI: 10.1371/journal.pone.0168397

Step	Annotation
Step 1: Input dataset collection
    input: select at runtime
Step 2: Input dataset
    input: select at runtime
Step 3: Retrieve FASTA from NCBI
    Query to NCBI in entrez format: NC_001834.1
    NCBI database: Nucleotide
    Filter the sequences by date?: False
Step 4: Clip adapter
    Source file: Output dataset 'output' from step 1
    min size: 18
    max size: 50
    Select output format: Fasta format
    Accept reads containing N?: accept
    Source: Use a built-in adapter (select from the list below)
    Select Adapter to clip: Illumina TruSeq TGGAATTCTCGGGTGCCAAG
Step 5: NCBI BLAST+ makeblastdb
    Molecule type of input: nucleotide
    Input FASTA files(s): Output dataset 'outfilename' from step 3
    Title for BLAST database: DCV NC_001834.1
    Parse the sequence identifiers: False
    Enable the creation of sequence hash values: True
    Optional ASN.1 file(s) containing masking data: select at runtime
    Taxonomy options: Do not assign a Taxonomy ID to the sequences
Step 6: Concatenate datacollection
    Concatenate Dataset collection: Output dataset 'output' from step 4
Step 7: fasta - tabular
    conversion option: fasta to tabular
    fasta file to convert to tabular: Output dataset 'out_file1' from step 6
Step 8: fasta - tabular
    conversion option: tabular to weighted fasta
    tabular file to convert to fasta weighted: Output dataset 'output' from step 7
Step 9: sR_bowtie
    Input fasta or fastq file: reads clipped from their adapter: Output dataset 'output' from step 8
    What kind of matching do you want to do?: Match on DNA as fast as possible, without taking care of mapping issues (for raw annotation of reads)
    Number of mismatches allowed: 2
    Will you select a reference genome from your history or use a built-in index?: Use one from the history
    Select a fasta file, to serve as index reference: select at runtime
    Select output format: tabular
    additional fasta output: unaligned
Step 10: sR_bowtie
    Input fasta or fastq file: reads clipped from their adapter: Output dataset 'unaligned' from step 9
    What kind of matching do you want to do?: Match on DNA as fast as possible, without taking care of mapping issues (for raw annotation of reads)
    Number of mismatches allowed: 2
    Will you select a reference genome from your history or use a built-in index?: Use one from the history
    Select a fasta file, to serve as index reference: select at runtime
    Select output format: tabular
    additional fasta output: unaligned
Step 11: sR_bowtie
    Input fasta or fastq file: reads clipped from their adapter: Output dataset 'unaligned' from step 10
    What kind of matching do you want to do?: Match on DNA as fast as possible, without taking care of mapping issues (for raw annotation of reads)
    Number of mismatches allowed: 2
    Will you select a reference genome from your history or use a built-in index?: Use one from the history
    Select a fasta file, to serve as index reference: select at runtime
    Select output format: tabular
    additional fasta output: unaligned
Step 12: Unknown Tool with id 'toolshed.pasteur.fr/repos/khillion/msp_oases/oasesoptimiserv/0.2.08'
Step 13: NCBI BLAST+ blastx
    Nucleotide query sequence(s): Output dataset 'transcripts' from step 12
    Subject database/sequences: BLAST database from your history
    None Empty.
    Protein BLAST database: Output dataset 'output' from step 2
    None Empty.
    Query genetic code: 1. Standard
    Type of BLAST: blastx - Traditional BLASTX to compare translated nucleotide query to protein database
    Set expectation value cutoff: 0.001
    Output format: Tabular (select which columns)
    Standard columns:
        qseqid = Query Seq-id (ID of your sequence)
        sseqid = Subject Seq-id (ID of the database hit)
        pident = Percentage of identical matches
        length = Alignment length
        mismatch = Number of mismatches
        gapopen = Number of gap openings
        qstart = Start of alignment in query
        qend = End of alignment in query
        sstart = Start of alignment in subject (database hit)
        send = End of alignment in subject (database hit)
        evalue = Expectation value (E-value)
        bitscore = Bit score
    Extended columns:
        slen = Subject sequence length
    Other identifier columns: Nothing selected.
    Miscellaneous columns: Nothing selected.
    Taxonomy columns: Nothing selected.
    Advanced Options: Hide Advanced Options
Step 14: Parse blast output and compile hits
    fasta sequences that have been blasted: Output dataset 'transcripts' from step 12
    The blast output you wish to parse: Output dataset 'output1' from step 13
    Number of flanking nucleotides to add to hits for CAP3 assembly: 5
    Extensive or compact reporting mode: extensive
    Use Additional Filters?: No
Step 15: Pick Fasta sequences
    Select sequences with this string in their header: Drosophila_C_virus
    Source file: Output dataset 'fastaOutput' from step 14
Step 16: Pick Fasta sequences
    Select sequences with this string in their header: Cricket_paralysis_virus
    Source file: Output dataset 'fastaOutput' from step 14
Step 17: Concatenate datasets
    Concatenate Dataset: Output dataset 'output' from step 15
    Datasets:
        Dataset 1:
            Select: Output dataset 'output' from step 16
Step 18: cap3
    Input sequences to assemble: Output dataset 'out_file1' from step 17
    specify overlap length cutoff > 15 (40): 40
    specify overlap percent identity cutoff N > 65 (90): 90
Step 19: NCBI BLAST+ blastx
    Nucleotide query sequence(s): Output dataset 'contigsandsinglets' from step 18
    Subject database/sequences: BLAST database from your history
    None Empty.
    Protein BLAST database: Output dataset 'outfile' from step 5
    None Empty.
    Query genetic code: 1. Standard
    Type of BLAST: blastx - Traditional BLASTX to compare translated nucleotide query to protein database
    Set expectation value cutoff: 0.001
    Output format: Tabular (select which columns)
    Standard columns:
        qseqid = Query Seq-id (ID of your sequence)
        sseqid = Subject Seq-id (ID of the database hit)
        pident = Percentage of identical matches
        length = Alignment length
        mismatch = Number of mismatches
        gapopen = Number of gap openings
        qstart = Start of alignment in query
        qend = End of alignment in query
        sstart = Start of alignment in subject (database hit)
        send = End of alignment in subject (database hit)
        evalue = Expectation value (E-value)
        bitscore = Bit score
    Extended columns:
        slen = Subject sequence length
    Other identifier columns: Nothing selected.
    Miscellaneous columns: Nothing selected.
    Taxonomy columns: Nothing selected.
    Advanced Options: Hide Advanced Options
Step 20: blast_to_scaffold
    Select a fasta contigs file: Output dataset 'contigsandsinglets' from step 18
    Select the fasta guide sequence for scaffolding: Output dataset 'outfilename' from step 3
    Select a blastn or tblastx output from your history: Output dataset 'output1' from step 19
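
The read preparation in steps 4 and 6 to 8 (adapter clipping, size filtering, then collapsing identical reads into a weighted FASTA) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code of the Galaxy tools: the adapter sequence and the 18 to 50 nt size bounds come from step 4, while the exact-match seed search, the `seed_len` parameter, the function names and the `>index_count` header layout are hypothetical choices made for this sketch.

```python
from collections import Counter

ADAPTER = "TGGAATTCTCGGGTGCCAAG"  # Illumina TruSeq adapter listed in step 4
MIN_SIZE, MAX_SIZE = 18, 50       # size bounds listed in step 4


def clip_read(read, adapter=ADAPTER, seed_len=8):
    """Cut the read at the first exact match of the adapter's first bases.

    Assumption: a plain exact-match search on a short adapter prefix;
    the real clipping tool may locate the adapter differently.
    Returns None when no adapter seed is found.
    """
    pos = read.find(adapter[:seed_len])
    return None if pos == -1 else read[:pos]


def clip_and_filter(reads):
    """Yield clipped inserts whose length lies within the size bounds.

    Reads containing N are kept, mirroring the 'accept' setting of step 4.
    """
    for read in reads:
        insert = clip_read(read)
        if insert is not None and MIN_SIZE <= len(insert) <= MAX_SIZE:
            yield insert


def collapse(inserts):
    """Count identical sequences (the 'fasta to tabular' idea of step 7)."""
    return Counter(inserts)


def to_weighted_fasta(counts):
    """Write one record per unique sequence, most abundant first.

    Assumption: the header is '>index_count'; the real weighted FASTA
    header layout is not given in the workflow listing.
    """
    lines = []
    for index, (seq, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">{index}_{count}")
        lines.append(seq)
    return "\n".join(lines)
```

Collapsing before alignment means each distinct sequence is aligned once while its read count travels along in the header, which is the usual motivation for a weighted FASTA in small RNA pipelines.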

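The tabular BLAST output configured in steps 13 and 19 has thirteen tab-separated columns: the twelve standard columns followed by `slen`. A minimal Python sketch for reading such a file and re-applying the 0.001 expectation value cutoff is shown below. The column names and the cutoff come from the step listing; the function names `parse_line` and `filter_hits` are hypothetical and not part of any workflow tool.

```python
# Column names and types, in the order selected in steps 13 and 19.
FIELDS = [
    ("qseqid", str), ("sseqid", str), ("pident", float), ("length", int),
    ("mismatch", int), ("gapopen", int), ("qstart", int), ("qend", int),
    ("sstart", int), ("send", int), ("evalue", float), ("bitscore", float),
    ("slen", int),
]


def parse_line(line):
    """Turn one tab-separated BLAST line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return {name: cast(value) for (name, cast), value in zip(FIELDS, values)}


def filter_hits(lines, max_evalue=0.001):
    """Yield parsed hits whose E-value does not exceed max_evalue.

    Blank lines and lines starting with '#' are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = parse_line(line)
        if hit["evalue"] <= max_evalue:
            yield hit
```

For a file on disk, `filter_hits(open(path))` iterates line by line, where `path` is a placeholder for a dataset downloaded from the history. Having `slen` available next to `sstart` and `send` allows a later step to relate each hit to the full subject length.
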
About this Workflow

Author

khillion
