SOAPdenovo1 (Galaxy tool version 1.0)
The default settings are suitable for most assembly needs. If you want full control, use "Full parameter list".

What it does

SOAPdenovo 1 is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The assembler is specially designed to use Illumina Genome Analyzer short reads.

SOAPdenovo 1 is designed for assembling large plant and animal genomes, although it also works well on bacterial and fungal genomes. It runs on 64-bit and 32-bit Linux and Mac OS X platforms with a minimum of 5 GB of physical memory. For large genomes such as the human genome, approximately 150 GB of memory is required.

This Galaxy tool wraps SOAPdenovo 1.0, the version that was used as a comparison for the performance of SOAPdenovo2.

The wrapper runs the whole SOAPdenovo 1 assembly pipeline, from the pregraph step through the contig, map and scaff steps. These steps are not currently available as individual tools, but they may be exposed separately if there is demand.


Configuration of SOAPdenovo 1

For large genome projects involving deep sequencing, data is usually organized as multiple read sequence files generated from multiple libraries. Tools such as SOAPdenovo 1 and 2 need to be told how to handle these files, and this information is supplied in a configuration file. This Galaxy wrapper creates the configuration file for you from the settings you enter on this web page, so the information below is provided for reference in case you need to write the file manually.

The configuration file has a global information section followed by one or more library sections. Information about a library, and about the sequencing data generated from it, belongs in the corresponding library section. Currently, the global section contains only the maximal read length. Each library section starts with the tag [LIB], followed by the read file names with their paths, the read file format, the average insert size, the library rank and two other flags that tell the assembler how to treat these reads.
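As a rough illustration of this layout, the Python sketch below renders a configuration file with one global line followed by one [LIB] block per library. The key names (max_rd_len, avg_ins, reverse_seq, asm_flags, rank, q1, q2) follow the SOAPdenovo manual as recalled and should be checked against the documentation for your version; the Library class, the render_config function and the file paths are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """One [LIB] section. Key names are assumed from the SOAPdenovo manual."""
    avg_ins: int        # average insert size
    reverse_seq: int    # 0 = forward-reverse, 1 = reverse-forward
    asm_flags: int      # 1 = contig only, 2 = scaffold only, 3 = both
    rank: int           # order in which the library is used for scaffolding
    read_files: list = field(default_factory=list)  # (key, path) pairs


def render_config(max_rd_len, libraries):
    """Return the configuration text: global section first, then each [LIB] block."""
    lines = [f"max_rd_len={max_rd_len}"]
    for lib in libraries:
        lines.append("[LIB]")
        lines.append(f"avg_ins={lib.avg_ins}")
        lines.append(f"reverse_seq={lib.reverse_seq}")
        lines.append(f"asm_flags={lib.asm_flags}")
        lines.append(f"rank={lib.rank}")
        lines.extend(f"{key}={path}" for key, path in lib.read_files)
    return "\n".join(lines) + "\n"


# Hypothetical paired-end FASTQ library with a 200 bp insert size.
example = render_config(
    100,
    [Library(avg_ins=200, reverse_seq=0, asm_flags=3, rank=1,
             read_files=[("q1", "/data/lib200_1.fq"), ("q2", "/data/lib200_2.fq")])],
)
print(example)
```

Within this Galaxy tool the file is generated automatically, so a script like this is only relevant when running SOAPdenovo outside Galaxy.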

The assembler accepts read files in two formats: FASTA or FASTQ. The mate-pair relationship can be indicated in two ways: by two sequence files in which reads in the same position belong to a pair, or by two adjacent reads in a single file (FASTA only) belonging to a pair.
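Because the two-file layout relies on reads appearing in the same order, a quick sanity check before assembly is to confirm that both files hold the same number of records. The sketch below is one possible check, not part of SOAPdenovo; the function names are hypothetical, and it assumes FASTQ records occupy exactly four lines each (no wrapped sequences).

```python
def count_reads(path, fmt):
    """Count records in a FASTA or FASTQ file.

    Assumes FASTQ records are exactly four lines long (unwrapped).
    """
    if fmt not in ("fasta", "fastq"):
        raise ValueError(f"unsupported format: {fmt!r}")
    with open(path) as handle:
        if fmt == "fasta":
            # Each FASTA record starts with a '>' header line.
            return sum(1 for line in handle if line.startswith(">"))
        n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def check_paired_files(path1, path2, fmt):
    """Return the shared read count, or raise if the two files differ."""
    n1, n2 = count_reads(path1, fmt), count_reads(path2, fmt)
    if n1 != n2:
        raise ValueError(f"read counts differ: {n1} vs {n2}")
    return n1
```

Matching counts do not prove the reads are in the same order, but a mismatch is a reliable sign that the pairing is broken.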

Libraries are used for scaffolding in the order indicated by their "rank". Libraries with the same "rank" are used at the same time.
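The effect of "rank" can be pictured as sorting the libraries and batching those that share a value. The sketch below assumes that lower ranks are used first, which is the usual convention but is not stated on this page; the function and the library names are hypothetical.

```python
from itertools import groupby


def scaffolding_order(libraries):
    """Group (name, rank) pairs into batches, assuming ascending rank order."""
    ordered = sorted(libraries, key=lambda lib: lib[1])
    return [
        [name for name, _ in group]
        for _, group in groupby(ordered, key=lambda lib: lib[1])
    ]


batches = scaffolding_order(
    [("pe500", 2), ("pe200", 1), ("mp2k", 3), ("pe250", 1)]
)
print(batches)  # [['pe200', 'pe250'], ['pe500'], ['mp2k']]
```

Here the two rank 1 libraries would be used together in the first scaffolding round, followed by the rank 2 and then the rank 3 library.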

The flag "asm_flag" has three possible values: "1" for reads to be only used for contig assembly; "2" for reads to be only used for scaffold assembly; and "3" for reads to be used for both contig and scaffold assembly.
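The three values can be summarized in a small lookup, shown here as a Python sketch. The helper names are hypothetical and only restate the meanings given above.

```python
ASM_FLAG_MEANING = {
    1: "contig assembly only",
    2: "scaffold assembly only",
    3: "contig and scaffold assembly",
}


def describe_asm_flag(value):
    """Return the meaning of an asm_flag value, rejecting anything but 1, 2 or 3."""
    if value not in ASM_FLAG_MEANING:
        raise ValueError(f"asm_flag must be 1, 2 or 3, got {value!r}")
    return ASM_FLAG_MEANING[value]


def used_for_contigs(value):
    """True if reads with this flag take part in contig assembly."""
    describe_asm_flag(value)  # validates the value
    return value in (1, 3)


def used_for_scaffolds(value):
    """True if reads with this flag take part in scaffold assembly."""
    describe_asm_flag(value)  # validates the value
    return value in (2, 3)
```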

There are two types of paired-end libraries: a) forward-reverse, generated from fragmented DNA ends with a typical insert size of less than 800 bp; b) reverse-forward, generated from circularized libraries with a typical insert size greater than 2 kb. Set the "reverse_seq" tag to indicate the library type: 0 for forward-reverse, 1 for reverse-forward.
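A minimal Python sketch of this choice is given below. It maps the library type to the "reverse_seq" value and flags insert sizes that fall outside the typical ranges quoted above. The function names are hypothetical, and the 800 bp and 2 kb thresholds are only the typical figures from this page, not hard limits enforced by the assembler.

```python
REVERSE_SEQ = {"forward-reverse": 0, "reverse-forward": 1}


def reverse_seq_for(orientation):
    """Return the reverse_seq value for a library type."""
    try:
        return REVERSE_SEQ[orientation]
    except KeyError:
        raise ValueError(f"unknown library type: {orientation!r}") from None


def check_insert_size(reverse_seq, avg_ins):
    """Return a warning string if avg_ins looks atypical for the library type, else None."""
    if reverse_seq == 0 and avg_ins >= 800:
        return "forward-reverse libraries typically have inserts under 800 bp"
    if reverse_seq == 1 and avg_ins <= 2000:
        return "reverse-forward libraries typically have inserts over 2 kb"
    return None
```

A warning here only suggests double-checking the library description; unusual insert sizes are not necessarily errors.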


Outputs

Two files are generated by the SOAPdenovo 1 assembly process:

  1. A contig sequence file
  2. A scaffold sequence file

More information

For test data and more detailed information, click here.


This tool was installed from a ToolShed. You may be able to find additional information by following this link: http://gigatoolshed.net/view/peterli/soapdenovo1