What it does
GapCloser reduces the size of gaps in scaffolds generated by SOAPdenovo2 or another assembler, using the abundant pair relationships of short reads.
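To make the idea of read-based gap filling concrete, here is a toy sketch of one common approach: walking k-mers from the left flank of a gap until the right flank is reached. This is not GapCloser's actual algorithm, which is not described in this document. The sketch assumes exact k-mer matches, reads already oriented on the scaffold strand, and a set of reads recruited to the gap (in a pair-based tool these would be reads whose mates anchor near the gap; here they are simply given). The function names `kmer_set` and `fill_gap` are hypothetical.

```python
from typing import Iterable, Optional, Set


def kmer_set(reads: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer that occurs in the reads (forward strand only)."""
    kmers: Set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def fill_gap(left: str, right: str, reads: Iterable[str], k: int,
             max_fill: int = 1000) -> Optional[str]:
    """Toy gap filler: extend `left` one base at a time using read k-mers
    until the first k bases of `right` are reached.

    Returns the inferred gap sequence, or None if the walk hits a dead end,
    a branch (more than one possible next base) or the length limit.
    """
    kmers = kmer_set(reads, k)
    anchor = right[:k]  # the walk must end on this k-mer of the right flank
    seq = left
    for _ in range(max_fill + k):
        prefix = seq[-(k - 1):]  # last (k-1) bases of the current walk
        options = [b for b in "ACGT" if prefix + b in kmers]
        if len(options) != 1:  # dead end or ambiguous branch: give up
            return None
        seq += options[0]
        added = len(seq) - len(left)
        if added >= k and seq[-k:] == anchor:
            return seq[len(left):-k]  # bases strictly between the flanks
    return None


reads = ["ACGTTAGCTCAG", "AGCTCAGGATTC", "CAGGATTCCGAAT"]
print(fill_gap("ACGTTAGC", "TTCCGAAT", reads, k=5))  # → TCAGGA
```

Stopping at branches rather than guessing keeps the toy conservative; real repeats, sequencing errors and reverse-strand reads would all need extra handling.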
System Requirements
GapCloser works on large plant and animal genomes as well as on bacterial and fungal genomes. Its memory usage depends on the number of reads, the number of unique k-mers in the reads, the number of gaps and the scaffold sizes. Its running time likewise depends on the number and sizes of the gaps and on the number of reads. For the YH genome (about 3 Gb), GapCloser reached a peak memory usage of about 200 GB and took about one day to run.
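Because cost scales with the number of gaps, their sizes and the scaffold sizes, it can help to summarise these before a run. The sketch below assumes gaps are written as runs of `N`/`n` in the scaffold FASTA (a common convention, but an assumption here); `read_fasta` and `gap_stats` are hypothetical helpers, and the output is only a rough gauge of workload, not a memory or time prediction.

```python
import re
from typing import Dict, Iterable

# Assumption: gaps appear as runs of N/n in the scaffold sequences.
GAP_RE = re.compile(r"[Nn]+")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {scaffold_id: sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # ID is the first word of the header
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gap_stats(records: Dict[str, str]) -> Dict[str, int]:
    """Count scaffolds, scaffold bases, gaps, gap bases and the largest gap."""
    gap_sizes = [len(m.group()) for seq in records.values()
                 for m in GAP_RE.finditer(seq)]
    return {
        "scaffolds": len(records),
        "scaffold_bases": sum(len(s) for s in records.values()),
        "gaps": len(gap_sizes),
        "gap_bases": sum(gap_sizes),
        "largest_gap": max(gap_sizes, default=0),
    }


demo = [">scaf1 len=15", "ACGTNNNNAC", "GTnnA", ">scaf2", "ACGT"]
print(gap_stats(read_fasta(demo)))
# → {'scaffolds': 2, 'scaffold_bases': 19, 'gaps': 2, 'gap_bases': 6, 'largest_gap': 4}
```

For a real scaffold file, pass an open file handle to `read_fasta` instead of the `demo` list.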
Outputs
Two outputs are produced by GapCloser:
FAQ
Which paired-end reads are used for gap filling?
GapCloser mainly uses read pairs from short- and medium-insert libraries, although long-insert paired-end libraries (insert size over 2 kb) may also help. Error-correcting the reads before gap filling is recommended: it reduces memory usage and improves the accuracy of the gap sequences produced at this stage.
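The guidance above can be turned into a simple check when preparing a run: treat libraries with insert sizes up to about 2 kb as the main source of gap-filling pairs and those above it as optional extras. This is a minimal sketch; the `Library` class, its fields and `split_libraries` are hypothetical helpers, not part of GapCloser, and the 2000 bp cutoff is taken from the FAQ's "over 2K bps". How the groups are then passed to GapCloser is not shown here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Cutoff from the FAQ: long-insert libraries are those over ~2 kb.
LONG_INSERT_BP = 2000


@dataclass
class Library:
    name: str        # hypothetical library label
    avg_insert: int  # average insert size in bp


def split_libraries(libs: List[Library]) -> Tuple[List[Library], List[Library]]:
    """Split libraries into (short/medium, long) insert groups, each sorted by insert size."""
    core = sorted((lib for lib in libs if lib.avg_insert <= LONG_INSERT_BP),
                  key=lambda lib: lib.avg_insert)
    long_insert = sorted((lib for lib in libs if lib.avg_insert > LONG_INSERT_BP),
                         key=lambda lib: lib.avg_insert)
    return core, long_insert


libs = [Library("lib500", 500), Library("lib5k", 5000),
        Library("lib170", 170), Library("lib2k", 2000)]
core, long_insert = split_libraries(libs)
print([lib.name for lib in core])         # → ['lib170', 'lib500', 'lib2k']
print([lib.name for lib in long_insert])  # → ['lib5k']
```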
What is the quality of the sequence produced by gap filling?
Statistically, the sequence filled into gaps is of lower quality than the sequence flanking the gaps on either side.