stage2.ovl_opt_erc --min-map-len 499
Overlapping options for the Raptor overlapping tool, applied
at the pread overlapping stage. The defaults are chosen to work
well with error-corrected reads and HiFi reads. The options set
by this parameter are passed directly to the Raptor call.
For details on Raptor options, run raptor -h.
The option --min-map-len reduces the minimum span of the
overlap to 499 bp (instead of the default 1000 bp). This allows
shorter overlaps to be reported.
stage2.ovl_flank_grace 20 Heuristic to salvage some potential dovetail overlaps. Only
dovetail overlaps are used for assembly, and all other overlaps
(partial overlaps, which are actually local alignments by
definition) are not used to construct the string graph.
Dovetail overlaps are overlaps where the full suffix of one read
and a full prefix of the other read are used to form the overlap.
More details can be found here:
http://wgs-assembler.sourceforge.net/wiki/index.php/Overlaps
Overlaps are produced by alignment, and alignment extension
can stop short of the sequence ends when errors are present
near the edges of one or both of the sequences.
For any overlap that falls only a few bases short of a dovetail
overlap (at most the number of bases set by this parameter),
the coordinates are extended to convert it into a dovetail
overlap.
The impact of this parameter is small, and the default value
works in almost all cases. The value should be kept relatively
low to avoid chimeric overlaps.
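The grace heuristic described above can be illustrated with a minimal sketch. This is not the assembler's actual implementation: the Overlap record, its field names, and the salvage_dovetail function are hypothetical, and the sketch assumes same-strand overlaps with half-open coordinates and ignores contained reads.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Overlap:
    # Hypothetical record: half-open coordinates on reads A and B, same strand.
    a_start: int
    a_end: int
    a_len: int
    b_start: int
    b_end: int
    b_len: int


def salvage_dovetail(ovl, flank_grace=20):
    """Snap an overlap to a dovetail if it misses the read ends by <= flank_grace bases."""
    a_left, a_right = ovl.a_start, ovl.a_len - ovl.a_end
    b_left, b_right = ovl.b_start, ovl.b_len - ovl.b_end
    # Case 1: suffix of A overlaps prefix of B.
    if a_right <= flank_grace and b_left <= flank_grace:
        return replace(ovl, a_end=ovl.a_len, b_start=0)
    # Case 2: prefix of A overlaps suffix of B.
    if a_left <= flank_grace and b_right <= flank_grace:
        return replace(ovl, a_start=0, b_end=ovl.b_len)
    # Too far from the ends: leave it as a partial (local) overlap.
    return ovl


near = Overlap(a_start=5000, a_end=9990, a_len=10000, b_start=12, b_end=5002, b_len=8000)
far = Overlap(a_start=5000, a_end=9500, a_len=10000, b_start=12, b_end=4512, b_len=8000)
fixed = salvage_dovetail(near)
unchanged = salvage_dovetail(far)
```

In this sketch, an overlap that ends 10 bases before the end of read A and starts 12 bases into read B is extended to the read ends, while one that stops 500 bases short is left as a partial overlap.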
stage2.ovl_min_idt 94 Overlap identity threshold (in percentage) for filtering overlaps
used for contig construction.
stage2.ovl_min_len 500 Minimum span of an overlap to keep it for contig construction,
in bp.
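As a rough illustration of how the two thresholds above act together, the following sketch filters a list of overlaps. The function name and the (identity, span) tuple layout are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
def filter_overlaps(overlaps, min_idt=94.0, min_len=500):
    """Keep (identity_pct, span_bp) pairs that pass both thresholds.

    Assumption: thresholds are inclusive (>=); the real tool may differ.
    """
    return [(idt, span) for idt, span in overlaps
            if idt >= min_idt and span >= min_len]


candidates = [(99.1, 1200), (93.5, 4000), (98.0, 450), (94.0, 500)]
kept = filter_overlaps(candidates)
```

Here the second overlap is dropped for low identity and the third for short span, regardless of how well each does on the other criterion.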
genome_size 5,000,000 The approximate number of base pairs expected in the
genome, used to determine the coverage cutoff.
Note: It is better to slightly overestimate rather than
underestimate the genome length to ensure good coverage
across the genome.
Coverage 30 A target value for the total amount of subread coverage used
for assembly. This parameter is used, together with the
genome size, to calculate the seed length cutoff.
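With the defaults, the target amount of sequence is 5,000,000 x 30 = 150,000,000 bases. One common way to turn such a target into a seed length cutoff is to take the longest reads until the target is reached; the sketch below assumes that approach, which the text above does not confirm, and the function name is hypothetical.

```python
def seed_length_cutoff(read_lengths, genome_size, coverage):
    """Return the shortest read length needed to reach genome_size * coverage bases.

    Assumption: reads are chosen longest-first. If the reads cannot reach
    the target, the shortest read length is returned (all reads are used).
    """
    target_bases = genome_size * coverage
    total = 0
    cutoff = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        cutoff = length
        if total >= target_bases:
            break
    return cutoff


lengths = [1000, 5000, 3000, 2000, 4000]
cutoff = seed_length_cutoff(lengths, genome_size=1000, coverage=9)
```

In this toy case the target is 9,000 bases, which the 5,000 bp and 4,000 bp reads reach together, so the cutoff is 4,000 bp.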
plasmid_contig_len_max 300,000 Maximum expected plasmid size in the input subreadset. The
default value covers a large range of possible plasmids. This
value is used to select subreads for the secondary assembly
stage which is specialized for assembly of smaller sequences
(e.g. plasmids) that might have been lost due to the seed
length cutoff threshold.
Any contig assembled in the first assembly stage larger than
this value will be filtered out and reassembled in the secondary
assembly stage. This is performed in order to avoid partially
assembled plasmid sequences.
plasmid_min_aln_frac 0.95 Applied in the "Mapping and filtering" stage, where raw
subreads are aligned to the filtered contigs of the first assembly
stage.
Any subread whose aligned span (in query coordinates) is
smaller than this fraction of its length is kept for the secondary
assembly stage, in addition to all reads that did not align.
The value is a fraction of the subread's length (0.95 means
95% of the subread's length).
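The selection rule above can be sketched as follows. The function name and the use of None to mark an unaligned subread are hypothetical, and how the real tool handles the exact boundary value is an assumption.

```python
def keep_for_secondary(subread_len, aligned_query_span, min_aln_frac=0.95):
    """Decide whether a subread goes to the secondary (plasmid) assembly stage.

    aligned_query_span is the aligned length in query coordinates,
    or None if the subread did not align at all (hypothetical convention).
    """
    if aligned_query_span is None:
        return True  # unaligned reads are always kept
    # Kept when the aligned span is below the required fraction of the read.
    return aligned_query_span < min_aln_frac * subread_len


unaligned = keep_for_secondary(10000, None)
partial = keep_for_secondary(10000, 9400)
full = keep_for_secondary(10000, 9900)
```

For a 10,000 bp subread, an aligned span of 9,400 bp falls below the 95% requirement and the read is kept, while a span of 9,900 bp counts as explained by the first-stage contigs.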
Advanced Parameters Default Value Description