Assembling large sequences

jkb at mrc-lmb.cam.ac.uk
Wed Jan 16 10:00:50 EST 2002


In <3C457DFC.AF6A0D0F at bbsrc.ac.uk> Simon Andrews <simon.andrews at bbsrc.ac.uk> writes:

> Whilst playing with Staden 2001 I've found that something I hoped would
> have been changed hasn't.
> 
> I'm trying to assemble some large genomic sequence fragments (100kb -
> 300kb) using Staden.  The program will not read these and keeps
> crashing.

It shouldn't crash (that's a bug); however, there is indeed a limit: an
individual sequence may be up to 30,000 bases long. If you want something
longer, use splitseq or splitseq_da (for Directed Assembly) to chop the
sequence into smaller overlapping sections.
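
To illustrate what "smaller overlapping sections" means, here is a rough
sketch in Python. It is not the splitseq program and does not reproduce its
options or output format; the function name split_overlapping and the
1,000-base overlap are assumptions chosen for the example. Only the
30,000-base ceiling comes from the limit described above.

```python
def split_overlapping(seq, max_len=30000, overlap=1000):
    """Split seq into pieces of at most max_len bases.

    Consecutive pieces share `overlap` bases so that an assembler can
    join them again. Returns a list of (start_offset, piece) tuples.
    Hypothetical helper for illustration only; not part of Staden.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    if not seq:
        return []

    step = max_len - overlap  # how far each new piece advances
    pieces = []
    start = 0
    while True:
        pieces.append((start, seq[start:start + max_len]))
        if start + max_len >= len(seq):
            break  # this piece reached the end of the sequence
        start += step
    return pieces


if __name__ == "__main__":
    genomic = "ACGT" * 25000  # a 100,000-base stand-in sequence
    for offset, piece in split_overlapping(genomic):
        print(offset, len(piece))
```

With these assumed defaults each piece starts 29,000 bases after the previous
one, so a 100kb fragment becomes four pieces and a 300kb fragment eleven.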

The good news is that this limitation has now been removed, so the 2002
release will allow sequences of any length.

> I've tried increasing the maxseq and maxdb parameters, but to no
> effect.  Could the maximum sequence size not be increased beyond 30kb? 
> It is very useful to bring long genomic contigs into an assembly, but
> this seems not to be possible at present.

The maxseq parameter controls the total maximum consensus sequence length, not 
individual sequence length. It defaults to 100K, so you will indeed need to
increase this before assembling.

Maxdb controls the maximum total number of readings + contigs in the
database. It defaults to 8000 (or 10000, I forget which), but it too can be
increased to whatever you like.

These parameters may be removed in the future, but they're still in our
current in-house code. Practically speaking, they're an annoyance but not a
limitation. Some of our users have databases with over 100,000 sequences and
many megabases of consensus.

James
--
James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Fax: (+44) 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/




More information about the Staden mailing list
