> I would be grateful if
> anyone can advise me as to whether
> Staden software can cope with
> contiging cosmid sequences of up to
> 40kb each. I have sequences for 12
> cosmids which I am trying to contig.
The Staden Package assembly program (gap4) can handle large data sets, but
it is designed around assembling large numbers of small fragments, and each
individual fragment is limited to 4 kb. Ideally you would have the original
fragments available, so that a complete sequence assembly database could be
built, but assembly is still possible using only the cosmid consensus sequences.
It should be easy to split each of your 40 kb cosmid sequences into sets
of overlapping small fragments using the splitseq (plain sequence files) or
splitseqf (FASTA sequence files) programs, and then to assemble these with
gap4. The main pitfall with this method is that if the data are highly
repetitive, the assembly may join fragments incorrectly. The remedies are to
provide plenty of overlap between fragments (one of the splitseq parameters),
to assemble with 0% mismatch, and to ensure that all cosmid vector sequence
has been removed before splitting.
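To illustrate the idea, here is a minimal Python sketch of splitting one sequence into overlapping fragments. It is not splitseq's actual algorithm or interface: the function names `split_sequence` and `to_fasta`, the default fragment length and overlap, and the reading of the 4 kb limit as 4000 bases are all assumptions made for this example.

```python
# Hypothetical sketch, not the splitseq/splitseqf implementation.
# Assumption: the gap4 per-fragment limit of "4 kb" is taken as 4000 bases.
GAP4_MAX_FRAGMENT = 4000


def split_sequence(seq: str, frag_len: int = 1000, overlap: int = 500) -> list[str]:
    """Split seq into fragments of frag_len bases.

    Consecutive fragments share at least `overlap` bases. The last fragment
    is anchored to the end of the sequence so that no bases are dropped.
    """
    if frag_len > GAP4_MAX_FRAGMENT:
        raise ValueError("frag_len exceeds the assumed gap4 fragment limit")
    if not 0 <= overlap < frag_len:
        raise ValueError("overlap must be >= 0 and smaller than frag_len")
    if len(seq) <= frag_len:
        return [seq]
    step = frag_len - overlap
    starts = list(range(0, len(seq) - frag_len + 1, step))
    # Add a final fragment if the regular steps stop short of the end.
    if starts[-1] + frag_len < len(seq):
        starts.append(len(seq) - frag_len)
    return [seq[s:s + frag_len] for s in starts]


def to_fasta(name: str, fragments: list[str]) -> str:
    """Format fragments as FASTA records named name_1, name_2, ..."""
    lines = []
    for i, frag in enumerate(fragments, start=1):
        lines.append(f">{name}_{i}")
        lines.append(frag)
    return "\n".join(lines) + "\n"
```

With these defaults, a 40,000-base cosmid yields 79 fragments of 1000 bases, each overlapping its neighbour by 500 bases. A larger overlap gives the assembler more evidence for each join, which is the same reasoning behind the advice above for repetitive data.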