[Staden] Re: Solving transposon problems on genome assembly

Bastien Chevreux bach at chevreux.org
Wed May 17 14:25:29 EST 2006

Duarte Molha wrote:
> I am working on the sequencing of a microorganism and although the main
> sequencing has been completed I am having problems due to the existence of
> many repeats (transposon regions).
> This makes my genome assembly very difficult and produces many contigs.
> Does anyone know if there is any way of improving this with the
> staden package?

Hello Duarte,

Does your sequence data have quality information? Were the reads sequenced
from libraries with different template (insert) sizes, and is this
information present in the ancillary data (EXP files carry it, as does XML
traceinfo)?

If yes, you might want to try out a number of other assemblers that are
available out there. Phrap comes to mind (there's a version with gap4
integration), or - warning, shameless plug ahead - the MIRA assembler
(which produces assemblies that can be directly imported into gap4).

If the transposons are not too large and, ideally, the different copies
differ from each other at somewhere between 0.3% and 1% (or more) of their
bases (SNPs), then I think there is a realistic chance of getting that solved
without too much hassle.
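
A rough way to check whether two transposon copies fall into that range is to
align them and count the differing columns. The Python sketch below is only an
illustration under stated assumptions: the two copies are already aligned to
the same length, gaps are written as "-", and the function name
percent_divergence is made up for this example. It is not part of MIRA, phrap
or the Staden package, and it is not how any of those assemblers tells repeats
apart.

```python
def percent_divergence(copy_a: str, copy_b: str) -> float:
    """Percentage of mismatching columns between two pre-aligned repeat copies.

    Assumption: both strings come from an existing alignment, so they have
    the same length. Columns containing a gap ('-') or an 'N' in either
    copy are skipped and do not count towards the total.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(copy_a.upper(), copy_b.upper()):
        if a in "-N" or b in "-N":
            continue  # uninformative column, ignore it
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * mismatches / compared
```

As a feel for the numbers: on a 1,500-base transposon, 0.3% corresponds to
about 4 or 5 differing bases between two copies, and 1% to 15.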

MIRA:  http://www.chevreux.org/projects_mira.html
phrap: http://www.phrap.org/
list of others: http://en.wikipedia.org/wiki/Sequence_assembly


        -- The universe has its own cure for stupidity. --
         -- Unfortunately, it doesn't always apply it. --
