I am pleased to announce that the MIRA 2.9.8 sequence assembler is available.
This is the first publicly available version of the assembler that is able
to assemble (de novo) sequences obtained from Roche instruments like the
GS20 and GS20FLX.
MIRA is furthermore able to perform true hybrid sequence assembly. That is,
instead of assembling *the* *consensus* of 454 data with Sanger reads, MIRA
assembles 454 *reads* together with Sanger reads. An example of how this
looks when assembled against a backbone is shown at
http://chevreux.org/mira_ex_454sanger.html where one can also see how a
hybrid strategy helps to overcome the sequencing errors that are typical
of either technology.
As known from the earliest MIRA versions since 1999 (see
http://www.bioinfo.de/isb/gcb99/talks/chevreux/), the repeat resolving
algorithms are able to (more or less cleanly) separate reads from different
locations as long as a single base distinguishes the reads of the
different repeat copies. This should alleviate a little bit the repeat
Please note, though, that 2.9.8 is still in development and not yet fully
optimised throughout all the algorithms. Therefore, MIRA 2.9.8 should NOT
be used for production assembly but rather as a testing version to gather
feedback from parties interested in hybrid assembly strategies. Also, one
needs a fast machine and a considerable amount of memory. As a rough
estimate: per million 454 GS20 reads, one needs some 5-6 GB RAM, ~10 GB
disk and ~48-72 hours of computation time.
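
To get a feeling for what the rough estimate above means for a given
project, here is a minimal Python sketch that scales the quoted
per-million-read figures to another read count. The function name is
hypothetical and the linear scaling is only an assumption made for
illustration; actual requirements depend on the data set and will likely
deviate from this.

```python
def estimate_resources(num_reads):
    """Scale the rough per-million-read figures to num_reads.

    Assumption (not from MIRA itself): RAM, disk and run time grow
    linearly with the number of 454 GS20 reads.
    """
    millions = num_reads / 1_000_000
    return {
        "ram_gb": (5 * millions, 6 * millions),       # 5-6 GB per million reads
        "disk_gb": 10 * millions,                     # ~10 GB per million reads
        "cpu_hours": (48 * millions, 72 * millions),  # ~48-72 h per million reads
    }


if __name__ == "__main__":
    # Example: a project with two million reads.
    for name, value in estimate_resources(2_000_000).items():
        print(name, value)
```

The ranges are returned as (low, high) pairs so that the uncertainty of the
original estimate stays visible.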
A version of MIRA 2.9.8 for 64bit Linux machines can be downloaded from
To learn how to use MIRA with 454 data (the provided documentation is
unfortunately not entirely up-to-date), please have a look at the scripts
provided in the example project of "Streptococcus pneumoniae TIGR4" which
is also available for download there.