Hi all,
MIRA is a sequence assembly system suited for genome and EST sequences.
Since V2.4.0rc1, the binary has no restrictions whatsoever concerning the
number of sequences it processes.
Highlights of the V2.4.0 version include:
- overall speedups in many parts thanks to new algorithms (read/read
comparison, SW alignment, pathfinder module, contig build module, read
extension)
- overall quality improvements: longer contigs with fewer errors remaining,
reliable detection and resolution of misassemblies when using clone pair
(also called templates or "double-barreled data") techniques, enhanced
'probably true' consensus computation without gaps and with consensus
quality files, improved automatic editor when using ABI 373, 377, 3100 and
3700 trace files (MEGABACE should also be ok)
- assembly for whole genomes supported for up to 10 megabases (and more for
really fast and big computers)
- EST assembly support: detection of SNPs; transcript assembly by strain
according to detected SNP bases; special routines for extreme coverage
that allow assembly of gene families with thousands of similar sequences
- additional and/or improved input and output formats: FASTA with quality,
gap4 directed assembly, phrap/consed ACE format (output only) and others
- assembly options: a plethora of options to fine-tune the assembly; these
can now also be loaded from parameter files
- data preprocessing routines for cases where external preprocessing
programs did not perform these steps or performed them incorrectly:
clipping potential vector leftovers
in sequences, support for 'screened' bases in FASTA files, own quality
clipping routines, tagging of poly-A or poly-T bases at the end of EST
sequences
- full IUPAC support in input and output files (as well as internal
computation)
- support for merging ancillary data present in EXP files or loaded from XML
trace info files (in NCBI format)
- many assembly info files generated, containing machine and human readable
statistics, cluster and assembly information
- optimised multiple alignments (no more gap base jiggling)
- possibility to load "backbones" and assemble against those sequences
- possibility to assemble several closely related strains in one go
- support for loading GenBank (gbf/gbk) files while retaining all features
and transferring them to Staden GAP4 viewers
- improved documentation and examples
- a lot more
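
For readers unfamiliar with the IUPAC codes mentioned in the list above, here
is a minimal Python sketch of what handling them involves. It is not taken
from MIRA; the function names bases_compatible and consensus_code are
hypothetical and only illustrate the standard IUPAC nucleotide code table.

```python
# Illustrative only: NOT MIRA's code. Function names are hypothetical.
# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_compatible(a, b):
    """Return True if two IUPAC codes share at least one possible base."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


def consensus_code(bases):
    """Return the single IUPAC code that covers every base seen in a column."""
    observed = set()
    for base in bases:
        observed |= set(IUPAC[base.upper()])
    key = "".join(sorted(observed))
    for code, expansion in IUPAC.items():
        if expansion == key:
            return code
    raise ValueError("no bases given")
```

For example, bases_compatible("R", "A") is true because R stands for A or G,
and consensus_code(["A", "G"]) gives "R".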
The MIRA sequence assembler is available precompiled for 32-bit and 64-bit Linux
platforms at http://chevreux.org/projects_mira.html
Regards,
Bastien
--
-- The universe has its own cure for stupidity. --
-- Unfortunately, it doesn't always apply it. --