F A L CON
Fast Assembly of Large CONtigs
This is an announcement of availability of FALCON.
FALCON is a program for shotgun DNA sequence assembly.
FALCON is available via anonymous FTP from rascal.med.harvard.edu,
in [gryan.falcon].
What is FALCON?
Mailing List
Other Platforms
Program Output
/**************************************************************************/
WHAT IS FALCON?
FALCON is a program for shotgun DNA sequence assembly. Its primary
advantages are:
a) It can handle genome-size projects (it has assembled approximately
30,000 fragments, 12 megabases of raw test data, into a single 3 megabase
consensus in 10.2 hours on a DEC OSF/1 ALPHA). Larger projects should
simply be a matter of adding virtual memory.
b) It runs very fast; a typical cosmid with 8-fold coverage takes 4-10
minutes on a DEC OSF/1 ALPHA.
c) It scales well with increasing project size.
d) It is an open "software tool" with fully available source code,
written in portable ANSI "C".
e) It is portable between UNIX and VMS operating systems.
f) Input sequences can be arbitrarily large, so consensus sequences from
other projects can be incorporated into an assembly.
An article is in preparation for submission to Nucleic Acids
Research where the algorithms will be discussed in greater detail
and tests will be presented.
Tested systems: (as of 12/7/94)
DEC ALPHA running OSF/1 v1.3 using the DEC C compiler
VAX 4000/90 running VAX/VMS 5.5-2 using the VAX C compiler
DECSTATION 5000 running Ultrix 4.3
/**************************************************************************/
Mailing List
If you wish to be informed of updates, get tech-notes, and receive
announcements of further developments of this program, send e-mail
to gryan at rascal.med.harvard.edu. Announcements of major upgrades will
be posted to the newsgroups bionet.software and bionet.announce on the
Internet.
/**************************************************************************/
Other Platforms
Work is in progress on a port to SunOS. FALCON currently runs under SunOS
and produces assemblies, but the score parameter for the overlaps differs
from the one obtained on the three tested platforms. This may be a big-
vs. little-endian problem; any suggestions are welcome.
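
For readers who want to check the byte-order hypothesis on their own
machine, the following is a minimal sketch in ANSI C. It is not taken from
the FALCON source; the function names host_is_little_endian, swap_bytes_32
and read_be32 are hypothetical. The three tested platforms are
little-endian, whereas SPARC-based Sun machines are big-endian, so any code
that reads multi-byte integers through a byte pointer or from a binary file
could behave differently there.

```c
/* Sketch only: these helpers are illustrative and are NOT part of FALCON. */

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    unsigned int probe = 1;
    /* Inspect the lowest-addressed byte of the integer. */
    return *(unsigned char *)&probe == 1;
}

/* Reverse the byte order of the low 32 bits of v. */
unsigned long swap_bytes_32(unsigned long v)
{
    v &= 0xFFFFFFFFUL;
    return ((v & 0x000000FFUL) << 24) |
           ((v & 0x0000FF00UL) << 8)  |
           ((v & 0x00FF0000UL) >> 8)  |
           ((v >> 24) & 0x000000FFUL);
}

/* Assemble a 32-bit value from four bytes stored most significant
   byte first. The result is the same on any host, which is the usual
   way to avoid byte-order dependence. */
unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |
            (unsigned long)p[3];
}
```

If a score is computed from integers packed or unpacked byte by byte,
building the value with shifts as in read_be32, rather than by casting a
pointer, should give identical results on all platforms.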
/**************************************************************************/
Program Output
Output compatible with GCG Gelassemble can be produced by specifying the
-a option.
Files that are unconnected (i.e. have no 16-mers in common with any
other sequence in the project) are written to the unconnected directory
with the suffix .unc.
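
To illustrate what "no 16-mers in common" means, here is a minimal sketch
in ANSI C that tests whether two sequences share at least one 16-base
substring. It is an assumption-laden illustration, not FALCON's actual
algorithm: the names base_code, collect_kmers and share_kmer are
hypothetical, each 16-mer is packed into 32 bits at 2 bits per base,
windows containing any character other than A, C, G or T are skipped, and
the reverse complement strand is not considered.

```c
#include <stdlib.h>
#include <string.h>

#define K 16  /* word length; 16 bases fit in 32 bits at 2 bits each */

/* Map a base to a 2-bit code; returns -1 for anything else (e.g. N). */
static int base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* Store every packed K-mer of seq in out (capacity >= strlen(seq)).
   Returns the number of K-mers written. */
static size_t collect_kmers(const char *seq, unsigned long *out)
{
    unsigned long word = 0;
    size_t valid = 0, n = 0, i;

    for (i = 0; seq[i] != '\0'; i++) {
        int code = base_code(seq[i]);
        if (code < 0) {          /* ambiguous base: restart the window */
            valid = 0;
            word = 0;
            continue;
        }
        word = ((word << 2) | (unsigned long)code) & 0xFFFFFFFFUL;
        if (++valid >= K)
            out[n++] = word;
    }
    return n;
}

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if a and b share at least one K-mer, 0 if they do not,
   and -1 if memory could not be allocated. */
int share_kmer(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), na, nb, i;
    unsigned long *ka, *kb;
    int found = 0;

    if (la < K || lb < K)
        return 0;
    ka = malloc(la * sizeof *ka);
    kb = malloc(lb * sizeof *kb);
    if (ka == NULL || kb == NULL) {
        free(ka);
        free(kb);
        return -1;
    }
    na = collect_kmers(a, ka);
    nb = collect_kmers(b, kb);
    qsort(ka, na, sizeof *ka, cmp_ulong);   /* sort once, then search */
    for (i = 0; i < nb && !found; i++)
        if (bsearch(&kb[i], ka, na, sizeof *ka, cmp_ulong) != NULL)
            found = 1;
    free(ka);
    free(kb);
    return found;
}
```

Under these assumptions, a sequence for which share_kmer returns 0 against
every other sequence in the project would be the kind of file described
above as unconnected.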
By default FALCON outputs .dis and .out files. DIS files have .dis as the
suffix and highlight the discrepancies between the fragments by displaying
'-' where the sequence agrees with the consensus. OUT files have .out as
the suffix and display all the characters in the alignment.
DIS and OUT are text files.
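
As a rough illustration of the DIS convention, the sketch below (ANSI C)
builds one display line from an aligned fragment and the consensus,
writing '-' wherever the two agree and the fragment's own character
wherever they differ. The function name discrepancy_line is hypothetical,
the comparison is assumed to be an exact character match, and the exact
layout of real .dis files (names, coordinates, gap characters) is not
described here and is not reproduced.

```c
#include <stddef.h>

/* Sketch only, not FALCON code. frag and cons are assumed to be
   already aligned, column for column. out must have room for the
   shorter of the two lengths plus a terminating '\0'. */
void discrepancy_line(const char *frag, const char *cons, char *out)
{
    size_t i;

    for (i = 0; frag[i] != '\0' && cons[i] != '\0'; i++)
        out[i] = (frag[i] == cons[i]) ? '-' : frag[i];
    out[i] = '\0';
}
```

For example, the fragment ACGTTGCA aligned against the consensus ACGATGCA
would be shown as ---T----, while the corresponding OUT line would show
all eight characters of the fragment.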
/**************************************************************************/
Gary P. Gryan E-Mail: gryan at rascal.med.harvard.edu
Scientific Programmer
Harvard Medical School
Department of Genetics
Howard Hughes Medical Institute
200 Longwood Avenue
Boston, MA 02115
/**************************************************************************/