FALCON - DNA sequence assembly software

Gary Gryan gryan at rascal.med.harvard.edu
Tue Dec 20 13:26:05 EST 1994

    F    A           L     CON
    Fast Assembly of Large CONtigs

  This is an announcement of the availability of FALCON.
  FALCON is a program for shotgun DNA sequence assembly.
  FALCON is available via anonymous FTP from rascal.med.harvard.edu
  in [gryan.falcon].

What is FALCON?
  FALCON is a program for shotgun DNA sequence assembly.  Its primary 
  advantages are:

  a) It can handle genome-size projects (it has assembled approximately
  30,000 fragments, 12 megabases of raw test data, into a single
  3 megabase consensus in 10.2 hours on a DEC OSF/1 ALPHA). Larger projects
  should simply be a matter of adding virtual memory.

  b) It runs very fast; a typical cosmid with 8-fold coverage takes 4-10
  minutes on a DEC OSF/1 ALPHA.

  c) It scales well with increasing project size.

  d) It is an open "software tool" with fully available source code,
  written in portable ANSI "C".

  e) It is portable between UNIX and VMS operating systems.

  f) Input sequences can be arbitrarily large, so consensus sequences from
  other projects can be incorporated into an assembly.

  An article that discusses the algorithms in greater detail and
  presents test results is in preparation for submission to
  Nucleic Acids Research.

  Tested systems (as of 12/7/94):
    DEC ALPHA running OSF/1 v1.3        DEC C compiler
    VAX 4000/90 running VAX/VMS 5.5-2   VAX C compiler
    DECSTATION 5000 running Ultrix 4.3

Mailing List
  If you wish to be informed of updates, get tech-notes, and receive
  announcements of further developments of this program, send e-mail
  to gryan at rascal.med.harvard.edu.  Announcements of major upgrades will 
  be posted to the newsgroups bionet.software and bionet.announce on the 
Other Platforms

  Work is in progress on porting to SunOS; currently FALCON runs under SunOS
  and produces assemblies, but the score parameter for the overlaps
  differs from that on the other three tested platforms.  This may be a
  big-endian vs. little-endian problem; any suggestions are welcome.
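
  For readers looking into the byte-order hypothesis: the three tested
  DEC platforms are little-endian, while SunOS on SPARC is big-endian.
  The following is only a sketch of how byte-order dependence is usually
  checked and avoided in portable C; it is not taken from the FALCON
  source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 on a little-endian host, 0 on a big-endian host.
 * Looks at the first byte in memory of the integer value 1. */
int host_is_little_endian(void)
{
    unsigned int probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Assemble a 32-bit value from four bytes stored most-significant
 * byte first.  Because it uses shifts rather than a pointer cast,
 * the result is the same on big-endian and little-endian hosts. */
unsigned long read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |
            (unsigned long)p[3];
}
```

  Code that builds integers from character data through casts or unions
  can give different values on the two kinds of host; building them with
  shifts, as in read_be32, does not.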

Program Output
  Output compatible with GCG Gelassemble is possible by specifying the -a

  Files which are unconnected (i.e. have no 16-mers in common with any
  other sequences in the project) are written to the unconnected directory
  with the suffix .unc.
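
  The "no 16-mers in common" test can be illustrated with a minimal
  sketch in C.  This is a brute-force comparison written for clarity,
  not FALCON's actual method (which is not described here); it compares
  sequences as given and does not consider reverse complements.  The
  names shares_kmer and is_unconnected are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KMER_LEN 16   /* word size named in the announcement */

/* Return 1 if a and b share at least one exact KMER_LEN-mer, else 0.
 * Brute force: compares every window of a with every window of b. */
int shares_kmer(const char *a, const char *b)
{
    size_t la, lb, i, j;

    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    if (la < KMER_LEN || lb < KMER_LEN)
        return 0;   /* too short to contain any 16-mer */

    for (i = 0; i + KMER_LEN <= la; i++)
        for (j = 0; j + KMER_LEN <= lb; j++)
            if (memcmp(a + i, b + j, KMER_LEN) == 0)
                return 1;
    return 0;
}

/* Return 1 if seqs[idx] shares no 16-mer with any other sequence. */
int is_unconnected(const char *seqs[], size_t n, size_t idx)
{
    size_t k;

    assert(seqs != NULL && idx < n);
    for (k = 0; k < n; k++)
        if (k != idx && shares_kmer(seqs[idx], seqs[k]))
            return 0;
    return 1;
}
```

  A sequence for which is_unconnected returns 1 corresponds to one that
  would be set aside as unconnected.  For projects of the size mentioned
  above, an index of 16-mers would be needed in place of the pairwise
  loop.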

  By default FALCON outputs .dis and .out files.  DIS files have .dis as the
  suffix and highlight the discrepancies between the fragments by displaying
  '-' where the sequence agrees with the consensus.  OUT files have .out as
  the suffix and display all the characters in the alignment.
  DIS and OUT are text files.
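
  The difference between the two displays can be sketched as follows.
  This is an illustration of the rule stated above, not FALCON's output
  code: the function name make_dis_line is hypothetical, the comparison
  is assumed to be case-sensitive, and how alignment gaps are shown is
  not addressed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build a discrepancy-style line in out: '-' where the aligned
 * fragment agrees with the consensus, the fragment's own character
 * where it differs.  Both inputs must have the same length, and out
 * must have room for that length plus the terminating '\0'. */
void make_dis_line(const char *consensus, const char *fragment, char *out)
{
    size_t i, len;

    assert(consensus != NULL && fragment != NULL && out != NULL);
    len = strlen(consensus);
    assert(strlen(fragment) == len);

    for (i = 0; i < len; i++)
        out[i] = (fragment[i] == consensus[i]) ? '-' : fragment[i];
    out[len] = '\0';
}
```

  For a consensus of ACGTACGT and an aligned fragment of ACGAACGT, the
  line produced is ---A----, whereas an OUT-style line would show the
  fragment unchanged.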

Gary P. Gryan			E-Mail:   gryan at rascal.med.harvard.edu
Scientific Programmer           
Harvard Medical School          
Department of Genetics		
Howard Hughes Medical Institute
200 Longwood Avenue
Boston, MA 02115
