ANNOUNCE: WWWtacg (Ver 2) - Web Analysis of Nucleic Acids

Harry Mangalam mangalam at uci.edu
Thu Nov 20 16:02:33 EST 1997

                         WWWtacg Version 2 - open BETA test
                a Web Resource for the analysis of nucleic acids.

WWWtacg is the Web interface to tacg2 (a free, fast, command line program for 
unixy OSs).  The Ver2 interface allows the following improvements over the 
previous versions:

= using Don Gilbert's readseq, it allows entire *file uploads* of your sequence 
   in any of the single sequence formats that readseq can convert to raw 
   output (^).
   Here's the list direct from readseq:
         1. IG/Stanford           10. Olsen (in-only)
         2. GenBank/GB            11. Phylip3.2
         3. NBRF                  12. Phylip
         4. EMBL                  13. Plain/Raw
         5. GCG                   14. PIR/CODATA
         6. DNAStrider            15. MSF
         7. Fitch                 16. ASN.1
         8. Pearson/Fasta         17. PAUP/NEXUS
         9. Zuker (in-only)  
   (^)  due to a bug/feature in readseq that I haven't figured out, very short 
      raw sequences do not get converted properly.  I hacked tacgi2 to work 
      around this for most cases, but beware that for short (<100 b), 
      pasted-in sequences, it may fail.  Please mail me if this happens.
= you can select internal SUBSEQUENCES from your uploads, so you don't have 
   to edit the sequences to upload them.
= you can upload your own GCG-formatted REBASE files to use instead of the 
   preselected/prefiltered ones (still available).  You can use this 
   file to specify searches for multiple patterns *with errors*.
= it allows you to use tacg2's ability to handle degenerate sequence input 
   (long strings of 'n's for example), as well as searching for degenerate 
   patterns (which may have errors in them - e.g. look for 'gtyrnncgaryy', 
   allowing up to 2 errors).
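
To make the degenerate-matching idea concrete, here is a minimal Python sketch. 
It is NOT tacg2's algorithm or code (tacg2 is ANSI C, and its internals are not 
described in this announcement).  The function names are hypothetical, and it 
assumes that "errors" means mismatched positions only (no insertions or 
deletions) and that a degenerate base in the input counts as a match whenever 
it could stand for a base the pattern allows.

```python
# Standard IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg",
    "n": "acgt",
}


def compatible(pattern_code, sequence_code):
    """True if the two IUPAC codes share at least one possible base."""
    return bool(set(IUPAC[pattern_code]) & set(IUPAC[sequence_code]))


def find_degenerate(pattern, sequence, max_errors=0):
    """Return (offset, errors) for every window of the sequence that
    matches the pattern with at most max_errors mismatched positions.

    Assumption: errors are mismatches only, not insertions or deletions.
    """
    pattern = pattern.lower()
    sequence = sequence.lower()
    width = len(pattern)
    hits = []
    for offset in range(len(sequence) - width + 1):
        errors = 0
        for p, s in zip(pattern, sequence[offset:offset + width]):
            if not compatible(p, s):
                errors += 1
                if errors > max_errors:
                    break  # too many mismatches, give up on this window
        else:
            # Loop finished without exceeding the error budget.
            hits.append((offset, errors))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence with one exact occurrence of the pattern.
    print(find_degenerate("gtyrnncgaryy", "aaagtcaaacgagctttt"))
    # The same pattern, tolerating up to 2 mismatches.
    print(find_degenerate("gtyrnncgaryy", "atcaaatgagct", max_errors=2))
```

This scans every window, so it is simple rather than fast; it is only meant to 
show what "degenerate pattern with up to 2 errors" can mean in practice.
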
The Web interface incorporates most of the functionality of tacg2, with the 
following exceptions:
= it has an upper limit of 500,000 characters (unlike tacg2 itself which uses
   dynamic mem and has no upper limit)
= it does not (yet) provide an easy FORMS interface to the pattern-matching 
   parts of tacg2 (although this is in final coding).

After the beta testing, the ANSI C source code for implementing the WWWtacg 
interface (tacgi) will be made available under the same conditions as tacg2
(see below), as were the previous versions.
A brief description of tacg2's abilities follows:

tacg was originally designed for Restriction Enzyme analysis and it still 
does so (better than ever), but it has also been expanded to more general 
analyses, including:
= multiple degenerate pattern matching *with* errors, 
= handling and matching degenerate input sequence, 
= proximity matching of patterns, e.g. report only if:
   SiteA is < 200b upstream from SiteB 
   SiteA is 400-800b downstream of SiteB
= Simple Open Reading Frame analysis 
   - Finds ORFs greater than X Amino Acids in any of the 6 frames
   - Streams ORFs to stdout in FASTA format so they can be searched with other 
      pattern-matching tools
   - The FASTA comment line includes Frame, Offsets from start (in BPs, AAs), 
      Size in AAs and KDaltons
= export of data for use with external plotting/analysis tools such as gnuplot
= more features, bug fixes, robustness.
= a whole new set of bugs ;)
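
As an illustration of the simple ORF analysis listed above, here is a rough 
Python sketch.  It is not taken from tacg2 and makes several assumptions that 
the announcement does not specify: an ORF runs from the first ATG after a stop 
to the next in-frame stop (or the end of the sequence), the standard genetic 
code is used, the mass is estimated at roughly 110 Da per residue, and the 
FASTA header layout and all function names are made up for the example.

```python
# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate whole codons; anything not in the table becomes 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq, min_aa):
    """Yield ORFs longer than min_aa amino acids in all 6 frames.

    Offsets on the '-' strand are relative to the reverse complement.
    """
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            aa_pos = 0  # index of the current segment within this frame
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > min_aa:
                    aa_offset = aa_pos + start
                    yield {
                        "frame": f"{strand}{frame + 1}",
                        "bp_offset": frame + 3 * aa_offset,
                        "aa_offset": aa_offset,
                        "protein": segment[start:],
                    }
                aa_pos += len(segment) + 1  # +1 skips the stop codon


def to_fasta(orf, name="seq"):
    """Format one ORF as FASTA; the header fields are a made-up layout."""
    size = len(orf["protein"])
    kda = size * 110 / 1000  # crude average-residue-mass estimate
    header = (
        f">{name} frame={orf['frame']} bp_offset={orf['bp_offset']} "
        f"aa_offset={orf['aa_offset']} size_aa={size} approx_kDa={kda:.1f}"
    )
    return header + "\n" + orf["protein"]


if __name__ == "__main__":
    # Stream every ORF longer than 3 amino acids to stdout as FASTA.
    for orf in find_orfs("CCATGAAATTTGGGTAACC", 3):
        print(to_fasta(orf))
```

Because the records go to stdout in FASTA format, the output of a script like 
this could be piped into other pattern-matching tools, which is the point of 
the streaming behaviour described above.
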

If you are familiar with GCG's suite of programs, it includes much of the 
functionality of MAP, MAPPLOT, TRANSLATE, and FINDPATTERNS, except that:

= it's cheaper (free* - Damn that asterisk!)
= it's faster (~5 - 20 times faster for most things)
= it does a few things that GCG doesn't do.
= it can parasitize NEdit (the excellent, free Xwin text editor) to become
  a pretty slick (if simple) GUI biosequence application (and the just 
  released Nedit 5.0 gained an internal macro language, so further 
  integration will be forthcoming)
= most internals are dynamically allocated, so there are few limitations 
  on input sequence size, numbers of patterns to search for, etc.
= it runs on more platforms, including/especially Linux (Intel, Alpha, PPC), 
  IRIX, SunOS, Solaris, HP UX/Exemplar OS, NeXT, DEC Unix/OSF (anything for 
  which there's a gcc or other ANSI C compiler) 
= you can look at/scoff at/improve on the source code.

* in short, you can use the program and source code for anything you want
without charge, even in commercial ventures.  The only thing you have to 
contact me about is if you want to incorporate the program or its source 
into another program or package that you will sell for profit.  

It's not a database scanner (yet - next order of business), but it will do 
Megabase sequences quite handily.

If you decide to use it, you might be interested in subscribing to the 
(low-flow) tacg listserv to be notified of bugs found and 
squashed, new features, suggestions, etc.  The page above tells how.  

Nuff said.  If you're interested, read more (and how to get it) at:
There's an HTML-ized man page at: 
The original WWW interface is at: 
and the new one is at: 

Harry J Mangalam, MolBio+Biochem / Dev+Cell Bio, Rm 4201, BioSciII  UC
Irvine, Irvine, CA, 92717, (714) 824-4824, fax (714) 824 8598
