WWWtacg Version 2 - open BETA test
a Web Resource for the analysis of nucleic acids.
WWWtacg is the Web interface to tacg2 (a free, fast, command line program for
unixy OSs). The Ver2 interface offers the following improvements over the
original:
= using Don Gilbert's readseq, it allows entire *file uploads* of your sequence
in any of the single sequence formats that readseq can convert to raw
sequence.
Here's the list direct from readseq:
 1. IG/Stanford       10. Olsen (in-only)
 2. GenBank/GB        11. Phylip3.2
 3. NBRF              12. Phylip
 4. EMBL              13. Plain/Raw
 5. GCG               14. PIR/CODATA
 6. DNAStrider        15. MSF
 7. Fitch             16. ASN.1
 8. Pearson/Fasta     17. PAUP/NEXUS
 9. Zuker (in-only)
(^) due to a bug/feature in readseq that I haven't figured out, very short
raw sequences do not get converted properly. I hacked tacgi2 to work
around this for most cases, but beware that for short (<100 b),
pasted-in sequences, it may fail. Please mail me if this happens.
= you can select internal SUBSEQUENCES from your uploads, so you don't have
to edit the sequences to upload them.
= you can upload your own GCG-formatted REBASE files to use instead of the
preselected/prefiltered ones (still available). You can use this
file to specify searches for multiple patterns *with errors*.
= it allows you to use tacg2's ability to handle degenerate sequence input
(long strings of 'n's for example), as well as searching for degenerate
patterns (which may have errors in them - ie. Look for 'gtyrnncgaryy',
allowing up to 2 errors).
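To make the "degenerate pattern with errors" idea concrete, here is a minimal Python sketch. It is not tacg2's actual algorithm (tacg2 is written in C and its internals are not described here); the function names are hypothetical, and it assumes that an "error" means a mismatched position (no insertions or deletions) and that a degenerate base in the input sequence matches any pattern code it shares a base with.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg",
    "n": "acgt",
}


def compatible(p, s):
    """True if pattern code p and sequence code s share at least one base."""
    return bool(set(IUPAC[p]) & set(IUPAC[s]))


def find_with_errors(pattern, sequence, max_errors):
    """Return (offset, errors) for every window with <= max_errors mismatches.

    Hypothetical helper: simple sliding-window scan, mismatches only.
    """
    pattern, sequence = pattern.lower(), sequence.lower()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        errors = 0
        for p, s in zip(pattern, sequence[start:]):
            if not compatible(p, s):
                errors += 1
                if errors > max_errors:
                    break  # too many mismatches, give up on this window
        else:
            # Loop finished without break: the window is an acceptable hit.
            hits.append((start, errors))
    return hits


if __name__ == "__main__":
    # The pattern from the text, allowing up to 2 errors.
    for offset, errors in find_with_errors("gtyrnncgaryy",
                                           "ttgtcaaacgaacctt", 2):
        print(offset, errors)
```

Offsets are 0-based positions in the input sequence. A scan like this costs time proportional to sequence length times pattern length, which is fine for illustration but says nothing about how tacg2 achieves its speed.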
The Web interface incorporates most of the functionality of tacg2, with the
following exceptions:
= it has an upper limit of 500,000 characters (unlike tacg2 itself which uses
dynamic mem and has no upper limit)
= it does not (yet) provide an easy FORMS interface to the pattern-matching
parts of tacg2 (although this is in final coding).
After the beta testing, the ANSI C source code for implementing the WWWtacg
interface (tacgi) will be made available under the same conditions as tacg2
(see below), as were the previous versions.
A brief description of tacg2's abilities follows:
tacg was originally designed for Restriction Enzyme analysis and it still
does so (better than ever), but it has also been expanded to more general
sequence analysis, including:
= multiple degenerate pattern matching *with* errors,
= handling and matching degenerate input sequence,
= proximity matching of patterns. ie. report only if:
SiteA is < 200b upstream from SiteB
SiteA is 400-800b downstream of SiteB
= Simple Open Reading Frame analysis
- Finds ORFs greater than X Amino Acids in any of the 6 frames
- Streams ORFs to stdout in FASTA format so they can be searched with other
  programs
- The FASTA comment line includes Frame, Offsets from start (in BPs, AAs),
  Size in AAs and KDaltons
= export of data for use with external plotting/analysis tools such as gnuplot
= more features, bug fixes, robustness.
= a whole new set of bugs ;)
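The ORF analysis described in the list above can be sketched roughly as follows. This is an illustration in Python, not tacg2's code, and it rests on several assumptions that the text does not confirm: an ORF runs from ATG to the next in-frame stop codon, the standard genetic code applies, minus-strand offsets are counted along the reverse-complemented sequence, and mass is estimated at about 110 Da per residue. The function names and the header layout are hypothetical; tacg2's real comment line may differ.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa):
    """Yield (strand, frame, offset_bp, protein) for ORFs longer than min_aa.

    Assumption: an ORF is ATG ... stop; ORFs with no stop codon are dropped.
    """
    seq = seq.lower()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start, protein = None, []
            for i in range(frame, len(s) - 2, 3):
                aa = CODON_TABLE.get(s[i:i + 3], "X")  # X: degenerate codon
                if start is None:
                    if aa == "M":
                        start, protein = i, ["M"]
                elif aa == "*":
                    if len(protein) > min_aa:
                        yield strand, frame + 1, start, "".join(protein)
                    start = None
                else:
                    protein.append(aa)


def to_fasta(orfs, name="seq"):
    """Format ORFs as FASTA records (hypothetical header layout)."""
    for strand, frame, start, protein in orfs:
        kda = len(protein) * 0.110  # assumed ~110 Da average residue mass
        yield (f">{name} frame={strand}{frame} offset_bp={start} "
               f"offset_aa={start // 3} size_aa={len(protein)} "
               f"approx_kDa={kda:.1f}\n{protein}")


if __name__ == "__main__":
    for record in to_fasta(find_orfs("atggctaaatga", 2)):
        print(record)
```

Because the records go to stdout in FASTA format, the output of a script like this could be piped directly into any program that reads FASTA, which is the point of the streaming design mentioned above.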
If you are familiar with GCG's suite of programs, it includes much of the
functionality of MAP, MAPPLOT, TRANSLATE, and FINDPATTERNS, except that:
= it's cheaper (free* - Damn that asterisk!)
= it's faster (~5 - 20 times faster for most things)
= it does a few things that GCG doesn't do.
= it can parasitize NEdit (the excellent, free Xwin text editor) to become
a pretty slick (if simple) GUI biosequence application (and the
just-released NEdit 5.0 gained an internal macro language, so further
integration will be forthcoming)
= most internals are dynamically allocated, so there are few limitations
on input sequence size, numbers of patterns to search for, etc.
= it runs on more platforms, including/especially Linux (Intel, Alpha, PPC),
IRIX, SunOS, Solaris, HP-UX/Exemplar OS, NeXT, DEC Unix/OSF (anything for
which there's a gcc or other ANSI C compiler)
= you can look at/scoff at/improve on the source code.
* in short, you can use the program and source code for anything you want
without charge, even in commercial ventures. The only thing you have to
contact me about is if you want to incorporate the program or its source
into another program or package that you will sell for profit.
It's not a database scanner (yet - next order of business), but it will do
Megabase sequences quite handily.
If you decide to use it, you might be interested in subscribing to the
(low-flow) tacg listserv to be notified of bugs found and
squashed, new features, suggestions, etc. The page above tells how.
Nuff said. If you're interested, read more (and how to get it) at:
There's an HTML-ized man page at:
The original WWW interface is at:
and the new one is at:
Harry J Mangalam, MolBio+Biochem / Dev+Cell Bio, Rm 4201, BioSciII UC
Irvine, Irvine, CA, 92717, (714) 824-4824, fax (714) 824 8598