DTANG%UDESVM at PUCC.PRINCETON.EDU (Denis Tang) writes:
>> Does anybody know about a place where I will be able to do TFASTA searches?
>> I think netservers at EMBL and GENBANK and also GENIUS cannot do that type of
>> search. I have never used TFASTA but what is its advantage compared to the
>> regular FASTA? When should someone use FASTA or TFASTA?
>>
This is the man page on our system, genbank.bio.net:
FASTA/TFASTA/LFASTAv1.5(1)    USER COMMANDS    FASTA/TFASTA/LFASTAv1.5(1)
NAME
fasta - scan a protein or DNA sequence library for similar
sequences
tfasta - compare a protein sequence to a DNA sequence
library, translating the DNA sequence library `on-the-fly'.
lfasta - compare two protein or DNA sequences for local
similarity and show the local sequence alignments
plfasta - compare two sequences for local similarity and
plot the local sequence alignments
SYNOPSIS
fasta [-a -b # -c # -d # -[f|k] -g # -l FASTLIBS -r STATFILE
-m # -o -p # -Q -s SMATRIX -w # -1 ] query-sequence-file
library-file [ ktup ]
fasta [-Qacglmoprsw] query-file @library-name-file
fasta [-Qacglmoprsw] query-file "%PRMVI"
fasta [-acglmoprsw] - interactive mode
tfasta [-abcdfgmoprsw3] protein-query-file DNA-library [ ktup ]
lfasta [-ampsw] sequence-file-1 sequence-file-2 [ ktup ]
plfasta [-ampsv] sequence-file-1 sequence-file-2 [ ktup ]
DESCRIPTION
fasta is used to compare a protein or DNA sequence to all of
the entries in a sequence library. For example, fasta can
compare a protein sequence to all of the sequences in the
NBRF PIR protein sequence database. fasta will automatically
decide whether the query sequence is DNA or protein by
reading the query sequence as protein and determining
whether the `amino-acid composition' is more than 85%
A+C+G+T. fasta uses an improved version of the rapid
sequence comparison algorithm described by Lipman and Pearson
(Science, (1985) 227:1427); the improved version is described
in Pearson and Lipman, Proc. Natl. Acad. Sci. USA, (1988) 85:2444.
The program can be invoked either with command line arguments or in
interactive mode. The optional third argument, ktup, sets
the sensitivity and speed of the search. If ktup=2, similar
regions in the two sequences being compared are found by
looking at pairs of aligned residues; if ktup=1, single
aligned amino acids are examined. ktup can be set to 2 or 1
for protein sequences, or from 1 to 6 for DNA sequences.
The default if ktup is not specified is 2 for proteins and 6
for DNA.
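
The two ideas in this paragraph (guessing the sequence type from its A+C+G+T content, and finding similar regions from runs of ktup matching residues) can be illustrated with a rough Python sketch. This is not the program's own code: the function names are made up for illustration, and the diagonal counting below only shows the word-matching idea, not the actual fasta scoring.

```python
from collections import Counter, defaultdict


def looks_like_dna(sequence: str, threshold: float = 0.85) -> bool:
    """Guess DNA vs. protein: DNA if more than 85% of residues are A, C, G or T."""
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    acgt = sum(ch in "ACGT" for ch in residues)
    return acgt / len(residues) > threshold


def default_ktup(sequence: str) -> int:
    """Defaults stated in the man page: 2 for proteins, 6 for DNA."""
    return 6 if looks_like_dna(sequence) else 2


def diagonal_hits(query: str, target: str, ktup: int) -> Counter:
    """Count exact ktup-length word matches on each diagonal (query pos - target pos).

    Hypothetical helper: a simplified picture of 'looking at pairs of
    aligned residues' (ktup=2) or single residues (ktup=1).
    """
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    hits = Counter()
    for j in range(len(target) - ktup + 1):
        for i in positions.get(target[j:j + ktup], ()):
            hits[i - j] += 1
    return hits
```

A diagonal with many hits marks a region where the two sequences line up without gaps. A smaller ktup finds more (and shorter) matches, which is why it is more sensitive and slower.
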
fasta compares a query sequence to a sequence library, which
consists of sequence data interspersed with comments (see
below). Normally fasta and tfasta search the libraries
listed in the file pointed to by the environment variable
FASTLIBS. The format of this file is described in the file
FASTA.DOC. tfasta compares a protein sequence to a DNA
sequence database, translating the DNA sequence library in 6
frames `on-the-fly' (3 frames with the -3 option). The
search uses the standard PAM250 scoring matrix and
ktup=2 by default. tfasta searches a DNA sequence database
in the standard text format described below.
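
To make "translating in 6 frames" concrete, here is a small Python sketch of six-frame translation: three reading frames on the given strand and three on its reverse complement. It assumes the standard genetic code and is only an illustration; the names are hypothetical and tfasta's own implementation may differ.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, offset: int = 0) -> str:
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> list:
    """Frames 0-2 of the forward strand, then frames 0-2 of the reverse strand."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand, offset) for strand in strands for offset in range(3)]
```

Each of the six translated sequences can then be compared with the protein query in the same way as an entry from a protein library.
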
The lfasta and plfasta programs compare two sequences looking
for local sequence similarities. While fasta and tfasta
report only the best alignment between the query sequence
and the library sequence, lfasta and plfasta will report all
of the alignments between the two sequences with scores
greater than a cut-off value. lfasta shows the actual local
alignments between the two sequences and their scores, while
plfasta produces a plot of the alignments that looks similar
to a `dot-matrix' homology plot. On Unix systems, plfasta
generates Tektronix output that can either be displayed on a
Tektronix terminal or piped through the tek2ps program for
output on a laser printer. On MS-DOS systems, plfasta
uses the graphics capabilities of the computer screen
together with the *.BGI graphics device drivers supplied by
Borland with Turbo `C'.
The fasta programs use a standard text format sequence file.
Lines beginning with '>' or ';' are considered comments and
ignored. Sequences can be upper or lower case; blanks, tabs,
and unrecognizable characters are ignored. fasta expects
sequences to use the single letter amino acid codes (see
protcodes(1)). Library files for fasta should have the form
shown below.
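
The reading rules just described can be sketched in a few lines of Python. This is a guess at the behaviour from the text above, not the program's own parser: it assumes that each '>' line starts a new library entry and that only letters count as recognizable sequence characters. The function name is hypothetical.

```python
def read_library(lines):
    """Return a list of (comment, sequence) pairs from text-format lines.

    Assumptions (not stated in the man page): a '>' line begins a new
    entry and its text is kept as the entry's name; ';' lines are dropped.
    """
    entries = []
    header, residues = "", []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Close the previous entry, if there was one.
            if header or residues:
                entries.append((header, "".join(residues)))
            header, residues = line[1:].strip(), []
        elif line.startswith(";"):
            continue  # pure comment line, ignored
        else:
            # Case does not matter; blanks, tabs and other non-letters are skipped.
            residues.extend(ch.upper() for ch in line if ch.isalpha())
    if header or residues:
        entries.append((header, "".join(residues)))
    return entries
```

For example, `read_library(open("library.aa"))` would return one pair per entry, where `library.aa` stands for any file in this format.
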
OPTIONS
fasta and the other programs can be directed to change the
scoring matrix, search parameters, output format, and
default search directories by entering options on the command
line (preceded by a `-', or by a `/' for MS-DOS). All of the
options should precede the file name and ktup arguments.
Alternatively, these options can be changed by setting
environment variables. The options and environment variables
are:
-1 Normally, the top scoring sequences are ranked by their
initn score. By using the -1 option, sequences are
ranked by their init1 score.
-a (SHOWALL) Modifies the display of the two sequences in
alignments. Normally, both sequences are shown only where
they overlap (SHOWALL=0). If -a is given or the environment
variable SHOWALL = 1, both sequences are shown in their
entirety.
-b # The number of similarity scores to be shown when the -Q
option is used. This value is usually calculated based
on the actual scores.
-c # (OPTCUT) The threshold for optimization with the -o
option. The OPTCUT value is normally calculated based
on sequence length.
-d # The number of alignments to be shown. Normally, fasta
shows the same number of alignments as similarity
scores. By using fasta -Q -b 200 -d 50, one would see
the top scoring 200 sequences and alignments for the 50
best scores.
-f | -k
(PAMFACT) This version of fasta uses a more sensitive
method for identifying initial regions. Instead of
using a constant factor (fact) for each match in a
ktup, it uses the scoring matrix (PAM) scores. While
this works well for protein sequences, it has not been
as carefully tested for DNA sequences, so by default,
this modification is used for proteins but not for DNA.
The -f option forces this option on. -k forces it off.
Setting the PAMFACT environment variable to 1 forces
the option on; PAMFACT=0 turns it off.
-g # (GAPCUT) Sets the threshold for joining the initial
regions for calculating the initn score.
-l str (FASTLIBS) The name of the library menu file. Normally
this will be determined by the environment variable
FASTLIBS. However, a library menu file can also be
specified with -l.
-m # (MARKX) =1,2,3. Alternate display of matches and
mismatches in alignments. MARKX=1 uses ":", ".", and " " for
identities, conservative replacements, and non-conservative
replacements, respectively. MARKX=2 uses " ", "x", and "X".
MARKX=3 does not show the second sequence, but uses the
second alignment line to display matches with a "." for
identity, or with the mismatched residue for mismatches.
MARKX=3 is useful for aligning large numbers of similar
sequences.
-o Causes fasta to perform a limited optimization on all
of the sequences in the library with initn scores
greater than OPTCUT. This slows the program down about
5-fold, but, when combined with ktup=1, provides an
extremely sensitive sequence comparison.
-Q Quiet option. This allows fasta and tfasta to search a
database and report the results without asking any
questions. fasta -Q file library > output can be put in
the background or run at a later time with the Unix
'at' command. The number of similarity scores and
alignments displayed with the -Q option can be modified
with the -b (scores) and -d (alignments) options.
-r STATFILE Causes fasta to write out the sequence identifier,
superfamily number (if available), and similarity
scores to STATFILE for every sequence in the
library. These results are not sorted.
-s str
(SMATRIX) the filename of an alternative scoring matrix
file.
-v str
(LINEVAL) (plfasta only) plfasta and pclfasta can use
up to 4 different line styles to denote the scores of
local alignments. The scores that correspond to these
line styles can be specified with the environment variable
LINEVAL, or with the -v option. In either case, a
string with three numbers separated by spaces should be
given. This string must be surrounded by double quotation
marks. For example, LINEVAL="200 100 50" tells
plfasta to use solid lines for local alignments with
scores greater than 200, long dashed lines for scores
between 100 and 200, short dashed lines for scores
between 50 and 100, and dotted lines for scores less
than 50.
plfasta -v "200 100 50"
Normally, the values are 200, 100, and 50 for protein
sequence comparisons and 400, 200, and 100 for DNA
sequence comparisons.
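
The mapping from a three-number LINEVAL string to four line styles can be written out as a short Python sketch. It is only an illustration: the function names are hypothetical, and the handling of scores exactly equal to a threshold is an assumption, since the text does not say which side they fall on.

```python
def parse_lineval(value: str = "200 100 50"):
    """Split a LINEVAL-style string into three descending integer cut-offs."""
    parts = value.split()
    if len(parts) != 3:
        raise ValueError("expected three numbers separated by spaces")
    high, mid, low = (int(part) for part in parts)
    if not high >= mid >= low:
        raise ValueError("cut-offs should be given from highest to lowest")
    return high, mid, low


def line_style(score: int, cutoffs=(200, 100, 50)) -> str:
    """Pick one of the four line styles for a local alignment score.

    Assumption: a score equal to a cut-off falls into the lower band.
    """
    high, mid, low = cutoffs
    if score > high:
        return "solid"
    if score > mid:
        return "long dashed"
    if score > low:
        return "short dashed"
    return "dotted"
```

With the DNA defaults quoted above, `line_style(250, parse_lineval("400 200 100"))` gives "long dashed", while the same score under the protein defaults gives "solid".
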
-w # (LINLEN) Output line length for sequence alignments
(normally 60, can be set up to 200).
-3 tfasta only. Normally tfasta translates sequences in
the DNA sequence library in all six frames. Wit