
TFASTA or FASTA?

Will Nelson will at turbo.bio.net
Thu Oct 3 10:26:22 EST 1991


DTANG%UDESVM at PUCC.PRINCETON.EDU (Denis Tang) writes:

>>    Does anybody know about a place where I will be able to do TFASTA searches?
>> I think netservers at EMBL and GENBANK and also GENIUS cannot do that type of
>> search.  I have never used TFASTA but what is its advantage compare to the
>> regular FASTA?  When should someone use FASTA or TFASTA?
>> 

This is the man page on our system, genbank.bio.net:

FASTA/TFASTA/LFASTAv1.5(1)   USER COMMANDS   FASTA/TFASTA/LFASTAv1.5(1)



NAME
     fasta - scan a protein or DNA sequence library  for  similar
     sequences

     tfasta - compare  a  protein  sequence  to  a  DNA  sequence
     library, translating the DNA sequence library `on-the-fly'.

     lfasta - compare two protein  or  DNA  sequences  for  local
     similarity and show the local sequence alignments

     plfasta - compare two sequences  for  local  similarity  and
     plot the local sequence alignments


SYNOPSIS
     fasta [-a -b # -c # -d # -[f|k] -g # -l FASTLIBS -r STATFILE
     -m # -o -p # -Q -s SMATRIX -w # -1 ] query-sequence-file
     library-file [ ktup ]

     fasta [-Qacglmoprsw] query-file @library-name-file

     fasta [-Qacglmoprsw] query-file "%PRMVI"

     fasta [-acglmoprsw] - interactive mode

     tfasta  [-abcdfgmoprsw3]  protein-query-file  DNA-library  [
     ktup ]

     lfasta [-ampsw] sequence-file-1 sequence-file-2 [ ktup ]

     plfasta [-ampsv] sequence-file-1 sequence-file-2 [ ktup ]


DESCRIPTION
     fasta is used to compare a protein or DNA sequence to all of
     the entries in a sequence library.  For example, fasta can
     compare a protein sequence to all of the sequences in the
     NBRF PIR protein sequence database.  fasta will automatically
     decide whether the query sequence is DNA or protein by
     reading the query sequence as protein and determining
     whether the `amino-acid composition' is more than 85%
     A+C+G+T.  fasta uses an improved version of the rapid
     sequence comparison algorithm described by Lipman and
     Pearson (Science, (1985) 227:1427); the improved version is
     described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA,
     (1988) 85:2444.  The program can be invoked either with
     command line arguments or in interactive mode.  The optional
     third argument, ktup, sets the sensitivity and speed of the
     search.  If ktup=2, similar regions in the two sequences
     being compared are found by looking at pairs of aligned
     residues; if ktup=1, single aligned amino acids are
     examined.  ktup can be set to 1 or 2 for protein sequences,
     or from 1 to 6 for DNA sequences.  If ktup is not specified,
     the default is 2 for proteins and 6 for DNA.
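
     As a rough illustration of the paragraph above, the Python
     sketch below shows the 85% A+C+G+T test, the default ktup
     choice, and a simple count of shared ktup-length words.  It
     is not the fasta source code.  The function names are
     hypothetical, and grouping word hits by diagonal is a
     simplification of the published method rather than
     something stated in this man page.

```python
from collections import defaultdict


def looks_like_dna(sequence, threshold=0.85):
    """Return True if more than `threshold` of the letters are A, C, G or T."""
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return False
    acgt = sum(ch in "ACGT" for ch in letters)
    return acgt / len(letters) > threshold


def default_ktup(sequence):
    """Default word size given in the text: 6 for DNA, 2 for protein."""
    return 6 if looks_like_dna(sequence) else 2


def ktup_hits_by_diagonal(query, library_seq, ktup):
    """Count identical ktup-length words shared by two sequences.

    Hits are grouped by diagonal (library offset minus query offset).
    This bookkeeping is an assumption made for illustration.
    """
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    hits = defaultdict(int)
    for j in range(len(library_seq) - ktup + 1):
        for i in positions.get(library_seq[j:j + ktup], ()):
            hits[j - i] += 1
    return dict(hits)
```

     A smaller ktup produces more word hits per comparison,
     which is why it trades speed for sensitivity.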

     fasta compares a query sequence to a sequence library, which
     consists of sequence data interspersed with comments (see
     below).  Normally fasta and tfasta search the libraries
     listed in the file pointed to by the environment variable
     FASTLIBS.  The format of this file is described in the file
     FASTA.DOC.  tfasta compares a protein sequence to a DNA
     sequence database, translating the DNA sequence library in 6
     frames `on-the-fly' (3 frames with the -3 option).  The
     search uses the standard PAM250 scoring matrix and ktup=2
     by default.  tfasta searches a DNA sequence database in the
     standard text format described below.
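
     To make the `on-the-fly' translation concrete, here is a
     minimal Python sketch of translating one DNA entry in six
     frames.  It uses the standard genetic code and is not the
     tfasta implementation.  The function names are
     hypothetical, and treating the 3-frame case as "forward
     strand only" is an assumption: the text above says only
     that -3 gives 3 frames.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate one reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translated_frames(dna, forward_only=False):
    """Return 6 translations, or 3 when forward_only is set (assumption)."""
    strands = [dna] if forward_only else [dna, reverse_complement(dna)]
    return [translate(strand, frame) for strand in strands for frame in range(3)]
```

     Each translated frame can then be compared with the protein
     query just as fasta would compare two protein sequences.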

     The lfasta and plfasta programs compare two sequences,
     looking for local sequence similarities.  While fasta and
     tfasta report only the best alignment between the query
     sequence and the library sequence, lfasta and plfasta will
     report all of the alignments between the two sequences with
     scores greater than a cut-off value.  lfasta shows the actual
     local alignments between the two sequences and their scores,
     while plfasta produces a plot of the alignments that looks
     similar to a `dot-matrix' homology plot.  On Unix systems,
     plfasta generates Tektronix output that can either be
     displayed on a Tektronix terminal or piped through the tek2ps
     program for output on a laser printer.  On MS-DOS systems,
     plfasta uses the graphics capabilities of the computer screen
     together with the *.BGI graphics device drivers supplied by
     Borland with Turbo `C'.

     The fasta programs use a standard text format sequence file.
     Lines beginning with '>' or ';' are considered comments and
     ignored.  Sequences can be upper or lower case; blanks, tabs,
     and unrecognizable characters are ignored.  fasta expects
     sequences to use the single-letter amino acid codes (see
     protcodes(1)).  Library files for fasta should have the form
     shown below.
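
     The reading rules in the paragraph above can be sketched in
     Python as follows.  This is an illustration only, not the
     fasta parser.  The function name is hypothetical, and two
     choices are assumptions: every ASCII letter is treated as a
     recognizable character, and the result is folded to upper
     case.

```python
def read_sequence(text):
    """Extract one sequence from text in the format described above."""
    residues = []
    for line in text.splitlines():
        # Lines beginning with '>' or ';' are comments and are skipped.
        if line.startswith((">", ";")):
            continue
        # Blanks, tabs, digits and other non-letters are dropped.
        residues.extend(
            ch.upper() for ch in line if ch.isascii() and ch.isalpha()
        )
    return "".join(residues)
```

     For example, read_sequence(">demo\nacd efg\n") returns
     "ACDEFG".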

OPTIONS
     fasta and the other programs can be directed to change the
     scoring matrix, search parameters, output format, and
     default search directories by entering options on the
     command line (preceded by a `-' or `/' for MS-DOS).  All of
     the options should precede the file name and ktup arguments.
     Alternatively, these options can be changed by setting
     environment variables.  The options and environment
     variables are:


     -1   Normally, the top scoring sequences are ranked by their
          initn score.  By using the -1 option, sequences are
          ranked by their init1 score.

     -a   (SHOWALL) Modifies the display of the two sequences in
          alignments.  Normally, both sequences are shown only
          where they overlap (SHOWALL=0); if -a or the
          environment variable SHOWALL = 1, both sequences are
          shown in their entirety.

     -b # The number of similarity scores to be shown when the -Q
          option is used.  This value is usually calculated based
          on the actual scores.

     -c # (OPTCUT) The threshold for  optimization  with  the  -o
          option.   The OPTCUT value is normally calculated based
          on sequence length.

     -d # The number of alignments to be shown.  Normally,  fasta
          shows  the  same  number  of  alignments  as similarity
          scores.  By using fasta -Q -b 200 -d 50, one would  see
          the top scoring 200 sequences and alignments for the 50
          best scores.

     -f | -k
          (PAMFACT) This version of fasta uses a  more  sensitive
          method  for  identifying  initial  regions.  Instead of
          using a constant factor (fact)  for  each  match  in  a
          ktup,  it  uses the scoring matrix (PAM) scores.  While
          this works well for protein sequences, it has not  been
          as  carefully  tested for DNA sequences, so by default,
          this modification is used for proteins but not for DNA.
          The -f option forces this option on.  -k forces it off.
          Setting the PAMFACT environment variable  to  1  forces
          the option on; PAMFACT=0 turns it off.

     -g # (GAPCUT) Sets the threshold  for  joining  the  initial
          regions for calculating the initn score.

     -l str
          (FASTLIBS) The name of the library menu file.  Normally
          this will be determined by the environment variable
          FASTLIBS.  However, a library menu file can also be
          specified with -l.

     -m # (MARKX) =1,2,3.  Alternate display of matches and
          mismatches in alignments.  MARKX=1 uses ":", ".", and " "
          for identities, conservative replacements, and
          non-conservative replacements, respectively.  MARKX=2
          uses " ", "x", and "X".  MARKX=3 does not show the second
          sequence, but uses the second alignment line to display
          matches with a "." for identity, or with the mismatched
          residue for mismatches.  MARKX=3 is useful for aligning
          large numbers of similar sequences.
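
     The three MARKX styles can be illustrated with a short
     Python sketch.  It is not the fasta display code.  The
     function names are hypothetical, the tiny table of
     conservative pairs is a toy stand-in for whatever the
     scoring matrix would decide, and gaps are not handled.

```python
# Symbols for (identity, conservative, non-conservative), per the text.
MARKX_SYMBOLS = {1: (":", ".", " "), 2: (" ", "x", "X")}

# Toy assumption: only these pairs count as conservative replacements.
TOY_CONSERVATIVE = {frozenset("IL"), frozenset("KR"), frozenset("DE")}


def toy_is_conservative(a, b):
    """Illustrative test only; a real program would consult its matrix."""
    return frozenset((a, b)) in TOY_CONSERVATIVE


def match_line(seq1, seq2, markx, is_conservative=toy_is_conservative):
    """Build the line displayed under seq1 for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    if markx == 3:
        # "." for identity, otherwise the mismatched residue from seq2.
        return "".join("." if a == b else b for a, b in zip(seq1, seq2))
    identical, conservative, other = MARKX_SYMBOLS[markx]
    marks = []
    for a, b in zip(seq1, seq2):
        if a == b:
            marks.append(identical)
        elif is_conservative(a, b):
            marks.append(conservative)
        else:
            marks.append(other)
    return "".join(marks)
```

     With MARKX=3 the second sequence collapses to mostly dots
     when the sequences are similar, which is what makes it
     convenient for stacking many related sequences.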

     -o   Causes fasta to perform a limited optimization  on  all
          of  the  sequences  in  the  library  with initn scores
          greater than OPTCUT. This slows the program down  about
          5-fold,  but,  when  combined  with ktup=1, provides an
          extremely sensitive sequence comparison.

     -Q   Quiet option.  This allows fasta and tfasta to search a
          database  and  report  the  results  without asking any
          questions. fasta -Q file library > output can be put in
          the  background  or  run  at a later time with the Unix
          'at' command.  The  number  of  similarity  scores  and
          alignments displayed with the -Q option can be modified
          with the -b (scores) and -d (alignments) options.
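
     For unattended runs, the command line can be assembled by a
     script.  The Python sketch below only builds the argument
     list, using the -Q, -b and -d options described above and
     placing options before the file names and ktup as the text
     requires.  The function name and the file names in the
     usage note are hypothetical.

```python
def build_fasta_command(query_file, library_file, ktup=None,
                        scores=None, alignments=None, program="fasta"):
    """Return an argument list for a quiet (-Q) fasta or tfasta run."""
    command = [program, "-Q"]
    if scores is not None:
        command += ["-b", str(scores)]      # number of scores to show
    if alignments is not None:
        command += ["-d", str(alignments)]  # number of alignments to show
    # Options come first, then the query file, the library, and ktup.
    command += [query_file, library_file]
    if ktup is not None:
        command.append(str(ktup))
    return command
```

     For example, build_fasta_command("query.aa", "library.seq",
     ktup=1, scores=200, alignments=50) yields the arguments for
     fasta -Q -b 200 -d 50 query.aa library.seq 1.  The list
     could be handed to subprocess.run with stdout redirected to
     a file.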

     -r STATFILE
          Causes fasta to write out the sequence identifier,
          superfamily number (if available), and similarity
          scores to STATFILE for every sequence in the library.
          These results are not sorted.

     -s str
          (SMATRIX) the filename of an alternative scoring matrix
          file.

     -v str
          (LINEVAL) (plfasta only) plfasta and pclfasta can use
          up to 4 different line styles to denote the scores of
          local alignments.  The scores that correspond to these
          line styles can be specified with the environment
          variable LINEVAL, or with the -v option.  In either case,
          a string with three numbers separated by spaces should
          be given.  This string must be surrounded by double
          quotation marks.  For example, LINEVAL="200 100 50" tells
          plfasta to use solid lines for local alignments with
          scores greater than 200, long dashed lines for scores
          between 100 and 200, short dashed lines for scores
          between 50 and 100, and dotted lines for scores less
          than 50.
               plfasta -v "200 100 50"
          Normally, the values are 200, 100, and 50  for  protein
          sequence  comparisons  and  400,  200,  and 100 for DNA
          sequence comparisons.
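
     The mapping from score to line style can be written as a
     small Python sketch.  It is not the plfasta code.  The
     function name is hypothetical, and how a score exactly equal
     to a threshold is treated is an assumption, since the text
     does not say.

```python
def line_style(score, lineval="200 100 50"):
    """Pick a line style for a local alignment score.

    `lineval` is three numbers separated by spaces, as for -v.
    A score equal to a threshold falls into the lower band here,
    which is an assumption.
    """
    high, middle, low = (int(value) for value in lineval.split())
    if score > high:
        return "solid"
    if score > middle:
        return "long dashed"
    if score > low:
        return "short dashed"
    return "dotted"
```

     With the DNA defaults, line_style(250, "400 200 100")
     returns "long dashed", where the protein defaults would give
     "solid".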

     -w # (LINLEN) Output line length for sequence alignments
          (normally 60; can be set up to 200).

     -3   tfasta only.  Normally tfasta translates sequences in
          the DNA sequence library in all six frames.  Wit


