Regarding FASTA/BLAST vs Smith-Waterman.
I am in the process of writing up an extensive comparison of
FASTA, BLAST, and Smith-Waterman and various scoring matrices. This
paper will be an extension of my earlier one: Pearson (1991)
Genomics "Searching Protein Sequence Libraries: Comparison of the
Sensitivity and Selectivity of the Smith-Waterman and FASTA
Algorithms" 11:635-650.
I feel uncomfortable giving away the punch line, since the
paper has neither been written nor reviewed, but one of the
conclusions is that the result of the Genomics paper (that FASTA
with optimization performs as well as Smith-Waterman) will be
supported with considerably more data and better statistical analyses.
I should note also, since some readers of this group may be
interested, that I now have a version of our parallel "platform" for
sequence comparison (Deshpande, Richards, and Pearson (1991) CABIOS "A
platform for biological sequence comparison on parallel computers"
7:237-247) running on networks of workstations using PVM (Parallel
Virtual Machine), a freely available package for almost any machine.
If you are doing lots of sequence comparisons, I can provide you with
PVM versions for FASTA and Smith-Waterman, with BLAST to be available
in about a month.
Here are some typical timings on a network of 12 Sparc IPCs
using PVM2.4 (PVM3.0 is a little slower):
    pvm2.4, 20 protein sequences vs
    annotated PIR34 (approx 10K sequences)

    nodes       11      7      3
    ----------------------------
    k2          78    105    207    (times in seconds)
                76    128    206
              (73.9)
    k1         310    466   1070
               312    471   1083
              (94.6)
    ok1        559    836   1995
               532    811   1898
              (97.3)
Smith-Waterman times are about 5 times the ok1 times. The values in
parentheses give the relative efficiency, in percent, of 11 nodes
compared to 3 nodes. Thus k2 on 11 nodes is (11/3) x 0.739, or about
2.7, times faster than on 3 nodes; ok1 is (11/3) x 0.973, or about 3.6,
times faster.
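
For readers who want to check the efficiency arithmetic, here is a minimal Python sketch. It assumes that relative efficiency means the observed speedup (3-node time divided by 11-node time) divided by the ideal speedup of 11/3, and that the second timing row of each pair is the one to use; both are assumptions inferred from the numbers, not stated above. The function name and the dictionary are illustrative only.

```python
def relative_efficiency(t_small, t_large, n_small=3, n_large=11):
    """Observed speedup divided by the ideal speedup n_large / n_small."""
    speedup = t_small / t_large   # how much faster the larger network ran
    ideal = n_large / n_small     # speedup if scaling were perfect
    return speedup / ideal


# (seconds on 3 nodes, seconds on 11 nodes), taken from the second
# timing row of each method in the table (an assumption, see above).
timings = {"k2": (206, 76), "k1": (1083, 312), "ok1": (1898, 532)}

for name, (t3, t11) in timings.items():
    eff = relative_efficiency(t3, t11)
    print(f"{name}: speedup {t3 / t11:.2f}x, efficiency {100 * eff:.1f}%")
```

Under these assumptions the k2 and ok1 figures come out at the tabulated 73.9 and 97.3; k1 comes out at about 94.67, which matches the tabulated 94.6 if that value was truncated rather than rounded.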
Bill Pearson