There are a growing number of sequence analysis tutorials on the Web.
Mine is at http://twod.med.harvard.edu/seqanal/
It is far from perfect, but it does reference the other two I know about
(which are in many ways superior). It also contains references to other
articles & books of interest.
In reference to BLAST, the algorithm can crudely be described as:
1) For each sequence of k residues in the query ("k-tuple"), generate
a list of all the k-tuples which could be the nucleus of a significant
alignment, i.e. those scoring at least some threshold against it
2) Search the database sequences for these k-tuples
In BLAST, this is done using a comp sci contraption called a
Deterministic Finite State Automaton (DFA). In brief, a DFA is
roughly analogous to an identification key for the set of k-tuples
(such as the sort of key you might use to identify a living organism).
3) At each matched k-tuple, extend the alignment in both directions until
further extensions do not improve the score
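The three steps above can be sketched in Python roughly as follows. This is a toy illustration under stated assumptions, not BLAST's actual code: it assumes DNA scored with a hypothetical +5 match / -4 mismatch table, and the function names, the threshold and the `x_drop` stopping rule (similar in spirit to BLAST's score drop-off) are all made up for the example. The automaton's states are the last k-1 residues read; each transition on the next residue carries the query offsets of the k-tuple just completed, which is one way to build the "identification key" described above.

```python
from itertools import product

ALPHABET = "ACGT"

def score(a, b):
    """Hypothetical toy substitution table: +5 for a match, -4 otherwise."""
    return 5 if a == b else -4

def neighborhood(query, k, threshold):
    """Step 1: map each k-tuple scoring >= threshold against some query
    k-tuple to the list of query offsets it could seed."""
    words = {}
    for i in range(len(query) - k + 1):
        qword = query[i:i + k]
        for cand in product(ALPHABET, repeat=k):
            if sum(score(a, b) for a, b in zip(qword, cand)) >= threshold:
                words.setdefault("".join(cand), []).append(i)
    return words

def build_dfa(words, k):
    """Step 2 (construction): state = last k-1 residues read; the transition
    on the next residue records the query offsets of the completed k-tuple.
    States with no accepting transitions are simply absent from the dict."""
    dfa = {}
    for word, offsets in words.items():
        dfa.setdefault(word[:-1], {})[word[-1]] = offsets
    return dfa

def scan(dfa, subject, k):
    """Step 2 (search): feed the database sequence through the automaton,
    yielding (query_offset, subject_offset) for every k-tuple hit."""
    state = subject[:k - 1]
    for j in range(k - 1, len(subject)):
        c = subject[j]
        for i in dfa.get(state, {}).get(c, []):
            yield i, j - k + 1
        state = state[1:] + c  # next state: drop oldest residue, append c

def best_gain(pairs, x_drop):
    """Add up column scores, stopping once the running total falls x_drop
    below its maximum; return that maximum (0 means 'do not extend')."""
    run = best = 0
    for a, b in pairs:
        run += score(a, b)
        best = max(best, run)
        if run < best - x_drop:
            break
    return best

def extend(query, subject, i, j, k, x_drop=10):
    """Step 3: ungapped extension of a hit to the right and to the left."""
    seed = sum(score(query[i + n], subject[j + n]) for n in range(k))
    right = best_gain(zip(query[i + k:], subject[j + k:]), x_drop)
    left = best_gain(zip(reversed(query[:i]), reversed(subject[:j])), x_drop)
    return seed + left + right
```

For example, with k = 3, threshold 6, query `ACGTACGGA` and subject `TTACGTACGGATT`, the scan reports a hit at (0, 2), and `extend` grows it into the full nine-residue match with score 45. With these toy scores a threshold of 6 admits every word within one mismatch of a query word, which is why a single query k-tuple expands to ten candidate words.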
At all stages, BLAST uses a simple table (a substitution matrix) to
score the alignment of the query against the database sequence.
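Scoring with such a table amounts to a lookup per aligned column, summed over the alignment. The sketch below uses a hypothetical three-residue table (I, L, V) with made-up values; real protein searches use published matrices such as PAM or BLOSUM. Only the upper triangle is stored, since these matrices are symmetric.

```python
# Hypothetical toy substitution table; not taken from any published matrix.
TABLE = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
}

def pair_score(a, b):
    """Look up a residue pair in either order (the table is symmetric).
    Raises KeyError for residues the table does not cover."""
    if (a, b) in TABLE:
        return TABLE[(a, b)]
    return TABLE[(b, a)]

def alignment_score(s1, s2):
    """Score an ungapped alignment as the sum of its per-column entries."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))
```

For instance, `alignment_score("ILV", "IVV")` is 4 + 1 + 4 = 9. The same lookup serves step 1 (deciding which k-tuples make the list) and step 3 (scoring the extension).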
Department of Cellular and Developmental Biology
Department of Genetics / HHMI
krobison at nucleus.harvard.edu