Julie Thompson-Maaloum <julie at titus.u-strasbg.fr> wondered what my
alignments of 10,000 sequences looked like. Well, they are generally
fairly redundant. Although each sequence is unique (from NCBI's NR
protein database), they often have a lot in common. I have large
alignments of immunoglobulins and of zinc fingers. Perhaps the least
diverse alignment started with 1ce4A, which is a domain of an HIV coat
protein. There are over 16,000 similar sequences in NR, with only tiny
differences. I think that thinning the set so that no two sequences are
more than 90% similar would reduce it to a single sequence.
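
To make the thinning idea concrete, here is a minimal sketch of a greedy
redundancy filter over rows of an existing alignment. It is an
illustration under stated assumptions, not the procedure used to build
these alignments: "similarity" is approximated by percent identity over
columns where neither row has a gap, and the names percent_identity and
thin, the 0.90 default, and the toy sequences are all invented for the
example.

```python
def percent_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither row has a gap.

    Assumption: identity stands in for "similarity" here.
    """
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    compared = same = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either row is gapped
        compared += 1
        if x == y:
            same += 1
    return same / compared if compared else 0.0


def thin(aligned, threshold=0.90):
    """Greedy filter: keep a row only if it is at most `threshold`
    identical to every row kept so far."""
    kept = []
    for seq in aligned:
        if all(percent_identity(seq, k) <= threshold for k in kept):
            kept.append(seq)
    return kept


# Toy rows, invented for illustration (20 columns each).
near_duplicates = [
    "MKTAYIAKQRQISFVKSHFS",
    "MKTAYIAKQRQISFVKSHFT",  # differs from the first row in one column
    "MKTAYIAKQRQISFVKAHFS",  # differs from the first row in one column
]
print(len(thin(near_duplicates)))
```

Each later row is 95% identical to the first, so only the first row
survives, which is the effect described above for the 1ce4A set. The
greedy pass depends on input order and compares each row against every
kept row, so it is only a sketch for small inputs.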
Kevin Karplus karplus at cse.ucsc.edu http://www.cse.ucsc.edu/~karplus
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck
Anything below this line is junk added by others without my approval.