Ken Wolfe (khwolfe at tcd.ie) wrote:
> The BLAST low-complexity filters seg and xnu change amino acids in the
> Query sequence into X characters. When these sequences are searched
> (BLASTP) against a database, the Query sequence no longer hits itself with
> 100% identity because matches involving X are counted as mismatches. Is
> there a way of overcoming this so that filtered sequences still have 100%
> identity to themselves?
I'm not sure I can see why this would be desirable: the purpose of
filtering the query sequence in the first place is to suppress motifs
and repetitive sequences which lead to such frequent hits in the
database that any analysis of the remaining sequence is rendered
impossible. Re-enabling counting of suppressed regions would lead to
a recurrence of this problem.
Peter A. Stockwell
> For example: yeast TUP1 after seg-ing hits itself with only 81% identity:
> SWISS|TUP1_YEAST|P16649
> Length = 713
> Score = 2930 (1318.2 bits), Expect = 0.0, P = 0.0
> Identities = 581/713 (81%), Positives = 581/713 (81%)
[...]
> Any ideas?
> --
> Ken Wolfe
> Department of Genetics
> University of Dublin e-mail: khwolfe at tcd.ie
> Trinity College phone: +353-1-608-1253
> Dublin 2, Ireland FAX: +353-1-679-8558
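
The arithmetic behind the quoted figure can be illustrated with a small
sketch. It is not BLAST code and does not reproduce BLAST's scoring; it
only assumes an ungapped, full-length self-alignment in which every
masked position (X) in the query is counted as a mismatch, as the
question describes. The function name percent_identity, the skip_masked
option and the ten-residue toy sequence are made up for illustration.
Under that assumption, 581/713 would mean that 132 residues of TUP1 were
masked by seg.

```python
def percent_identity(query, subject, skip_masked=False, mask_char="X"):
    """Percent identity over an ungapped alignment of equal-length sequences.

    Hypothetical helper for illustration only. A masked query position
    never counts as a match; with skip_masked=True such positions are
    also left out of the denominator.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be the same length")
    matches = 0
    compared = 0
    for q, s in zip(query, subject):
        if skip_masked and q == mask_char:
            continue  # ignore masked columns entirely
        compared += 1
        if q == s and q != mask_char:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy sequence (not a real protein) with three residues masked.
original = "ACDEFGHIKL"
masked = "ACDXXXHIKL"

as_reported = percent_identity(masked, original)                  # 70.0
unmasked_only = percent_identity(masked, original, skip_masked=True)  # 100.0

# The quoted TUP1 self-hit: 581 identities over 713 aligned positions.
tup1_identity = 100.0 * 581 / 713
```

Computing identity over the unmasked columns only, as in the second
call, is one way to get 100% for a filtered self-hit after the search
has been run; it leaves the filtering itself untouched, so the masked
regions still do not contribute hits.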