Francois Jeanmougin <pingouin at crystal.u-strasbg.fr> wrote:
> It is filtered because of its low complexity. Removing the
> filters as Andrew suggested will "flood" your output with lots of
> poly-proline sequences, with no biological meaning or information.
In many cases, yes, but there have been a few times when I
got no hits with filtering turned on and a handful (about 5)
with filtering turned off.
As for the *relevance* of the results, that's a whole 'nother
issue :) For example (a made-up example, since I don't remember the
details of the original circumstances), suppose you are looking
for a somewhat homologous sequence in the PDB so you can get
a couple of templates to mutate when building a structure
prediction. All you want are some rough ideas to use as the basis
of the prediction.
Filtering out low-complexity sequences could make it so you have
no hits, whereas without a filter you might get some hits and
be able to use the corresponding structure as a template. Of
course, odds are that the structure will be random coil, but I'll
bet that some low-complexity sequences have a pretty well-defined
structure, and I'll bet that SEG doesn't take structural
aspects into account.
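To make the "composition, not structure" point concrete, here is a minimal sketch of the idea behind low-complexity masking: slide a window along the sequence, compute the Shannon entropy of the residue counts in each window, and mask windows that fall below a threshold. This is an assumption-laden simplification, not SEG itself (the real program uses a trigger window plus an extension step with a second, looser threshold). The function names `shannon_entropy` and `low_complexity_mask` are hypothetical, and the window of 12 and threshold of 2.2 bits are illustrative values chosen to resemble SEG's commonly cited defaults.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window.

    Only the residue counts matter: any reordering of the same residues
    gives the same value, so sequence order and structure are ignored.
    """
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace residues covered by any low-entropy window with 'X'.

    Hypothetical simplification of SEG-style filtering: a single window
    size and a single threshold (illustrative values, not SEG's algorithm).
    """
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("X" if m else aa for aa, m in zip(seq, masked))


if __name__ == "__main__":
    # A poly-proline stretch flanked by diverse residues.
    seq = "ACDEFGHIKLMN" + "P" * 15 + "ACDEFGHIKLMN"
    print(low_complexity_mask(seq))
    # Same composition, different order: identical entropy (1.0 bit each).
    print(shannon_entropy("PPPPPPAAAAAA"), shannon_entropy("PAPAPAPAPAPA"))
```

With these toy settings the poly-proline run is masked along with a few flanking residues, and the two six-P/six-A windows score identically. A purely compositional filter like this has no way to tell whether a stretch would form a well-defined structure.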
But I could be wrong. Hmm, there's a project for someone :)
Andrew Dalke
dalke at bioreason.com