We have come across an interesting problem and wondered if anyone
had any insights or alternative strategies.
One of our users has several short polypeptide sequences believed to be
from the same gene. He wants to search protein sequence databases
to find sequences that are homologous to all of these fragments. The
user anticipates 30-40% sequence similarity overall between the
fragments and their hits.
Searches have been done with the individual fragment sequences,
which have been useful, but the user wishes to combine them all in
one search. I expect others will want to do something similar
in the future, so a simple but effective strategy would be very useful.
Thoughts so far:
1) Concatenate the sequence files and run the result against the
databases using the blast program.
- This loses information, however, because there are likely to be gaps
between the fragments.
2) Run fasta with the different permutations of the arrangement of the
fragments, each in a different file, with a low gap penalty.
- Fine for a small number of fragments, but the number of permutations
grows factorially (n! orderings for n fragments: 24 for four fragments,
120 for five).
3) A recursive search of the sequence databases:
First fragment - get top 500 hits
- make a database from the hits
Second fragment - search that database, get top 50 hits
- make another database from the hits
Third fragment - search that database, get top 5 hits
Each intermediate database could be made by editing the fasta (or Blast)
output from the preceding search to produce a file of (database)
sequence names.
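
The narrowing in option 3 could be sketched roughly as below, in Python.
This is only an illustration under assumptions, not a tested pipeline: the
function name narrow_hits is made up, and each fragment's hits are assumed
to have already been parsed from the fasta or Blast output into a list of
database sequence names, best hit first. It also filters the hits from
full-database searches rather than re-searching a smaller database at each
step, which may not rank hits identically.

```python
def narrow_hits(ranked_hits_per_fragment, cutoffs):
    """Successively narrow a candidate list, one fragment at a time.

    ranked_hits_per_fragment: one list of sequence names per fragment,
        ordered best hit first (assumed to be parsed from search output).
    cutoffs: how many hits to keep after each fragment, e.g. [500, 50, 5].
    Returns the names that survive every fragment, in the rank order of
    the last fragment.
    """
    kept = []
    survivors = None  # None means "no restriction yet" (first fragment)
    for ranked, cutoff in zip(ranked_hits_per_fragment, cutoffs):
        if survivors is not None:
            # Only consider sequences that survived the earlier fragments.
            ranked = [name for name in ranked if name in survivors]
        kept = ranked[:cutoff]
        survivors = set(kept)
    return kept


# Toy example with hypothetical sequence names.
fragment_hits = [
    ["P1", "P2", "P3", "P4"],  # hits for the first fragment
    ["P9", "P3", "P1", "P7"],  # hits for the second fragment
    ["P1", "P3"],              # hits for the third fragment
]
print(narrow_hits(fragment_hits, [3, 2, 1]))
```

Because the result depends on the order in which the fragments are
processed and on the cutoffs chosen, it may be worth trying the fragments
in more than one order.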
--
Best Regards,
Peter Woollard
-----------------------------------------------------------------------------
Computing Services Section, MRC-Clinical Internet: p.woollard at hgmp.mrc.ac.uk
Research Centre and Human Genome Mapping Janet: p.woollard at uk.ac.hgmp.mrc
Project Resource Centre, Watford Rd, EARN/Bitnet:
HARROW, Middx, HA1 3UJ, UK. p.woollard%crc at ukacrl
Tel +44 (0)81 869 3294 Fax +44 (0)81 423 1275
------------------------------------------------------------------------------