searching for sequences homologous to fragments

Peter Woollard x3296 pwoollar at crc.ac.uk
Mon Nov 8 12:20:49 EST 1993

We have come across an interesting problem and wondered if anyone
had any insights or alternative strategies.

One of our users has several short polypeptide sequences believed to be
from the same gene. He wants to search protein sequence databases
to find sequences that are homologous to all of these fragments. The
user anticipates 30-40% sequence similarity overall between the
fragments and their hits.

Searches with the individual fragment sequences have been done and
were useful, but the user wishes to combine all the fragments in
one search. I expect that others will want to do something similar
in the future, so a simple but effective strategy would be very useful.
Thoughts so far:

1) Concatenate the sequence files together and run against the databases
   using the blast program.
   -this loses information, however, because you know that there are
    likely to be gaps between the fragments.

2) Run fasta with the different permutations of the arrangement of the
   fragments, each in a different file, with a low gap penalty.
   -fine for a small number of fragments, but the number of permutations
    grows very quickly (n! orderings for n fragments).

3) Recursive searching of the sequence databases:

    First fragment  -get top 500 hits
                    -make a database with the hits
    Second fragment -get top 50 hits
                    -make another database with the hits
    Third fragment  -get top 5 hits

    The database could be made by editing the output from the
    sequence search of the first fragment produced by fasta (or Blast)
    and producing a file of (database) sequence names. 
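
To give a feel for option 2, here is a rough Python sketch of generating
one concatenated query per ordering of the fragments. The fragment names
and sequences are made up for illustration, and the helper names
(ordered_queries, to_fasta) are my own, not part of fasta or any package.
Each record would then be written to its own file and searched separately.

```python
from itertools import permutations
from math import factorial

# Hypothetical fragment sequences, for illustration only.
fragments = {
    "frag1": "MKTAYIAKQR",
    "frag2": "GLSDGEWQLV",
    "frag3": "PEEKSAVTAL",
}

def ordered_queries(frags):
    """Yield (name, sequence) for every ordering of the fragments."""
    for order in permutations(sorted(frags)):
        name = "_".join(order)
        seq = "".join(frags[f] for f in order)
        yield name, seq

def to_fasta(name, seq, width=60):
    """Format one query as a FASTA-style record (header line plus wrapped sequence)."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"

queries = list(ordered_queries(fragments))

# The number of files to search is n! for n fragments.
for n in (3, 5, 8):
    print(n, "fragments ->", factorial(n), "orderings")
```

Three fragments give only 6 query files, but 8 fragments already give
40320, which is why this only looks practical for a handful of fragments.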
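
The successive filtering in option 3 can be sketched in Python as below.
This is only an outline under loud assumptions: toy_score is a crude
stand-in I made up (best count of identical residues over ungapped
alignments), not what fasta or blast compute, and the tiny database and
the cutoffs are invented. In practice each round would be a real search
against a database built from the previous round's hits.

```python
def toy_score(query, subject):
    """Best number of identical residues over all ungapped alignments.

    A crude stand-in for a real fasta/blast score (assumption).
    """
    best = 0
    for offset in range(-(len(query) - 1), len(subject)):
        matches = 0
        for i, residue in enumerate(query):
            j = offset + i
            if 0 <= j < len(subject) and subject[j] == residue:
                matches += 1
        best = max(best, matches)
    return best

def top_hits(query, database, keep):
    """Return a smaller database holding the `keep` best-scoring entries."""
    ranked = sorted(database,
                    key=lambda name: toy_score(query, database[name]),
                    reverse=True)
    return {name: database[name] for name in ranked[:keep]}

def successive_search(fragments, database, keeps):
    """Search with each fragment in turn, narrowing the database each round."""
    current = database
    for fragment, keep in zip(fragments, keeps):
        current = top_hits(fragment, current, keep)
    return current

# Invented example data; the real cutoffs suggested above are 500, 50 and 5.
database = {
    "protA": "MKTAYIAKQRQISFVKGLSDGEWQLV",
    "protB": "MKTAYIAKQRAAAAAAAAAA",
    "protC": "GGGGGGGGGGGLSDGEWQLV",
    "protD": "PPPPPPPPPPPPPPPP",
}
fragments = ["MKTAYIAKQR", "GLSDGEWQLV"]
result = successive_search(fragments, database, keeps=[2, 1])
print(sorted(result))
```

One thing the sketch makes plain: a sequence dropped in an early round can
never come back in a later one, so the early cutoffs need to be generous
and the order in which the fragments are used may affect the result.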

Best Regards,
             Peter Woollard

Computing Services Section, MRC-Clinical   Internet: p.woollard at hgmp.mrc.ac.uk
Research Centre and Human Genome Mapping   Janet:    p.woollard at uk.ac.hgmp.mrc
Project Resource Centre, Watford Rd,       EARN/Bitnet:
HARROW, Middx, HA1 3UJ, UK.                         p.woollard%crc at ukacrl
Tel +44 (0)81 869 3294   Fax +44 (0)81 423 1275
