findpatterns

Tim Bolling bollingt at ugene1.abbott.com
Thu May 26 11:22:03 EST 1994


Keith writes:
> I got a note from Mike Hogan at GCG, that confirmed this.  However, when I looked
> at a sample of my search results, I did not find a single sequence of
> length>50.  When I searched OWL 22 (72,017 sequences) I found 2,320 sequences
> that fit the criteria.  I also did a couple of searches leaving out the end
> constraints....  with the cterm constraint removed, I found 65,845 hits.  The
> others are still running... :)
> 
> I don't know if others have the same results, but it appears that my
> findpatterns is working just fine.  I am running the vms version, at level 7.3.

I think that Keith's suggestion (for finding valine-less peptides of 50
residues or fewer) was practically brilliant in its simplicity. However, as
Conrad and Mike stated, I don't think it works on version 7.3 (I know that it
doesn't on the Solaris version).  On our system, the pattern "<~V{1,49}~V>"
appears to match proteins that do not have a valine in their last 50 residues.
In fact, the program appears to convert the query to ~V{49,49}~V> (49
non-valines followed by a non-valine at the end of the peptide):


          A1AA_HUMAN  ck: 3876  len: 501   ! P25100 homo sapiens (human). alpha-1a adrenergic receptor. 5/92

1                     <~V{1,49}~V>
                        ~V{49}~V
           452: SHPAP SASGGCWGRSGDPRPSCAPKSPACRTRSPPGARSAQRQRAPSAQRWRLCP      


          A1AB_CANFA  ck: 5600  len: 417   ! P11615 canis familiaris (dog). alpha-1b adrenergic receptor (fragment). 8/

1                     <~V{1,49}~V>
                        ~V{49}~V
           368: PGRRG RRDSGPLFTFRLLAERGSPAAGDGACRPAPDAANGQPGFKTNMPLAPGQF      

************************

However, if only one of the two constraints is used, the constraint handling
appears to work properly (although I would have expected 49 matches per
peptide, ranging in length from 2 to 50):

Using <~V{1,49}~V gives only the first two amino acids of each protein in the database:

          A103_SCHMA  ck: 2206  len: 263   ! P13492 schistosoma mansoni (blood fluke). antigen 10-3 precursor. 1/90

1                     <~V{1,49}~V
                         ~V~V
             1:           MN      IYLIG


          A1A1_MOUSE  ck: 7845  len: 213   ! P07758 mus musculus (mouse). alpha-1 antitrypsin 1 (alpha-1 protease inhib

1                     <~V{1,49}~V
                         ~V~V
             1:           SP      ANYIL

*************************
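
To make the "49 matches per peptide" expectation concrete, here is a small
Python sketch of what I would have expected <~V{1,49}~V to report: every
valine-free prefix from 2 to 50 residues long. This is only an illustration
of the expected behaviour, not how findpatterns is implemented; the function
name is made up, and the test sequence is just the first seven residues of
A103_SCHMA shown in the output above.

```python
def valine_free_prefixes(seq, min_len=2, max_len=50):
    """Return every prefix of seq, min_len..max_len residues long, with no valine.

    Hypothetical helper: it models the expected reading of <~V{1,49}~V,
    where "<" anchors the match to the start of the sequence.
    """
    hits = []
    for n in range(min_len, min(max_len, len(seq)) + 1):
        prefix = seq[:n]
        if "V" in prefix:
            break  # every longer prefix would also contain this valine
        hits.append(prefix)
    return hits


# First seven residues of A103_SCHMA as printed in the output above.
for hit in valine_free_prefixes("MNIYLIG"):
    print(len(hit), hit)
```

findpatterns reported only the shortest of these (MN), which would be
consistent with it keeping a single match per start position.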


To confirm my suspicions about how findpatterns was interpreting the
"<~V{1,49}~V>" query, I repeated the search with the query "~V{49,49}~V>"
and got identical results (i.e., "<~V{1,49}~V>" == "~V{49,49}~V>").

It appears that the constraint symbols "<" and ">" may work "properly" when
used independently, but the results are not as expected when both symbols are
used in the same pattern.
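
For anyone who wants to see the difference between the two readings outside
of GCG, here is a rough Python analogue using the standard re module. The
translation is my assumption: "~V" is taken as "any residue except V", and
"<" and ">" as the start and end of the sequence. It does not reproduce
findpatterns itself, and both test sequences are invented.

```python
import re

# Assumed regex equivalents of the two findpatterns queries.
INTENDED = re.compile(r"^[^V]{1,49}[^V]$")  # <~V{1,49}~V> as intended
OBSERVED = re.compile(r"[^V]{49}[^V]$")     # ~V{49,49}~V> as apparently run


def intended_hit(seq):
    """True if the whole sequence is 2-50 residues long and has no valine."""
    return INTENDED.search(seq) is not None


def observed_hit(seq):
    """True if the last 50 residues of the sequence contain no valine."""
    return OBSERVED.search(seq) is not None


short_no_v = "MNPQRSTAG"           # invented: 9 residues, no valine
long_tail_no_v = "MVV" + "A" * 60  # invented: 63 residues, valines only at the start

for name, seq in [("short_no_v", short_no_v), ("long_tail_no_v", long_tail_no_v)]:
    print(name, "intended:", intended_hit(seq), "observed:", observed_hit(seq))
```

The short valine-free peptide is the kind of hit Keith was after, but only
the intended pattern finds it; the long protein with a valine-free tail is
found only by the pattern that findpatterns appears to run.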

Tim Bolling
bollingt at ugene1.abbott.com


