Tim Bolling bollingt at ugene1.abbott.com
Thu May 26 11:22:03 EST 1994

Keith writes:
> I got a note from Mike Hogan at GCG, that confirmed this.  However, when I looked
> at a sample of my search results, I did not find a single sequence of
> length>50.  When I searched OWL 22 (72,017 sequences) I found 2,320 sequences
> that fit the criteria.  I also did a couple of searches leaving out the end
> constraints....  with the cterm constraint removed, I found 65,845 hits.  The
> others are still running... :)
> I don't know if others have the same results, but it appears that my
> findpatterns is working just fine.  I am running the vms version, at level 7.3.

I think that Keith's suggestion (for finding valine-less peptides of 50
residues or fewer) was practically brilliant in its simplicity.  However, as
Conrad and Mike stated, I don't think it works on version 7.3 (I know that it
doesn't on the Solaris version).  On our system, when you use the pattern
"<~V{1,49}~V>", what appears to come up as matches are proteins that do not
have a valine in their last 50 residues.  In fact, the program appears to
convert the query to ~V{49,49}~V> (49 non-valines followed by a non-valine at
the end of the peptide):

          A1AA_HUMAN  ck: 3876  len: 501   ! P25100 homo sapiens (human). alpha-1a adrenergic receptor. 5/92

1                     <~V{1,49}~V>

          A1AB_CANFA  ck: 5600  len: 417   ! P11615 canis familiaris (dog). alpha-1b adrenergic receptor (fragment). 8/

1                     <~V{1,49}~V>


However, if only one of the two constraints is used, the constraint handling
appears to work properly (although I would have expected 49 matches per
peptide, ranging in length from 2 to 50):

Using <~V{1,49}~V gives the first 2 amino acids from each protein in the database:

          A103_SCHMA  ck: 2206  len: 263   ! P13492 schistosoma mansoni (blood fluke). antigen 10-3 precursor. 1/90

1                     <~V{1,49}~V
             1:           MN      IYLIG

          A1A1_MOUSE  ck: 7845  len: 213   ! P07758 mus musculus (mouse). alpha-1 antitrypsin 1 (alpha-1 protease inhib

1                     <~V{1,49}~V
             1:           SP      ANYIL
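
To illustrate why I would have expected up to 49 matches per peptide, here is
a rough Python sketch that lists every prefix satisfying the start-anchored
pattern.  It uses Python's re module as a stand-in and is not FindPatterns
itself; the regular expression is only my assumed translation of the GCG
syntax, and the function name and toy sequences are hypothetical.

```python
import re

# Assumed Python analog (not GCG syntax) of "<~V{1,49}~V":
# 1 to 49 non-valines followed by one more non-valine.
START_ANCHORED = re.compile(r"[^V]{1,49}[^V]")

def anchored_prefixes(seq):
    """Return every prefix of seq (2 to 50 residues) that matches in full."""
    longest = min(len(seq), 50)
    return [seq[:n] for n in range(2, longest + 1)
            if START_ANCHORED.fullmatch(seq[:n])]

# Hypothetical toy peptide: 60 residues with no valine anywhere.
toy = "MN" + "A" * 58
hits = anchored_prefixes(toy)
print(len(hits), len(hits[0]), len(hits[-1]))

# A valine at position 5 cuts the list of matching prefixes short.
print(anchored_prefixes("MNIYVAAA"))
```

Under this reading, a peptide whose first 50 residues lack valine yields 49
candidate matches, whereas the output above shows only the shortest one.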


To confirm my suspicions about how findpatterns was interpreting the
"<~V{1,49}~V>" query, I repeated the search with the query "~V{49,49}~V>"
and got identical results (i.e., "<~V{1,49}~V>" == "~V{49,49}~V>").

It appears that the constraint symbols "<" and ">" may work "properly" when
used independently, but the results are not as expected when both symbols are
used in the same pattern.
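
For anyone who wants to see the difference between the two readings outside
of GCG, here is a small Python sketch.  The two regular expressions are my
assumed equivalents of the intended and the apparently observed
interpretation, not anything taken from the FindPatterns source, and the test
sequences are made up.

```python
import re

# Assumed Python stand-ins (not GCG syntax):
# intended reading of "<~V{1,49}~V>": the whole peptide is 2 to 50 residues, no V
INTENDED = re.compile(r"^[^V]{1,49}[^V]$")
# apparent actual behaviour, "~V{49,49}~V>": the last 50 residues contain no V
OBSERVED = re.compile(r"[^V]{49}[^V]$")

def classify(seq):
    """Return (matches intended reading, matches observed reading)."""
    return bool(INTENDED.search(seq)), bool(OBSERVED.search(seq))

short_no_v = "MNIYLIG"              # made-up 7-residue peptide, no valine
long_tail = "V" * 10 + "A" * 50     # made-up 60-mer, valine-free last 50

print(classify(short_no_v))  # (True, False)
print(classify(long_tail))   # (False, True)
```

A short valine-free peptide is what the original query was after, but only
the long peptide with a valine-free tail is reported under the second
reading, which fits the 501- and 417-residue hits shown above.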

Tim Bolling
bollingt at ugene1.abbott.com

