In article <445odq$rrb at wn1.sci.kun.nl>, Jack Leunissen <jackl> writes:
>Tylzanowski Przemko <przemko> wrote:
>>I have to search genpept for a pattern. Well, everything worked, I got my
>>findpatterns.find file and now I am stuck. I have about 140 hits in about as
>>many genes. I would like to pull all the sequences out and have a quick look
>>but I don't know how to do that. Of course, I could do it by hand, but it
>>would be a bit tedious, especially if with some other patterns I find more
>>hits.
>>So, my question is this: how could I, easily and painlessly, retrieve the
>>sequences and comments in question? Subsequently I would like to put them into
>>pileup or something like that.
>
I have a modified version of findpatterns called "rfindpatterns". It
starts out like findpatterns, looking for a pattern in a set of target
sequences, but it then extracts each hit with a user-specified amount of
border on either side and optionally replaces the pattern with a different
one. The results for each hit go to a separate file. After that they can be
fed through pileup to align them or, if they are already aligned, through
reformat to put them straight into .msf format. It works on protein and DNA.
For instance, let's say that you know that the protein motif DDX{3,4}DD
(completely fictitious motif!!!) is interesting and you want to see if
anything lines up outside of it, say within 10 amino acids on either side.
Use a command like this:
$ rfindpatterns/infile=sw:*/pattern="DDX{3,4}DD" -
/lwidth=10/rwidth=10/replace=DDXDD
Here is what one of the output files looks like:
$ type R1.RFIND;
RFINDPATTERNS on: Sw:2aaa_Pea
Original file info: P36875 pisum sativum (garden pea). protein phosphatase pp2a
regulatory subunit a (pr65) (fragment). 2/95
Matching pattern: DDX{3,4}DD
Pattern location: 15 to 21
Lwidth: 10
Rwidth: 10
Replaced matching pattern with: DDXDD
R1 Length: 25 September 26, 1995 10:30 Type: P Check: 4262 ..
1 HLKTDIMSVF DDXDDQDSFR FLAVE
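For anyone who wants to see the idea without running the program, here is a
rough Python sketch of the extract-and-replace step described above. It is
not the rfindpatterns source (that is Fortran) and it does not read GCG
databases or write GCG files. The names gcg_to_regex and extract_hits are
made up for this illustration, the test sequence is invented, and the sketch
assumes that X in the pattern means "any residue" and that {m,n} means the
same thing as it does in a regular expression.

```python
import re


def gcg_to_regex(pattern):
    """Turn a simple GCG-style pattern into a Python regular expression.

    Assumption: X is the any-residue wildcard and {m,n} repeat counts
    carry over unchanged. Other pattern syntax is not handled here.
    """
    return pattern.upper().replace("X", ".")


def extract_hits(seq, pattern, lwidth=10, rwidth=10, replace=None):
    """Yield (start, end, fragment) for each non-overlapping hit in seq.

    start and end are 1-based, inclusive positions of the match.
    fragment is up to lwidth residues of left border, then the match
    (or the replace string if one is given), then up to rwidth residues
    of right border.
    """
    regex = re.compile(gcg_to_regex(pattern))
    upper = seq.upper()
    for m in regex.finditer(upper):
        left = upper[max(0, m.start() - lwidth):m.start()]
        right = upper[m.end():m.end() + rwidth]
        core = replace if replace is not None else m.group(0)
        yield m.start() + 1, m.end(), left + core + right


if __name__ == "__main__":
    # Invented test sequence, not a real database entry.
    seq = "MAAAHLKTDIMSVFDDAKLDDQDSFRFLAVEGGG"
    for start, end, frag in extract_hits(seq, "DDX{3,4}DD", 10, 10, "DDXDD"):
        print(start, end, frag)
```

Each fragment could then be written to its own file, one per hit, which is
the shape of output the command above produces.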
You can pick up rfindpatterns.for, rfindpatterns.txt, and rfindpatterns.cmd
via FTP from seqaxp.bio.caltech.edu, in the [.software] directory. Use ASCII
transfer mode.
ONLY SITES THAT HAVE A VALID GCG LICENSE SHOULD PICK UP THIS SOFTWARE!
Regards,
David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech