IUBio

Parsing findpatterns output

Michael_A_Lonetto at sbphrd.com
Wed Oct 1 08:56:10 EST 1997






To:       jeanmougin at titus.u-strasbg.fr
cc:       info-gcg at net.bio.net
From:     Michael A Lonetto @ SB_PHARM_RD
Date:     01-Oct-97 14:57:05
Subject:  Re: Parsing findpatterns output
Categories:

Hi,

I have a Perl script that solves a similar problem, but it needs a little
work to get to what you want.  I wrote it as the second step (after
FindPatterns) of a system for emulating PCR (it extracts the sequences
between primer pairs to a new file).  You will need to adjust the "print"
statements to get the output you're looking for.

It was one of my first Perl scripts, so it's a little clunky and heavily
commented.  Here it is (called "fptocsv"):
<---------------------------------CUT HERE------------------------------------>
#!/usr/local/bin/perl
###################################
# fptocsv: USAGE:  fptocsv [<] INFILE > OUTFILE
# converts GCG FindPatterns output to comma separated table format.
# output format:
# !PrimerName,Dir,SeqName,Len,Pos,Mis
# includes primer name, direction (F or R), Sequence Name, Seq Length,
# Position within the sequence of the primer match, # of mismatches.
# Michael_A_Lonetto at sbphrd.com  (610)-917-6960 .
###################################
print "!PrimerName,Dir,SeqName,Len,Pos,Mis\n";
while ($line = <>) {
    $line =~ tr/a-z/A-Z/;
    chomp $line;
    if ($line !~ /^\s+\w+ +CK:/) {    # not starting a new seq
        if ($line =~ /^\w+/) {        # primer name match
            $line =~ s/^(\w+)\s(\/REV|\s{4}).*$/$1 $2/;
            # print "found primer $1  $2 ";
            $pname = $1;
            # match a literal "/REV"; an unescaped \R means "any linebreak"
            # in Perl 5.10 and later, so /\REV/ would never match here
            if ($2 =~ /\/REV/) {
                $pdir = "R";
            } else {
                $pdir = "F";
            }
            # print "$pname,$pdir,$sname,$slen,";
        } elsif ($line =~ /^\s+[0-9,]+:/) {
            $line =~ s/^\s+([0-9,]+)\b.*([A-Z ]$|[1-3]$)/$1  x $2 /;
            $ppos = $1;          # primer position in the sequence
            $ppos =~ tr/,//d;    # strip commas for csv output
            $mis = $2;
            # FindPatterns only prints a mismatch number if there is a mismatch
            ($mis =~ s/([1-3])$/$1/) or $mis = 0;
            print "$pname,$pdir,$sname,$slen,$ppos,$mis\n";
        }
    }
    elsif ($line =~ /^\s+\w+/) {      # new sequence
        $line =~ /^\s+(\w+)\s+CK.*?LEN:\D*([0-9,]+).*$/;
        $sname = $1;
        $slen = $2;
        $slen =~ tr/,//d;    # strip commas for csv output
    }
}
<-----------------CUT HERE--------------------------->
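
For the "Name Begin:N End:M ! comments" format asked for in the original
question, a minimal sketch along the same lines might look like the
following.  It is not part of fptocsv.  The helper name fp_to_ranges is
hypothetical, and the sketch assumes that the number FindPatterns prints is
the position of the first matched symbol, that a left flank always precedes
the match on the find line, and that End is Begin plus the match length
minus one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of fptocsv: turn FindPatterns standard-output
# lines into "Name Begin:N End:M ! comment" strings.
sub fp_to_ranges {
    my @lines = @_;
    my ($sname, $comment) = ('', '');
    my @out;
    for my $line (@lines) {
        if ($line =~ /^\s*(\S+)\s+ck:\s*\d+\s+len:\s*[\d,]+[^!]*(?:!\s*(.*))?$/i) {
            # Sequence header: name, checksum, length, optional "! comment".
            ($sname, $comment) = ($1, defined $2 ? $2 : '');
            $comment =~ s/\s+$//;
        }
        elsif ($line =~ /^\s*([\d,]+):\s+\S+\s+(\S+)/) {
            # Find line: position, left flank, matched symbols.
            # ASSUMPTION: the left flank is always present, so the second
            # token after the position is the match itself.
            my ($begin, $match) = ($1, $2);
            $begin =~ tr/,//d;    # strip thousands separators
            # ASSUMPTION: the printed position is the first matched symbol.
            my $end = $begin + length($match) - 1;
            push @out, "$sname Begin:$begin End:$end ! $comment";
        }
    }
    return @out;
}

# Small demonstration on the sample report from the original question.
my $sample = <<'END';
           BT09237_1  ck: 8916  len: 910   ! U09237 product: "auxilin";...
1                     (L,I,V,M,F)HCXXGXXX(S,T,C)(S,T,A,G)X(L,I,V,M,F,Y)
                                    (V)HCxxGxxx(S)(S)x(L)
           159: KNVCV                   VHCLDGRAASSIL                   VGAMF
END
print "$_\n" for fp_to_ranges(split /\n/, $sample);
```

To use it as a filter on a real report, the last lines could be replaced
with: print "$_\n" for fp_to_ranges(<>);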




pingouin at chouchen.u-strasbg.fr on 01-Oct-1997 08:18 AM



Please respond to jeanmougin at titus.u-strasbg.fr

To:   info-gcg
cc:    (bcc: Michael A Lonetto)
Subject:  Parsing findpatterns output




     Hi all,
     If you use findpatterns, you can get this (standard output):
           BT09237_1  ck: 8916  len: 910   ! U09237 product: "auxilin";...
1                     (L,I,V,M,F)HCXXGXXX(S,T,C)(S,T,A,G)X(L,I,V,M,F,Y)
                                    (V)HCxxGxxx(S)(S)x(L)
           159: KNVCV                   VHCLDGRAASSIL                   VGAMF

     Or this (FOSN output):
     TRREL:BT09237_1  ck: 8916  len: 910    finds: 1    ! U09237 product: ...

     But what I need is this:
TRREL:BT09237_1 Begin:100 End:200 ! comments
     so that I can use it as input for other programs. I could write such a
program myself, but if someone has already done it, that would be faster.
                    Thanks for any help,
                                   François.
--
François Jeanmougin     | groupe de bioinformatique / bioinformatics group
tel:(+33) 3 88 65 32 71 | IGBMC BP 163 67404 Illkirch France












