To: jeanmougin at titus.u-strasbg.fr
cc: info-gcg at net.bio.net
From: Michael A Lonetto @ SB_PHARM_RD
Date: 01-Oct-97 14:57:05
Subject: Re: Parsing findpatterns output
Categories:
Hi,
I have a perl script that solves a similar problem, but needs a little work
to get to what you want. I wrote it as the second step (after
FindPatterns) of a system for emulating pcr (extracts sequences between
primer pairs to a new file). You will need to adjust the "print" routines
to get the output you're looking for.
It was one of my first perl scripts, so it's a little clunky and heavily
commented. Here it is (called "fptocsv"):
<---------------------------------CUT HERE------------------------------------>
#!/usr/local/bin/perl
###################################
# fptocsv: USAGE: fptocsv [<] INFILE > OUTFILE
# converts GCG FindPatterns output to comma separated table format.
# output format:
# !PrimerName,Dir,SeqName,Len,Pos,Mis
# includes primer name, direction (F or R), Sequence Name, Seq Length,
# Position within the sequence of the primer match, # of mismatches.
# Michael_A_Lonetto at sbphrd.com (610)-917-6960 .
###################################
print "!PrimerName,Dir,SeqName,Len,Pos,Mis\n";
while ($line = <>) {
    $line =~ tr/a-z/A-Z/;
    chomp $line;
    if ($line !~ /^\s+\w+ +CK:/) { # not starting a new seq
        if ($line =~ /^\w+/) { # primer name match
            $line =~ s/^(\w+)\s(\/REV|\s{4}).*$/$1 $2/;
            # print "found primer $1 $2 ";
            $pname = $1;
            if ($2 =~ /\/REV/) { # "/REV" marks a reverse-strand pattern
                $pdir = "R";
            } else {
                $pdir = "F";
            }
            # print "$pname,$pdir,$sname,$slen,";
        } elsif ($line =~ /^\s+[0-9,]+:/) { # hit line: position, then match
            $line =~ s/^\s+([0-9,]+)\b.*([A-Z ]$|[1-3]$)/$1 x $2 /;
            $ppos = $1; # primer position in the sequence
            $ppos =~ tr/,//d; # strip commas for csv output
            $mis = $2;
            ($mis =~ s/([1-3])$/$1/) or $mis = 0;
            # FindPatterns only prints a mismatch count if there is a mismatch
            print "$pname,$pdir,$sname,$slen,$ppos,$mis\n";
        }
    }
    elsif ($line =~ /^\s+\w+/) { # new sequence
        $line =~ /^\s+(\w+)\s+CK.*?LEN:\D*([0-9,]+).*$/;
        $sname = $1;
        $slen = $2;
        $slen =~ tr/,//d; # strip commas for csv output
    }
}
<-----------------CUT HERE--------------------------->
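For the "Name Begin:N End:M ! comments" layout asked for in the original post, the print step could be adapted roughly as in the sketch below. This is untested against real FindPatterns files and rests on assumptions: that each hit line carries the position followed by left flank, matched residues and right flank as three whitespace-separated fields (so a left flank is always present), and that End is simply Begin plus the match length minus one. The sub name fp_to_ranges is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn FindPatterns standard-output lines into
# "Name Begin:N End:M ! comment" strings.
# ASSUMPTION: hit lines look like "  159: LEFTFLANK  MATCH  RIGHTFLANK".
sub fp_to_ranges {
    my @lines = @_;
    my ($sname, $comment) = ('', '');
    my @out;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^\s*(\S+)\s+ck:\s*\d+\s+len:\s*[\d,]+\s*(?:!\s*(.*))?$/i) {
            # Sequence header: remember the name and any "!" comment.
            $sname   = $1;
            $comment = defined $2 ? $2 : '';
        } elsif ($line =~ /^\s*([\d,]+):\s+\S+\s+(\S+)/) {
            # Hit line: position, left flank (ignored), matched residues.
            my ($begin, $match) = ($1, $2);
            $begin =~ tr/,//d;                     # "1,234" -> "1234"
            my $end = $begin + length($match) - 1; # assumed inclusive end
            push @out, "$sname Begin:$begin End:$end ! $comment";
        }
        # Pattern-definition lines and anything else are skipped.
    }
    return @out;
}

# Small demonstration using the hit quoted in the original post.
my $sample = <<'END';
 BT09237_1  ck: 8916  len: 910 ! U09237 product: "auxilin";...
 1   (L,I,V,M,F)HCXXGXXX(S,T,C)(S,T,A,G)X(L,I,V,M,F,Y)
     (V)HCxxGxxx(S)(S)x(L)
      159: KNVCV   VHCLDGRAASSIL   VGAMF
END
print "$_\n" for fp_to_ranges(split /\n/, $sample);
```

If a hit can start at the very first residue (no left flank), the second field would be the right flank instead of the match, so the hit-line pattern would need adjusting for that case.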
pingouin at chouchen.u-strasbg.fr on 01-Oct-1997 08:18 AM
Please respond to jeanmougin at titus.u-strasbg.fr
To: info-gcg
cc: (bcc: Michael A Lonetto)
Subject: Parsing findpatterns output
Hi all,
If you use findpatterns, you can have this (standard output):
BT09237_1 ck: 8916 len: 910 ! U09237 product: "auxilin";...
1 (L,I,V,M,F)HCXXGXXX(S,T,C)(S,T,A,G)X(L,I,V,M,F,Y)
(V)HCxxGxxx(S)(S)x(L)
159: KNVCV VHCLDGRAASSIL
VGAMF
Or this (FOSN output):
TRREL:BT09237_1 ck: 8916 len: 910 finds: 1 ! U09237 product:
...
But what I need is this:
TRREL:BT09237_1 Begin:100 End:200 ! comments
So that I can use this as input for other programmes. I could write
such a programme myself, but if someone has already done it, that would be faster.
Thanks for any help,
François.
--
François Jeanmougin | groupe de bioinformatique / bioinformatics groupe
tel:(+33) 3 88 65 32 71 | IGBMC BP 163 67404 Illkirch France