Blast output

John Edward Hill hillj01 at MCRCR6.MED.NYU.EDU
Fri Feb 2 16:37:47 EST 1996

Not directly, to my knowledge.  However, you can fairly easily edit the
high-scores list in the BLAST output to make such a list.  If you use EVE
(TPU), you can use its macro-recording ability (CTRL/K) to semi-automate
the process.  Here's one possibility for protein databases.

A.  First delete everything from the beginning of the first alignment to
the end of the file (if you select the beginning of the first alignment,
you can do GOLD-DOWN ARROW to select to the end in one step). 

B.  Then put two periods (full stops for you British-English types) ".."
on the line before the first score in the list. 

C.  Then record the following macro: 

1.  Do a WILDCARD FIND for  \<sp|   (NOTE: this finds "sp|" at the 
                                           beginning of a line)
2.  SELECT and FIND "|" 
3.  Move one character to the right with the arrow key
4.  Remove (i.e., cut)
5.  Type in the logical for Swiss-Prot on your system plus a colon ":"

Repeat that macro until there are no more matches.  Then, do the
equivalent replacement for the other databases, i.e., "\<pir|" and "\<gp|"
for proteins. 

It may look complicated, but it's much faster to do than to write!  Of
course, some real programmer could write a simple converter to do the
same, but this way you can adjust for changes in database names and
formats; also you can edit out anything not of interest before using the
file with TYPEDATA/REF. 
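
For anyone who would rather script it, here is a minimal sketch of such a
converter in Python, following steps A to C above.  Several things in it
are assumptions and not taken from any particular BLAST or GCG release:
the function name blast_to_list is made up, the logical names SW, PIR and
GP are placeholders for whatever your site uses, and the alignment section
is assumed to start at the first line beginning with ">".

```python
import re

# Hypothetical mapping from BLAST database tags to GCG logical names.
# Logical names are site-specific, so adjust these for your system.
LOGICALS = {"sp": "SW", "pir": "PIR", "gp": "GP"}


def blast_to_list(text, logicals=LOGICALS):
    """Turn the high-scores part of a BLAST report into a ".." list."""
    # Matches "tag|accession|" at the start of a line, i.e. everything
    # up to and including the second "|", so the LOCUS/ID is kept.
    tags = "|".join(re.escape(tag) for tag in logicals)
    prefix = re.compile(r"^(%s)\|[^|\s]*\|" % tags)

    out = []
    dots_added = False
    for line in text.splitlines():
        # Step A: drop everything from the first alignment onward
        # (assumed to begin with a ">" line).
        if line.startswith(">"):
            break
        match = prefix.match(line)
        if match:
            # Step B: put ".." on a line before the first score.
            if not dots_added:
                out.append("..")
                dots_added = True
            # Step C: replace "tag|accession|" with "LOGICAL:".
            line = logicals[match.group(1)] + ":" + line[match.end():]
        out.append(line)
    return "\n".join(out) + "\n"
```

A line such as "sp|P12345|ABC_HUMAN ..." (an invented entry) would come
out as "SW:ABC_HUMAN ...".  Check the result by eye before relying on it,
since the hit-list layout varies between BLAST versions.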


P.S.  The reason that I search for the second "|" is to use the LOCUS/ID
for the FETCH instead of the accession number.  In GenPept, if you FETCH
with the accession number you are likely to get the wrong sequence,
because only one protein sequence per DNA sequence can be FETCHed --
and chances are it won't be the correct one when you have multiple CDS

John Edward Hill, Ph.D.               |    Department of Cell Biology
  212-263-7135 (Phone)                |    NYU Medical Center
  212-263-8139 (FAX)                  |    550 First Avenue 
Email: hillj01 at mcrcr.med.nyu.edu      |    New York, New York 10016  (USA)

On Fri, 2 Feb 1996, SI-Johanne Duhaime wrote:

> Hello
> In the output of the BLAST program run through GCG, the definition part is not 
> very long. Because of that, we sometimes do not have enough information to judge 
> whether a sequence is interesting for our work. Looking up every sequence 
> takes a very long time.
> Is there any way to use the output (as we do for the output of FASTA) as 
> input to the TYPEDATA program? Or is there any other way to easily get more 
> information on the returned sequences?
> Thank you for the help
> -- 
> Johanne Duhaime
> 110 Ave des Pins O
> Montreal, Quebec
> 987-5556 (tel) 987-5644 (fax)
> Duhaimj at ircm.umontreal.ca
