
NCBI BLAST output format: Question.

Keith Robison robison1 at husc10.harvard.edu
Thu Jan 6 17:08:17 EST 1994


I've developed a filter which converts netBLAST output to HTML, the 
hypertext format used with the World Wide Web (Mosaic, Lynx, Cello, etc.).

It's a simple perl script which can be easily re-configured.
The filter turns the matched database items into hyperlinks to
the relevant database entries, and adds some links for moving around
the output.  It definitely works best with a GUI browser such as Mosaic
-- it's a bit link-rich for screen-based browsers.
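
To give a feel for what the filter does to a single line, here is a
minimal sketch of the GenBank/GenPept link step taken on its own.  The
regexp and link stem are copied from the script below; the helper name
link_genbank and the sample line are made up for illustration and are
not part of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accession regexp and link stem, copied from the full script.
my $Acc     = '[A-Z]\d{5}';
my $Genbank = 'http://golgi.harvard.edu/htbin/getseq-gb-a?';

# link_genbank() is a hypothetical helper used only in this sketch;
# the full script applies the same substitution directly to $_.
sub link_genbank {
    my ($line) = @_;
    # Wrap the accession after a gb|, gp|, gbu| or gpu| tag in a link.
    $line =~ s#(g[pb]u?)\|($Acc)#$1|<a href="$Genbank$2">$2</A>#g;
    return $line;
}

# Invented summary line in the db|accession|locus style.
print link_genbank("gb|X12345|EXAMPLE  hypothetical example entry"), "\n";
```

The accession in the printed line comes out wrapped in an anchor that
points at the GenBank link stem; lines with no accession pass through
unchanged.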

The main drawback is that you have to keep around separate HTML and
non-HTML files, because there isn't (yet?) a way to get an HTML browser
to run data through a filter prior to viewing.

Enjoy!

Keith Robison
Harvard University
Department of Cellular and Developmental Biology
Department of Genetics / HHMI

krobison at nucleus.harvard.edu 

-------------Cut here with virtual scissors----------------------
#!/usr/local/bin/perl

# blast2html -- converts output from NCBI Network BLAST server
#               to HTML hypertext
#               Keith Robison  November 1993     
#	        krobison at nucleus.harvard.edu
#
#  HTML Markups
#  
#  1) Database accession numbers are links to retrieve database entries
#  2) Poisson score in top summary is a link to alignment
#  3) Angle bracket at start of alignment description is link back to summary
#
#  Citation:
#   Robison, K.  A simple hypertext BLAST output browsing scheme.   
#   Unpublished.
#
#  Freedom to use and modify this program is granted so long as the 
#  citation above remains intact and modifications are documented.
#  
#  <a href="http://golgi.harvard.edu/blast2html.pl">Current Version</a>

$Acc   = '[A-Z]\d{5}';  # Regexp for GB/EMBL/DDBJ/SP accession number
$PIRAcc   = '[A-Z][A-Z0-9]\d{4}';  # Regexp for PIR accession number
$Word  = '\w*';         # Regexp for a word

# WWW link stems for databases
$Embl  = 'http://golgi.harvard.edu/htbin/expasygate?get-embl-entry?';
$Genbank = 'http://golgi.harvard.edu/htbin/getseq-gb-a?';
  # EMBL has richer hyperlinks, but ExPasy currently has only subset
$Pir = 'http://golgi.harvard.edu/htbin/getseq-pir-a?';

# Choose the desired SwissProt server
$SwissProt = 'http://golgi.harvard.edu/htbin/expasygate?get-sprot-entry?';
# $SwissProt = 'http://expasy.


while ($_ = <ARGV>)
  {

  #Beginning of report body stuff -- title, 'pre-formatted' instruction
  s#Query= *($Word)(.*)#<TITLE>$1</TITLE><PRE>Query=  <H1>$1</H1><H2>$2</H2>#o;

  #generate section markers at alignment
  s#^>($Word)\|($Acc)\|($Word) #<a href="\#$1_$2_$3_H">&gt;</A>$1|$2|$3<a name="$1_$2_$3_A"> </A>#o;

  # make Poisson score link to alignment
  #  note: $6 is in regexp to prevent premature matches
  #    1       2        3       4    5          6
  #    db     acc      loc     des  poi        cnt
  s#^($Word)\|($Acc)\|($Word) (.* )(\d[0-9e\-\.]{2,})( *\d*$)#$1|$2|$3<a name="$1_$2_$3_H"> </A>$4<a href="\#$1_$2_$3_A">$5</A>$6#o;
 
  # make database links
  s# ($Acc) # <a href="$Genbank$1">$1</A> #go;             # "Naked" acc->GB
  s#(g[pb]u?)\|($Acc)#$1\|<a href="$Genbank$2">$2</A>#go;  # GenBank/GenPept
  s#(embu?)\|($Acc)#$1\|<a href="$Embl$2">$2</A>#go;       # Embl
  s#(spu?)\|($Acc)#$1\|<a href="$SwissProt$2">$2</A>#go;   # SwissProt
  s#pir\|($PIRAcc)#pir\|<a href="$Pir$1">$1</A>#go;           # PIR
  #
  # PDB --> ?
  # dbest --> ?

  print $_;
}
print "</PRE>\n";







