Pearson & Lipman's fasta local homology search is now available by
e-mail, in addition to search/retrieval of database entries, at
flat-netserv at smlab.eg.gunma-u.ac.jp.
See below for details.
---------------------------------------------------------------------------
FLAT DB E-Mail Network Server
Sanzo Miyazawa
Gunma Univ., Faculty of Technology
FAX: +81 277 40 1026
Phone: +81 277 22 3181 ext. 262
E-mail address for the server: flat-netserv at smlab.eg.gunma-u.ac.jp
E-mail address for inquiries: sanzo.miyazawa at smlab.eg.gunma-u.ac.jp
or smiyazaw at smlab.eg.gunma-u.ac.jp
The following commands are available:
- Commands must be written in the body of the mail, not in the "Subject:" field.
- Command names and other arguments are case sensitive unless otherwise specified.
- Output may be limited to 2400 lines.
- Character strings may be written as regular expressions.
man command
Output a manual for the command; only available for some commands.
1. Search/Retrieval Commands
scandir db-name [options] 'keyword[|keyword...]' ['keyword[|keyword...]'] ...
Scan directory files of the "db-name" database to find "keywords"
and output entry names and their definitions; keywords are written
as regular expressions, and every keyword argument must match, that is,
'key-1|key-2' 'key-3' means "(key-1 or key-2) and key-3"
Options:
-i case insensitive
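The keyword rule above can be sketched in Python as follows. This is only an illustration of the stated semantics (every quoted argument must match, and "|" separates alternatives within one argument). The function name matches_all and the sample definition line are made up for the example, and Python's regular expression dialect may differ from the one the server uses.

```python
import re

def matches_all(text, patterns, ignore_case=False):
    """Return True if every pattern (a regular expression) is found in text.

    Each pattern stands for one quoted argument of scandir;
    alternatives inside one argument are separated by "|".
    """
    flags = re.IGNORECASE if ignore_case else 0
    return all(re.search(p, text, flags) for p in patterns)

# Hypothetical definition line, for illustration only.
definition = "Human c-myc oncogene, exon 1"

# Like: scandir gb -i 'oncogene' 'human'
print(matches_all(definition, ["oncogene", "human"], ignore_case=True))
# Without -i the lower-case keyword no longer matches "Human".
print(matches_all(definition, ["oncogene", "human"]))
# (mouse or Human) and oncogene
print(matches_all(definition, ["mouse|Human", "oncogene"]))
```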
scanjou db-name ['journal'] ['vol:'['page[-page]']] ['(year)']
Scan journal index files of the "db-name" database to find specified
journals and output journal names and corresponding entry names.
Journal names in the command line are not case sensitive.
scanaut db-name 'Last-name,[First.Middle-Initial.]' ...
Scan author index files of the "db-name" database to find specified
author names and output author names and corresponding entry names
with their definitions.
Author names in the command line are not case sensitive.
scanacc db-name '#acc' ...
Scan accession number index files of the "db-name" database to find
specified accession numbers and output corresponding entry names
with their definitions.
scandb db-name [-1] [-o] {['entry'...]|[-a '#acc'...]}
Scan the "db-name" database to find specified entries or accession
numbers and output those entries.
Options:
-1 'Entry' or '#acc' may specify multiple entries in the DB.
-o The order of arguments is not significant; the order of entries
output may not be in the order specified in the command line.
Available databases which may be specified in scan commands:
db-name = gb | embl | ddbj | gp | swiss | pir | prf
gb or genbank: GenBank DNA database
Regular release and new entries which are updated
twice a day.
embl: EMBL DNA database; regular release + new entries
ddbj: DDBJ DNA database; regular release + new entries
It is included in the GenBank and EMBL DBs.
gp or genpept: GenBank Gene Product Database;
protein database translated from GenBank DNA database
swiss: SwissProt protein database
pir: PIR protein database
prf: Protein Research Foundation peptide database
Examples:
Commands must be written in the body of the mail, not in the "Subject:" field.
"|" is the vertical bar, not a semicolon ";".
scandir gb -i 'oncogene' 'human' # oncogene and human
scanjou gb 'J. Biochem.' '107:316-323' '(1990)' # case insensitive
scanjou gb 'J. Biochem.' '107:' # vol. 107
scanjou gb 'J. Biochem.' '(1990)' # 1990 issues
scanjou gb '(1990)' # all 1990 issues
scanaut gb 'Miyazawa,S.' # case insensitive
scanacc gb 'M11391' 'd00611' # case insensitive
scandb gb 'AGMERLTR1' 'musbas' # entries
scandb gb -a 'M11391' 'd00611' # accession numbers
scandb gb 'ECO.*' # all ECO.*
scandir gb -i ' e.*coli' | scandb gb # try to collect E. coli
scanjou gb 'J. Biochem.' '(1991)' | scandb gb
2. Commands for Homology Search
tmpfile filename
Create a temporary file named "filename"; it must be used as follows:
tmpfile seq-1 <<'*** END ***'
...
...
*** END ***
In the example above, "seq-1" contains the lines up to, but not including, '*** END ***'.
The command can also receive the output of another command:
scandb gb hcemle | tmpfile hcemle
In this case, the entry HCEMLE is retrieved into a file named hcemle.
The sequence file formats supported by the following commands are the
GenBank, EMBL, PIR, SwissProt, and PRF formats,
and also the simple format shown below.
> title # Title; mandatory
.... # Sequence in one letter representation;
# case insensitive; numbers are ignored.
// # This line is optional.
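A reader for the simple format can be sketched as below. It is an illustration, not the server's parser: the function name parse_simple is hypothetical, and dropping whitespace and every other non-letter character (in addition to the numbers mentioned above) is an assumption based on the spaced example later in this document.

```python
def parse_simple(text):
    """Parse one record: a mandatory '>' title line, then sequence lines.

    Only letters are kept (digits, spaces and other characters are dropped),
    letters are upper-cased, and an optional '//' line ends the record.
    """
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the title line starting with '>' is mandatory")
    title = lines[0][1:].strip()
    residues = []
    for line in lines[1:]:
        if line.strip() == "//":
            break
        residues.extend(ch.upper() for ch in line if ch.isalpha())
    return title, "".join(residues)

record = """> aa_seq
1 G D V E K G K K I F I M K C S Q C H T V E K G G K H K T G P
31 N L H G L F G R K T G
//
"""
title, seq = parse_simple(record)
print(title, len(seq), seq)
```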
fasta [-o #scores_to_be_printed ] [ -c cutoff ] test_seq. database [ktup]
Fasta (v.1.3) search of Pearson & Lipman for local homology.
Ex. fasta -o 40 -c 1 test_seq. $GB/gbpri.seq 3
See manual; man fasta
See Pearson, W. R. and Lipman, D. J. "Improved Tools for
Biological Sequence Comparison", Proc. Natl. Acad. Sci. USA
85:2444-2448 (1988).
tfasta [-o #scores_to_be_printed ] [ -c cutoff ] test_seq. database [ktup]
Fasta (v.1.3) search of Pearson & Lipman by comparing test_seq. of
amino acids with a DNA database translated into amino acid sequences.
Ex. tfasta -o 40 -c 1 test_seq. $EMBL/emblmam.seq 1
See manual; man tfasta
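The idea of translating DNA before comparing it with a protein query can be sketched as below. This is not the tfasta code: the use of the standard genetic code, the choice of the three forward reading frames, and the names translate and forward_frames are assumptions made for illustration. The frames actually searched are described in the tfasta manual.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, f) for f in range(3)]

print(forward_frames("ATGGGTGATGTTGAAAAATAA"))
```

Each translated frame could then be compared with the amino acid query in the same way as a protein database entry.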
lfasta [ -c cutoff ] test_seq. target_seq. [ktup]
Local homology search of Pearson & Lipman.
Ex. lfasta -c 1 test_seq. target_seq. 1
See manual; man lfasta
RDF2 [ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle]
Evaluate statistical significance of sequence matching;
modified Pearson & Lipman's rdf2 with lfasta alignment.
See manual; man rdf2
RDF2G [ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle]
RDF2 with local shuffle; modified Pearson & Lipman's rdf2g.
RDF2W [ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle] [window_size]
RDF2 with optimal score calculated by using a global
alignment routine; modified Pearson & Lipman's rdf2.
RDF2WG [ -c cutoff ] test_seq. shuffled_seq. [ktup] [#shuffle] [window_size]
RDF2 with local shuffle and optimal score calculated by
using a global alignment routine.
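The shuffling approach behind the RDF2 family can be sketched as follows. This is a simplified illustration, not the rdf2 program: the scoring function (length of the longest common substring) is a stand-in chosen for brevity, and the names shuffle_test and longest_common_substring are hypothetical. The sketch scores the test sequence against the second sequence, then against uniformly shuffled copies of it, and reports how many standard deviations the real score lies above the shuffled mean.

```python
import random
import statistics

def longest_common_substring(a, b):
    """Length of the longest run of identical residues shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def shuffle_test(test_seq, other_seq, n_shuffles=100, seed=0):
    """Compare the real score with scores against shuffled copies of other_seq."""
    rng = random.Random(seed)
    real = longest_common_substring(test_seq, other_seq)
    residues = list(other_seq)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps the composition, destroys the order
        scores.append(longest_common_substring(test_seq, "".join(residues)))
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    z = (real - mean) / sd if sd > 0 else None  # undefined if all scores are equal
    return real, mean, sd, z

real, mean, sd, z = shuffle_test("GDVEKGKKIFIMKCSQCHTVEKGG",
                                 "GDVEKGKKIFVQKCAQCHTVEKGG")
print(real, mean, sd, z)
```

A local shuffle, as in RDF2G, would permute residues only within short windows instead of across the whole sequence.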
Databases which can be specified in "database" arguments above:
@gb or @genbank All sequence files of GenBank including new entries
@embl All sequence files of EMBL including new entries
@ddbj All sequence files of DDBJ including new entries
@pir All sequence files of PIR
@swiss All sequence files of SwissProt
@prf All sequence files of PRF
@gp or @genpept All sequence files of GenPept
or a sequence file of each taxonomic division:
$GB/gbbct.seq, gbinv.seq, gbmam.seq, gborg.seq, gbphg.seq, gbpln.seq,
gbpri.seq, gbrna.seq, gbrod.seq, gbsyn.seq, gbuna.seq, gbvrl.seq,
gbvrt.seq
$GBNEW/gbnew.seq
$EMBL/emblfun.seq, emblinv.seq, emblmam.seq, emblorg.seq, emblphg.seq,
emblpln.seq, emblpri.seq, emblpro.seq, emblrod.seq, emblsyn.seq,
embluna.seq, emblvrl.seq, emblvrt.seq
$EMBLNEW/emblnew.seq
$DDBJ/ddbj.seq
$DDBJNEW/ddbjnew.seq
$GENPEPT/gp.seq
$PIR/pir1.seq, pir2.seq, pir3.seq
$SWISS/swiss.prot
$PRF/prf.seq
For details, use the "set" command to see the environment variables that are defined.
Examples: Multiple commands may be included in one mail.
tmpfile seq-1 <<'*** END ***' # create seq-1 file
> seq-1
atcg ATCG gcta
*** END ***
fasta -o 60 seq-1 $GB/gbpri.seq 3
fasta -o 60 seq-1 $GB/gbmam.seq 3
tmpfile db <<"*** END ***" # double quotation in this case
$EMBL/emblpri.seq
$EMBL/emblmam.seq
$EMBL/emblrod.seq
$EMBL/emblvrt.seq
*** END ***
fasta -o 40 seq-1 @db 6 # search over files written in "db"
tmpfile aa_seq <<'*** END ***'
> aa_seq
1 G D V E K G K K I F I M K C S Q C H T V E K G G K H K T G P
31 N L H G L F G R K T G
*** END ***
fasta -o 20 -c 1 aa_seq @pir 1 # @pir is a predefined database.
tfasta -o 20 -c 1 aa_seq @embl 1 # @embl is a predefined database.
tfasta -o 20 -c 1 aa_seq $GB/gbbct.seq 1
scandb pir ccsp | tmpfile ccsp.seq
scandb pir CCTW5T | tmpfile cctw5t.seq
RDF2 -c 1 ccsp.seq cctw5t.seq 1 100
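A request following the rules above could be assembled with Python's standard email package, as sketched below. The sender address is a placeholder, the helper name build_request is hypothetical, and the "at" in the server address is assumed to stand for "@". The sketch only builds the message and does not send it.

```python
from email.message import EmailMessage

# Assumption: "flat-netserv at ..." in this announcement means "flat-netserv@...".
SERVER = "flat-netserv@smlab.eg.gunma-u.ac.jp"

def build_request(sender, commands):
    """Build a mail whose body holds one server command per line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER
    # No Subject header is set: commands must be in the body, not in "Subject:".
    msg.set_content("\n".join(commands) + "\n")
    return msg

commands = [
    "tmpfile seq-1 <<'*** END ***'",
    "> seq-1",
    "atcg ATCG gcta",
    "*** END ***",
    "fasta -o 60 seq-1 $GB/gbpri.seq 3",
]
msg = build_request("user@example.org", commands)  # placeholder sender
print(msg.get_content())
```

The resulting message can be handed to any mail transport, for example smtplib.SMTP.send_message from the standard library.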
------------
Please note that a fasta search over a whole DNA database
takes a long time.