Searching databases with GCG and others

James Tisdall tisdall at amalthea.humgen.upenn.edu
Thu Jan 13 17:16:09 EST 1994

In article <pagilbert-130194100516 at> pagilbert at cti.ulaval.ca (Philippe-Alexandre Gilbert) writes:
>Is it possible to search PIR or other databases for sequence from a
>specific size (or a size range) from GCG? I also tried with gopher and
>keywords like #Length 300 but it doesn't work (and how to specify a range
>with gopher ?)
>Thank you for your help.
>Philippe-Alexandre Gilbert             tel: (418)-656-2964
>Centre de Traitement de l'Information  e-mail: pagilbert at cti.ulaval.ca
>Departement de Biochimie
>Quebec, Canada

I'm not sure about GCG, but since you ask about "GCG or others": in DNA
WorkBench, free software available via anonymous ftp from
cbil.humgen.upenn.edu in pub/dnaworkbench, this works:

  #for length exactly 300, in PIR-
database pir
sequence ^.{300}$ pirall

  #for length 300 or greater-
sequence ^.{300,}$ pirall

  #for length between 300 and 400-
sequence ^.{300,400}$ pirall

  #for length less than or equal to 300, in GenBank-
database genbank
sequence ^.{1,300}$ gball

The SEQUENCE command searches for sequence, which may be a literal string
like ACCTGGGCT, or may incorporate "regular expressions", a form of "wild
card" notation much used in computer science. Taking the pattern
^.{300,500}$ as an example:

^         means start matching at the beginning of the sequence
.         means match any single nucleotide or amino acid
{300,500} means match 300 to 500 of them
$         means match the end of the sequence

So, all together, it matches any sequence that has 300 to 500
nucleotides or amino acids from beginning to end.
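
For anyone who wants to try the same anchored patterns outside DNA WorkBench,
here is a minimal sketch in Python. It is not DNA WorkBench code: the function
names length_pattern and filter_by_length and the toy sequences are made up
for illustration, and it assumes each sequence is held as a single string with
no line breaks (Python's "." does not match a newline).

```python
import re

def length_pattern(min_len, max_len=None):
    """Build an anchored regex such as ^.{300,400}$ (hypothetical helper).

    max_len=None leaves the upper bound open, giving ^.{300,}$.
    Passing max_len equal to min_len matches that exact length.
    """
    upper = "" if max_len is None else str(max_len)
    return re.compile(r"^.{%d,%s}$" % (min_len, upper))

def filter_by_length(sequences, min_len, max_len=None):
    """Return the names of sequences whose length falls in the range."""
    pattern = length_pattern(min_len, max_len)
    return [name for name, seq in sequences.items() if pattern.search(seq)]

# Toy data standing in for database entries (not real PIR or GenBank records).
sequences = {
    "short": "A" * 299,     # length 299
    "exact": "A" * 300,     # length 300
    "mid":   "ACGT" * 100,  # length 400
    "long":  "A" * 501,     # length 501
}

print(filter_by_length(sequences, 300, 300))  # exactly 300
print(filter_by_length(sequences, 300))       # 300 or greater
print(filter_by_length(sequences, 300, 400))  # between 300 and 400
print(filter_by_length(sequences, 1, 300))    # 300 or fewer
```

The four calls mirror the four SEQUENCE examples above. Comparing len(seq)
directly would be simpler in Python; the regex form is kept here only to show
how the anchors and the {min,max} count work together.
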

James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania

tisdall at cbil.humgen.upenn.edu
fax 215-573-3111

More information about the Info-gcg mailing list
