searching databases with GCG and others

unknown at dl.ac.uk
Mon Jan 17 04:53:20 EST 1994

From:	MX%"tisdall at AMALTHEA.HUMGEN.UPENN.EDU" 14-JAN-1994 04:35:51.02
Subj:	Re: Searching databases with GCG and others

Date: 14 Jan 94 03:25:33+0000
From: server-daemon <server-daemon at DL.ac.uk>, James Tisdall
      <tisdall at AMALTHEA.HUMGEN.UPENN.EDU>
Sender: server-daemon at DL.ac.uk
Message-ID: <2h4h79$r1f at NETNEWS.UPENN.EDU>
Reply-To: James Tisdall <tisdall at AMALTHEA.HUMGEN.UPENN.EDU>
To: bionet.software mail newsgroup <bionet-news at DL.ac.uk>
Subject: Re: Searching databases with GCG and others
Original-Sender: "bionet.software mail newsgroup" <server-daemon at dl.ac.uk>
Comments: List problems/queries to <biosci at daresbury.ac.uk>
Comments: To mail both the group and netnews send to (bio-software at dl.ac.uk)
X-Article-Number: bionet.software Msg # 4440

In article <pagilbert-130194100516 at> pagilbert at cti.ulaval.ca
 (Philippe-Alexandre Gilbert) writes:
>Is it possible to search PIR or other databases for sequences of a
>specific size (or a size range) from GCG? I also tried gopher with
>keywords like #Length 300, but it doesn't work (and how do you specify a
>range with gopher?)
>  ..
>Thank you for your help.
>Philippe-Alexandre Gilbert             tel: (418)-656-2964
>Centre de Traitement de l'Information  e-mail: pagilbert at cti.ulaval.ca
>Departement de Biochimie
>Quebec, Canada

I'm not sure about GCG, but since you asked about "GCG or others": in DNA
WorkBench (free software, available by anonymous ftp from
cbil.humgen.upenn.edu in pub/dnaworkbench), the following works:

  # length exactly 300, in PIR:
database pir
sequence ^.{300}$ pirall

  # length 300 or greater:
sequence ^.{300,}$ pirall

  # length between 300 and 400:
sequence ^.{300,400}$ pirall

  # length less than or equal to 300, in GenBank:
database genbank
sequence ^.{1,300}$ gball
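For readers without DNA WorkBench, the same four patterns can be tried with any regular-expression library. The following is a minimal Python sketch, not DNA WorkBench's own implementation: it assumes each database entry is held as a single string with no line breaks or spaces, and the toy entries seqA to seqD and the search() helper are hypothetical names used only for illustration.

```python
import re

# Hypothetical toy "database": name -> sequence, each stored as one
# string with no whitespace or line breaks (an assumption of this sketch).
DATABASE = {
    "seqA": "M" * 250,
    "seqB": "A" * 300,
    "seqC": "G" * 350,
    "seqD": "K" * 450,
}

# The same four patterns used in the DNA WorkBench queries.
PATTERNS = {
    "exactly 300": r"^.{300}$",
    "300 or more": r"^.{300,}$",
    "300 to 400": r"^.{300,400}$",
    "1 to 300": r"^.{1,300}$",
}


def search(pattern: str, db: dict) -> list:
    """Return the names of entries whose whole sequence matches pattern."""
    regex = re.compile(pattern)
    return [name for name, seq in db.items() if regex.search(seq)]


if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        print(f"{label:12} {pattern:14} {search(pattern, DATABASE)}")
```

Because every pattern is anchored with ^ and $, each query selects entries by total length rather than by content.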

The SEQUENCE command searches for a sequence pattern. The pattern may be a
literal sequence such as ACCTGGGCT, or it may use "regular expressions", a
form of "wildcard" notation widely used in computer science:

^          means start matching at the beginning of the sequence
.          means match any single nucleotide or amino acid
{300,500}  means match 300 to 500 of the preceding item
           ({300} means exactly 300, {300,} means 300 or more)
$          means match at the end of the sequence

So, taken together, ^.{300,500}$ matches any sequence that is 300 to 500
nucleotides or amino acids long from beginning to end.

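The anchors ^ and $ are what turn the pattern into a length test. A small Python sketch (again an illustration under the assumption that each sequence is a single line, since "." does not match a newline; the three function names are hypothetical) compares a plain length check, the anchored pattern, and the same pattern without anchors:

```python
import re


def length_in_range(seq: str, low: int, high: int) -> bool:
    """Plain length test that ^.{low,high}$ is meant to express."""
    return low <= len(seq) <= high


def regex_in_range(seq: str, low: int, high: int) -> bool:
    """Anchored pattern: ^ (start), . (any residue), {low,high}, $ (end)."""
    # Doubled braces in the f-string produce literal { and } characters.
    return re.search(rf"^.{{{low},{high}}}$", seq) is not None


def unanchored_in_range(seq: str, low: int, high: int) -> bool:
    """Same pattern without ^ and $: only asks whether SOME stretch of
    low..high residues exists, so any longer sequence also matches."""
    return re.search(rf".{{{low},{high}}}", seq) is not None


if __name__ == "__main__":
    for n in (299, 300, 450, 500, 501, 1000):
        seq = "A" * n
        print(n,
              length_in_range(seq, 300, 500),
              regex_in_range(seq, 300, 500),
              unanchored_in_range(seq, 300, 500))
```

For a 1000-residue sequence the first two functions return False, while the unanchored version returns True, because it finds a 300-to-500-residue stretch inside the longer sequence.
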
James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania

tisdall at cbil.humgen.upenn.edu
fax 215-573-3111


Send comments to us at biosci-help [At] net.bio.net