IUBio

searching databases with GCG and others

unknown at dl.ac.uk
Mon Jan 17 04:53:20 EST 1994


From:	MX%"tisdall at AMALTHEA.HUMGEN.UPENN.EDU" 14-JAN-1994 04:35:51.02
To:	BIONET
CC:	
Subj:	Re: Searching databases with GCG and others

Return-Path: <server-daemon at DL.ac.uk>
Received: from hermes.cnrs-gif.fr by cgmvax.cgm.cnrs-gif.fr (MX V3.2) with
          SMTP; Fri, 14 Jan 1994 04:35:48 EST
X400-Received: by /PRMD=cnrs-gif/ADMD=0/C=fr/; Relayed; 14 Jan 94 04:36:21+0100
X400-Received: by /PRMD=internet/ADMD=red/C=fr/; Relayed; 14 Jan 94
    04:32:12-0500
X400-Received: by /PRMD=bitnet/ADMD=red/C=fr/; Relayed; 14 Jan 94 03:32:20+0000
X400-Received: by /PRMD=UK#d#AC/ADMD= /C=GB/; Relayed; 14 Jan 94 03:25:33+0000
Date: 14 Jan 94 03:25:33+0000
From: server-daemon <server-daemon at DL.ac.uk>, James Tisdall
      <tisdall at AMALTHEA.HUMGEN.UPENN.EDU>
Sender: server-daemon at DL.ac.uk
Message-ID: <2h4h79$r1f at NETNEWS.UPENN.EDU>
Reply-To: James Tisdall <tisdall at AMALTHEA.HUMGEN.UPENN.EDU>
000X-MX-Warning:   Warning -- Invalid "To" header.
To: bionet.software mail newsgroup <bionet-news at DL.ac.uk>
Subject: Re: Searching databases with GCG and others
Via: uk.ac.dl.pserv1; Fri, 14 Jan 1994 03:32:28 +0000
Precedence: list
Original-Sender: "bionet.software mail newsgroup" <server-daemon at dl.ac.uk>
Comments: List problems/queries to <biosci at daresbury.ac.uk>
Comments: To mail both the group and netnews send to (bio-software at dl.ac.uk)
X-Article-Number: bionet.software Msg # 4440
X-Listpath: bionet-news
X-Mailer: MXT V 12.13.5

In article <pagilbert-130194100516 at 132.203.140.7> pagilbert at cti.ulaval.ca
 (Philippe-Alexandre Gilbert) writes:
>
>Is it possible to search PIR or other databases for sequence from a
>specific size (or a size range) from GCG? I also tried with gopher and
>keywords like #Length 300 but it doesn't work (and how to specify a range
>with gopher ?)
>  ..
>Thank you for your help.
>
>--
>Philippe-Alexandre Gilbert             tel: (418)-656-2964
>Centre de Traitement de l'Information  e-mail: pagilbert at cti.ulaval.ca
>Departement de Biochimie
>Quebec, Canada


Not sure about GCG - but since you request "GCG or others" - in DNA WorkBench,
free software at cbil.humgen.upenn.edu in pub/dnaworkbench via anonymous ftp,
this works:

  #for length exactly 300, in PIR-
database pir
sequence ^.{300}$ pirall

  #for length 300 or greater-
sequence ^.{300,}$ pirall

  #for length between 300 and 400-
sequence ^.{300,400}$ pirall

  #for length less than or equal to 300, in GenBank-
database genbank
sequence ^.{1,300}$ gball

Explanation:
The SEQUENCE command searches for sequence, which may be something like
ACCTGGGCT, or may incorporate "regular expressions", a form of "wild card"
notation much used in computer science.
^         means match starting from the beginning of the sequence
.         means match any single nucleotide or amino acid
{300,500} means match 300 to 500 of them
$         means match the end of the sequence.
So, all together, ^.{300,500}$ matches any sequence that has 300 to 500
nucleotides or amino acids from beginning to end.
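
For readers without DNA WorkBench, the same length test can be tried with any
regular expression library. The following is only a rough sketch in Python,
not how DNA WorkBench itself works: the function names and the
(name, sequence) record layout are made up for illustration, and each sequence
is assumed to be a single string with no line breaks in it.

```python
import re

def length_pattern(min_len, max_len=None):
    """Build an anchored pattern such as ^.{300,400}$ (or ^.{300,}$ if no upper bound)."""
    if max_len is None:
        quantifier = "{%d,}" % min_len
    else:
        quantifier = "{%d,%d}" % (min_len, max_len)
    return re.compile("^." + quantifier + "$")

def filter_by_length(records, min_len, max_len=None):
    """Return the names of records whose sequence length falls in the given range.

    records is a hypothetical list of (name, sequence) pairs; sequences are
    assumed to contain no newline characters.
    """
    pattern = length_pattern(min_len, max_len)
    return [name for name, seq in records if pattern.match(seq)]

if __name__ == "__main__":
    # Toy data, invented purely to exercise the pattern.
    toy = [("short", "A" * 299), ("exact", "A" * 300), ("long", "A" * 401)]
    print(filter_by_length(toy, 300, 300))   # length exactly 300
    print(filter_by_length(toy, 300))        # length 300 or greater
    print(filter_by_length(toy, 1, 300))     # length less than or equal to 300
```

The ^ and $ anchors are what turn the pattern into a length test: without
them, .{300} would also match inside any longer sequence.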
======================================================================
James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania

tisdall at cbil.humgen.upenn.edu
215-573-3113
fax 215-573-3111
======================================================================









