In article <pagilbert-130194100516 at 132.203.140.7> pagilbert at cti.ulaval.ca (Philippe-Alexandre Gilbert) writes:
>Is it possible to search PIR or other databases for sequence from a
>specific size (or a size range) from GCG? I also tried with gopher and
>keywords like #Length 300 but it doesn't work (and how to specify a range
>with gopher ?)
>
>Thank you for your help.
>
>--
>Philippe-Alexandre Gilbert tel: (418)-656-2964
>Centre de Traitement de l'Information e-mail: pagilbert at cti.ulaval.ca
>Departement de Biochimie
>Quebec, Canada
I am not sure about GCG, but since you ask about other options as well: in
DNA WorkBench, free software available via anonymous ftp from
cbil.humgen.upenn.edu in pub/dnaworkbench, this works:
#for length exactly 300, in PIR-
database pir
sequence ^.{300}$ pirall
#for length 300 or greater-
sequence ^.{300,}$ pirall
#for length between 300 and 400-
sequence ^.{300,400}$ pirall
#for length less than or equal to 300, in GenBank-
database genbank
sequence ^.{1,300}$ gball
Explanation:
The SEQUENCE command searches for sequence, which may be something like
ACCTGGGCT, or may incorporate "regular expressions", a form of "wild card"
notation much used in computer science.
^ means starting from the beginning of the sequence
. means match any single nucleotide or amino acid
{300,400} means match 300 to 400 of them
({300} means exactly 300, and {300,} means 300 or more)
$ means match the end of the sequence.
So, all together, ^.{300,400}$ means match any sequence that has 300 to 400
nucleotides or amino acids from beginning to end.
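
For anyone who wants to try the same idea outside DNA WorkBench, here is a
minimal sketch in Python of the anchored-pattern trick described above. It is
not how DNA WorkBench itself works internally; the function names
(length_pattern, filter_by_length) and the little dictionary of sequences are
made up for illustration, and it assumes each sequence is one string with no
newlines in it.

```python
import re

def length_pattern(min_len, max_len=None):
    """Build an anchored regex like ^.{300,400}$ for a length range.

    max_len=None means "min_len or longer"; max_len == min_len means
    "exactly min_len". (Hypothetical helper, for illustration only.)
    """
    if max_len is None:
        quantifier = "{%d,}" % min_len
    elif max_len == min_len:
        quantifier = "{%d}" % min_len
    else:
        quantifier = "{%d,%d}" % (min_len, max_len)
    return re.compile("^." + quantifier + "$")

def filter_by_length(seqs, min_len, max_len=None):
    """Return the names of sequences whose whole length fits the range.

    seqs is assumed to be a dict mapping a name to a sequence string
    that contains no newline characters.
    """
    pattern = length_pattern(min_len, max_len)
    return [name for name, seq in seqs.items() if pattern.match(seq)]

# Made-up sequences, only their lengths matter here.
example = {
    "short": "A" * 299,
    "exact": "A" * 300,
    "mid": "A" * 350,
    "long": "A" * 401,
}

print(filter_by_length(example, 300, 300))  # exactly 300
print(filter_by_length(example, 300))       # 300 or greater
print(filter_by_length(example, 300, 400))  # between 300 and 400
print(filter_by_length(example, 1, 300))    # 300 or less
```

In a general-purpose language it is of course simpler to compare len(seq)
directly; the pattern form is shown only because it mirrors the SEQUENCE
commands above.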
======================================================================
James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania
tisdall at cbil.humgen.upenn.edu
215-573-3113
fax 215-573-3111
======================================================================