GCG's FASTA anomaly

Tim Cutts tjrc1 at mole.bio.cam.ac.uk
Tue Oct 21 11:01:04 EST 1997

In article <62i959$lr at acebo.sdi.uam.es>,
José R. Valverde <txomsy at cnb.uam.es> wrote:
>Hi, I've been receiving user reports of anomalies in the behaviour of
>GCG's FASTA recently, and have seen them myself, and I'm curious if
>anybody else has noticed them.
>In short, the problem is that when FASTA is run against GenEMBL:*
>with a sequence, and that sequence matches a sequence in the
>databases 100%, the 100% matching sequence DOES NOT appear in the results.
>I.e.: if one takes a sequence from the database with "fetch" and
>then runs fasta for it against GenEMBL:* then the results listing
>does not contain that sequence's counterpart in the databases. Not
>even if one changes the name or reduces the sequence size. 100% matches
>are not listed at all. I've tested with HSU22963, in full, subsequences 
>and with different names/format/info. Playing with the E values has
>no effect, as expected.
>This does not happen when the same query is run against EMBL or
>GenBank separately, nor when run against sections or selections
>of sequences made with lookup, stringsearch or wildcards, nor when
>run against sets of files or listing files.
>We have an old GenBank, plus EMBL updated nightly here, and I know 
>for sure this didn't happen some time ago (though I can't say how long
>ago), since we have old FASTA runs of that same sequence against
>GenEMBL:* that worked correctly. So the only thing that has changed is
>the total size of
>the databases. Formats are updated nightly and should be OK since 
>separate searches do work correctly.
>Hence, it looks like the problem might be with some hard coded value 
>inside FASTA itself, which makes it ignore higher matches when run
>against too many sequences.
>I haven't delved too deeply into the problem and only looked at the
>topmost score in each case (I was in a hurry to find an explanation
>and possible solution for my users), so I don't have more details, but
>the anomaly is real and worries me seriously.
>So the question is, has anyone else noticed similar anomalies? Does
>anyone know of any fix (besides not using GenEMBL)?

Comparison on my machine using GCG 9.0 worked fine, and the sequence
found itself (comparing against our GenEMBL, which is a non-redundant
composite of EMBL 51 and GenBank 103).

I would check that your GenEMBL farm isn't missing any divisions.
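
A rough sketch of one way to do that check, in Python. Everything here is
an assumption for illustration: the function name, the division names, the
".seq" suffix and the one-file-per-division layout are hypothetical, and
your site's database directory may be organised differently. It only
reports which expected divisions have no non-empty file present.

```python
from pathlib import Path


def missing_divisions(expected, data_dir, suffix=".seq"):
    """Return the expected division names that have no non-empty
    file named <division><suffix> in data_dir (case-insensitive)."""
    present = {
        p.stem.lower()
        for p in Path(data_dir).iterdir()
        # Treat zero-length files as missing: a truncated nightly
        # update would leave the file in place but unusable.
        if p.suffix.lower() == suffix and p.stat().st_size > 0
    }
    return sorted(name for name in expected if name.lower() not in present)
```

Given the list of divisions you expect GenEMBL to cover and the directory
holding the database files, any names it returns are worth looking at
before suspecting FASTA itself.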

