GCG's FASTA anomaly

José R. Valverde txomsy at cnb.uam.es
Tue Oct 21 08:02:33 EST 1997

Hi, I've recently been receiving user reports of anomalies in the
behaviour of GCG's FASTA, and have run into them myself. I'm curious
whether anybody else has noticed them.

In short, the problem is that when FASTA is run against GenEMBL:*
with a query sequence that matches a database sequence 100%, the
100% matching sequence DOES NOT appear in the results.

I.e.: if one takes a sequence from the database with "fetch" and
then runs FASTA with it against GenEMBL:*, the results listing does
not contain that sequence's counterpart in the databases, not even
if one changes the name or shortens the sequence. 100% matches are
not listed at all. I've tested with HSU22963 in full, with
subsequences of it, and with different names, formats and header
information. Changing the E values has no effect, as expected.

This does not happen when the same query is run against EMBL or
GenBank separately, nor against sections or selections of sequences
made with lookup, stringsearch or wildcards, nor against sets of
files or listing files.

We have an old GenBank plus an EMBL updated nightly here, and I know
for sure this didn't happen some time ago (I'm not sure how long
ago), because we have old FASTA runs of that same sequence against
GenEMBL:* that worked correctly. So the only thing that has changed
is the total size of the databases. The formatted databases are
updated nightly and should be OK, since separate searches do work
correctly.

Hence, it looks like the problem might be some hard-coded value
inside FASTA itself that makes it drop the highest-scoring matches
when run against too many sequences.

I haven't delved too deeply into the problem and have only looked at
the top score in each case (I was in a hurry to find an explanation
and a possible solution for my users), so I don't have more details,
but the anomaly is real and worries me seriously.

So the question is: has anyone else noticed similar anomalies? Does
anyone know of any fix (besides not using GenEMBL)?


Jose R. Valverde

More information about the Info-gcg mailing list