In article <62i959$lr at acebo.sdi.uam.es>,
José R. Valverde <txomsy at cnb.uam.es> wrote:
>Hi, I've been receiving user reports of, and experiencing myself, anomalies
>in the behaviour of GCG's FASTA recently, and I'm curious whether anybody
>else has noticed them.
>
>In short, the problem is that when FASTA is run against GenEMBL:*
>with a query sequence that matches a sequence in the databases 100%,
>the 100% matching sequence DOES NOT appear in the results
>listing.
>
>I.e.: if one takes a sequence from the database with "fetch" and
>then runs FASTA with it against GenEMBL:*, the results listing
>does not contain that sequence's counterpart in the databases, not
>even if one changes the name or reduces the sequence length. 100% matches
>are not listed at all. I've tested with HSU22963 in full, with subsequences,
>and with different names/formats/info. Changing the E values has
>no effect, as expected.
>
>This does not happen when the same query is run against EMBL or
>GenBank separately, nor when it is run against sections or selections
>of sequences made with lookup, stringsearch or wildcards, nor when it is
>run against sets of files or listing files.
>
>We have an old GenBank, plus an EMBL updated nightly here, and I know
>for sure this didn't happen some time ago (I'm not sure how long ago, though),
>because we have old FASTA runs of that same sequence against GenEMBL:* that
>worked correctly. So the only thing that has changed is the total size of
>the databases. The formatted files are updated nightly and should be OK, since
>separate searches do work correctly.
>
>Hence, it looks like the problem might be some hard-coded limit
>inside FASTA itself that makes it ignore the highest-scoring matches when run
>against too many sequences.
>
>I haven't delved too deeply into the problem and only looked at the
>topmost score in each case (I was in a hurry to find an explanation
>and a possible solution for my users), so I don't have more details, but
>the anomaly is real and worries me seriously.
>
>So the question is: has anyone else noticed similar anomalies? Does anyone
>know of any fix (besides not using GenEMBL)?

A comparison on my machine using GCG 9.0 worked fine, and the sequence
found itself (searching against our GenEMBL, which is a non-redundant
composite of EMBL 51 and GenBank 103).
I would check that your GenEMBL farm isn't missing any divisions.
Tim.