Is GenBank in a tail spin?

Keith Robison robison1 at husc10.harvard.edu
Fri Jul 3 09:54:26 EST 1992


The type of error Tom is describing was present in over 200 entries
in GenBank 69/70/71 (I have not run the count on GenBank 72).  All of
the entries have release dates from last summer.
When I inquired about them back in Nov/Dec, I was told that the problem
was:

	1) Well known
	2) Supposedly due to a problem at EMBL or with the EMBL-->GenBank program
	3) Too laborious to fix anytime soon.


In my opinion #3 is utter hogwash, as I can generate the list of
problem sequences automatically and, with a slight extension of the
software, generate corrected CDS and mRNA keys (in every one of these
entries that I have seen, both the CDS and mRNA keys are screwed up).
	The algorithm for finding these entries is to count the
number of exon (e), intron (i), CDS (c), and mRNA (r) keys.
If e=0 and (c=r=i+1 or c=r=i), then the entry falls into this
category.  This method finds most of the problems, and has an
extremely low false positive rate.  Any other entries with e=0 
and i<>0 are suspect. C/C++/UNIX source code to 
find these entries is available on request, as well as a list
of the entries (I'll do the GenBank 72 run pronto).
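
A minimal sketch of the counting rule above, written in Python rather than
the C/C++ mentioned, and not the original program. It assumes a flat-file
layout in which the feature table starts at a line beginning with FEATURES
and each feature key is indented exactly five spaces. The names count_keys,
classify and require_introns are hypothetical. One added assumption: by
default the sketch only reports a match when i>0, since the rule read
literally would also match entries with no intron keys at all; pass
require_introns=False to apply the rule exactly as stated.

```python
import re
from collections import Counter

# Assumption: a feature key sits after exactly five leading spaces;
# qualifier and continuation lines are indented further and do not match.
FEATURE_KEY = re.compile(r" {5}(\S+)\s")


def count_keys(entry_lines):
    """Count feature keys inside the FEATURES block of one entry."""
    counts = Counter()
    in_features = False
    for line in entry_lines:
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        # A non-blank line starting in column 1 opens the next section.
        if line.strip() and not line.startswith(" "):
            in_features = False
            continue
        m = FEATURE_KEY.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def classify(counts, require_introns=True):
    """Return 'match', 'suspect' or 'ok' for one entry's key counts."""
    e = counts.get("exon", 0)
    i = counts.get("intron", 0)
    c = counts.get("CDS", 0)
    r = counts.get("mRNA", 0)
    if e != 0:
        return "ok"
    # Stated rule: e=0 and (c=r=i+1 or c=r=i).
    if c == r and c in (i, i + 1) and (i > 0 or not require_introns):
        return "match"
    # Any other entry with e=0 and i<>0 is suspect.
    if i != 0:
        return "suspect"
    return "ok"
```

Calling classify(count_keys(text.splitlines())) on the text of a single
entry gives its category; splitting a release file into entries is left out.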



Keith Robison
Harvard University
Program in Biochemistry, Molecular, Cellular, and Developmental Biology

robison at ribo.harvard.edu 
