In article <1992Jul2.234102.23832 at bronze.ucs.indiana.edu>
gilbertd at sunflower.bio.indiana.edu (Don Gilbert) writes:
> AUTHORS van Steeg,H., van Oostrom,C.T., Martens,J.W., van Kreyl,C.F.,
> Schepens,J. and Wieringa,B.
> ^^^ Are not these the ones to whom criticisms about faulty annotations
> should be directed?
No, the problem still lies squarely with the database: it must educate the authors.
As I've said before, there needs to be a Definition of GenBank document. Such
a document would lay down the philosophy of the database. This includes what
data items are to be stored, how they are to be stored, and THE REASONING
BEHIND THESE DECISIONS. The features table was a good start in this direction,
but it did not go anywhere near far enough. IF such a document existed, then
these authors would be able to put the data into the database directly. If it
does not exist, they will do whatever they want. So the problem lies with the
database staff to create documentation that gets the scientists to enter the
data in a consistent and complete way.
I do not imagine that the Definition of GenBank would be fixed in stone, because
we are still learning biology. But it would be better than having no definition.
ASN.1 is like a BNF: it makes sure you can get at the data, but does not make
sure that you have stored the data sensibly. For THAT one must have a
philosophy.
As one person who just emailed me pointed out, updating the entire database
is an enormous task. The only way it can be done is for people who work with
sequences to do it. That means a much greater participation in the database
than we have ever seen before. But these people need guidance. They do not
know the issues at stake because they think about one small sequence and don't
see that every object in the database has to have a name and a type, or it will
be impossible to extract the data later. Actually, it's impossible now!
The longer we wait the worse it will become.
Tom "Cassandra" Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
toms at ncifcrf.gov