What CDS means ?

Peter Rice pmr at sanger.ac.uk
Tue Apr 11 03:44:44 EST 1995

In article <1995Apr11.024746.18275 at nlm.nih.gov> francis at borduas.nlm.nih.gov (Francis Ouellette) writes:
>   BMPOUNY at weizmann.weizmann.ac.il wrote:
>   >           I have to know for sure what is the meaning of CDS.
>   >      For example: after doing "fetch" for one gene sequence,
>   >      I get in one of the lines :
>   >       CDS   283.   .1695
>   >     What does that mean ?
>   CDS stands for CoDing Sequence, and the numbers give the interval
>   on the nucleotide sequence, i.e. the coordinates of the region
>   that encodes the protein.  This is usually expressed as:
>   CDS   283..1695
>   in a GenBank flatfile. (no space between the "..")

And those spaces? Well, they are there because when GCG extracts the
sequences the software puts a line ending in ".." just before the sequence
starts, so ".." is not allowed *anywhere* else in the entry. Even though
this was well known when the feature table was redesigned, for some reason
".." was still chosen for ranges.
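
For anyone reading such lines with a script, here is a minimal Python
sketch that accepts both the flatfile form "283..1695" and the GCG form with
whitespace between the dots. The names parse_cds_interval and cds_length are
made up for illustration, and only a plain start..end interval is handled
(no join(), complement() or partial "<"/">" locations).

```python
import re

# Matches a simple CDS interval such as "CDS   283..1695", and also the
# GCG-style variants "283. .1695" or "283.   .1695" where whitespace
# splits the two dots. Anything more complex is deliberately not matched.
_CDS_RE = re.compile(r"^\s*CDS\s+(\d+)\.\s*\.(\d+)\s*$")


def parse_cds_interval(line):
    """Return (start, end) as 1-based inclusive coordinates, or None."""
    m = _CDS_RE.match(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def cds_length(start, end):
    """Number of nucleotides in an inclusive interval."""
    return end - start + 1


if __name__ == "__main__":
    for text in ("CDS   283..1695", " CDS   283.   .1695"):
        start, end = parse_cds_interval(text)
        print(start, end, cds_length(start, end))
```

Both example lines give the same interval, 283 to 1695 inclusive, which is
1413 nucleotides.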

Curiously though, GCG database reformatting insists on changing ".." to ". ."
(with a single space) when writing the database files. This is not necessary,
and makes GCG-formatted entries look funny in, for example, SRS. When GCG
software extracts a database entry, it checks again and changes ".." to
". ." anyway. This will "always" be the case - the PIR database also has
".." in it, and those files are taken 'as is' when they are reformatted.

So I have modified GCG's EMBLTOGCG locally to keep the "..", with no
problems and a definite benefit for SRS.
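
If you cannot change EMBLTOGCG, a similar effect can be had afterwards by
filtering annotation lines. The following Python sketch is not the
EMBLTOGCG change itself, just an illustration; restore_range_dots is a
hypothetical name. It assumes the split dots only need restoring between
two coordinates, so free text that happens to contain "digit. .digit"
would also be rewritten.

```python
import re

# A digit, then ". ." with one or more spaces or tabs between the dots,
# then the next coordinate (optionally preceded by "<" or ">", as used
# for partial features).
_SPLIT_DOTS = re.compile(r"(\d)\.[ \t]+\.([<>]?\d)")


def restore_range_dots(line):
    """Rewrite GCG-style '283. .1695' back to '283..1695' in one line."""
    return _SPLIT_DOTS.sub(r"\1..\2", line)


if __name__ == "__main__":
    print(restore_range_dots("FT   CDS             283. .1695"))
    print(restore_range_dots("FT   CDS             join(12. .78,134. .202)"))
```

Apply it only to the annotation part of an entry, since GCG software still
expects its own line ending in ".." immediately before the sequence.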

I also changed the DE line reformatting to put the species last instead of
first. Much nicer when looking at the output of a database search. GCG only
did it, I think, for the abominable STRINGSEARCH program (RIP).
