In article <49893v$hs9 at rc1.vub.ac.be> rherzog at ben.vub.ac.be (Robert Herzog) writes:
> I have been extracting several hundreds of CDS from various sequences lately,
> and found a funny extraction from the MIDMM1 file (from Drosophila) :
>> the second CDS srs extracts is as follows:
>> MIDMM1 Length: 1536 Check: 5450 ..
>> 1 @M@CGACAAT GATTATTTTC TACAAATCAT AAAGATATTG GAACTTTATA TTTTATTTTT
> 61 GGAGCTTGAG CTGGAATAGT TGGAACATCT TTAAGAATTT TAATTCGAGC TGAATTAGGA
> 121 CATCCTGGAG CATTAATTGG AGATGATCAA ATTTATAATG TAATTGTAAC TGCACATGCT
> 181 TTTATTATAA TTTTTTTTAT GGTTATACCT ATTATAATTG GTGGATTTGG AAATTGATTA
>> two or three unusual bases at the start of this one...!
Probably connected with trying to get a translation of it, because the
feature table entry is:
FT CDS 1071..2606
FT /note="NCBI gi: 903727" /codon_start=1
FT /transl_except=(pos:1071..1073,aa:Met)
FT /transl_table=5 /product="cytochrome c oxidase I"
Translation exception handling is in function SlbDoTranslExcept in seqlib.c,
and puts "@X@" into the codon where "X" is the correct amino acid
code.
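
As a rough illustration of the marking described above, here is a minimal
Python sketch. It is not the actual SlbDoTranslExcept code from seqlib.c:
the function name, its arguments and the error handling are all
hypothetical, and "NNN" stands in for the three original bases, which
are not shown in this post.

```python
# Hypothetical sketch of the "@X@" marking; NOT the real seqlib.c code.

def mark_transl_except(cds, cds_start, exc_start, exc_end, aa):
    """Return cds with the exception codon replaced by '@' + aa + '@'.

    cds_start, exc_start and exc_end are 1-based positions in the entry,
    as written in the feature table (e.g. CDS 1071..2606, pos:1071..1073).
    aa is the one-letter amino acid code (e.g. 'M' for Met).
    """
    if len(aa) != 1:
        raise ValueError("aa must be a one-letter amino acid code")
    if exc_end - exc_start != 2:
        raise ValueError("exception must span exactly one codon")
    offset = exc_start - cds_start  # 0-based offset within the CDS
    if offset < 0 or offset + 3 > len(cds):
        raise ValueError("exception lies outside the CDS")
    # Three bases are swapped for three characters, so the length and the
    # position numbering of the displayed sequence are unchanged.
    return cds[:offset] + "@" + aa + "@" + cds[offset + 3:]


# Placeholder start codon (NNN) followed by the bases quoted in the post.
cds = "NNN" + "CGACAAT" + "GATTATTTTC"
marked = mark_transl_except(cds, 1071, 1071, 1073, "M")
print(marked)
```

Because the marker is the same length as the codon it replaces, the
numbering of the displayed sequence stays intact, which is why the
extracted CDS still reports the expected length.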
Of course, this should not happen for display of the sequence, only for
translation. It seems to me that getz does not do translations anyway.
Certainly this code should be executed only if needed.
I am not at all surprised LookUp is different - it seems to be a
rather dated SRS version. I am still looking for signs of what GCG
added (rather than the many things left out) in LookUp.
--
------------------------------------------------------------------------
Peter Rice | Informatics Division
E-mail: pmr at sanger.ac.uk | The Sanger Centre
Tel: (44) 1223 494967 | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919 | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/ | England