Base pair encoding

Mon Jul 1 18:20:30 EST 1991

In article <1991Jul1.185311.8785 at jax.org> mrk at jax.org (Michael Kosowsky) writes:
>How do GENBANK and NCBI's GENINFO symbolize uncertain base pairs?
>I've so far learned of three incompatible systems.
>For example, to represent "A or G", Microgenie
>uses 'P', REBASE uses 'R', and DNA Inspector uses something
>like '(A/G)'.
>I naively hope to get away with implementing just one.
>-- Michael
There is a well-established international standard for representing
ambiguities, adopted by the Nomenclature Committee of the International
Union of Biochemistry (Cornish-Bowden, A., Nucl. Acids Res. 13, 3021-3030
(1985)). The symbols are used as follows:

     Symbol         Meaning              | Symbol         Meaning
     G              Guanine              | K              G or T
     A              Adenine              | S              G or C
     C              Cytosine             | W              A or T
     T              Thymine              | H              A or C or T
     U              Uracil               | B              G or T or C
     R              Purine (A or G)      | V              G or C or A
     Y              Pyrimidine (C or T)  | D              G or T or A
     M              A or C               | N              G or A or T or C
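
For anyone implementing just this one system, the table above reduces to a
small lookup. What follows is only a minimal sketch in Python, not code from
GenBank or any of the packages mentioned; the names IUPAC, SYMBOL_FOR,
expand, encode and matches are made up for illustration, and skipping U in
the reverse lookup is my own assumption.

```python
# Sketch only: the ambiguity symbols from the table, as symbol -> bases.
# All names here are hypothetical, not part of any existing package.
IUPAC = {
    "G": "G", "A": "A", "C": "C", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Reverse lookup: set of bases -> symbol. U is left out (an assumption)
# so that every key here is a set of DNA bases only.
SYMBOL_FOR = {
    frozenset(bases): symbol
    for symbol, bases in IUPAC.items()
    if symbol != "U"
}


def expand(symbol):
    """Return the set of bases that a symbol can stand for."""
    return set(IUPAC[symbol.upper()])


def encode(bases):
    """Return the single symbol for a collection of bases, e.g. 'AG'."""
    return SYMBOL_FOR[frozenset(b.upper() for b in bases)]


def matches(symbol, base):
    """True if a concrete base is one of the possibilities for symbol."""
    return base.upper() in expand(symbol)
```

With this, "A or G" becomes encode("AG"), which gives 'R', and a foreign
notation only needs to be parsed into its set of bases before being passed
to encode.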

This standard is followed by GenBank, and I would assume that NCBI follows
it as well.
The Microgenie use of P for purine probably goes back to its original
incarnation in the dim past as the 'Korn/Queen' program, before the standard
was adopted. However, that's no excuse for not keeping up with the times.
Fortunately for you, there is now a well-accepted standard. If you want to
do it right, stick with the standard. As for those software manufacturers
who wish to complicate things by refusing to make the trivial changes
necessary to comply with internationally agreed-upon standards, well,
that's their problem. 

