In article <1991Jul1.185311.8785 at jax.org> mrk at jax.org (Michael Kosowsky) writes:
>How do GENBANK and NCBI's GENINFO symbolize uncertain base pairs?
>I've so far learned of three incompatible systems.
>For example, to represent "A or G", Microgenie
>uses 'P', REBASE uses 'R', and DNA Inspector uses something
>like '(A/G)'.
>I naively hope to get away with implementing just one.
>-- Michael
Everybody should use the IUB nomenclature now (shouldn't they?).
This is summarized here from the GCG software manual:
GCG uses the letter codes for amino acids and for nucleotide
ambiguity proposed by the IUB (Nomenclature Committee, 1985,
Eur. J. Biochem. 150: 1-5). These codes are compatible with the codes
used by the EMBL, GenBank, and PIR data libraries.
NUCLEOTIDES
The meaning of each symbol, its complement, and its equivalent in the
Cambridge (Staden/Sanger) format are shown below. Cambridge files can be
converted into GCG files and vice versa with the programs FROMSTADEN and TOSTADEN.
IUB/GCG   Meaning                  Complement   Staden/Sanger
A         A                        T            A
C         C                        G            C
G         G                        C            G
T/U       T                        A            T
M         A or C                   K            5
R         A or G                   Y            R
W         A or T                   W            7
S         C or G                   S            8
Y         C or T                   R            Y
K         G or T                   M            6
V         A or C or G              B            not supported
H         A or C or T              D            not supported
D         A or G or T              H            not supported
B         C or G or T              V            not supported
X/N       G or A or T or C         X            -/X
.         not G or A or T or C     .            not supported
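For anyone implementing just one system, here is a minimal Python sketch, assuming the IUB letters from the table above are the internal form. It stores only the "Meaning" column and derives complements by complementing each base, which reproduces the Complement column. It also converts a slash group like the '(A/G)' notation from the question into an IUB letter. The function names (expand, code_for, complement, reverse_complement, parse_slash_group) are hypothetical and not part of GCG. The '.' symbol is left out, and complementing X yields N rather than X.

```python
# IUB nucleotide ambiguity codes, transcribed from the "Meaning" column above.
# "." (not G, A, T or C) is deliberately omitted; U is treated as T.
IUB_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT", "X": "ACGT",
}

# Reverse lookup: set of bases -> IUB letter. setdefault keeps the first
# letter seen, so T wins over U and N wins over X.
_SET_TO_CODE = {}
for _code, _bases in IUB_BASES.items():
    _SET_TO_CODE.setdefault(frozenset(_bases), _code)

_BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand(code):
    """Return the set of bases that one IUB letter stands for."""
    return set(IUB_BASES[code.upper()])


def code_for(bases):
    """Return the IUB letter for a collection of bases, e.g. {'A', 'G'} -> 'R'."""
    normalized = frozenset("T" if b.upper() == "U" else b.upper() for b in bases)
    return _SET_TO_CODE[normalized]


def complement(code):
    """Complement one IUB letter by complementing every base it stands for."""
    return code_for(_BASE_COMPLEMENT[b] for b in expand(code))


def reverse_complement(seq):
    """Reverse-complement a sequence written in IUB letters."""
    return "".join(complement(c) for c in reversed(seq))


def parse_slash_group(text):
    """Convert a hypothetical '(A/G)'-style group into its IUB letter."""
    return code_for(text.strip("()").split("/"))
```

For example, parse_slash_group("(A/G)") gives 'R', and reverse_complement("ATGR") gives 'YCAT'. Deriving complements from base sets, rather than keeping a second hand-typed column, means the two columns cannot drift out of agreement.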
--
Don Gilbert gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405