Base pair encoding

Will Fischer wmf at LARIAT.LANL.GOV
Mon Jul 1 17:28:58 EST 1991

Michael Kosowsky asks:

>>How do GENBANK and NCBI's GENINFO symbolize uncertain base pairs?

>>I've so far learned of three incompatible systems.
>>(Stuff deleted)
>>I naively hope to get away with implementing just one.

The standard code was defined in Cornish-Bowden, A. (1985) Nucleic Acids
Res. 13, 3021-3030.  GenBank uses it, as should all right-thinking sequence
programs.

I.  Ambiguous assignments are represented as follows:

symbol     meaning
a          a
g          g
c          c
t          t
r          a or g
y          c or t
m          a or c
k          g or t
s          c or g
w          a or t
h          a or c or t
b          c or g or t
v          a or c or g
d          a or g or t
n          a or c or g or t
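
For anyone implementing this, table I can be held as a simple lookup.
The following is only a minimal sketch in Python: the names AMBIGUITY,
bases_for, symbol_for and compatible are made up for illustration and
are not taken from GenBank or any existing program.

```python
# Table I transcribed as symbol -> the bases it may stand for.
# All names in this block are illustrative assumptions.
AMBIGUITY = {
    "a": "a", "g": "g", "c": "c", "t": "t",
    "r": "ag", "y": "ct", "m": "ac", "k": "gt",
    "s": "cg", "w": "at",
    "h": "act", "b": "cgt", "v": "acg", "d": "agt",
    "n": "acgt",
}


def bases_for(symbol):
    """Return the set of bases that a symbol may stand for."""
    return set(AMBIGUITY[symbol.lower()])


def symbol_for(bases):
    """Return the one symbol that stands for exactly this set of bases."""
    wanted = {b.lower() for b in bases}
    for symbol, members in AMBIGUITY.items():
        if set(members) == wanted:
            return symbol
    raise ValueError("no symbol for bases: %r" % sorted(wanted))


def compatible(x, y):
    """True if the two symbols could denote the same base."""
    return bool(bases_for(x) & bases_for(y))
```

With this, bases_for("r") gives the set {a, g}, symbol_for("tc") gives
"y", and compatible("r", "y") is false because the two sets share no
base.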

II.  Base complementary relationships are as follows:

symbol     complement
a          t
b          v
c          g
d          h
g          c
h          d
k          m
m          k
r          y
s          s
t          a
v          b
w          w
y          r
n          n
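
Table II translates directly into a complement routine.  Again this is
just a sketch under assumptions: COMPLEMENT, complement and
reverse_complement are illustrative names, and sequences are folded to
lower case to match the tables.  r and y are entered as each other's
complements, which follows from table I (a or g pairs with t or c).

```python
# Table II as a lookup; r <-> y follows from table I.
# All names in this block are illustrative assumptions.
COMPLEMENT = {
    "a": "t", "b": "v", "c": "g", "d": "h", "g": "c", "h": "d",
    "k": "m", "m": "k", "r": "y", "s": "s", "t": "a", "v": "b",
    "w": "w", "y": "r", "n": "n",
}

# str.maketrans accepts a dict of single characters.
_TABLE = str.maketrans(COMPLEMENT)


def complement(seq):
    """Complement each symbol; characters not in the table pass through."""
    return seq.lower().translate(_TABLE)


def reverse_complement(seq):
    """Complement the sequence and reverse it to read the other strand."""
    return complement(seq)[::-1]
```

For example, reverse_complement("acgtrn") returns "nyacgt", and
complementing any sequence twice gives back the original.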

Hope this helps.

-- Will Fischer

   (Working at GenBank, but speaking for myself)
