Translation of protein sequence to nucleotide sequence

James Bonfield jkb at mrc-lmb.cam.ac.uk
Thu Aug 13 03:26:01 EST 1998

In article <35CEA9D3.AE38C855 at imcb.nus.edu.sg> mcbaet at MCBSGS1.IMCB.NUS.EDU.SG (Anthony Ting) writes:
>    Is anyone aware of a program that translate a protein sequence into
>a nucleotide sequence given a specific codon usage table?

The "pip" program (also xpip) in the Staden Package can do
this. It is option "27 = Back translate to dna". Be sure to type "d27"
rather than just "27" to get the additional dialogue questions, or when
using xpip have the "execute with dialogue" button enabled. E.g.:

 ? Menu or option number=m1

 General menu
  0 = List of menus
  3 = Read a new sequence
  4 = Redefine active region
  5 = List a sequence
  6 = List a text file
  7 = Direct output to disk
  8 = Write active region to disk
  9 = Edit the sequence
 17 = Short sequence search
 18 = Compare a sequence
 19 = Compare a sequence using a score matrix
 27 = Back translate to dna
 ? Menu or option number=d27
 Back translate to dna
 ? No codon preference (y/n) (y) = n
 ? Codon table file name=

I've included a copy of the help for this option below.
Please also see the Staden Package web pages for more details (in my .sig).



  Help on 'Back translate to dna' (option 27)

        This routine back translates protein sequences into DNA  using
  the  standard  genetic  code. The level of redundancy can be plotted
  and the backtranslation saved to a file.

        The translation can use either the IUB symbols shown below, or
  a  set  of codon preferences. If a set of codon preferences is used
  it must conform to the format of  codon  tables  produced  by  the
  nucleotide  analysis  program, and the back translation will contain
  the favoured codons. If there is no favoured codon the  IUB  symbols
  will  be  employed. The window length for plotting the redundancy is
  in codons.
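
A minimal sketch of the two routes described above, written in Python and not taken from the Staden Package. It assumes each amino acid is written as the position-by-position union of its standard-code codons, which over-covers the six-codon amino acids (Leu becomes YTN, which also admits TTY); the help does not say whether pip does the same. The `favoured` dictionary is a hypothetical stand-in for a codon preference table and is not the codon table file format that pip reads.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# NC-IUB symbols and the bases each one stands for.
IUB = {"A": "A", "C": "C", "G": "G", "T": "T",
       "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
       "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT"}
SYMBOL_FOR = {frozenset(bases): sym for sym, bases in IUB.items()}


def degenerate_codon(aa):
    """Position-wise union of all codons for one amino acid (an assumption)."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return "".join(SYMBOL_FOR[frozenset(c[i] for c in codons)]
                   for i in range(3))


def back_translate(protein, favoured=None):
    """Use the favoured codon where one is given, else the IUB symbols."""
    favoured = favoured or {}
    return "".join(favoured.get(aa) or degenerate_codon(aa)
                   for aa in protein.upper())


print(back_translate("MKLW"))
print(back_translate("MKLW", {"K": "AAA", "L": "CTG"}))
```

The second call mirrors the behaviour in the help: amino acids with a favoured codon get it, and the rest fall back to IUB symbols.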

        The program will plot the redundancy along  the  sequence  and
  hence can be used to find the best sequences to use as primers. Note
  that the program plots the inverse, and so the higher the  plot  the
  LESS  redundant the sequence. For primers look for peaks rather than
  troughs.
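
As a rough illustration of the inverse-redundancy plot, here is a Python sketch. The help does not give the formula pip uses, so this assumes the score for a window is the mean of 1/(number of codons) over the amino acids in it; `inverse_redundancy` and `best_window` are hypothetical names. Only the codon counts come from the standard genetic code.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def inverse_redundancy(protein, window):
    """Mean of 1/codon-count over each window; the window is in codons."""
    scores = [1.0 / CODON_COUNT[aa] for aa in protein.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def best_window(protein, window):
    """Start and residues of the highest-scoring (least redundant) window."""
    profile = inverse_redundancy(protein, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window]


print(best_window("LSRMWKLS", 2))
```

Higher values mean fewer alternative codons, so the peak marks the stretch that would need the least degenerate primer.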

        The DNA sequence can be saved to a file and analysed using the
  nucleotide  analysis  program.   Depending  on the application it is
  often useful to produce a back translation using  both  a  table  of
  codon preferences and one using the IUB symbols. This is because the
  restriction enzyme search program can distinguish  between  definite
  and  possible  cuts  in  the  sequence.   These matches are what the
  program  terms  "definite  matches"  and  are  ones  in  which   the
  specification  of  the  recognition  sequence corresponds exactly to
  that of the back translation. The program will  also  find  what  it
  terms   "possible  matches"  which  are  ones  that  depend  on  the
  particular codons chosen for each amino acid.  These  are  sites  at
  which  recognition sequences could be engineered to produce a cut in
  the  DNA  without  changing  the  amino  acid,  but  which  are  not
  necessarily found in the original sequence.
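
The distinction between definite and possible matches can be sketched in Python as follows. This is one reading of the help, not the restriction enzyme search program's actual algorithm: a site is treated as "definite" when every base allowed by the back translation is allowed by the recognition sequence, and "possible" when the two merely overlap at every position. Checking positions independently is a simplification, since the bases within a degenerate codon are not always free to vary independently. `classify_site` and `find_sites` are hypothetical names.

```python
# NC-IUB symbols and the bases each one stands for.
IUB = {"A": "A", "C": "C", "G": "G", "T": "T",
       "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
       "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT"}


def classify_site(site, dna, pos):
    """Return 'definite', 'possible' or None for site placed at dna[pos]."""
    segment = dna[pos:pos + len(site)]
    if len(segment) < len(site):
        return None
    definite = True
    for s, d in zip(site, segment):
        site_bases, dna_bases = set(IUB[s]), set(IUB[d])
        if not site_bases & dna_bases:
            return None            # no codon choice can produce the site
        if not dna_bases <= site_bases:
            definite = False       # only some codon choices produce it
    return "definite" if definite else "possible"


def find_sites(site, dna):
    """All positions where the site definitely or possibly occurs."""
    hits = []
    for pos in range(len(dna) - len(site) + 1):
        kind = classify_site(site, dna, pos)
        if kind:
            hits.append((pos, kind))
    return hits


# EcoRI (GAATTC) against the IUB back translation of Met-Glu-Phe-Trp.
print(find_sites("GAATTC", "ATGGARTTYTGG"))
```

Here the site is only "possible" because it needs GAA for Glu and TTC for Phe; against a back translation that already uses those codons it would be reported as "definite".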

              NC-IUB SYMBOLS

        R        (A,G)        'puRine'
        Y        (T,C)        'pYrimidine'
        W        (A,T)        'Weak'
        S        (C,G)        'Strong'
        M        (A,C)        'aMino'
        K        (G,T)        'Keto'
        H        (A,T,C)      'not G'
        B        (G,C,T)      'not A'
        V        (G,A,C)      'not T'
        D        (G,A,T)      'not C'
        N        (G,A,C,T)    'aNy'

James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/
