Hello fellow Bio-Netters -
After Bill Pearson posted his program mrtrans in response to the current
discussion about obtaining aligned DNA sequences from a peptide alignment, in
Message Id <9306241435.AA28319 at net.bio.net>:
>Here is a little program that I call "mrtrans" that will do what you
>want. It expects the aligned protein and cDNA sequences to be in "fasta" ....
>This is a unix "shar" file. It is also available for anonymous ftp at
>virginia.edu in pub/fasta/mrtrans.shar.
I had to immediately try it out. I am pleased to report that it compiled,
linked and ran just fine on our VAX/VMS system here. I wrote a very primitive
DCL .com file to help run it and I include that here, in case anybody would
like it:
$! This command file runs the program mrtrans from Bill Pearson which
$! takes as input files an amino acid sequence alignment and the corresponding
$! cDNA sequences file, both in FASTA format, and outputs the corresponding
$! aligned cDNA sequences. Gaps must be represented by `-' not `.' characters!
$!
$ Type/Page MRTrans.txt
$ MRTrans := $ Disk03:[Thompson.Com.MRTrans]MRTrans
$ Inquire OutFile "enter your desired output file name"
$ Define/User_Mode SYS$INPUT SYS$COMMAND
$ Define/User_Mode SYS$OUTPUT 'OutFile'
$ MRTrans
$ Exit
MRTrans.txt, which the command file displays first, is merely his Unix manual
page, reformatted and included below:
=============================start of MRTrans.txt==============================
mrtrans - produce aligned cDNA sequences from aligned protein sequences
protein-sequence-library + cDNA-sequence-library ===> aligned-cDNA-sequences
mrtrans is a simple program that allows you to produce
aligned cDNA sequences from aligned protein sequences.
This can be very useful for phylogeny programs, e.g. in
PHYLIP (dnadist, dnapars, dnaml, etc.). In general, it
is better to use protein sequences for multiple
alignments, but to use DNA sequences for phylogeny.
This can be time consuming when there are gaps in the
aligned protein sequences.
mrtrans takes a protein sequence library and a DNA
sequence library. It reads the first protein sequence
and the first DNA sequence, translates the DNA sequence
in each of the three frames, compares the protein
sequence to the translated DNA sequence to find the
protein coding region, and then writes out the DNA
sequence that encoded the protein. Both libraries should
be in Pearson/FASTA format. The sequences must be in the
same order in both libraries. The protein library may
include '-' characters to specify alignments. Each '-'
character in the protein library is ignored during the
sequence comparison but replaced by '---' in the DNA
sequence output.
mrtrans finds the coding regions for contiguous
sequences only. It will not splice together different
exons to produce a coding sequence.
Bill Pearson
wrp at virginia.EDU
==============================end of MRTrans.txt===============================
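For anyone who wants to see in concrete terms what the manual page describes, here is a minimal sketch in Python. It is not Bill's C source and makes no claim to match it. The function names translate and align_cdna are hypothetical, the standard genetic code is assumed, and the protein is assumed to match the translated DNA exactly (the real program may be more forgiving). It translates the DNA in each of the three frames, looks for the ungapped protein in each translation, and then writes out the codons, putting '---' wherever the aligned protein has a '-'.

```python
# Illustrative sketch only; NOT mrtrans itself.
# Assumption: standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate dna from offset frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def align_cdna(aligned_protein, dna):
    """Return the coding DNA for aligned_protein, with '---' for each '-' gap.

    Assumption: an exact match between the ungapped protein and one of the
    three forward-frame translations is required.
    """
    dna = dna.upper().replace("U", "T")
    ungapped = aligned_protein.replace("-", "").upper()
    for frame in range(3):
        hit = translate(dna, frame).find(ungapped)
        if hit < 0:
            continue  # protein not encoded in this frame, try the next one
        pos = frame + 3 * hit  # DNA offset of the first matching codon
        out = []
        for residue in aligned_protein:
            if residue == "-":
                out.append("---")  # a protein gap becomes a three-base gap
            else:
                out.append(dna[pos:pos + 3])
                pos += 3
        return "".join(out)
    raise ValueError("protein not found in any of the three frames")


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    print(align_cdna("MK-L", "ccATGAAACTGtaa"))
```

Like the program described above, the sketch handles contiguous coding regions only. To process whole libraries, one would read both FASTA files in the same order and call align_cdna on each protein/DNA pair.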
Thanks, Bill! Steve
Steven M. Thompson
Consultant in Molecular Genetics and Sequence Analysis
VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
Washington State University, Pullman, WA 99164-1224, USA
AT&Tnet: (509) 335-0533 or 335-3179 FAX: (509) 335-0540
BITnet: THOMPSON at WSUVMS1 or STEVET at WSUVM1
INTERnet: THOMPSON at wsuvms1.csc.wsu.edu