IUBio

Molecular Weight Calculator for Proteins

Ivan Torshin TIY at phys.chem.msu.ru
Sat Jun 19 12:48:39 EST 1999


> these values. But since I don't know any programming language this is no
> option for me.

Hopefully this is the first such task rather than the last, so it is worth
learning a simple language. Basic, for example, is a good "basic"
programming language.

> That's not a lot of work but the problem is something else: I usually
> have the amino acid sequence in its one-letter code. And changing all
> the one-letter code to the three-letter code is a lot of work in case of
> a large protein.

As a first task, it would be easier to write a program that converts a
one-letter sequence to three-letter codes. I last wrote in Basic a number
of years ago, but it would look something like this (IBM PC standard):

1  NAA = 20: REM number of amino acids
2  aanum = 1: rem variable for amino acid number in table
10 DIM AA3lett$(NAA): DIM AA1lett$(NAA): REM arrays for the conversion table
20 AA3lett$(1) = "GLY": AA1lett$(1) = "G": REM continue, please...
30 REM ... and compose the table for other amino acids
100 OPEN "1a05aa.seq" FOR INPUT AS #1: REM this opens file with sequence
105 OPEN "output." FOR OUTPUT AS #2: REM output file
110 LINE INPUT #1, seqid$: REM 1st string of the file is ">[sequence ID]"
120 PRINT "Sequence:", seqid$
130 IF EOF(1) GOTO 220: REM stop when no sequence lines are left
135 LINE INPUT #1, s$: REM read the next line of the sequence
140 FOR i = 1 TO LEN(s$): REM find the 3-letter code for each letter in the line
150 char$ = MID$(s$, i, 1): REM get the current character (length 1)
155 aanum = 0: REM 0 means "no match found yet"
160 FOR j = 1 TO NAA: IF char$ = AA1lett$(j) THEN aanum = j
165 NEXT j
170 IF aanum = 0 GOTO 190: REM skip blanks and letters not in the table
180 PRINT #2, AA3lett$(aanum)
190 NEXT i
200 GOTO 130: REM go back for the next line of the sequence
220 CLOSE #1, #2

The input file must be in FASTA format and placed in the same directory
as the Basic interpreter/compiler. The output will look like:

ALA
GLY
...

and can possibly be read by the program you are using.

The example above has a number of programming drawbacks and could be written
in half the length or less. However, it should be simple to understand.
Of course, in Pascal or C it would look more compact.
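
For comparison, here is a minimal sketch of the same one-letter to
three-letter conversion in a more compact form. It is written in Python
rather than Basic, Pascal, or C, purely as an illustration. The function
names (read_fasta_sequence, to_three_letter, convert_file) are hypothetical,
and the sketch assumes a FASTA file with a single sequence record. The file
names are the ones used in the Basic listing.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def read_fasta_sequence(lines):
    """Join the sequence lines of one FASTA record, skipping the '>' header."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def to_three_letter(sequence):
    """Return three-letter codes; blanks and unrecognised letters are skipped."""
    return [ONE_TO_THREE[c] for c in sequence.upper() if c in ONE_TO_THREE]


def convert_file(in_path="1a05aa.seq", out_path="output."):
    """Write one three-letter code per line, like the Basic listing does."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for code in to_three_letter(read_fasta_sequence(src)):
            dst.write(code + "\n")
```

Calling convert_file() with no arguments reads "1a05aa.seq" from the current
directory and writes the codes to "output.".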

> I can well imagine that it would be easy to create a program calculating
> the MW of a protein based on its AA sequence in one-letter code. Just
> tell the program what numerical values the different letters stand for
> and tell it to subtract the MW of (n-1) H2O molecules from the sum of
> Does anybody know about such a program? Is it available from the
> Internet?
> 
Try writing it yourself after getting a book on Basic. Unfortunately,
I cannot recommend a particular one; try searching at amazon.com.
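
As a starting point, the calculation described in the quoted message can be
sketched as follows: add up the molecular weights of the free amino acids and
subtract one water for each of the (n-1) peptide bonds. This is a Python
sketch, not a finished tool. The name protein_mw is hypothetical, and the
table holds rounded average molecular weights of the free amino acids, which
should be checked against a reference table before serious use.

```python
WATER_MW = 18.015  # average molecular weight of H2O

# Approximate average molecular weights of the free amino acids (g/mol).
# Assumption: rounded textbook values; verify against a reference table.
FREE_AA_MW = {
    "A": 89.09,  "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.15, "E": 147.13, "G": 75.07,  "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}


def protein_mw(sequence):
    """Sum the free amino acid weights and subtract (n-1) water molecules."""
    residues = [c for c in sequence.upper() if not c.isspace()]
    if not residues:
        return 0.0
    # An unknown letter raises KeyError rather than being silently ignored.
    total = sum(FREE_AA_MW[c] for c in residues)
    return total - (len(residues) - 1) * WATER_MW
```

For example, protein_mw("GG") gives the weight of the dipeptide
glycylglycine: two glycines minus one water.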

Hope this helps,
Ivan Torshin.
---



