
Bruno Kieffer's GCG secondary alignment question.

beck at mpimg-berlin-dahlem.mpg.de
Thu Jan 13 11:06:37 EST 1994


Hello netters, 

I missed the original message from Bruno Kieffer, but I saw the answer from
Steven M. Thompson:

>In Message-Id: <9401120921.AA19753 at net.bio.net> Bruno Kieffer 
>(kieffer at bali.u-strasbg.fr)
>
>asks:
>
>>	Does anybody knows a program able to read a multi-alignment
>>file (msf) from the UWGC command PILEUP, and align secondary structure
>>predictions obtained for each sequence ?
>
>... deleted ...
                              
Some years ago I wrote a program called Predict_Multi, which makes secondary
structure predictions for aligned peptide sequences (e.g. output from Lineup)
according to the Chou-Fasman method. All secondary structure predictions and
an average are displayed above each other. You can get the output as an
alphanumeric printout (see below) or as a graphic on a GCG plotting device.

I am sending the program description here. If anyone is interested, I can send
the sources or put them on an FTP server (which one?).

Alfred 


============================================================

Predict_Multi

FUNCTION

Predict_Multi makes secondary structure predictions for aligned peptide
sequences according to the Chou-Fasman method. All secondary structure
predictions and an average are displayed above each other.

DESCRIPTION

Predict_Multi makes secondary structure predictions according to the
Chou-Fasman method for an aligned set of amino acid sequences. The results for
each sequence and for the consensus are written to a file and, in addition,
displayed graphically. Predict_Multi uses the original method of Chou and Fasman
(Adv. in Enzymol., 47; 45-148) to predict helices, sheets, and turns. It
resolves overlapping regions of helix and beta-sheet with the "overall
probability" procedure introduced by K. Nishikawa (Biochimica et Biophysica
Acta, 748; 285-299). This same procedure also locates turns that are not in
conflict with other secondary structures. The Chou-Fasman rules are slightly
modified as follows: Helix: The conditions that p(bound) be greater than 1.0
and that p(alpha) be greater than p(beta) are not used. Sheet: A minimum length
of 5 residues is required.

EXAMPLE

In this example Predict_Multi calculates the secondary structure of some
DNA-binding proteins:

$ Predict_Multi

  Predict_Multi makes secondary structure predictions for aligned
  peptide sequences according to the Chou-Fasman method. All secondary
  structur predictions and an average are displayed above each other.

              PREDICT_MULTI for what LINEUP-file ?  @DN2.FIL

            Do you like to print the total output (* yes *) ?
            Prediction of ANADN2.FRG
            Prediction of RMEDN2.FRG
            Prediction of CPADN2.FRG
            Prediction of BSTBSB.FRG
            Prediction of BSTCSB.FRG
            Prediction of ECNS2.FRG
            Prediction of ECNS1.FRG

            What should I call the output file (* Dn2.Cfas *) ?

            The plot is now being sent to "PREDICT_MULTI.FIG".

$

OUTPUT

Here is some of the output file:


Secondary Structure according to Chou-Fasman of @dn2.fil

                    Date: September 30, 1988  16:04


 Schwellwert : 0.50     Faktor: 1.00

                                 20                  40                  60
                                  |                   |                   |
     Ecns1     --TTaaaaaaaaaaaaaaaaaaaaaaBBBBBBAAAAAA-ttBBBBBB-AAAAAAAAAAtt...
               mnksqlidkiaagadiskaaagraldaiiasvteslkegddvalvgfgtfavkeraartg...
                 +    -+     -  +    +  -       -  +- --           +-+  +  ...

     Ecns2     ---BBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBB----aaaaaaaa-tt...
               mnktqlidviaekaelsktqakaalestlaaiteslkegdavqlvgfgtfkvnhraertg...
                 +    -   -+ -  +   +   -       -  +- -          +  ++ -+  ...

    Bstcsb     AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBB---AAAAAAAAAAA-...
               mnkaelitsmaekskltkkdaelalkaliesveealekgekvqlvgfgtfetreraareg...
                 + -      -+ +  ++- -   +   -  --  -+ -+         - +-+  +- ...

    Bstbsb     --AAAAAAAAAAtTTaaaaaaaaaaaaaaaaaaaaaaattBBBBB----AAAAAAAAAtt...
               mnktelinavaetsglskkdatkavdavfdsitealrkgdkvqligfgnfevreraarkg...
                 + -      -     ++-  +  -   -   -  ++ -+         - +-+  ++ ...

    Cpadn2     AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBB---AAAAAAAAAAA-...
               mnkaelitsmaekskltkkdaelalkaliesveealekgekvqlvgfgtfetreraareg...
                 + -      -+ +  ++- -   +   -  --  -+ -+         - +-+  +- ...

    Rmedn2     --TTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaaatTTtBBBBBB----------tTT...
               mnknelvaavadkaglskadassavdavfetiqgelknggdirlvgfgnfsvsrreaskg...
                 + -      -+    + -     -   -    - +   - +          ++-  + ...

    Anadn2     --TTAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBtTTtBBBBB-ttAAAAAAAAAAA-...
               mnkgelvdavaekasvtkkqadavltaaletiieavsrgdkvtlvgfgsfesrerkareg...
                 + -  -   -+    ++  -       -   -   + -+         - +-++ +- ...


   Average     ----AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA-TBBBBB---AAAAAAAAAATT...

RELATED PROGRAMS

PeptideStructure calculates the secondary structure of a peptide and writes a
file suitable for input to PlotStructure. PlotStructure plots the results from
PeptideStructure. PepPlot plots parallel curves of the standard measures of
protein secondary structure.

RESTRICTIONS

Predict_Multi requires a file of filenames with aligned peptide sequences.
At most 16 sequences can be predicted in one run. The B (Asx) and Z (Glx)
characters (Appendix III) are not yet supported by this program.
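
The two limits above can be checked before a run. The following is only a
rough Python sketch and is not part of Predict_Multi: the function name
check_restrictions and the idea of passing the sequences as a name-to-string
dictionary are assumptions made for illustration. It does not read the file of
filenames or the .FRG files themselves.

```python
MAX_SEQUENCES = 16        # limit stated in RESTRICTIONS
UNSUPPORTED = {"B", "Z"}  # Asx and Glx are not supported


def check_restrictions(sequences):
    """Return a list of problem messages; an empty list means the input passes.

    `sequences` maps a sequence name to its residue string (hypothetical
    in-memory representation, not the program's own file format).
    """
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(
            f"{len(sequences)} sequences given, at most {MAX_SEQUENCES} allowed"
        )
    for name, residues in sequences.items():
        # Compare case-insensitively, since the example output is lowercase.
        found = sorted(UNSUPPORTED & set(residues.upper()))
        if found:
            problems.append(
                f"{name}: unsupported residue code(s) {', '.join(found)}"
            )
    return problems
```

An empty result means both restrictions are met; otherwise each message names
the limit that was exceeded or the sequence that contains B or Z.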

ALGORITHM

See the papers cited in the description section. Do not attempt to interpret
protein secondary structure predictions without reading the Robson-Garnier
paper. For calculating the consensus, each strong prediction is weighted with
1.0 and each weak prediction is weighted with the "Faktor". The sum is
calculated at each position for helix, turn, beta-sheet, and coil. If exactly
one of these four sums is the maximum, and this maximum divided by the number
of sequences is greater than the threshold ("Schwellwert"), the average is set
to the corresponding element. In all other cases the consensus is set to coil
(unpredicted).
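
The consensus rule can be sketched in a few lines of Python. This is not the
original source code, only an illustration under some assumptions: the function
name consensus is made up; uppercase A, B, T are taken to be strong predictions
and lowercase a, b, t weak ones (as the example output suggests); "-" is
counted as coil with weight 1.0; any other character is ignored.

```python
def consensus(predictions, faktor=1.0, schwellwert=0.5):
    """Build a consensus string from equally long prediction strings.

    Assumed symbols: A/B/T strong, a/b/t weak, '-' coil (weight 1.0).
    """
    if not predictions:
        return ""
    n_seqs = len(predictions)
    if any(len(p) != len(predictions[0]) for p in predictions):
        raise ValueError("all prediction strings must have the same length")

    result = []
    for column in zip(*predictions):
        sums = {"A": 0.0, "B": 0.0, "T": 0.0, "-": 0.0}
        for symbol in column:
            if symbol == "-":
                sums["-"] += 1.0
            elif symbol.upper() in sums:
                # Strong (uppercase) counts 1.0, weak (lowercase) counts faktor.
                sums[symbol.upper()] += 1.0 if symbol.isupper() else faktor
        best = max(sums.values())
        winners = [s for s, v in sums.items() if v == best]
        # Exactly one maximum, and its per-sequence share above the threshold.
        if len(winners) == 1 and best / n_seqs > schwellwert:
            result.append(winners[0])
        else:
            result.append("-")  # coil (unpredicted) in all other cases
    return "".join(result)
```

For example, consensus(["AAB-", "AaBt", "ATB-"]) gives "AAB-" with the default
Faktor of 1.0. With faktor=0.5 the second column sums to 1.5 for helix, which
divided by 3 sequences is not greater than 0.5, so that position falls back to
coil.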

CONSIDERATIONS

You should realize that measures of protein secondary structure are only weakly
correlated with actual structures. The Chou-Fasman method was designed to apply
to soluble (globular) proteins.

DEVICES REQUIRED
                               

Before you run a program with graphics output,
the GCG Package must know the language, the device, and the port or queue to
which that device is connected. These configuration parameters are set in
advance with the $ SetPlot command, or with commands like $ PostScript that
correspond to the different graphics languages the GCG Package supports. See
the Graphics section of the User's Guide for information about configuring your
process for graphics.

CTRL-C

If you need to stop this program, use Ctrl-C to reset your terminal and session
as gracefully as possible. The graphics device stops plotting the current page
and starts plotting the next. If the current page is the last page, plotters
should put the pen away and graphic terminals should return to interactive
mode.

COMMAND LINE SUMMARY

All parameters for this program may be put on the command line. Use the option
/CHEck to see the summary below and to have a chance to modify the command line
before the program executes. In the summary below, the capitalized letters in
the qualifier names are the letters that you must type in to run the command.
Square brackets ([ and ]) enclose qualifiers or parameter values that are
optional. The Command Line Control section of the User's Guide describes how to
use command lines effectively.

Syntax: $ PREDICT_MULTI [/INfile=]@WildSeqName.fil /Default

Required Parameters:

[/OUTfile=]WildSeqName.CFas     output file name

Local Data Files: None

Optional Parameters:

/FAKtor=1. 
/SCHWellwert=0.5 
/PLOT             (def.) 
/NOPLOT 
/CONsensus        (def.)
/NOCONsensus 
/SECconsensus=SeqName.Sec  (input-file with secondary structure)

LOCAL DATA FILES

None.

OPTIONAL PARAMETERS

You can set the parameters and switches listed below from the command line.
Optional parameters available to all programs are described in the Command Line
Control section of the User's Guide.

/SCHWellwert=0.5  

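sets the threshold that the winning sum, divided by the number of sequences,
must exceed for a consensus to be assigned (see the ALGORITHM section).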

/FAKtor=1.0       

sets the weight given to each weak prediction when the consensus is calculated
(see the ALGORITHM section).

/PLOT             

enables the plotting (this is the default).

/NOPLOT

suppresses the plotting.

/CONsensus

displays the consensus (this is the default).

/NOCONsensus

suppresses the consensus.

/SECconsensus=FileName.con

displays the secondary structure given in the file as the consensus.

The options below apply to all GCG graphics programs. These and many others are
described in detail in the Graphics section of the User's Guide.

/FIGure=ProgramName.Figure

writes the plot as a text file of plotting instructions suitable for input to
the Figure program instead of drawing the plot on your plotter.

/FONT=3

draws all text characters on the plot using font 3 (see Appendix I).

/COLor=1

draws the entire plot with the pen in stall 1.

The options below let you expand or reduce the plot (zoom), move it in either
direction (pan), or rotate it 90 degrees (rotate).

/SCAle=1.2

expands the plot by 20 percent by resetting the scaling factor (normally 1.0)
to 1.2 (zoom in). You can expand the axes independently with /XSCAle and
/YSCAle. Numbers less than 1.0 contract the plot (zoom out).

/XPAN=30.0

moves the plot to the right by 30 platen units (pan right).

/YPAN=30.0

moves the plot up by 30 platen units (pan up).

/PORtrait

rotates the plot 90 degrees. Usually, plots are displayed with the horizontal
axis longer than the vertical (landscape). Note that plots are reduced or
enlarged, depending on the platen size, to fill the page.
                               

 ==============================================================================

 Dr. Alfred Beck
 Max-Planck-Institut fuer Molekulare Genetik            Telefon: (030) 8413-226
 Ihnestrasse 73                                         FAX:     (030) 8413-365
 D-14195 Berlin (Dahlem)                EAN-Mail: Beck at MPIMG-BERLIN.MPG.D400.DE
 Germany                         Internet-Mail: Beck at MPIMG-BERLIN-DAHLEM.MPG.DE



More information about the Info-gcg mailing list
