EGCG Sequence Analysis Programs

Peter Rice rice at embl-heidelberg.de
Sun Nov 14 13:37:07 EST 1993


Attention fellow GCG users:

ANNOUNCEMENT

  EGCG is a package of 65 programs which extend the programs in the current
  GCG package and add many entirely new functions.

  Many of the programs were previously available as "GCGEMBL" in the
  UNSUPPORTED.BCK saveset on the GCG distribution for VMS. As the programs
  are no longer entirely written at EMBL, we have changed the name of the
  package. The "E" stands for "Extended" GCG.

  Porting of the EGCG programs from VAX/VMS to ALPHA OpenVMS was supported by
  DEC under a University Porting Agreement.

  The EGCG programs can be installed as a single copy on a mixed VAX/ALPHA
  cluster. Separate directories are used for the system-specific files
  (object files and image files).

DESCRIPTION

  (1) 26 new programs in this VMS release

      (* indicates programs included in the Unix release distributed by GCG
      on CD-ROM with GCG version 7.2)
	
	AllTrans : Translates a set of aligned DNA sequences into aligned
	    protein sequences.

*	BasePairPlot : Plots the % occurrence and obs/expected frequency of
	    any dinucleotide pair in a sequence.

	BFastA : A version of FastA using the BLOSUM62 matrix.

	BTFastA : A version of TFastA using the BLOSUM62 matrix.

*	DbStats : Reports database statistics.

*	GelFigure : Produces a graphical report of a contig in a Fragment
             Assembly project, including restriction map, open reading
             frames and fragment alignment.

*	KabatToGcg : Converts the KABAT database to GCG format.

	MapSelect : Selects restriction enzymes by name or by ability to cut
             a specified sequence, and creates a new ENZYME.DAT file for
	     use by other programs.

*	Melt : Calculates the melting temperature and %GC of a sequence.

*	MeltPlot : Plots the melting curve for a nucleic acid sequence.

	NewFeatures : An interactive editor for entering and modifying the
             feature table, and for minor editing of the sequence itself.
	     Also able to understand most feature table syntax, including
	     joins across entries and additional qualifiers.

	NoReturn : Removes trailing carriage returns and line feeds.

*	Palindrome : Searches for perfect inverted repeats.

	PepAllWindow : Plots hydrophobicity for one or more multiple
	    sequence alignments.

	PepCoil : Identifies potential coiled-coil regions in proteins.

*	PlotAlign : Plots conserved properties at each position in a multiple
	    sequence alignment.

*	SeqDbToGcg : Converts the SEQDB database to GCG format.

	ToEmbl : Extracts an EMBL entry in EMBL format.
	ToGenBank : Extracts a GenBank entry in GenBank format.

	ToPirAll : Converts a set of sequences or subsequences into a single
	    file in PIR format.

*	ToText : Converts a sequence to plain text.

	TProfileGap : ProfileGap with optional 6-frame translation of a
	    DNA sequence.

	TProfileSearch : ProfileSearch with ability to search any size of
	    database, and optional 6-frame translation of DNA databases. [1,2]

	TProfileSegments : Processes the output file from TProfileSearch. [1,2]

	TSegments : Processes TWordSearch output.

	TWordSearch : WordSearch with a 6-frame translation of the database.
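
	Note: to illustrate what a search for perfect inverted repeats (as
	in Palindrome above) involves, here is a minimal Python sketch. It is
	not the EGCG source and makes no claim about how Palindrome works
	internally. The function names and the arm_length and max_gap
	parameters are assumptions made for this example only.

```python
# Illustrative sketch only: not taken from the EGCG Palindrome program.
# Function names and parameters are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_length=4, max_gap=3):
    """Find perfect inverted repeats with arms of exactly arm_length bases.

    Returns a list of (left_start, right_start, left_arm) tuples, with
    0-based start positions. The two arms may be separated by 0 to
    max_gap unpaired bases.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm_length + 1):
        arm = seq[i:i + arm_length]
        target = reverse_complement(arm)
        for gap in range(max_gap + 1):
            j = i + arm_length + gap
            if j + arm_length > len(seq):
                break  # right arm would run off the end of the sequence
            if seq[j:j + arm_length] == target:
                hits.append((i, j, arm))
    return hits
```

	For example, find_inverted_repeats("GAATTC", arm_length=3, max_gap=0)
	reports the EcoRI site as a single repeat whose left arm is GAA.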

  (2) 19 GCG programs with command line control

	Command line control has been added to all the GCG programs that
	previously lacked full command line support. This work was done by
	summer student Jaakko Hattula from Tampere University of Technology
	in Finland.

	The programs (see the GCG manual for details) are:
	
	EAssemble, ECodonFrequency, ECompTable, EConsensus, ECorrespond,
	ECrypt, EDiverge, EExtractPeptide, EFingerPrint, EFromStaden,  
	EGetSeq, EPublish, ERepeat, EReverse, EStatPlot, ETerminator,  
	EToStaden, ETranslate, EWindow      

  (3) 20 programs from the original GCGEMBL package, many now enhanced:
	
	Antigenic : Reports potential antigenic regions.

	CheckLen : Calculates checksums for entries in a database.

	CheckLenComp : Compares CheckLen output for two databases, and reports
	    a list of unique entries.

	CpGPlot : Plots the frequency of occurrence of CG dinucleotides
           and percentage of C and G in a sequence.

	FastACheck : Selects significant alignments from (T)FastA output files.

	GbOnly : Creates a list of GenBank entries that have accession numbers
	    not found in the latest EMBL release.

	GelAnalyze : Reads the output of GelStatus, and produces project
            statistics for shotgun sequencing.

	GelPicture : Displays a diagram and printout of a contig from a
            Fragment Assembly project, with ambiguities highlighted.

	GelStatus : Reports progress of a Fragment Assembly project.

	HelixTurnHelix : Predicts helix-turn-helix DNA binding domains.

	NewQuickIndex : A much faster version of QuickIndex that produces the
	    index files for NewQuickSearch.

	NewQuickSearch : A much faster version of QuickSearch that can run
	    on almost all systems without a major virtual memory overhead.

	PepNet : Displays part of a protein as a helical net.

	PepStats : Gives a short statistical summary on the composition
	    of a protein sequence, or a 3-frame translation.

	PepWheel : Displays part of a protein as a helical wheel.

	PepWindow : Plots hydrophobicity of a protein sequence.

	PirOnly : Selects entries from PIR that are not in the latest
	    Swiss-Prot release.

	PrettyPlot : Displays multiple sequence alignments with boxes around
	    conserved regions.

	QuickMatch : Displays the overlaps found by NewQuickSearch (or by
	    QuickSearch), with selection for good quality matches.

	SigCleave : Predicts signal peptide cleavage sites.
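
	Note: the quantities plotted by a program such as CpGPlot above (CG
	dinucleotide frequency and percentage of C and G) can be sketched in
	a few lines of Python. This is an illustration, not the EGCG code.
	The observed/expected ratio uses the commonly quoted formula
	(number of CG x length) / (number of C x number of G); whether
	CpGPlot uses exactly this definition is an assumption, as are the
	function names and the window and step parameters.

```python
# Illustrative sketch only: not taken from the EGCG CpGPlot program.
# Function names, window and step are hypothetical.


def cpg_stats(seq):
    """Return (percent C+G, CpG observed/expected ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    percent_gc = 100.0 * (c + g) / n if n else 0.0
    # Expected CG count under independence is (c / n) * (g / n) * n.
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return percent_gc, obs_exp


def cpg_profile(seq, window=100, step=10):
    """Compute cpg_stats over sliding windows.

    Returns a list of (window_start, percent_gc, obs_exp) tuples.
    A sequence shorter than the window is treated as one window.
    """
    last_start = max(len(seq) - window, 0)
    return [
        (start, *cpg_stats(seq[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]
```

	Calling cpg_profile(sequence) on a sequence held as a Python string
	gives one tuple per window, which can then be passed to any plotting
	tool.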

DISTRIBUTION

  The programs are available from EMBL as follows:

  (1) by anonymous FTP (binary mode) to ftp.embl-heidelberg.de

	directory: /pub/software/vax/egcg

       Files:

       ecore.bck     : command procedures and data
       edoc.bck      : documentation source
       ehelp.bck     : help files
       esource.bck   : full source code
       000readme.txt : installation advice
       fixrec.c      : utility to make .bck files readable
       fixrec.com    : utility to make .bck files readable
       whats_new.lis : release notes (empty initially)
       
	EGCG is stored in 4 VMS backup savesets. The 000README.TXT file
	explains how to fix the file format of these savesets.

       Additional files:

       fixrec.c    C source to produce fixrec.exe which fixes
                   the record length of the save set if you have
  	           any problems.

       fixrec.com  DCL script that does the same as the above.  


  (2) by E-mail from the EMBL Network File Server

	Send E-mail to address NETSERV at EMBL-Heidelberg.DE with the
	following message text:

	HELP SOFTWARE
	HELP VAX_SOFTWARE

	GET VAX_SOFTWARE:EGCG.UAA

      EGCG is provided in (at present) 46 separate files. They are unpacked
      with UUDECODE and ZOO (the VAX_SOFTWARE file explains how) to give
      the following files:

       ecore.bck     : command procedures and data
       edoc.bck      : documentation source
       ehelp.bck     : help files
       esource.bck   : full source code
       000readme.txt : installation advice
       fixrec.c      : utility to make .bck files readable
       fixrec.com    : utility to make .bck files readable
       whats_new.lis : release notes (empty initially)
       
	EGCG is stored in 4 VMS backup savesets. The 000README.TXT file
	explains how to fix the file format of these savesets.

       Additional files:

       file.exe    utility to fix the record format of VMS BACKUP savesets
		   restored by ZOO

       file.zoo    ZOO archive of the original distribution of FILE

       fixbck.com  DCL procedure to run FILE

REFERENCES

   [1] Gibson, TJ, et al. (1993) TIBS 18:331-333
   [2] Musacchio, A, et al. (1993) TIBS 18:343-348


ACKNOWLEDGEMENTS

  Version 7.2 of the EGCG Programs was prepared by Peter Rice (EMBL,
  Heidelberg, Germany), Rodrigo Lopez (Biotechnology Centre of Oslo,
  Norway), Jaakko Hattula (Tampere University of Technology, Finland),
  Reinhard Doelz (Basel, Switzerland) and Jack Leunissen (CAOS/CAMM Centre,
  Netherlands).

  We are very grateful to (in alphabetical order) Rein Aasland, Wilhelm
  Ansorge, Peer Bork, Thure Etzold, Toby Gibson, Tom Kristensen, Franc 
  Pattus, Kate Rice, Christian Schwager, Peter Sibbald, Julie Thompson,
  Hartmut Voss and Gert Vriend for their many contributions and critical
  comments as users of the EGCG Programs.

  We are also deeply indebted to the staff of GCG Inc. who provided rapid
  and helpful answers to our many questions during the development of 
  the programs. Many thanks to Irv Edelman, Maggie Smith, Donald Katz, Michael
  Hogan, Joseph King, Mary Schulz and especially John Devereux.


CONTACTS

  Peter Rice      Peter.Rice at EMBL-Heidelberg.de   Tel: +49 6221-387247
  Rodrigo Lopez   rodrigol at biotek.uio.no          Tel: +47 22958756

 -----------------------------------------------------------------------------
 Peter Rice, EMBL                             | Post: Computer Group
                                              |       European Molecular
 Internet:    Peter.Rice at EMBL-Heidelberg.DE   |            Biology Laboratory
                                              |       Postfach 10-2209
 Phone:   +49-6221-387247                     |       69012 Heidelberg
 Fax:     +49-6221-387306                     |       Germany


