
potyvirus sequence database

RYBICKI, ED ED at micro.uct.ac.za
Thu Apr 7 07:43:23 EST 1994

Dear Colleagues:

I enclose a release from Phil Berger and myself which has already gone 
out in various forms to a number of workers in the field, and will 
hopefully be formally published sometime soon.  We would like to hear 
as soon as possible if anyone has any difficulty accessing either 
database, so that we can improve access if necessary.  It goes without 
saying that anyone who would like us to do sequence comparisons for 
them has only to email them to us, and ask...!  We welcome any 

		The Potyvirus Sequence Database

The potyvirus sequence database (PSD) is an attempt to provide workers 
in the field with a comprehensive collection of the potyvirus 
nucleotide and amino acid sequences currently available.  We hope that 
this will allow researchers to obtain the information easily and 
efficiently.  We intend to update the database as new sequences 
become available.

	How to access and use the database:

The database is located on the University of Idaho Gopher and FTP 
servers, and the University of Cape Town Gopher and FTP servers.  
Anyone with access to the Internet can access the PSD.  The easiest 
method is to simply use PC Gopher / WinGopher for PC compatible 
computers or TurboGopher for Macintosh computers.  Gopher access in 
Idaho is via gopher.uidaho.edu, 1/ UI Gopher Services/Library: 
Electronic Publications/Potyvirus Databases; at the University of Cape 
Town it is via gopher.uct.ac.za, University of Cape Town Campus
Information/Microbiology/Potyvirus Sequence Data.  Both use server 
port 70.  For those who want to do it the hard way, use anonymous FTP 
to crow.csrv.uidaho.edu ( or ucthpx.uct.ac.za 
( and go to /pub/data wherein you will find a directory 
called potyvirus, which will contain the individual sequence files.  
You can, in either case, FTP data files from the server to your own 
computer or simply list the file and copy and paste.  Specifically how 
this is done will depend on the kind of computer you are using and the 
specific hardware.
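
For those who prefer to script the FTP route, here is a minimal Python 
sketch using the standard ftplib module.  The host and directory are 
the ones given above; the function names are ours, invented for 
illustration, and we have assumed that the server returns bare file 
names in its listing.  Treat it as a starting point rather than a 
tested client for either server.

```python
from ftplib import FTP

PSD_HOST = "crow.csrv.uidaho.edu"   # Idaho host named in this announcement
PSD_DIR = "/pub/data/potyvirus"     # /pub/data plus the potyvirus directory


def list_sequence_files(ftp, directory=PSD_DIR):
    """Return the .SEQ and .PEP file names in `directory`, sorted.

    `ftp` is any object with cwd() and nlst() methods, such as a
    logged-in ftplib.FTP instance.  Extension matching ignores case.
    """
    ftp.cwd(directory)
    names = ftp.nlst()
    return sorted(n for n in names if n.lower().endswith((".seq", ".pep")))


def download(ftp, name, dest):
    """Copy one remote file to the local path `dest` in binary mode."""
    with open(dest, "wb") as fh:
        ftp.retrbinary("RETR " + name, fh.write)


def fetch_all(host=PSD_HOST):
    """Log in anonymously and fetch every sequence file (hypothetical helper)."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        for name in list_sequence_files(ftp):
            download(ftp, name, name)
```

Calling fetch_all() would copy every sequence file into the current 
directory; passing "ucthpx.uct.ac.za" as the host should do the same 
against the Cape Town server.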

Should you encounter any problems accessing or using the database, 
please don't hesitate to e-mail us at:

	pberger at marvin.ag.uidaho.edu
	ed at micro.uct.ac.za

	Other associated files:

A text document is also in the database, called References.  It is in 
Word format (Mac Word and MS-DOS) as well as text.  It documents the 
source of individual sequence files.  It also tries to make some sense 
of (or further confuses the issue of) the naming of strains or isolates.  In 
most cases, authors define the source and strain of virus that was 
used to generate the sequence data.  Occasionally, however, this is 
not the case and we've given the isolate a name if we felt it 
necessary to do so.  We have tried wherever possible to retain 
the original designation used.

	What is in the database:

The main portion of the PSD is, as expected, sequence data files.  The 
vast majority of information is 3'-end sequence encompassing the coat 
protein cistron and 3'-end nontranslated region.  In most cases you 
will find files for any given virus as in the following example:


The .SEQ extension is for nucleotide sequence while the .PEP extension 
is for amino acid sequence.  Thus, TamMVCP.SEQ is the coat protein 
cistron nucleotide sequence from tamarillo mosaic virus, TamMV3p.SEQ 
is the 3'-end nontranslated sequence (the CP stop codon will always be 
with the CP sequences and not 3'-end nontranslated sequences), and 
TamMVCP.pep is coat protein amino acid sequence.  If a strain 
designation is required, it will be just prior to the extension.  If a 
data file includes additional data upstream of the CP cistron, there 
will be a `+' in the name, such as `TAMMVCP+.SEQ'.  Complete 
nucleotide sequences will be listed as:  

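The naming rules for the CP and 3'-end files can be sketched as a 
small Python parser.  This is only an illustration: the function name, 
the returned field names, and the assumption that a strain designation 
comes before any `+' are ours, and the pattern does not cover 
complete-genome files.  A virus acronym that itself contains "CP" or 
"3p" would be split in the wrong place.

```python
import re

# virus acronym, region marker (CP or 3p), optional strain, optional '+',
# then the extension.  Matching ignores case, as in TamMVCP.pep.
_NAME = re.compile(
    r"^(?P<virus>.+?)(?P<region>CP|3P)(?P<strain>[^+.]*)(?P<plus>\+?)"
    r"\.(?P<ext>SEQ|PEP)$",
    re.IGNORECASE,
)

_REGIONS = {"CP": "coat protein", "3P": "3'-end nontranslated"}
_KINDS = {"SEQ": "nucleotide", "PEP": "amino acid"}


def parse_psd_name(filename):
    """Split a PSD file name into its parts, or raise ValueError."""
    m = _NAME.match(filename)
    if m is None:
        raise ValueError("not a recognised PSD file name: " + filename)
    return {
        "virus": m.group("virus"),
        "region": _REGIONS[m.group("region").upper()],
        "strain": m.group("strain"),
        "upstream": m.group("plus") == "+",   # extra data before the CP cistron
        "kind": _KINDS[m.group("ext").upper()],
    }
```

For example, parse_psd_name("TAMMVCP+.SEQ") reports a coat protein 
nucleotide file for TAMMV with additional upstream data.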

(Nearly) All sequences are in GCG format.  This means that comments 
are preceded by a colon (:) and the last comment before data will have 
a colon followed by two periods (`..').  If, for some reason, you desire 
data in a format other than GCG and cannot easily reformat it, 
contact us and we can probably accomplish this for you.  In many 
cases, the file in the database is identical to that submitted to 
GenBank or EMBL, so the accession number will be evident.  These would 
also, in many cases, provide another source of amino acid sequence as 
well as amino acid sequence not included in the .pep file (e.g., 
partial NIb sequence).  In other cases, where data were obtained from 
a publication (and not in GenBank or EMBL databases), we have tried to 
annotate at least to the point that one can find the original source 
of data.
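
If you do not have GCG to hand, the sequence can be pulled out of a 
GCG-format file with a few lines of Python.  The sketch below assumes 
that the header ends at the first line finishing with two periods, as 
described above, and that everything after it is sequence interspersed 
with position numbers and spaces.  The function name and the sample 
record are invented for illustration and are not taken from the 
database.

```python
def read_gcg(text):
    """Split GCG-format text into (header_lines, sequence_string)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip().endswith(".."):      # divider: last comment line
            header, body = lines[: i + 1], lines[i + 1:]
            break
    else:
        raise ValueError("no line ending in '..' found")
    # Keep letters (and '*' for stop codons in peptide files); drop the
    # position numbers and the spacing between blocks of ten.
    sequence = "".join(
        ch for line in body for ch in line if ch.isalpha() or ch == "*"
    )
    return header, sequence


# Invented sample record, for illustration only.
sample = (
    "Made-up coat protein fragment\n"
    "  EXAMPLE.SEQ  Length: 12  Check: 0  ..\n"
    "\n"
    "       1  ATGGCTAGCT AA\n"
)
header, seq = read_gcg(sample)
print(len(header), seq)
```

The header lines are returned untouched, so any accession number or 
source annotation they carry stays available.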

In the near future, we intend to also include multiple sequence 
alignments of the entire dataset.  These will be in GCG .msf format.  
If you have access to GCG (on a VAX) then it is possible to download 
the .msf file(s) and extract what is of interest to you, using the 
relevant GCG REFORMAT commands.

We also plan to include phylogenetic trees of CP and 3' non-coding 
region (NCR) sequences, as ASCII files and also as HPGL plotter files, 
for easy printing.


At the present time, data contained within the PSD is correct and 
accurate to the best of our knowledge.  There is no guarantee that this is 
so, however.  We urge researchers generating sequence data to submit 
it to the major databases (e.g., GenBank) and also to the PSD.  Any 
users who find errors or discrepancies should contact Phil Berger or 
Ed Rybicki at the above e-mail addresses or:

Dr. Phil Berger
Dept. of Plant, Soil and Entomological Sciences
Rm 242, Ag. Sci.
University of Idaho
Moscow, ID  83844-2339

208-885-7760 (fax)

Dr Ed Rybicki
Department of Microbiology
University of Cape Town
Private Bag, Rondebosch
7700 South Africa

voice xx27-21-650-3265
fax xx27-21-650-4023

Your comments and suggestions on what extra information should be 
included in the database are most welcome.
