Dear Colleagues:
I enclose a release from Phil Berger and myself which has already gone
out in various forms to a number of workers in the field, and will
hopefully be formally published sometime soon. We would like to hear
as soon as possible if anyone has any difficulty accessing either
database, so that we can improve access if necessary. It goes without
saying that anyone who would like us to do sequence comparisons for
them has only to email them to us, and ask...! We welcome any
comments.
__________________________________________________________________
The Potyvirus Sequence Database
The potyvirus sequence database (PSD) is an attempt to provide workers
in the field with a comprehensive collection of extant potyvirus
nucleotide and amino acid sequences, so that researchers can obtain
this information easily and effectively. We hope to update the
database as new sequences become available.
How to access and use the database:
The database is located on the University of Idaho Gopher and FTP
servers, and the University of Cape Town Gopher and FTP servers.
Anyone with access to the Internet can access the PSD. The easiest
method is to simply use PC Gopher / WinGopher for PC compatible
computers or TurboGopher for Macintosh computers. Gopher access in
Idaho is via gopher.uidaho.edu, 1/ UI Gopher Services/Library:
Electronic Publications/Potyvirus Databases; at the University of Cape
Town it is via gopher.uct.ac.za, University of Cape Town Campus
Information/Microbiology/Potyvirus Sequence Data. Both use server
port 70. For those who want to do it the hard way, use anonymous FTP
to crow.csrv.uidaho.edu (129.101.119.223) or ucthpx.uct.ac.za
(137.158.128.1) and go to /pub/data wherein you will find a directory
called potyvirus, which will contain the individual sequence files.
You can, in either case, FTP data files from the server to your own
computer or simply list the file and copy and paste. Exactly how this
is done will depend on the kind of computer and the specific hardware
you are using.
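For those who prefer to script the anonymous FTP route, the following
is a minimal sketch in Python using the standard ftplib module. The
host names and the /pub/data/potyvirus path are taken from the
description above; the function names, the timeout and the choice of a
binary transfer are assumptions made for illustration, and the sketch
has not been checked against either server.

```python
from ftplib import FTP

# Hosts and directory as given in the announcement text.
PSD_HOSTS = ("crow.csrv.uidaho.edu", "ucthpx.uct.ac.za")
PSD_DIR = "/pub/data/potyvirus"


def list_psd_files(host, timeout=30):
    """Return the names in the potyvirus directory (hypothetical helper)."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()          # no arguments means an anonymous login
        ftp.cwd(PSD_DIR)
        return ftp.nlst()


def fetch_psd_file(host, filename, dest=None, timeout=30):
    """Download one sequence file and return the local path (hypothetical helper)."""
    local_path = dest or filename
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()
        ftp.cwd(PSD_DIR)
        with open(local_path, "wb") as out:
            # Binary mode copies the file byte for byte.
            ftp.retrbinary("RETR " + filename, out.write)
    return local_path
```

A call such as fetch_psd_file(PSD_HOSTS[0], "TAMMVCP.SEQ") would then
save that file in the current directory, assuming the server is
reachable and the file exists under that name.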
Should you encounter any problems accessing or using the database,
please don't hesitate to e-mail us at:
pberger at marvin.ag.uidaho.edu
and:
ed at micro.uct.ac.za
Other associated files:
A text document is also in the database, called References. It is in
Word format (Mac Word and MS-DOS) as well as text. It documents the
source of individual sequence files. It also tries to make some sense
of (or further confuses) the naming of strains or isolates. In
most cases, authors define the source and strain of virus that was
used to generate the sequence data. Occasionally, however, this is
not the case and we've given the isolate a name if we felt it
necessary to do so. We have tried wherever possible to retain
the original designation used.
What is in the database:
The main portion of the PSD is, as expected, sequence data files. The
vast majority of information is 3'-end sequence encompassing the coat
protein cistron and 3'-end nontranslated region. In most cases you
will find files for any given virus as in the following example:
TAMMVCP.SEQ
TAMMV3P.SEQ
TAMMVCP.pep
The .SEQ extension is for nucleotide sequence while the .PEP extension
is for amino acid sequence. Thus, TamMVCP.SEQ is the coat protein
cistron nucleotide sequence from tamarillo mosaic virus, TamMV3p.SEQ
is the 3'-end nontranslated sequence (the CP stop codon will always be
with the CP sequences and not 3'-end nontranslated sequences), and
TamMVCP.pep is coat protein amino acid sequence. If a strain
designation is required, it will be just prior to the extension. If a
data file includes additional data upstream of the CP cistron, there
will be a `+' in the name, such as `TAMMVCP+.SEQ'. Complete
nucleotide sequences will be listed as:
PVYNCOMPL.seq
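The naming convention above can be sketched as a small Python helper
that reports what a file name implies. This is an illustration only:
the function name and the returned fields are hypothetical, the region
codes are simply those seen in the example names (CP, 3P, COMPL), and
the sketch does not try to separate a virus acronym from a strain
designation, so it returns whatever precedes the region code as a
single prefix.

```python
import os

# Region codes as they appear in the example file names; checked in this
# order so that the longest code is tried first.
REGION_CODES = {
    "COMPL": "complete nucleotide sequence",
    "CP": "coat protein cistron",
    "3P": "3'-end nontranslated region",
}

# Extensions are compared case-insensitively (.SEQ, .seq, .pep all occur).
EXTENSIONS = {".seq": "nucleotide", ".pep": "amino acid"}


def describe_psd_file(filename):
    """Return a dict describing a PSD file name (hypothetical helper)."""
    stem, ext = os.path.splitext(filename)
    sequence_type = EXTENSIONS.get(ext.lower())
    if sequence_type is None:
        raise ValueError("unrecognised extension: " + repr(ext))

    # A '+' in the name marks additional data upstream of the CP cistron.
    upstream_data = "+" in stem
    stem = stem.replace("+", "").upper()

    prefix, region = stem, None
    for code, description in REGION_CODES.items():
        if stem.endswith(code):
            prefix, region = stem[: -len(code)], description
            break

    return {
        "prefix": prefix,
        "region": region,
        "sequence_type": sequence_type,
        "upstream_data": upstream_data,
    }


for name in ("TAMMVCP.SEQ", "TAMMV3P.SEQ", "TAMMVCP+.SEQ", "PVYNCOMPL.seq"):
    print(name, describe_psd_file(name))
```

Names that carry a strain designation just before the extension would
need an extra rule, which is left out here because the text gives no
example of one.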
(Nearly) all sequences are in GCG format. This means that comments
are preceded by a colon (:) and the last comment before data will have
a colon followed by two periods (`..'). If, for some reason, you desire
data in a format other than GCG and cannot easily reformat it,
contact us and we can probably accomplish this for you. In many
cases, the file in the database is identical to that submitted to
GenBank or EMBL, so the accession number will be evident. These would
also, in many cases, provide another source of amino acid sequence as
well as amino acid sequence not included in the .pep file (e.g.,
partial NIb sequence). In other cases, where data were obtained from
a publication (and not in GenBank or EMBL databases), we have tried to
annotate at least to the point that one can find the original source
of data.
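For readers without GCG, a file in this format can usually be reduced
to a bare sequence by discarding everything up to and including the
line that carries the two periods. The Python sketch below does this
under stated assumptions: the function name is hypothetical, the sample
text is invented for illustration and is not taken from the database,
and the first line containing `..' is treated as the end of the
comments, which would fail if an earlier comment happened to contain
two consecutive periods.

```python
def read_gcg_sequence(text):
    """Split GCG-style text into (comment_lines, sequence) (hypothetical helper)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if ".." in line:
            comments = lines[: index + 1]
            body = lines[index + 1 :]
            break
    else:
        raise ValueError("no '..' divider line found")

    # Keep letters only: this drops position numbers and spacing, but also
    # any gap or stop symbols, which a fuller parser would preserve.
    sequence = "".join(ch for line in body for ch in line if ch.isalpha())
    return comments, sequence


# Invented sample, laid out as described above.
example = """\
: hypothetical comment line
: Example  Length: 12  Check: 0  ..

       1  ATGGCAGGTG AA
"""

comments, sequence = read_gcg_sequence(example)
print(len(comments), sequence)
```

The same function should work for both .SEQ and .pep files, since it
makes no assumption about the alphabet beyond keeping letters.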
In the near future, we intend to also include multiple sequence
alignments of the entire dataset. These will be in GCG .msf format.
If you have access to GCG (on a VAX) then it is possible to download
the .msf file(s) and extract what is of interest to you, using the
relevant GCG REFORMAT commands.
We also plan to include phylogenetic trees of CP and 3' non-coding
region (NCR) sequences, as ASCII files and also as HPGL plotter files,
for easy printing.
Misc.:
At the present time, data contained within the PSD is correct and
accurate to the best of our knowledge. There is no guarantee that this is
so, however. We urge researchers generating sequence data to submit
it to the major databases (e.g., GenBank) and also to the PSD. Any
users who find errors or discrepancies should contact Phil Berger or
Ed Rybicki at the above e-mail addresses or:
Dr. Phil Berger
Dept. of Plant, Soil and Entomological Sciences
Rm 242, Ag. Sci.
University of Idaho
Moscow, ID 83844-2339
208-885-6319
208-885-7760 (fax)
and:
Dr Ed Rybicki
Department of Microbiology
University of Cape Town
Private Bag, Rondebosch
7700 South Africa
voice xx27-21-650-3265
fax xx27-21-650-4023
Your comments and suggestions on what extra information should be
included in the database are most welcome.