The Brugia malayi clustered EST dataset on the web
http://nema.cap.ed.ac.uk/nematodeESTs/Brugia/brugia.php
The Filarial Genome Project, sponsored by the World Health
Organisation/TDR, the UK Medical Research Council, New England
Biolabs, The Edna McConnell Clark Foundation and the Wellcome Trust,
has been sequencing expressed sequence tags (ESTs) from Brugia
malayi for several years. The 23,000 ESTs have been the subject of
ongoing analyses, and we are pleased to announce the availability of
the latest version of our clustered database.
The Brugia malayi EST dataset has been clustered into groups of
sequences, with each group thought to derive from a single gene. This clustering
process has been carried out here in Edinburgh using in-house
software based on the BLAST algorithm. Each cluster is identified by
a unique ID starting with 'BMC' (for Brugia malayi cluster) followed
by 5 digits. The ID numbers are consistent with previous publications
on the B. malayi dataset and will be preserved in any future
rebuilds. The sequences from each cluster are then used to build a
consensus sequence which is used for further analysis. The results of
this clustering analysis, along with some basic annotation, are now
available on NEMBASE
(http://nema.cap.ed.ac.uk/nematodeESTs/nembase.html), a nematode-specific
database resource that can be searched via the World Wide Web. There is
also a Brugia-specific cluster search web page, which you may find easier
to use, at:
http://nema.cap.ed.ac.uk/nematodeESTs/Brugia/brugia.php
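The clustering software itself is in-house and its details are not described here. As a rough illustration only, the Python sketch below groups ESTs by single-linkage clustering over pairwise BLAST hits and labels each group with 'BMC' plus five digits. The E-value cutoff, the (query, subject, evalue) input format and the name `cluster_ests` are all hypothetical, and unlike the real scheme this numbering is recomputed from scratch, so it does not keep IDs stable across rebuilds.

```python
from typing import Dict, Iterable, List, Tuple

def cluster_ests(est_ids: Iterable[str],
                 hits: Iterable[Tuple[str, str, float]],
                 max_evalue: float = 1e-20) -> Dict[str, List[str]]:
    """Single-linkage clustering of ESTs from pairwise BLAST hits.

    Two ESTs share a cluster if they are joined by a chain of hits with
    E-value <= max_evalue (the default cutoff is a hypothetical choice).
    """
    parent = {est: est for est in est_ids}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        parent.setdefault(query, query)
        parent.setdefault(subject, subject)
        if evalue <= max_evalue:
            parent[find(query)] = find(subject)   # union the two groups

    groups: Dict[str, List[str]] = {}
    for est in parent:
        groups.setdefault(find(est), []).append(est)

    # Largest clusters first; ties broken by the alphabetically first member.
    ordered = sorted(groups.values(), key=lambda m: (-len(m), min(m)))
    return {f"BMC{i:05d}": sorted(m) for i, m in enumerate(ordered, start=1)}

# Hypothetical toy input: e1-e2-e3 are linked, e4 only has a weak hit.
ests = ["e1", "e2", "e3", "e4"]
hits = [("e1", "e2", 1e-50), ("e2", "e3", 1e-30), ("e3", "e4", 0.5)]
print(cluster_ests(ests, hits))
# {'BMC00001': ['e1', 'e2', 'e3'], 'BMC00002': ['e4']}
```

Single linkage is used here because it is the simplest way to turn pairwise hits into groups; a chain of overlapping ESTs ends up in one cluster even when its two ends do not match each other directly.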
At present there are four ways in which the database can be searched:
A) By Accession Number or Cluster ID
B) By simple keyword searching of blast output
C) By sequence similarity
D) By stage expression
A) By cluster ID or Accession number of a constituent EST sequence.
On the search page there is a small text box in which you can enter
the ID of the cluster you are interested in (if already known) or the
accession number of an EST. Enter the appropriate ID and click on the
'go' button. You will be taken straight to a page detailing the
relevant cluster. [Please note that, at present, the ribosomal
RNA-derived ESTs are NOT included, and thus if you enter a ribosomal
RNA EST accession number, no answer will be returned. This omission
will be rectified in the near future.]
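The single search box accepts either kind of identifier. A hypothetical sketch of how such input might be dispatched is shown below, assuming cluster IDs follow the 'BMC' plus five digits pattern and that EST accessions are looked up in a separate index. The function name, the dictionaries and the accessions in the example are invented toy data, not the actual NEMBASE implementation.

```python
import re
from typing import Dict, List, Optional

# Cluster IDs: 'BMC' followed by exactly five digits, e.g. BMC00001.
CLUSTER_ID = re.compile(r"BMC\d{5}")

def lookup(query: str,
           clusters: Dict[str, List[str]],
           est_to_cluster: Dict[str, str]) -> Optional[str]:
    """Return the cluster ID for a cluster ID or EST accession, or None."""
    text = query.strip().upper()
    if CLUSTER_ID.fullmatch(text):
        return text if text in clusters else None
    # Anything else is treated as an EST accession. ESTs missing from the
    # index (such as the excluded rRNA ESTs) give no answer.
    return est_to_cluster.get(text)

# Toy data, invented for illustration.
clusters = {"BMC00001": ["AA000001", "AA000002"]}
est_to_cluster = {acc: cid for cid, accs in clusters.items() for acc in accs}
print(lookup("bmc00001", clusters, est_to_cluster))   # BMC00001
print(lookup("AA000002", clusters, est_to_cluster))   # BMC00001
print(lookup("AA999999", clusters, est_to_cluster))   # None
```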
B) By BLAST annotation. After the consensus sequences were created
for each cluster, they were used to perform three separate blast
searches [blastn against the non-redundant DNA database, blastx
against the non-redundant protein database and blastn against the EST
database (dbEST)]. The results of these searches are stored in NEMBASE
and can be searched using simple keywords.
Simply enter the word you are interested in into the box marked
'annotation text' and click on the submit button. After a few moments
you will be directed to a page listing the clusters whose BLAST
results match your search keyword (e.g. if you enter 'globin', you
will be given a list of all the clusters in which the word 'globin'
appears in the BLAST output). This list may be ordered either by
relative abundance (the number of sequences in the cluster) or by
BLAST expectation (E) value. In addition, you may specify an E-value
cutoff so that only hits at least that significant, and therefore
likely to be 'real', are listed. The list shows the cluster ID, the number of sequences within
the cluster and the three top blast hits against each of the
databases. By clicking on the cluster ID you will be taken to the
page detailing that cluster.
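As a rough sketch of the keyword search, ordering and E-value cutoff described above, the Python below filters an in-memory list of clusters. The `Cluster` record and its fields are hypothetical, and the cutoff is simplified to apply to each cluster's best hit rather than to the particular hit that matched the keyword; this illustrates the behaviour, not NEMBASE's actual query.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Cluster:
    cluster_id: str
    n_sequences: int      # abundance: number of sequences in the cluster
    blast_text: str       # stored BLAST hit descriptions (all three searches)
    best_evalue: float    # smallest E-value among the stored hits

def keyword_search(clusters: List[Cluster], keyword: str,
                   order_by: str = "abundance",
                   max_evalue: Optional[float] = None) -> List[Cluster]:
    """Case-insensitive keyword match, optional E-value cutoff, then sort."""
    kw = keyword.lower()
    found = [c for c in clusters
             if kw in c.blast_text.lower()
             and (max_evalue is None or c.best_evalue <= max_evalue)]
    if order_by == "abundance":
        found.sort(key=lambda c: (-c.n_sequences, c.cluster_id))
    elif order_by == "evalue":
        found.sort(key=lambda c: (c.best_evalue, c.cluster_id))
    else:
        raise ValueError(f"unknown ordering: {order_by!r}")
    return found

# Invented example data.
clusters = [
    Cluster("BMC00001", 12, "myoglobin-like protein", 1e-40),
    Cluster("BMC00002", 30, "globin", 1e-8),
    Cluster("BMC00003", 5, "actin", 1e-90),
]
print([c.cluster_id for c in keyword_search(clusters, "globin")])
# ['BMC00002', 'BMC00001']
print([c.cluster_id for c in keyword_search(clusters, "globin", "evalue")])
# ['BMC00001', 'BMC00002']
print([c.cluster_id for c in keyword_search(clusters, "globin", max_evalue=1e-10)])
# ['BMC00001']
```

Note that a plain substring match also finds 'globin' inside 'myoglobin', which mirrors the "word appeared in the blast output" behaviour described above.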
C) By sequence similarity. If you are interested in finding the
clusters which most closely match a sequence you are interested in,
you may use the local BLAST facility. Simply paste your sequence
into the large box in the 'search by sequence similarity' section,
select any appropriate options and click on the submit
button. After a few moments you will see a page detailing the BLAST
output. The graphic at the top indicates the relative position and
score of each 'hit' against your sequence. Clicking on the cluster ID
in this graphic will take you to the alignment of that cluster
against your input sequence. Clicking on the cluster ID by the
alignment view will take you to the page detailing that cluster.
D) By stage expression profile. You may be interested in clusters
containing sequences which are expressed only at particular stages or
at particular levels of abundance. This search mode is accessed via a
separate page: follow the link from the Brugia page and you will see a
form in which you can enter a profile and retrieve a list of clusters
that satisfy it. Each box accepts either a plain number, meaning that
exactly that many sequences from that stage must be found in the
cluster, or a number preceded by '>' or '<' to set a lower or upper
limit respectively on the number of sequences. For example, if you
wanted clusters that were relatively highly expressed in microfilariae
but were not found to be expressed in adults, you might enter '>5' in
the MF box and '0' in the 'Total Adults' box. Pressing submit would
then retrieve all clusters containing more than 5 sequences from MF
libraries but no sequences from any adult library. At this step you
have two choices for output:
1: Normal list of clusters (as above)
2: A graphic (PhyloView) showing the relative phylogenetic
distribution of BLAST similarity matches of the clusters against three
chosen datasets.
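The profile syntax can be illustrated with a small Python sketch. It assumes that '>' and '<' are strict comparisons (the example above reads '>5' as "more than 5"; treating '<' the same way is an assumption), that an empty box means no constraint, and that each cluster's sequences have already been counted per stage. The stage names come from the example; the function names are hypothetical.

```python
import re
from typing import Callable, Dict

# A profile term: an optional '>' or '<' followed by a whole number.
_TERM = re.compile(r"\s*([<>]?)\s*(\d+)\s*")

def parse_term(term: str) -> Callable[[int], bool]:
    """Turn '5', '>5' or '<5' into a test on a sequence count."""
    m = _TERM.fullmatch(term)
    if m is None:
        raise ValueError(f"invalid profile term: {term!r}")
    op, n = m.group(1), int(m.group(2))
    if op == ">":
        return lambda count: count > n   # more than n sequences
    if op == "<":
        return lambda count: count < n   # fewer than n sequences
    return lambda count: count == n      # exactly n sequences

def matches_profile(counts: Dict[str, int], profile: Dict[str, str]) -> bool:
    """True if a cluster's per-stage counts satisfy every filled-in box."""
    return all(parse_term(term)(counts.get(stage, 0))
               for stage, term in profile.items()
               if term.strip())          # empty boxes impose no constraint

profile = {"MF": ">5", "Total Adults": "0"}
print(matches_profile({"MF": 8, "Total Adults": 0}, profile))  # True
print(matches_profile({"MF": 5, "Total Adults": 0}, profile))  # False
print(matches_profile({"MF": 9, "Total Adults": 2}, profile))  # False
```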
To use the PhyloView option, simply select the appropriate button and
choose the three datasets from the lists before clicking the submit
button. After a few moments you will see a graphic appear with a list
of clusters beneath it. The graphic is an interactive Java
application which allows you to zoom in and around the triangle
representing phylogenetic phase space. Within the triangle are
coloured squares. Each represents a unique cluster. The relative
position of the square to the three vertices represents the realtive
phylogenetic distribution of blast similarity matches of that cluster
to the three organisms chosen on the previous page. The colour of the
square indicates the highest Blast score obtained against the three
datasets. Clicking on each square reveals its cluster number. By
holding down the <ctrl> key while clicking on a square will launch a
new web window detailing that cluster. [Please note that the full
functionality of the PhyloView is only available in either Netscape
4.7 and above or Internet Explorer 5 and above]
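The exact mapping PhyloView uses is not described here. One natural reading, sketched below purely as an assumption, is a barycentric (ternary-plot) placement: each vertex of an equilateral triangle stands for one dataset, and a cluster's point is the average of the vertices weighted by its best BLAST score against each dataset, with the colour driven by the largest of the three scores. The vertex coordinates, the use of raw scores as weights and the name `triangle_position` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# Vertices of an equilateral triangle with unit sides, one per dataset.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))

def triangle_position(scores: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Barycentric position for three non-negative BLAST scores.

    Returns None if the cluster has no hits against any of the datasets.
    """
    if len(scores) != 3 or any(s < 0 for s in scores):
        raise ValueError("expected three non-negative scores")
    total = sum(scores)
    if total == 0:
        return None
    x = sum(s * vx for s, (vx, _) in zip(scores, VERTICES)) / total
    y = sum(s * vy for s, (_, vy) in zip(scores, VERTICES)) / total
    return x, y

scores = (250.0, 250.0, 0.0)        # equal similarity to datasets 1 and 2
print(triangle_position(scores))    # (0.5, 0.0): midway along that edge
print(max(scores))                  # 250.0: the value the colour would encode
```

Under this reading, a cluster sitting on a vertex matches only that dataset, while one near the centre is about equally similar to all three.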
What the Cluster View Shows:
The detailed view of each cluster contains the following information:
TOP LEFT
A brief summary of the cluster indicating its index number, the
number and types of sequences belonging to the cluster, the number of
contigs predicted for the cluster by the assembly program (different
contigs represent either alternative splice forms or different
alleles) and the libraries represented in the cluster. Sequence types
are colour-coded: Blue = EST, Red = cDNA, Magenta = genomic DNA,
Green = GSS.
TOP RIGHT
To the right of the summary table is the precomputed BLAST
information for the cluster. Moving the mouse over the appropriate
button displays the corresponding hits in the text window; clicking
on a button opens a new window showing the BLAST output. Note that
E-values smaller than e^-99 are shown as 0.
The text "no significant hits" indicates that no hits with an E-value
below e^-5 were obtained, while "no hits" indicates that no BLAST hits
at all were found for this cluster.
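Reading e^-99 and e^-5 as the BLAST-style thresholds 1e-99 and 1e-5, and taking the first rule to mean that very small E-values (below 1e-99) are displayed as 0, the display rules can be summarised in a short Python sketch. The function name and the one-decimal formatting are assumptions; the sketch simply turns the E-values of one cluster's hits into the labels described.

```python
from typing import List

SIGNIFICANT = 1e-5     # hits at or above this E-value are not significant
SHOWN_AS_ZERO = 1e-99  # E-values below this are displayed as 0

def blast_labels(evalues: List[float]) -> List[str]:
    """Display labels for the E-values of one cluster's BLAST hits."""
    if not evalues:
        return ["no hits"]
    significant = sorted(e for e in evalues if e < SIGNIFICANT)
    if not significant:
        return ["no significant hits"]
    return ["0" if e < SHOWN_AS_ZERO else f"{e:.1e}" for e in significant]

print(blast_labels([]))                  # ['no hits']
print(blast_labels([0.01, 2.0]))         # ['no significant hits']
print(blast_labels([3.2e-12, 1e-120]))   # ['0', '3.2e-12']
```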
MIDDLE
Below the header information are the data for each contig: the
contig number, the length of its sequence, the number of ESTs that
make up the contig and a list of these ESTs (coloured by type, as
above). Click on an EST name to retrieve its GenBank entry.
BOTTOM
Under the contig information is a simple graphic indicating the
position of the sequences relative to the contig. The sequences are
coloured according to quality/alignment information. Gold indicates
sequence of high quality; purple indicates lower quality sequence
used in creating the consensus sequence. GenBank entries can again be
retrieved by clicking on each sequence within the graphic. Finally,
the cluster consensus sequence is given below the graphic. The BLAST
button below it takes you to our in-house BLAST server with the
consensus sequence already pasted in.
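The consensus shown at the bottom of the page comes from the assembly step, whose method is not described here. For readers unfamiliar with the idea, a minimal column-wise majority-vote consensus over reads already aligned to a contig might look like the Python below. The input conventions (a space for positions a read does not cover, '-' for alignment gaps, 'N' for uncovered columns) and the name `majority_consensus` are assumptions, and unlike the real view this sketch ignores the gold/purple quality distinction.

```python
from collections import Counter
from typing import List

def majority_consensus(aligned_reads: List[str]) -> str:
    """Column-wise majority vote over reads in a shared alignment frame.

    ' ' marks positions a read does not cover; '-' is an alignment gap.
    """
    if not aligned_reads:
        return ""
    width = max(len(r) for r in aligned_reads)
    consensus = []
    for col in range(width):
        counts = Counter(r[col] for r in aligned_reads
                         if col < len(r) and r[col] != " ")
        if not counts:
            consensus.append("N")   # no read covers this column
            continue
        # Highest count wins; ties go to the alphabetically first symbol.
        base = max(sorted(counts), key=counts.__getitem__)
        if base != "-":             # a winning gap drops the column
            consensus.append(base)
    return "".join(consensus)

print(majority_consensus(["ACGTAC", "ACGTTC", " CGTAC"]))  # ACGTAC
print(majority_consensus(["AC-GT", "ACAGT", "AC-GT"]))     # ACGT
```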
Analyses using this database should reference Parkinson, J., C.
Whitton, D. Guiliano, J. Daub and M.L. Blaxter. 2001. 200,000
nematode ESTs on the net. Trends in Parasitology. 17: 394-396.
If you have any problems or questions, do not hesitate to contact John
Parkinson at john.parkinson at ed.ac.uk.
Mark Blaxter, David Guiliano and John Parkinson
Edinburgh 21/Aug/2001
--
_________________________________________________
Dr. Mark Blaxter email Mark.Blaxter at ed.ac.uk
Reader in Nematode Genetics
Institute of Cell, Animal and Population Biology
Ashworth laboratories, Room 311
King's Buildings, University of Edinburgh,
West Mains Road, EDINBURGH EH9 3JT, UK
phone: (+44) 131 650 6760 **NEW** Fax :...650 7489
see http://www.nematodes.org
~ may all beings be happy ~