The Informatics Group at Oak Ridge National Laboratory announces the
release of a new version of the GRAIL system. The new release includes a
new version of the GRAIL 2 exon prediction system, a new version of the GAP
III gene assembly program, and a new version of the GRAIL 1a coding region
prediction system. The new version also supports tools to find Pol II
promoters, PolyA sites, CpG islands, and repetitive DNA sequences.
GRAIL 2 is designed to predict coding exons in genomic sequences. It
evaluates discrete exon candidates using coding information from both
within and outside of the candidate region, together with other genomic
context information such as splice junctions, translation starts, and
local G/C composition.
GAP III is a gene model construction program. It builds
reading-frame-consistent gene models from the output of GRAIL 2. The
current version of GAP III prints out only one "best" gene model.
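As a rough illustration of what "reading-frame consistent" can mean, the
sketch below splices a set of exons together and checks that the result
is a whole number of codons with no stop codon before the last one. This
is only one common reading of the term and is not GAP III's algorithm.
The function names, the 0-based end-exclusive exon coordinates, and the
forward-strand-only handling are all assumptions made for the example.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def spliced_cds(seq: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate exon intervals (0-based, end-exclusive) in the given order."""
    return "".join(seq[start:end] for start, end in exons).upper()


def is_frame_consistent(seq: str, exons: List[Tuple[int, int]]) -> bool:
    """Hypothetical check: spliced length is a multiple of 3 and no
    in-frame stop codon occurs before the final codon."""
    cds = spliced_cds(seq, exons)
    if not cds or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # A stop codon is tolerated only as the last codon of the model.
    return not any(codon in STOP_CODONS for codon in codons[:-1])
```

For example, two exons of 6 bases each separated by a 6-base intron pass
the check, while shortening the first exon by one base shifts the frame
and fails it.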
GRAIL 1a evaluates coding potential along a whole DNA sequence. It uses
a fixed-size window (100 bases long) to evaluate the coding signal at each
position. It is designed mainly to predict coding regions in situations
where genomic context information is lacking.
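The fixed-size window scan can be pictured with a minimal sketch like the
one below. Only the window length of 100 bases comes from the text above.
The scoring function is a stand-in (plain G/C fraction), not GRAIL 1a's
actual coding measure, and the names window_scores and gc_fraction are
hypothetical.

```python
from typing import Callable, List

WINDOW = 100  # window length in bases, as stated for GRAIL 1a


def gc_fraction(window: str) -> float:
    """Placeholder score: fraction of G/C bases. NOT GRAIL's coding measure."""
    return sum(base in "GC" for base in window) / len(window)


def window_scores(seq: str,
                  score: Callable[[str], float] = gc_fraction,
                  size: int = WINDOW) -> List[float]:
    """Score every full-length window; entry i covers seq[i:i + size]."""
    seq = seq.upper()
    if len(seq) < size:
        return []  # no full window fits, so nothing can be scored
    return [score(seq[i:i + size]) for i in range(len(seq) - size + 1)]
```

A 150-base sequence yields 51 window scores under this scheme, and a
sequence shorter than the window yields none, which hints at why a
fixed-window method struggles with features shorter than the window.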
GRAIL 2, GRAIL 1a and the original GRAIL system (GRAIL 1) can be accessed
through an email server at ORNL. To access them, send email to grail at ornl.gov.
To get more information about how to use the GRAIL system, send email to
grail at ornl.gov with HELP in the subject line or as the first line of the
message.
GRAIL 2, GAP III, GRAIL 1a and the original GRAIL system (GRAIL 1) can
also be accessed through a client/server program called XGRAIL. The
client code is open to the public and can be obtained via anonymous ftp
from arthur.epm.ornl.gov, in the directory pub/xgrail/sun/ver1.2.
The following summarizes the performance of the three systems on 137
independent human and mouse DNA test sequences (there is NO overlap
between this set and the training set of the systems), consisting of a
total of 954 exons containing 161642 coding bases:
           T.P. (#exons)   T.P. (#bases)     F.P. (#exons)   F.P. (#bases)
GRAIL 2    867 (91.0%)     146048 (90.4%)    82 (8.6%)       14781 (9.2%)
GAP III    859 (90.1%)     147388 (91.2%)    33 (3.7%)       9466 (6.0%)
GRAIL 1a   787 (82.5%)     132887 (82.2%)    100 (11.2%)     25681 (16.2%)
where T.P. and F.P. represent the true and false positives, respectively.
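The percentages can be recomputed from the raw counts with a short
sketch such as the following. The denominators are an assumption
inferred from the figures, since the announcement does not state them:
true positives are taken relative to the totals in the test set (954
exons, 161642 coding bases), and false positives relative to all
predictions (T.P. + F.P.). The names percent and summarize are
hypothetical.

```python
TOTAL_EXONS = 954      # exons in the test set
TOTAL_BASES = 161642   # coding bases in the test set

# (T.P. exons, T.P. bases, F.P. exons, F.P. bases), copied from the table.
RESULTS = {
    "GRAIL 2": (867, 146048, 82, 14781),
    "GAP III": (859, 147388, 33, 9466),
    "GRAIL 1a": (787, 132887, 100, 25681),
}


def percent(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def summarize(tp_exons: int, tp_bases: int,
              fp_exons: int, fp_bases: int) -> dict:
    """Recompute the four percentages under the assumed denominators."""
    return {
        # Assumed: T.P. is measured against the true totals.
        "tp_exons_pct": percent(tp_exons, TOTAL_EXONS),
        "tp_bases_pct": percent(tp_bases, TOTAL_BASES),
        # Assumed: F.P. is measured against all predictions made.
        "fp_exons_pct": percent(fp_exons, tp_exons + fp_exons),
        "fp_bases_pct": percent(fp_bases, tp_bases + fp_bases),
    }


if __name__ == "__main__":
    for system, counts in RESULTS.items():
        print(system, summarize(*counts))
```

Under these assumptions the recomputed values agree with the listed
percentages to within about 0.1 percentage point, with the small
differences presumably due to rounding.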
Since GRAIL 1a uses 100-base-long windows, we should not expect it to
find most of the exons shorter than 100 bases. We therefore also list the
performance statistics of GRAIL 1a on exons at least 100 bases long. In
the same test sequences there are 711 such exons, which consist of a
total of 146724 coding bases.
           T.P. (#exons)   T.P. (#bases)
GRAIL 1a   675 (95.0%)     125586 (85.6%)
-----------------------------------------------------------------------------
The list of test sequences:
hsckbg hsg6pdgen hsgstpig hsmpog hsupa huma1gly2 humaccybb humacroa humadred2
humagal humak1 humalifa humalppd humalred2 humant2x humapexn humapoai1
humapocia humapocib humaprta humatp1a2 humatpgg humazcdi humbhsd humbmyh7
humbtfe humcad humcapll humcavii5 humcd19a humcel humcnp humcola humcox5b
humcp21oh humcpgisl humcspa humcspb humctla1a humcycaa humcyp2d6 humcyp8p
humdef5a humdes humdkerb humdmkin humedhb17 humfabp humfkbpx humfos humg0s19a
humgamgloa humgapdhg humgcb1 humgck humglpex humglut4b humgos24b humgrp78
humhap humhepgfb humhkatpc humhll4g humhmg2a humhox13g humhox4a humhpars1
humhsd3ba humhskpqz7 humibp3 humifnrf1a humigfbp1a humirbpg humitilc08 humkal
humker18 humkertra humkrt1x hummh6 hummhba17w hummhchlab hummhcp42 hummhcp51
hummhcw1b hummhea hummk hummkxx hummyc3l humnakatp1 humnucleo humodc1a humop18a
humpci humpdhal humpdhbet humpem humpp14b humpreelas humprf1a humprok humregb
humrps17a humsoda humspbaa humsproz humtcrbra humtnc2 humtrhyal humtroc
humtrpy1b humubilp mus21 musaca musagp musalifa musalpbcry musantp91a musap5a
muscrknb muscyp14x muscytcb1 muscytcc musfisp12a musgusb musint1a musker19
musldha musmhka1a musmycna musmyogen musnfil musodcc musops muss100b musthygp
mustis105 rataccyb