Release of New Version of GRAIL

Ying Xu ying at mars.epm.ornl.gov
Thu Jul 28 12:37:16 EST 1994

The Informatics Group at Oak Ridge National Laboratory announces the
release of a new version of the GRAIL system. The release includes new
versions of the GRAIL 2 exon prediction system, the GAP III gene
assembly program, and the GRAIL 1a coding region prediction system. It
also provides tools for finding Pol II promoters, PolyA sites, CpG
islands, and repetitive DNA sequences.

GRAIL 2 is designed to predict coding exons in genomic sequences. It
evaluates discrete exon candidates using coding information from both
inside and outside the candidate region, together with other genomic
context information such as splice junctions, translation starts, and
local G/C composition.

GAP III is a gene model construction program. It builds reading-frame
consistent gene models from the output of GRAIL 2. The current version
of GAP III prints out only one "best" gene model.

GRAIL 1a evaluates coding potential along a whole DNA sequence. It uses
a fixed-size window (100 bases long) to evaluate the coding signal at
each position. It is designed mainly to predict coding regions in
situations where genomic context information is lacking.
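
The fixed-window idea can be sketched as follows. This is only an
illustration, not GRAIL 1a's implementation: the scoring function
(gc_fraction) is a hypothetical placeholder, and centring the window on
each position and truncating it at the sequence ends are assumptions of
this sketch. Only the window length of 100 bases comes from the
description above.

```python
from typing import Callable, List

WINDOW = 100  # window length in bases, as stated in the announcement


def gc_fraction(window: str) -> float:
    """Placeholder score: fraction of G/C bases (NOT GRAIL's coding measure)."""
    if not window:
        return 0.0
    return sum(base in "GC" for base in window.upper()) / len(window)


def window_scores(
    seq: str,
    score: Callable[[str], float] = gc_fraction,
    size: int = WINDOW,
) -> List[float]:
    """Return one score per position, computed over a window around it.

    Assumption of this sketch: the window is centred on the position and
    is truncated where it would run past either end of the sequence.
    """
    half = size // 2
    scores = []
    for i in range(len(seq)):
        start = max(0, i - half)
        end = min(len(seq), i + half)
        scores.append(score(seq[start:end]))
    return scores
```

Any per-window measure can be passed in place of gc_fraction, for
example window_scores(seq, score=my_measure), where my_measure is a
function of your own that maps a window string to a number.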

GRAIL 2, GRAIL 1a and the original GRAIL system (GRAIL 1) can be accessed
through an email server at ORNL. To access them, send email to
grail at ornl.gov. For more information about how to use the GRAIL system,
send email to grail at ornl.gov with HELP on the subject line or as the
first line of the message.
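
A help request of the kind described above could be composed as in the
following sketch. It only builds the message and does not send it. The
address is written here as grail@ornl.gov on the assumption that
"grail at ornl.gov" is that address in obfuscated form; the function
name and the sender address in the usage note are placeholders.

```python
from email.message import EmailMessage


def build_help_request(sender: str) -> EmailMessage:
    """Build a HELP request for the GRAIL email server (not sent here)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "grail@ornl.gov"  # assumed form of "grail at ornl.gov"
    msg["Subject"] = "HELP"       # HELP on the subject line ...
    msg.set_content("HELP\n")     # ... and also as the first line of the body
    return msg
```

For example, print(build_help_request("you@example.org")) shows the
message as it would be handed to a mail program.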

GRAIL 2, GAP III, GRAIL 1a and the original GRAIL system (GRAIL 1) can
also be accessed through a client/server program called XGRAIL. The
client code is open to the public and can be obtained via anonymous ftp
from arthur.epm.ornl.gov, in directory pub/xgrail/sun/ver1.2.

The following table summarizes the performance of the three systems on
137 independent human and mouse DNA test sequences (there is NO overlap
between this set and the training set of the systems), consisting of a
total of 954 exons containing 161642 coding bases:

            T.P. (#exons)   T.P.(#bases)   F.P.(#exons)   F.P.(#bases) 

GRAIL 2      867 (91.0%)   146048 (90.4%)    82 (8.6%)    14781 (9.2%)
GAP III      859 (90.1%)   147388 (91.2%)    33 (3.7%)     9466 (6.0%)
GRAIL 1a     787 (82.5%)   132887 (82.2%)   100 (11.2%)   25681 (16.2%)

where T.P. and F.P. represent the true and false positives, respectively. 
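
The announcement does not say how the percentages are computed. The
sketch below rests on an inference from the printed numbers: T.P.
percentages are taken relative to the test-set totals (954 exons, 161642
coding bases), and F.P. percentages relative to all predictions (true
plus false positives). Under that assumption the computed values agree
with the table to within about 0.2 of a percentage point. The function
and variable names are illustrative only.

```python
TOTAL_EXONS = 954
TOTAL_BASES = 161642

# (T.P. exons, T.P. bases, F.P. exons, F.P. bases), copied from the table
RESULTS = {
    "GRAIL 2": (867, 146048, 82, 14781),
    "GAP III": (859, 147388, 33, 9466),
    "GRAIL 1a": (787, 132887, 100, 25681),
}


def tp_percent(tp: int, total: int) -> float:
    """True positives as a percentage of what the test set contains."""
    return 100.0 * tp / total


def fp_percent(tp: int, fp: int) -> float:
    """False positives as a percentage of all predictions (assumed)."""
    return 100.0 * fp / (tp + fp)


def summarize(name: str) -> tuple:
    """Return (T.P. % exons, T.P. % bases, F.P. % exons, F.P. % bases)."""
    tp_ex, tp_b, fp_ex, fp_b = RESULTS[name]
    return (
        tp_percent(tp_ex, TOTAL_EXONS),
        tp_percent(tp_b, TOTAL_BASES),
        fp_percent(tp_ex, fp_ex),
        fp_percent(tp_b, fp_b),
    )


if __name__ == "__main__":
    for system in RESULTS:
        print(system, ["%.1f" % p for p in summarize(system)])
```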

Since GRAIL 1a uses windows 100 bases long, we should not expect it to
find most of the exons shorter than 100 bases. We therefore also list
the performance statistics of GRAIL 1a on exons at least 100 bases long.
The same test sequences contain 711 such exons, which consist of a total
of 146724 coding bases.

            T.P. (#exons)                  T.P.(#bases)   

GRAIL 1a     675 (95.0%)                  125586 (85.6%)          


The list of test sequences:

hsckbg hsg6pdgen hsgstpig hsmpog hsupa huma1gly2 humaccybb humacroa humadred2
humagal humak1 humalifa humalppd humalred2 humant2x humapexn humapoai1
humapocia humapocib humaprta humatp1a2 humatpgg humazcdi humbhsd humbmyh7
humbtfe humcad humcapll humcavii5 humcd19a humcel humcnp humcola humcox5b
humcp21oh humcpgisl humcspa humcspb humctla1a humcycaa humcyp2d6 humcyp8p
humdef5a humdes humdkerb humdmkin humedhb17 humfabp humfkbpx humfos humg0s19a
humgamgloa humgapdhg humgcb1 humgck humglpex humglut4b humgos24b humgrp78
humhap humhepgfb humhkatpc humhll4g humhmg2a humhox13g humhox4a humhpars1
humhsd3ba humhskpqz7 humibp3 humifnrf1a humigfbp1a humirbpg humitilc08 humkal
humker18 humkertra humkrt1x hummh6 hummhba17w hummhchlab hummhcp42 hummhcp51
hummhcw1b hummhea hummk hummkxx hummyc3l humnakatp1 humnucleo humodc1a humop18a
humpci humpdhal humpdhbet humpem humpp14b humpreelas humprf1a humprok humregb
humrps17a humsoda humspbaa humsproz humtcrbra humtnc2 humtrhyal humtroc
humtrpy1b humubilp mus21 musaca musagp musalifa musalpbcry musantp91a musap5a
muscrknb muscyp14x muscytcb1 muscytcc musfisp12a musgusb musint1a musker19
musldha musmhka1a musmycna musmyogen musnfil musodcc musops muss100b musthygp
mustis105 rataccyb
