Regulatory regions analysis: NSITE

Victor Solovyev solovyev at sanger.ac.uk
Tue Aug 17 16:15:00 EST 1999

We have installed the NSITE program for analysing regulatory regions of
genomes. It is available at http://genomic.sanger.ac.uk/ on the WWW server
of our Computational Genomic Group.

NSITE Program Description

NSITE - Search for consensus patterns with statistical estimation

by Ilham Shahmuradov and Victor Solovyev

Analysis of nucleotide sequences is available through the WWW.

NSITE analyses regulatory regions and the composition of their functional
motifs. The program was developed under UNIX and adapted to work with
Transfac-type sites.

Method description:
     The method is based on statistical estimation of the expected number
     of occurrences of a nucleotide consensus pattern in a given sequence
     [1-2]. It uses an NSITE-formatted data file, which can include any set
     of consensus sequences of functional motifs. In the current version
     this file consists of the public release of Transfac sequences
     (3.4, 1998), composite elements [3] and a set of additional functional
     motifs.

     If a pattern is found whose expected number of occurrences is
     significantly less than 1, the match is unlikely to have arisen by
     chance, and it can be supposed that the analysed sequence
     possesses the pattern's function.
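
To make the idea of an "expected number" concrete, here is a minimal Python sketch. It is not the NSITE implementation: it assumes that bases occur independently with the frequencies of the analysed sequence, that the consensus is written in IUPAC codes, and that only one strand is scanned. The function names are hypothetical and chosen for illustration only.

```python
import math

# IUPAC nucleotide codes and the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(consensus, freqs, max_mismatch=0):
    """Probability that a random window has at most max_mismatch mismatches.

    Assumes independent bases drawn with the given frequencies
    (an illustrative model, not necessarily the one NSITE uses).
    """
    # dist[k] = probability of exactly k mismatches over the positions so far
    dist = [1.0]
    for symbol in consensus.upper():
        p_match = sum(freqs[base] for base in IUPAC[symbol])
        new_dist = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new_dist[k] += prob * p_match            # this position matches
            new_dist[k + 1] += prob * (1 - p_match)  # this position mismatches
        dist = new_dist
    return sum(dist[: max_mismatch + 1])


def expected_number(consensus, freqs, seq_length, max_mismatch=0):
    """Expected count of matching windows on one strand of the sequence."""
    windows = max(seq_length - len(consensus) + 1, 0)
    return windows * match_probability(consensus, freqs, max_mismatch)


def prob_at_least_one(expected):
    """Poisson approximation to the chance of one or more random matches."""
    return 1.0 - math.exp(-expected)


# Base frequencies as printed in the output example; the length of 2200 is
# an assumption read off the -2200:-1 coordinates in the sequence header.
ace1_freqs = {"A": 0.31, "G": 0.16, "T": 0.35, "C": 0.18}
e = expected_number("CTTGCGTCA", ace1_freqs, 2200)
print(f"expected number: {e:.4f}, P(>=1 by chance): {prob_at_least_one(e):.4f}")
```

Under these assumptions the exact 9-mer CTTGCGTCA has an expected number of about 0.004 in a 2200-base sequence, so a single observed occurrence would be hard to explain by chance alone. Whether NSITE computes its own figures in this way is not stated here.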

     For each pattern found, the NSITE output lists the pattern, its
     position in the sequence, its accession number and ID, the description
     of the motif and, if present in the original database, the name of
     the binding factor.

     Acknowledgments: We thank Igor Rogozin, who took part in developing
     some applications of this method for nucleotide consensus searching
     on the IBM PC [4].

     Output example:

      Program  *** N S I T E *** Shahmuradov, Solovyev

      File with SITEs:     nsite.dat
      File with SEQUENCEs: ace1.seq
      Search PARAMETRS: Expected. Number -  0.0100000
      Siginicance Level -  0.9500000  Print Status - Yes

      Note: AC - Accession no. in TRANSFAC   or  NSITE DB
            DE - Description (gene or gene product)
            RE - Gene region (e.g. promoter,enhancer or unknown)
            BF - Binding factor(s)
            OS - Organism species

     > ace-1 /acetylcholinesterase 1 (ACHE)/* Chr. 10*/C.elegans/-2200:-1/
     Frequencies:  A -  0.31   G -  0.16   T -  0.35   C -  0.18 ... Length =

              10        20        30        40        50        60
          2110      2120      2130      2140
         25. [  3] T: AC: R00037  / DE: beta-actin
      RE: unknown                            / OS: human, Homo sapiens
      BF:  SRF ..

     ---------- Sites in  2nd chain ----------
      Max mismatch :  2
      Exp.Number:    0.006 Conf.Interval:   0 Found:   1
      begin: 1704 end: 1695 mismatch:   0 exp.num.:   0.006, site:CCTTTTATGG
         74. [  1] T: AC: R00103  / DE: AMV (avian myeloblastosis virus)
      RE: unknown                            / OS: AMV, avian myeloblastosis
      BF:  C/EBPalpha ..

     ---------- Sites in  1st chain ----------
      Max mismatch :  0
      Exp.Number:    0.004 Conf.Interval:   0 Found:   1
      begin: 1920 end: 1928 mismatch:   0 exp.num.:   0.004, site:CTTGCGTCA
        103. [  1] T: AC: R00140  / DE: apoAII (apolipoprotein AII)
      RE: unknown                            / OS: human, Homo sapiens
      BF:  Tf-LF1 ..  NF-BA1 ..

              10        20
     ---------- Sites in  2nd chain ----------
      Max mismatch :  4
      Exp.Number:    0.001 Conf.Interval:   0 Found:   1
      begin:  897 end:  879 mismatch:   4 exp.num.:   0.001,
        287. [  1] T: AC: R00381  / DE: EGF receptor
      RE: unknown                            / OS: human, Homo sapiens
      BF:  Sp1 ..

              10        20
     ---------- Sites in  1st chain ----------
      Max mismatch :  4
      Exp.Number:    0.004 Conf.Interval:   0 Found:   1
      begin:   79 end:   94 mismatch:   4 exp.num.:   0.004,
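
For readers who want to post-process such output, the "begin: ... end: ..." hit lines can be picked out with a regular expression. The following Python sketch is based only on the lines shown in the example above; the field layout of real NSITE output may differ, and `parse_hit` is a hypothetical helper name. The strand is inferred from the observation that hits reported for the 2nd chain in the example have begin greater than end, which is an assumption rather than documented behaviour.

```python
import re

# Pattern for hit lines such as:
#   begin: 1704 end: 1695 mismatch:   0 exp.num.:   0.006, site:CCTTTTATGG
# The trailing "site:" field is optional because some lines in the example
# end after the comma.
HIT_RE = re.compile(
    r"begin:\s*(\d+)\s+end:\s*(\d+)\s+mismatch:\s*(\d+)\s+"
    r"exp\.num\.:\s*([0-9.]+),(?:\s*site:\s*(\S+))?"
)


def parse_hit(line):
    """Return a dict describing one hit line, or None if the line is not a hit."""
    m = HIT_RE.search(line)
    if m is None:
        return None
    begin, end = int(m.group(1)), int(m.group(2))
    return {
        "begin": begin,
        "end": end,
        "mismatch": int(m.group(3)),
        "expected": float(m.group(4)),
        "site": m.group(5),  # None when the site string is absent
        # Assumption: begin > end marks a hit on the 2nd chain.
        "strand": 2 if begin > end else 1,
    }


example = "begin: 1704 end: 1695 mismatch:   0 exp.num.:   0.006, site:CCTTTTATGG"
print(parse_hit(example))
```

Lines that do not match, such as the "Max mismatch" or "BF:" lines, return None, so the helper can be applied to every line of an output file and the None results filtered out.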


Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk  http://genomic.sanger.ac.uk
Phone: 44-1223-494799  FAX:   44-1223-494919

