repeat search

Keith James kdj at fes1.sanger.ac.uk
Wed Apr 19 08:46:37 EST 2000

>>>>> "Vladislav" == Vladislav Grebenyuk <grebenyu at mail.Uni-Mainz.de> writes:

    Vladislav> Many thanks to Michael Mitchell and Fred. The program
    Vladislav> REPRO (Heringa and Argos, 1993) is able to recognize
    Vladislav> distant repeats in a single query sequence.

    Vladislav> There is also some commercially available software
    Vladislav> capable of searching for repeats in a SINGLE
    Vladislav> sequence, for example GeneQuest from the DNA Star
    Vladislav> package. But I have a lot of sequences, and I have to
    Vladislav> find repeats, not mask them. To find repeats (like
    Vladislav> SINE, LINE) I have to compare all my sequences to each
    Vladislav> other. The old, famous PC-Gene is capable of database
    Vladislav> creation and homology search. That could also be done
    Vladislav> in FASTA. Then it is going to be N-1 runs, i.e. 499 runs
    Vladislav> of homology search for my 500 sequences. And what shall
    Vladislav> I do then? How can I handle this data? It is also just
    Vladislav> a little hard to perform 499 runs of homology search
    Vladislav> manually.

Again, assuming DNA repeats...

If you are looking for new (previously uncharacterised) repeats you
could use the miropeats scripts by Jeremy Parsons at the EBI. The
package contains a C-shell script which will display a Postscript
diagram of the repeats within or between different sequences. Another
script will print a report of the repeats as text. The package requires
the icatools programs. Both are available at:


You will need a C compiler and access to a Unix-style C-shell (and a
Postscript viewer like Ghostview). If you have access to a Linux box
you have all you need.

I've had good results with them. I would not try to look at all 500
sequences in the Postscript diagram at one time - it will not be

I don't think it will help you (having 500 sequences) but you could
take a look at Reputer:


It might still be useful if your individual sequences are very long.

As you may have a lot of runs to do, you should probably write a shell
or Perl script to perform the searches (and possibly to filter the
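
As a rough illustration of such a driver script, here is a minimal
sketch in Python (chosen only for this example; shell or Perl would do
just as well). It enumerates every pair of sequence files once and
builds one command line per pair. Note that a full all-against-all
comparison of N sequences is N*(N-1)/2 pairs, so 124750 for 500
sequences. The program name "my_search", the {query}/{subject}
template placeholders and the assumption that the program writes its
report to standard output are all hypothetical; substitute whatever
your search program actually expects.

```python
import itertools
import shlex
import subprocess
from pathlib import Path


def pairwise_jobs(fasta_files):
    """Return every unordered pair of input files exactly once."""
    return list(itertools.combinations(sorted(fasta_files), 2))


def build_command(template, query, subject):
    """Fill a command template for one pair of files.

    The template is a hypothetical convention for this sketch: a
    command line containing {query} and {subject} placeholders.
    Splitting before formatting keeps file names with spaces intact.
    """
    return [token.format(query=str(query), subject=str(subject))
            for token in shlex.split(template)]


def run_all(fasta_files, template, out_dir, dry_run=True):
    """Build (and optionally run) one search per pair of files.

    Returns a list of (argv, output_path) tuples. With dry_run=True
    nothing is executed, so the job list can be inspected first.
    """
    out_dir = Path(out_dir)
    jobs = []
    for query, subject in pairwise_jobs(fasta_files):
        argv = build_command(template, query, subject)
        out = out_dir / f"{Path(query).stem}_vs_{Path(subject).stem}.out"
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # Assumes the search program reports on standard output.
            with open(out, "w") as handle:
                subprocess.run(argv, stdout=handle, check=True)
        jobs.append((argv, out))
    return jobs
```

For example, run_all(sorted(Path("seqs").glob("*.fa")), "my_search
{query} {subject}", "results") lists the jobs without running anything;
pass dry_run=False once the commands look right. Writing one output
file per pair keeps the results easy to filter afterwards.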



Keith James  --  kdj at sanger.ac.uk  --  http://www.sanger.ac.uk/Users/kdj
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA
