Database of repetitive sequences?

Tue Mar 24 11:21:00 EST 1992

<        I need a collection of human repetitive sequences that I will use
<to prescreen sequences before we send them off for Fasta or Blast searches.
<If anyone has or knows of a database of human repetitive sequences that I
<could get, preferably by FTP, I would be most grateful to find out. I
<checked in LIMB but only saw a reference to the Alu database. Even a list
<of GenBank locus names or accession numbers would do.
<        Thanks in advance,
<Steve Clark
<clark at salk-sc2.sdsc.edu  (Internet)
<clark at salk               (Bitnet)

The following additions to SWISS-PROT will probably be useful to you!
Here is an excerpt from the release notes for release 21:

   2.7  Alu-derived warning entries

   Following the  advice and  in collaboration with Jean-Michel Claverie of
   the National  Center for  Biotechnology  Information  (NCBI,  Washington
   D.C.) we  have added  to SWISS-PROT Alu-derived "warning" entries. These
   entries are  provided in  order to  avoid  the  further  'pollution'  of
   protein sequence databases with Alu-derived amino acid sequences.

   Alu repetitive  sequences are  interspersed in human and primate genomes
   with an  average spacing  of 3 Kb. Some of them are actively transcribed
   by pol  III. Normal  transcripts may contain Alu-derived sequences in 5'
   or 3' untranslated regions. However, cDNA libraries also contain partial
   and/or  rearranged  cDNAs  ligated  with  Alu-derived  sequence  in  any
   orientation. This  has been  overlooked on  several occasions,  with the
   consequence  of   erroneous  Alu-derived   amino  acid  sequences  being

   Various analyses  indicate that  Alu repeats fall into six classes (A to
   F). Therefore six "warning" entries have been constructed from conceptual
   translations, in all six reading frames, of one random member of each of these
   classes of Alu repeats. Any significant similarity of a putative protein
   sequence with  an Alu-translated entry must be taken as a warning that
   part of an Alu repeat may  have been artifactually included in the coding
   nucleotide sequence.

   These sequences have been assigned accession numbers P23959 (ALUA_HUMAN)
   to P23964 (ALUF_HUMAN).
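
For anyone who wants to see what "conceptual translations in all six
frames" means in practice, here is a minimal Python sketch. It is only an
illustration, not the procedure used to build the SWISS-PROT entries: it
assumes the standard genetic code, and the function names
(reverse_complement, translate, six_frame_translations) and the frame labels
(+1 to +3, -1 to -3) are my own choices.

```python
from itertools import product

# Standard genetic code (an assumption of this sketch). Amino acids are
# listed with codon positions cycling through T, C, A, G, last base fastest.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognised codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy input, not a real Alu sequence.
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are written as '*' and left in place, so each frame comes back
as one string. A putative protein could then be compared against the six
strings derived from an Alu repeat with whatever search tool you prefer.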

More information about the Bio-soft mailing list