<Hi,
<
< I need a collection of human repetitive sequences that I will use
<to prescreen sequences before we send them off for Fasta or Blast searches.
<If anyone has or knows of a database of human repetitive sequences that I
<could get, preferably by FTP, I would be most grateful to find out. I
<checked in LIMB but only saw a reference to the Alu database. Even a list
<of GenBank locus names or accession numbers would do.
<
< Thanks in advance,
<
<
<Steve Clark
<
<clark at salk-sc2.sdsc.edu (Internet)
<clark at salk (Bitnet)

The following additions to SWISS-PROT will probably be useful to you!
Here is an excerpt from the release notes of release 21:

2.7 Alu-derived warning entries

Following the advice of, and in collaboration with, Jean-Michel Claverie
of the National Center for Biotechnology Information (NCBI, Washington
D.C.), we have added Alu-derived "warning" entries to SWISS-PROT. These
entries are provided to avoid further 'pollution' of protein sequence
databases with Alu-derived amino acid sequences.

Alu repetitive sequences are interspersed in human and primate genomes
with an average spacing of 3 kb. Some of them are actively transcribed
by RNA polymerase III (pol III). Normal transcripts may contain
Alu-derived sequences in their 5' or 3' untranslated regions. However,
cDNA libraries also contain partial and/or rearranged cDNAs ligated to
Alu-derived sequence in either orientation. This has been overlooked on
several occasions, with the consequence that erroneous Alu-derived amino
acid sequences have been reported.

Various analyses indicate that Alu repeats fall into six classes (A to
F). Six "warning" entries have therefore been created, each containing
the conceptual translations in all six reading frames of one randomly
chosen member of one of these classes. Any significant similarity
between a putative protein sequence and an Alu-translated entry must be
taken as a warning that part of an Alu repeat may have been
artifactually included in the coding nucleotide sequence.

These entries have been assigned accession numbers P23959 (ALUA_HUMAN)
to P23964 (ALUF_HUMAN).
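
Because an Alu segment can be ligated to a cDNA in either orientation
and read in any frame, a screen has to consider all six conceptual
translations. The Python sketch below illustrates that idea under stated
assumptions; it is not how the SWISS-PROT entries were actually built or
searched. It assumes the standard genetic code with '*' for stops, and
the hypothetical helper `shares_word` uses an exact shared-word test as
a crude stand-in for the FASTA or BLAST similarity search a real
prescreen would use.

```python
"""Sketch: six-frame conceptual translation and a crude shared-word screen.

Assumptions (not from the SWISS-PROT release notes): the standard genetic
code, '*' for stop codons, and an exact k-residue word match as a
hypothetical stand-in for a real FASTA/BLAST similarity search.
"""

from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids listed in TCAG codon order (TTT, TTC, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase string; non-ACGT characters (e.g. N) are kept."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> list[str]:
    """Frames +1, +2, +3 on the given strand, then -1, -2, -3 on its reverse complement."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    return [translate(dna[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]


def shares_word(protein: str, frames: list[str], k: int = 8) -> bool:
    """Hypothetical crude screen: True if any k-residue word of `protein` occurs in a frame."""
    words = {protein[i:i + k] for i in range(len(protein) - k + 1)}
    return any(word in frame for frame in frames for word in words)
```

For example, `six_frame_translations("ATGGCCATTGTAATGGGCCGC")[0]` is
`"MAIVMGR"`; that short test sequence is arbitrary, not an Alu repeat.
An exact word match is far less sensitive than a proper similarity
search, so in practice the six translated warning entries would be
searched with FASTA or BLAST rather than with a check like this.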