Hans Stenvien wrote:
> I have data sets consisting of several thousand unique coding sequences. For
> each sequence, I would like to generate a number of randomised sequences
> with the same length and nucleotide composition. Does anyone know of
> appropriate online service/software allowing me to input all of my sequences
> from one file? It would be great if one file is generated containing all the
> random sequences for all original sequences. I have so far only found online
> services/softwares enabling me to shuffle nucleotide order in one sequence
> at a time (i.e. manual input of each sequence). I am afraid I am not able to
> do any programming myself.
> Any help is appreciated.
>
> Best regards,
> Hans
The shuffle program from XYLEM will do that. The manual page is appended
below. XYLEM can be downloaded from:
http://home.cc.umanitoba.ca/%7Epsgendb/XYLEM.html
Binaries are available for Solaris and Linux, and source
code should compile readily on other platforms, since
the code is pretty simple.
--
======================================
Brian Fristensky (ON SABBATICAL til July 1, 2003)
Department of Plant Science
University of Manitoba
Winnipeg, MB R3T 2N2 CANADA
frist at cc.umanitoba.ca
Sabbatical phone: 204-474-6724
Voicemail: 204-474-6085
Home phone: 204-261-3960
FAX: 204-474-7528
http://home.cc.umanitoba.ca/~frist/
===========================================================
The most unforgiveable sin of all is being right too soon.
===========================================================
shuffle.doc update 3 Feb 94
SYNOPSIS
shuffle -sn [-wn -on]
DESCRIPTION
Shuffles sequences locally. See Lipman DJ, Wilbur WJ, Smith TF
and Waterman MS (1984) On the statistical significance of nucleic
acid similarities. Nucl. Acids Res. 12:215-226.
-sn   n is a random integer between 0 and 32767. This number
      must be provided for each run.
-wn   n is an integer, indicating the width of the window for
      random localization. If w exceeds the length of a sequence,
      or is negative, the entire sequence is scrambled as a single
      window. This is also the case if w is not specified.
-on   n is an integer, indicating the number of nucleotides
      overlap between adjacent windows. It should never exceed
      the window size. o defaults to 0 if not specified.
If w and o are specified, overlapping windows of w nucleotides
are shuffled, thus preserving the local characteristic base
composition. Windows overlap by o nucleotides.
If w and o are not specified, each sequence is shuffled globally,
thus preserving the overall base composition, but not the local
variations in composition.
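
For readers who want to see the idea in code, here is a rough Python
sketch of window shuffling. It is not the XYLEM source. The function
name shuffle_windows, the window placement (each window starts w - o
positions after the previous one) and the in-place, left-to-right
order of shuffling are my assumptions; the manual page does not spell
these details out. Only the fallback to a global shuffle follows the
text above.

```python
import random


def shuffle_windows(seq, seed, window=None, overlap=0):
    """Return a shuffled copy of seq (hypothetical helper, not XYLEM code)."""
    rng = random.Random(seed)
    chars = list(seq)
    n = len(chars)

    # Documented fallback: no window, a negative window, or a window
    # longer than the sequence means the whole sequence is one window.
    if window is None or window < 0 or window > n:
        rng.shuffle(chars)
        return "".join(chars)

    # Assumption: the overlap must be smaller than the window so that
    # each window starts to the right of the previous one.
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")

    step = window - overlap
    for start in range(0, n, step):
        end = min(start + window, n)
        segment = chars[start:end]
        rng.shuffle(segment)          # permute only this window
        chars[start:end] = segment    # write it back in place
        if end == n:
            break
    return "".join(chars)


if __name__ == "__main__":
    original = "ctgagtagctagtcctaaatagttagtccatagtactagtacgggtcgtt"
    print(shuffle_windows(original, seed=9873))                        # global
    print(shuffle_windows(original, seed=9873, window=12, overlap=3))  # local
```

Because every step only permutes characters that are already in the
sequence, the length and overall composition are always preserved;
smaller windows keep more of the local composition.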
Any number of sequences may be processed from a single input
file. In Pearson-format files, each new sequence begins with a
'>' comment line, indicating the name and a short description of
the sequence.
No distinction is made between protein and nucleic acid sequences.
That is, shuffle will read any of the following characters as
sequence:
T,U,C,A,G,N,R,Y,M,W,S,K,D,H,V,B,L,Z,F,P,E,I,Q,X,*,-
where '*' is the result of translating a stop codon, and '-'
is a gap generated during sequence alignment. Lowercase is
also accepted.
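
As an illustration of the input handling described above, and of the
original request (several randomised copies of every sequence in one
file), here is a small Python sketch. It is not part of XYLEM. The
names read_pearson and shuffled_copies, the "shuffled_N" suffix added
to each header, and the choice to silently drop characters outside
the listed alphabet are all assumptions of mine.

```python
import random

# Symbols the manual page lists as sequence characters; lowercase
# versions are accepted as well.
ALLOWED = set("TUCAGNRYMWSKDHVBLZFPEIQX*-")


def read_pearson(lines):
    """Yield (header, sequence) pairs from Pearson-format lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A '>' line starts a new record; emit the previous one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            # Assumption: anything outside the listed alphabet is skipped.
            parts.append("".join(c for c in line if c.upper() in ALLOWED))
    if header is not None:
        yield header, "".join(parts)


def shuffled_copies(records, copies, seed):
    """Yield `copies` globally shuffled versions of every record."""
    rng = random.Random(seed)
    for header, seq in records:
        for i in range(1, copies + 1):
            chars = list(seq)
            rng.shuffle(chars)  # same length and composition as seq
            yield f"{header} shuffled_{i}", "".join(chars)


if __name__ == "__main__":
    demo = [
        ">seq1 first example",
        "ACGTACGT",
        "acgt",
        ">seq2 second example",
        "TTTTCCCC",
    ]
    for header, seq in shuffled_copies(read_pearson(demo), copies=2, seed=9873):
        print(">" + header)
        print(seq)
```

To run this on a real file, pass an open file object to read_pearson
in place of the demo list and write the printed lines to a single
output file.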
EXAMPLE
A sample output file is shown below. Note that the first two
lines of output are comment lines, listing the version of the
program and the parameters used in the run.
>SHUFFLE VERSION 11/ 8/93
>RANDOM SEED: 9873 WINDOW: 12 OVERLAP: 3
>BAZFAZ - Borborigmus azerbi F-actin-zeta gene
ctgagtagctagtcctaaatagttagtccatagtactagtacgggtcgtt
cacccttgggcagtg.....(etc.)
AUTHOR
Dr. Brian Fristensky
Dept. of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
Phone: 204-474-6085
FAX: 204-474-7528
frist at cc.umanitoba.ca
REFERENCE
Fristensky, B. (1993) Feature expressions: creating and manipulating
sequence datasets. Nucleic Acids Research 21:5997-6003.