I am not familiar with Consed, so I do not know what input format it needs.
But if what you want is to split a big FASTA sequence file into smaller
ones, I can show you one way to do it.
In my case, I use Perl to read each definition line beginning with ">" and
extract an ID from it, e.g. a GenBank ID or a Swiss-Prot ID.
Then I take the first 3 characters of the extracted ID and make a directory
with that name. The remaining part of the ID is used as the file name for
that sequence.
Next I read the sequence part and save it to that directory/file.
The same steps are repeated for every record in the file.
For example:
> X91003
GCACGATCGTATGCTAGGATGATGTGCTCGATGATCTAGTCGTAGCTAGTGCTGATGTCGATG
is saved to the file X91/003.
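The steps above could be sketched in Perl roughly as follows. This is only a
minimal sketch, not my exact script. The subroutine names (id_to_path,
extract_id, split_fasta) are made up for illustration. It assumes the ID is
either the first word after ">" or, for Entrez-style headers, the number after
"gi|"; other header styles would need a different pattern. It also keeps the
definition line in each output file so that each file is still valid FASTA,
and it names the file "_" when the ID has 3 characters or fewer. Both of
those are my assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Map an ID to (directory, file name): first 3 characters / the rest.
# IDs of 3 characters or fewer are not covered by the description above;
# naming the file "_" in that case is an assumption.
sub id_to_path {
    my ($id) = @_;
    my $dir  = substr($id, 0, 3);
    my $file = length($id) > 3 ? substr($id, 3) : '_';
    return ($dir, $file);
}

# Pull an ID out of a definition line. Handles a bare ID ("> X91003") and,
# as an assumption, "gi|12345|..." headers by taking the gi number.
sub extract_id {
    my ($defline) = @_;
    return $1 if $defline =~ /^>\s*gi\|(\d+)/;
    return $1 if $defline =~ /^>\s*([^\s|]+)/;
    return;
}

# Split a multi-record FASTA file into one file per record under $outdir.
sub split_fasta {
    my ($fasta, $outdir) = @_;
    open(my $in, '<', $fasta) or die "Cannot open $fasta: $!";
    my $out;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # A new record starts: work out where it goes and open the file.
            my $id = extract_id($line);
            die "No ID in definition line: $line" unless defined $id;
            my ($dir, $file) = id_to_path($id);
            my $path = File::Spec->catdir($outdir, $dir);
            make_path($path);
            close($out) if $out;
            open($out, '>', File::Spec->catfile($path, $file))
                or die "Cannot write $path/$file: $!";
        }
        # Definition and sequence lines both go to the current record's file.
        print {$out} $line if $out;
    }
    close($out) if $out;
    close($in);
}

# Run only when a FASTA file is given on the command line.
split_fasta($ARGV[0], defined $ARGV[1] ? $ARGV[1] : '.') if @ARGV;
```

If this is saved as, say, split_fasta.pl (a hypothetical name), it would be
run as "perl split_fasta.pl input.fasta outdir". There is no limit of 99 files
as with csplit, and a record with the same ID as an earlier one overwrites
the earlier file.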
Sincerely,
On Thu, 14 Oct 1999, Mei wrote:
> Hello,
>
> I am interested in assembling some of the EST sequences that I have downloaded
> from Entrez. So far, I am using the csplit command in unix, then a perl
> script to rename files. Finally, I use a shell script to generate fake phd
> files for Consed. This approach works well if I have fewer than 100
> sequences, because csplit only splits up to 99 files. I'd like to know how
> to split and rename the fasta file according to the gi numbers in the
> definition lines when I have a large number of sequences to assemble. A hint
> on how to write a perl script for this purpose will be greatly appreciated.
>
> Thanks,
>
> Mei
>
===============================================================================
Science is the game we play with God to find out what his rules are.
-------------------------------------------------------------------------------
Doo Suk Yang
Research Scientist Voice: 82-42-866-2222
LG Chemical Ltd. Research Park
Biotech Research Institute I FAX: 82-42-861-2566
-------------------------------------------------------------------------------
A fool hath no delight in understanding, but that his heart may discover itself.
(Proverbs 18:2)
===============================================================================