Consed, sequence assemble

microbe at CHOLLIAN.NET
Fri Oct 15 04:39:08 EST 1999

I don't know Consed, so I can't say what input format it needs.
But if what you want is to split a big FASTA sequence file into
smaller ones, here is how I do it.
In Perl, I read each definition line (the one beginning with ">") and
extract an ID, e.g. a GenBank or Swiss-Prot ID.
Then I take the first 3 characters of the extracted ID and make a directory
named after them. The remaining part of the ID is used as the file name for
that sequence.
I read the sequence lines and save them to that directory/file,
and repeat this for every record in the input.

For example, a record with the definition line

>X91003

is saved to the file X91/003.
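
The steps above can be sketched roughly as follows. This is written in Python rather than Perl, but the logic carries over line for line. It is only a sketch under some assumptions of mine: the ID is taken to be the first whitespace-delimited word after ">", it is assumed to contain no "/" characters, and the definition line is written to the output file along with the sequence so that each file stays valid FASTA. The function names (extract_id, split_fasta) are made up for this example.

```python
import os
import tempfile


def extract_id(defline):
    """Return the first word after '>' (assumed here to be the sequence ID)."""
    fields = defline[1:].split()
    if not fields:
        raise ValueError("definition line has no ID: %r" % defline)
    return fields[0]


def split_fasta(lines, out_dir, prefix_len=3):
    """Write each FASTA record to out_dir/<first 3 chars of ID>/<rest of ID>.

    Returns the list of file paths written, in input order.
    """
    written = []
    out = None
    try:
        for line in lines:
            if line.startswith(">"):
                # A new record starts: close the previous file, open the next.
                if out is not None:
                    out.close()
                seq_id = extract_id(line)
                subdir = os.path.join(out_dir, seq_id[:prefix_len])
                # An ID of 3 characters or fewer has no remainder,
                # so fall back to the whole ID as the file name.
                name = seq_id[prefix_len:] or seq_id
                os.makedirs(subdir, exist_ok=True)
                path = os.path.join(subdir, name)
                out = open(path, "w")
                written.append(path)
            # Lines before the first '>' are skipped.
            if out is not None:
                out.write(line if line.endswith("\n") else line + "\n")
    finally:
        if out is not None:
            out.close()
    return written


if __name__ == "__main__":
    demo = [">X91003 example record\n", "ACGTACGT\n", ">X91004\n", "GGCC\n"]
    with tempfile.TemporaryDirectory() as tmp:
        for path in split_fasta(demo, tmp):
            print(os.path.relpath(path, tmp))
```

For a real file, pass an open file handle as the first argument, e.g. split_fasta(open("est.fasta"), "out"), where est.fasta is a placeholder name. Definition lines in the NCBI style (gi|...|gb|...) would need extract_id changed to pick out the field you want, since the first 3 characters would otherwise always be "gi|".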


 On Thu, 14 Oct 1999, Mei wrote:

> Hello,
> I am interested in assemble some of the EST sequences that I have downloaded
> from Entrez.  So far, I am using “csplit” command in unix, then use a perl
> script to rename files.  Finally, use a shell script to generate fake phd
> files for Consed.  This approach works well if I have less than 100
> sequences, because csplit only split up to 99 files.  I’d like to know how
> to split and rename the fasta file according to the gi numbers in the
> definition lines when I have large number of sequences to assemble.  A hint
> in how to write a perl script for this purpose will be greatly appreciated.
> Thanks,
> Mei

Science is the game we play with God to find out what his rules are.
Doo Suk Yang
Research Scientist                              Voice: 82-42-866-2222 
LG Chemical Ltd. Research Park
Biotech Research Institute I                    FAX:   82-42-861-2566
A fool hath no delight in understanding, but that his heart may discover itself.
(Proverbs 18:2)
