IUBio

CDS parse software for UNIX - is something available?

Brian Fristensky frist at cc.umanitoba.ca
Thu Jun 4 18:08:53 EST 1998


Alexander Kanapin wrote:

> I am looking for software (source code) for simple parsing of the GenBank
> CDS field. Does anybody know about such tools? I need some kind of
> automatic splicing - to read CDS data and produce a new sequence string
> according to the CDS coordinates of coding regions from the entry sequence.
> 


The XYLEM package is specifically designed for creating and
managing sequence database subsets. It includes a program
called FEATURES that, given a set of GenBank Accession
numbers or LOCUS names, extracts all features corresponding
to one or more feature keys (e.g. CDS, mRNA, exon, etc.).
ANY legal feature can be parsed to yield a sequence. Even
when features are scattered across SEGMENTED GenBank
entries (e.g. exons of a large gene were sequenced, but
not introns), FEATURES will properly reassemble them
according to join() statements in the Features Table
of the entry. For example, given the feature

         CDS             join(M38619:160..256,M38620:11..307,
                         M38621:11..179,M38622:11..176,
                         M38623:11..250,11..103)
                         /product="green visual pigment"
                         /gene="G101"
                         /codon_start=1

sequences from six different entries would be joined to
recreate the CDS.
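
As a rough illustration of what following a join() statement involves,
here is a short Python sketch. It is not the FEATURES source code; the
names parse_join and splice, and the idea of holding sequences in a dict
keyed by accession, are assumptions made for the example. It handles only
plain start..end spans, with or without an accession prefix, and ignores
complement() and other operators.

```python
import re

# Hypothetical helpers for illustration; not part of XYLEM.
# One element of a join(): optional "ACCESSION:" prefix, then start..end.
SPAN = re.compile(
    r"(?:(?P<acc>[A-Za-z][A-Za-z0-9_.]*):)?(?P<start>\d+)\.\.(?P<end>\d+)"
)

def parse_join(location):
    """Return a list of (accession_or_None, start, end) tuples."""
    text = "".join(location.split())      # drop line breaks and padding
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    spans = []
    for part in text.split(","):
        m = SPAN.fullmatch(part)
        if m is None:
            raise ValueError(f"unsupported location element: {part!r}")
        spans.append((m.group("acc"), int(m.group("start")), int(m.group("end"))))
    return spans

def splice(location, sequences, local_acc):
    """Concatenate the spans in order; coordinates are 1-based, inclusive.

    sequences maps accession -> sequence string (assumed already loaded).
    Spans without an accession refer to the entry holding the feature.
    """
    pieces = []
    for acc, start, end in parse_join(location):
        seq = sequences[acc or local_acc]
        pieces.append(seq[start - 1:end])
    return "".join(pieces)

# Toy data, not real GenBank sequence.
toy = {"A1": "AAAACCCC", "B2": "GGGGTTTT"}
spliced = splice("join(A1:1..4,\n   B2:5..8,2..3)", toy, local_acc="B2")
```

For the green visual pigment feature above, parse_join would yield six
spans: five with an explicit accession and one (11..103) belonging to the
entry that carries the feature.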

FEATURES also allows extraction of sequence fragments
using Feature Table expressions. For example, given the
following feature:

         CDS             305..640
                         /gene="flaL"
                         /label=ORF2
                         /note="flaD (sin) homologue; putative"
                         /codon_start=1

Any of the following expressions would return the same sequence:

M60287:305..640
M60287:ORF2
M60287:/label=ORF2
M60287:/gene="flaL"
M60287:/note="flaD (sin) homologue; putative" 

Expressions allow you to extract sequence fragments even
when they are not explicitly annotated, or to work around
errors in annotation.
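
To make the equivalence of the five expressions concrete, here is a
minimal Python sketch of one way such expressions could be resolved to a
location. It is only an illustration under stated assumptions: the
FEATURE_TABLE dict layout and the resolve function are invented for the
example, a bare word after the colon is assumed to mean a /label value,
and none of this is taken from the FEATURES source.

```python
import re

# Toy stand-in for the parsed feature table of M60287 (layout is assumed).
FEATURE_TABLE = {
    "M60287": [
        {
            "key": "CDS",
            "location": "305..640",
            "qualifiers": {
                "gene": "flaL",
                "label": "ORF2",
                "note": "flaD (sin) homologue; putative",
                "codon_start": "1",
            },
        }
    ]
}

def resolve(expression, table=FEATURE_TABLE):
    """Map an 'ACCESSION:something' expression to (accession, location)."""
    acc, _, rest = expression.strip().partition(":")
    if not rest:
        raise ValueError(f"expected ACCESSION:expression, got {expression!r}")
    if re.fullmatch(r"\d+\.\.\d+", rest):
        return acc, rest                  # explicit coordinates, no lookup
    if rest.startswith("/"):
        name, _, value = rest[1:].partition("=")
    else:
        name, value = "label", rest       # assumption: bare word is a label
    value = value.strip('"')
    for feature in table.get(acc, []):
        if feature["qualifiers"].get(name) == value:
            return acc, feature["location"]
    raise LookupError(f"no feature matches {expression!r}")

expressions = [
    'M60287:305..640',
    'M60287:ORF2',
    'M60287:/label=ORF2',
    'M60287:/gene="flaL"',
    'M60287:/note="flaD (sin) homologue; putative"',
]
results = [resolve(e) for e in expressions]
```

In this sketch every entry in results is the same (accession, location)
pair, which mirrors the statement that all five expressions return the
same sequence. The explicit-coordinate form needs no annotation at all,
which is what makes it usable for unannotated or mis-annotated regions.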

To learn more about XYLEM, or download Solaris binaries or
source code, see:

http://home.cc.umanitoba.ca/~psgendb/XYLEM.html

===============================================================================
Brian Fristensky                | Let me get this straight:
Department of Plant Science     |  
University of Manitoba          | A company that dominates the desktop,
Winnipeg, MB R3T 2N2  CANADA    | and can afford to hire an army of the 
frist at cc.umanitoba.ca           | world's best programmers, markets 
Office phone:   204-474-6085    | what is arguably the world's LEAST
FAX:            204-474-7528    | reliable operating system?            
http://home.cc.umanitoba.ca/~frist/    What's wrong with this picture?
===============================================================================



