Hello all,
The Urbigene Package contains modest C++ tools for molecular biology that I
wrote at the INTEGRAGEN company. A subset of those tools does not
present any commercial interest, so I've been allowed to release it to
the scientific community as an open source package under the GNU
General Public License (GPL). You'll find sources for parsing BLAST
results in XML format, the new versions of the CloneIt program, and
filters for FASTA sequences, for PRIMER3 output, etc.
(There are also programs that are not dedicated to biology but may be
of general interest. For example, PIVOT creates cross tables from
delimited files, GeneticProg tries to find an equation that fits
experimental values, etc.)
The package is available at:
http://www.urbigene.com
Usage Example
Consider the following script:
#This script takes as input chromosome 22 from the goldenpath.
#It then digests the whole chromosome with NotI,
#crops 6 bases from each end of the fragments,
#keeps fragments between 100 bases and 100 kb,
#slices the fragments (fastaslice),
#keeps fragments containing a CA repeat,
#keeps fragments where %GC is between 40 and 60%,
#keeps only the first 10 sequences,
#converts the sequences to an input for primer3,
#launches primer3,
#converts the amplified fragments to FASTA,
#blasts those fragments against the whole goldenpath,
#retains BLAST HSPs where the score is lower than 10 or greater than 50
#(this blastlisp step is commented out below),
#converts the output to text,
#transforms this text to XML,
#and keeps the first 50 lines.
#
BIN=./bin/
${BIN}/fastaretrieve -chr 22 -entry /env/ig/pubdb/mirror/golden_path/14nov2002/chromosomes/entry_points.csv |
${BIN}/fastadigest -e NotI |
${BIN}/fastacrop -5 6 -3 6 |
${BIN}/fastasize -m 100 -M 100000 |
${BIN}/fastaslice -e 5000 -n 10000 |
${BIN}/fastafind -s CACACACACACA -print T |
${BIN}/fastagc -min 40 -max 60 -sort T |
${BIN}/fastahead |
${BIN}/fasta2primer3 -max-stgy 1 -gc-min 20 -gc-max 80 -max-size 2000 |
primer3 |
${BIN}/primer3tofasta |
blastall -e 10 -p blastn -d /env/ig/pubdb/blastdb/GP10apr2003/gp10apr2003 -m 7 |
#${BIN}/blastlisp -e 'or(lt(hsp.score(),10),gt(hsp.score(),50))' |
${BIN}/blast2txt |
${BIN}/text2xml |
head -n 50 > demo.txt
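For readers who want to see what the first filtering steps of the script amount to, here is a minimal Python sketch of the digest / crop / size / repeat / %GC chain on an in-memory sequence. It is only an illustration, not the code of the Urbigene tools: the function names (digest, crop, gc_percent, select_fragments) are hypothetical, the cut position assumes the usual NotI cleavage GC^GGCCGC on the top strand, and the real fastadigest, fastacrop, fastasize, fastafind, fastagc and fastahead programs may handle edge cases differently.

```python
import re

NOTI_SITE = "GCGGCCGC"   # NotI recognition sequence
NOTI_CUT_OFFSET = 2      # assumed cut position: GC^GGCCGC (top strand)


def digest(seq, site=NOTI_SITE, cut_offset=NOTI_CUT_OFFSET):
    """Cut seq at every occurrence of site and return the fragments in order."""
    # A lookahead is used so that overlapping sites are all found.
    pattern = "(?=" + re.escape(site) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def crop(seq, five=6, three=6):
    """Remove 'five' bases from the 5' end and 'three' bases from the 3' end."""
    if len(seq) <= five + three:
        return ""
    return seq[five:len(seq) - three]


def gc_percent(seq):
    """Return the percentage of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 100.0 * gc / len(seq)


def select_fragments(seq, min_len=100, max_len=100000,
                     repeat="CACACACACACA", gc_min=40.0, gc_max=60.0, limit=10):
    """Apply digest, crop, size, repeat and %GC filters; keep the first 'limit' hits."""
    kept = []
    for fragment in digest(seq):
        fragment = crop(fragment)
        if not (min_len <= len(fragment) <= max_len):
            continue
        if repeat not in fragment:
            continue
        if not (gc_min <= gc_percent(fragment) <= gc_max):
            continue
        kept.append(fragment)
        if len(kept) == limit:
            break
    return kept
```

The defaults mirror the values used in the script (6 bases cropped on each side, 100 to 100000 bases, the CACACACACACA motif, 40 to 60 %GC, 10 sequences). The slicing step (fastaslice) is deliberately left out, since its exact behaviour is not described here.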
The output of the script (demo.txt) will look like this:
//iteration
###################################################################### 22:0-47748584(+)|restriction_fragment[NotI(35516844)-NotI(35580318):63482]|crop_5(6)crop_3(6)|size_filter(100-100000)|slice(50000-59999)|gc(41.53%)|pcr_0(34-884:851
pb) primer_left(TTCCAAAGTGCTGGGATTATAG)
primer_right(TCTGGGATTTTCCAGAGGTATAG) len:851
####################################################################> build33|chr22|slice(37101000-37250999) len:150000 Object:94394-95244 Query:1-851 830 0
.....>
build33|chr22|slice(37101000-37250999) len:150000 Object:47208-47286 Query:682-761 32 1.48067e-07
..>
build33|chr22|slice(37101000-37250999) len:150000 Object:138640-138677 Query:3-40 30 2.31183e-06
<..
build33|chr22|slice(37101000-37250999) len:150000 Object:88694-88734 Query:43-3 29 9.13492e-06
<..
build33|chr22|slice(37101000-37250999) len:150000 Object:113943-113983 Query:43-3 29 9.13492e-06
<.
build33|chr22|slice(37101000-37250999) len:150000 Object:16320-16349 Query:32-3 26 0.000563575
<..
build33|chr22|slice(37101000-37250999) len:150000 Object:76801-76838 Query:40-3 26 0.000563575
<..
build33|chr22|slice(37101000-37250999) len:150000 Object:104040-104080 Query:43-3 25 0.0022269
<.
build33|chr22|slice(37101000-37250999) len:150000 Object:132737-132766 Query:32-3 22 0.137388
<.
build33|chr22|slice(37101000-37250999) len:150000 Object:71167-71196 Query:32-3 22 0.137388
<.
build33|chr22|slice(37101000-37250999) len:150000 Object:142101-142130 Query:32-3 22 0.137388
<.
build33|chr22|slice(37101000-37250999) len:150000 Object:127879-127904 Query:28-3 22 0.137388
.>
build33|chr22|slice(37101000-37250999) len:150000 Object:82934-82963 Query:3-32 22 0.137388
<.
build33|chr22|slice(37101000-37250999) len:150000 Object:53573-53602 Query:32-3 22 0.137388
<.
build33|chr22|slice(37101000-37250999) len:150000 Object:62190-62219 Query:32-3 22 0.137388
.>
build33|chr22|slice(37101000-37250999) len:150000 Object:33547-33576 Query:3-32 22 0.137388
..>
build33|chr22|slice(37101000-37250999) len:150000 Object:94134-94171 Query:3-40 22 0.137388
<..
build33|chr22|slice(37101000-37250999) len:150000 Object:95472-95509 Query:40-3 22 0.137388
.>
build33|chr22|slice(37101000-37250999) len:150000 Object:121477-121506 Query:3-32 22 0.137388
(...
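If you want to post-process hit lines like the ones listed above, the following Python sketch parses one line and applies the same kind of score test as the commented-out blastlisp expression in the script (score lower than 10 or greater than 50). It rests on assumptions that are not stated in this message: that the two trailing columns are the HSP score and the e-value, and that a Query range written high-to-low means a hit on the reverse strand. The names parse_hit and keep_hit are hypothetical and are not part of the package.

```python
import re

# Assumed layout of a hit line:
#   <subject> len:<n> Object:<start>-<end> Query:<start>-<end> <score> <e-value>
HIT_PATTERN = re.compile(
    r"^(?P<subject>\S+)\s+len:(?P<subject_len>\d+)\s+"
    r"Object:(?P<s_start>\d+)-(?P<s_end>\d+)\s+"
    r"Query:(?P<q_start>\d+)-(?P<q_end>\d+)\s+"
    r"(?P<score>\S+)\s+(?P<evalue>\S+)\s*$"
)


def parse_hit(line):
    """Parse one hit line into a dict, or return None if it does not match."""
    match = HIT_PATTERN.match(line.strip())
    if match is None:
        return None
    try:
        score = float(match.group("score"))
        evalue = float(match.group("evalue"))
    except ValueError:
        return None
    q_start = int(match.group("q_start"))
    q_end = int(match.group("q_end"))
    return {
        "subject": match.group("subject"),
        "subject_len": int(match.group("subject_len")),
        "subject_range": (int(match.group("s_start")), int(match.group("s_end"))),
        "query_range": (q_start, q_end),
        # Assumption: a descending query range marks a reverse-strand hit.
        "strand": "+" if q_start <= q_end else "-",
        "score": score,
        "evalue": evalue,
    }


def keep_hit(hit, low=10, high=50):
    """Same test as or(lt(hsp.score(),10),gt(hsp.score(),50))."""
    return hit["score"] < low or hit["score"] > high
```

For example, applying parse_hit to each line of demo.txt and discarding the None results (the arrow lines and the headers) leaves only the hits, which can then be filtered with keep_hit.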
Enjoy
Pierre Lindenbaum PhD