Hi,
We're still using GCG 8.1, so this may already have been added in more recent
versions. In any case, in 8.1, if you do a shotgun sequencing
project where a clone "p1" has generated two nonoverlapping end sequences,
"p1_left" and "p1_right", the GCG assembly package will not tell you when
"p1_left" is in one contig and "p1_right" is in another. It is possible to
derive this information by inspection, but that becomes tedious
when more than a handful of clones are involved.
To address this problem, I wrote a small C program, "showmesh", which
processes an "ends" file containing no blank lines and one record per clone
(left end read, a comma, right end read, then a "!" comment), like this:
Test0011,Test0004 !intracontig
Test0007,Test0017 !intramesh
Test0013,noright !right missing
noleft,Test0010 !left missing
and the relation directory of a GCG assembly project. From that information,
it generates an output file like this:
Test0011 test0001.fil Test0004 test0001.fil IntraContig
Test0007 test0001.fil Test0017 test0017.fil IntraMesh
Test0013 test0017.fil noright MISSING right_missing
noleft MISSING Test0010 test0017.fil left_missing
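That is, it classifies each clone as one of:
IntraContig    both ends in the same contig
IntraMesh      each end in a different contig
right_missing  right end of the clone not in the project
left_missing   left end of the clone not in the project
In the two *_missing cases, the absent end has either been removed from the
project or not entered yet.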
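The classification amounts to a simple per-clone comparison. Below is a minimal C sketch of that logic, not the actual showmesh source. It assumes the read-to-contig assignments have already been pulled out of the relation directory into an in-memory table (the names `table`, `find_contig`, `parse_record`, and `classify` are hypothetical), and it reports a record whose ends are both absent as left_missing, a case the description above does not cover.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical read-to-contig table, hard-coded from the example output.
   showmesh itself gets this information from the GCG relation directory. */
struct placement { const char *read; const char *contig; };

static const struct placement table[] = {
    { "Test0011", "test0001.fil" },
    { "Test0004", "test0001.fil" },
    { "Test0007", "test0001.fil" },
    { "Test0017", "test0017.fil" },
    { "Test0013", "test0017.fil" },
    { "Test0010", "test0017.fil" },
};

/* Return the contig holding a read, or NULL if the read is not in the project. */
static const char *find_contig(const char *read) {
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].read, read) == 0)
            return table[i].contig;
    return NULL;
}

/* Split a record "left,right !comment" into its two read names.
   left and right must each hold at least 64 bytes.
   Returns 1 on success, 0 for a malformed record. */
static int parse_record(const char *line, char *left, char *right) {
    return sscanf(line, " %63[^,],%63[^ !\t\r\n]", left, right) == 2;
}

/* Classify one clone with the labels used in the output file.
   Assumption: if both ends are missing, left_missing is reported. */
static const char *classify(const char *left, const char *right) {
    const char *lc = find_contig(left);
    const char *rc = find_contig(right);
    if (lc == NULL) return "left_missing";
    if (rc == NULL) return "right_missing";
    return strcmp(lc, rc) == 0 ? "IntraContig" : "IntraMesh";
}
```

With this table, the four example records come out as in the sample output: `classify("Test0011", "Test0004")` gives IntraContig, `("Test0007", "Test0017")` gives IntraMesh, `("Test0013", "noright")` gives right_missing, and `("noleft", "Test0010")` gives left_missing.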
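If the IntraMesh clones link all the contigs into a single connected group
(every contig can be reached from every other by following IntraMesh clones),
then the project can be reduced to a single contig by directed sequencing
within those clones.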
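Checking this condition is a graph-connectivity problem: treat each contig as a node and each IntraMesh clone as an edge joining the two contigs that hold its ends. The project can be closed into one contig only if all nodes end up in one connected group. The following is a minimal union-find sketch in C; it is not part of showmesh, and the names (`contig_index`, `link_contigs`, `count_groups`) and the fixed `MAX_CONTIGS` limit are assumptions for illustration.

```c
#include <string.h>

#define MAX_CONTIGS 256  /* hypothetical fixed limit for this sketch */

static const char *names[MAX_CONTIGS];  /* pointers only: caller keeps strings alive */
static int parent[MAX_CONTIGS];         /* union-find parent links */
static int ncontigs = 0;

/* Return the index of a contig, registering it if new (-1 if the table is full). */
static int contig_index(const char *name) {
    int i;
    for (i = 0; i < ncontigs; i++)
        if (strcmp(names[i], name) == 0) return i;
    if (ncontigs == MAX_CONTIGS) return -1;
    names[ncontigs] = name;
    parent[ncontigs] = ncontigs;  /* each new contig starts as its own group */
    return ncontigs++;
}

/* Find the representative of a contig's group, halving paths as we go. */
static int find_root(int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* Record one IntraMesh clone whose ends lie in contigs a and b. */
static void link_contigs(const char *a, const char *b) {
    int ia = contig_index(a), ib = contig_index(b);
    if (ia < 0 || ib < 0) return;
    parent[find_root(ia)] = find_root(ib);
}

/* Number of separate groups; 1 means the IntraMesh clones connect every contig. */
static int count_groups(void) {
    int i, groups = 0;
    for (i = 0; i < ncontigs; i++)
        if (find_root(i) == i) groups++;
    return groups;
}
```

To use it, register every contig in the project with `contig_index` first (so a contig with no IntraMesh clones still counts as its own group), call `link_contigs` once per IntraMesh record, and then check whether `count_groups()` returns 1.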
Anyway, some of you may find it useful. The program can be downloaded from:
http://seqaxp.bio.caltech.edu/pub/SOFTWARE/SHOWMESH.C
I have only tested it on VMS, but it should work pretty much out of the box on
Unix as well. Please report any bugs or other problems.
Regards,
David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech