IUBio

showmesh.c - when ends of clones join contigs

mathog at seqaxp.bio.caltech.edu
Mon Feb 15 15:28:14 EST 1999


Hi,

We're still using GCG 8.1, so maybe this has been added to more recent
versions already.  In any case, in 8.1 if you do a shotgun sequencing
project where a clone "p1" has generated two nonoverlapping end sequences
"p1_left" and "p1_right", the GCG assembly package will not tell you when
"p1_left" is in one contig, and "p1_right" in another.  It is possible to
derive this information by inspection, but that can be fairly painful to do
when more than a handful of clones are involved. 

To address this problem, I wrote a small C program, "showmesh".  It
processes an "ends" file, which must contain no blank lines and consists
of a series of records, one per clone, naming the left and right end
sequences like this: 

Test0011,Test0004  !intracontig
Test0007,Test0017  !intramesh
Test0013,noright   !right missing
noleft,Test0010    !left missing

and a relation directory in a GCG assembly project.  From that information, 
it generates an output file like:

       Test0011    test0001.fil        Test0004    test0001.fil IntraContig
       Test0007    test0001.fil        Test0017    test0017.fil IntraMesh
       Test0013    test0017.fil         noright         MISSING right_missing
         noleft         MISSING        Test0010    test0017.fil left_missing
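
For anyone who wants to read or write these two formats from their own
code, here is a rough sketch in C.  It is not taken from SHOWMESH.C.  The
function names, the 32 character name limit, the treatment of everything
from the first blank or "!" onward as a comment, and the 15 character
column widths are all assumptions inferred from the samples above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define END_NAME_LEN 32   /* assumed maximum name length, including the NUL */

/* Split one "ends" record such as "Test0011,Test0004  !intracontig"
 * into its left and right end names.  The right name stops at the first
 * blank, tab, newline or '!', so the trailing comment is ignored.
 * Leading blanks are NOT skipped.
 * Returns 1 on success, 0 if the line does not hold two usable names. */
int parse_ends_record(const char *line,
                      char left[END_NAME_LEN], char right[END_NAME_LEN])
{
    assert(line != NULL && left != NULL && right != NULL);

    size_t llen = strcspn(line, ",");        /* length of the left name */
    if (line[llen] != ',')
        return 0;                            /* no comma in the record */

    const char *rstart = line + llen + 1;    /* first char after the comma */
    size_t rlen = strcspn(rstart, " \t\n!"); /* length of the right name */

    if (llen == 0 || rlen == 0 || llen >= END_NAME_LEN || rlen >= END_NAME_LEN)
        return 0;

    memcpy(left, line, llen);
    left[llen] = '\0';
    memcpy(right, rstart, rlen);
    right[rlen] = '\0';
    return 1;
}

/* Format one output row: four right-justified 15 character columns
 * separated by single blanks, followed by the class label.
 * Returns what snprintf returns (the length the full row needs). */
int format_row(char *buf, size_t cap,
               const char *left, const char *left_contig,
               const char *right, const char *right_contig,
               const char *label)
{
    return snprintf(buf, cap, "%15s %15s %15s %15s %s",
                    left, left_contig, right, right_contig, label);
}
```

A caller would run parse_ends_record() on each line of the ends file, look
up the contig holding each end (using "MISSING" when there is none), and
pass the results to format_row().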

That is, it classifies each clone into: 

  IntraContig     both ends in one contig
  IntraMesh       each end in a different contig
  right_missing   right end of clone not in project
  left_missing    left end of clone not in project

A missing end has either been removed or not been entered yet.
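
The classification rule itself is simple enough to sketch in a few lines of
C.  This is an illustration, not the code in SHOWMESH.C.  The enum, the
function names and the use of NULL for an end that is not in the project
are assumptions, and BOTH_MISSING is an extra case added here so the
function is defined for every input.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names, chosen for this sketch only. */
enum clone_class {
    INTRA_CONTIG,   /* both ends in one contig */
    INTRA_MESH,     /* each end in a different contig */
    RIGHT_MISSING,  /* right end not in project */
    LEFT_MISSING,   /* left end not in project */
    BOTH_MISSING    /* assumption: neither end in project */
};

/* left_contig / right_contig: name of the contig holding that end,
 * or NULL if the end is not in the project. */
enum clone_class classify_clone(const char *left_contig,
                                const char *right_contig)
{
    if (left_contig == NULL && right_contig == NULL)
        return BOTH_MISSING;
    if (left_contig == NULL)
        return LEFT_MISSING;
    if (right_contig == NULL)
        return RIGHT_MISSING;
    return strcmp(left_contig, right_contig) == 0 ? INTRA_CONTIG
                                                  : INTRA_MESH;
}

/* Map a class to the label used in the output file. */
const char *class_label(enum clone_class c)
{
    switch (c) {
    case INTRA_CONTIG:  return "IntraContig";
    case INTRA_MESH:    return "IntraMesh";
    case RIGHT_MISSING: return "right_missing";
    case LEFT_MISSING:  return "left_missing";
    case BOTH_MISSING:  return "both_missing";  /* label is an assumption */
    }
    assert(0 && "invalid clone_class");
    return "";
}
```

For example, classify_clone("test0001.fil", "test0017.fil") gives
INTRA_MESH, which class_label() turns into "IntraMesh".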

If the IntraMesh records link all of the contigs together, directly or
through intermediate contigs, then the project can be reduced to a single
contig by directed sequencing within the IntraMesh clones.
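
Whether the IntraMesh records really tie every contig together can be
checked mechanically.  Note that two separate pairs of contigs, A-B and
C-D, would each appear in an IntraMesh record and still not reduce to one
contig.  Below is a minimal union-find sketch in C, not part of
SHOWMESH.C.  It assumes the contigs have already been numbered 0..n-1, and
all of the function names are hypothetical.

```c
#include <assert.h>

/* Start with every contig in its own group. */
void mesh_init(int *parent, int n)
{
    assert(parent != 0 && n > 0);
    for (int i = 0; i < n; i++)
        parent[i] = i;
}

/* Find the representative of x's group, shortening the path as we go. */
int mesh_find(int *parent, int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Record one IntraMesh clone whose ends lie in contigs a and b. */
void mesh_link(int *parent, int a, int b)
{
    int ra = mesh_find(parent, a);
    int rb = mesh_find(parent, b);
    parent[ra] = rb;
}

/* Number of separate groups of contigs left after all links.
 * A result of 1 means the IntraMesh clones connect every contig. */
int mesh_components(int *parent, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (mesh_find(parent, i) == i)
            count++;
    return count;
}
```

Call mesh_link() once per IntraMesh record.  If mesh_components() then
returns more than 1, there are groups of contigs that no clone in the ends
file bridges.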

Anyway, some of you may find it useful.  The program can be downloaded from:
  
  http://seqaxp.bio.caltech.edu/pub/SOFTWARE/SHOWMESH.C

I have only tested it on VMS, but it should work pretty much out of the
box on Unix as well.  Please report any bugs.

Regards,

David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 



More information about the Info-gcg mailing list
