I just wanted to point out that an addition has been made to the help page
for the Cereon Polymorphism data at TAIR. The new help document is listed
below. For those of you who aren't aware, Cereon Genomics has made
available a collection of 39,000 predicted polymorphisms between Col and
Ler. These are available to researchers from academic and not-for-profit
institutions through the TAIR website at http://www.arabidopsis.org/cereon/
As always, questions or comments about this dataset can be sent to
athal at cereon.com.
The additional text is:
Please note that the coordinates used in the datafiles refer to the
originally submitted BAC sequence. Many BAC sequences at GenBank have been
edited by the AGI groups in order to produce finished chromosome records.
This involves removing overlapping regions, and flipping some clones in
order to produce a consistent direction along the chromosome. In addition,
AGI groups may make alterations at any time to the submitted sequence in
order to correct errors. This can also cause the original coordinates to be
incorrect for the current record.
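As a rough illustration of why these edits shift coordinates, here is a minimal Python sketch. It assumes a simplified model that is not taken from the Cereon or AGI documentation: the clone is optionally reverse-complemented, and then a number of overlapping bases are trimmed from its start. The function name and its parameters are hypothetical.

```python
def remap_coordinate(pos, clone_length, flipped=False, trimmed_from_start=0):
    """Map a 1-based position on the originally submitted BAC sequence
    to a 1-based position on an edited copy of that sequence.

    Hypothetical model: the clone is optionally reverse-complemented,
    then `trimmed_from_start` bases are removed from the start of the
    (possibly flipped) sequence. Returns None if the position was
    trimmed away.
    """
    if not 1 <= pos <= clone_length:
        raise ValueError("position outside the original clone")
    # Reverse-complementing a sequence of length L sends position p to L - p + 1.
    new_pos = clone_length - pos + 1 if flipped else pos
    # Removing bases from the start shifts every later position down.
    new_pos -= trimmed_from_start
    return new_pos if new_pos >= 1 else None


# Position 100 on a 1000 bp clone that was flipped and lost 50 bp of overlap.
print(remap_coordinate(100, 1000, flipped=True, trimmed_from_start=50))
```

Real edits can be more involved (internal corrections, trimming at both ends), so treat this only as a way to reason about the direction and size of the shift.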
In order to access the original BAC sequence, you need to use the link
provided in the current GenBank record. The link will look something like:
"COMMENT: On Dec 16, 1999 this sequence version replaced gi:5729683"
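If you have the text of a GenBank record saved locally, the earlier gi number can be pulled out with a short Python sketch like the one below. The pattern is modelled only on the example line quoted above, so it is an assumption that other records use the same wording; the function name is hypothetical.

```python
import re

# Pattern based on the example comment line; \s+ tolerates line wrapping
# inside the record. Other records may phrase this differently.
REPLACED_GI = re.compile(
    r"this\s+sequence\s+version\s+replaced\s+gi:(\d+)", re.IGNORECASE
)


def replaced_gi_numbers(record_text):
    """Return every gi number that the record text says it replaced."""
    return REPLACED_GI.findall(record_text)


comment = "COMMENT: On Dec 16, 1999 this sequence version replaced gi:5729683"
print(replaced_gi_numbers(comment))  # ['5729683']
```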
The flanking sequence provided in the Cereon data files attempts to provide
an alternative way to locate the polymorphism. The 20mers can be used to
BLAST <http://arabidopsis.org/blast/> against the Arabidopsis genome to
identify the specified location. There are some caveats to keep in mind when
doing so:
* 1. This sequence should help find the appropriate location in the
BAC of interest. It is not necessarily unique to the genome. It may also
match other BACs in the genome, but these are not important for locating the
polymorphism.
* 2. If the 20mer matches more than once in the BAC of interest, try
using the other 20mer as well and combining the results. You can also use
TAIR's PatMatch <http://arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl>,
which allows you to specify the polymorphic sequence, as well as its
approximate length, between the two 20mers.
* 3. If the 20mer does not find a match in the BAC of interest, the
editing mentioned above may have moved this location to a
neighboring BAC. In this case, check your search results against the
neighboring BACs.
* 4. If it still does not match, be aware that using the default BLAST
parameters does not always work well with such a small query sequence.
Several things can increase your chances of finding a match in the BAC
sequence of interest.
* A. Use a smaller database. An example would be a species-specific
collection at NCBI, or the TAIR BLAST
<http://arabidopsis.org/blast/> server with only Arabidopsis genomic
sequences > 10kb selected.
* B. Do not filter for low complexity.
* C. Increase the mismatch penalty to -8. This should force
exact matches.
* 5. If multiple hits to the same BAC occur, do not panic. Remember,
many indels are caused by a different copy number of a direct repeat. The
flanking sequence may therefore hit multiple places. The best bet here is to
pick primers several hundred bases on either side of this general region.
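For anyone who has already downloaded the BAC sequence, the idea of combining both 20mers (caveat 2) can also be tried locally. The Python sketch below is not part of the TAIR or Cereon tools; it simply looks for exact matches of the left 20mer followed by the right 20mer within a chosen distance, on both strands in case the clone has been flipped. The function names and the `max_gap` default are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown characters become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def find_all(seq, query):
    """Return 0-based start positions of every exact match of query in seq."""
    hits, start = [], seq.find(query)
    while start != -1:
        hits.append(start)
        start = seq.find(query, start + 1)
    return hits


def locate_polymorphism(bac, left, right, max_gap=50):
    """Find sites where `left` is followed by `right` within `max_gap` bases.

    Returns (strand, gap_start, gap_end) tuples, 0-based and half-open,
    measured on the strand that was searched. `max_gap` should be at least
    the approximate length of the polymorphic sequence.
    """
    results = []
    for strand, seq in (("+", bac.upper()), ("-", reverse_complement(bac))):
        right_hits = find_all(seq, right.upper())
        for left_start in find_all(seq, left.upper()):
            gap_start = left_start + len(left)
            for right_start in right_hits:
                if 0 <= right_start - gap_start <= max_gap:
                    results.append((strand, gap_start, right_start))
    return results


# Made-up example sequences, not real Cereon data.
left = "ACGTACGTTAGCTAGCTAGG"
right = "TTGACCGATCGGATCCATGC"
bac = "GGGG" + left + "CAT" + right + "AAAA"
print(locate_polymorphism(bac, left, right))
```

Requiring both flanks to agree cuts down on spurious single-20mer hits, but as caveat 5 notes, direct repeats can still give several candidate sites, and exact matching will miss flanks that were altered by later sequence corrections.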