Arabidopsis gene nomenclature - a decision

michael.bevan michael.bevan at bbsrc.ac.uk
Wed Sep 22 16:56:26 EST 1999

Dear Colleagues,

A uniform gene nomenclature system for Arabidopsis was discussed at an impromptu
meeting at GSAC in Miami attended by Daphne Preuss, Chris Somerville, Claire
Fraser, Xiaoying Lin and Mike Bevan on Sept. 18th.

It was decided that the following uniform system will be used in the
forthcoming publication of the sequence of chr 2 and chr 4. A rapid decision
was needed because implementing the new names will take time.

At			=organism
1,2,3,4,5		=chromosome
g			=gene
00010			=gene id

The g convention is useful as repeats (r) will soon be annotated, initially as
markers. Pseudogenes will be numbered like functional genes.
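As an illustration of the name structure described above, here is a minimal Python sketch that splits a name into its parts. The pattern and the function name parse_gene_name are assumptions made for this example and are not part of the agreed system; repeat (r) names are not handled, since their format has not been specified here.

```python
import re

# Pattern assumed from the scheme above: "At", a chromosome number 1-5,
# the letter "g" for gene, then a five-digit gene id.
GENE_NAME = re.compile(r"^At([1-5])g(\d{5})$")

def parse_gene_name(name):
    """Split a name such as 'At4g00010' into (chromosome, gene_id)."""
    match = GENE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid gene name: {name!r}")
    # The gene id is kept as a string so that leading zeros survive.
    return int(match.group(1)), match.group(2)

print(parse_gene_name("At4g00010"))  # (4, '00010')
```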

Genes are numbered in order from the top to the bottom of each chromosome. In
the case of chr 2 and 4 the top boundary is known due to the presence of rDNA
clusters. Gene At4g00010 is the first gene south of the cluster.
Gene order is defined in units of 10, i.e.

00010, 00020, 00030, etc., allowing 9000 genes per chromosome.
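The initial numbering in units of 10 could be generated as in the sketch below. The function name initial_gene_names and its parameters are hypothetical, chosen only for this example.

```python
def initial_gene_names(chromosome, count, start=10, step=10):
    """Return the first `count` names on a chromosome, spaced in units of 10."""
    # :05d pads the gene id with leading zeros to five digits.
    return [f"At{chromosome}g{start + i * step:05d}" for i in range(count)]

print(initial_gene_names(4, 3))  # ['At4g00010', 'At4g00020', 'At4g00030']
```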

If new genes are found between two annotated genes, either by experiment or
improved gene finding programs, these will be numbered as:

00010, 00012, 00013, 00014, up to 00019. This gives plenty of room for expansion.
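One consequence of this scheme is that a newly interpolated gene sorts into its chromosomal position without any renumbering. The short Python sketch below shows this; the helper name chromosome_order is an assumption made for this example.

```python
def chromosome_order(name):
    """Sort key: (chromosome, numeric gene id) for names like 'At4g00012'."""
    chromosome, gene_id = name[2:].split("g")
    return int(chromosome), int(gene_id)

# At4g00012 was added later, between At4g00010 and At4g00020.
names = ["At4g00020", "At4g00012", "At4g00010"]
print(sorted(names, key=chromosome_order))
# ['At4g00010', 'At4g00012', 'At4g00020']
```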

Different versions of a gene product, e.g. from a differentially spliced gene,
are denoted as 00010.1, 00010.2, 00010.3, etc.
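The version suffix can be separated from the gene name as in the sketch below. It assumes the suffix is appended to the full name (for example "At4g00010.1"); the function name split_version is hypothetical.

```python
def split_version(name):
    """Split 'At4g00010.2' into ('At4g00010', 2); version is None if absent."""
    gene, dot, version = name.partition(".")
    return gene, (int(version) if dot else None)

print(split_version("At4g00010.2"))  # ('At4g00010', 2)
print(split_version("At4g00010"))    # ('At4g00010', None)
```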

Where there are sequence gaps, often of uncertain size and content (e.g. CEN2
and CEN4), the sequence groups will leave a space equivalent to 100 - 200
genes. Where the top arm telomeres have not yet been reached, a gap equivalent
to about 50 genes should be left, i.e. numbering will start 05000, 05010, etc.

The numbering of repeats will follow an independent system, where repeat ids
are not interpolated between gene identities.

Please don't worry that the BAC naming conventions will be lost or erased from
the records. We realise these are presently the most commonly used names, and
the databases will therefore have a simple way of relating the two naming
conventions. Note that a single "At4g00650" gene can have two BAC names, due to
overlaps, and this is one of the reasons for implementing the new nomenclature.
You will be able to search for an individual gene with this new name.
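A minimal sketch of how the two naming conventions might be related is given below. The BAC-based names in it are invented placeholders, not real clone names, and the table layout is an assumption rather than a description of how any database will store the data.

```python
from collections import defaultdict

# Placeholder BAC-based names (hypothetical), both pointing at the same
# gene because the two clones overlap.
bac_to_gene = {
    "BAC_A.30": "At4g00650",
    "BAC_B.2": "At4g00650",
}

# Invert the table so that one gene name returns all of its BAC names.
gene_to_bacs = defaultdict(list)
for bac_name, gene_name in bac_to_gene.items():
    gene_to_bacs[gene_name].append(bac_name)

print(gene_to_bacs["At4g00650"])  # ['BAC_A.30', 'BAC_B.2']
```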

We believe this system conforms to that used in other organisms, and will be
very useful to the community.

Your comments and continued feedback are welcomed.

