We have several expression values for a gene, measured for mutations in its promoter region. For example, on a relative scale, an A at a specific promoter position gives an expression value of 10, a C gives 8, a G gives 20, and a T gives 2. We have such data for ten consecutive promoter positions, and each expression value is the mean of ten experiments. How can we derive the nucleotide frequencies of an "optimal promoter" from these data? In the example, can we justify something like A=25%, C=20%, G=50%, T=5% (each expression value divided by the position's total of 40)?
What's the best formula?
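For reference, the percentages in the example correspond to simple proportional normalization: each nucleotide's expression value divided by the sum over all four nucleotides at that position. The sketch below only reproduces that arithmetic for one position; it does not show that this is the best formula. The function name `proportional_frequencies` and the dictionary layout are assumptions made for illustration.

```python
def proportional_frequencies(expression):
    """Divide each nucleotide's expression value by the position's total.

    `expression` maps a nucleotide letter to its mean expression value
    at one promoter position (hypothetical data layout).
    """
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    return {nt: value / total for nt, value in expression.items()}


# Values from the example in the question (relative scale, one position).
example = {"A": 10, "C": 8, "G": 20, "T": 2}
freqs = proportional_frequencies(example)

for nt in "ACGT":
    print(f"{nt}: {freqs[nt]:.0%}")
```

With ten positions, the same function could be applied to each position's four values in turn, giving one row of frequencies per position.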