>> From: champlin at GAS.UUG.Arizona.EDU (Jacob B Champlin)
>> Newsgroups: bionet.software
>> Subject: Database search program?
>> Date: 6 Jul 1995 06:57:24 GMT
>> I am looking for a program that will go through the yeast gene database and
>> retrieve this information:
>> 1. The Name of every sequenced gene.
>> 2. The base at the start of the translation.
>> 3. The first 10 bases before the start of the translated region.
>> 4. The first 12 bases after the start of the translated region.
>> 5. Calculate the frequency of each of the four bases at each position.
>>
Jacob -
Here's an approximate answer to your 5th question. It's the nucleotide
composition around the start codons of all yeast coding sequences (CDSs)
in GenBank release 89 (June 1995), excluding mitochondrial genes. I
calculated this using ACNUC database access software and a simple Fortran
program. Positions are numbered relative to the start codon: +1 to 3 are
the ATG, and -1 is the base immediately upstream.
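
For anyone who wants to repeat this kind of count on their own sequences, here is a rough Python sketch of the per-position tally. It is not the Fortran program mentioned above, and it assumes the windows (10 bases upstream plus the first 12 bases of the CDS) have already been extracted by some other means. The names position_counts and proportions are made up for illustration.

```python
from collections import Counter

BASES = "TCAG"  # same column order as the table in this post


def position_counts(windows, upstream=10, downstream=12):
    """Tally bases at each position of equal-length windows.

    Assumption: each window holds `upstream` bases before the start
    codon followed by the first `downstream` bases of the CDS.
    """
    width = upstream + downstream
    counts = [Counter() for _ in range(width)]
    for window in windows:
        if len(window) != width:
            raise ValueError("window has the wrong length: %r" % window)
        for i, base in enumerate(window.upper()):
            counts[i][base] += 1
    # Labels run -10..-1 then 1..12; there is no position 0.
    labels = list(range(-upstream, 0)) + list(range(1, downstream + 1))
    return labels, counts


def proportions(counter, total):
    """Proportion of each of T, C, A, G, using `total` as the denominator."""
    return {base: counter[base] / total for base in BASES}
```

Calling position_counts on a list of 22-base strings gives one Counter per position, and proportions(counts[i], len(windows)) turns a Counter into the four fractions. Any base other than T, C, A or G is counted but left out of the proportions.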
This analysis will be a bit inaccurate, because many yeast genes appear in
GenBank more than once and so are counted more than once here. GenBank 89
has 6774 yeast start codons, compared to about 4000 genes in the YPD
non-redundant yeast database (http://siva.cshl.org). You could use YPD to
get a non-redundant list of all sequenced yeast genes and their GenBank
accession numbers.
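
One way to apply such a non-redundant list, sketched in Python: keep only the records whose accession number is on the list, and keep each accession once. The record layout (gene name, accession, window) and the function name filter_by_accession are assumptions for illustration; how the accession list is exported from YPD is not covered here.

```python
def filter_by_accession(records, allowed_accessions):
    """Keep the first record for each accession in the allowed set.

    Assumption: `records` is an iterable of (gene, accession, window)
    tuples and `allowed_accessions` comes from a non-redundant gene list.
    """
    allowed = {acc.upper() for acc in allowed_accessions}
    seen = set()
    kept = []
    for gene, accession, window in records:
        key = accession.upper()
        if key not in allowed or key in seen:
            continue  # not on the list, or already kept once
        seen.add(key)
        kept.append((gene, accession, window))
    return kept
```

The windows from the kept records can then be counted position by position, which should reduce the double counting of genes that appear in GenBank more than once.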
Ken Wolfe
University of Dublin
        actual numbers of bases                  proportions
       -------------------------    --------------------------------------
pos.      T      C      A      G           T         C         A         G  total
 -10   1965   1203   2534   1072    0.290080  0.177591  0.374077  0.158252   6774
  -9   1872   1211   2614   1077    0.276351  0.178772  0.385887  0.158990   6774
  -8   1798   1268   2649   1059    0.265427  0.187186  0.391054  0.156333   6774
  -7   1888   1137   2683   1066    0.278713  0.167848  0.396073  0.157366   6774
  -6   1928   1113   2491   1242    0.284618  0.164305  0.367730  0.183348   6774
  -5   1817   1398   2453   1106    0.268231  0.206377  0.362120  0.163271   6774
  -4   1497   1317   2982    978    0.220992  0.194420  0.440213  0.144376   6774
  -3    827    650   4025   1272    0.122084  0.095955  0.594184  0.187777   6774
  -2   1631   1425   2804    914    0.240774  0.210363  0.413936  0.134928   6774
  -1   1416   1201   3093   1064    0.209035  0.177296  0.456599  0.157071   6774
  +1     31      8   6718     17    0.004576  0.001181  0.991733  0.002510   6774
   2   6703     20     20     31    0.989519  0.002952  0.002952  0.004576   6774
   3     19     18     30   6707    0.002805  0.002657  0.004429  0.990109   6774
   4   1850    917   2060   1947    0.273103  0.135371  0.304104  0.287423   6774
   5   1473   2575   1722   1004    0.217449  0.380130  0.254207  0.148214   6774
   6   2583   1292   1781   1117    0.381311  0.190729  0.262917  0.164895   6773
   7   1510   1008   2360   1896    0.222911  0.148804  0.348391  0.279894   6774
   8   1682   1645   2292   1155    0.248302  0.242840  0.338353  0.170505   6774
   9   2170   1341   2053   1210    0.320342  0.197963  0.303071  0.178624   6774
  10   1622   1205   2295   1652    0.239445  0.177886  0.338795  0.243874   6774
  11   1799   1753   2324    898    0.265574  0.258784  0.343076  0.132566   6774
  12   2138   1232   2311   1093    0.315619  0.181872  0.341157  0.161352   6774
--
Ken Wolfe
Department of Genetics
University of Dublin e-mail: khwolfe at tcd.ie
Trinity College phone: +353-1-608-1253
Dublin 2, Ireland FAX: +353-1-679-8558