DNA is a language with many hidden redundancies. It's a four-letter language, with symbols A, G, C, and T (adenine, guanine, cytosine, and thymine), which means any given symbol should be able to convey two bits' worth of information, since log2(4) is two. But it turns out that different organisms speak different "dialects" of this language. Some organisms use G and C twice as often as A and T, which (if you do the math) means each symbol actually carries at most about 1.918 bits (not 2 bits) of information.
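The arithmetic behind these figures is Shannon entropy: the sum of -F * log2(F) over the symbol frequencies F. Below is a minimal Python sketch. `base_entropy` is an illustrative name, and the weights simply encode "G and C twice as often as A and T," which works out to frequencies of 1/6, 1/3, 1/3, and 1/6. The sketch treats each base as independent of its neighbors, so for real sequences it gives an upper bound; correlations between neighboring bases can only lower the entropy further.

```python
import math

def base_entropy(weights):
    """Shannon entropy in bits per symbol, -sum(F * log2(F)),
    where each weight is first normalized to a frequency F."""
    total = sum(weights.values())
    h = 0.0
    for w in weights.values():
        if w > 0:  # unused symbols contribute nothing
            f = w / total
            h -= f * math.log2(f)
    return h

# Equal usage of all four bases: log2(4) = 2 bits.
uniform = {"A": 1, "C": 1, "G": 1, "T": 1}
# G and C used twice as often as A and T: 1/6, 1/3, 1/3, 1/6.
gc_rich = {"A": 1, "C": 2, "G": 2, "T": 1}

print(f"{base_entropy(uniform):.3f}")  # 2.000
print(f"{base_entropy(gc_rich):.3f}")  # 1.918
```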
Consider how an alien visitor to Earth might use information theory to figure out terrestrial molecular biology.
The first thing an alien visitor might notice is that there are four "symbols" in DNA (A, G, C, T).
By analyzing the frequencies of various naturally occurring combinations of these letters, the alien would quickly determine that the natural "word length" of DNA is three.
There are 64 possible 3-letter words that can be spelled with a 4-letter alphabet. So in theory, a 3-letter "word" in DNA should convey 6 bits' worth of information (since 2 to the 6th power is 64). But an alien looking at many samples of earthly DNA, from many creatures, could compute the sum of -F * log2(F) over every 3-letter "word" used in a given creature's DNA (where F is simply the frequency with which that 3-letter combination is used). From this sort of analysis, the alien would find that even though all 64 codons (3-letter words) do appear in earthly DNA, the entropy per codon is in some cases as little as 4.524 bits. (Or at least, it approaches that value asymptotically.)
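The alien's calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the sequence is read in a single frame starting at its first base and that any trailing partial codon is dropped; `codon_entropy` is a hypothetical helper, not a library function. A real tally would be taken over coding sequences read in their correct frame.

```python
import math
from collections import Counter
from itertools import product

def codon_entropy(seq):
    """Entropy in bits per codon: split seq into non-overlapping
    3-letter words (frame starting at index 0), then sum
    -F * log2(F) over the observed codon frequencies F."""
    seq = seq.upper()
    # Non-overlapping triplets; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    n = len(codons)
    if n == 0:
        return 0.0
    h = 0.0
    for count in Counter(codons).values():
        f = count / n
        h -= f * math.log2(f)
    return h

# A sequence using each of the 64 codons exactly once hits the 6-bit ceiling.
all_codons = "".join("".join(c) for c in product("ACGT", repeat=3))
print(codon_entropy(all_codons))   # 6.0
print(codon_entropy("ATGATGATG"))  # 0.0 (one codon, no uncertainty)
```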
Since 2 to the 4.524 power is about 23, and since proteins (the predominant macromolecules in earthly biology) are made of amino acids, a canny alien would surmise that there must be around 23 different amino acids, and that earthly DNA is a language for mapping 3-letter words to those 23 amino acids.
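The step from 4.524 bits to "around 23" inverts the entropy formula: a distribution carrying H bits per symbol holds as much information as 2^H equally likely symbols (a quantity often called perplexity). A one-line sketch; the name `effective_symbols` is illustrative.

```python
def effective_symbols(bits):
    """Number of equally likely symbols that would carry `bits`
    of entropy per symbol: 2 ** bits."""
    return 2 ** bits

for h in (2.0, 4.524, 5.679, 6.0):
    print(f"{h:.3f} bits -> about {effective_symbols(h):.1f} symbols")
```

Run on the figures in this post, it gives about 23 effective symbols for 4.524 bits and about 51 for E. coli's 5.679 bits, against the full 64 for 6 bits.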
As it turns out, the genetic code does use 3-letter "words" (codons) to specify amino acids, but there are 20 amino acids (not 23), with 3 "stop codons" reserved for telling the cell's protein-making machinery "this is the end of this protein; stop here."
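To see where the 20 amino acids and 3 stop codons come from, the sketch below builds the standard genetic code from the compact one-letter string used in NCBI's translation table 1 (codons enumerated in TCAG order, with "*" marking a stop) and counts what the 64 codons mean. The variable names are illustrative; the table is the standard code.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one amino-acid letter per codon, with codons
# enumerated TTT, TTC, TTA, TTG, TCT, ... ; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each 3-letter codon to its one-letter meaning.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

meanings = Counter(CODE.values())
amino_acids = [aa for aa in meanings if aa != "*"]

print(len(CODE))         # 64 codons
print(len(amino_acids))  # 20 amino acids
print(meanings["*"])     # 3 stop codons
print(meanings["L"])     # 6 codons for leucine
```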
[Figure: E. coli codon usage.]
While DNA's 6-bit codon bandwidth permits 64 different codons, and while organisms generally do use all 64, their uneven usage means that fewer than 6 bits of information are carried per codon. To get the actual codon entropy, take each codon's usage frequency F, calculate -F * log2(F), and sum over all codons. If you do that for E. coli, you get 5.679 bits per codon. So E. coli does in fact use almost all of the 6 bits of bandwidth available in its codons. This turns out not to be true for all organisms, however.
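Codon usage is usually published as a table of counts or per-thousand values rather than as raw sequence. The sketch below normalizes such a table, sums -F * log2(F), and reports the fraction of the 6-bit maximum in use. The function names and both tables are hypothetical stand-ins; the real E. coli usage table is not reproduced here.

```python
import math
from itertools import product

def usage_entropy(usage):
    """Codon entropy in bits from a table mapping codon -> usage value.
    Raw counts or per-thousand values both work, because they are
    normalized to frequencies F before summing -F * log2(F)."""
    total = sum(usage.values())
    h = 0.0
    for value in usage.values():
        if value > 0:  # unused codons contribute nothing
            f = value / total
            h -= f * math.log2(f)
    return h

def bandwidth_used(usage):
    """Fraction of the 6-bit maximum (log2 of 64 codons) actually used."""
    return usage_entropy(usage) / 6.0

codons = ["".join(c) for c in product("ACGT", repeat=3)]
# Hypothetical table 1: every codon used equally often.
flat = {c: 10 for c in codons}
# Hypothetical table 2: only 32 of the 64 codons used, equally often.
half = {c: (10 if i < 32 else 0) for i, c in enumerate(codons)}

print(usage_entropy(flat), bandwidth_used(flat))                  # 6.0 1.0
print(f"{usage_entropy(half):.3f} {bandwidth_used(half):.3f}")    # 5.000 0.833
```

By the same measure, E. coli's 5.679 bits corresponds to roughly 95% of the available bandwidth.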
Just commenting from slightly modified ignorance here, but: when the codons are spelled out in letters it is easy to start thinking of them as a code in the sense of a human code: an arrangement of symbols that are basically featureless except for their sole property of Letter ("A" or "G" etc), and to start analyzing them on that basis. But the codons aren't simple letter-symbols. They are trios of molecules that "code" for a protein only in the sense that their shape matches the shape (of a transcript codon that matches the shape) of a protein. Three-D shapes (modified by electron density) are not simple properties. That's why you get, what is it, six different codons that mean "Leucine"? The shapes of those six codons are probably slightly different, and might not some of them be marginally more effective at grabbing Leucine and attaching it? Just saying, there is more at play here than Shannon's formulas.