*f(x)*log(

*f(x)*), where

*f(x)*is the frequency of occurrence of the symbol

*x*. For example, a series of a coin tosses can be considered a binary information stream with symbol values H and T (heads and tails). If the frequency of H is 0.5 and

*f(T)*is 0.5, the entropy E, in bits per toss, is -0.5 times log (base 2) 0.5 for heads, and a similar value for tails. The values add up (in this case) to 1.0. The intuitive meaning of 1.0 (the Shannon entropy) is that a single coin toss conveys 1.0 bit of information. Contrast this with the situation that prevails when using a "weighted" or unfair penny that lands heads-up 70% of the time. We know intuitively that tossing such a coin will produce less information because we can predict the outcome (heads), to a degree. Something that's predictable is uninformative. Shannon's equation gives -0.7 times log(0.7) = 0.3602 for heads and -0.3 * log (0.3) = 0.5211 for tails, for an entropy of 0.8813 bits per toss. In this case we can say that a toss is 11.87% redundant.

Claude Shannon |

^{2}possible values). But what about organisms in which the bases occur with unequal frequencies? For example, we know that many organisms have DNA with G+C content quite a bit less than (or in some cases more than) 50%. The information content of the DNA will be less than 2 bits per base in such cases.

As an example, let's take

*Clostridium botulinum*(the source of "Botox" serum), a soil bacterium with unusually low G+C content, at 28%. If we go through the organism's 3,404 protein-coding genes, we find actual base contents of:

A 0.40189

T 0.30603

G 0.18255

C 0.10840

These numbers are for a single strand (the coding strand or "message" strand) of DNA, which is why A and T aren't equal. For whole DNA, of course, A =T and G = C, but that's not the case here. We're just interested in the message strand.

If we put the above base frequencies into the Shannon equation, we come up with a value of 1.8467 for the information content (in bits) of one base of

*C. botulinum*DNA. The DNA is about 7.67% redundant on a zero-order entropy basis. The DNA may be over 70% A and T, but it's a long way from being a two-base (one bit) information stream. Each base encodes an average of 1.8467 bits of information, which is a surprising amount (surprisingly close to 2.0) for such a skewed alphabet.

ReplyDeleteشركة نقل عفش | شركة نقل اثاث بجدة | شركة نقل عفش بالرياض | شركة نقل عفش بالمدينة المنورة | شركة نقل عفش بالدمام

شركة نقل عفش بالدمام

شركة نقل عفش بجدة

شركة نقل العفش بالمدينة المنورة

نقل العفش بالرياض

نقل عفش بالدمام

شركات نقل اثاث بالدمام

ReplyDeleteشركات تنظيف خزانات بجدة

نقل عفش بالخبر

شركة نقل عفش بخميس مشيط

شركة نقل عفش بالاحساء

شركة نقل عفش بجدة

نقل عفش بالمدينة المنورة

نقل عفش بالطائف

ReplyDeleteشركات نقل عفش ونظافة ومكافحة حشرات

شركات نقل عفش ونظافة ومكافحة حشرات

شركات تنظيف بالطائف

شركة تنظيف بالطائف

شركة تنظيف خزانات بجدة

شركات تنظيف بالطائف

نقل عفش بالرياض

شركات نقل العفش بالرياض