As an example, let's take Clostridium botulinum (the source of "Botox" serum), a soil bacterium with unusually low G+C content, at 28%. If we go through the organism's 3,404 protein-coding genes, we find actual base contents of:
These numbers are for a single strand (the coding strand or "message" strand) of DNA, which is why A and T aren't equal. For whole DNA, of course, A =T and G = C, but that's not the case here. We're just interested in the message strand.
If we put the above base frequencies into the Shannon equation, we come up with a value of 1.8467 for the information content (in bits) of one base of C. botulinum DNA. The DNA is about 7.67% redundant on a zero-order entropy basis. The DNA may be over 70% A and T, but it's a long way from being a two-base (one bit) information stream. Each base encodes an average of 1.8467 bits of information, which is a surprising amount (surprisingly close to 2.0) for such a skewed alphabet.