Talk:Entropy of a probability distribution

Revision as of 21:34, 4 July 2007


Article Checklist for "Entropy of a probability distribution"
Workgroup category or categories: Mathematics Workgroup, Computers Workgroup [Editors asked to check categories]
Article status: Stub: no more than a few sentences
Underlinked article? Yes
Basic cleanup done? Yes
Checklist last edited by: Greg Woodhouse 11:20, 27 June 2007 (CDT), Ragnar Schroder 10:58, 27 June 2007 (CDT)

To learn how to fill out this checklist, please see CZ:The Article Checklist.

Fixed by me! --Robert W King 11:04, 27 June 2007 (CDT)

question

H is the scientific symbol for enthalpy, S the symbol for entropy. I think this H should in reality be an S, as that is the way it is in my books for probability distributions. Unless it changed within the last year, I think it still remains S. I can even understand using A, F or G, but not H; according to the universal definition of entropy it is unitless. Robert Tito |  Talk  19:58, 4 July 2007 (CDT)

Interesting. I hadn't thought of that. Traditionally, H is used for the entropy function in coding theory. I'm not sure about the history behind that, but now that I think back to my physics classes, you are right about S being entropy. Greg Woodhouse 20:38, 4 July 2007 (CDT)
To put it further: A is the free energy of a system, F the free enthalpy, and G the Gibbs energy (but that is solely used in thermodynamics). What struck me as odd is the unit: the bit is no unit I know of, only a quantity of information, and for that reason energy/entropy. And as far as I teach statistics I use S for entropy, but then I use it in statistical chemistry/physics - and I prefer consistency in units and symbols. Robert Tito |  Talk  21:00, 4 July 2007 (CDT)
Well, entropy is used in coding theory as a measure of information content. A channel is defined as a potentially infinite sequence of symbols from a fixed alphabet. We assume there is a probability distribution telling us how likely each symbol is to occur. The entropy is then (unsurprisingly):

H = - \sum_{i=1}^n p_i \log p_i

where the logarithm is taken to a base of 2. (If the logarithms are taken to base e, the units of entropy are sometimes called "nats".) Intuitively, this is just a measure of how much the bit stream can be compressed due to redundancy. The logarithm to base 2 of 1/p_i is the number of bits that are needed to encode a symbol that occurs with probability p_i. People care about this, among other things, because it gives you a framework for calculating the data rate you can hope to get from a channel with a given bandwidth, assuming you use the best possible encoding algorithm (this is the famous Shannon limit). I completely sympathize with your dislike of the inconsistency here. Greg Woodhouse 21:34, 4 July 2007 (CDT)
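
For concreteness, the quantity described above can be computed directly. Below is a minimal sketch in Python (added for illustration; the three-symbol distribution is made up, not taken from the discussion) that evaluates the entropy of a discrete distribution in bits and in nats.

import math

def entropy(probabilities, base=2.0):
    """Shannon entropy H = -sum(p_i * log(p_i)); base 2 gives bits, base e gives nats."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# An arbitrary example: a three-symbol alphabet where one symbol occurs 90% of the time.
dist = [0.9, 0.05, 0.05]
print(entropy(dist))          # about 0.569 bits per symbol, well below log2(3) = 1.585
print(entropy(dist, math.e))  # the same quantity expressed in nats (about 0.394)

The gap between roughly 0.569 bits and log2(3) bits for this skewed distribution is the redundancy that a good encoding can remove, which is the compression interpretation mentioned in the comment above.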