In 1939, when Shannon had been working on his equations for some time, he happened to visit the mathematician John von Neumann. During their discussions about what Shannon should call the "measure of uncertainty" or attenuation in phone-line signals in his new information theory, according to one source, Shannon recalled:[10]
> My greatest concern was what to call it. I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea: ‘You should call it entropy, for two reasons: In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.’
I enjoyed reading "A Mind at Play" by Soni and Goodman. There are several of these kinds of stories with colleagues. I also like how the book goes into his childhood, including using an electrified fence to communicate with his farm neighbors!
The "Twenty Questions" explanation of Shannon entropy is definitely the most intuitive one I've read so far. I've heard it repeated it many times, but I do see yours is from 2006! Thanks for this.
arXiv submissions are not reviewed in detail (i.e., not reviewed to the same depth as in a journal's peer review procedure), but there is a moderation process: https://arxiv.org/help/moderation
In 1991 the electronic e-print archive, now known as arXiv.org, was
founded at Los Alamos National Laboratory. In the early days of the
World Wide Web it was open to submissions from all scientific
researchers, but gradually a policy of moderation was employed to
block articles that the administrators considered unsuitable. In 2004
this was replaced by a system of endorsements to reduce the workload
and place the responsibility for moderation on the endorsers. The stated
intention was to permit anybody from the scientific community to
continue contributing. However, many of us who had successfully
submitted e-prints before then found that we were no longer able to.
Even those with doctorates in physics and long histories of
publication in scientific journals can no longer contribute to the
arXiv unless they can find an endorser in a suitable research
institution.
The policies of the administrators of Cornell University who now
control the arXiv are so strict that even when someone succeeds in
finding an endorser their e-print may still be rejected or moved to
the "physics" category of the arXiv where it is likely to get less
attention. Those who endorse articles that Cornell finds unsuitable are
under threat of losing their right to endorse or even their own
ability to submit e-prints. Given the harm this might cause to their
careers it is no surprise that endorsers are very conservative when
considering articles from people they do not know. These policies are
defended on the arXiv's endorsement help page.
A few of the cases where people have been blocked from submitting to
the arXiv have been detailed on the Archive Freedom website, but as
time has gone by it has become clear that Cornell has no plans to bow
to pressure and change their policies. Some of us now feel that the
time has come to start an alternative archive which will be open to
the whole scientific community. That is why viXra has been created.
viXra will be open to anybody for both reading and submitting
articles. We will not prevent anybody from submitting and will only
reject articles in extreme cases of abuse, e.g. where the work may be
vulgar, libellous, plagiaristic or dangerously misleading.
It is inevitable that viXra will therefore contain e-prints that many
scientists will consider clearly wrong and unscientific. However, it
will also be a repository for new ideas that the scientific
establishment is not currently willing to consider. Other perfectly
conventional e-prints will be found here simply because the authors
were not able to find a suitable endorser for the arXiv or because
they prefer a more open system. It is our belief that anybody who
considers themselves to have done scientific work should have the
right to place it in an archive in order to communicate the idea to a
wide public. They should also be allowed to stake their claim of
priority in case the idea is recognised as important in the future.
Many scientists argue that if arXiv.org had such an open policy then
it would be filled with unscientific papers that waste people's time.
There are problems with that argument. Firstly, there is already a
large number of submissions that do get into the archive which many
people consider to be rubbish, but they do not agree on which ones
those are. If you removed them all, the arXiv would be left with only safe
papers of very limited interest. Instead of complaining about the
papers they don't like, researchers need to find other ways of
selecting the papers of interest to them. arXiv.org could help by
providing technology to help people filter the article lists they
browse.
It is also often said that the arXiv.org exclusion policies do not
matter because if an independent (or amateur) scientist were to make a
great discovery, it would certainly be noticed and recognised. Here
are three reasons why this argument is wrong and unhelpful. Firstly,
many independent scientists are just trying to do ordinary science.
They do not have to make the next great paradigm shift in science
before their work can be useful. Secondly, the best new ideas do not
follow from conventional research and it may take several years before
their importance can be appreciated. If such a discovery cannot be put
in a permanent archive it will be overlooked to the detriment of both
the author and the scientific community. Thirdly, it is not just
independent or amateur scientists that are having problems getting
access to repositories and the recognition they deserve.
Another argument is that anybody can submit their work to a journal
where it will get an impartial review. The truth is that most journals
are now more concerned with the commercial value of their impact
factor than with the advance of science. Authors without a good
affiliation to a research institution find it very difficult to get
their papers published. Their work is often returned with an unhelpful
note saying that it will not be passed on for review because it does
not meet the criteria of the journal.
The visual design of viXra.org (but not its content) is a parody of
arXiv.org to highlight Cornell University's unacceptable censorship
policy. viXra is also an experiment to see what kind of scientific
work is being excluded by the arXiv. But most of all it is a serious
and permanent e-print archive for scientific work. Unlike arXiv.org it
is truly open to scientists from all walks of life. You can support
this project by submitting your articles.
____
I have several major problems with this explanation (problems that are endemic in the discussion more generally):
1. Shannon's original paper relates "the capacity [of a channel] to transmit information" as well as the potential of a system (e.g. the English language) to generate information. In other words, your pipe needs to be able to accommodate the potential volume coming from the source ("the entropy of the source determines the channel capacity"). The amount of information in a message or the amount of surprise, as the author has it, should not be confused with the amount of potential information. (A small numeric sketch of this distinction follows this list.)
2. Instead of "system" and "channel" the author uses the word "variable" which I find misleading. Channel (like a telegraph cable) and system (the English language) are specifically relevant to Shannon's discussion.
3. The discussion of "surprise," as the author has it, is misleading. Shannon is writing his paper in conversation with Hartley and Nyquist---all three specifically attempting to bracket out the subjective psychological factors such as surprise, in order to describe the capacity for information transmission in terms of quantitative measures, based on "physical considerations alone" (Hartley). Surprise reintroduces a subjective, relative, psychological understanding of information the original authors wanted to avoid.
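To make the distinction in point 1 concrete, here is a minimal sketch (my own illustration with a made-up source, not anything from the article under discussion): the self-information -log2 p(x) measures the "surprise" of one particular message, while the entropy is the expected surprise over the whole source, and it is the latter that sets the capacity the channel must provide.

```python
import math

def self_information(p_x):
    """Surprise of one particular outcome with probability p_x, in bits."""
    return -math.log2(p_x)

def entropy(dist):
    """Expected self-information over the whole source distribution, in bits."""
    return sum(p * self_information(p) for p in dist.values() if p > 0)

# Hypothetical 4-symbol source; the probabilities are made up for illustration.
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

print(self_information(source["d"]))  # 3.0 bits: the "surprise" of one rare message
print(entropy(source))                # 1.75 bits: what a channel for this source must budget for
```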
With all due respect, you seem to be saying that this 'introductory text' is insufficiently sophisticated, and then proceed to dump a high-density rebuttal in a stilted, academic style.
You may be extra smart in the sense of understanding the topic in a deep way, but it sure does seem foolish and/or myopic to posture this way over a basic introduction that cites the original paper in the first sentence.
(1) A "Random Variable" is a term of art in probability theory and has an entropy.
(2) Your distinction between "capacity" and "potential" is pointless: you have to budget for the expected information you have to transmit.
(3) Your silly games with words would apply to any technical discussion as they all use metaphors. You sound like someone who has never done any technical work at all.
1. You are correct, there are two different perspectives: (a) the channel designer's perspective (what most of Shannon's work is about), which deals with what you called 'potential information', and (b) the message sender/receiver's perspective.
2. True
3. Very true. Using psychological terms like 'surprise' in strict theories is very often misleading but tempting, because any reader can always contribute something of their own (such as an interpretation) to such theories and so becomes more personally and emotionally bound to them.
Compression is a good entry point to Shannon entropy. What was also eye-opening for me was that the metric was motivated by fulfilling a specification that matches an intuitive notion of information [1], much like how the Kolmogorov axioms were formulated to capture the intuitive notion of probability, or how the Church-Turing thesis characterizes computation.
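For reference, here is that specification as I remember it from Shannon's 1948 paper (Appendix 2); the wording below is my paraphrase, so treat it as a sketch rather than a quotation:

```latex
% 1. H(p_1, ..., p_n) is continuous in the p_i.
% 2. For n equally likely outcomes, A(n) = H(1/n, ..., 1/n) increases monotonically with n.
% 3. If a choice is broken into successive choices, H is the weighted sum of the
%    individual values of H for those choices.
% The only H satisfying these three conditions has the form
H(p_1, \dots, p_n) = -K \sum_{i=1}^{n} p_i \log p_i , \qquad K > 0 .
```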
If someone is interested in this, I highly recommend J R Pierce's Symbols, Signals, and Noise. The book puts Shannon's work in perspective and gives extremely useful context so that one can better appreciate its value. One can also clearly understand, for example, why the original title of Shannon's seminal work was "The Mathematical Theory of Communication" instead of Information.
To store the result of a coin toss requires 1 bit of information: I can either give you a 0 or a 1. But implicit in me communicating that to you is that it is 1 'out of' something, namely 1 'out of' 2.
To store a trit, i.e. a 0, 1 or 2, requires 2 bits. You will receive either a 00, 01, or 10. You will never receive a 11, because that is not a valid trit.
Huffman compression reduces the number of bits needed to send a block of data by terminating the transmission early once a specific sequence has been seen. So the Huffman code for a trit value of 0 would not be '00', it would be '0'. We pass fewer bits, because some sequences carry an implication that this is the end of the sequence. The receiver has to know which sequences indicate a termination.
A trit translates to 2 bits, with a wastage of about 0.4 bits (2 - log2(3) ≈ 0.415). The concept of 'compression' only makes sense when person A throws a sequence of bits at you, and when they stop, you have to interpret what they just said in order to work out what it meant. In truth, the extra bits may not have been sent, but they are implied.
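A rough numeric check on the figures above (my own sketch, not the parent's): a uniformly random trit carries log2(3) ≈ 1.585 bits, so a fixed 2-bit encoding wastes about 0.415 bits per trit, while the self-terminating prefix code {0, 10, 11} (the kind a Huffman construction gives for three equally likely values) averages 5/3 ≈ 1.667 bits per trit.

```python
import math

# Entropy of a uniformly random trit, in bits.
trit_entropy = math.log2(3)                    # ~1.585 bits

# A prefix (self-terminating) code for the three trit values, of the kind a
# Huffman construction gives when the three values are equally likely.
code = {0: "0", 1: "10", 2: "11"}
avg_code_length = sum(len(cw) for cw in code.values()) / 3   # 5/3 ~ 1.667 bits

print(f"entropy of a trit    : {trit_entropy:.3f} bits")
print(f"avg prefix-code len  : {avg_code_length:.3f} bits")
print(f"waste of fixed 2 bits: {2 - trit_entropy:.3f} bits")
```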
When I write '1' on a piece of paper and slide it across the table to you, I am cheating, I am being ambiguous. '1 out of what?' should be the correct response. A bit is more accurately communicated as 1 out of 2. A 'double' would be 1 out of a million ish. You can only store numbers in a binary computer out of something. A 1 stored in a bit field is a different quantity than a 1 out of a million ish. In lazy human parlance a '1' represents 1 out of infinity, but this is not technologically possible to store. No computer hard drive can store enough digits to contain the information that 1 out of infinity represents.
If I toss a coin and tell you it is a 1 out of 2, this is valid. But if I have cheated and tossed a double-headed coin, then from my perspective the '1' aka 'heads' is 100% predictable, and therefore, for me, does not contain any information. For you, ignorant of my cheating coin, the data I communicate to you is 1 out of 2. What is the entropy of that information? Is it 1 bit, as you would believe, or 0 bits, as I would believe? The answer is that MEASUREMENTS ARE NOT PROPERTIES OF OBJECTS, BUT OF RELATIONSHIPS BETWEEN A MEASURING SYSTEM AND AN OBJECT. The entropy of that coin toss does not live inside the data communicated; it lives inside my head (where the entropy is 0 bits) and also in your head (where the answer is 1 bit). The coin toss measurement of '1' does not contain either 0 or 1 bits, because a piece of data in isolation does not have any meaning. Only in my head or yours can its meaning be measured.
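That observer-dependence falls straight out of the formula, because the entropy is computed from whatever probability distribution a given observer assigns. A minimal sketch using just the two beliefs described above:

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

your_belief = [0.5, 0.5]   # you believe the coin is fair
my_belief   = [1.0, 0.0]   # I know it is double-headed

print(entropy_bits(your_belief))  # 1.0 bit of uncertainty for you
print(entropy_bits(my_belief))    # 0.0 bits: the toss tells me nothing new
```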
Is there a distinction between the entropy of a message and the impact of a message on the entropy of the system? Where inputs tend to increase system entropy?
They are related by Boltzmann's entropy formula. The intuitive interpretation is this: entropy is the amount of information required to determine a particular microstate from the associated macrostate. (This is how you resolve Maxwell's demon.)
(Note that I don't really know what I'm talking about, I'm just parroting what I remember from my information theory course)
I thought it was related to the number of microstates that are equivalent to a given macrostate. So, as the number of particles increases, the temperature increases, or the volume increases, there are more possible microstates. That's why very cold things or very ordered things (crystals) have low entropy: you can't readily swap out microstates.
It's also why a pile of Legos is high entropy; you could easily swap Legos around and the pile would be indistinguishable. Whereas you can't do that with a built Lego set.
It's correct, but that point of view is not in contradiction with what I said.
You may imagine a macrostate as a state representing incomplete information. In order to uniquely determine the microstate, you need additional information proportional to the logarithm of the number of possible microstates.
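For what it's worth, the standard form of that relation (from memory, so a sketch rather than a citation): Boltzmann's formula counts the W microstates compatible with a macrostate, and the missing information needed to single out the actual microstate, assuming they are equally likely, is the same logarithm expressed in bits.

```latex
% Boltzmann entropy of a macrostate with W compatible, equally likely microstates:
S = k_B \ln W
% Shannon entropy of the uniform distribution over those W microstates,
% i.e. the information needed to pin down which one is realized:
H = \log_2 W \ \text{bits} = \frac{S}{k_B \ln 2}
```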
Yes, that's why I don't frame it as disorder but rather explain order in terms of the microstate-macrostate relation. While a built Lego set and a crystal are well ordered and have low entropy, order and entropy aren't the same thing.
The thing about Shannon entropy is that it depends upon alphabets and symbol systems. I want to understand how it might be used to describe presymbolic computational systems.
What is a presymbolic computational system? Shannon entropy deals with alphabets in much the same way that Turing machines read symbols on a tape. Fundamentally it is about differentiating between states. I don't see how you can get a meaningful definition of information without starting from the notion of discrete states. Even fuzzy logic needs a semantics of states.
Presymbolic computation appears to me to be an invented term. Any theoretical or actual system can be framed in computational terms when analysed, but the properties provided through the use of symbols will still exist in a system, whether or not that analysis has been performed. The paper you cite appears to me to lean in the direction of cybernetics and control theory, that would naturally be able to translate into terms aligned to information theory. The same rules will apply to any physical system, no matter how complicated you believe it to be.
In any case, there seems to be a difference between describing a system with symbols and a computational system that uses symbols for information processing. Some information processing seems possible in systems that don't use symbols.
Wherever and whenever information exists, so do symbols, in the abstract sense. It's not symbols as representations of meaning but symbols as mediators of meaning.
That’s a strong claim that is not common. Humans are generally considered to be the only animals capable of symbolic thought [1]. Information flows in far simpler systems than symbolic systems.
No, the interesting thing about Shannon entropy is that it is totally independent of how information is represented (as symbols, alphabets, numbers, structured objects, whatever).
Shannon entropy works for anything you can assign an alphabet of symbols to. It doesn't need the thing itself to be symbolic, just the description of the thing.
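A small sketch of that point (my own example): the entropy depends only on the probabilities assigned to the outcomes, so relabeling those outcomes as letters, numbers, or arbitrary objects leaves it unchanged.

```python
import math

def entropy_bits(dist):
    """Shannon entropy of an {outcome: probability} mapping, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# The same distribution over three different "alphabets".
as_letters = {"a": 0.5, "b": 0.25, "c": 0.25}
as_numbers = {0: 0.5, 1: 0.25, 2: 0.25}
as_objects = {("red", "cube"): 0.5, ("blue", "cube"): 0.25, ("red", "sphere"): 0.25}

for dist in (as_letters, as_numbers, as_objects):
    print(entropy_bits(dist))   # 1.5 bits every time
```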
Maybe I'm not reading it properly, but in Part 5, where he deals with the transmission of continuous signals, he doesn't use his entropy formulation. He uses a distance metric, showing the difference between the sent signal and the received one.
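If I recall correctly, entropy does still appear for continuous signals, in its differential form, and the "distance" enters as a fidelity (distortion) criterion; in modern notation (my paraphrase, not a quotation from the paper) the rate needed to reproduce a continuous source within average distortion D is:

```latex
% Differential entropy of a continuous source with density p(x):
h(X) = -\int p(x) \log p(x) \, dx
% Rate required to reproduce the source within average distortion D,
% for a distortion ("distance") measure d(x, \hat{x}):
R(D) = \min_{p(\hat{x} \mid x) \,:\, \mathbb{E}[d(X, \hat{X})] \le D} I(X; \hat{X})
```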
[10]: M. Tribus and E. C. McIrvine, "Energy and information", Scientific American 224 (1971). See also https://en.wikipedia.org/wiki/History_of_entropy