Looking at it quickly, it seems like you're just moving entropy into the frequency table.
If the frequency table isn't included in the count, I could just make an "infinite" compressor by storing the frequency table and cutting one byte off the end. From the frequency table I could then deduce what the last byte should be.
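To make that concrete, here's a minimal Python sketch of the argument (the helper and sample data are mine, purely for illustration): if the decoder gets the full symbol counts for free, the dropped final byte is always the one count not yet consumed.

    # Sketch: if the frequency table is "free", a dropped final byte can
    # always be recovered, so any file "shrinks" by one byte, forever.
    from collections import Counter

    def recover_last_byte(truncated, full_counts):
        remaining = full_counts - Counter(truncated)  # counts not yet seen
        [(missing, _)] = remaining.items()            # exactly one byte left
        return missing

    data = b"hello world"
    counts = Counter(data)                            # the uncounted side table
    assert recover_last_byte(data[:-1], counts) == data[-1]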
> typical entropy encoders (Huffman, ANS, etc) would require 1 bit per symbol,
No. ANS (and range/arithmetic coding) can code each symbol at a fractional number of bits, matching its probability.
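For a concrete sense of "fractional bits": the ideal cost of a symbol with probability p is -log2(p) bits, which Huffman has to round up to whole code lengths but arithmetic/range/ANS coders approach. A quick sketch:

    # Ideal per-symbol cost is -log2(p) bits; only powers of 1/2 come
    # out as whole bits.
    from math import log2

    for p in (0.5, 0.9, 0.99):
        print(f"p={p}: {-log2(p):.4f} bits/symbol")
    # p=0.5  -> 1.0000 bits (Huffman matches this exactly)
    # p=0.9  -> 0.1520 bits (needs arithmetic/range/ANS coding)
    # p=0.99 -> 0.0145 bits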
I was using what seemed to me to be the standard with other encoders; storing the ratios is roughly the same cost as in other compressors, plus a few extra bits for exact counts. But I do see how this isn't i.i.d., so I'm fixing that.
I don't think you understand the basic concepts, and you are wasting your time trying to "fix" it. Let me try to explain a bit differently: the Shannon limit depends on assuming a model of the source (the thing producing symbols).

For example, imagine the output of something you think is "true" noise, like a fair coin flip that produces 0 (for heads) and 1 (for tails). That has an entropy of 1 bit per symbol and you cannot compress it. Now you learn that it is actually a pseudorandom number generator that produces 0 or 1 in a 1:1 ratio. If you know the program (and seed, if required), the source now has 0 entropy: you can run the program to know every bit that will be produced. Same symbol outputs, different models of how they are produced, different theoretical compression limits.

As another example, if you are compressing text, one model would be to assume that words are produced independently of one another. We know that isn't true (certain words are more likely to follow other words, which is how the next-word prediction on your phone works), but you might still choose to implement a compression system based on that naive model because it's less complex, uses less memory, takes less compression/decompression time, etc. If you implemented compression using a less naive model (e.g. "what is the probability of the next symbol in all human text, given all the symbols I have seen before this one?") you could get better compression rates, but you wouldn't be breaking the Shannon limit that you calculated using the naive model, because the "true" limit is based on the "true" model of the source. The entropy of human-produced text is not really calculable (people produce new sentences every day), but you might approximate it using something like all the human text you can find.
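A small Python sketch of that coin-flip/PRNG example (the seed and stream length are arbitrary choices of mine): measured under the naive order-0 model, the stream is ~1 bit per symbol, yet a decoder that knows the generator and seed can regenerate every bit.

    # Same bits, two models: ~1 bit/symbol under "fair coin",
    # essentially 0 under "known PRNG + known seed".
    import random
    from collections import Counter
    from math import log2

    def order0_entropy(symbols):
        n = len(symbols)
        return -sum(c / n * log2(c / n) for c in Counter(symbols).values())

    rng = random.Random(42)                  # known program, known seed
    bits = [rng.randint(0, 1) for _ in range(10_000)]

    print(order0_entropy(bits))              # ~1.0 bit/symbol (naive model)
    # Knowing random.Random(42), a decoder regenerates the whole stream,
    # so it costs ~nothing beyond describing the program and seed.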
As I alluded to above, note that Shannon entropy doesn't take into consideration practical complexities (memory, program size, processing time) that are real considerations when implementing a compression scheme. Most of the time you are going to trade a higher entropy bound for simpler, faster, less resource-intensive compression.
To sum up: You cannot beat the Shannon limit of the "true" model of the source but you can beat the Shannon limit of a naive approximation of the source. Those naive models are used because they are more practical to implement.
I appreciate the input. I did not mean to imply I could encode a random stream of symbols/characters; that point is absolutely valid. I was approaching this as a compression technique for something like a text file, where the symbol counts are known at compression time.
The first issue is that comparisons have to be apples-to-apples: you can't assume that method A has that information available for free but method B does not, or that method A assumes one source model while method B assumes a different one.
Second, I really don't understand how you intend to use a table of symbol counts. If you build it over the entire file, the table might be a reasonable size, but the number of permutations becomes infeasible. Conversely, if you work in small windows (like the 8-symbol ones in your examples), you have to store a separate symbol count table for each window, which makes the tables explode. I really doubt you are gaining anything from doing this: you are going to create an enormous per-file symbol frequency table and then not count it against the compressed size. That isn't compression, it's just misdirection.
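If I'm reading the scheme right (store the counts, then an index saying which arrangement of those symbols the data is; that's my interpretation, not your wording), a counting argument shows the ceiling: the index alone needs log2(n! / prod(c_i!)) bits, which approaches n*H as n grows, and transmitting the counts eats the remainder. A rough sketch with a toy string of my choosing:

    # Ranking a string among all arrangements with the same counts costs
    # log2(n! / prod(c_i!)) bits -- already close to n*H, before you pay
    # anything for the count table itself.
    from collections import Counter
    from math import lgamma, log, log2

    def permutation_index_bits(counts):
        n = sum(counts)
        nats = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
        return nats / log(2)                 # nats -> bits

    def order0_bits(counts):                 # n * H of the empirical dist.
        n = sum(counts)
        return -sum(c * log2(c / n) for c in counts)

    c = list(Counter(b"the quick brown fox jumps over the lazy dog").values())
    print(permutation_index_bits(c), order0_bits(c))   # ~149 vs ~189 bits
    # The per-symbol gap shrinks as n grows, and storing the counts
    # typically costs at least that gap -- there's no net win.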
When I get time I may compress the table as well, then; looking at example arithmetic encoders, they seem to have some flat table implementations too (https://github.com/nayuki/Reference-arithmetic-coding/blob/m...). I don't see how that's misdirection; I left the table uncompressed specifically for transparency reasons. I was just trying to keep that part simple.