I also think that a lot of the waste can be done away with by using application-specific codecs. Yes, even gzip compresses logs and metrics by a lot, but specialized codecs can go further and home in on the redundancy much more quickly than a generic lossless compressor eventually would.
However, to build these one can't have a "throw it over the 3rd party wall" mode of development.
One way to do this for stable services would be to build high-fidelity (mathematical/statistical) models for the logs and metrics, then serialize only what is non-redundant, i.e. what the model fails to predict. This applies particularly well to numeric data, where gzip does not do as well. What we need is the analogue of JPEG for the log type.
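To make the "model, then serialize what is non-redundant" idea concrete, here is a minimal sketch in Python. Everything in it is my own illustration, not an existing codec: the function names (`encode`, `decode`, `packed_size`), the synthetic `metric` series, and the choice of model (predict each sample by linear extrapolation from the previous two) are all assumptions. It is also lossless, so it is only the first half of the JPEG analogy; a lossy version would additionally quantize the residuals.

```python
import struct
import zlib


def encode(samples):
    """Keep only what the model fails to predict.

    Model (an assumption for this sketch): the next sample continues the
    straight line through the previous two, i.e. 2 * prev - prev2.
    """
    residuals = []
    prev, prev2 = 0, 0
    for x in samples:
        predicted = 2 * prev - prev2
        residuals.append(x - predicted)
        prev2, prev = prev, x
    return residuals


def decode(residuals):
    """Invert encode() by running the same model and adding the residuals back."""
    samples = []
    prev, prev2 = 0, 0
    for r in residuals:
        x = r + 2 * prev - prev2
        samples.append(x)
        prev2, prev = prev, x
    return samples


def packed_size(values):
    """Bytes needed after packing as 64-bit ints and applying zlib (DEFLATE, as in gzip)."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw, 9))


# Hypothetical metric: a steadily growing counter with a small periodic wobble.
metric = [1_000_000 + 37 * i + (i % 5) for i in range(10_000)]

residuals = encode(metric)
assert decode(residuals) == metric  # the transform loses nothing

print("zlib on raw samples:", packed_size(metric), "bytes")
print("zlib on residuals:  ", packed_size(residuals), "bytes")
```

The generic compressor sees the raw samples as a stream of ever-changing integers, while the residuals are mostly zeros plus a short repeating pattern, so the same zlib call has far less to encode. A real service would need a model fitted to its own metrics; the linear predictor here is just the simplest thing that shows the effect.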
At my workplace there has been political buy-in for the idea that if a log / metric stream has not been used in 2-3 years, we should throw it away and stop collecting it. This rubs me the wrong way because so many times I have wished there was some historic data for my data-science project. You never know what data you might need in the future. You do, however, know that you do not need redundant data.
Quite an interesting phenomenon though, how affiliations color some unarguable facts. Many clearly believe that ICE agents are doing the right thing; they got what they voted for.
In India we have been going through this for the last 14 years or so.
Look up Stan Swamy [0], an octogenarian jailed on the basis of trumped-up charges and planted evidence (most likely with the help of Israeli companies). Journalists have been held in jail for five years without any charges pressed. Same fate for those who criticize the government too vocally.
Now pretty much all of the press is but a government press release, with a few holding out here and there.
There's a massive reduction in the whale song of the blue whales. Almost halved. They are presumably starving.
That something ginormous can be so elegant, beautiful and sleek is hard to conceive till one meets a blue whale. Let's let them thrive on the blue planet.
The blue whale population has actually increased since the 70s. When they were critically endangered, their population numbered roughly 1,000-2,000, but estimates for today put the number at roughly tenfold that. The 1966 worldwide moratorium on whaling has been incredibly successful and we’ve also seen recoveries in humpback and grey whales.