texthompson's comments | Hacker News

Why would you PUT an object, then download it again to a central server in the first place? If a service is accepting an upload of the bytes, it is already doing a pass over all the bytes anyway. It doesn't seem like a ton of overhead to calculate SHA-256 in 4096-byte chunks as the upload progresses. I suspect that sort of calculation would happen anyway.
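
A minimal sketch of that single-pass hashing with Python's hashlib (the in-memory buffer here is just a stand-in for a real upload stream):

    import hashlib, io

    def hash_upload(chunks):
        """Incrementally hash an upload as its chunks arrive (one pass, no re-download)."""
        h = hashlib.sha256()
        for chunk in chunks:
            # ...store or forward the chunk here...
            h.update(chunk)      # a little extra work per chunk, no second pass
        return h.hexdigest()

    # Stand-in for a network stream: 4096-byte reads from an in-memory buffer
    stream = io.BytesIO(b"x" * 10_000)
    print(hash_upload(iter(lambda: stream.read(4096), b"")))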


You're right, and in fact S3 does this with the `ETag:` header… in the simple case.

S3 also supports more complicated cases where the entire object may not be visible to any single component while it is being written, and in those cases, `ETag:` works differently.

> * Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-S3 or plaintext, have ETags that are an MD5 digest of their object data.

> * Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-C or SSE-KMS, have ETags that are not an MD5 digest of their object data.

> * If an object is created by either the Multipart Upload or Part Copy operation, the ETag is not an MD5 digest, regardless of the method of encryption. If an object is larger than 16 MB, the AWS Management Console will upload or copy that object as a Multipart Upload, and therefore the ETag will not be an MD5 digest.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.h...
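
In the multipart case, the ETag is commonly observed to be the MD5 of the concatenated per-part MD5 digests with the part count appended. AWS doesn't document that format as a guarantee, so treat this sketch as illustrative only:

    import hashlib

    def multipart_etag(parts):
        # Commonly observed format: md5(md5(part1) + md5(part2) + ...) + "-" + part count
        joined = b"".join(hashlib.md5(p).digest() for p in parts)
        return f"{hashlib.md5(joined).hexdigest()}-{len(parts)}"

    # e.g. an object uploaded as two parts
    parts = [b"a" * (8 * 1024 * 1024), b"b" * 1024]
    print(multipart_etag(parts))   # note the "-2" suffix; not the MD5 of the whole object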


S3 supports multipart uploads which don’t necessarily send all the parts to the same server.


Why does it matter where the bytes are stored at rest? Isn't everything you need for SHA-256 just the results of the SHA-256 algorithm on every 4096-byte block? I think you could just calculate that as the data is streamed in.


The data is not necessarily "streamed" in! That's a significant design feature to allow parallel uploads of a single object using many parts ("blocks"). See: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMu...
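
For reference, the basic flow with boto3 looks roughly like this (bucket and key are placeholders; in practice the part uploads can be issued in parallel, even from different machines):

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "big-object"   # placeholders

    # 1. Start the multipart upload
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

    # 2. Upload parts; each call is independent and may hit a different front end
    parts = []
    for number, body in enumerate([b"a" * (5 * 1024 * 1024), b"b" * 1024], start=1):
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=number, Body=body)
        parts.append({"PartNumber": number, "ETag": resp["ETag"]})

    # 3. Stitch the parts into one object
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})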


> Isn't everything you need for SHA-256 just the results of the SHA-256 algorithm on every 4096-byte block?

No, you need the hash of the previous block before you can start processing the next block.
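
You can see the chaining with hashlib: one running hash fed chunk by chunk reproduces the whole-object digest, but independently hashed chunks can't be combined into it:

    import hashlib

    a, b = b"first chunk ", b"second chunk"
    whole = hashlib.sha256(a + b).hexdigest()

    # Incremental hashing works because each update() continues from the
    # internal state left behind by the previous block.
    h = hashlib.sha256()
    h.update(a)
    h.update(b)
    assert h.hexdigest() == whole

    # Independent chunk digests carry no chaining state, so there is no way
    # to merge them into the digest of the concatenation.
    print(hashlib.sha256(a).hexdigest(), hashlib.sha256(b).hexdigest())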


That's interesting. Would you want it to be something like a bucket setting: "any time an object is uploaded, don't let the write complete unless S3 verifies, with a pre-defined hash function (like SHA-256), that the object's name matches the object's contents"?


You can already PUT with a SHA-256 checksum. If verification fails, it just returns an error.
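
With boto3, that looks roughly like the sketch below (bucket and key are placeholders); if the supplied digest doesn't match the uploaded bytes, S3 rejects the PUT with an error:

    import base64, hashlib
    import boto3

    s3 = boto3.client("s3")
    body = b"hello world"

    # SHA-256 of the payload, base64-encoded as the checksum parameter expects
    digest = base64.b64encode(hashlib.sha256(body).digest()).decode()

    s3.put_object(
        Bucket="my-bucket",      # placeholder
        Key="hello.txt",         # placeholder
        Body=body,
        ChecksumSHA256=digest,   # S3 verifies this against the bytes it receives
    )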


> Someone probably told me that every cell in my body has the same DNA. But no one shook me by the shoulders, saying how crazy that was.

This is only mostly right. Every cell in your body has astonishingly similar DNA, but every cell division (and even steady-state DNA repair) offers the opportunity for mutations. So your cells are all astonishingly similar, but there can be detectable differences.

One implication of this is that cells that are closer to each other in developmental history will have more similar DNA. One of my colleagues in graduate school used this to do phylogenetic lineaging, where he looked at markers in DNA from whole organisms to reason about which cells are closely related, and which cells have a more distant developmental ancestor.

Biology is super cool! I hope that everyone finds a little bit of it that they can enjoy. :)


Also, B cells and T cells undergo genomic rearrangements to get a diversity of receptors that can recognize specific targets.


If you didn't know, that 80% number probably comes from basic queueing theory (usually discussed alongside Little's Law): if demand arrives as a Poisson process and the service has a single queue, the expected queue length and wait time grow roughly as ρ/(1-ρ), so they start to blow up quickly once utilization passes about 80%.

Here's a nice blog post about the subject:

https://www.johndcook.com/blog/2009/01/30/server-utilization...
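
For a textbook single-server (M/M/1) queue, the expected number of requests in the system is ρ/(1-ρ), which makes the knee near 80% easy to see:

    # Expected number in an M/M/1 system, L = rho / (1 - rho)
    for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
        print(f"utilization {rho:.0%}: ~{rho / (1 - rho):.1f} requests in system")
    # 50% -> 1, 80% -> 4, 90% -> 9, 99% -> 99: past ~80% it blows up fast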


This law does not apply to queueing as encountered in routers. It assumes unbounded queues and a Poisson arrival process (i.e. a memoryless channel); neither assumption holds for packet routers and senders using congestion control (TCP or otherwise).

There is, however, a high chance of encountering buffer bloat if countermeasures are not taken at the chokepoint: https://en.wikipedia.org/wiki/Bufferbloat

Modern cable modems, for example, are required to implement such countermeasures. My ISP is at over 90% capacity and round trip times are still mostly reasonable. (Bandwidth is atrocious, of course.)


How do you monitor this? The 90% capacity figure, that is; I'd like to see where mine is at.


There might be a way using a cable TV receiver (see my other comment on this thread), but in my case, a sales rep of my ISP just told me on the phone.


I have an older modem (DCM476) and it definitely doesn't have this or doesn't have it enabled. I have to use/tune queue management myself on the router side.


Yes, it's mandatory only as of DOCSIS 3.1, and yours seems to be 3.0. (Supposedly it has been "backported" to 3.0, but that obviously would not apply to existing devices certified before that amendment to the spec.)


To add:

If you have more control over or knowledge of your load, you can safely go higher than 80%.

E.g., when I was working at Google we carefully tagged our RPC calls by how 'sheddable' they were. More sheddable load gets dropped first. Or, from the opposite perspective: when important load is safely under 100%, which it is almost all the time in a well-designed system, we can also handle more optional, more sheddable load.
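
A toy sketch of that idea (the criticality labels and thresholds here are illustrative, not Google's actual mechanism):

    CRITICALITY = {"critical": 0, "default": 1, "sheddable": 2}   # lower = more important

    def admit(criticality: str, utilization: float) -> bool:
        """Drop the most sheddable traffic first as utilization climbs."""
        if utilization < 0.80:
            return True                            # plenty of headroom: accept everything
        if utilization < 0.95:
            return CRITICALITY[criticality] <= 1   # shed only the optional work
        return CRITICALITY[criticality] == 0       # near saturation: critical traffic only

    for label in ("critical", "default", "sheddable"):
        print(label, admit(label, utilization=0.9))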

As a further aside, parts of the financial system work on similar principles:

If you have a flow of income over time, like from a portfolio of offices you are renting out, you can take the first 80% of dollars that come in on average every month and sell that very steady stream of income off for a high price.

The rest of income is much choppier. Sometimes you fail to rent everything. Sometimes occupants fall behind on rent. Sometimes a building burns down.

So you sell the rest off as cheaper equity. It's more risky, but also has more upside potential.

The more stable and diversified your business, the bigger proportion you can sell off as expensive fixed income.


I've noticed that above 70-80% it gets pretty hard to ensure that interrupt timing can be met and balanced with low-priority main looping in a lot of my bare-metal embedded projects.


One of the funniest stories I've ever heard was about how a junior developer asked a more senior developer a creatively terrifying question:

How do I install half of an RPM?


> half of an RPM

Well, I'm stumped... I wonder if he just needed one tool out of a suite?


Maybe trying to work around a dependency issue. I'm not sure what's in RPMs, but in deb the manifest may require libsomething version exactly 1.1, which will fail if something else requires libsomething >= 1.2.


1 revolution per 30 seconds maybe?


That would be 2 RPMs. Still easy to install!


From what I can tell, with cpio :-D


I think of myself as a very absent-minded person. I also can't imagine being in my car without knowing my daughter is in my car. After reading this, I'll probably check the back seat for her even when I'm very confident she's not there.

I don't mean to be critical. I've had a lot of advantages in life, and reading about this breaks my heart. I wish I could understand more about how I could help. None of these parents or kids deserve this situation, even if the parents could have done better.


People in Flint, Michigan still don't have clean water. Lots of other folks in the world also don't have clean water. Doesn't that seem like an easier thing to fix than trying out experimental therapies?


This might particularly interest you: http://www.lettersofnote.com/2012/08/why-explore-space.html

The gist of it is, in 1970 someone asked a director of NASA why billions of dollars were being spent on exploring space when millions of children were still dying on Earth. The response in part explained that NASA's R&D was paving the way for satellites with better weather forecasting, better communications, and better equipment that was making its way into people's everyday lives. While a bit morbid to say, the advancements made by NASA have arguably saved many more lives in the long run.

It's hard to see the point in investing in experimental technology whose payoff is unknown, especially when we have definite problems with feasible solutions. That doesn't mean we shouldn't try. The possible payoffs - cancer cures, age prolongation, enhanced food production, disease and sickness prevention - that can come from investing in gene therapies are just too great to ignore.

As a side-thought, the technological singularity is thought of as the point at which we create an AI smarter than us, triggering a run-away effect of self-improvement. What if we end up doing it to our own race first through intelligence-improving gene modification? Can you imagine the implications of applying that intelligence to solving the rest of our problems?


And if you're interested in helping people in places that don't have access to these things, salt iodization is almost certainly the thing to work on first.

http://www.givewell.org/international/technical/programs/sal...

As to Flint, that does seem to finally be over now, though goodness knows it went on for a distressingly long time.

https://en.wikipedia.org/wiki/Flint_water_crisis#2017


Doesn't doing both seem more effective?


I hope that you feel better soon.


Daycare.


It turns out that the people who need the most help might have a hard time telling doctors the truth about their symptoms for a variety of reasons. I think you're right that this would be better, but I think it would be really sad to avoid helping people just because they had problems with self-awareness and communication.

I might be wrong though. :)

