
A more succinct version of the same point from Gary Bernhardt of WAT fame (from 2015, same era): https://twitter.com/garybernhardt/status/600783770925420546

> Consulting service: you bring your big data problems to me, I say "your data set fits in RAM", you pay me $10,000 for saving you $500,000.



Actually, the awk solution in the blog post doesn't load the entire dataset into memory, so it isn't limited by RAM. Even if you make the input 100x larger, mawk will still be hundreds of times faster than Hadoop. An important lesson here is streaming: in our field, we often process >100GB of data in <1GB of memory this way.
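To make the streaming pattern concrete, here's a minimal sketch. The file events.tsv (a key in column 2, a numeric value in column 3) is a made-up example, not the blog post's actual data:

    # Sum column 3 per key in column 2, reading one line at a time;
    # memory is bounded by the number of distinct keys, not the file size.
    mawk -F '\t' '{ sum[$2] += $3 } END { for (k in sum) print k, sum[k] }' events.tsv

Whether the input is 3GB or 300GB, the per-line work and the memory footprint stay the same; only the wall-clock time grows.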


This. For many analytical use cases, the whole dataset doesn't have to fit into memory.

Still, it's of course worthwhile to point out how oversized a compute-cluster approach is when the whole dataset would actually fit into the memory of a single machine.


Reminds me of one of my favourite twitter posts:

> Small Data is when is fit in RAM. Big Data is when is crash because is not fit in RAM.

https://twitter.com/DEVOPS_BORAT/status/299176203691098112


DEVOPS_BORAT contains a lot of truth if you think about it, hah.

Our sarcastic team-motto is very much this: https://twitter.com/DEVOPS_BORAT/status/41587168870797312




64TB of RAM max on a single machine:

IBM Power System E980 https://www.ibm.com/downloads/cas/VX0AM0EP (notably the E880 did 32TB in 2014)

SGI UV 300 https://www.uvhpc.com/sgi-uv-300

SGI UV 3000 https://www.uvhpc.com/sgi-uv-3000



