
> MapReduce is a great example of that- Google certainly didn't invent the concepts of Map or Reduce, or even the idea of using those for doing high throughput computing (and the shuffle phase of MapReduce is more "interesting" from a high performance computing perspective than mapping or reducing anyway).

The “Map” in MapReduce did not originally stand for the map operation; it comes from the concept of “a map” (or, I guess, a multimap). MapReduce descends from “the ripper”, an older system that mostly did per-element processing but wasn't very robust or flexible. I believe the map operation was called “Filter()” at the time, and reduce was also called something else. Eventually things were cleaned up and renamed to Map() and Reduce() (and much more complexity was added, such as combiners), in a sort of backnaming.

It may be tangential, but it's not like the MapReduce authors started with “aha, we can use functional programming here”; it's more like the concept fell out. The fundamental contribution of MapReduce is not to invent lambda calculus, but to show that with enough violence (and you should know there was a lot of violence in there!), you can actually make a robust distributed system that appears simple to the users.



I believe the Map in MapReduce stood for the "map" function, not a multimap; I've never heard or read otherwise (map can operate over lists of items, not just map types). That's consistent both with the original MapReduce paper: """Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages. We realized that most of our computations involved applying a map operation to each logical “record” in our input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key, in order to combine the derived data appropriately"""

and with the internal usage of the program (I only started in 2008, but spoke to Jeff extensively about the history of MR as part of Google's early infra) where the map function can be fed with recordio (list containers) or sstable (map containers).
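For anyone who hasn't read the paper, the flow in that quote can be sketched as a toy, single-process Python example. This is only an illustration of the map / group-by-key / reduce shape, not how Google's system is implemented; `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical names made up for this sketch, and everything that makes the real thing hard (distribution, fault tolerance, the on-disk shuffle) is left out.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Toy in-memory sketch of the map -> group-by-key -> reduce flow."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    # Grouping the values by key here stands in for the shuffle phase.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)

    # Reduce phase: combine all the values that share the same key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


def word_count_map(line):
    # Hypothetical map function: emit (word, 1) for every word in a line.
    for word in line.split():
        yield word, 1


def word_count_reduce(word, counts):
    # Hypothetical reduce function: sum the counts collected for one word.
    return sum(counts)


# The input is just an iterable of records (a list of lines in this case).
counts = map_reduce(["a b a", "b c"], word_count_map, word_count_reduce)
```

Since `records` is any iterable, the same sketch takes a plain list or the items of a dict, which loosely mirrors the list-container versus map-container inputs mentioned above.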

As for the ripper, if you have any links to that (rather than internal google lore), I'd love to hear about it. Jeff described the early infrastructure as being very brittle.


> and with the internal usage of the program (I only started in 2008, but spoke to Jeff extensively about the history of MR as part of Google's early infra) where the map function can be fed with recordio (list containers) or sstable (map containers).

I worked on the MapReduce team for a while (coincidentally, around 2008), together with Marián Dvorský, who wrote up a great little history of this. I don't think it was ever made public, though.

> As for the ripper, if you have any links to that (rather than internal google lore), I'd love to hear about it. Jeff described the early infrastructure as being very brittle.

I believe it's all internal, unfortunately.



