Hacker News

Another thing I’ve encountered with tree/structured diffs is the concept of identity. diff([{id:1,name:foo}],[{id:2,name:foo}]) should show the object with id:1 removed and an object with id:2 added, not id changed from 1 to 2. It's tough because your diffing algo then needs to be aware of the object structure (imo relying on convention and saying “no objects can contain this key” is pretty hard when you accept any user-generated data).


Though I would say that a diff has to define the set of operations that are allowed on the thing being diffed.

E.g., in the example of diffing JSON objects, if changing a property value (such as the "id" field) is an allowed operation, then the diff correctly deduces that the smallest possible change is indeed a change to that field.
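To make that concrete, here is a minimal sketch of a diff whose only allowed operation is "change a property value". The function name `field_diff` and the tuple shape of the operations are made up for illustration, and it assumes both lists have the same length and are paired by position.

```python
def field_diff(old, new):
    """Pair objects by list position and report per-field value changes.

    Assumption: both lists have the same length; zip() silently ignores
    any extra trailing items, so additions and removals are not modelled.
    """
    ops = []
    for index, (before, after) in enumerate(zip(old, new)):
        # Visit every key that appears on either side, in a stable order.
        for key in sorted(before.keys() | after.keys()):
            if before.get(key) != after.get(key):
                ops.append(("change", index, key, before.get(key), after.get(key)))
    return ops


result = field_diff([{"id": 1, "name": "foo"}], [{"id": 2, "name": "foo"}])
print(result)  # [('change', 0, 'id', 1, 2)]
```

Under this operation set, "id changed from 1 to 2" really is the smallest edit, which is exactly the result the parent comment wants to avoid.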

However, if you define the set of operations to allow only changes to an entire object (with no changing of the id field), then surely you can create a diff that produces the desired object-level change. It would be a custom diff algorithm of course... but it'd be quite a useful one tbh.


I think his point was that different fields should be treated differently. I.e. if you have two objects with the same ID but different descriptions then you can assume that it's the same object but with a changed description; but if you have two objects with different IDs but the same description then you should assume that the new object is completely different and the identical description is coincidental.
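A rough sketch of that interpretation, assuming every object carries a unique identity field (called "id" here by default). The name `keyed_diff` and the operation tuples are hypothetical, not taken from any particular library.

```python
def keyed_diff(old, new, key="id"):
    """Diff two lists of dicts, matching objects by an identity field.

    Assumption: every object has `key`, and its value is unique per list.
    """
    old_by_id = {obj[key]: obj for obj in old}
    new_by_id = {obj[key]: obj for obj in new}
    ops = []

    # Identities that disappeared are removals, whatever their other fields.
    for ident, obj in old_by_id.items():
        if ident not in new_by_id:
            ops.append(("remove", obj))

    for ident, obj in new_by_id.items():
        if ident not in old_by_id:
            # A new identity is an addition, even if the description matches.
            ops.append(("add", obj))
        elif obj != old_by_id[ident]:
            # Same identity, different content: report the changed fields.
            before = old_by_id[ident]
            changed = {
                k: (before.get(k), obj.get(k))
                for k in sorted(before.keys() | obj.keys())
                if before.get(k) != obj.get(k)
            }
            ops.append(("modify", ident, changed))
    return ops


# Different IDs, same description: treated as unrelated objects.
print(keyed_diff([{"id": 1, "description": "foo"}], [{"id": 2, "description": "foo"}]))
# Same ID, different description: treated as one object that changed.
print(keyed_diff([{"id": 1, "description": "foo"}], [{"id": 1, "description": "bar"}]))
```

Note that the choice of `key` is supplied by the caller; the diff cannot infer it from the data alone.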

I don't agree that these are always the correct interpretations though. IDs could be reused (especially in a DVCS) or mistaken IDs could be corrected. This ambiguity is a fundamental limitation of the entire concept of diffing, that is, of reconstructing a set of operations to go from one state to another: you simply don't have the information to deduce the correct logical steps in all cases.


I love this. I think you could simplify it by generalizing. Something like immutability. These keys can’t be changed, only an object destroyed and another created. A case of that is a primary key (maybe that’s the only case).

You can always represent a change as a removal and an addition. It’s smart to actually consider when you should. “Never” and “whenever possible” don’t seem like the best answers.
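One way to sketch the immutability idea: the caller declares which keys are immutable, and a difference in any of them turns the edit into a removal plus an addition instead of a field change. `diff_pair` and `immutable_keys` are illustrative names only, and the sketch compares a single pair of objects that have already been matched up somehow.

```python
def diff_pair(before, after, immutable_keys=("id",)):
    """Compare two objects, treating some keys as immutable.

    If any immutable key differs, the pair is reported as remove + add.
    Otherwise each differing mutable field is reported as a change.
    """
    if any(before.get(k) != after.get(k) for k in immutable_keys):
        return [("remove", before), ("add", after)]
    return [
        ("change", k, before.get(k), after.get(k))
        for k in sorted(before.keys() | after.keys())
        if before.get(k) != after.get(k)
    ]


# Immutable key differs: destroy one object, create another.
print(diff_pair({"id": 1, "name": "foo"}, {"id": 2, "name": "foo"}))
# Only a mutable field differs: an ordinary change.
print(diff_pair({"id": 1, "name": "foo"}, {"id": 1, "name": "bar"}))
```

With an empty `immutable_keys` this degenerates to "never split a change", and listing every key gives "always split", so the interesting policies sit somewhere in between.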



