Which is the best method for deep cloning in JavaScript? (medium.com/tiagobertolo)
36 points by tiagobertolo on Feb 11, 2023 | 54 comments


I only clone data using JSON.parse(JSON.stringify(someThing)), and thus avoid all the mess that comes with trying to clone anything other than primitive types.


This fails, among other scenarios, if `someThing` includes a Date object.

  >>> const z = { d: new Date() }
  undefined
  >>> typeof z.d
  "object"
  >>> const a = JSON.parse(JSON.stringify(z))
  undefined
  >>> typeof a.d
  "string"
  >>>
After the roundtrip, the property `d` is now a string, not a Date object.
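One hedged workaround for this particular case is to pass a reviver to JSON.parse that turns ISO-looking strings back into Dates. The regex below is an assumption tailored to the exact output of Date#toJSON, not a general ISO 8601 parser:

```javascript
// Sketch: revive Date#toJSON-shaped strings back into Date objects.
const isoRe = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/;
const cloneWithDates = (obj) =>
  JSON.parse(JSON.stringify(obj), (_key, value) =>
    typeof value === "string" && isoRe.test(value) ? new Date(value) : value
  );

const src = { d: new Date(), label: "not a date" };
const copy = cloneWithDates(src);
console.log(copy.d instanceof Date);               // true
console.log(copy.d.getTime() === src.d.getTime()); // true
```

The obvious hazard is that any string that happens to look like an ISO timestamp also gets converted, which is why this is a sketch rather than a general solution.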


Consider the Date object harmful, use ISO strings instead, and problem solved.


So how do you manipulate ISO dates as strings?


Although ISO date strings can be fairly trivial to read and manipulate with RegExp or a custom parser, depending on the complexity of the task, I would recommend using something like iso-fns.

https://iso-fns.org

It's too bad this library (and approach) never took off, but it's there to use nevertheless.

Even without iso-fns, it's not as if a mid-level developer can't figure out how to write a function to perform a specific operation on an ISO date string.

After over 10 years of web development, I do think this is the best-known approach, even better than storing UNIX time. Even if there were no existing libraries to help with manipulating ISO strings, I would gladly take the inconvenience in exchange for the rest of the advantages.

With ISO strings, you don't get any behavioral quirks from Date; they are totally compatible with JSON; you can store the time zone along with the time; there is nothing to stop you from changing the time zone while leaving all the other values intact; ISO strings also support durations.

Basically, they support most of what a software developer will be expected to do with times and dates but without any behavior or assumptions about the client time zone that introduce bugs like with Date. `<input type="datetime-local">` also uses ISO strings out of the box, which can be really convenient, and values from `<input type="date">` can be easily appended into an actual ISO string.
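A minimal no-library sketch of treating ISO strings as plain data, per the comment above (the names and values here are illustrative):

```javascript
const event = { title: "standup", at: "2023-02-11T09:30:00+01:00" };

// JSON round-trips leave the value untouched, unlike a Date object:
const copy = JSON.parse(JSON.stringify(event));
console.log(copy.at === event.at); // true

// Swap the UTC offset while leaving every other field intact:
const rezoned = event.at.replace(/(Z|[+-]\d{2}:\d{2})$/, "-05:00");
console.log(rezoned); // "2023-02-11T09:30:00-05:00"

// A value from <input type="date"> ("2023-02-11") extends to a full
// ISO datetime by appending a time component:
const fullIso = `${"2023-02-11"}T00:00:00Z`;
console.log(fullIso); // "2023-02-11T00:00:00Z"
```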


ISO datetimes are not monotonic. That’s already one good reason not to use them for storage.
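Illustrating the point: when stored ISO strings carry mixed UTC offsets, their lexicographic order no longer matches chronological order.

```javascript
const a = "2023-02-11T09:00:00+05:00"; // 04:00 UTC
const b = "2023-02-11T08:00:00Z";      // 08:00 UTC

console.log(a < b);                     // false: b sorts first as a string
console.log(new Date(a) < new Date(b)); // true: a is the earlier instant
```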


All the data comes from a JSON API and thus doesn't contain Date objects to begin with.


Makes sense for the most part with these constraints then. I missed the "primitive types" portion in the original comment.

The following isn't applicable for pure JSON responses, but is applicable for primitive types:

An additional condition is necessary for bigint, which is a primitive type. Otherwise JSON.stringify throws:

  >>> const x = { i: BigInt("0x1fffffffffffff") }
  undefined
  >>> typeof x.i
  "bigint"
  >>> const b = JSON.parse(JSON.stringify(x))
  Exception: TypeError: JSON.stringify cannot serialize BigInt.
  >>>
... and for the symbol primitive, which JSON.stringify serializes to undefined, and which thus can't be JSON.parsed.
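Demonstrating the symbol case: symbol-keyed and symbol-valued properties are silently dropped from objects, and a bare symbol stringifies to undefined, which JSON.parse then rejects.

```javascript
const obj = { [Symbol("k")]: 1, v: Symbol("x"), n: 2 };

console.log(JSON.stringify(obj));         // '{"n":2}'
console.log(JSON.stringify(Symbol("x"))); // undefined
```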


Which is the first method they mention, and they flat out tell you to avoid it.

"cloneJSON is very slow and can’t do much. Please avoid it."

I mean, I also use it since forever.

But I might look into the winner cloneLib.

If it is really faster, it might be worth it, but I likely still won't trust it for anything complex.


Emphasis on _data_. I don't have a need to clone complex objects (what the article is trying to do). The data I need to clone are usually small so when it comes to performance it's negligible, not worth adding another dependency to the project.


I sure hope `someThing` is never `undefined`, for your sake.


That seems to be a logical thing to ensure has been validated before this code is run, so no need to complicate the simple example in a HN post?


> avoid all the mess that comes with trying to clone anything else than primitives types

is an odd way to suffix code that will throw a SyntaxError when handed a primitive type.


Any cloning happens on data retrieved from an API, which is always JSON, and cannot contain undefined by definition.


There's this really nifty concept called "guarding" that helps with recovering from that sort of thing.


Just chuck a check in front of it


Or NaN or Infinity.



For real.


The Chrome dev rel blog post [1] describes limitations of structuredClone:

* Prototypes: structuredClone discards the object’s prototype chain.

* Functions: structuredClone throws a DataCloneError when it encounters a function.

* Non-cloneables: structuredClone throws for some common values, e.g. Node, Error.

[1] https://web.dev/structured-clone/
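A quick check of the first limitation (structuredClone is global in browsers and in Node 17+):

```javascript
class Point {
  constructor(x) { this.x = x; }
}

const p = new Point(1);
const c = structuredClone(p);

console.log(c.x);                // 1: own data properties survive
console.log(c instanceof Point); // false: the prototype chain is gone
```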


A lot of the problems described in this (very detailed, nice work!) comparison are not really problems so much as design decisions. Having a clone operation that copies non-enumerable properties is, depending on what you're doing, a bad thing. They're not enumerable for a reason.

Similarly, cloning things like getters means you're potentially copying over closures, which means your new cloned object now has references to its source and interacting with it may mutate the source. This is a pretty serious hazard that (IMO) justifies not cloning getters/setters, at least by default.

Cloning freeze/seal status would also lower the usability of your cloning API, because now if you wanted to make a non-frozen copy of a frozen object, you can't use the clone API.
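The freeze/seal point in practice: structuredClone does not copy frozen status, which is exactly what lets you make a mutable copy of a frozen object (requires Node 17+ or a browser).

```javascript
const frozen = Object.freeze({ n: 1 });
const thawed = structuredClone(frozen);

console.log(Object.isFrozen(thawed)); // false
thawed.n = 2;                          // allowed on the clone
console.log(thawed.n);                 // 2
```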


Exactly this. It was a joy reading this article because it shows how deep the rabbit hole goes - but in 99% of cases I would still choose cloneJSON, because a) no external dependency, b) I (think I) know what it does and c) I don't care about anything that is not data and I don't care about BigInt and Symbol.

There is a reason none of the libraries tested got it "right" - nobody needs it. Or if they do, they just write their own implementation.


Right out of the gate they mention that the first approach listed doesn't clone Symbol...and then they treat this as a flaw.

I haven't used Symbol in JS much, but I was under the impression that it's _supposed_ to be a sorta "interned" value (in the sense that there exists exactly one instance in existence of each Symbol value)...which would mean that cloning them isn't a concept that makes sense.

My loose awareness of this particular JS type is leaving me quite un-confident in this understanding though, so I'd love for someone to correct me where I'm wrong so I can learn more!


I don't think it's a flaw given a symbol is unique so cloning it wouldn't be possible, and creating a new one would compound the problem by not only removing the symbol, but now adding a new one as well.

This article has a pretty good overview of what symbols are, usually a corner case, but good to know -

https://medium.com/intrinsic-blog/javascript-symbols-but-why...


I've been programming JavaScript for ~10 years now (wtf) and I've yet to encounter a case where I needed to "deep clone" an object. It has always been a code smell indicative of a need for some other structural refactoring.


It may be rare that you need to deep clone, but doing so can be a good idea nonetheless. Sometimes I want a function to not be able to mutate the original object that informs it, in which case it will receive a deep clone or return a brand new object. I find this can make complex code easier to comprehend because I can have greater confidence that return values are new and functions aren't causing side effects. Of course you can never guarantee this in JavaScript, but it's worth making a best attempt IMO.
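A sketch of the defensive pattern described above: hand the function a deep clone so it cannot mutate the caller's object (structuredClone requires Node 17+; JSON round-tripping works for plain data too; the function names are illustrative).

```javascript
const markDone = (order) => {
  order.status = "done"; // mutates its argument
  return order;
};

const original = { status: "new", items: [1, 2] };
const result = markDone(structuredClone(original));

console.log(original.status); // "new": caller's copy untouched
console.log(result.status);   // "done"
```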


I'm coming from a C++ and functional perspective, but this seems very counterintuitive to me. Can you explain why they're bad? My assumption was that it's easier to reason about objects the less they share.


If you're reaching for deep cloning, your objects are too big. You're passing too much data between functions. The one exception is "actual" data which is (de-)serializable to/from JSON without any special cases (getters, functions, etc). But if it's an object that was programmatically constructed, then it can probably be made smaller.

If your object is "deep" because of real nested dependencies, then you should ask why your user would need to mutate it in the first place. If they want to change its internal structure then this is indicative of a code smell because your object is too big and the user cannot accomplish their goal by calling functions exposed by your API.

There was a phase in the JS ecosystem a few years ago where everything absolutely had to be immutable. This was, IMO, a horrible side effect of react developers optimizing their props for diffing during reconciliation. Libraries like immer and redux produced a lot of this zealotry.

I understand the purism appeal of "immutable everything," and I won't say it has no application, but it has been dogmatically over-applied to the point that you have people googling how to deep clone their config object with a nested function assignment because they're scared of passing a mutable object, and oh my god what if my user mutates it?


But in React you never need to deep clone the object. The same principle applies to all frameworks of the same breed, by the way: Vue, Angular, etc. only update what you're updating. You are supposed to make a new reference for the object, but its fields keep the old references except the field you want to update. If you deep clone the whole thing, performance is actually worse because you're doing unnecessary updates. You can use the spread operator to easily make a new object reference, and nowadays you have immerjs to do the grunt work for you.
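A small sketch of the spread idiom described above: copy only along the changed path; untouched branches stay shared by reference.

```javascript
const state = { user: { name: "Ada" }, todos: ["write", "ship"] };
const next = { ...state, user: { ...state.user, name: "Bob" } };

console.log(next.todos === state.todos); // true: shared, not cloned
console.log(next.user === state.user);   // false: replaced along the path
console.log(state.user.name);            // "Ada": original untouched
```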


You'd be surprised how many web devs don't know the difference between a value type and a reference type.


Cloning is fairly common when you're writing functional style code with immutable objects. Not "code smell".


But if you are writing in a more functional style, you would be copying the reference, NOT the object. Functional style means you need LESS deep copying. See immerjs.

Also, your objects won't be that complicated in the first place; they would be treated as plain data objects, like a struct or record, and you would not use those advanced object features (they aren't needed).


But when you're using immutable data, deep cloning is not something you need to worry about. So it kind of is an anti-pattern in this case.


Copy-on-write is a common operation on immutable data and deep cloning is how you do that in JavaScript in a safe manner.


Oh yeah, that’s true. I was thinking of RRB trees, hash array mapped tries and things like this.


Cloning is super useful and not a code smell at all in several scenarios.

My most common use is for delta detection and json-patch based APIs.

I keep the original entity and allow the user to modify a clone.

When it's time to sync, I compare it to the original to generate a patch set to send to the server with json-patch [1] semantics.

I highly recommend the excellent fast-json-patch library [2] that makes this process a breeze.

Actually, I use its deepClone method outside of json-patch scenarios, it's so well implemented.

[1] https://jsonpatch.com/

[2] https://www.npmjs.com/package/fast-json-patch
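A toy sketch of the clone-and-diff workflow above. The flat diff here is purely illustrative; fast-json-patch's real comparison handles nested structures, additions, and removals.

```javascript
const original = { name: "Ada", role: "admin" };

// Hand the user a clone to edit:
const draft = JSON.parse(JSON.stringify(original));
draft.role = "user";

// Build an RFC 6902-style patch from the changed top-level keys:
const patch = Object.keys(draft)
  .filter((key) => draft[key] !== original[key])
  .map((key) => ({ op: "replace", path: `/${key}`, value: draft[key] }));

console.log(patch); // [{ op: "replace", path: "/role", value: "user" }]
```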


In the case where you clone an object so the user can modify it while being able to revert back to the original object.

How is that a smell? or how would you refactor it?


Agreed. And not just in JavaScript.

Yes, sometimes you will want a deep copy. These are fewer and further between than online reading about it would have you think.


Have you used React? You must copy every complex object or list every time before you update the state.


s/React/Redux? (for example)

With setState in React, you don't necessarily have to copy to update the state. setState merges the partial object you provide with the current state.

https://reactjs.org/docs/state-and-lifecycle.html#state-upda...

In Redux reducers, however, you need a full copy.


I've avoided writing objects altogether. I've never run across a problem I needed them for.


Cloning is extremely common in JS.


This is some really useful info.

But I just want to point out that cloning isn’t an end in itself.

The article includes a lot of judgements but it’s all context-free. Whether a clone method is good or not depends on its suitability for a purpose. E.g., whether you want non-enumerable properties cloned (if you care at all) really depends on your specific uses. Not to mention performance is very often a concern and that isn’t covered here.


For simple use cases it may look like the JSON one works.

But one thing not included in this page is that there are things it just doesn't serialize, even when it's actual data:

  const native = { number: Number.POSITIVE_INFINITY };

  const cloned = JSON.parse(JSON.stringify(native));


  // unlike NaN, we can compare Number.POSITIVE_INFINITY to itself
  console.log(native.number === native.number);

  // but it doesn't serialize
  console.log(native.number === cloned.number);

  console.log(cloned);
Yields

  true
  false
  {number: null}
Oops.


This is because inf and nan were left out of the json spec, right? Great format


Right, because the spec is not IEEE 754. JSON numbers are not floats. They're decimal numbers with as many digits as you want. It's up to the serializer/deserializer to decide how to handle them.


The clear intention of the JSON spec is that JSON numbers should deserialise to doubles, and this is how every decoder that I've ever seen handles it.

The first decoder was the JS eval() function, so JSON was clearly intended to be a subset of JS. Ambiguity in the spec is not a licence to deviate from JS semantics, it just means that the spec is poorly written.


It was intentionally written to be very simple and achieved that goal. If your opinion is that that’s a poor choice, sure. But that’s an opinion, not a fact. Specifying a specific float format would cripple an interchange format and I think that would be a mistake. The intention here is to allow each origin and destination to decide how to fit the generic data into their representation formats.

The spec is not ambiguous. And a spec is a spec is a spec. You can implement it or not. But you can’t just decide “Enh… they didn’t mean it like that so it’s wrong to satisfy the spec. You should really satisfy it wrongly.”


  > In other languages like Java, each class is expected to implement its own clone method
This is incorrect. There is an `Object.clone()` in Java, but it's not implemented by default for most types, implementing it is fraught with complexity, and the standard book of Java advice (Effective Java) strongly recommends avoiding it. If you indicate that your type supports cloning (using the marker interface `Cloneable`), you still only get a shallow copy by default.


Cloning functions seems like a bad idea even if it's "handled" by the cloning logic. An easy footgun: you've got a function on an object that references the object itself. You clone the whole object. The cloned object has its own copy of the function, but its closure still references the old object

Personally I just don't try to clone anything that isn't JSON-valid data, and I usually use the stringify/parse strategy. It's slow in a hot loop, but it's fast enough for the rare situations where I need to fully clone something. Most of the time, I can just use readonly types and avoid defensively cloning the whole tree. When I want to replace part of an immutable object, destructuring works fine
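The footgun above, concretely: a copied method's closure still points at the original object, even though the copy has its own data fields.

```javascript
function makeCounter() {
  const self = { n: 0 };
  self.inc = () => { self.n += 1; }; // closes over the original `self`
  return self;
}

const orig = makeCounter();
const copy = { ...orig }; // shallow copy shares the same `inc` function
copy.inc();

console.log(orig.n); // 1: the original was mutated
console.log(copy.n); // 0: the copy's own field never changed
```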


> This study does not contemplate shallow cloning or performance.

Hmm?

> cloneJSON is very slow and can’t do much. Please avoid it.

I always thought that JavaScript is fast at JSON parsing. At least the benchmark shows nice results: https://www.measurethat.net/Benchmarks/Show/18541/0/jsonstri...


As far as I know there is no way in JavaScript to copy a function with its closure, so fundamentally writing a perfect cloning algorithm in JavaScript is not possible.


> copy a function with its closure

By closure, do you mean the values of free variables at the time of the copy?

If so, the inability to do a perfect copy is applicable to regular functions at the top level, too. (For such functions, other global variables are free variables.)

Considering this, I think this is why copying a function is generally defined as just having a reference to the function that can later be used to invoke it. It does not include each copy of the function having its own copy of the function's free variables from the instant of the copy. Separate invocations of the multiple copies (i.e. references) of the function may affect the shared closure environment/free variables.


This is a great article, but it left out the side of this that I was really interested in, which is the compute performance of these different options.

I'll take a fast deep clone for my case over one that handles all cases.



