> maps.SortedByKeys thats what you call overfitting kids

kbolino · on Aug 17, 2024

Whatever you think the best name for it is, it's still missing from the library.

jerf · on Aug 17, 2024

I think the problem with putting this into the standard library is that while Go may not be super focused on absolutely top-tier performance, it does generally try to avoid offering things that are unexpectedly slow or unexpectedly allocate large things. A sort-by-keys on the standard map would require allocating a full slice for the keys in the sort routine to do the sort, which would surprise people who expect build-in iterators to not immediately do that.

Plus it's in the class of things that's pretty easy to implement yourself now. There's always a huge supply of "but the library could just compose these two things for me". If you stick them all in things get bloated. You could literally have written it in the time it took to write the complaint. You got 80% of the way there as it is, I just tweaked your code a bit to turn it into an iterator: https://go.dev/play/p/agBGl_rT7XS

38 · on Aug 17, 2024

its completely pointless to make this an iterator, because you have to loop the entire map to do so, which kills any benefit of using iterators

randomdata · on Aug 18, 2024

Not completely pointless. It avoids the need to retain a full copy of all the values.

But a good demonstration of why this kind of thing isn't a good fit for the standard library.

kbolino · on Aug 18, 2024

And yet slices.Sorted was added which does exactly this already, but only for single-valued iterators.

randomdata · on Aug 18, 2024

> And yet slices.Sorted was added which does exactly this already

It does not. slices.Sorted accepts an iterator, but returns a slice.

Like the earlier comments point out, Go tries its best to give a reasonable idea of what kind of complexity is involved at the API level. slices.Sorted returning an iterator would mask that. By returning a slice, it makes clear that the entire collection of data needs to be first iterated over as the parent described.

kbolino · on Aug 18, 2024

This is a good point which likely explains why maps.Sorted doesn't exist (yet): what would it even return?

I think returning an iterator is acceptable, the docs could explain the expense of the operation, and the implementation could change in the future as needed. But that does hide some complexity.

If it ought to return a slice of entries, that opens up new problems. What is an entry? A two-member generic struct? Ok, fine, but then how do I ergonomically iterate over them, pulling out both members? There's no clear solution to that problem yet.

38 · on Aug 18, 2024

> I think returning an iterator is acceptable, the docs could explain the expense of the operation, and the implementation could change in the future as needed. But that does hide some complexity.

if a function returns an iterator, it should be iterating the input. thats impossible in this situation. you'd need to loop the entire map, then return an iterator that tricks the user into thinking they are getting better performance when they are getting the worst possible performance.

randomdata · on Aug 18, 2024

There is another, perhaps more important, reason: If you need sorted keys, the map is almost certainly the wrong data structure.

Sure, there may be some edge case situations, like where you are dealing with someone else's code where you don't have control over the structures you've been given, but:

1. The standard library doesn't appeal to edge cases.

2. The "noiser" solutions to deal with the edge case serve as a reminder that you aren't working in the optimal space.

kbolino · on Aug 18, 2024

This is a bridge that the standard library has already crossed, though. Off the top of my head, both encoding/json and text/template guarantee sorted iteration order of maps. I don't think it's an edge case at all.

Whether in particular cases, a properly ordered data structure (like a tree) should be used instead, is a valid question to ask, and thanks to the custom iterators, it'll now be more ergonomic to use. But if I usually use a particular map for its O(1) operations and only occasionally iterate over the whole thing, yet need consistent iteration order, then the built-in map still seems like the right choice, and having a standard way to iterate it is a reasonable request.

randomdata · on Aug 18, 2024

> both encoding/json and text/template guarantee sorted iteration order of maps. I don't think it's an edge case at all.

That is literally the edge case example I gave. Perhaps there is a better way to describe it than "edge case", but semantics is a silly game.

> then the built-in map still seems like the right choice, and having a standard way to iterate it is a reasonable request.

And, indeed, the standard library provides slices.Sorted(maps.Keys(m)) for exactly that. Ergonomic enough, while making the compromise being made reasonably explicit to help with readability – which is far more important than saving a few keystrokes. If typing is your bottleneck, practice will quickly solve that problem.

kbolino · on Aug 18, 2024

It's never really been about saving keystrokes, but about re-writing the same (fairly common) operation over and over again (and not necessarily the same way each time), and not being able to benefit from future optimizations.

However, as examined in a sibling thread, there doesn't seem to actually be any missed optimization which could potentially be applied here.

randomdata · on Aug 18, 2024

In what way is the operation common? We obviously would never say that there is never a use for such thing as there are clear edge cases where it is necessary, but as jerf points out, it is probably not what you actually need in most cases.

Even ignoring that in the most common case the map isn't the right structure to begin with, what even is the general case for the situations that remain? You mentioned the marshalling of arbitrary data case, but in that case you also have reflection details to worry about, and which you can optimize for with a custom implementation, and thus wouldn't likely use the built-in anyway. A sibling thread discussed the cache benefits of colocating the values with the keys if a map is exceedingly small, but as soon as the map is of any reasonable size the added overhead of the values is almost certainly going to blow the cache even where the keys alone might still fit.

All of which is to say that the best approach is highly context dependent. How do you even begin to choose which is the general case if you were to include such a function?

jerf · on Aug 18, 2024

I endorse this, as I commented in another reply under my post that the correct cache-aware answer is another data structure entirely.

But I'd also suggest that if you think you need sorted keys, double-check. I program an awful lot of things without sorted keys, and I am quite aware of the issues around sorting, and I suspect without proof that a lot of people swearing by sorted maps are imposing false ordering requirements on their code more often than they realize. The ideal solution is not need order at all.

(I am especially suspicious of extensive use of maps where the keys are sorted by insertion order. That smells... antipatternish to me.)

kbolino · on Aug 18, 2024

As I mentioned in another reply, this simple solution is not cache-friendly.

jerf · on Aug 18, 2024

The cache friendly alternative is to use a different data structure. There is no cache-friendly iterate-in-order on the standard map.

But I've got plenty of cases where this in fact is cache friendly, because the entire map fits into L2 or even L1 anyhow because it's going to have maybe 4 keys in its lifetime. Not every map has fifty million values in it. I'm always keeping at least a little bit of track about such details when I'm using maps.

kbolino · on Aug 18, 2024

I did some benchmarks, and it seems you are right that there's no (more) cache-friendly solution in general (at least, not that I could come up with). Memoizing the full entries (key and value) into a slice and then sorting that slice by key has basically the same cache-thrashing characteristics as randomly accessing the values, and is no faster (sometimes slower).

38 · on Aug 17, 2024

the point is you dont need it. by your own admission, it would save literally 0 lines of code from your current example. you need discipline when adding sugar otherwise you can ruin a language.

kbolino · on Aug 17, 2024

I said no such thing.

First, it would save one line of code (v := m[k]). Second, it would also allow an optimization. When iterating a map directly, you have both the key and the value at the same time. However, since we iterate only the keys here, we then have to look up the value anew for each key. That takes extra time and, for large maps, will thrash the CPU cache.

So the following would be both fewer lines of code and faster:

    for k, v := range maps.Sorted(m) {
        // do something with k and v
    }

Making common operations clear and concise is not mere sugar in my opinion. It not only improves the developer experience, it also clarifies the developer's intent, which enables better optimizations, and allows bugs and traps to be addressed in one place, instead of languishing in far-flung places.