0-based indexing with closed intervals is better for slicing. This shouldn't be controversial. It's because you can represent a zero interval cleanly: [3,3) is an empty interval after slot 2, representing a single cell is [3,4).
This has two nice properties. One is that two slices are adjacent if the end of one matches the beginning of the other, and the other, far more important, is that the length of the slice is end - start.
That's the one that really gets us something. It means you can do relatively complex offset math, without having to think about when you need to add or subtract an additional 1 to get your result.
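To make that concrete, here's a minimal Python sketch (Python slices are exactly this 0-based half-open scheme):

    xs = ['a', 'b', 'c', 'd', 'e']
    xs[3:3]                       # [] -- the empty interval [3, 3)
    xs[3:4]                       # ['d'] -- a single cell, [3, 4)
    xs[1:3] + xs[3:5] == xs[1:5]  # True: matching end/start means adjacent
    len(xs[1:4]) == 4 - 1         # True: length is end - start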
I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
Of course you can use closed intervals and stick with 1-based indexing. But for why you shouldn't, I'm going to Appeal To Authority: read Dijkstra, and follow up with these:

https://wiki.c2.com/?WhyNumberingShouldStartAtZero
https://wiki.c2.com/?WhyNumberingShouldStartAtOne
https://wiki.c2.com/?ZeroAndOneBasedIndexes
> 0-based indexing with closed intervals is better for slicing. This shouldn't be controversial.
base is irrelevant to this, and you want (and show!) half-open, not closed, intervals.
> This has two nice properties. One is that two slices are adjacent if the end of one matches the beginning of the other, and the other, far more important, is that the length of the slice is end - start.
Yeah, those are all properties of half-open intervals, irrespective of indexing base. It would be as true of π-based indexing as it is of 0-based.
It's related to 0-based indexing in that if you want to take/iterate over the first `N` elements, `0:N` works with 0-based indexing + closed-open intervals, but if you had 1-based and closed-open, you'd need the awkward `1:N+1`.
This is why 1-based index languages normally use closed-closed intervals, so that they can use `1:N`.
I'm a die-hard Julian (Julia is 1-based), but I do a lot of pointer arithmetic in my packages internally. I've come to prefer 0-based indexing, as it really is more natural there. 0-based indexing plus closed-open intervals is also nicer for partitioning an iteration space/tiling loops, thanks to the property the parent commenter pointed out: the end of one iteration is the start of the next. This is a nice pattern for partitioning `N` into roughly block_size-sized blocks:
    iters, rem = divrem(N, block_size)
    start = 0
    for i in 0:iters-1    # the half-open interval [0, iters)
        # the first `rem` blocks absorb one extra element of the remainder
        stop = start + block_size + (i < rem)   # `end` is reserved in Julia
        # operate on [start, stop)
        start = stop
    end
But that's only slightly nicer. To translate this into 1-based indexing and closed-closed intervals, you'd just substitute the `# operate` line with

    # operate on [start+1, stop]

the `0:iters-1` with `1:iters`, and `i < rem` with `i <= rem`.
1- vs 0-based indexing is bike-shedding. A simple question we can all have opinions on that's easy to argue about, when it really doesn't matter much.
Julia uses 1-based indexing, but its pointer arithmetic is (obviously) 0-based, because pointer arithmetic != indexing. Adding 0 still adds 0, and adding 1 and `unsafe_load`ing will give me a different value than if I didn't add anything at all. (This is just reemphasizing the final point made by the blog post.)
I have on occasion, and when working on custom array types I've added support for being offset, with optionally compile-time-known offsets.
As StrideArrays.jl matures and I write more libraries making use of it, I may use 0-based indices more often. The idea of dynamic offsets when they aren't needed bothers me, even though I've benchmarked that as being pretty meaningless.
The bigger problem is just that OffsetArrays are a wrapper, and there's a Julia bug where TBAA information on wrappers often gets lost, so that the compiler reloads the pointer you're loading from on every iteration of a loop. Aside from that being slow itself, it also causes the autovectorizer to fail. This causes a severe regression when it occurs. Performance should be fine in other cases.
LoopVectorization or `@simd ivdep` should also avoid the problem.
For the most part, my preference on 1 vs 0 is weak, and for code other people are likely to look at I do want to make it easy to understand / not full of surprises.
The natural representation of intervals for doing arithmetic on is 0-based half-open.
Half-open because of the slicing properties, as noted in your posting and the grandparent posting.
0-based because of the simplification for converting between relative coordinate systems. Suppose you have one interval A represented as offsets within a larger interval B, and you'd like to know what A's coordinates are in the global coordinate system that B uses.
This is much easier to compute when everything uses 0-based coordinates.
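As a sketch in Python (the function is illustrative, not from any library):

    def to_global(b_start, a_start, a_end):
        # with 0-based half-open coordinates this is pure addition
        return (b_start + a_start, b_start + a_end)

    to_global(100, 10, 20)   # (110, 120): A = [10, 20) within B = [100, 200)

    # with 1-based closed intervals the same lift needs a -1 correction:
    # (b_start + a_start - 1, b_start + a_end - 1)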
I think this is a good write-up, but I think the notation is still carrying some mental baggage. It's not necessary to have these open and closed brackets/parentheses. They don't add anything, and if anything, they confuse the matter. An interval is just (1, 2) (or [1, 2] if preferred aesthetically). Since a base cannot be "on" either 1 or 2, it's not meaningful to have this open/closed interval notion. In other words, (1, 2) == [1, 2) == (1, 2] == [1, 2].
Open/closed intervals only come into play in continuous dimensions. DNA sequences, arrays in memory, et al are discrete.
On the first point, yep, completely misspoke, what you don't want is open intervals.
To the second point, as I said: Dijkstra's argument for using 0 instead of 1 with half-open intervals is, to my taste, perfect. As I have nothing to add to it, I will simply defer.
For </<= intervals and 1-based indexing you have to write `for (i = 1; i <= N; i++)`. So you lose the nice property of having `upper - lower` iterations.
For half-open intervals, you have upper-lower iterations regardless of base. In C for loop terms, it's:
    for (i = lower; i < upper; i++) {}
If you are using 0-based indexing, lower is zero and upper is the number of elements for a full iteration over all indexes; with 1-based indexing lower is 1 and upper is one greater than the number of elements.
Now, despite the utility of half-open intervals, most people’s intuition is around ordinal index ranges, so 0-based indexing is counterintuitive with half-open intervals because slices start at 0, and 1-based indexing is counterintuitive with them because they end at N+1.
This is because, useful or not, half-open intervals are counterintuitive, but they are worth developing the intuition for because they combine nicely.
> with 1-based indexing lower is 1 and upper is one greater than the number of elements.
Do you see the +1 tweak? Also consider that a for loop can't be expressed as a </* interval and is always expressed as a <=/* interval -- the loop starts at i = lower, inclusive. So 1-based indexing has to be either 1 <= i <= N (closed interval: bad, upper - lower = N - 1 != number of iterations) or 1 <= i < N + 1 (half-open interval: good, but with the +1 tweak).
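In Python terms, `range` is this half-open convention, and the tweak is visible directly:

    list(range(0, 5))   # [0, 1, 2, 3, 4] -- 0-based: upper == element count
    list(range(1, 6))   # [1, 2, 3, 4, 5] -- 1-based: upper must be N + 1
    # either way there are upper - lower iterations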
> I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
I think this is the clincher in your argument (rather than Dijkstra). I only use languages with 0-based indexing, and it seems natural to me, but I could almost be convinced that if only I used 1-based indexing regularly then I'd be fine with it too. But here you are with actual experience of using 1-based indexing regularly and you don't feel that way, which blows that idea out of the water.
A programmer “strengthens their ankle and leans forward by holding their arm out, preparing to relax the shoulder” into a bar. The bartender bursts out laughing for a few sprints.
I mess it up as well, both in Lua and C. The key to success is to not calculate values by adding or subtracting them, but to use meaningful names instead. E.g. dist(a, b), holen(a, b), endidx(start, len) and so on. Forget about ever writing “1” in your code, point your finger and call operations out loud. It doesn’t eradicate errors, but at least elevates them to the problem domain level. Off by one is not exclusive to 1-based or 0-based, it is exclusive to thinking you’re smart enough to correctly shorten math expressions in your head every damn time.
(But sometimes I also feel too clever, and after some thinking I'm just writing (i+l2+b+2), because it's “obvious” if b already comes in negative and means “from behind” in “0-based, -1 is last” terms.)
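A sketch of that naming discipline in Python (the helper names come from the comment above; the bodies are one plausible reading):

    def dist(a, b):
        # number of cells between positions a and b, half-open
        return b - a

    def endidx(start, len_):
        # one-past-the-end index of a run of len_ cells starting at start
        return start + len_

    # callers never write a bare +1/-1; every off-by-one decision
    # lives in exactly one named place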
True, but there are examples where 1-based indexing is easier, like returning the last element based on length. I think array[array.length] is easier to understand than array[array.length - 1].
Or the predecessor of the last element: array[array.length - 2] makes you think, whereas array[array.length - 1] is more obvious.
I think the most mathematically natural way to interpret negative indices is to treat them as numbers modulo the list length. Then 0 = length, and -1 = length - 1.
Negative indices have an implied ”len(list)” in front of them. They can be seen as an application of the idea of half-open intervals (0 <= i < len(list)), so it makes sense that they’re not symmetrical.
I'm not sure on what principles you'd make a principled justification. My most common use of negative indices is for slicing the end of the list, in which context the interpretation similar to len(list) makes sense. E.g., list[:-2] does what you'd expect (the same as list[2:], except from the other end).
> implicitly taking indices mod len(list)?
Isn't this doing that (plus some bounds checking)? -1 mod 4 evaluates to 3.
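A quick Python check:

    xs = ['a', 'b', 'c', 'd']     # len(xs) == 4
    -1 % 4                        # 3 -- Python's % result is non-negative here
    xs[-1] == xs[len(xs) - 1]     # True: the implied len(xs) in front
    xs[:-2], xs[2:]               # (['a', 'b'], ['c', 'd'])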
Which Lua has also, for strings, and it's quite convenient.
Doesn't work for tables, though, because you can actually put something at foo[-1], which can be useful.
I find that foo[#foo+1] is more common than foo[#foo] in my code, that is to say I add to an array more often than I access the last slot. Although I typically use the idiomatic table.insert(foo, bar) instead— but that's mostly because it's slightly awkward to add the +1 all the time.
Sure. Let's take array elements in reverse, from m to m-n. Suppose n erroneously becomes larger than m. That should produce an error, not silently take several trailing elements.
Imagine you used list[index] where index is calculated. If you miscalculated something and it went negative, you'd get a big exception/error; but when negative indices are viable, the code will "work" fine.
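That silent-failure mode, sketched in Python:

    xs = [10, 20, 30, 40]
    m, n = 1, 3
    idx = m - n    # -2: a bug, but...
    xs[idx]        # 30 -- silently reads from the tail instead of raising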
The fact of the matter is, for a type that can hold N distinct values, there are N + 1 different possible interval lengths. You cannot represent the length of an interval over a fixed-width type in that same type.
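Concretely, assuming an 8-bit index type:

    N = 2 ** 8             # an 8-bit type holds 256 distinct values (0..255)
    len(range(0, N + 1))   # 257 possible lengths: a slice of a 256-element
                           # buffer can have length 0 through 256 inclusive,
                           # so the length of [0, 256) doesn't fit in 8 bits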
Did you mean open range? I've never encountered a circumstance where closed ranges are useful, though I presume they exist.
And yes, I think any typed language with a less expressive range syntax than Ada has some work to do. That still leaves open the question of the default, and I maintain that 0 <= .. < n is the correct one.
That... makes no sense? Whether you start counting a list at 0 or 1, if your language of choice supports [x,x) notation, that syntax will effect an empty interval, and [x,x+1) will effect a single cell, no matter whether your list indexes the first element as 0, or 1. The only difference is which cell [x,x+1) refers to, which is the part that keeps sparking the controversy.
Why on Earth would I want a "zero interval" slice?
It has none of the properties I want in a slice, and that a slice of literally any other length will have. In fact, it means every slice I use needs error-handling, because I might get... something that's functionally not a slice, but is still called one. If that doesn't scream "type error" at you, I don't know what would. In fact, that's precisely why 0 is a comparatively recent invention. Most list-based problems were solved just fine before then.
All these arguments are poor rationalisation for the field's inability to move on from the 8-bit era, where the loss of 1/256 of your address space mattered and compilers had bigger problems to solve than translating indices to offsets.
You want a zero-slice because it’s a simpler base case in almost any even mildly complex algorithm. Without empty lists, you need many additional branches throughout a code base to handle the “no element” case, causing more potential bugs. There’s nothing “8-bit” about cleaner code, quite the opposite.
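For example, in Python, where the empty slice is a first-class value:

    def range_sum(xs, start, stop):
        # half-open [start, stop); the slice may be empty, and that's fine
        total = 0
        for x in xs[start:stop]:
            total += x
        return total

    range_sum([1, 2, 3], 1, 3)   # 5
    range_sum([1, 2, 3], 2, 2)   # 0 -- the empty [2, 2) needs no special branch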
Wikipedia says that Egyptians had a 0 as early as 1770 BC - almost 4000 years ago. If that's a "relatively recent" invention, what about computers?
0 comes up if you're doing any arithmetic at all. It's a natural and useful concept for almost anything. As long as you can always take something away where there is something, you'll need 0. It wouldn't be very useful to make something such as a non-empty list type and then only be able to take away elements until the last one, which must remain.
In more mathematical terms, 0 is called a neutral element (relative to the addition operation), and almost anything you might want to do (for example subtraction) requires a neutral element as a consequence.
Well, you might want a function which returns String, not Maybe(String), so that you can just concatenate, rather than handle Some(String) or None all the time.
So that's why you might want an empty String. If you have an optional element in a syntax, it can be convenient to match it with a rule using a Kleene star, and that gives you an empty Node, which would return an empty String if you request its value. And so on.
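A sketch of that convenience in Python (the `node`/`.text` shape here is hypothetical, not any particular parser's API):

    def text_of(node):
        # an absent optional node yields "", so callers can always concatenate
        return node.text if node is not None else ""

    # full = text_of(prefix) + text_of(body) + text_of(suffix)
    # -- no Some/None unwrapping at every use site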
With open intervals, you have to represent that as, say, (3, 2). Which sucks.
Just a friendly correction :-). You have repeatedly confused the meaning of open and closed. "Closed" means including the end element given in the range, and "open" means excluding it.
It may make more sense to think of intervals in the real numbers. "[1,2)" doesn't have a maximum to the right - it contains 1.9999... but not 2. Thus it's "open" to the right. Whereas "[1,2]" includes its maximum (2), so it's sealed.
The terms open and closed apply to any set of numbers (not just intervals), where "closed" means that the set includes the limit of any convergent sequence of numbers in the set, and "open" means that the set's complement is closed.
Much appreciated! The terminology and notation seemed backwards to me, and I basically never use it except when discussing Dijkstra on HN ;-)
Visually, it looks like ) has more "room" than ], if that makes sense, and that "closed" would mean that the element defines the 'lid' of the range, while "open" means it defines the final extent.
But you can't argue with nomenclature, and extending the definition to the reals helped make it click for me, so I'm unlikely to screw it up again. Thanks!
You could imagine the curved parentheses as "cutting the corner" of the square brackets - or imagine the curve as just gently kissing the endpoint at a single point, while the flat square solidly includes it. "Open" makes sense because it satisfies the criterion that for any number in the interval, you can find a larger (or smaller) number that is also in it - there's "no end".