Again on 0-based vs. 1-based indexing

patrec · on Jan 19, 2021

There are pros and cons to both zero- and one-based indexing, but in the context of Lua going with 1-based indexing was IMO a complete disaster, and one of maybe two or three significant defects that have relegated an otherwise quite elegant, small and capable language with one world class implementation to some weird niche role.

You just can't succeed in being the go-to language for embedding (in C or C++) and have 1-based indexing. It's just a complete clusterfuck and you end up with all sorts of truly terrible things, like luajit or terra being sometimes zero and sometimes one based, which also interacts in hilarious ways with the length function. I think this has done even more harm than the messed up scoping (which could largely be fixed).

jmiskovic · on Jan 19, 2021

I write Lua in my free time so I have a lot of respect for the language. Regarding 1-based indexing I don't care one way or another. I often switch back and forth between C, Python, JS and Lua. At first I was bothered by the differences and blamed Lua for being more verbose or using unconventional indexing. After a while I could just switch the language mindset and use 1-based indexing where appropriate.

I agree about the scoping though, it should default to local.

wahern · on Jan 20, 2021

The creator of Ruby admitted that implicit local was "the single biggest design flaw in Ruby." http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/...

I think what most people really want is to avoid implicit scoping altogether. But then you need a special keyword to access global environment symbols. That's a little awkward in Lua because the "require" function used to import libraries is just a regular function in the global environment table (which is otherwise just a plain table), and so `local string = require"string"` works because undeclared symbols become an index into the global environment table. And libraries like "string", "math", etc are also usually prepopulated.

A lot of people do `require"strict"` (or if you want to be pedantic, `local _ENV = require"strict"`) early in a Lua source file, which modifies the global environment to throw a runtime error instead of returning `nil` for undefined symbol lookups. (There are various versions of the strict library floating around; it doesn't come with stock Lua.) It would be better if this were a static check in the language itself. That, too, isn't a difficult addition (it could even be added to a strict library, or you could overload require), but defaults matter when it comes to nitpicking languages, I guess.
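The runtime behaviour described above can be modelled outside Lua. The following is only a rough Python sketch, not any actual strict library: `LooseEnv` and `StrictEnv` are hypothetical names, and a plain dict stands in for Lua's global environment table.

```python
class LooseEnv(dict):
    """Models stock Lua globals: reading an undefined name yields nil (None here)."""

    def __missing__(self, name):
        return None


class StrictEnv(dict):
    """Models a 'strict' environment: reading an undefined name raises at runtime."""

    def __missing__(self, name):
        raise NameError(f"undefined global {name!r}")


loose = LooseEnv(answer=42)
strict = StrictEnv(answer=42)

# A typo silently becomes None in the loose environment...
typo_value = loose["anwser"]

# ...but is reported immediately in the strict one.
try:
    strict["anwser"]
    strict_raised = False
except NameError:
    strict_raised = True
```

Either way the error only shows up when the lookup actually runs, which is why a static check would be the stronger fix.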

Maybe in Lua 5.5 or 6.0 this change will be made. Though, statically-checked scoping isn't as straightforward as you'd think given Lua's common use as a fast-and-loose sandbox for running user code, ad hoc business logic modules, etc, where a lot of preamble boilerplate can seem overly complex. But the upside to Lua having a loose commitment to backward compatibility is that it's much easier to experiment with these sorts of changes and keep iterating until they nail down a solid construct. For example, compare setfenv/getfenv in Lua 5.1 with the lexical _ENV construct in Lua 5.2.

patrec · on Jan 20, 2021

I have no problem switching between 1 and 0 based languages and have used both productively; as I said both have their pros and cons. If Lua were mostly some standalone language, or even a glue language to the extent Python is, there would not be an issue.

So if you mostly write vanilla Lua, yeah, I agree, 1 based indexing will not affect you much (unless you have some psychological hangup). But vanilla Lua is to almost all intents and purposes a vastly inferior value proposition to Python. And once you try to do some high performance stuff with luajit or terra or use luajit for embedding/glue purposes, the 1 based indexing will immediately become a point of major and utterly needless friction.

So one based indexing weakens Lua's chief differentiator: being small, fast and very pleasant to embed or interface with C or C++; luajit in particular is probably without compare in this regard. So that's why it's in my view a chief contributor to Lua's failure (not in some absolute sense, but certainly relative to the engineering brilliance that went into both PUC Lua and luajit in particular, especially when compared to languages with amateurish design, implementation or both -- such as early php, ruby or python, all of which are vastly more successful).

dimes · on Jan 19, 2021

In Europe, 0-based indexing is used for describing the floor number, e.g. the first floor is one floor above ground level. Americans may find this unnatural, but it’s perfectly normal to Europeans because they’ve grown used to the convention. It’s the same for programming languages. The vast majority of programmers have gotten used to a specific convention and it will probably stay that way until there is a compelling reason to change.

cwmma · on Jan 19, 2021

Conversely, ancient Romans used 1 based indexes for counting time, i.e. tomorrow is two days away which feels super weird to us but shows that all of these things can go either way.

lemmonii · on Jan 19, 2021

This is not the case everywhere in Europe such as in Norway and Iceland where the 1st floor is generally considered to be the ground floor. It seems to be ambiguous in Sweden.

caskstrength · on Jan 19, 2021

JFYI, xUSSR Eastern European countries count floors from 1.

physicsguy · on Jan 19, 2021

Makes sense when you consider -1 as a basement ;)

AnIdiotOnTheNet · on Jan 19, 2021

Yeah, as an American I think Europe got this one right.

BenFrantzDale · on Jan 20, 2021

I’ve seen elevators labeled that way!

Tijdreiziger · on Jan 20, 2021

I've not seen many elevators in Europe that aren't labeled that way.

ksec · on Jan 19, 2021

That is a great analogy. Thank you. I finally understand why 0-based indexing is so natural to a certain type of person.

Turns out there is a Wiki entry [1] on it

[1] https://en.wikipedia.org/wiki/Storey

spacemanmatt · on Jan 19, 2021

Americans may recognize the term "ground zero" more directly from domestic sources.

dunefox · on Jan 19, 2021

I switch between Python and Julia frequently with Julia being my main language now (formerly Python). It's not as big a problem as people seem to think. If this is your biggest difficulty you're not doing difficult stuff.

klmadfejno · on Jan 19, 2021

Same. I haven't made a mistake on this issue for a long time despite switching languages with conventions frequently.

The only thing I will say is terrible is that I believe Julia had something you could do to make it a 0-indexed language just for you? That seemed like a terrible idea, haha. I have never seen anyone use it, thankfully.

eigenspace · on Jan 19, 2021

You can't make the language 0 based, that would indeed be a terrible idea. Instead, you can make array types which have whatever indexing behaviour you like.

Because of the focus on generic code, these custom array types are basically first class citizens. See for example https://github.com/JuliaArrays/OffsetArrays.jl

Almost every serious julia package is written in such a way that they can handle such arrays correctly.
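To make the idea concrete outside Julia, here is a minimal Python sketch of a container with a configurable first index. It is not how OffsetArrays.jl is implemented; `OffsetList`, `indices` and `total` are hypothetical names chosen for illustration. The point it tries to show is that generic code asks the container for its valid indices instead of assuming a base.

```python
class OffsetList:
    """A list whose first valid index is `first` instead of 0 (illustrative only)."""

    def __init__(self, items, first=1):
        self._items = list(items)
        self.first = first

    @property
    def last(self):
        return self.first + len(self._items) - 1

    def _to_offset(self, index):
        # Translate the user-facing index into a 0-based storage offset.
        offset = index - self.first
        if not 0 <= offset < len(self._items):
            raise IndexError(f"index {index} outside {self.first}..{self.last}")
        return offset

    def __getitem__(self, index):
        return self._items[self._to_offset(index)]

    def __setitem__(self, index, value):
        self._items[self._to_offset(index)] = value

    def __len__(self):
        return len(self._items)

    def indices(self):
        """The valid indices, whatever the base happens to be."""
        return range(self.first, self.first + len(self._items))


def total(arr):
    # Generic code: never hard-codes 0 or 1 as the starting index.
    return sum(arr[i] for i in arr.indices())


one_based = OffsetList([10, 20, 30], first=1)
centered = OffsetList([10, 20, 30], first=-1)
```

Here `total(one_based)` and `total(centered)` give the same result even though `one_based[1]` and `centered[-1]` name the same element.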

craftinator · on Jan 19, 2021

I think it's less about difficulty, and more about causing bugs when you're having to switch back and forth between indexing styles. It's just a footgun; it's not difficult to not pull the trigger, but if you do it often enough, statistics start to kick in.

conradludgate · on Jan 19, 2021

My argument for 0 based indexing is working with multidimensional indices.

If I have a 3 row and 2 column matrix stored as a single array of 6 elements, I might want to index it using the row,col index.

With 0 based indices, this is fairly simple. `m[r, c] = m[2r+c]`

With 1 based `m[r, c] = m[2r+c-2]`

This is similar to the offset based argument but I think it is more related to the actual index rather than the offset.
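A small Python sketch of the two formulas for the 3x2 row-major case above; the function names are hypothetical, and the 1-based version is written so the extra corrections are visible.

```python
ROWS, COLS = 3, 2


def flat_index_0(r, c, cols=COLS):
    """0-based row/col -> 0-based flat index."""
    return cols * r + c


def flat_index_1(r, c, cols=COLS):
    """1-based row/col -> 1-based flat index.

    Shift both coordinates down to 0-based, combine, then shift back up.
    For cols == 2 this simplifies to 2*r + c - 2.
    """
    return cols * (r - 1) + (c - 1) + 1


zero_based = [flat_index_0(r, c) for r in range(ROWS) for c in range(COLS)]
one_based = [flat_index_1(r, c) for r in range(1, ROWS + 1) for c in range(1, COLS + 1)]
```

`zero_based` enumerates 0 through 5 and `one_based` enumerates 1 through 6, so both schemes address all six elements; the difference is only the correction term that has to be carried around.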

olodus · on Jan 19, 2021

I have no opinion either way. It really is such a non problem imo.

I can definitely see the reasoning for using 1-based since you really are using cardinals (or ordinal or whatever). This is made maybe most clear looking at Haskell or other FP langs that usually have fst & snd functions.

At the same time, I could also see an array as sort of a map between ints and an object in the array. This would allow for any number to be used there and why not just start with 0 then, since that is the "default" / first defined value of an int.

Actually haven't read Dijkstra's take on it. Will go do that.

AnIdiotOnTheNet · on Jan 19, 2021

Eh... I'm not convinced on the argument of offset != index, because the reality beneath arrays and lists is that it is offset based.

Arrays are the obvious one: assuming no associated metadata they are an address to the first (zeroth?) element.

Even things like linked lists are essentially offset based: It's an address of the first (zeroth) node. The next node is 1 node away from that.

In general I don't think it is a good idea to abstract too far away from how things actually work. And in that sense, 1-based indexing in Lua does make sense since Lua is table based and '1' here is an arbitrary identifier.

snicker7 · on Jan 19, 2021

To add: scientific-oriented languages (FORTRAN, Octave/MATLAB, R, Julia, ...) typically use 1-based indexing. I think it's largely cultural.

commandlinefan · on Jan 19, 2021

“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.” — Stan Kelly-Bootle

DarkWiiPlayer · on Jan 19, 2021

I say this a lot and I will say it again: Languages that think they're C when in reality they're not make me sad.

_y5hn · on Jan 19, 2021

I'm sure there's some type system out there that can fix this!

BugsJustFindMe · on Jan 19, 2021

The thing that people who don't really know Lua don't get is that Lua doesn't give a shit whether you start at 1 or at 0. Lua doesn't have arrays, it only has tables, and both 0 and 1 are just keys, and having a 0 key is totally a thing you can do.

Tables can be used as lists or dicts or arrays, but, critically, the way you use them is already different (see e.g. pairs vs ipairs, being able to reference string indices with the "." operator but not numeric indices, and whether using # does something stupid), so feel free to index them differently too. The only operations that care in Lua are ones that apply to either listish tables or dictish tables (insert, pairs, ipairs, the stupid footgun # operator) but are irrelevant to arrayish tables.

So if what you want is an arrayish table, go for it. Give yourself a key 0. If you want to start your array at 0 because you want to use modulus cycling, do it. If you want to start at 0 because you want to collapse multi-dimensional arrays into a single dimension, knock yourself out. Lua doesn't care.
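For the modulus cycling case, a quick Python sketch of why a key 0 is convenient. The helper names are hypothetical; the only claim is the arithmetic.

```python
def wrap_0(k, n):
    """Wrap any integer k onto the 0-based range 0..n-1."""
    return k % n


def wrap_1(k, n):
    """Wrap any integer k onto the 1-based range 1..n.

    Needs a shift down before the modulus and a shift back up after it.
    """
    return (k - 1) % n + 1


n = 4
cycle_0 = [wrap_0(k, n) for k in range(0, 9)]
cycle_1 = [wrap_1(k, n) for k in range(1, 10)]
```

Both cycle correctly; the 0-based version is just the bare `%`, while the 1-based one carries two extra corrections that are easy to get wrong.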

patrec · on Jan 19, 2021

> The thing that people who don't really know Lua don't get is that Lua doesn't give a shit whether you start at 1 or at 0.

I don't really understand why this claim comes up every time in these discussions, it's obviously not true. Like one of the most important operations with a dynamic array is figuring out its length (and not by iterating over all of it). This breaks if you don't use 1-based.

BugsJustFindMe · on Jan 19, 2021

> and not by iterating over all of it

Lua's length implementation is a big complex iterative search because "length" isn't even defined as the length of the object but rather the length of the first contiguous sequence of non-nils, so good luck applying that premise!

The "length" function in Lua is an extremely annoying footgun, but nothing you said is an argument for indexing non-arrays starting from 0.

patrec · on Jan 20, 2021

Vanilla lua does a logarithmic search (still much better than a linear scan you'd get with ipairs), but I don't think that's typically true in luajit, although I admit I might be quite wrong about this.
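As a rough illustration of what such a logarithmic search looks like, here is a simplified Python model over a plain dict. It is not PUC Lua's actual implementation (which also has a separate array part); `border` is a hypothetical name, and "present key" stands in for "non-nil value". It returns some n where key n is present and key n + 1 is absent, searching upward from key 1.

```python
def border(table):
    """Return an n with n in table and n + 1 not in table, or 0 if key 1 is absent."""
    if 1 not in table:
        return 0
    lo, hi = 1, 2              # invariant: lo is a present key
    while hi in table:         # double until an absent key is found
        lo, hi = hi, hi * 2
    # Now lo is present and hi is absent: binary search between them.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid in table:
            lo = mid
        else:
            hi = mid
    return lo


dense = {i: i * i for i in range(1, 11)}   # keys 1..10
zero_based = {0: "a", 1: "b", 2: "c"}      # keys 0..2, three elements
holey = {1: "a", 2: "b", 4: "d"}           # gap at key 3

dense_len = border(dense)
zero_len = border(zero_based)
holey_len = border(holey)
```

In this model the zero-based table reports 2 although it holds three elements, and the table with a gap reports a border past the hole, which is the kind of surprise the two comments above are arguing about.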

Here is some relevant Luajit code: <https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_tab.c#L690... it definitely seems to do a constant time lookup for the length in some cases, only falling back to binary search if the array contains no valid length hint. Also, I'm pretty sure most of the linear algebra libs I've seen overload # for vector/matrix/array classes (and give O(1)); IIRC it's also not at all uncommon to do that if you create some C array backed data structure via luajit's FFI. Regardless of "length", the fact that e.g. ffi.new("double[3]") will be zero and not one based is already a major source of friction.

In any case, in my opinion this is not first and foremost a problem with "length" as such. I think the overall conclusion is that the fairly bold move to have a single all-purpose hybrid container data structure was not a success and is not something that other languages should try to replicate. Even luajit fails to make this work really well, despite heroic efforts.

BugsJustFindMe · on Jan 20, 2021

> the fairly bold move to have a single all-purpose hybrid container data structure was not a success and is not something that other languages should try to replicate

I agree with this. It created more complication than it reduced. But the length operator is still additionally terrible and does something other than what most people want.

The_rationalist · on Jan 19, 2021

0 based indexing sure is a harmful and needless source of cognitive overhead in most cases. It would be nice to have a list/wiki of all common yet considered harmful language constructs/behaviors. That would give us a clear view of what a new language optimized for brain speed and accuracy would look like.

recursive · on Jan 20, 2021

Seems quite natural to me. Who determines what's overhead? To me, 1 indexing is more mental effort.

The_rationalist · on Jan 21, 2021

Humans, including devs, think in English. You think "take the first element of the list", not "take the zeroth element". Cognitive overhead can often be formalized as a mismatch between the syntax (here zero based indexing) and the intended meaning.

Viliam1234 · on Jan 19, 2021

What about using different types of brackets for different types of indexing?

    p[0] = p{1}

The array length is the same either way, {} is syntactic sugar for subtracting 1 from the index.

Also, p«0.5» for the indecisive.