One thing I would hope there is consensus on is that 1-based indexing is easier to learn. My son is doing Python at school, and the "first-element-is-actually-element-zero" idea is something that has to be dinned into them, a sure sign that it is non-intuitive. In a similar vein, as adult programmers we know that the half-open range is the most useful mechanism for expressing a sequence, but again we have to explain to kids that if they want the numbers 1 to 12 (e.g. to do a times-table program) they must type range(1, 13), which at that stage of their learning just seems bizarre. Actually I could go on at length about why Python is a terrible teaching language, but I'll stop there!
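For reference, this is roughly the loop the kids end up writing (a minimal sketch; the 7 times table is just an arbitrary example):
for i in range(1, 13):          # "1 to 12" has to be typed as range(1, 13)
    print(f"7 x {i} = {7 * i}")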
It is non-intuitive, but it doesn't happen only in programming. Look at sports reporting, for example (basketball, football). When they say something happened "in the 6th minute", it happened between 05:00 and 05:59. The difficulty isn't solely with programming.
You can use the same mechanism (minutes:seconds) to explain the half-open ranges: does a range(0,30) mean from 0:00 to 30:00 or from 0:00 to 30:59? If the second half of the match starts at 30:00, does the first half really end at 30:00 or is it really 29:59:99...9?
For most basic concepts, there's always real-world examples to use so that the teaching doesn't have to be abstract.
There's another non-intuitive and also inconsistent usage of ranges that I find rather confusing: if you say that a meeting is from 9 to 10, it's one hour long. If you say that you'll be on vacation from January 20 to 21, most people seem to think that means two days.
That's because hours are announced as points in time, while days are seen as time intervals. The duality between intervals and separators is essential to the different ways of understanding ranges (especially for range differences).
I once developed a shortcut notation to simplify reasoning about ranges, differentiating between "point zero" and "the interval between 0 and 1", to make off-by-one errors less likely, but in the end it was functionally equivalent to the popular open/closed interval notations.
Specifying vacation ranges got me into a few confusing situations, so I now have a habit of announcing them as: "I'm on vacation from X to Y, returning to the office on Z"...
1-hour meeting -> the meeting starts at 9 and is expected to finish when it's no longer 9.
1-day vacation -> the vacation starts on January 20 and is expected to finish when it's no longer January 20.
Ultimately, it relies on conventions, but I guess now you can see how this one relies on something which has some consistent pattern.
I thought the proper way to say this is January 20 through 21, but seems like most people don't follow this. Probably because it's often abbreviated as 20-21, which is often used for both cases.
Time is a continuous variable, and kids know this pretty early on: they will happily tell you they are 5 and a half years old. Their second year starts when they are aged 1.0.
English in general makes this distinction: "He is fewer than 6 years old" is simply a mistake, while "less than 6 years old" is fine. (I think some dialects use "fewer" less often, in general, but would be pretty surprised if any prefer it here.)
To be pedantic, it's the first annual celebration of your birthday.
Other languages use a less confusing term for it: anniversaire (French) is just anniversary, verjaardag (Dutch) roughly means year-rollover-day, and cumpleaños (Spanish) signifies completion of another year.
I just realized something: anniversaire has the same root as the other examples I gave. It comes from the Latin words annus and verso, which mean "year" and "to turn over" respectively.
late 14c., from Old English byrddæg, "anniversary or celebration of one's birth" (at first usually a king or saint); see birth (n.) + day. Meaning "day on which one is born" is from 1570s. Birthnight is attested from 1620s.
https://www.etymonline.com/word/birthday#etymonline_v_27158
In Korea, they'd be age 2. There (and possibly in other East Asian nations?) "age" is counted in terms of which year you're in: 1st year, 2nd year, 3rd year, etc. The way Americans do school years, except it starts at birth.
When my Korean friend first told me this, I honestly thought they were messing with me. I still struggle to make sense of it. Must be a helluva birthday party on Jan 1st tho
I know there are studies on how birth date affects future success in school (though IIRC I read it in Gladwell, and I'm wary of how accurate his read on things is); I wonder if that effect is compounded with this system?
What we call “age” is merely the number of the most recent birth anniversary we’ve passed. That makes sense. While your take isn’t invalid, it sets up a less intuitive construct that isn’t aligned with the common meaning of the word.
You're supposed to start out by teaching them pointers. Once they fully understand how pointers work, you move them on to Hello World so they can write their first program. Then it will be intuitive when they see 0-based indexing. I think that's the HN consensus on teaching kids how to program.
That's an excellent idea! We should also teach them quantum physics so instead of questioning programming language flaws they would question reality. Another problem solved.
I think we are on track to create the perfect programmer factory.
Honestly a lot of the confusion that I've seen in novice programmers over the years comes from the fact that nobody explained to them (1) that a program is a set of instructions that are executed one after the other by the computer, and/or (2) that memory is like a long line of boxes, each of which can hold a byte.
Once someone understands (2), pointers are incredibly simple and obvious.
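A minimal sketch of that mental model, using a Python bytearray purely as a stand-in for "a long line of boxes" (Python hides real addresses, so this is only an illustration):
memory = bytearray(8)   # eight boxes, each holding one byte (0-255)
memory[3] = 255         # put a value into the box at position 3
print(memory[3])        # read it back: 255
print(len(memory))      # how many boxes there are: 8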
I have been surprised several times, when helping people to write "their first program" (which, in science, isn't that rare), that it can take days to internalize the relevance of ordering statements in a program. It's deep in every coder's subconscious that the bottom line depends on the top one and not the other way around, and I have had to consciously restrain myself from saying "whaaaaat?!" when I saw how my colleague tried to do something.
I'm glad you've learned to restrain yourself because that mistake makes sense if you're not used to imperative thinking. It doesn't benefit students to have instructors who are shocked by common/natural errors.
Math, as commonly encountered, is a collection of facts and relations. Its notion of a variable is very different from the one in CS and programming.
To a mather (one who uses/does math), this makes sense:
x = 7 * y
y = 6
therefore x = 42
To a coder (of most conventional programming languages) that's an error, y is undefined when x is assigned a value. And even if it did have a value, changing it later wouldn't change x so who knows what x could be.
It makes sense to the mather because each statement exists at the same time. To the coder, each statement is realized over a period of time. To the coder "y = 6" hasn't happened yet when "x = 7 * y". This is the part that has to be communicated to the novice with a mathematical background.
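A minimal Python sketch of that difference - the mathematician's ordering is an error, and reordering also shows that later changes to y no longer affect x:
# x = 7 * y   # NameError: y has not been assigned yet
y = 6
x = 7 * y     # x is 42 now
y = 0         # x is still 42; the assignment already happened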
There are non-imperative logic programming languages (Prolog) where something like that would work. And those of us who started with imperative coding are probably just as confused by them in the other direction.
That's why I qualified the statement with "conventional". Prolog is not a conventional language, in the sense that most programmers don't know it (and I'd wager close to half of programmers have never even heard of it, if not more). C, C++, Java, JavaScript, PHP, Python, Perl, Ruby, even Rust are all imperative languages at their core and represent the dominant group of languages in conventional use.
If the OP was teaching people a language like Prolog then there wouldn't have been the same confusion.
I actually started writing a comment about that. Teaching VHDL (my experience) to most programmers is an entertaining experience because they expect effects to be immediate, not executed collectively and simultaneously but mediated by a clock or other mechanism. So in an HDL:
x <= y;
y <= y * 20;
These two effects actually occur simultaneously, which is different again from what mathematically inclined learners might expect since there is still mutation. It is more akin to the common notation:
x' = y
y' = y * 20
Where the ' indicates the next value of x or y (as opposed to the derivative). Or perhaps:
x_n = y_{n-1}
y_n = y_{n-1} * 20
And many programming languages don't even have a good notion for similar parallel evaluation and assignment. Python does, with:
x, y = y, y * 20
Common Lisp has psetf which permits this:
(psetf x y
       y (* y 20))
Languages with pattern matching/destructuring bind are your best bet for writing this directly without needing temporary variables.
What the typical programmer doesn't even realize is that this is an artifact of von Neumann architectures (and their assembly languages) and of the optimization of early compilers, not an essential property of computation.
Spreadsheets, expression-rewriting languages or any other functional programming language don't necessarily have this limitation, and declarations can be made in any order if the compiler supports it.
I've noticed this as well, and I've concluded that the actual surprising part for new coders is mutability. In basically every domain where non-developers invent algorithms, everything is immutable - spreadsheets, math! Heck, even reading an article - the words you've already read don't change their meaning just because you've kept reading.
Mutability is really a result of the fact that computers are physical machines. So I think if you're going to teach software development, you should really start at the machine architecture and build up (à la NAND to Tetris) or start with an immutable-by-default language and only dip into mutation as an "advanced" concept, only to be used by experts and definitely not by beginners.
I endorse the second way, as I've seen it work very well, whereas I've seen more than one ship crash on the shore of understanding pointers...
I once had the surprising experience of having to help a postdoc with a PhD in mechanical engineering wrap his head around a block of Python code he was reading that looked like:
x = f0()
x = f1(x)
He was just mentally blocked on how all of these things could be true at the same time, and/or on how a variable could be used in setting its own value, since assigning a new value to x would change the value that had been passed to f1! His intuition was that this meant some kind of recursion-to-stability was at play.
All he needed was to be informed that this could be equivalently represented with a distinct name for each intermediate value, something like (names here are illustrative):
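x0 = f0()
x1 = f1(x0)   # nothing is ever "redefined"; each line reads as a plain equation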
Someone I knew at university, who prefers starting at 0 (for mathematical, rather than computational, reasons) made the following argument/proposal.
"First" is an abbreviated form of "foremost". So if you start counting at 0, "first" corresponds to index 0. There's no real clash here yet; "first" and "one" are entirely separate terms.
"Second" is from Latin "secundus" meaning "following"; the second thing follows the first. So "second" corresponds to index 1. Again, no clash yet; "second" and "two" are basically unrelated.
"Third", though, actually derives from "three". So "third" has to correspond to index 3.
This leaves a gap, which might perhaps be filled by a new word "twifth", corresponding to index 2.
So now we have the cardinal numbers: zero, one, two, three. And the corresponding ordinals: first, second, twifth, third.
The hypothetical student's sarcastic answer isn't technically correct.
> you can just show them a ruler and
> ask them where does the first centimeter start(s)
The question involves a nuance of the English language, the distinction between a measure of wholeness and the contents of that unit.
Length is a count of the completeness of a measure. The length of an interval must inherently include 0 as a unit to describe taking no units.
A 0 starting index is likewise also required to indicate skipping no storage cells.
Therefore an absolute index must start at zero, and the addition of a duration must also include the option of zero, as a means of describing a point at which to begin while having not yet used any length.
The subtle trick is that this is a point, an array of zero size; this is why the length, or alternately the absolute stop point, for an array storing anything is one higher. It isn't actually zero-indexed; it's a different property of measure (the size, or stopping address).
A centimeter is also a wonderful example, since it should graphically contain eight additional ticks for millimeter marks, and naught else, within its inner span. This aligns wonderfully with explaining that there are target spots for the 8 bits of an octet within that single byte of indexed storage.
It's the offset of the first array element from the start of the array. The first element is at the start of the array, so the offset is naturally zero—there are zero elements prior to the first element. Like the markings on a ruler, array indices are cardinal numerals (0, 1, 2, …) counting the distance from a reference point, not ordinal numerals (1st, 2nd, 3rd, …) representing rank in a sequential order.
If you're comparing arrays to centimeters on a ruler, the first element is at the center of the first cell; at zero, there's the left border of the first cell.
(Also please note that both this and my previous reply are tongue-in-cheek).
There's a difference between a cm being the thing that exists between the lines marked 0 and 1, and the distance that is 1 cm. Indexing is within the scope of the first definition. You might as well ask where the 0th or 1st element in an array starts and finishes.
Imperial: because of its vagueness, people use other measurements as well.
Something is x stories high.
Something is a "stones throw away".
Though I do appreciate Fahrenheit for how it scales with humans: 0 is really damn cold, 100 is really damn hot, while in Celsius 0 is kinda chilly and 100 is instant death.
It's actually supposed to be body temperature; due to imprecise measurement, it actually corresponds to a slight fever. Side note: C scales the same way - 0 is the freezing point of water, and then you have the same scaling as F but in steps of 5 instead of 10 [and you've got to remember: C = (F - 32) * 5/9].
That's an old debate, yet if you are born/used to C, F is pretty horrific.
I agree, but Python doesn't have pointers. So I agree with your parent that Python isn't a good language for teaching.
Unfortunately teaching languages that are good for learning, like Scheme or Pascal, went out of fashion a long time ago. For years now universities have been teaching marketable languages. Ten years ago that was Java. Now it's Python.
It's hilarious that people are taking this comment literally. HN comedy gold. Reminds me of when I was working with a German work colleague and he just could not pick up on English sarcasm.
There are two groups of programmers: those who started learning with low-level languages, and those who started with high-level languages. I'm personally in the former group, and think there are definite advantages to doing it that way. You are probably in the latter.
Sure, for adults, that's fine (but I still disagree with the approach, as pointers are a notably tricky concept that really isn't necessary in the beginning). But it's like saying that before teaching a kid addition and multiplication they should understand the Dedekind–Peano axioms -- it's simply overkill and more likely to discourage them than anything else. (But it depends on age and aptitude, of course, so obviously YMMV).
FWIW I started with K&R, but times have changed and I definitely don't think it's a pedagogically sound way of teaching programming.
Back in the mid-to-late 20th century, there was a movement to teach mathematics rather that way, though I don't think it ever got as far as teaching small children the Peano axioms. The "New Mathematics", they called it. I remember encountering talk of sets and functions in UK primary school circa 1975.
It never worked very well in practice. I expect that when you're fortunate enough to have unusually able teachers teaching unusually able pupils it might do OK -- but then, in that situation, anything will do OK.
(Tom Lehrer fans will know about this from his song "New Math", with "It's so simple, so very simple, that only a child can do it" in the refrain.)
For the record: Lehrer-style "new math" was what I was taught at elementary school in Ontario in the 1980s. I know it was not what my parents had been taught, because they couldn't understand how we did subtraction. We did do some other bases (mainly binary, as I recall) as well as modulus.
I don't think it's comparable to teaching arithmetic from a highly theoretical basis, since the type of low-level beginnings I'm advocating for are starting from a concrete foundation --- logic gates from relays and such, not unlike the approach used by Petzold's book Code.
Incidentally, that book has been praised for making computation easy to understand for non-programmers, which I would consider to be evidence for starting out low-level.
> but times have changed
Not for the better, if you look at the average state of software today --- and I think a lot of it can be explained by the general lack of low-level knowledge.
A pointer (or reference, or just 'address') is a general concept that by definition is part of any language featuring mutation.
Any language that makes use of mutation for primitive constructs (not saying that's the case here specifically) needs an understanding of "observing/causing change" pretty early on.
> is a general concept that by definition is part of any language featuring mutation.
One counter-example is Mathematica. I can mutate any object any time I like, and doing so is extremely common. Yet under "pointer" its help tells me about some unicode symbols, and some GUI stuff about mouse pointers.
In reality, some of its lists are dense memory like C, some contain arbitrary objects (such as expressions, or high-precision numbers), but these implementation details aren't part of the language, and most users never know them.
Mathematica is also not a low-level programming language. It's math-based, so it deals with abstract concepts, not concrete hardware implementations.
0-based arrays are based on hardware implementations; 1-based arrays are based on abstract collections; you could make the argument that 0-based arrays are a leaky abstraction, but the question then becomes: what abstraction? C explicitly tried not to abstract away the underlying hardware.
If "any language" means "any language sufficiently low-level to have C-like pointers", then no contest!
Surely the point of any language at all is to abstract away some details of the underlying hardware. Both for portability and to make things easier for human brains. How much you should abstract away obviously varies, based on what you're trying to do, and on how varied the hardware is.
And it depends on how much computation & memory you can give your compiler. One reason new languages can have nicer abstractions is that they don't have to compile on 1970s hardware.
0-based arrays/indexes are also based on abstract concepts. 1-based ordinal numbers (in both maths and language) are just tradition from time before 0 was generally accepted. Set theory (as a clean design of math foundations) also starts ordinals at 0.
The specific word or implementation isn't important.
In Mathematica's case (and most other languages, sadly), presumably the concept of 'variable' is overloaded with the semantics of a pointer/reference. Meaning, you cause and observe changes to locations via a variable.
At this point even assembly language is an abstraction of a simpler machine overtop of a much more complicated one, and few programmers really understand what's happening underneath. It's not like a chunk of 'memory' in your C program corresponds exactly to a physical location on a DRAM chip anymore. There's N layers in-between, not just in the OS's virtual memory model but also in the chip's caches, etc.
We're all dealing with abstractions over abstractions at this point, so it's really about finding what model of understanding works for the learner and then going from there.
Terrible analogy. The beauty of computer science lies in abstractions. You don’t have to understand the low-level details to be productive at the higher levels; you just have to understand the interfaces. If we go down this rabbit hole, do you have to understand how the memory bus works, how the operating system is implemented, etc., before getting to hello world?
There is a huge difference between explaining how indexes come from offsets and understanding operating systems. I think more emphasis should be put on the insane amount of abstraction underlying the high-level language at hand when teaching programming. Explaining that 0-based indexing comes from offsets seems like a good start in that direction for a kid learning Python.
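A minimal sketch of that explanation, peeking below Python's list abstraction with the standard struct module: the element at index i of an array of 4-byte integers lives i * 4 bytes past the start of the buffer, so the first element sits at offset 0.
import struct

buf = struct.pack('<4i', 10, 20, 30, 40)  # four 32-bit little-endian ints
i = 2
value, = struct.unpack_from('<i', buf, offset=i * 4)  # index -> byte offset
assert value == 30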
And when beginners won't be beginners anymore they'll have to learn how the underlying abstractions work anyway.
A programmer that doesn't at least superficially understand how operating systems work is not going to be able to write performant or secure software, let alone combine those qualities.
> You don’t have to understand the low level details to be productive at the higher levels, you just have to understand the interfaces.
I've worked with programmers like that. They can glue a lot of things together, but when the slightest of problems appears, they are absolutely lost with how to debug. They usually resort to making pseudorandom changes until it "appears to work", and are also unable to appreciate just how fast computers are supposed to be.
I learned physics and transistors in comp-sci. It didn't help with programming at all (though it did wonders for my understanding of the philosophy of computation).
The kind of understanding you're rooting for relies on formal logic, not electronics.
That depends entirely on the kind of programming you're trying to do. For beginners working in high level languages it is an implementation detail they need not be concerned about.
Growing up with BASIC, I didn't have any understanding of pointers or how memory worked. PEEK and POKE, which I only saw used in other people's programs and examples, were black magic as far as I could tell. It wasn't until I learned C that I actually understood them but I still managed to write a whole lot of programs without ever using them.
To be a professional programmer I'd agree though. If you don't understand how pointers work, regardless of the language you're using, you're probably not good at your job.
Until you need to substitute ingredients, or you use a different type of rangetop, or the pan metal or thickness is different, or something else changes. This was a perfect analogy. People can follow the recipe of assembling various frameworks and toolkits to create software, but they will eventually encounter difficulty.
You may not need to understand molecular differences, but knowing how fats, proteins, and carbs work, along with how to substitute ingredients, and different theories of cooking, all help you actually understand why you can cook fine until, mysteriously, things don't work the same as yesterday.
It is said that each fold on a chef's hat represents a different way they know how to cook an egg. They need to understand when to apply low heat, when to apply high heat, when to add liquids, when proteins denature and when they coagulate.
Of course, following a recipe in specific ideal circumstances works, as long as nothing changes.
That's how they used to do it in universities back in the 80s/90s. Start out with the hard stuff first in Intro to Computer Science. By the end of the course, you'd go from 400 people to about 50. Wish they'd still do that to keep numbers low.
Although doing it "to keep numbers low" doesn't seem to me to be a good reason to do it, it is generally a good idea to have some hard stuff early in a field.
The idea is that if the hard stuff comes too late, people might not find out that they aren't really suited to that field as a career until it is too late to change majors without having to take an extra year or two of college.
The intro class is too early for that, though, because CS is useful enough for other fields that many people (perhaps even a majority of people) taking intro CS don't intend to actually go into CS. Better is to put it in the first class in a field that will mostly only be taken by people who are intending to go into that field.
The best college courses I took were ones where the professor frontloaded all the hard assignments towards the first month of the semester. Every slacker dropped out, and for most of the semester we just had great discussions because everyone was doing the readings.
Just like DRM, make it a little harder to skate on by and you actually filter a lot of casual/non-serious people.
You are going to get downvoted to hell for "gatekeeping" but you're right.
Instead of just graduating 50 people who "get the trade", we're now diluting them with another 350 that "made the grade". Then tech firms need wild, crazy interview processes to sort them back out. Everybody suffers, including the 350 who may have been productive and happy in another, more suitable profession.
And it's weird that people see it as gatekeeping, since there are plenty of other professions out there that artificially restrict supply, and no one raises a stink about them. People do about software because we work online and are constantly giving opinions no one asked for.
It's less about gatekeeping, and more about sustaining a healthy industry that is quickly headed for a disastrous race to the bottom. Google and friends are spending billions under the magnanimous banner of "bringing coding to everyone" when they really just want to lower developer salaries. Not everyone needs to learn how to code, just like not everyone needs to be a doctor, or learn how to be a carpenter. I constantly wonder when will be the right time to "get out" before salaries start their inevitable slide downwards. Could be a couple decades though.
Honestly, the problem is that we don't distinguish cardinal from ordinal numbers in programming.
As an offset (which is cardinal—how far something is from a starting point) or really any cardinal use, zero is the natural starting point. For ordinals, “1st” is the natural starting point. But we don't write “1st” in programming; even when we use 1-based indexing to try to match an ordinal intuition, we just write “1”. If we distinguished ordinals and allowed both cardinal offsets and ordinal indexes to be used, I think we’d be fine.
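As a rough, hypothetical sketch of what "allowing both" could look like (the class and method names here are invented for illustration, not an existing API):
class Seq:
    def __init__(self, items):
        self._items = list(items)

    def at_offset(self, i):
        # cardinal: how many elements precede the one you want (starts at 0)
        return self._items[i]

    def nth(self, n):
        # ordinal: "1st", "2nd", ... (starts at 1)
        if n < 1:
            raise IndexError("ordinals start at 1st")
        return self._items[n - 1]

xs = Seq("abc")
assert xs.at_offset(0) == xs.nth(1) == "a"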
The unintuitive step is apparently identifying the index with the cardinality of the collection from before the item arrives, rather than after it arrives – i.e. ‘the n-th element is the one that arrived after I had n elements’, not ‘the n-th element is the one such that I had n elements after it arrived’. The former identification results in 0-based ordinals, the latter leads to 1-based ordinals – and to some misconceptions about infinity, such as imagining an element ‘at index infinity’ in an infinite list, where no such element need exist.
It's only an ordinal because we've confused ourselves into saying it can be. It's the same situation as with starting to count from zero in most programming languages, and from the wikipedia page for "0th", it appears to actually be the effect of an influence of those artificial language choices onto the mathematical concept of ordinal. After all, it is in the end just a convention, and we could start calling the first element of a sequence "the zeroth" instead, but there would be no benefit for it at the language level - which is where it matters. These artificial languages we're creating should try to mold themselves to how we already use language, but the opposite is what's been happening to some degree.
"1st" is not a "natural" choice for the starting point of ordinal numbers.
If anything, it is an artificial choice, because in programming it is derived from a property of most, probably of all, human languages that have ordinal numerals.
In most, probably in all, human languages the ordinal numerals are derived from the cardinal numerals through the intermediate of the counting sequence 1, 2, 3, ...
When cardinal numbers appeared, their initial use was only to communicate how many elements are in a set, which was established by counting 1, 2, 3, ...
Later people realized that they can refer to an element of a sequence by using the number reached at that element when counting the elements of the sequence, so the ordinal numerals appeared, being derived from the cardinals by applying some modifier.
So any discussion about whether 1 is more natural than 0 as the starting index goes back to whether 1 is more natural as the starting point of the counting sequence.
All human languages have words for expressing zero as the number of elements of a set, but the speakers of ancient languages did not consider 0 as a number, mainly because it was not obtained by counting.
There was no need to count the elements of an empty set, you just looked at it and it was obvious that the number was 0.
Counting with the traditional sequence can be interpreted as looking at the sequence in front of you, pointing at the right of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
It is equally possible to count by looking at the sequence in front of you, pointing at the left of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
In the second variant, the counting sequence becomes 0, 1, 2, 3, ...
Human languages do not use the second variant for two reasons: one is that zero was not perceived as having the same nature as the other cardinal numbers, and the other is that the second variant has one extra step, which is not needed, because when looking at a set with 1 element it is obvious that the number of elements is 1, without counting.
So neither 0 or 1 is a more "natural" choice for starting counting, but 1 is more economical when the counting is done by humans.
When the counting is done by machines, 0 as the starting point is slightly more economical, because it can be simpler to initialize all the bits or digits of an electronic or mechanical counter to the same value for 0, than initializing them to different values, for 1.
While 1 was a more economical choice for humans counting sheep, 0 is a choice that is always slightly simpler, both for hardware and software implementations and for programmers, who are less likely to make "off-by-one" errors when using 0-based indices, because many index-computing formulas, especially in multi-dimensional arrays, become a little simpler.
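A small example of the "simpler formulas" point, for a row-major 2-D array stored flat (a sketch; ncols is the row length):
def flat_index_0(r, c, ncols):
    # 0-based rows/cols to a 0-based flat index: no correction terms
    return r * ncols + c

def flat_index_1(r, c, ncols):
    # 1-based rows/cols to a 1-based flat index: the -1s creep in
    return (r - 1) * ncols + c

assert flat_index_0(1, 2, ncols=4) == 6   # second row, third column
assert flat_index_1(2, 3, ncols=4) == 7   # the same element, 1-based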
In conclusion, the choice between 0 and 1 never has anything to do with "naturalness", but it should always be based on efficiency and simplicity.
I prefer 0, even if I have programmed for many years in 1-based languages, like Fortran & Basic, before first using the 0-based C and the other languages influenced by it.
I think you're conflating "natural" with "familiar".
Here are two processes Timmy could use to count the pens on his desk. (I have numbered them starting with 1, because I considerately adapt to the needs of my audience :-).)
1. Say "0". Then increment for each object. So he goes: zero, (points to pen) one, (points to pen) two, (points to pen) three.
The advantage of this is that you don't need a special case when the number of pens turns out to be zero. Hey, Timmy, how many unicorns on your desk? Timmy starts by saying "zero", looks around, no unicorns, finished: it's the same process as for pens, it just stopped earlier.
2. Number the objects from 0, and then the count is the next number after the ones you listed. So he goes: (points to pen) zero, (points to pen) one, (points to pen) two, so the number is three.
This corresponds more closely to how indexing works in computers, and matches up nicely with the so-called von Neumann construction in mathematics, but in other ways I find it less natural than #1. But I am confident that you could teach it to children, and they would find it natural, and think it weird to mix up "the number of objects" with "the number of the last object". In this case, the only thing that changes from your dialogue is that Timmy says "Well, since the next number is three, it totally makes sense to announce that I have three pens." You count the pens, then you say how many there are. Zero, one, two, three. Not zero, one, two, two as you would prefer. What are you, some kind of weirdo or something? :-)
I did not explain in detail how a human should count, because I think that for a programmer it should have been clear that the exit criterion from the loop is in both cases to have no elements at the right of your hand.
So your Timmy has a bug in his loop code, which caused a premature exit :-)
He should have counted 0, 1, 2, 3.
Like I have said, humans do not count like this, because there are 4 steps instead of 3 steps.
For humans, it is easier to have an "if" before the loop, selecting between the special case of an empty set and a non-empty set, whose elements must be counted.
Nonetheless, an electronic counter would have really counted 0, 1, 2, 3, unlike you.
You wrote a lengthy post trying to explain how it could totally be possible that humans start counting at zero, yet the whole premise of it is wrong: we start counting at 1 because it's natural. You cannot assume otherwise.
> He should have counted 0, 1, 2, 3
Is that how _you_ count? You imagine some pointer going _between_ items, start at zero, and stop once you have nothing at the right of your pointer?
Or, little Timmy could also skip the "zero" part (because you know, he's not dumb nor blind, and sees that he has several pens on his desk), start at 1 while pointing _at_ the first pen, and count up to 3.
Pointing directly to the objects that you are counting is not natural, it is just a bad habit.
People who are interrupted while they are counting and who have to restart the counting later, frequently forget whether they have already counted the pointed object, or not.
Pointing between the objects avoids these errors.
When I was young, I also had the impression that counting from 1 is natural. When I first encountered 0-based languages, I was annoyed by them, especially because I knew how Fortran compilers are implemented and I was familiar with all the tricks that are used by compilers to minimize the extra operations that are required to work with 1-based arrays.
So my initial naive thought was that maybe the implementers of C were not competent enough. However, after many extra years of experience, I believe that all the tricks for the efficient implementation of 1-based arrays are not worthwhile, because even for programmers it is more efficient to think using 0-based indices.
So now I really consider 0, 1, 2, 3 ... as a more "natural" counting sequence, unlike when I was a child.
You don't need to start at 0 to count intervals between objects, it's just easier to say "one" and point your finger to the space to the right of the leftmost item.
> … and point your finger to the space to the right of the leftmost item.
What if there is no leftmost item? Because you skipped the first step, essentially assuming that there was at least one item and including it in your initial condition, that process fails when the set of objects to be counted is empty. Humans are good at dealing with special cases like this; they'll notice the discrepancy and report "zero" instead of following through with the predetermined process. Computers, not so much. So instead we start with a count of zero and all items in the as-yet-uncounted set as our initial condition, and increment the count by one as we move each item from the "uncounted" set to the "counted" set, which correctly handles the case of the empty set.
This whole subthread was about what comes as a natural procedure to humans. Arguing that computers may need a different process is moot.
Anyway, to handle the exception, you could start the algorithm with a "check for the empty collection" base case; that would make it a natural definition for a computational recursive definition as well.
My point was that the natural procedure for computers is also more natural for humans, since it involves fewer special cases. Humans are better than computers at dealing with ad-hoc special cases but that doesn't make them natural.
(And then, having pointed at the pens, Timmy got confused and tried to write down his answer using his finger. Annoyed that no actual writing occurred, he cut off his finger and threw it away. The next day, he was again confused to find his desk all cluttered with crap that somewhat resembled pens, despite having so bloodily thrown away his pointer.)
I doubt that cardinals came before ordinals. To me, at least at first glance, they both seem equally conceptually primitive.
As far as efficiency, this should be the task of the compiler. We shouldn't have to start counting with the "zeroth" just to save the compiler a few cycles (or a few mental cycles to whomever writes the compiler/interpreter). That's why we don't program in assembly after all.
But I agree with GP, we shouldn't start counting from 1 either, but from "1st". Keeping ordinals separate from cardinals would be closest to how natural language works, and I bet it would eliminate all kinds of errors and gotchas.
Well, 0-based is not what we use in everyday life, and even arriving at the invention of 0 took us several millennia, so I'd say its non-intuitiveness is pretty much established...
The ancient Greeks didn't think 1 was a number; as late as ~1500CE Isidore of Seville was saying "one is the seed of number but not number". After all, if there's only one thing you don't need to think of it as a set of things, so there's no counting to do.
(I think in some quarters it was even argued that two isn't a number, but I forget the details. Compare the fact that some languages treat pairs of things differently in their syntax from larger groups.)
But it turns out that mathematically it's a really good idea to count 1 as a number and allow sets of size 1, and these days everyone does it and no one is very confused by it. (Well, maybe by the "sets of size 1" thing. Exhibit A: Perl. Exhibit B: Matlab.) And because we're now all used to thinking of 1 as a number, it's hard even to understand how anyone might have found it unintuitive.
If we all started counting from zero and taught our children to do likewise, intuitions 50 years from now might be somewhat different from what they are now.
You weren't "1 year old", but you were 1 day old, or 2 weeks old, or 3-6 months old, etc.
People never said and still don't say "this baby is 0 years old". They start at the first complete item of the lower-level time units and progress from there...
When giving ages, people use the unit that best captures the spirit of what they want to communicate.
So they may say one hour old, five days old, six months old, etc. A fifty-year-old typically doesn't say they are a half-centenarian either. Decades are now used to express an age range, like saying so-and-so is a nonagenarian to avoid being specific.
If you ask kids how old they are, you will often get "5 and a half" etc. They are, correctly, thinking of time as a continuous variable. You "turn 1" when your age reaches 1.0.
The main argument I have against 1-based indexing is that it makes a lot of indexing calculations, which all programmers will need to learn at some point, even more confusing with all the additional +/-1. In other words, 1-based is easier only for the easy cases, but harder for the hard cases.
In some ways I think it's similar to the endianness debate: big endian looks natural at first glance, but when you see the expression for the total value, having the place value increase with the address just makes far more sense since it eliminates the extra length-dependent term and subtraction.
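A sketch of that expression, treating a sequence of bytes as digits in base 256:
def le_value(bs):
    # little-endian: place value grows with the address; no length term
    return sum(b * 256**i for i, b in enumerate(bs))

def be_value(bs):
    # big-endian: the exponent needs the extra length-dependent term
    n = len(bs)
    return sum(b * 256**(n - 1 - i) for i, b in enumerate(bs))

data = bytes([0x01, 0x02])
assert le_value(data) == int.from_bytes(data, "little") == 0x0201
assert be_value(data) == int.from_bytes(data, "big") == 0x0102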
I feel like it's Nim that made the bad choice here. Not because one meaning of .. naturally makes more sense than another, but because it breaks the expectations set by other languages.
Ada is actually not 1-based. Arrays are accessed using an index type which is defined at declaration. It can be any discrete type, including ranges starting at 0 (or 1, or -1, or anything else which fits the domain).
Interesting! I looked at a few different sites for examples of arrays in Ada and saw them all starting at 1. But that's a neat feature that it lets you pick the start and end.
Ada isn't even 1-based, though many of its built-in array-based types are 1-based. It's "whatever ascending, consecutive range you want"-based. Which, IMHO, is the best option. You can use 0-based where it makes sense and 1-based where it makes sense, and -100 based where it makes sense, or 'a'..'z' based where it makes sense.
And I think that last example demonstrates why the closed range notation makes a lot of sense over half-open (at least for languages that permit describing ranges of arbitrary discrete types). When talking about numbers a half-open range notation is clear and useful because numbers exist in an obvious sequence (so long as we're sticking to some subset of the real numbers, complex numbers don't have as obvious an ordering). But when talking about other ranges, like characters, how would you use a half-open range (in a clear way) to describe the characters a-z? 'a'..'{'? Is that actually clear?
What about describing the range when you want to include the last valid value? How do I describe, using the byte type itself, a range 0 through 255:
type Byte is mod 256;
subtype Full_Range is Byte range 0..255; -- valid Ada, closed range notation
subtype Full_Range is Byte range 0..256; -- invalid Ada, half-open notation
I'm now describing my range of a specific type using an element that isn't part of the type. For numbers this can work. But what is the character after the last character of the set of characters?
subtype Characters_After_And_Including_Lower_A is Character range 'a'..???;
-- presumably we'd have a better name in a real application
In a half-open notation that's impossible to write and be explicit about the upper bound. You can only describe it with an implicit upper bound by leaving the ??? blank as in:
subtype Characters_After_And_Including_Lower_A is Character range 'a'..;
A valid solution. Though in Ada this would be a no-go as it leaves open some ambiguity/possibility for error in the use of the `..` notation (what if the author didn't mean for it to run to the end but forgot to type the upper bound?), so a second notation would have to be formed. Perhaps a third `.` or some symbol indicating that it's open-ended:
Citing Rust is cheating - we know that it does everything in the most elegant manner. A bit like deriding the other kids in the class because they're not as good as the star pupil :)
This might be an unpopular opinion, but I'd say it's more natural for kids to relate the word 'three' to 'third' and 'four' to 'fourth', instead of 'two' to 'third' and 'three' to 'fourth'.
Yes, that's why it's hard to explain to kids that a two-year-old is in their third year of life. By the time of their third year of school (second grade) they usually get it, though.
We should be careful not let such easily observed facts inhibit our efforts to dumb down curriculum.
Kids are taught units, so they express themselves properly and are understood by others. e.g. The third item over here is at a distance of two cm from the left edge.
I've always found the English language to simply be lacking here. In for instance Dutch there is a very clear distinction between half-open and closed intervals: "x tot y" is always half-open, "x tot en met y" is always closed. Whereas in English, "x until y" can mean both things, with all the resulting misunderstandings.
We have "to" and "through" in English to distinguish them. Unfortunately "to" isn't always as strict as I'd like...
"Until" I'd think of as being the strict "to", except something about it feels off like it couldn't just be dropped in every place to/through can be used.
I think it's unintuitive because it's taught in that unintuitive way of just saying "zero is first." I know we can't get too terribly deep into this stuff with esp. young children, but we can at least gently introduce them to the concept of offset v. ordinal, as the former is a much more intuitive way to actually understand array indexes. Trying to liken indexes to ordinal numbers just promotes rote memorization, which is already time which would be better spent on other things.
I've come to feel that "arrays are pointers with offsets" is a largely-irrelevant justification for promoting zero-based indexes in today's programming landscape. Looking at the TIOBE top 10 programming languages for this month, 6 of 10 languages don't even offer a way to address memory with numeric pointers (and that climbs to 8/10 if you rule .NET as "pointers are technically available, but not commonly used and are somewhat discouraged").
If I grab JS/Java/Python/PHP, an array/list is a magic sequence of values with opaque under-the-hood operations, and my only interface to it is the item index. In that context, promoting idioms because they make mechanical sense in a language I'm not using doesn't seem compelling.
I think it’s even harder to wrap one’s head around negative indices in Python, and all the ways to do slicing. They should have added a reversed keyword instead of the current complexity.
My own experience disagrees with your anecdote, but I guess it's different for everyone.
I learned programming by myself when I was a teenager, and 0-indexed never was an issue.
We learn very early on as kids at school that there are 10 digits, not 9. You learn to write 0 before any other characters. On every digit pad, such as a phone dial, there is a "0".
If I ask you for the list of digits, 0 is the first element.
There is nothing counter-intuitive about that in my opinion, even for a kid.
Interesting. Python is my go to teaching language. It beat the snot out of learning programming with Java. As a newbie, what does "public static void main" mean? So what's your choice for a teaching language? Maybe Lua or Scheme? They seem to have few surprises which is good for teaching.
BASIC, for explaining the fundamentals of imperative programming. Its weakness as a "real" language is its strength as a training language.
Then Scheme, once they feel straightjacketed with BASIC; it will teach them what abstraction is and how to break code into chunks. It should be a bit of a lightbulb moment. They also get a type system.
Finally, Julia, once they get sick of the parentheses and difficulty of building "real" programs (you'll do a lot of DIY FFI to use standard libraries in Scheme). From this they will also learn array programming and the idea of a type hierarchy.
The only trouble with this curriculum is the student will be completely ruined for ever programming in C or Java. They will not have learned to tolerate the requisite level of tedium, or even programming without a REPL.
There's a huge difference between a teaching language for _kids_ and _adults_. That's pretty much the case for most of the skills you would ever teach, not just computer languages.
I have two daughters learning Python right now and they did not have an issue with 0-based, and I only explained it once. So I don't know if I'd say your experience is the consensus.
0-based indexing with closed intervals is better for slicing. This shouldn't be controversial. It's because you can represent a zero interval cleanly: [3,3) is an empty interval after slot 2, representing a single cell is [3,4).
This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
That's the one that really gets us something. It means you can do relatively complex offset math, without having to think about when you need to add or subtract an additional 1 to get your result.
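Both properties are easy to demonstrate with ordinary Python slices (a minimal sketch):
a = list(range(10))
i, j, k = 2, 5, 8
assert a[i:j] + a[j:k] == a[i:k]   # adjacent slices: end of one is start of the next
assert len(a[i:k]) == k - i        # length is simply end - start
assert a[j:j] == []                # the empty slice [j, j) is perfectly well-formed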
I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
Of course you can use closed intervals and stick with 1-based indexing. But for why you shouldn't, I'm going to Appeal To Authority: read Dijkstra, and follow up with these.
> 0-based indexing with closed intervals is better for slicing. This shouldn't be controversial.
base is irrelevant to this, and you want (and show!) half-open, not closed, intervals.
> This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
Yeah, those are all properties of half-open intervals, irrespective of indexing base. It would be as true of π-based indexing as it is of 0-based.
It's related to 0-based indexing in that if you want to take/iterate over the first `N` elements, `0:N` works with 0-based indexing + close-open, but if you had 1-based and close-open, you'd need the awkward `1:N+1`.
This is why 1-based index languages normally use closed-closed intervals, so that they can use `1:N`.
I'm a die hard Julian (Julia is 1-based), but I do a lot of pointer arithmetic in my packages internally. I've come to prefer 0-based indexing, as it really is more natural there. 0-based plus close-open intervals are also nicer for partitioning an iteration space/tiling loops, thanks to the fact the parent commenter pointed out, about the end of one iteration being the start of the next. This is a nice pattern for partitioning `N` into roughly block_size-sized blocks:
iters, rem = divrem(N, block_size)
start = 0
for i in [0, iters)
    end = start + block_size + (i < rem)
    # operate on [start, end)
    start = end
end
But that's only slightly nicer. To translate this into 1-based indexing and closed-closed intervals, you'd just substitute the `# operate` line with
# operate on [start+1, end]
the `[0,iters)` with `[1:iters]`, and `i < rem` with `i <= rem`.
1- vs 0-based indexing is bike-shedding. A simple question we can all have opinions on that's easy to argue about, when it really doesn't matter much.
Julia uses 1-based indexing, but its pointer arithmetic is (obviously) 0-based, because pointer arithmetic != indexing. Adding 0 still adds 0, and adding 1 and `unsafe_load`ing will give me a different value than if I didn't add anything at all. (This is just reemphasizing the final point made by the blog post.)
I have on occasion, and when working on custom array types I've added support for being offset with optionally compile-time known offsets.
As StrideArrays.jl matures and I write more libraries making use of it, I may use 0-based indices more often. The idea of dynamic offsets when they aren't needed bothers me, even though I've benchmarked that as being pretty meaningless.
The bigger problem is just that OffsetArrays are a wrapper, and there's a Julia bug where TBAA information on wrappers often gets lost, so that the compiler reloads the pointer you're loading from on every iteration of a loop. Aside from that being slow itself, it also causes the autovectorizer to fail. This causes a severe regression when it occurs. Performance should be fine in other cases.
LoopVectorization or `@simd ivdep` should also avoid the problem.
For the most part, my preference on 1 vs 0 is weak, and for code other people are likely to look at I do want to make it easy to understand / not full of surprises.
The natural representation of intervals for doing arithmetic on is 0-based half-open.
Half-open because of the slicing properties, as noted in your posting and the grandparent posting.
0-based because of the simplification for converting between relative coordinate systems. Suppose you have one interval A represented as offsets within a larger interval B, and you'd like to know what A's coordinates are in the global coordinate system that B uses.
This is much easier to compute when everything uses 0-based coordinates.
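A small sketch of that conversion, with intervals as (start, end) pairs (half-open for the 0-based case, closed for the 1-based one):
def to_global_0(b_start, a):
    # 0-based offsets within B: converting to global coordinates is plain addition
    a_start, a_end = a
    return (b_start + a_start, b_start + a_end)

def to_global_1(b_start, a):
    # 1-based positions within B: every conversion needs a -1 correction
    a_start, a_end = a
    return (b_start + a_start - 1, b_start + a_end - 1)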
I think this is a good write up, but I think the notation is still carrying some mental baggage. It's not necessary to have these open and closed brackets/parentheses. They don't add anything, and if anything, they confuse the matter. An interval is just (1, 2) (or [1, 2] if preferred aesthetically). Since a base cannot be "on" either 1 or 2, it's not meaningful to have this open/closed interval notion. In other words, (1, 2) == [1, 2) == (1, 2] == [1, 2].
Open/closed intervals only come into play in continuous dimensions. DNA sequences, arrays in memory, et al are discrete.
On the first point, yep, completely misspoke, what you don't want is open intervals.
To the second point, as I said: Dijkstra's argument for using 0 instead of 1 with half-open intervals is, to my taste, perfect. As I have nothing to add to it, I will simply defer.
For <=-style (closed) intervals and 1-based indexing you have to write `for (i = 1; i <= N; i++)`. So you lose the nice property of having `upper - lower` iterations.
For half-open intervals, you have upper-lower iterations regardless of base. In C for loop terms, it's:
for (i=lower;i<upper;i++) {}
If you are using 0-based indexing, lower is zero and upper is the number of elements for a full iteration over all indexes; with 1-based indexing lower is 1 and upper is one greater than the number of elements.
Now, despite the utility of half-open intervals, most people’s intuition is around ordinal index ranges, so 0-based indexing is counterintuitive with half-open intervals because slices start at 0, and 1-based indexing is counterintuitive with them because they end at N+1.
This is because, useful or not, half-open intervals are counterintuitive, but they are worth developing the intuition for because they combine nicely.
> with 1-based indexing lower is 1 and upper is one greater than the number of elements.
Do you see the +1 tweak? Also consider that a "for loop" can't be expressed with an exclusive lower bound; the lower bound is always inclusive. So 1-based indexing has to be either 1 <= i <= N (closed interval: bad, since N - 1 != the number of iterations) or 1 <= i < N + 1 (half-open interval: good, but with the +1 tweak).
> I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
I think this is the clincher in your argument (rather than Dijkstra). I only use languages with 0-based indexing, and it seems natural to me, but I could almost be convinced that if only I used 1-based indexing regularly then I'd be fine with it too. But here you are with actual experience of using 1-based indexing regularly and you don't feel that way, which blows that idea out of the water.
A programmer “strengthens their ankle and leans forward by holding their arm out, preparing to relax the shoulder” into a bar. The bartender bursts out laughing for a few sprints.
I mess it up as well, both in Lua and C. The key to success is to not calculate values by adding or subtracting them, but to use meaningful names instead. E.g. dist(a, b), holen(a, b), endidx(start, len) and so on. Forget about ever writing “1” in your code, point your finger and call operations out loud. It doesn’t eradicate errors, but at least elevates them to the problem domain level. Off by one is not exclusive to 1-based or 0-based, it is exclusive to thinking you’re smart enough to correctly shorten math expressions in your head every damn time.
(But sometimes I also feel too clever and after some thinking I’m just writing (i+l2+b+2) because it’s obvious if b already comes negative and means “from behind” in “0-based -1 is last” terms.)
True, but there are examples where 1-based indexing is easier, like returning the last element based on length. I think array[array.length] is easier to understand than array[array.length - 1].
Or the predecessor of the last element: array[array.length - 2] makes you think, whereas array[array.length - 1] is more obvious.
I think the most mathematically natural way to interpret negative indices is to treat them as numbers modulo the list length. Then 0 = length, and -1 = length - 1.
Negative indices have an implied ”len(list)” in front of them. They can be seen as an application of the idea of half-open intervals (0 <= i < len(list)), so it makes sense that they’re not symmetrical.
I'm not sure on what principles you'd make a principled justification. My most common use of negative indices is for slicing the end of the list, in which context the interpretation similar to len(list) makes sense. E.g., list[:-2] does what you'd expect (the same as list[2:], except from the other end).
> implicitly taking indices mod len(list)?
Isn't this doing that (plus some bounds checking)? -1 mod 4 evaluates to 3.
Which Lua has also, for strings, and it's quite convenient.
Doesn't work for tables, though, because you can actually put something at foo[-1], which can be useful.
I find that foo[#foo+1] is more common than foo[#foo] in my code, that is to say I add to an array more often than I access the last slot. Although I typically use the idiomatic table.insert(foo, bar) instead— but that's mostly because it's slightly awkward to add the +1 all the time.
Sure. Let's take array elements in reverse, from m to m-n.
Suppose n erroneously becomes larger than m. That should produce an error, not silently take several trailing elements.
Imagine you used list[index] where index is calculated. If you miscalculated something and it went negative, you'd get a clear exception/error; but when negative indices are valid, the buggy code silently keeps "working".
The fact of the matter is, for a type that can hold N distinct values, there are N + 1 different possible interval lengths. You can not represent intervals over a fixed-width type in that same type.
Did you mean open range? I've never encountered a circumstance where closed ranges are useful, though I presume they exist.
And yes, I think any typed language with a less expressive range syntax than Ada has some work to do. That still leaves open the question of the default, and I maintain that 0 <= .. < n is the correct one.
That... makes no sense? Whether you start counting a list at 0 or 1, if your language of choice supports [x,x) notation, that syntax will effect an empty interval, and [x,x+1) will effect a single cell, no matter whether your list indexes the first element as 0, or 1. The only difference is which cell [x,x+1) refers to, which is the part that keeps sparking the controversy.
Why on Earth would I want a "zero interval" slice?
It has none of the properties I want in a slice, and that a slice of literally any other length will have. In fact, it means every slice I use needs error-handling, because I might get... something that's functionally not a slice, but is still called one. If that doesn't scream "type error" at you, I don't know what would. In fact, that's precisely why 0 is a comparatively recent invention. Most list-based problems were solved just fine before then.
All these arguments are poor rationalisation for the field's inability to move on from the 8-bit era, where the loss of 1/256 of your address space mattered and compilers had bigger problems to solve than translating indices to offsets.
You want a zero-slice because it’s a simpler base case in almost any even mildly complex algorithm. Without empty lists, you need many additional branches throughout a code base to handle the “no element” case, causing more potential bugs. There’s nothing “8-bit” about cleaner code, quite the opposite.
Wikipedia says that Egyptians had a 0 as early as 1770 BC - almost 4000 years ago. If that's a "relatively recent" invention, what about computers?
0 comes up if you're doing any arithmetic at all. It's a natural and useful concept for almost anything. As long as you can always take something away where there is something, you'll need 0. It wouldn't be very useful to have something like a non-empty list type where you can remove elements only down to the last one, which must remain.
In more mathematical terms, 0 is called a neutral element (relative to the addition operation), and almost anything you might want to do (for example subtraction) requires as a consequence a neutral element.
Well, you might want a function which returns String, not Maybe(String), so that you can just concatenate, rather than handle Some(String) or None all the time.
So that's why you might want an empty String. If you have an optional element in a syntax, it can be convenient to match it with a rule using a Kleene star, and that gives you an empty Node, which would return an empty String if you request its value. And so on.
With open intervals, you have to represent that as, say, (3, 2). Which sucks.
Just a friendly correction :-). You have repeatedly confused the meaning of open and closed. "Closed" means including the end element given in the range, and "open" means excluding it.
It may make more sense to think of intervals in the real numbers. "[1,2)" doesn't have a maximum on the right: it contains 1.9, 1.99, 1.999, ... but not 2 itself. Thus it's "open" to the right. Whereas "[1,2]" includes its maximum (2), so it's closed.
The terms open and closed apply to any set of numbers (not just intervals): "closed" means the set contains the limit of any convergent sequence of its elements, and "open" means its complement is closed (a set can also be neither, or both).
Much appreciated! The terminology and notation seemed backwards to me, and I basically never use it except when discussing Dijkstra on HN ;-)
Visually, it looks like ) has more "room" than ], if that makes sense, and that "closed" would mean that the element defines the 'lid' of the range, while "open" means it defines the final extent.
But you can't argue with nomenclature, and extending the definition to the reals helped make it click for me, so I'm unlikely to screw it up again. Thanks!
You could imagine the curved parentheses as "cutting the corner" of the square brackets - or imagine the curve as just gently kissing the endpoint at a single point, while the flat square solidly includes it. "Open" makes sense because it satisfies the criterion that for any number in the interval, you can find a larger (or smaller) number that is also in it - there's "no end".
In mathematics, both are common. For example, when working with polynomials or polynomial-like things, it is common to label coefficients starting with 0. E.g., a0 + a1 x + a2 x^2 + ...
The subscript on the coefficient then matches the power of x, letting you write the general term as ai x^i, which works out great when using capital sigma notation.
On the other hand, matrix rows and columns usually start from 1, so the elements on the top row are usually a11, a12, a13, ..., and the next row is a21, a22, a23, ..., and so on.
In an attempt to bring some unity to the two sides on this issue in programming, let me offer something that I'm sure everybody will be able to agree on.
Once upon a time I was implementing some mathematical code in C++. It was a mix of things from places where mathematicians would number from 0 and where they would number from 1.
I decided that the code would be clearer if the code looked like the formulas in the math papers they came from, which meant I wanted to use 0-based arrays in some places and 1-based in others.
Solution: I overloaded the () operator so that array(i) was a reference to array[i-1]. Then I could use 0-based or 1-based, depending on whether I was using a formula that came from a 0-based or 1-based area of mathematics.
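Roughly along these lines (a sketch under my own naming, not the original code):

    #include <vector>
    #include <cstddef>

    // Thin wrapper: operator[] stays 0-based, operator() is 1-based.
    template <typename T>
    struct MathArray {
        std::vector<T> data;
        explicit MathArray(std::size_t n) : data(n) {}
        T& operator[](std::size_t i) { return data[i]; }      // 0-based, offset-style formulas
        T& operator()(std::size_t i) { return data[i - 1]; }  // 1-based, numbering-style formulas
    };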
Julia and Fortran, languages designed to be used for mathematics and high performance scientific computing, take a similar approach in the language itself and support arbitrary starting indices for arrays. Sometimes the math is just cleaner when the index starts at 0, or 1, or 2, or -1!
And of course there are C++ libraries to do the same, for example the blitz++ library. Negative indices to the left of element[0] are great for ghost zones in codes that do domain decomposition over different MPI ranks.
TBH indexing became much less important with every language adopting some sort of "for all" and map/filter/reduce constructs. If you don't care about indexes you don't need to think about them (finally!).
The remaining cases are by definition edge cases and warrant enough attention that bugs caused by 1- vs 0-based indexing don't seem to be a big problem in practice.
It's like with goto and structured programming - people stopped using goto for loops and ifs, so the remaining cases where goto is used as last resort aren't much of a problem. People think hard before doing this.
I'd say it's the exact opposite. Iterating over an array is trivial no matter if you use 0- or 1-based indexing. If that is all you do, the type of indexing really doesn't matter.
It is for all the more advanced uses of arrays and indexing where the base starts to actually matter, and those are not covered by foreach.
I disagree, this depends a lot on your problem space. I mainly do scientific programming and so have to do array/vector slicing etc. and indexing definitely plays a big role.
Do you often find a need to slice hardcoded numbers, however?
I find that the only hardcoded number I often need to slice or index is indeed the start, and most languages do offer something for that such as `slice[..i]` to slice from the start.
Well I was replying to the statement that the OP was saying indexing is less important and I find when dealing with slices it definitely is important (as can be seen in your example).
Generally, I agree with what someone else here stated: 1- or 0-based indexing is less important than open or closed intervals. The discussion around these two is typically mixed up because 0-based indexing typically goes with right-open (half-open) intervals while 1-based indexing goes with right-closed ones.
Yes. I did write Lua for a few years and have a hard time remembering seeing any indexing cases. I probably encountered some but they were very few and far between.
Exactly. When I was a beginner I made off-by-1 errors (in Python mind you) all the time. Now that I iterate directly through arrays or use map-reduce patterns I rarely have to handle the index itself.
I was hoping for an interesting analysis of 0 vs 1 based indexing, but instead got a 3 page rant of HN-this and appeal-to-authority-that which adds absolutely nothing of value to the discussion. It feels more like a bottom-of-the-page angry comment to an article than an actual article.
Yeah, this is a bunch of paragraphs complaining that arguments for 0-based aren't good enough for him, but giving no actual arguments FOR 1-based. That's not how you convince anyone. You can't just say that the other guy is wrong without giving any explanation for why you are right.
Plus, the last bit is just "imagining a guy and getting angry at that guy", the absolutely dumbest form of argumentation.
Not sure what you are trying to say here. It is a well known principle in political campaigning that presenting positive policies tends to gain you more votes than negative attacks on the policies of your opponents.
This is probably because this whole debate is pedantic and irrelevant. 0-base wins, at this point it's not about picking a fork in the road, just swimming up stream for the sake of standing out.
Lua is over 23 years old at this point; it made an arbitrary design decision and has had to die on that hill ever since.
I quite disagree, I think it makes two insightful points:
1. In this discussion, people often overlook that these numbers are applied to two different concepts: offsets and numbering.
2. If C had been designed with 0-based pointers (for offset calculation) and 1-based indices (for numbering the array elements) people would probably find this a strength, to have the most appropriate base for each case. It's an interesting what-if idea that I hadn't seen before.
If C had 0-based pointers and 1-based indices, every CPU in existence would have a "subtract 1, multiply, and add" instead of an "multiply and add" instruction and everybody would think this is stupid because "multiply and add" could be used for a lot of things, but "subtract 1, multiply, and add" has only one use.
Fully agree. Got to the end of the post and was looking for the rest. The ranting could have been fine if there was more information and exploration. As it is, it’s unremarkable and not worth reading
My opinion is that languages where you work with vector representations primarily (or at least often) like Fortran, Matlab, Julia are best with a 1 index.
They are languages primarily used for math and simulations and are pretty specialized so the indexing and representation of the vectors matches closer to how you would consider things if you were doing linear algebra by hand.
Julia, Matlab, and Fortran are also all column major for multi dimensional arrays too.
For numeric work I'll take 1-based indexing any day of the week, but for general programming or other CS work (like non-vector/matrix data structures) I much prefer 0-based indexing.
It was super frustrating to see the author dismiss Dijkstra's argument as an appeal to authority. No it isn't, it just means that we find the arguments he outlined compelling. As someone who switches between C and FORTRAN daily, it's incredibly clear that zero-based indexing is easier.
Here are two arguments in favor of 0-based 'indexing' (or offsets), that aren't just "because that's how it's done":
1. It is faster. Due to how memory works, 1-based languages need to subtract 1 internally every time you access an array[1].
2. It works mathematically better with some of the most common operations on array offsets, like modulo and division into round/ceil/floor. You'll be peppering your code with +1 or -1 around those a lot if you use 1-based indexing.
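For example, flattening and unflattening 2D coordinates (a hedged sketch; the grid and values are arbitrary):

    #include <cstdio>

    int main() {
        const int width = 4;
        // 0-based: division and modulo map directly between flat and (row, col)
        for (int flat = 0; flat < 8; flat++)
            printf("0-based: %d -> (%d,%d)\n", flat, flat / width, flat % width);
        // 1-based: the same mapping needs a -1/+1 correction at every step
        for (int flat = 1; flat <= 8; flat++)
            printf("1-based: %d -> (%d,%d)\n", flat, (flat - 1) / width + 1, (flat - 1) % width + 1);
        return 0;
    }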
I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
I will personally concede with you on (2), but it's still not very convincing considering that Fortran (a 1-index language) is still (arguably) the most popular language for linear algebra.
Personally, I feel "The Index Wars" are the same as the "Text Editor Wars": a matter for personal opinion.
>I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
Theoretically one can optimize it - though only when one can statically infer the index and the compiler decides to inline the array access function. If you can't do that, the best you can hope for is a fast CPU instruction like LEA or something.
In theory one could make a lot of things fast, in practice they rarely are, and it's always better to avoid problems now than to pray later.
I mentioned it's theoretically possible and outlined the specific conditions for optimizations to happen.
But I also mentioned that in reality various things often prevent compilers (assuming it's a compiled language, and not interpreted, like that Lua implementation) from applying such optimizations:
- you're programming against an API or are compiling an API (i.e. non-static functions).
- you have time-constraints for emitting optimized code, like most JITs - you can't afford any deep analysis that would enable such optimizations, you're mostly pattern-matching for lower hanging fruits.
Since pictures say more than a thousand words, here's the actual disassembly of that lua function I linked earlier, as shipped by my Linux distribution: https://i.imgur.com/DnqVC8E.png
The green stuff is the "fast path". For 0-based indexing you would completely drop the first LEA and replace those last MOV/SHL/LEA with a (faster?) MOV r,r/SHL/ADD r,m - which is what GCC is likely to generate.
So that's Lua (one implementation at least) in practice. No compiler magic making arrays fast here.
LuaJIT (from my current reading) goes a different route of just pretending that element 0 doesn't exist: in memory there is a slot '0', but it simply isn't used. This trades some memory for speed.
From lj_tab.c:
** The array size is non-inclusive. E.g. asize=128 creates array slots
** for 0..127, but not for 128. If you need slots 1..128, pass asize=129
** (slot 0 is wasted in this case).
Personally I quite like this approach. A lot better than hoping compilers will magically fix everything and quite hard to argue with, considering LuaJIT's performance. Though it would be a ludicrous design for a lower-level language.
> It works mathematically better with some of the most common operations on array offsets
The problem is that for every formula or calculation best suited for 1-based indexing, there is another which is better implemented using 0-based indexing. Regardless of which indexing mode you are using, peppering your code with +1 or -1 is eventually inevitable. It's why I think the choice is a matter of preference and has little to do with technical reasons.
Overall I still prefer 0-based indexing. A few bytes of RAM are cheap nowadays, so when a calculation is simpler with 1-based indexing, deliberately leaving [0] unused and pretending that the array begins at [1] is often an acceptable option (the trick was originally invented for porting Fortran code...).
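A sketch of that trick (just allocate one extra slot and never touch index 0; names are mine):

    #include <vector>
    #include <cstddef>

    std::vector<double> make_one_based(std::size_t n) {
        // n + 1 slots; slot 0 is deliberately wasted so valid indices are 1..n
        // and the 1-based formulas need no -1 correction.
        std::vector<double> a(n + 1);
        for (std::size_t i = 1; i <= n; i++)
            a[i] = static_cast<double>(i);   // placeholder payload
        return a;
    }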
Not convinced that is the case. I do not pepper my code with +1 and -1 while using 0-based indexing.
I can think of a lot of cases where I would do it for 1-based indexing, but very few for 0-based indexing. If you want to make the claim that there similarly many, you're going to have to come up with some substantial examples.
The -1 with 0-based indexing occurs very frequently when using the amount of elements in some collection to derive boundaries of the indexes, which will range from 0 up to and including n-1. Grepping for "- 1" in my code folder returns mostly expressions that look like 'length/len/size - 1'.
This can happen, but it is fairly rare, especially if you favour handling ranges as either (start, length) tuples, or as half-open intervals, with the end index being one past the highest index. This representation simplifies many things, many of them unrelated to 0- or 1-based indexing.
Notably, this requires you to use a signed index type, which is absurd.
The alternative is to use the so-called goes-to operator: for (i=N; i --> 0;). But this actually relies on post-decrement semantics, which is a huge wart.
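For reference, both spellings of a reverse loop over 0..N-1 with an unsigned index (a minimal sketch):

    #include <cstdio>
    #include <cstddef>

    int main() {
        const std::size_t N = 5;
        // "goes-to" spelling: parses as (i--) > 0, so the body sees N-1 down to 0
        for (std::size_t i = N; i-- > 0; )
            printf("%zu ", i);
        printf("\n");
        // the same loop without the trick: iterate on i, subtract inside
        for (std::size_t i = N; i > 0; i--)
            printf("%zu ", i - 1);
        printf("\n");
        return 0;
    }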
Your (1) only affects compilation time, and even this is assuming a particular implementation of arrays (which is not a given).
Your (2) I don't fully understand. Why would you need to use "modulo and division into round/ceil/floor" on array offsets? How often do you do it?
For what it's worth, I've written a _ton_ of code in an obscure language called Omnimark, which is 1-based. Much of that code involved using arrays. I've just grepped through a large Omnimark codebase I helped to write, and no, it is _not_ peppered with "+1" or "-1".
No, it mostly affects execution time. Whether the effect is meaningful, however, depends on various factors.
> [...] and even this is assuming a particular implementation of arrays (which is not a given).
Eventually, all arrays boil down to pointer+offset memory accesses. Well, if your programming language internally implements "arrays" as hashmaps, then you've already lost performance-wise, of course.
> Why would you need to use "modulo and division into round/ceil/floor" on array offsets?
Whenever you're working with 2D/3D data in manually managed memory.
> How often you do it?
When you're writing web services? Probably not so much. When you're doing image manipulation or 3D graphics? All the time.
Citation needed. Yes, the -1 appears in the source code. It might even appear in the intermediate representation. It'll even show up in the assembly for multidimensional arrays (but it'll all get folded into a single `mov` instruction for vectors).
But I dare you to actually demonstrate a performance difference. Compared to a memory access, a `dec` is free.
That's what I was thinking, too, but most of the time (even if the index is not static) the -1 can be eliminated by just modifying the address at which the array is assumed to start. That is,
arr[x]
compiles into an access at address
(arr - 1) + x
Since the address of arr is statically known, arr - 1 is, too, and the address of "one before the first element" will be what's written as the base for the memory access.
> Since the address of arr is statically known [...]
No, that's almost never the case – only when accessing a static array. Typically, that restricts it to static look-up tables and very simple embedded systems.
At some point I would like to write up a story about this team in my University Object-Oriented Programming class who decided to use 1-based 1ndexes in Java while making a 2-D array for a board game.
I don't want to spoil it, but the following picture of how they stored a 3x3 tic-tac-toe board is a hint:
Modulus (division) is slow; the length should be a power of two and the operation "& (len - 1)". If you do %len, you have far greater issues. I have pretty extensive experience writing hashmaps/cyclic buffers and the like. If you have auto-grow structs (and you almost always want that), you want power-of-two-length arrays. e.g.
This is entirely orthogonal to whether you do "prev+1 & len-1" or "(prev & len-1) + 1". (In fact, the latter gives a more natural construction if you fill downward.)
Adding/advancing is indeed easier. What about going backwards (insert before head or remove from tail), i.e. when at 0, wrapping to length? One way to do it is something like the following; assume two's complement and that an unsigned right shift is available. Not straightforward:
int h = ( -(--head) >>> 31) ^ 1;//if head was equal to 1 (and only 1), h = 1, otherwise zero
head += h * len; //or shift left, if there is log2 len available; still, mul is a fast operation, unlike div
Of course, it can be implemented via branching, but that would be a major downside for 1-based indexing.
Every =single= hash map worth its salt should be backed by a power-of-two-length array. And the mapping from hash to array index is not done with division/modulo (that's slow and cannot be parallelized in the CPU) but with a simple bitwise AND.
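For contrast, a sketch of the 0-based version, where wraparound in both directions is just a mask (capacity must be a power of two; the names are made up):

    #include <cstdint>

    struct Cursor {
        uint32_t pos;     // 0-based position in the backing array
        uint32_t mask;    // capacity - 1, capacity being a power of two

        void advance() { pos = (pos + 1) & mask; }
        // unsigned wraparound makes 0 step back to capacity - 1, no branch needed
        void retreat() { pos = (pos - 1) & mask; }
    };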
1) It also affects runtime, when addresses need to have a subtraction, and maybe a multiplication, in the offset calculation.
2) You may need this often in certain network or protocol code, for example.
Regarding the mathematical advantage: it depends on the subfield, but I think 1-based numbering is more common in mathematical notation and algorithms (linear algebra being a prominent example).
Regarding performance: some mentioned that the 1 offset can often be optimized away by the compiler. But even when this is not possible, common CPU architectures include instructions that can apply the offset at no extra cost, see https://stackoverflow.com/questions/28032685/is-there-a-perf... , so 1-based indexing is just as fast.
For 2), in Julia, there is a "mod1" function which acts as modulo but with 1 being the smallest value instead of zero. It doesn't really make sense mathematically, but it is very useful for looping over circular arrays.
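The same idea is easy to mimic elsewhere; a hedged C++ sketch of a mod1-style helper:

    // Wrap x into 1..m instead of 0..m-1 (assumes m > 0);
    // handy for walking a 1-based circular array.
    long mod1(long x, long m) {
        long r = (x - 1) % m;
        if (r < 0) r += m;   // C++ % can be negative for negative x; fix that up
        return r + 1;
    }
    // mod1(1, 4) == 1, mod1(4, 4) == 4, mod1(5, 4) == 1, mod1(0, 4) == 4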
Only one reason exists these days: it prevents bugs, because people mix up ordering, indexing, counting, array semantics, implementation, and pointer arithmetic.
> It really shows how conditioned an entire community can be when they find the statement “given a list x, the first item in x is x[1], the second item in x is x[2]” to be unnatural.
It really shows how conditioned mankind is that they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretical.
Here + offset makes SO much sense. I think we should move to use it in other contexts too. "0 steps from the start of an ordered list" "1 step from the start of an ordered list" etc.
At the moment we are using two different "scales" for measuring things and labeling orders. We could as well use the letters "A", "B", "C" for the latter, and it wouldn't change a thing. Hindsight is 20/20, but it's just a confusing mishap that we are using "numbers" and "numbers + 1" for the two purposes, where the second one could be expressed just as another "measurement", and we could get rid of the confusing "numbers + 1" scale.
This is a left-over from mathematics where indices are distinct from the abelian group of integers.
So the index "1" as used to refer to vector element for example, is different from the number "1" on the number line. That is, one is an element of an index set comprised of ordinal numbers, while the other is an element of the integers.
The confusing part is, that both use the same symbol when in actuality they represent different concepts. The integer "1" is just a number like √2 or -7. The ordinal "1", however, is the representation of the concept "first".
The ordinal "0" would, if used as an index, always yield the empty set as a result (see von Neumann ordinals), which wouldn't be particularly useful in a programming language.
The mapping that selects elements given an index is arbitrary and can be defined as necessary. This allows for constructs like "x[-1]" mapping to the last element of an array. Here, "1" still means the "first" element, but the sign indicates the negative direction, i.e. start from last element.
In short: there's an argument to be made for having two distinct versions. One (1-based indexing) is consistent with mathematics, while the other (0-based indexing) is consistent with low-level data representation and makes no distinction between ordinals and integers.
> they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretic.
Even in maths it's pretty bad, because the Natural numbers still sometimes don't include 0, so that the 0th Natural number is 1, the 1th natural number is 2 etc. (using 1th instead of 1st to avoid the confusing nature of "first" in this discussion).
> It really shows how conditioned the mankind is that they call the "0th" ordinal number by the name "first".
Well, every member of mankind was born. At birth your "first" year of life is the "0th" year. Right now, everyone is living their "(n+1)th" year of life (n >= 0).
I think the influence of convention is strong here, rather than this being natural.
> how conditioned the mankind is that they call the "0th" ordinal number by the name "first"
It really isn't. How many elements does a collection have, if it only has a 0th element? Is it 0? What about the size of a collection [0th, 1st, 2nd]? Or [0th .. 100th]? In language, we use 1-based indices for collections because we enumerate the elements based on counting them. A zeroth element doesn't exist, because if we had 0 elements, the collection would be empty.
Indexing of natural collections is based on cardinality, not ordinality.
> Indexing of natural collections is based on cardinality, not ordinality.
You are exactly right. But does it make sense? I'm not sure. "if it only has a 0th element" isn't in my mind, a relevant question in the context of cardinality, since the answer is the same if it only has the "1th" or "2th" or "3th" element. (Sorry, picked up the habit of "misusing" "th" from another post in this thread.) The base case of indexing is "the start of the list" whereas the base case of cardinality is "empty collection". But should you define the concept of indexing by making an equivalence between cardinality and indexing by "cardinality of collection of elements from the start of a list to the current element (inclusive)" is, in my mind, something that doesn't _necessarily_ make sense.
I admit that it's awfully lot about choices and habits, but at least I find "unpacking" the arguments this way interesting and educational.
In general I think it's fair to say that math has lots of crazy and just plain poor choices of conventions and notations that no sane person would ever think of now; it's just very hard to change something so widespread and ingrained, especially given the extremely cryptic and dense notation that leaves no room for evolution without ambiguity (and it's already hugely ambiguous since of course you can't prevent new concepts from being necessary, so syntax gets reused pretty confusingly, and haphazardly at that).
To put it this way: maths is the legacy codebase from hell; but it's what we have.
To be consistent we should redefine a few more things, e.g. when we take the 0th element of a list we have 0 elements, when we take up to the 1st element we have 1 element, etc.
Joking, but I think there will always be a mismatch somewhere.
Disclaimer: I don't care about this argument one way or another. However I found the author missed the point (as perhaps many commenters did?)
"...nowadays, all arguments that say that indexes should be 0-based are actually arguments that offsets are 0-based, indexes are offsets, therefore indexes should be 0-based. That’s a circular argument..."
Yes, that is a circular argument. It's not the one I would use. C made the decision that indexes and pointers are the same thing. It's a logical argument and one that's not circular. If they're separate things, then yes, you end up in a circle.
For languages that are not C-like, you could argue "pick an idiom and stick with it across multiple languages" or "make programming languages easier for humans to understand" Both of those arguments are preferential arguments. Perhaps there could be a study comparing the merits of each, but I haven't seen one yet.
I like chocolate ice cream. I like making pointers and arrays as similar as possible. I like being able to collapse pointer arithmetic down. I like doing huge mallocs and then playing around with large, empty hunks of memory. Some folks don't. I get it. Code should look like the way we think about the problem. That's an ideal state we'll never reach, but it's worthy of continued discussion.
It's missing "since indexes are 0-based, they're offsets".
Edit: if I were to make this argument, I'd say the assumption that indexes are offsets is baseless. This is easily proven since there are languages where this isn't true. However, I would argue the problem based on practical merits: the reason no 1-based index op-code exists is that no dominant programming language has 1-based indexing. Which is a self-fulfilling prophecy.
I didn't want to go there, but yes. If I had called that out, it would lead to a discussion about just what is meant by the term "offset". Defined broadly enough, the point worked for the author.
In general, I try to find the most general point and rebuttal as possible. These discussions tend to dive into pedantry and semantics. I'm just trying to do my part to engage the author where they are. ymmv
Seriously, the exact value of the lower bound for indexing doesn't matter here (some algorithms are best described with the lower bound other than 0 or 1, for example). The fixed or preferred lower bound is the real problem. Any argument for/against 0-based and 1-based indexing tends to gloss over the real problem because those arguments only exist to make some languages look better than other languages. As we move away from forced explicit indexing (e.g. arr.first() or foreach instead of arr[$LBOUND] or `for (i=$LBOUND; ...)`), it becomes clear that there is no such thing like the preferred lower bound for sequences at all.
Fun fact: Hisham, the author of the blog post we're discussing is a notorious user of 3 space indentation. It actually works quite well in Lua, where blocks are terminated by the "end" keyword. Still sacrilegious if you ask me though :)
Just like the big- vs. little-endian issue, the differences matter less than everyone adopting one. When you program, having two codebases with different index bases is several times worse than either alone.
0-based indexing is here. 1-based has no benefit. New languages should use 0-based indexing.
Fortran predates C, so 1-based indexing has been “here” arguably longer than 0-based indexing. Most languages targeted at numerical computing use 1-based indexing because mathematics uses 1-based indexing in linear algebra. That seems like a huge benefit to me! C has 0-based indexing precisely because it doesn’t have genuine multi-dimensional arrays and has to use pointer arithmetic instead (which in my opinion makes it a very bad fit for scientific computing).
Mathematicians can, and do, use both, and would arguably be better off with 0-based. Mathematicians sometimes need to use partitioned matrices, and describing the start and end indices of those partitions goes better with 0-based indexing of matrix elements, so that the top left element is at 0,0. It's horrible if you have to work with such things in 1-index based matlab, the amount of nested +1 and -1 corrections you need to type...
0-based indexing was "here" long before C was created.
Lisp or Algol are more or less contemporary to Fortran. The former uses 0-based indexing, the latter can use arbitrary bounds. As does Fortran at least since the seventies, by the way.
I like Ada's indexing range types: if you do it the right way your arrays can be 0-indexed or 1-indexed or indexed from 23 to 149. The bottom line is that in a good language it shouldn't matter and the programmer shouldn't even know how the compiler indexes the data. You iterate from array'first to array'last.
I do prefer 0-based indexing, of course, but often using indices can and should be avoided anyway.
This is where I land. Anyone arguing one way or the other tends to make up some arbitrary reason or impose assumptions on the implementation of the language. They are both perfectly fine and logical numbering systems, however since 95% of developers expect a 0-based numbering system, that is what we should use.
This ignores the fact that people use different languages for different purposes. More people use Excel, which is 1-based, than all other languages combined. Should Excel be 0-based because people who use C (a much smaller minority) prefer their language to be 0-based?
Yes! And it's not a coincidence Fortran was conceived to target scientists, while C was targeting kernel developers. The target audience had much to do with the choice.
It seems clear that the motivation was to be in line with other mathematical languages. But then time and again I see people say Julia is supposed to replace Python in the future. Well, you've just set yourself up for a little fight there, because I'd say the vast majority of programmers are used to 0-based by default, and they're not going to dick around and install packages and crap to change that default behaviour, least of all when trying to be convinced that some new language is superior.
I kinda take issue with his complaining about referring to Dijkstra's argument as being an argument from authority. Dijkstra makes a pretty solid case in his writing, it's not just about him being Dijkstra.
But we don't need to rely on Dijkstra's opinion at all. Plenty of us have programmed with both 1-based and 0-based languages and (most of us) will tell you that it doesn't really make much of a difference in practice. It (generally) doesn't change how you conceptualize your program or how you express it to the computer. Its like tabs and spaces - pick one, life is too short to care.
Async vs threads matters more. Cargo vs NPM vs pip matters more. Practically speaking programming lua, do/end instead of {} took me longer to get used to than 1-based indexing.
> (most of us) will tell you that it doesn't really make much of a difference in practice.
I disagree, I had to slice bitfields in Ethernet packets in Lua and I hated 1-based indexing very much!
IMHO languages should either have 0-based indexing or any-index (as Ada does), unless the language really wants to be "math-only", which is why I think it's OK for Julia to be 1-based but not OK for Lua.
> But we don't need to rely on Dijkstra's opinion at all.
I don’t. I refer to Dijkstra’s opinion because it happens to coincide with my opinion, it’s already written down, and it would be silly to spend effort explaining it over and over again.
But Dijkstra's opinion is not well founded, so it's definitely an appeal to authority. His opinion hinges on words like "ugly" and "preferred" without explaining what he means by this. He makes a choice of one over the other forms based on an unclear value function. His entire argument is this:
"There is a smallest natural number. Exclusion of the lower bound —as in b) and d)— forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of the unnatural numbers. That is ugly, so for the lower bound we prefer the ≤ as in a) and c). Consider now the subsequences starting at the smallest natural number: inclusion of the upper bound would then force the latter to be unnatural by the time the sequence has shrunk to the empty one. That is ugly, so for the upper bound we prefer < as in a) and d). We conclude that convention a) is to be preferred."
I mean... that doesn't exactly settle it. Sure, your opinion might coincide with that, but this is not a logical argument. First you're going to have to explain what unnatural numbers are (because they're not a thing as far as I can tell), and then you'll have to tell me what about them makes them "ugly", and how that makes the other option "preferable".
Natural numbers have a least element X. (In fact, any subset of them does. This is called the well-ordering principle.)
Part 2:
Suppose lower bounds are exclusive. Then the lower bound L of any interval containing X must be less than X. Since X is the smallest natural number, L must be an unnatural (non-natural) number.
It is more convenient to stay in the naturals.
Therefore, it is more convenient to use inclusive lower bounds.
Part 3:
Suppose upper bounds are inclusive. Then, to represent an empty interval, the upper bound must be less than the lower bound. (If it were equal to the lower bound, we would get a singleton interval, not an empty interval).
It is already strange that we need an upper bound that is less than the lower bound. But there's more: When the lower bound is X, this would again take us out of naturals.
Therefore, it is more convenient to use exclusive upper bounds.
Conclusion:
It is more convenient to use inclusive lower bounds and exclusive upper bounds, i.e. convention (a).
I know what natural numbers are. But "unnatural" numbers is not a term people use.
"It is more convenient to stay in the naturals."
Right here, you do the same thing Dijkstra does. You declare that one way is more "convenient" than the other (he calls it ugly), without defining what "convenient" means. This makes the argument subjective. What you think is convenient, others may not think so. Or, one may agree with you on a matter of convenience, but would prefer a different choice due to some other priority e.g. learnability.
So I just don't agree this is an airtight argument. It's a subjective argument for a preference at best.
> You declare that one way is more "convenient" than the other (he calls it ugly), without defining what "convenient" means. This makes the argument subjective.
Nope.
1. Some things are objectively more convenient than others.
2. I've explained exactly what I mean. And so did Dijkstra.
Since you apparently don't want to use natural numbers for indices, what do you want to use?
Dijkstra does a good job of selling the virtues of 0-based indexing, but he doesn't give the same attention to the situations where 1-based indexing is better. That's the biggest problem I see in EWD831.
For example, if you need to iterate backwards then it is nicer if the interval is closed on both ends (as is the case with 1 based indexing). This way, iterating backwards is as clean as iterating forwards.
While I think these debates healthy (when they're civil and well grounded), found out that I don't have any preferences for one over other for things like array indexing.
Every language has its own design decisions based on some requirement or opinion, and while I have rather strong preferences about languages, these kind of design decisions doesn't strike any nerves.
There are many situations where 1-based indexing is more natural. There are also many situations where 0-based indexing is more natural. And also many (possibly most?) situations, where it doesn't really matter.
My 2 cents: it isn't really that important, as long as the developer is aware of whether the language they are currently using has 0- or 1-based indexing.
For a general purpose programming language 1-based indexing probably causes more confusion and errors than benefits. A language trying to be a general purpose computer programming language shouldn't abstract in a way that's likely to lead to confusion with those that understand the fundamentals.
For a domain specific language focused around simulating physical things, or a specific application, 1-based indexing may be the more appropriate option. In this case, abstracting the problem domain is likely more important than adhering to the limitations of hardware.
I don't think it's a real fundamental though - at least not for computing in general. It is true for C, and it is likely true for the many languages that implement things like C. But it's not inherently true that the 'fundamental' implementation of an array must exist as a C array does.
I don't know if it relies on C so much as on what C is an abstraction over. An array in C is a set of data that exists in consecutive locations in memory. Memory addresses start at 0. Technically, a C array located at the start of addressable memory, by the nature of the way we've decided computers work in general, would start at 0. Plus as many extra zeros as your architecture can represent. This is generally the way computers are assumed to work.
Higher level general purpose programming languages exist primarily to serve as an abstraction over the fundamental architecture of computers. C is technically a higher level language than machine code and assembly code, both of which use the 'lowest address is zero' abstraction. Every other abstraction, including C exists over top of that.
My point is mostly, if you're trying to remain relatively close to abstracting over general computing keeping arrays and their index based on what they are fundamentally abstracting over makes sense.
In domain specific applications, it may make less sense.
I still find this view of programming languages to be too limited. It's still very C-focused, in the sense that it seems to assume every language is trying to do the same things C is in roughly the same ways C does it.
Yet this just isn't the limit of computers or programming languages. Consider Prolog or a Lisp. What meaningful abstraction do they provide to have a deep connection with a block of contiguous memory used to store multiple same-sized elements? I'm sure you'll find such things ultimately used if you go deep enough, but it's not a meaningful part of the abstraction the languages provide.
And if we can do away with the entire concept of a set of same-sized data stuffed into a contiguous block of memory, what need can there be to adhering to considering indexes as memory offsets?
The incredible amount of (possible) separation between programming languages and the exact way in which such programs are executed provides an extremely valuable and deep freedom in how programmers can think about and address problems and their solutions. I don't see how insisting that we minimize any possible differences between language abstractions and common hardware practice gains us anything.
Though it is a convention, it has demonstrable, objective benefits.
The use of 0 as the basis means that, in mathematical language, the indexing is homogeneous and that conversion between different units is a linear map.
Imagine we have a tape measure that measures both meters and centimeters. Imagine that the first tick on the tape measure, representing no displacement, is simultaneously labeled "1 cm" and "1 m" instead of 0.
Now, we no longer have a linear map to convert between cm and m. What we have is an affine map, like m = 1 + (cm - 1)/100.
An affine map is a strictly inferior alternative when we have the freedom to establish a linear map.
Homogeneous/linear is the superior default. One-based can be used in the special cases where it is nicer.
Here is one example: binary heaps. When we store a binary tree structure into a heap array, 1 based indexing makes the calculations nicer for navigating from parent to children or vice versa:
[1]
[2] [3]
[4][5] [6][7]
The children of every node [n] are [2n] and [2n + 1]. The parent of every node [n] is [n / 2] (floor-truncating division).
Under 0 based, it's not as nice:
[0]
[1] [2]
[3][4] [5][6]
The children are now [2n+1] and [2n+2]. One extra addition is required in the case of the left child. Finding the parent requires a subtraction: [(n-1)/2].
Another way to see the advantage of 1-based here is that every row of the tree starts with a power of two: [1] [2] [4] ...
Because we are dealing with exponentiation, avoiding 0 helps: 0 is not a power of two, so to "bootstrap" the exponentiation, we have to displace it.
Note that even though it is nice for a binary heap to use 1-based indexing, we still want to store that in a zero-based array, and just sacrifice the storage for the zero element.
If we use a zero based array, then the nice heap arithmetic we wrote in the source code will look nice in the object code, due to the map from the source code array to the object code array being a linear map, rather than an affine map.
It is almost always better to simulate a 1 based array by sacrificing a storage element, than to have the compiler uglify the beautiful indexing calculations for the sake of which we switched to 1 based in the first place.
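A sketch of what that looks like in practice (assuming C++ and a std::vector as the backing store; the type and names are mine): the heap math is written with the nicer 1-based formulas, while the backing array is an ordinary 0-based vector whose slot 0 is simply never used.

    #include <vector>
    #include <utility>
    #include <cstddef>

    // Min-heap stored 1-based inside a 0-based std::vector; data[0] is a dummy.
    struct MinHeap {
        std::vector<int> data{0};
        std::size_t size() const { return data.size() - 1; }

        static std::size_t parent(std::size_t n) { return n / 2; }
        static std::size_t left(std::size_t n)   { return 2 * n; }
        static std::size_t right(std::size_t n)  { return 2 * n + 1; }

        void push(int x) {
            data.push_back(x);
            // sift up using the clean 1-based parent formula
            for (std::size_t n = size(); n > 1 && data[n] < data[parent(n)]; n = parent(n))
                std::swap(data[n], data[parent(n)]);
        }
    };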
In summary:
1. we should choose the representation which offers the most succinct indexing calculations for the given situation, and not for some emotional reasons like "children learn to count from 1 and non-programmers understand that best".
2. arrays should be zero-based to preserve the succinctness of the calculation through to the object code (linear map from source to object, not affine).
I don't have a strong position on 0-based versus 1-based, but Dijkstra's argument does not make any sense to me.
He says that expressions A ≤ x ≤ B, A < x ≤ B, and A < x < B are confusing and a source of errors, but A ≤ x < B is not.
I really cannot understand what's the difference, and why one would be more mentally taxing than another.
(From that he concludes that iterating over 0-based array is more intuitive. This kinda/sorta makes sense, if you believe the initial premise).
I'd like someone to shed light on this, if possible.
One interesting case is when partitioning on integers but you wish to compare with floats... How do you get all values in, say, 0..9.9999 when using 0 <= x <= 9? It makes sense to use 0 <= x < 10...
Another case is when using datetime... how do you specify the whole day without knowing the precision of your datetime (seconds, milliseconds, nanoseconds...)...
again, specifying 2021-01-10 to 2021-01-11 makes more sense than using 2021-01-10T23:59:59.999 for the upper boundary.
It's just fine for negatives. If you slide the range that 'makes sense' back by 10 (add -10 to it):
-10 <= x < 0
This excludes the end of the range, it includes the start of the range. What's changed is that the start of the range is no longer at zero and also has crossed from positive to negative.
I hope you are not iterating over floating point numbers between 0 and 9.9999 :-)
But your example is just the case for using 0 <= x < 10 in that _particular_ case.
However, if I want all numbers greater than zero and up to and including 10, I'd write 0 < x <= 10.
So I fail to see your point. Do you argue that if we had to choose just one of these expressions, 0 <= x < 10 would have more utility? I don't know how to evaluate this conjecture, but suppose it's right.
Why would you need to limit yourself to just one, though?
Why not use the one that is most suitable to the problem you are dealing with?
His argument works by exclusion; he excludes every notation that doesn't work with just the natural numbers.
Using strict inequality for the lower bound means you can't express sequences starting at 0 without expressing your lower bound as a negative number, which is ugly. So he prefers non-strict inequality for the lower bound.
Using non-strict inequality for the upper bound causes a similar problem. Consider the sequences (non-strict inequality on both sides) [0..2], [0..1], [0..0]. The first has three elements, the second two elements, the third one element. If you want to express a sequence with no elements starting at 0, that is impossible without using a negative number for the upper bound ([0..-1]). So he excludes using non-strict inequality for upper bounds.
What we're left with is non-strict inequality for the lower bound and strict inequality for the upper bound. This way we never have to leave the set of natural numbers to express natural number sequences, and as an added bonus, the difference between our bounds is equal to the length of the sequence.
Note that this part of the argument doesn't touch on 0 vs 1 yet, that is argued later in the essay.
I don’t have any experience working with 1-based indexing, but it seems reasonable. I certainly remember how confusing 0-based indexing felt for a long time when I first learned to program.
On the other hand, I’ve made an observation about timelines in music production software, where bars are counted starting from 1. As you zoom out, only every 4th bar tends to be labeled, which creates this strange sequence: 1, 5, 9, 13, 17… I always found that incredibly hard to reason about, and perhaps it would have been a better idea to label the first bar as 0.
Exactly. There's tons of little annoyances that arise from 1-based indexing. Working on software projects about data analysis or similar makes this crystal clear. 0-based indexing may not be natural for human society, but it very much is natural for calculations and operations on data - exactly what programming languages are meant to do.
0-based indexing is preferable because the mathematics of half-open intervals make composing ranges far simpler. It becomes a bit more obvious as soon as you do any kind of non-integer ranges, because you find that only integers have a reliable "subtract one" operation.
For instance, suppose you're doing a lot of date arithmetic, and you might include times. If I have a range:
[2005-05-05, 2007-07-07)
This unambiguously includes all times. If I compose it:
[2005-05-05, 2007-07-07) [2007-07-07, 2008-08-08)
If I shift all of them forward 10 days:
[2005-05-15, 2007-07-17) [2007-07-17, 2008-08-18)
Every possible time within the new range is accounted for. Transformations on the ranges are obviously correct. If we can't agree on whether time resolves to seconds, milliseconds or nanoseconds, it doesn't matter.
That's because you can find the beginning and end of a range without knowing how to subtract by one. If you later decide to add times, it will just work because, conceptually, you already handle every possible value in the range.
If you want to construct a series of ranges of floats, it's a bad idea to do [3.457, 6.799999], [6.8, 12.47699999]. If you use half-open intervals, it just works, without mucking with the end of the range.
This is the reason that "indices are offsets" is powerful: it's not circular reasoning, but a simplification, because we're removing an unnecessary primitive (subtract 1) that only reliably works with integers.
And to bring this back to integer offsets, even when you're working strictly with integers, you may want to conceptually work with rational numbers.
When you're working out your math for buffered IO, you need to map byte indices to chunk indices. You need to assign every byte written to a chunk.
In a 0-based world, your chunk offset is simply your byte offset / chunk length. Because there is no distinction between indices and offsets, it's a clean mapping between one offset scheme and the other.
It means you can work out on paper when a new chunk should be started, and then translate that into integer arithmetic. And your code directly reflects the math, rather than having to stick "- 1" and "+ 1" in here and there.
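A sketch of that byte-to-chunk math (the names are made up, not from any particular I/O library):

    #include <cstddef>

    struct ChunkPos {
        std::size_t chunk;    // which chunk the byte falls into
        std::size_t offset;   // position inside that chunk
    };

    // With 0-based byte offsets, the chunk index and the in-chunk offset fall
    // straight out of division and modulo; a new chunk starts exactly when
    // offset == 0.
    ChunkPos locate(std::size_t byte_offset, std::size_t chunk_len) {
        return { byte_offset / chunk_len, byte_offset % chunk_len };
    }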
> It really shows how conditioned an entire community can be when they find the statement “given a list x, the first item in x is x[1], the second item in x is x[2]” to be unnatural.
It's not unnatural, but it's not natural either. It's just that 1-based indexing aligns with natural language conventions better. It also matches with some math conventions for matrices and vectors, but those don't have to use 1-based indexing either to work. I learned linear algebra with abstract "index sets" (I), where I can be {1, 2, 3, .., n}, {x,y,z}, {0,1,2}, whatever.
Programming in 0-based indexing means way less +-1 adjustments on boundaries. Division/modulus also works way better with 0-based indexing, so flattening multi-dimensional arrays is more easily expressed too.
In the end it's just a convention, it shouldn't matter too much. My money is still on 0-based indexing, as it lends itself to less errors and cognitive load when you get used to it. I used both extensively.
Tangential: conventions that are more "natural" are usually not actually natural, but just align themselves to other conventions better. Of course it's always good if conventions align, but sometimes it's inconsequential. Also there are situations where we can't make all conventions align.
Another little pet peeve of mine is people calling big-endian more natural than little-endian, when there are actually different competing conventions:
1. Usual way of printing memory content from lower address to higher address from left to right (think hexdump).
2. Usual way of writing down numbers from left to right from most significant digit to least significant digit.
3. Expressing the value of the number from the byte array elements (\sum_{n=0}^{width-1} byte[n] * 256^n for little-endian, \sum_{n=0}^{width-1} byte[n] * 256^(width-n-1) for big-endian)
Big-endian aligns better with the combination of 1 and 2. Little-endian aligns better with 3.
Well, that's usually a more complicated discussion. In French and Romanian at least, the word used for "floor" in this context ("etage"/"etaj") refers to an elevated construction. So, a building with "1 floor" ("1 etage") has 1 elevated construction atop the ground. This is what leads to the apparent 0-based numbering. The numbering is actually 1-based, but people count the extra layers above the ground, instead of literally counting the floors. In fact, I have rarely seen elevators show a 0 - they normally have something like P, 1, 2, 3.
It's more complicated anyway, since there are many structures with intermediate floors below or slightly below the ground, and those usually get specific names ("mezzanine", "demisol", etc.).
This is not the best example. In Polish for example there is a whole different word for ‘0’ floor (parter), and the word for the rest of floors (piętro) kind of assumes that level is above ground, so we do use 1 based indexing there.
I think a stopwatch is a different beast from 0-based or 1-based indexing. A stopwatch is used to measure the time interval between two events. If stopwatches could somehow magically measure and display time in infinite precision we wouldn't have this discussion. But if they do rounding to the nearest displayable number then they display 0 for half of the time-step interval, then display the subsequent numbers for the full time-step interval each.
I vaguely remember that the original minesweeper started counting time from 1, which is certainly odd. I could be wrong here though.
One issue I see with 1-based indexing is that it makes x[0] undefined, so even if you pass an unsigned integer index to a function you need to verify that it's != 0 before you use it. So in a way it'd be the whole null pointer thing for array indices all over again.
Another minor issue is that you can address one fewer element with a fixed-size integer than with 0-based indexing. So for a u8 you can only address 255 elements instead of 256, for instance.
Actually, in Lua, which caused the original discussion, it’s legal to use x[0], since the language doesn’t distinguish between arrays and dictionaries (Lua calls them tables). But if you do that, the length calculation will be off by one.
I have seen some difficult to debug issues caused by this when I used to work on game scripts.
That sounds like an implementation bug in the name of speed or memory size rather than a programming correctness bug. What's the correct behavior if the 'table' doesn't have any integers as keys, but instead other elements such as strings?
You can use strings as keys. In fact, that’s how you can create objects. Also negative integers can be used as keys. But array operations will malfunction if you attempt to apply them to such a table.
The tricky thing is that the length operator in Lua (#) actually returns the largest integer index in the table. If you use 1-based arrays, that also happens to be the number of elements in the table. However, if you use 0-based arrays then it won't count the zero-th element.
This is an argument about checked indexing, not about where indexing should start; after all, whatever the index base, you still need to check that the index fits within the bounds of the array. It doesn’t matter if a behaviour is technically deterministic and defined if it’s not what you want to happen. I mean, just read the explanation of the ‘wat’ talk: <https://stackoverflow.com/a/9033306/3840170>. Every one of these behaviours is defined by the ECMAScript standard, but they’re still useless.
Yes, but checked indexing requires only one comparison with the upper limit when your indices are 0-based, instead of 2 comparisons, when the starting index is different from 0.
The sign of the index can be determined without a comparison, usually simultaneously with one of the previous operations.
Another language trivia: for strings, Pascal (which has 1-based indexing) does actually use the character at index 0 to store the string length behind the scenes. Because of this, indices are converted into offsets directly, without having to subtract one. It’s also the reason why strings were limited to 255 characters.
If Pascal strings were 0-indexed, it wouldn't be possible to store the string length at index 0. The first character in a string is always at index 1 because of this.
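A small Python sketch (my illustration, not the commenter's) of that layout: the byte at offset 0 holds the length, so a 1-based character index is also the byte offset, with no subtraction needed:

    def make_short_string(s: bytes) -> bytearray:
        # Pascal-style ShortString: length byte at offset 0, characters after it
        assert len(s) <= 255, "the single length byte caps the string at 255 characters"
        return bytearray([len(s)]) + bytearray(s)

    def char_at(p: bytearray, i: int) -> int:
        # 1-based index i maps directly to byte offset i, because offset 0 is the length
        assert 1 <= i <= p[0]
        return p[i]

    p = make_short_string(b"hello")
    assert p[0] == 5                  # the length lives at index 0
    assert chr(char_at(p, 1)) == "h"  # the first character is at index 1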
This is obviously a spaces-vs-tabs third rail in PL design, but riffing off the language in the article, we should probably all agree on two points:
0-based indexing is inherently better for machines.
1-based indexing is inherently more natural for humans.
For a short stint I was running a (small) datacenter, and despite being an apologist for 1-indexed programming languages, I zero indexed EVERYTHING in the datacenter.
Let me just say, that was a huge mistake. Physical items are not offsets. Even though I was then working in a 0-indexed PL, the amount of contortion I had to do to remember zero-indexing made the likelihood of errors higher (I did not make any that weren't quickly recoverable, but still).
My culture uses 0-based ages, where in your first year on earth you are 0 years old and turn 1 after you have been alive for an entire year, but I am not sure age would be any more confusing if it were 1-based. I don't see anything fundamental there. Other cultures do use 1-based ages (you are 1 on the day you are born) and they seem to get along just fine. Whatever you are accustomed to is what seems natural.
I have experience with 1-based indexes: BASIC/Pascal. My 2nd language after BASIC was assembler. Off-by-one bugs were plentiful and confusing when mixing assembly with BASIC and Pascal. Zero-based with Java/JavaScript has felt a lot more natural than one-based. In 20+ years there has likely not been an off-by-one bug. With that said: 0-based has felt totally fine and natural. All it takes to remember: inclusive/exclusive.
I know Visual Basic gets a bad rap, but as a learning language it had an interesting feature, `Option Base`. Putting `Option Base 1` in a module changed how array indexing worked: it defaulted to 0, but for some applications (and also when you're first learning), 1 can be convenient.
Of course, there are problems with this in a professional setting, such as how do you enforce uniformity across modules, and what happens if you copy code from one module that's Base 0 to one that's Base 1 and vice versa. But when I was first learning how to program, it was helpful to me to have a language that allowed for some choice.
In the meantime, 23 years of programming have led me to believe that index base 0 makes sense. For many applications it's moot, because we should be using higher level functions (like map and reduce) for processing lists. In every other application (such as working with data on a grid), dealing with offsets does make things easier.
Perhaps the convention is arbitrary. But, lots of industries have arbitrary conventions that we all agree on just to aid communication, and I disagree with the original author that the term "groupthink" applies in this situation.
While 1-based index is probably easier to learn and understand for a beginner, even as a beginner there is quickly an overhead where you have to adjust indices by 1 all the time as soon as you do any math on indices. Going from Basic to C was hard for me for many reasons, but a lot of complexity in my code that I took for granted just vanished. This truly showed me the power in getting such fundamental choices right.
This type of bikeshedding behavior is a sign of technical immaturity. As one gains technical experience, one usually becomes far less interested in small syntactic details of programming languages and more interested in larger abstract concepts (e.g. discussions on monads, type theory, "everything is an expression", etc.)
Zero- versus one-based indexing actually matters somewhat for correctness. The real trivial ‘small syntactic technical’ bikeshedding is braces versus indentation (and: which kind of indentation) or semicolons versus no semicolons.
Semicolons vs. end of line (like in Python) makes a significant difference in what kinds of things you can express. End-of-line-based syntax strongly discourages long lines, which brings both problems and benefits. (And the JavaScript choice is simply wrong.)
Semicolons vs. indentation structure (like in Haskell) makes a huge usability difference in an interactive shell. It's a trivial difference in editor code.
The dismissal of syntax as trivial is not a good line of thinking. Syntax tends to have surprising impacts on semantics.
yeye, "HN" tends to abstract everything to the point that sometimes I do wonder how Java is not the church language of HN; I also wonder whether people here still write code, or whether they've just become "architects".
So, there's nothing wrong with challenging common norms - remember NULL? XML oriented programming?
Rebutting the dismissal of offsets as an artefact of an old language:
- pointer offsets - the article seems to dismiss these, but they're used a lot in systems/network/driver programming, which is a reasonable chunk of the most important programming (see the sketch after this list)
- conceptual offsets - e.g. time of day starts at zero, not one. We think in this way a lot, and it's useful.
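A rough sketch (mine, with a made-up packet layout) of both kinds of offsets:

    import struct

    # Byte offsets into a buffer: fields of a (hypothetical) header live at fixed
    # offsets from the start, and the first byte is at offset 0.
    packet = bytes([0x45, 0x00, 0x00, 0x54, 0xAB, 0xCD])
    first_byte = packet[0]                                # offset 0
    (total_len,) = struct.unpack_from(">H", packet, 2)    # 16-bit field at offset 2
    (ident,) = struct.unpack_from(">H", packet, 4)        # 16-bit field at offset 4
    assert (total_len, ident) == (0x54, 0xABCD)

    # Conceptual offsets: time of day starts at zero, so decomposing
    # seconds-since-midnight is plain integer division, with no +/-1 anywhere.
    t = 3661   # 01:01:01
    assert (t // 3600, (t % 3600) // 60, t % 60) == (1, 1, 1)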
For things that wrap, such as ring buffer indices or packet sequence numbers in a fixed-size field or clocks (hour of the day, minutes in an hour), zero-based indexing is simple modular arithmetic. Wrapping of one-based indices can of course be done, but it's more complicated than necessary.
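For instance (a minimal sketch of my own, not from the comment), a fixed-capacity ring buffer with 0-based slots needs nothing beyond a modulo:

    class RingBuffer:
        # 0-based slot numbers: wrapping is just `% capacity`
        def __init__(self, capacity):
            self.buf = [None] * capacity
            self.capacity = capacity
            self.count = 0                 # total items ever written

        def push(self, item):
            self.buf[self.count % self.capacity] = item   # wraps automatically
            self.count += 1

        def last_n(self, n):
            return [self.buf[i % self.capacity] for i in range(self.count - n, self.count)]

    rb = RingBuffer(4)
    for x in range(10):
        rb.push(x)
    assert rb.last_n(4) == [6, 7, 8, 9]

With 1-based slots, every one of those `%` expressions grows a -1/+1 pair.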
Even worse is when this 0-based numbering is carried over, cargo-cult-like, into a world of enumerable things. In my project we have multiple physical entities which have numbers, e.g. blades, ports, channels and so on, all of them counted from 0 because some genius decided to write that into the standard. And now there are multiple points in the code where 1-based numbers are converted into 0-based and back, and, incredibly, there are bugs there :). Talking to humans gets equally weird and confusing when you are talking about the second port, numbered 1; it just never gets natural, even after years of working on these devices.
Here’s some fuel for the flames: < is fewer characters than <=, and it's also more logical. 0-based indexes have the advantage in comparisons and ranges. Mathematics treats 0 as special, and it can have all sorts of side effects if dropped into a formula; code doesn't have this side effect (unless you're running the formula as an algorithm, which is math).
To save yourself from headaches, I believe 0-based indexes are preferred in almost every modern language for the simple reason of optimization. I have no evidence to back this. NULL=Nothing=0 is a thing though.
Honestly 0-based indexing is one of the most annoying things to teach. In sciences, people really don't care about programming arcana: I have never found anyone who found 0-based indexing natural or intuitive.
In fact, much of the vehemence in favor of it seems to be more about programmer shibboleths, to make some people feel they are in a special 'in-the-know' club (certainly if the r/programmerhumour reddit is anything to go by).
All I can do for scientists is tell them the advantage of learning a programming language outweighs the weirdness.
just... no. We are not talking about a number line here, we are talking about array indexing. That is a list of things that you count starting from the beginning. Like, no one lines up their children and says 'here is my Zeroth child'.
There is a reason that Fortran, which was designed for science and engineering, uses 1 based indexing: because it makes sense when counting things, as opposed to dealing with some internal arcana of a computer's memory model.
0-based array (Python):

    screen_pos = x + y * width

1-based (Lua):

    screen_pos = 1 + x + y * width
shrug
So 0-based seems to work a little better for arithmetic-based array references. But really you'll never resolve this to perfection. Assembly language programmers are very unlikely to appreciate 1-based.
It's all about what you're comfortable with.
Those wanting pointers can imagine a base address added to each calculation. 1-based then needs a -1... which is then a strange thing to advocate for.
The problem is in the phrasing and its premise: are you using an index or an offset? 99% of the time I think of an offset, so "first" makes sense as "0". If you go by indexing, "1" suddenly makes sense as an index. When people say "index" in the context of arrays or data structures, they usually mean an offset. To me it's simply the wrong phrasing.
I don’t know why 1-based-indexing supporters insist on pointing out this distinction. Whenever one indexes some kind of a map with numbers – and it actually matters that they are numbers that you can perform arithmetic on, not just opaque symbols – chances are they will be computed from an ‘offset’ somewhere or vice versa, which means the notions are going to blend a lot.
> You see, Lua uses 1-based indexing, and lots of programmers claimed this is unnatural because “every other language out there” uses 0-based indexing.
> I’ll brush aside quickly the fact that this is not true — 1-based indexing has a long history, all the way from Fortran, COBOL, Pascal, Ada, Smalltalk, etc. — and I’ll grant that the vast majority of popular languages in the industry nowadays are 0-based. So, let’s avoid the popularity contest and address the claim that 0-based indexing is “inherently better”, or worse, “more natural”.
Maybe I missed the comments about 1-indexing being "unnatural" but if the author is referring to the discussion I took part in, then this is a strawman. It wasn't about what was "natural" or "unnatural" but about people switching from their primary language to secondary languages and gotchas like 1-indexing being error prone.
We had a platform team at my last company looking to adopt Lua for client customization. The primary authors became familiar with Lua in writing the code logic but everyone else would be touching their part, or contributing back to the core, not as people familiar with Lua but as C++ developers who would be blindly making changes in another language. It felt similar to maintenance of our Perl scripts. You don't brush up and become an expert on a language you interact with on a yearly cadence. It is important in these cases to have few gotchas to make casual contributions easier and safer.
This says nothing about using 1-indexing when your target audience isn't 0-indexed programmers (speaking of those who choose to use Lua, not of Lua's creators).
Unfortunately, for me, even with my complaints (this and language compatibility), I'll probably still use Lua for some projects of mine. I've looked at others and I'm mainly concerned about the community size.
Honestly, I feel like `some_list.first` is the most ergonomic.
And if the API provides `second`, `third`, `antepenultimate`, `penultimate`, and `last` methods, it covers a large part of what one most often uses.
Past the third rank, I feel better served by a `slice` method whose documentation explicitly states its index convention, whether it wraps modulo on out-of-bounds access, etc.
If you use 1-based indexing, you can't iterate over a list of maximum length with a simple loop. The normal loop condition is `while i < len` (0 based) or `while i <= len` (1 based). If len is maximal, the second one probably has an overflow bug in the loop body that turns it into an infinite loop or a crash.
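A contrived Python sketch of that failure mode (masking to simulate an 8-bit index, since Python ints don't overflow):

    MASK = 0xFF   # pretend the index variable is a u8

    def iterations(length):
        # 1-based loop: start at 1, run while i <= length
        visits, i = 0, 1
        while i <= length:
            visits += 1
            i = (i + 1) & MASK        # u8 increment: 255 + 1 wraps to 0
            if visits > 1000:         # safety guard: the loop never exits on its own
                return None
        return visits

    assert iterations(254) == 254     # fine
    assert iterations(255) is None    # i wraps 255 -> 0, and 0 <= 255 forever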
This is actually symmetrical. When you use 0-based indexing, you can't count down with a simple loop: you need to either extend your index type to include an invalid -1 or apply the infamous evaluate-then-decrement semantic. In practice, this means people often index with a signed integer, which is absurd.
And lest you think that counting down is unnatural—there are quite a few advantages in comparing to zero on every iteration instead of a computed upper bound, like that you can often roll the condition check in with the decrement instead of having to do a separate comparison.
A list of length equal to whatever maxint is is something you would have to set up on purpose. (If it were naturally occurring and unpredictable, then you would need to handle the much more common case where the length of the list is greater than maxint, too.)
The fact that a weird situation you set up intentionally needs special-purpose handling doesn't seem odd or unreasonable to me.
In the same way that I don’t think learning sklearn is the hard part of knowing ML, I don’t think 0 or 1 -based arrays is the hard part of any particular language. There are situations where both are appropriate. The choice of either as the deal breaker for a language is extremely superficial.
I had a ROFL moment when he discredited appeals to authority, and then progressed to note that it's conventional math notation - which is pretty much the same kind of logic.
Furthermore, mathematical notation is almost universally terrible. It's inconsistent, ambiguous, and has many undeclared dialects. Oh, and many people use 0-based index to boot, because, you know, whether that's conventional or convenient (largely orthogonal qualities alas in math) depends on the context.
I mean, I can sort of buy his argument that it's arbitrary, but then he also points out that there's at least one advantage to 0-based indexing, namely that it works well in offset-based scenarios (not just for pointers). So in which case is 1-based indexing convenient? Based on this blog post, never.
The harder it is to learn, the longer it takes you to change your mind. It's hard to see things without bringing baggage.
Also, the irony of complaining about HN holy wars becoming just another trigger to continue!
I'll do my part - whichever side you are on - you are wrong.
An interesting perspective is what this does to pointers in terms of unsigned representation, effectively making 0 a special case which could be used for null. But in practical terms it would just limit 32-bit systems to 31 bits of addressable memory, or make pointer arithmetic weird. The latter isn't that much of a problem, as it only matters when pointers are basic types, and free pointer arithmetic is a mistake whose only advantage is compiler speed, not even application speed.
For instance, even in slicing, [start:end) eliminates this issue. Though there is something to be said for verbosity.
I think this becomes an argument for 0 based indexing, but a case could be made to the opposite.
That said, consistency is what is valuable, so the only truly wrong answer would be the one we use in real life.
I don't understand why the author treats referring to Dijkstra's paper as "appealing to authority". To me it is natural to defer to a well-written paper instead of repeating the same arguments in your own words, regardless on who wrote it.
We can't even agree on this for floors of buildings. Half the world starts at 'G' then 1, 2, 3 etc. as you go upwards, the other half starts at 1, then 2, 3 etc.
And then you have Japan which sometimes starts at 2F. Hey, maybe we should use 2-based indexing.
I grew up on languages that were 1 indexed - BASIC, VB, VB.Net. It took me a long time to get used to 0 indexed, primarily in college and later in working with more C inspired languages, yet I think 0 indexed is superior. Why?
Numbers in programming start at 0, even in the languages with 1 indexing. Period. When I initialize an integer, unless I explicitly set a value, it’s 0.
From a purely pragmatic standpoint, making the first number an invalid location in a list just adds unnecessary complexity and increases the odds of off-by-one errors. On the same token, there’s nearly no benefit to starting at one.
No, no "period". There are plenty of languages that either require you to initialize your variables, don't allow you to use a variable until it's initialized, or give an uninitialized variable a special value like "undef" or "Nil".
I agree that BASIC is inconsistent, but there are many languages beyond Basic and its descendants...
I'm afraid your argument still sounds like "C does it, so it is right" mentality.
The biggest thing about index 0 or 1 is the inclusive and exclusive slicing. When the index is 0 and the slicing is [inclusive:exclusive], I find it a bit easier to write code with array manipulation.
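A small illustration (mine, not the commenter's) of why [inclusive:exclusive] composes so cleanly with 0-based indexes:

    a = list(range(10))
    k = 4

    assert a[:k] + a[k:] == a          # splitting at k loses nothing, duplicates nothing
    assert len(a[2:7]) == 7 - 2        # slice length is just end - start
    chunks = [a[i:i+3] for i in range(0, len(a), 3)]
    assert sum(chunks, []) == a        # adjacent chunks tile the list with no +/-1 fixups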
I am pretty sure that the last, counterfactual, paragraph of this article is spot-on.
Personally, I am not so much interested in which way is more right as in which way is less likely to result in off-by-one and related errors, though I do not know how you could gather data on that other than from people beginning programming, and any difference might go away or even flip as programmers gain experience and move on to more difficult problems. (Of course, the most effective solution is not to use indexing at all, except where absolutely necessary.)
Citrine is also 1-indexed (https://citrine-lang.org/#lists). I think this would help people who are not developers to read and possibly verify code a bit more easily. Of course it is just a small step, but I feel code in general is drifting away from normal users, who seem to be confined to graphical interfaces. Wasn't it once an objective in the software development community to make code as accessible as possible?
"... of course, nowadays the number one reason is tradition and familiarity given other popular languages, and I think even proponents of 0-based indexing would agree, in spite of the fact that most of them wouldn’t even notice that they don’t call it a number zero reason."
That's it, I'm switching to saying number 0 reason from now on. If you are talking about your number 1 reason, I will forever be wondering what your number 0 reason would be. I'll tell my wife tonight she is my number 0!
Maybe the confusion comes from the fact that programming languages don't make a distinction between cardinal and ordinal numbers.
In English we do. For example, if you are nine years old, you are in your tenth year. If an array has ten elements, the last element is the tenth element, it is nine elements away from the beginning. We could do "int array[10]; array[10th] = x;" if we wanted to mirror that.
We could imagine a language that does the distinction, but I don't know what good it would do.
Wacław Sierpiński (1882–1969), the namesake of Sierpiński's triangle, was famously a proponent of zero-based indexing, including in everyday life. Sadly, I can only find sources in Polish that corroborate this [0].
As weird as I find 1-based indexing (looking at you, R), I still get confused by 0-based indexing. This happens all the time when I’m working in pandas and thinking about line numbers in the file vs. the data frame, and I recently had an issue with this while implementing Conway’s Game of Life in Racket, both when traversing the array and when looking for neighbors.
It has been enough of an issue for me to seriously reconsider my stance on indexing.
After some experience with Go I realized that in quite a few cases it would be nice to have 1-based indexes. If one has a variable that is an index into an array, then it is natural to initialize it to an invalid index like -1. But in Go everything is initialized by default to 0. So to get -1 one needs extra code.
With one-based indexes zero would be a very natural invalid value as an index similar to a null pointer.
My own opinion is that zero-based indexing is generally much more useful, and makes a lot of things simpler and/or more sensible, mathematically or otherwise. However, there are some cases where an index based on some other number (often, but not necessarily, 1) is more useful. (Some programming languages, such as BASIC, allow you to define whatever starting index you want.)
It does not seem like a big deal, both conventions are easy to understand and natural to use. Except if you work with the discrete Fourier transform. Then 1-based indexing is extremely cumbersome and annoying. More generally, if your indices are to be understood "modulo N", it makes sense that they are all strictly smaller than N (or even signed, and centered around 0).
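A rough illustration of the modulo-N point (my own sketch, just a naive DFT):

    import cmath

    def dft(x):
        # with 0-based k and n, the exponent is simply -2*pi*i*k*n/N
        N = len(x)
        return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
                for k in range(N)]

    x = [1.0, 2.0, 3.0, 4.0]
    X = dft(x)
    N = len(x)

    # Frequency indices are understood mod N, so the "negative" frequency -k is
    # simply bin (-k) % N; for real input it is the conjugate of bin k.
    k = 1
    assert abs(X[(-k) % N] - X[k].conjugate()) < 1e-9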
Unfortunately off-by-one errors can be insidious and non-obvious to debug, having different systems only complicates this. 0-indexing makes sense for systems and computer programming while 1-indexing makes translating math more straightforward. Inter-converting between the two systems is unfortunately non-trivial, even if the cost isn't that high.
I think it just comes down to where you want your index-out-of-range errors to occur.
Pretty much any language I can think of inits primitives with all bits set to 0. So either you accidentally forget to set your index vars to 1 and go out of range there, or you accidentally forget to subtract 1 when accessing the end of the array. Pick your poison.
I don't think it simplifies quite that far. There are languages that are capable of catching such errors ahead of time. Now that I think about it, there are even more that catch the initialization error - or at least do so more automatically than the end point error.
Perl initializes its scalars to "undef", not 0.
If you try to use undef as an array index, it will convert it to 0 but issue a warning.
I'd prefer it to be an error instead.
The APL language has this system variable ⎕IO (called QUAD IO -- index origin), which could be set to 0 (for 0-based indexing) or 1 globally in the environment.
Also, ⎕IO can be set as a local variable of a function to limit the scope of the indexing origin, which made it convenient to simplify the code to implement certain algorithms.
The funny thing is that even in natural language we don't use base 1 or 0 consistently.
For example why is the 1st of January the first day of the month/year, but your first birthday is when you turn one year old, i.e. when your first year of life finishes.
I'd say from a mathematical perspective, 0-based indexing makes more sense, simply because the Peano axioms start with 0 for the natural numbers. Also, thinking of overflows makes more sense this way, because it's just modular arithmetic.
Discussing 0 vs 1 indexing is like questioning the key layout of a piano. Sure, one is probably easier to learn than the other, but in the end it's an interface everyone is used to.
The crazy part is that Lua accepts "array[0]" just fine and it works exactly as expected. This is a non-issue in Lua but many people cite it as a problem.
I never thought I'd say this, but you've got to give props to VB where you can declare the starting (and ending) index of your array to be whatever you want!
This is something that's extremely annoying to deal with in a genomics context. Various formats in the field use 0-based or 1-based and inclusive or exclusive coordinates.
While I don't have a strong preference in this "war", I can think of an interesting argument for 1-based.
Our time counting is 1-based. There is no zeroth second, zeroth minute, hour, day, month, year, or century.
This actually makes it slightly harder in 0-based languages to work with time and dates.
Also, it might be an indication that 1-based system is somewhat more intuitive to non-programmers.
My clocks all start at 00:00:00 every day, and that seems entirely logical to me. It's also how all programming languages that I've worked with represent time.
But 00:00:00 does not represent the first second of your day. It represents a _point_ in time, which has no duration.
The first second of your day is an interval between the moment your clock shows 00:00:00 and the moment it shows 00:00:01.
If 00:00:00 is the first second, why is it not also the first minute and the first hour? Is the first second the same as the first minute and the first hour?
Our counting is 0-based as well - usually when you start by saying "one" it is implied that 1 second has elapsed. So while we don't pronounce zero we still count starting from it.
Yes, this isn't usually an argument I care for, I happily use zero-based and occasionally having to pause to remember to get it right in SQL is a minor annoyance and that's that.
But then recently I was doing some 'munging' to get some data structured correctly to display in a chart over months, with a variable start month that depended on the data. Well, let's just say that involved a fair amount of +/- 1 ing, and is hopefully the last time I think 'hm, it'd actually be quite nice if you could choose which to use on access' (or per function?).
There absolutely is a zeroth second! It is the 60th second of the prior minute. This is actually the exact reason why midnight is 12 AM not 12 PM, and noon 12 PM not 12 AM.
If there weren't a zeroth second, minute, hour, day, month, year, or century when was 12:00 AM of Jan 1, 2000?
This is why a zero _offset_ is the initial _value_ for many implementations.
... And I (a non-native English speaker living in the US) always have trouble remembering what 12 AM and 12 PM means, even after all these years I have to look it up every time, and sometimes it drives me nuts.
(For a moment I hoped your explanation would at least serve as a heuristics... but alas, it does not make sense to me. Well, at least at the moment I remember the notation).
Regardless of it, "12:00 AM of Jan 1, 2000" is not a particular second. It's a point in time, it has no duration.
It does not belong to any particular calendar hour or day or year, it's a _boundary_ between two.
Of course it belongs to XX century since XXI century starts at 12:00 AM of Jan 1, 2001.
The first second of Jan 1 lasts from 12:00:00 AM to 12:00:01 AM.
First minute of that day lasts from 12:00 AM to 12:01 AM.
Etc.
I'm not French, but your comment just made me 'realise' that people who use 12h have the largest numbers when we on 24h have the smallest, almost like the 'start' is one o'clock. Weird.
Same :00 start happens with minutes on 12h though of course.
I think GP didn't mean to talk about time, but has a point regarding months, as I replied elsewhere. We represent the months numerically as 1-12, and I have recently found that annoying to work with.
12:00 should actually be 0:00 on the 12 hour clock, it belongs to the next 12 hour period. 11:59 PM is yesterday, 12:00 AM is one minute later, and falls on today. Same with noon, 12:00 PM comes one minute after 11:59 AM. When I see 12:00, I automatically convert that to 0:00, otherwise AM/PM doesn't make sense.
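That "treat 12 as 0" rule is exactly what a 12-to-24-hour conversion ends up encoding (a small sketch of my own):

    def to_24h(hour_12, is_pm):
        # 12 acts as 0 within its half of the day; that's what makes AM/PM consistent
        return (hour_12 % 12) + (12 if is_pm else 0)

    assert to_24h(12, is_pm=False) == 0    # 12:00 AM (midnight) is hour 0
    assert to_24h(12, is_pm=True) == 12    # 12:00 PM (noon)
    assert to_24h(11, is_pm=True) == 23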
Indeed, months are 1 based, but it's never been a problem to eg compute which month it will be, say 9 months later: just sum and subtract 12, no need for a real modulo computation. It's probable I've never found it annoying because I don't need to perform this computation with more than 12 months.
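For what it's worth, the general "shift to 0-based, wrap, shift back" dance for 1-based months looks like this (a small sketch, mine):

    def month_later(month, k):
        # months numbered 1..12: convert to 0-based, take the modulo, convert back
        return (month - 1 + k) % 12 + 1

    assert month_later(5, 9) == 2          # 9 months after May is February
    assert month_later(12, 1) == 1         # December -> January
    m = 5 + 9                              # the "sum and subtract 12" shortcut,
    assert (m - 12 if m > 12 else m) == 2  # which works as long as you wrap at most once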
Yeah, this is most familiar to me too. Still, I'd argue it's 1-based.
00:00 is a point in time, but the first second is between 00:00:00 and 00:00:01, and it's "00:00:01" when it has elapsed.
Similar to the indexing question, this is only true for some cultures:
"East Asian age reckoning originated in China and continues in limited use there along with Japan and Vietnam, and is still common in Korea. People are born at the age of one..."
No it doesn't. That just gives you the last element, which is not a "based indexing".
JavaScript will allow arr[-1] as an "individual" element, but it's stored internally as an ordinary object property, so iterating and hoping that -1 comes in order doesn't work.
Sure it does. -2 gives second to last, -3 gives third to last, etc. If you only use negative indices i, the "base" of the array is the end, and arr[i] is just arr[len(arr)+i].
First, it's a "joke", second, nobody is going to force you to use non-negative indices. Sure, you can use a base index of zero and read left to right... but you can use a base index of -1 and read right to left and live a full (if lonely) life without considering the passé alternatives.
More precisely, I'm defining the "base" to be the right hand side of the array and operating on the reversed list -- it's just a change of variables. Python supports -1 based indexing, but does not support 1 based indexing. You could easily define a language which only provides the -1 based indexing but does not support 0-based. Let me demonstrate:
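(What follows is a sketch of my own of what that could look like in Python; the `NegIndexed` wrapper is hypothetical.)

    class NegIndexed:
        # a sequence that only accepts negative indices: the "base" is the right-hand end
        def __init__(self, items):
            self._items = list(items)

        def __getitem__(self, i):
            n = len(self._items)
            if not (-n <= i <= -1):
                raise IndexError("only indices -len .. -1 are supported")
            return self._items[n + i]      # -1 is the last element, -len the first

    a = NegIndexed(["a", "b", "c"])
    assert a[-1] == "c"   # last element
    assert a[-3] == "a"   # first element, reading right to left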
i don't really care if the language has me start counting indices at 0 or 1. it's never been the part of the problem of programming that has caused me significant trouble.
You still have to understand what is being abstracted if you want to do anything useful. You cannot just ignore what data structures are and how they work or you will have some very bad surprises.
Also often abstractions (and their quirkiness) make way more sense when you understand what is being abstracted.
If the language doesn't expose pointers, then that's an abstraction leak. Even if it does, why must that dictate the syntax and semantics? I'm sure our compilers and interpreters are capable of turning 1-based indices into the right offsets for us.
Why would you need to "wrap around" an array?
I guess it's helpful if you use an array to represent a cyclic group, but how often does that happen in practice?
And when you need the last element, you need to do A[n-1], which is "less helpful", I guess.
I think both those cases happen with a certain frequency, yes, your last element is a bit trickier, though in Python it is represented by A[-1] (so that would turn into A[len-1])
Yeah, maybe there's no fundamental advantage, one would have to check the most common cases.
I find zero-based indexing more natural. Consider the case of dates. Dates are naturally one-indexed. There is no year zero.
This means that the 21st century started in the year 2001. Likewise, the current decade started under a month ago, not over a year ago. If you think the 21st century started in the year 2000, as do most people, then that implies that the first century either started in the year 1 B.C. (ugly) or was only 99 years long (not a century). People naturally (and incorrectly) default to zero-based indexing in this case!
I would go farther and make centuries zero-indexed too. It's far more natural to say that the Nth century is everything in the form Nxx than to say it's (N-1)xx. Every time I run across a numbered century I have to stop and think for a split second to calculate what years it covers. I got this wrong on several occasions in grade school and was marked down for it.
Zero-based indexing works naturally for grouping things. One-based indexing does not.
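To make the arithmetic concrete (a sketch of my own): the zero-indexed convention is a single integer division, while the conventional numbering needs the usual -1/+1 shuffle.

    def century_zero_based(year):
        # "century N covers the years Nxx"
        return year // 100

    def century_one_based(year):
        # conventional numbering: the 1st century is years 1-100
        return (year - 1) // 100 + 1

    assert century_zero_based(1999) == 19   # the "19xx" century
    assert century_one_based(1999) == 20    # the 20th century
    assert century_one_based(2000) == 20    # 2000 still belongs to the 20th century
    assert century_one_based(2001) == 21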