You probably used GPT-3.5. That said, I didn't manage to get GPT-4 to calculate 36,330 * 26,951 correctly. I suggested casting out 9s, casting out 11s, doing long multiplication, reversing the digits - nothing.
I have a theory that it does arithmetic badly because the logic goes right-to-left, when LLMs write left-to-right. If the digits were to be reversed, it might not make as many mistakes. I ran out of attempts before I could test this properly.
> Human: what is 36,330 26,951 * (input is expressed in reverse post fix notation)
> chatGPT: To evaluate this expression using reverse Polish notation, we need to use a stack to keep track of the intermediate results. Here's how we can evaluate the expression:
1. Read the first number "36,330" and push it onto the stack.
2. Read the second number "26,951" and push it onto the stack.
3. Read the operator "*", pop the top two numbers (26,951 and 36,330) from the stack, multiply them, and push the result (971,914,830) onto the stack.
4. The stack now contains only one number, which is the final result.
Therefore, 36,330 26,951 * = 971,914,830 in reverse Polish notation.
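For comparison, evaluating the same postfix expression with an actual stack gives the correct product (a minimal Python sketch; the function and token format are my own illustration):

```python
def eval_rpn(tokens):
    """Evaluate a postfix (reverse Polish) expression with a stack."""
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()  # operands come off the stack in reverse order
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok.replace(",", "")))
    return stack[0]

print(eval_rpn(["36,330", "26,951", "*"]))  # 979129830 -- not 971,914,830
```

GPT described this procedure correctly step by step and still produced a wrong product, which is the point of the thread: it can recite the algorithm without executing it reliably.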
You should ask it to show its working. The problem is that it uses a fixed amount of thinking for each character - and it writes left to right - while long multiplication is computed right to left. Past a certain number of digits, the amount of "thinking" needed to compute long multiplication in the wrong order (left to right) increases without bound -- accuracy suffers as a result.
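The right-to-left dependency is easy to see in schoolbook long multiplication: every output digit depends on a carry arriving from the digit to its right, so the digits are naturally produced least-significant first. A minimal sketch (my own illustration of the dependency, not anything GPT does internally):

```python
def long_multiply(x, y):
    """Schoolbook long multiplication on digit lists.

    Digits are stored least-significant first because carries propagate
    from right to left -- the reverse of normal writing order, and of the
    left-to-right order in which an LLM emits tokens.
    """
    xs = [int(d) for d in str(x)][::-1]  # reversed digits of x
    ys = [int(d) for d in str(y)][::-1]  # reversed digits of y
    out = [0] * (len(xs) + len(ys))
    for i, a in enumerate(xs):
        carry = 0
        for j, b in enumerate(ys):
            total = out[i + j] + a * b + carry
            out[i + j] = total % 10
            carry = total // 10
        out[i + len(ys)] += carry
    # out now holds the product's digits least-significant first
    return int("".join(map(str, out[::-1])))

print(long_multiply(36330, 26951))  # 979129830
```

Note that the very first digit a left-to-right writer must commit to (the most significant one) is the last digit this algorithm can determine, since it depends on every carry below it.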
A tangential point: note that arithmetic on decimal numbers _has_ to start with the rightmost digit to be done accurately. Consider x = 2/3 + 1/3 = 0.6(6) + 0.3(3), with the digits in parentheses repeating infinitely, so there is no rightmost digit to start from. Whichever digit you choose for x before the decimal point (either 0 or 1) could turn out wrong if any digit of 1/3 were changed. This is called the Table Maker's Dilemma.
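The 2/3 + 1/3 example can be checked numerically: no matter how many digits of each addend you look at, the truncated sum stays 0.99…9, so the digit before the decimal point never settles. A quick sketch using exact fractions (helper name is my own):

```python
from fractions import Fraction

def truncate(frac, n):
    """Keep only the first n decimal digits of frac (round toward zero)."""
    scale = 10 ** n
    return Fraction(int(frac * scale), scale)

two_thirds, one_third = Fraction(2, 3), Fraction(1, 3)
for n in (1, 5, 50):
    approx = truncate(two_thirds, n) + truncate(one_third, n)
    # the truncated sum is 0.99...9: its integer part is 0, not 1
    print(n, int(approx), approx < 1)

assert two_thirds + one_third == 1  # yet the exact sum really is 1
```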
The second paragraph is a tangent. My point was made in the first paragraph.
You said: "I dont think this is about notation" - Why did you ever think it was?
I was thinking along the lines of maybe it reads the digits as individual tokens, and reading from left to right it gets confused. That's why I thought it might be about notation. I have since learned that's definitely not the case (I read a bit about what tokens actually mean).
I also tried the show-your-work methods. I won't paste my conversations here as they are so long, but even when showing its work it doesn't give the right answer. Two funny things I noticed:
1. It either uses a completely wrong method to calculate it, or it shows the right "kind" of work and then still gets the wrong answer
2. If I tell it it was wrong, it will just agree with me. I can multiply two huge numbers, tell it "no, the answer is actually 42", and it will just apologize. Then when I ask why it thought it was right, it gives the most bullshit reasons lol. Once it even said "I read the number wrong, the second number should be x, I read y". The thing is, if I had actually given x in the input, it would have gotten the answer right.
That’s not ‘basic arithmetic’. Most humans would need a pen and paper to solve it.
GPT’s attention window is not equivalent to it being able to ‘write stuff down’ - it’s its equivalent of being able to remember the foregoing few minutes of conversation.
Interesting. I didn't use numbers as large as yours (mine were in the low thousands), and ChatGPT with GPT-4 got them correct three times in a row.
What's also interesting, though not necessarily surprising, is how relatively close the incorrect answers are in your examples. It's not like the model will randomly spit out an answer like 42 if you ask it to multiply something like 36,330 by 26,951.