You probably used GPT-3.5. That said, I didn't manage to get GPT-4 to calculate 36,330 * 26,951 correctly. I suggested casting out 9s, casting out 11s, doing long multiplication, reversing the digits - nothing.
I have a theory that it does arithmetic badly because the logic goes right-to-left, when LLMs write left-to-right. If the digits were to be reversed, it might not make as many mistakes. I ran out of attempts before I could test this properly.
> Human: what is 36,330 26,951 * (input is expressed in reverse post fix notation)
> chatGPT: To evaluate this expression using reverse Polish notation, we need to use a stack to keep track of the intermediate results. Here's how we can evaluate the expression:
1. Read the first number "36,330" and push it onto the stack.
2. Read the second number "26,951" and push it onto the stack.
3. Read the operator "*", pop the top two numbers (26,951 and 36,330) from the stack, multiply them, and push the result (971,914,830) onto the stack.
4. The stack now contains only one number, which is the final result.
Therefore, 36,330 26,951 * = 971,914,830 in reverse Polish notation.
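For comparison, evaluating the same postfix expression with an actual stack gives the correct product (a minimal Python sketch; the function and token format are my own illustration):

```python
def eval_rpn(tokens):
    """Evaluate a postfix (reverse Polish) expression with a stack."""
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()  # operands come off the stack in reverse order
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok.replace(",", "")))
    return stack[0]

print(eval_rpn(["36,330", "26,951", "*"]))  # 979129830 -- not 971,914,830
```

GPT described this procedure correctly step by step and still produced a wrong product, which is the point of the thread: it can recite the algorithm without executing it reliably.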
You should ask it to show its working. The problem is that it uses a fixed amount of thinking for each character - and it writes left to right - while long multiplication is computed right to left. Past a certain number of digits, the amount of "thinking" needed to compute long multiplication in the wrong order (left to right) increases without bound -- accuracy suffers as a result.
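The right-to-left dependency is easy to see in schoolbook long multiplication: every output digit depends on a carry arriving from the digit to its right, so the digits are naturally produced least-significant first. A minimal sketch (my own illustration of the dependency, not anything GPT does internally):

```python
def long_multiply(x, y):
    """Schoolbook long multiplication on digit lists.

    Digits are stored least-significant first because carries propagate
    from right to left -- the reverse of normal writing order, and of the
    left-to-right order in which an LLM emits tokens.
    """
    xs = [int(d) for d in str(x)][::-1]  # reversed digits of x
    ys = [int(d) for d in str(y)][::-1]  # reversed digits of y
    out = [0] * (len(xs) + len(ys))
    for i, a in enumerate(xs):
        carry = 0
        for j, b in enumerate(ys):
            total = out[i + j] + a * b + carry
            out[i + j] = total % 10
            carry = total // 10
        out[i + len(ys)] += carry
    # out now holds the product's digits least-significant first
    return int("".join(map(str, out[::-1])))

print(long_multiply(36330, 26951))  # 979129830
```

Note that the very first digit a left-to-right writer must commit to (the most significant one) is the last digit this algorithm can determine, since it depends on every carry below it.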
A tangential point: note that arithmetic on decimal numbers _has_ to start with the rightmost digit to be done accurately. Consider x = 2/3 + 1/3 = 0.6(6) + 0.3(3), with the digits in parentheses repeating infinitely, so there is no rightmost digit to start from. Whichever digit you choose for x before the decimal point (either 0 or 1) could turn out wrong if any digit of 1/3 were changed. This is called the Table Maker's Dilemma.
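The 2/3 + 1/3 example can be checked numerically: no matter how many digits of each addend you look at, the truncated sum stays 0.99…9, so the digit before the decimal point never settles. A quick sketch using exact fractions (helper name is my own):

```python
from fractions import Fraction

def truncate(frac, n):
    """Keep only the first n decimal digits of frac (round toward zero)."""
    scale = 10 ** n
    return Fraction(int(frac * scale), scale)

two_thirds, one_third = Fraction(2, 3), Fraction(1, 3)
for n in (1, 5, 50):
    approx = truncate(two_thirds, n) + truncate(one_third, n)
    # the truncated sum is 0.99...9: its integer part is 0, not 1
    print(n, int(approx), approx < 1)

assert two_thirds + one_third == 1  # yet the exact sum really is 1
```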
The second paragraph is a tangent. My point was made in the first paragraph.
You said: "I dont think this is about notation" - Why did you ever think it was?
I was thinking along the lines of maybe it reads the digits as individual tokens, and reading from left to right it gets confused. That's why I thought it might be about notation. I have since learned that's definitely not the case (I read a bit about what tokens actually mean).
I also tried the show-your-work methods. I won't paste my conversations here as they are so long, but even when showing its work it doesn't give the right answer. Two funny things I noticed:
1. It either uses a completely wrong method to calculate it, or it shows the right "kind" of work and then still gets the wrong answer
2. If I tell it it was wrong, it will just agree with me. I can multiply two huge numbers, tell it "no, the answer is actually 42", and it will just apologize. Then when I ask why it thought it was right, it gives the most bullshit reasons lol. Once it even said "I read the number wrong, the second number should be x, I read y". The thing is, if I had actually given x in the input, it would have gotten the answer right.
That’s not ‘basic arithmetic’. Most humans would need a pen and paper to solve it.
GPT’s attention window is not equivalent to it being able to ‘write stuff down’ - it’s its equivalent of being able to remember the foregoing few minutes of conversation.
Interesting. I didn't use numbers as large as yours (mine were in the low thousands), and ChatGPT with GPT-4 got them correct three times in a row.
What's also interesting, though not necessarily surprising, is how relatively close the incorrect answers are in your examples. It's not like the model will randomly spit out an answer like 42 if you ask it to multiply something like 36,330 by 26,951.