> The simplest example is “list all of the presidents in reverse chronological order of their ages when inaugurated”.
This question is probably not the simplest form of the query you actually want answered.
If you want a descending list of presidents based on their age at inauguration, I know what you want.
If you want a reverse chronological list of presidents, I know what you want.
When you combine/concatenate the two as you have above, I have no idea what you want, nor do I have any way of checking my work if I guess. I know enough about word problems and how people ask questions to know that you probably have a fairly good idea of what you want and likely don’t realize how ambiguous this question is as asked. I think you and I are both approaching the question in reasonably good faith, so I think you’d understand, or at least accommodate, my request to clarify and refine the question so that it’s less ambiguous.
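To make the ambiguity concrete, here is a minimal Python sketch, using a small illustrative sample of presidents, showing that the two readings produce different orderings:

```python
# Small illustrative sample of (name, inauguration_year, age_at_inauguration).
presidents = [
    ("Joe Biden", 2021, 78),
    ("Donald Trump", 2017, 70),
    ("Barack Obama", 2009, 47),
    ("Ronald Reagan", 1981, 69),
    ("John F. Kennedy", 1961, 43),
]

# Reading 1: descending by age at inauguration.
by_age = sorted(presidents, key=lambda p: p[2], reverse=True)
# -> Biden (78), Trump (70), Reagan (69), Obama (47), Kennedy (43)

# Reading 2: reverse chronological by inauguration date.
by_date = sorted(presidents, key=lambda p: p[1], reverse=True)
# -> Biden (2021), Trump (2017), Obama (2009), Reagan (1981), Kennedy (1961)

print(by_age == by_date)  # False: Obama and Reagan swap places.
```

Two defensible interpretations, two different answers, and no way for me to verify which one you meant.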
Can you think of a better way to ask the question?
Now that you’ve refined the question, do LLMs give you the answers you expect more frequently than before?
Do you think LLMs would be able to ask you for clarification in these terms? That capability to ask for clarification is probably going to be as important as other improvements to LLMs for questions like these, which have many possibly correct answers or interpretations.
> I think it “understood” the question because it “knew” how to write the Python code to get the right answer.
That’s what makes me suspicious of LLMs: they might just be coincidentally or accidentally answering in a way that you agree with.
Don’t mean to nitpick or be pedantic. I just think the question was really poorly worded and might leave a lot of room for confirmation bias in the results.
> List of US Presidents with their ages at inauguration
That’s what the Python script had at the top. I guess I don’t know why you didn’t ask that in the first place.
Edit: you’re not the same person who originally posted the comment I responded to, and I think I came off a bit too harshly here in text, but I don’t mean any offense.
It was a good idea to ask to see the code. It made it much clearer, and more to the point, what question the LLM perceived you to be asking.
The second example about buckets was interesting. I guess LLMs help with coding if you know enough of the problem and what a reasonable answer looks like, but you don’t know what you don’t know. LLMs are useful because you can just ask why things may not work, in a given context, in general, or in a completely open-ended way. That kind of thing is often hard for non-experts to explain or articulate, which makes troubleshooting difficult: you might not even know how to search for solutions.
You might appreciate this link if you’re not familiar with it:
I was demonstrating how bad LLMs are at simple math.
If I just asked for a list of ages in order, there was probably some training data for it to recite. By asking it to reverse the order, I was forcing the LLM to do math.
I also knew the answer was simple with Python.
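For reference, here’s a minimal sketch of the kind of script I mean; the three entries are a hypothetical stand-in for the full list, and the actual script ChatGPT produced may differ:

```python
from datetime import date

# Hypothetical three-entry sample standing in for the full list of presidents:
# (name, date_of_birth, date_of_inauguration).
presidents = [
    ("George Washington", date(1732, 2, 22), date(1789, 4, 30)),
    ("John F. Kennedy", date(1917, 5, 29), date(1961, 1, 20)),
    ("Joe Biden", date(1942, 11, 20), date(2021, 1, 20)),
]

def age_at(born, inaugurated):
    # Whole years between birth and inauguration, subtracting one if the
    # birthday hadn't yet occurred in the inauguration year.
    before_birthday = (inaugurated.month, inaugurated.day) < (born.month, born.day)
    return inaugurated.year - born.year - before_birthday

# Descending by age at inauguration; the "math" here is just date
# arithmetic, which Python handles trivially.
for name, born, inaug in sorted(presidents, key=lambda p: age_at(p[1], p[2]), reverse=True):
    print(name, age_at(born, inaug))
# Biden 78, Washington 57, Kennedy 43
```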
On another note, with ChatGPT 4, you can ask it to verify its answers on the internet and to provide sources.
You’re also scarface_74? Not that there’s anything wrong with sockpuppets on HN in the absence of vote manipulation or ban evasion, as far as I know; I just don’t know why you’d use one in this manner, hence my confusion. Karma management?
I saw a blue icon of some kind on the link you shared but didn’t click it.
No worries, that was somewhat ambiguous and confusing to me as well. I thought you might be a different person who had edited their comment after receiving downvotes. I mean, it’s reasonable to assume in most cases that different usernames are different people. Sorry to make you repeat yourself!
Maybe email [email protected] to ask about your rate limits. I have encountered similar issues myself in the past and have found dang to be very helpful and informative in every way, even when the cause was valid and/or something I did wrong. #1 admin/mod on the internet, imo.
The simplest example is “list all of the presidents in reverse chronological order of their ages when inaugurated”.
Both ChatGPT 3.5 and 4 get the order wrong. The difference is that I can instruct ChatGPT 4 to “use Python”.
https://chat.openai.com/share/87e4d37c-ec5d-4cda-921c-b6a9c7...
You can do similar things to have it verify information using internet sources and give you citations.
Just like with the Python example, at least I can look at the script or web citation myself.