> The simplest example is “list all of the presidents in reverse chronological order of their ages when inaugurated”.
This question is probably not the simplest form of the query you actually want answered.
If you want a descending list of presidents based on their age at inauguration, I know what you want.
If you want a reverse chronological list of presidents, I know what you want.
When you combine/concatenate the two as you have above, I have no idea what you want, nor do I have any way of checking my work if I guess. I know enough about word problems and how people ask questions to know that you probably have a fairly good idea of what you want and likely don’t realize how ambiguous this question is as asked. I think you and I are both approaching the question in reasonably good faith, so I think you’d understand, or at least accommodate, my request to clarify and refine the question so that it’s less ambiguous.
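To make the ambiguity concrete, here is a minimal Python sketch, using a small illustrative sample of presidents, showing that the two readings produce different orderings:

```python
# Small illustrative sample of (name, inauguration_year, age_at_inauguration).
presidents = [
    ("Joe Biden", 2021, 78),
    ("Donald Trump", 2017, 70),
    ("Barack Obama", 2009, 47),
    ("Ronald Reagan", 1981, 69),
    ("John F. Kennedy", 1961, 43),
]

# Reading 1: descending by age at inauguration.
by_age = sorted(presidents, key=lambda p: p[2], reverse=True)
# -> Biden (78), Trump (70), Reagan (69), Obama (47), Kennedy (43)

# Reading 2: reverse chronological by inauguration date.
by_date = sorted(presidents, key=lambda p: p[1], reverse=True)
# -> Biden (2021), Trump (2017), Obama (2009), Reagan (1981), Kennedy (1961)

print(by_age == by_date)  # False: Obama and Reagan swap places.
```

Two defensible interpretations, two different answers, and no way for me to verify which one you meant.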
Can you think of a better way to ask the question?
Now that you’ve refined the question, do LLMs give you the answers you expect more frequently than before?
Do you think LLMs would be able to ask you for clarification in these terms? That capability to ask for clarification is probably going to be as important as other improvements to LLMs for questions like these, which have many possibly correct answers or interpretations.
> I think it “understood” the question because it “knew” how to write the Python code to get the right answer.
That’s what makes me suspicious of LLMs: they might just be coincidentally or accidentally answering in a way that you agree with.
Don’t mean to nitpick or be pedantic. I just think the question was really poorly worded and might leave a lot of room for confirmation bias in the results.
> List of US Presidents with their ages at inauguration
That’s what the Python script had at the top. I guess I don’t know why you didn’t ask that in the first place.
Edit: you’re not the same person who originally posted the comment I responded to, and I think I came off a bit too harshly here in text, but I don’t mean any offense.
It was a good idea to ask to see the code. It made it much clearer, and more to the point, what question the LLM perceived you to be asking.
The second example about buckets was interesting. I guess LLMs help with coding if you know enough of the problem and what a reasonable answer looks like, but you don’t know what you don’t know. LLMs are useful because you can just ask why things may not work, in a given context, in general, or in a completely open-ended way. That kind of thing is often hard for non-experts to explain or articulate, which makes troubleshooting difficult: you might not even know how to search for solutions.
You might appreciate this link if you’re not familiar with it:
I was demonstrating how bad LLMs are at simple math.
If I just asked for a list of ages in order, there was probably some training data for it to recite. By asking it to reverse the order, I was forcing the LLM to do math.
I also knew the answer was simple with Python.
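For reference, here’s a minimal sketch of the kind of script I mean; the three entries are a hypothetical stand-in for the full list, and the actual script ChatGPT produced may differ:

```python
from datetime import date

# Hypothetical three-entry sample standing in for the full list of presidents:
# (name, date_of_birth, date_of_inauguration).
presidents = [
    ("George Washington", date(1732, 2, 22), date(1789, 4, 30)),
    ("John F. Kennedy", date(1917, 5, 29), date(1961, 1, 20)),
    ("Joe Biden", date(1942, 11, 20), date(2021, 1, 20)),
]

def age_at(born, inaugurated):
    # Whole years between birth and inauguration, subtracting one if the
    # birthday hadn't yet occurred in the inauguration year.
    before_birthday = (inaugurated.month, inaugurated.day) < (born.month, born.day)
    return inaugurated.year - born.year - before_birthday

# Descending by age at inauguration; the "math" here is just date
# arithmetic, which Python handles trivially.
for name, born, inaug in sorted(presidents, key=lambda p: age_at(p[1], p[2]), reverse=True):
    print(name, age_at(born, inaug))
# Biden 78, Washington 57, Kennedy 43
```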
On another note, with ChatGPT 4, you can ask it to verify its answers on the internet and to provide sources.
You’re also scarface_74? Not that there’s anything wrong with sockpuppets on HN in the absence of vote manipulation or ban evasion, as far as I know; I just don’t know why you’d use one in this manner, hence my confusion. Karma management?
I saw a blue icon of some kind on the link you shared but didn’t click it.
No worries, that was somewhat ambiguous and confusing to me as well. I thought you might be a different person who had edited their comment after receiving downvotes. I mean, it’s reasonable to assume in most cases that different usernames are different people. Sorry to make you repeat yourself!
Maybe email [email protected] to ask about your rate limits. I have encountered similar issues myself in the past and have found dang to be very helpful and informative in every way, even when the cause was valid and/or something I did wrong. #1 admin/mod on the internet, imo.
The simplest example is “list all of the presidents in reverse chronological order of their ages when inaugurated”.
Both ChatGPT 3.5 and 4 get the order wrong. The difference is that I can instruct ChatGPT 4 to “use Python”.
https://chat.openai.com/share/87e4d37c-ec5d-4cda-921c-b6a9c7...
You can do similar things to have it verify information using internet sources and give you citations.
Just like with the Python example, at least I can look at the script or web citation myself.