Hacker News | LifeIsBio's comments

About a year ago I did some work collecting interesting blogs from HN users and shared it here:

https://news.ycombinator.com/item?id=32291993


It is exciting! As others in the thread are saying, the costs of caring for individuals with severe rare diseases are also very high.

Here’s a recent attempt at quantifying the costs across all rare diseases:

https://chiesirarediseases.com/assets/pdf/chiesiglobalraredi...


That’s exactly what happened. :)



Here's a thread where I fed all of his questions to ChatGPT-4.

https://news.ycombinator.com/item?id=36014796

It seems like his graduate student did him a great disservice by feeding the questions to 3.5


This should be the top comment.

Not only for providing the correct SotA, but also for noting that the graduate student, probably at an expensive university, was too "cheap" to buy inexpensive tools for their research. Imagine physicists in the 1900s working without tools and unable to run experiments because "we would have to buy radium, so let's try the free iron I have instead", and then concluding that "radioactivity is not a thing".


Yes, totally, especially given this was written only a month ago!

  The student referred me to a recent arXiv paper 2303.12712 [cs.CL] about GPT-4, which is apparently behind a paywall at the moment but does even better than the system he could use (https://chat.openai.com/).
I wonder whether the graduate student considered paying the $20 and/or asking Knuth to pay.


The game “20 questions” is probably the hardest I’ve seen chatGPT fail.

What’s interesting about the game is that, at first pass, there’s no ambiguity. All questions need to be answered with “Yes” or “No”. But many questions asked during the game actually have answers of “it depends”.

For example, I was thinking of “peanut butter” and chatGPT asked me “Does it fit in your hand?” as well as “Is it used in the kitchen?”. Given my answers, chatGPT spent the back half of its questions on different kitchen utensils. It never once considered backing up and verifying that there wasn’t some misunderstanding.

I played three games with it, and it made the same mistake each time.

Of course, playing the game via text loses a lot of information relative to playing IRL with your friends. In person, the answerer would pause, hum, and otherwise demonstrate that the question asked was ambiguous given the restrictions of the game.

Regardless, it was clear that chatGPT wasn’t accounting for ambiguity.


> It never once considered backing up and verifying that there wasn’t some misunderstanding.

Of course not; ChatGPT doesn't "consider". It doesn't think, it doesn't know. It can't identify that there was a misunderstanding of its own volition.

All ChatGPT does is use a (very sophisticated!) statistical analysis to generate text that conforms to an expectation of what a human response to a similar prompt might look like. It has been trained well insofar as it is able to produce responses that seem like a human may have written them, but it doesn't reveal cognitive processes like "reconsidering" because it doesn't have any.


Wow never heard this comment before


Comments of that nature will continue so long as there are people who don't understand how language models work (or choose to misrepresent them).


20-some years ago, I had this "20 questions" handheld electronic game that was eerily good at winning. I imagine it was a bunch of well-programmed tables of data, but in any case, it's certainly possible for a machine to do well at this game.

I think the more we see ChatGPT do things like "oh, I know this game -- I'm going to run a 20-year-old 20 Questions subroutine that is not part of my neural network language model to generate responses", it will become even more impressive.


> I think the more we see ChatGPT do things like "oh, I know this game -- I'm going to run a 20-year-old 20 Questions subroutine that is not part of my neural network language model to generate responses", it will become even more impressive.

Agreed. Incidentally I’ve built a little toy version of a runtime for exactly this purpose - there’s a translation layer that’s given a bunch of available “APIs” (fed through the LLM context), and breaks down a high level goal into a structured series of API calls.

The runtime parses these API calls and natively executes some of them (e.g. run a program, write to the file system), while others result in LLM invocations.

I’m sure OpenAI and crew are way ahead of me here, of course. I’m excited to see what the future holds in this field.
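
For anyone curious what such a runtime might look like, here is a minimal sketch. It is not the commenter's actual code: the JSON plan format, the `NATIVE_APIS` registry, and the `add`, `shout`, and `llm` step names are all assumptions made up for illustration. The idea is only that a structured plan is parsed, known steps run natively, and the rest are forwarded to a language model.

```python
import json
from typing import Callable

# Hypothetical native "APIs"; a real runtime might run programs or write files.
def add(a: float, b: float) -> float:
    return a + b

def shout(text: str) -> str:
    return text.upper()

NATIVE_APIS: dict[str, Callable] = {"add": add, "shout": shout}

def run_plan(plan_json: str, llm: Callable[[str], str]) -> list:
    """Execute a JSON list of {"api": ..., "args": {...}} steps in order.

    The plan format is an assumption for this sketch, not a standard.
    """
    results = []
    for step in json.loads(plan_json):
        name, args = step["api"], step.get("args", {})
        if name in NATIVE_APIS:
            # Executed natively by the runtime.
            results.append(NATIVE_APIS[name](**args))
        elif name == "llm":
            # Anything marked "llm" is forwarded to the model callback.
            results.append(llm(args["prompt"]))
        else:
            # Refuse anything the model invented that is not registered.
            raise ValueError(f"unknown API: {name}")
    return results
```

Passing the model in as a plain callable (for example `llm=lambda p: f"echo: {p}"`) keeps the sketch testable without any network access, and rejecting unregistered names is a cheap guard against the model hallucinating an API.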


The first AI-style program I ever wrote (about 25 years ago. Yes, I'm old) played 20 questions, but it would "learn" from prior games, so the more you played, the better it performed.

It got extremely good after a few hundred games.
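
A common way to build a guesser that "learns" like this is the classic binary question tree, where every wrong guess is replaced by a new question that separates it from the right answer. The sketch below assumes that design; the original program may well have worked differently, and the names `Node`, `play`, and `learn` are made up here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    text: str                      # a question (inner node) or a guess (leaf)
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None

def play(root: Node, answer: Callable[[str], bool]) -> Node:
    """Walk the tree using yes/no answers and return the leaf (guess) reached."""
    node = root
    while not node.is_leaf:
        node = node.yes if answer(node.text) else node.no
    return node

def learn(leaf: Node, correct: str, question: str, yes_for_correct: bool) -> None:
    """Turn a wrong guess into a question that separates it from the right answer."""
    old, new = Node(leaf.text), Node(correct)
    leaf.text = question
    leaf.yes, leaf.no = (new, old) if yes_for_correct else (old, new)
```

Starting from a single guess such as `Node("spoon")`, a lost game followed by `learn(guess, "peanut butter", "Is it edible?", True)` leaves the tree one question deeper, which is why this kind of program keeps improving the more it is played.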


Yeah, ChatGPT could integrate Akinator[0] and trivially be great at the game. Without the help, though, it's a good, revealing benchmark for an LLM's ability.

[0] https://en.akinator.com


For the foreseeable future, LLMs function most reliably as a user interface layer for other systems. I use GPT to “translate” natural language down into the API calls that get real data and it works great. I’d never trust it beyond that.


You trained it with "this phrase means this command" examples? How do you make it use your custom API? (Or are you not using a custom API?)


Basically yeah, just a pretty detailed set of prompts and then “turn the next message into an api call” and it works almost perfectly.

When I first heard the term “prompt engineer” I rolled my eyes, but now that I’ve gotten into it I see it’s really an art form.


"Green Glass Door" also completely stumped it. It just could not deduce that the trick was semantic at the word representation level, rather than something related to the object that the word describes.

What's funny about 20 questions is that Akinator has been absolutely slaying it for like 20 years now.


What happens if you answer with something approximating the hemming and hawing rather than a straight yes or no? You can encode that into text, it's just less common outside of very informal chat conversations.


I just played 20 questions with it, and was surprised by how badly GPT-4 did. Then for fun, I turned it around and made myself the guesser. It's weird and surreal to play 20 questions when you know that the clue-giver doesn't have an answer in their mind (or more literally, there isn't a single answer in any stateful form while you play), but is instead just eventually saying "yes that's what I was thinking of" when it's statistically appropriate.


With the code execution plugin, one could theoretically ask chatgpt to generate a salted hash of their answer at the start that's revealed at the end to prove it was correct.

Without any plugins, chatgpt happily returned SHA hashes and salts when I asked it to play rock paper scissors this way. The only trouble was, the hashes were totally wrong.
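
For reference, the commit-and-reveal scheme being described is only a few lines when real code runs it. This is a minimal sketch using Python's standard `hashlib`, `hmac`, and `secrets` modules; the `commit` and `verify` names and the `salt:answer` encoding are choices made here for illustration, not anything ChatGPT or a plugin actually uses.

```python
import hashlib
import hmac
import secrets

def commit(answer: str) -> tuple[str, str]:
    """Return (salt, digest). Publish the digest now; keep salt and answer secret."""
    salt = secrets.token_hex(16)  # random salt so short answers can't be brute-forced
    digest = hashlib.sha256(f"{salt}:{answer}".encode("utf-8")).hexdigest()
    return salt, digest

def verify(answer: str, salt: str, digest: str) -> bool:
    """Recompute the hash from the revealed answer and salt and compare."""
    expected = hashlib.sha256(f"{salt}:{answer}".encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, digest)
```

The clue-giver calls `commit("rock")` at the start and shows only the digest; at the end they reveal the answer and salt, and anyone can call `verify` to check that the answer was fixed before play began. The salt matters because the space of plausible answers is small enough to hash exhaustively otherwise.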


i love your example, i wonder if this kind of game can be implemented in future training scenarios

we as humans understand ambiguity so much easier because we learn to speak and interact before we write, and writing ambiguity is way less obvious if you've never experienced it


I'm not sure I would think "food" when someone says they "use [it] in the kitchen". You "use" food? (Used in cooking != used in kitchen, imo)


I use food (including peanut butter) in cooking. I cook in the kitchen. Therefore peanut butter is a thing I use in the kitchen. Seems correct and proper to me.

The ambiguity as I see it is that the kitchen isn't the only place I use peanut butter. I've eaten it (which I think counts as "using") in other rooms. I've even made peanut-butter sandwiches (properly "using" it) in the living room before.


That's his whole point. It's possible to consider it technically correct, but it's a red herring.


Well, the alleged point is challenged. If, while playing this game, the questioner must constantly verify that the other party is using the language properly, they'll exhaust that 20-question limit rather quickly.

- is it used in the kitchen?

- yes.

- [well, kitchen appliances, here we go ..] is it ..?

...

- [aha. meat intelligence no speak proper English?] Is this thing you use in kitchen edible?

- Oh, yeah.

- [oh dear. we can not let meat machines govern this planet...]


I use peanut butter as an ingredient for sandwiches, usually in my kitchen.


Yes. You use edible things in preparing or cooking food (which may happen in the kitchen). 'Use' maps to food prep (the act) but never to prep location. Only in cases where the thing has both general edible and food preparation usage -- "I use honey extensively in the kitchen" for example -- do "use" and "edible" make sense together.


But peanut butter has general edible and food preparation usage quite similar to honey, doesn't it? You can spread it on a slice of bread to eat directly or use it as a baking ingredient, but you probably wouldn't eat it by the spoonful straight from the container. (Or maybe that's how people usually eat peanut butter, I kind of don't want to know.)


guilty as charged: spoon + jar = happy mouth.


Yes, I do.


"He saw that gas can explode."

This ambiguous sentence has stuck in my head since some 30 years ago, when AI was popular.

There was a research paper discussing the issue of ambiguity.


Right -- although many things that are ambiguous in text are disambiguated in actual speech, so the problems that arise with audio speech are not wholly the same as with text.

A classic example is the word "record", which has first syllable stress as a noun, but second syllable stress as a verb. "I bought a RECord" vs "Please reCORD the music".

(in the dominant American dialect; I don't recall about other dialects/countries)


An interesting reprint from 2003:

https://www.drdobbs.com/parallel/understanding-natural-langu...

"Computers still cannot understand natural language as well as young children can. Why is it so hard?"

Source: AI Expert, May 1987


I haven't seen anyone mention Anvil[1] yet, but it lets you "Build web apps with nothing but Python." and is a lovely tool that I've successfully used for a handful of side projects.

But as someone who feels most at home with Python, I always love to see new competition in this space.

[1] https://anvil.works/


EDIT: My following statement about self-hosting is incorrect. You can, in fact, self-host.

This looks wonderful, but the inability to self host is a killer from the solo developer point of view. Being limited to 50,000 database rows on the free account isn't ideal.


Anvil has self-hosting! Just "pip install anvil-app-server" :)

https://anvil.works/open-source

(I'm a founder)


Well, I stand corrected! I was looking through the mobile site and didn't spot it. This makes me want to look a bit harder.

When self hosting, do you gain or miss out on paid features?


I totally thought Anvil had self-hosting. I was seriously considering it for my next project. Now not so much.


One of the founders corrected my statement. Apparently you can self host.


Nice! I haven't seen this one before. Will definitely take a look. Thanks for posting.


I’ve run into this exact situation multiple times. Searching the whole history would be revolutionary.


FWIW you can do this with Sourcegraph today, searching over both diffs (code changes)[0] and commit messages[1]:

[0] https://sourcegraph.com/search?q=context:global+repo:%5Egith...

[1] https://sourcegraph.com/search?q=context:global+repo:%5Egith...

(I work there, just commenting on my own though. We're all pretty happy to have competition, more awareness of code search, etc.)


I've found that people have wildly different definitions of "systems engineering", but this one is lovely. And also, very close to my own. ;)

https://jessimekirk.com/blog/whats_a_systems_engineer/


I've been using Resh[0] for the past 6 months or so. A rich and queryable shell history is a massive boost in day-to-day productivity. The syncing described here is a pretty cool feature.

[0]: https://github.com/curusarn/resh


In addition, hiSHtory also supports fish for anyone who uses fish!

