
I had a similar take until about a week ago. A friend showed me his workflow with Copilot and whatever the JetBrains AI assistant is called.

Use it as a tool: instead of opening up a new tab, searching for the API docs of the library you're trying to find a function in, finding the function, and re-reading the parameter arguments for the 400th time, what if you could just highlight a snippet, say "Paginate the results from S3 using boto3", and have the code populate?
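That prompt is a real boto3 use case. A sketch of what a good completion looks like, with the page-flattening pulled into a plain function so it can be checked without AWS credentials (bucket name and prefix are placeholders):

```python
def collect_keys(pages):
    """Flatten the 'Contents' entries of list_objects_v2 pages into object keys."""
    keys = []
    for page in pages:
        for obj in page.get("Contents", []):  # empty pages have no 'Contents' key
            keys.append(obj["Key"])
    return keys

# With boto3 installed and credentials configured:
# import boto3
# s3 = boto3.client("s3")
# paginator = s3.get_paginator("list_objects_v2")
# keys = collect_keys(paginator.paginate(Bucket="my-bucket", Prefix="logs/"))
```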

You still have to have the clarity of thought to know what you're doing, but the time it takes to write every line of basic stuff you've done 1000x before can be greatly compressed when it's inline in your IDE.

I think this is the move for most LLM tools: integrate it with existing tooling. An LLM for Excel for corporate bookkeepers, CPAs, etc will be great. A Word/PDF summarizer that's tuned for attorneys will also be fantastic. Highlight a paragraph, ask for relevant case law, etc.

I thought ~2 years ago the results were... not great. Now I'm pretty happy with it.

SecureFrame (which helps with compliance regimes like SOC 2) recently added the ability to generate Terraform templates that provision infrastructure to fix specific platform risks on AWS, Azure, GCP, etc.

It definitely needs someone at the helm since it does hallucinate, but I have found it to cut down my time on mundane tasks or otherwise niche/annoying problems. When was the last time you visited 4+ StackOverflow posts to find your answer? Copilot, so far, has always hit a pretty close answer very quickly.



I also had to build intuition for when it will be appropriate versus not. It's hard to describe, but one very positive signal is: "will any hallucination be caught in under 30 seconds?" Even in ChatGPT Plus you can have it write its own unit tests and run them in the original prompt (you can even put that in your profile's Custom Instructions so you don't have to type it every time).
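The pattern I ask for is roughly this: the function plus assertions it runs against itself in the same reply (a made-up illustration; `slugify` is just an example task):

```python
import re

def slugify(text):
    """Lowercase and replace runs of non-alphanumeric characters with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# The self-checks the model writes and runs alongside the code,
# so a hallucinated behavior surfaces immediately:
assert slugify("Hello, World!") == "hello-world"
assert slugify("  extra   spaces  ") == "extra-spaces"
assert slugify("already-clean") == "already-clean"
```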

So one mistake was using it for something where runtime performance on dozens of quirky data files was critical; that nearly set my CPU on fire. But str -> str data cleanup, a chain of simple API calls, or a one-off data visualization? Chef's kiss.
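By str -> str cleanup I mean throwaway normalizers like this (a made-up example, but representative of the scale of task that works well):

```python
def clean_name(raw):
    """Trim, collapse internal whitespace, and title-case a messy name field."""
    return " ".join(raw.split()).title()

assert clean_name("  ACME   corp ") == "Acme Corp"
assert clean_name("acme\tcorp") == "Acme Corp"
```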


> to write every line for basic stuff you've done 1000x before

There are ways to avoid writing basic stuff you've done 1000x before that are better than LLMs though...

Put it in a well-thought-out function or package or other form of shared/reusable code. You can validate it, spend the time to make sure it covers your edge cases, optimize it, test it, etc. so that when you go to reuse it you can have confidence it will reliably do what you need it to do. LLM-generated code doesn't have that.
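For example, a small helper whose edge cases you've validated once, up front, and can then reuse with confidence (a sketch):

```python
def chunked(seq, size):
    """Split a sequence into lists of length `size`; the last chunk may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(seq[i:i + size]) for i in range(0, len(seq), size)]

# Edge cases covered once, at write time, not re-derived on every reuse:
assert chunked([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
assert chunked([], 3) == []
```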

(When you think about how LLMs are trained and work, you realize they are actually just another form of code reuse, but one where there are various transformations to the original code that may or may not be correct.)

Where LLMs shine for coding is code completion. You get the LLM output in little chunks that you can review correctly and completely, in the moment: "yeah, that's what I want", "no, that's no good", or "OK, I can work with that". Not surprising, since predicting completions is what LLMs actually do.


I don't know exactly how you use it, but this isn't my experience at all. If you ask an LLM anything too specific, anything that isn't an obvious, commonly discussed issue (and the obvious ones are something I almost never need to ask about), it just makes up nonsense to fill the space.

Equally, if you ask it general questions, it misses information and is almost always incomplete, leaving out the slightly more obscure elements. Again, I need comprehensive answers; I can come up with incomplete ones myself.

What's really obvious to me when I use it is that it's an LLM trained on pre-existing text; that really comes through in the character of its answers and its errors.

I'm very glad others find them useful and productive, but for me they're disappointing given how I want to use them.


That's fair, it might not be for you. In 'old school' ML, for a binary classifier, there's the concept of Precision (the % of Predicted Positives that are ACTUALLY Positive) and Recall (the % of ACTUAL Positives that are Predicted Positive).
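For concreteness, here's what those two numbers compute to on a toy label set (made-up data):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive, 0 = negative)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 2 of 3 positive predictions are correct; 2 of 4 actual positives are found.
p, r = precision_recall([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0])
```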

It sounds like you want perfect Precision (no errors on specific Qs) and perfect Recall (comprehensive on general Qs). You're right that no model of any type has ever achieved that on any large real-world data, so if that's truly the threshold for useful in your use cases, they won't make sense.


I just want something useful. I'm not talking about perfection; I'm talking about answers that are not fit for purpose. 80% of the time the answers are just not useful.

How are you supposed to use LLMs if the answers they give are not salvageable with less work than answering the question yourself using search?

Again, for some people it might be fine, but for technical work, LLMs don't seem to cut it.


Sorry if this is sophomoric, but when you said "you have to have clarity of thought", what jumped to mind was the phrase "you have to speak to the code"... I thought it encapsulated your point about clarity of thought quite saliently for me.


You must be one with the code. You must be the code.



