
I find that Claude 3.5 does a very admirable job. I only rarely see incorrect code; more often I see recommendations for older libraries or obsolete versions of real libraries than hallucinated libraries.

I've been playing with Claude 3.7 thinking and, perhaps unsurprisingly, I find it overthinks the problem or tries to do far more than I really want it to for any prompt. I expect that I'm just using the wrong tool, and probably should just use plain Claude 3.7 without thinking.

Of course in all of this, I'm using the LLM in a "junior" capacity and I'm not giving it giant multi-faceted problems to solve: I'm giving it relatively narrow problems to solve at any one time and am guiding it through that process.



With any LLM it still helps to use simple tricks and give it a few extra prompts like 'No babbling' and 'Keep your code example simple and to the point without adding anything extra'. There's also all sorts of research on prompt optimization, i.e. getting better results just from framing your interaction in more exact and concise wording:

https://github.com/jxzhangjhu/Awesome-LLM-Prompt-Optimizatio...
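As a rough illustration of the "few extra prompts" trick, here is a minimal Python sketch that appends fixed constraints to whatever task you send. The helper name `with_constraints` and the constraint list are made up for this example and are not part of any LLM vendor's API; the resulting string would be passed to whichever client you actually use.

```python
# Hypothetical helper: the name and the constraint list are illustrative only.
CONSTRAINTS = [
    "No babbling.",
    "Keep your code example simple and to the point without adding anything extra.",
]


def with_constraints(task: str, constraints: list[str] = CONSTRAINTS) -> str:
    """Return the task text followed by a blank line and one constraint per line."""
    lines = [task.strip(), ""] + [f"- {c}" for c in constraints]
    return "\n".join(lines)


prompt = with_constraints("Write a function that reverses a string.")
print(prompt)
```

Keeping the constraints in one place means every request gets the same framing, which makes it easier to compare results when you change the wording.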

Really, a lot of what you'd find in any university/college English course on academic writing style, about getting your point across clearly, applies as well:

https://alum.mit.edu/succinct-writing-guide

https://owl.purdue.edu/owl/general_writing/academic_writing/...

https://writingcenter.unc.edu/tips-and-tools/conciseness-han...

Personally though I've been avoiding its use in code and keeping it to where it shines. I've said this over and over. It's so good for writing drivel like copy and product descriptions and Instagram posts and SEO-able text content. That's stuff people have been used to sounding kinda fake for decades, if not over a century by now, and that I have no heart to write myself, but I can now literally just tell a robot to "increase engagement" and it shows in $$$.

Where I've found limits in what it can generate with code is with complex concurrency stuff, where you really need to have the knowledge yourself to be able to prove it isn't going to crash. That's the kind of stuff you might pick up Elixir or Go for specifically, and I have always found it hits serious problems generating it. You need serious engineers who know tools like TLA+, Coq, SPIN etc. to get that right if you're making systems that people's lives or finances might depend on. I worry there's a lot of generated code being put into production that doesn't take these things into consideration, and people are just like 'wow it compiles, ship it'.
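To make the 'it compiles, ship it' worry concrete, here is a small sketch of a lost-update race. It is in Python rather than Elixir or Go purely to keep it short, and the class and function names are hypothetical. Both counters run without any error; only the locked one is guaranteed to end at the right total, and the unlocked one may still happen to look correct on a given run, which is exactly why a passing run proves nothing.

```python
import threading


class UnsafeCounter:
    """Read-modify-write with no synchronization: updates can be lost."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        current = self.value       # read
        self.value = current + 1   # write; another thread may have written in between


class LockedCounter:
    """Same counter, but the read-modify-write is guarded by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:
            self.value += 1


def hammer(counter, threads: int = 8, per_thread: int = 10_000) -> int:
    """Call counter.increment() from several threads and return the final value."""

    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


unsafe_total = hammer(UnsafeCounter())
locked_total = hammer(LockedCounter())
print("unsafe:", unsafe_total, "locked:", locked_total)
```

A lock is only the simplest fix for this toy case. For anything with more moving parts, a run that happens to pass is not evidence of correctness, which is where model checkers like the ones mentioned above come in.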



