
> When I'm writing the code myself, it's basically a ton of "plumbing" of loops and ifs and keeping track of counters and making sure I'm not making off-by-one errors and not making punctuation mistakes and all the rest. It actually takes quite a lot of brain energy and time to get that all perfect.

All of that "plumbing" affects behavior. My argument is that the brain energy spent checking that behavior is necessary in order to check that behavior. Do you have a test for an off-by-one error? Do you have a test to make sure your counter behaves correctly when there are multiple components on the same page? Do you have a test to make sure errors don't crash the component? Do you have a test to ensure that non-UTF-8 text or binary data in a text input throws a validation error? Etc. etc. If you're checking all the details for correct behavior, the effort involved converges to roughly the same thing.
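
To make those checks concrete, here's a minimal sketch in TypeScript using Node's built-in test runner. `paginate` and `validateTextInput` are hypothetical stand-ins for the kind of plumbing in question, not code from any real project:

```typescript
// plumbing.test.ts -- a sketch of per-detail behavioral checks.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical plumbing: split items into fixed-size pages.
function paginate<T>(items: T[], pageSize: number): T[][] {
  const pages: T[][] = [];
  for (let i = 0; i < items.length; i += pageSize) {
    pages.push(items.slice(i, i + pageSize));
  }
  return pages;
}

// Hypothetical plumbing: reject bytes that aren't valid UTF-8.
function validateTextInput(bytes: Uint8Array): string {
  // TextDecoder with { fatal: true } throws on malformed UTF-8.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

test("no off-by-one at an exact page boundary", () => {
  const pages = paginate(["a", "b", "c", "d"], 2);
  assert.equal(pages.length, 2); // not 3 with a trailing empty page
  assert.deepEqual(pages[1], ["c", "d"]);
});

test("non-UTF-8 input throws a validation error", () => {
  // 0xFF can never appear in well-formed UTF-8.
  assert.throws(() => validateTextInput(new Uint8Array([0xff, 0xfe])));
});
```

Each test pins down exactly one piece of behavior; every piece of plumbing without a test like this stays unverified.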

If you're not checking all of that plumbing, you don't know whether or not the behavior is correct. And the level of abstraction used when working with agents and LLMs is not the same as when working with a higher-level language, because LLMs make no guarantees about the correspondence between input and output. Compilers and programming languages are meticulously designed to ensure that the output is exactly what is specified. There are bugs and edge cases in compilers, and quirks across different hardware, so it's not always 100% perfect, but it's 99.9999% perfect.

When you use an LLM, you have no guarantees about what it's doing, and in a way that's categorically different from not knowing what a compiler does. Given the complexity of the stack, very few people know all of the steps that break `console.log("hello world")` down into the electrical signals sent to the pixels on a screen on a modern OS with modern hardware, but they do know, with as close to 100% certainty as is humanly possible, that a correctly configured environment will output the text "hello world" to a console. They do not need to know the implementation because the contract is deterministic and well defined. Prompts are neither deterministic nor well defined, so if you want to verify an LLM is doing what you want it to do, you have to check what it's doing in detail.

Your basic argument here is that you can save a lot of time by trusting the LLM to wire the code faithfully, and that you can write tests to sanity-check behavior and verify that. That's a valid argument, if you're ok tolerating a certain level of uncertainty about behavior that you haven't meticulously checked or tested. The more meticulously you want to check behavior, the more effort it takes, and the more it converges to the effort involved in just writing the code normally.



> If you're checking all the details for correct behavior, the effort involved converges to roughly the same thing.

Except it doesn't. It's much less work to verify the tests.

> That's a valid argument, if you're ok tolerating a certain level of uncertainty about behavior that you haven't meticulously checked or tested.

I'm a realist, and know that I, like all other programmers, am fallible. Nobody writes perfect code. So yes, I'm ok tolerating a certain level of uncertainty about everybody's code, because there's no other choice.

I can get the same level of uncertainty in far less time with an LLM. That's what makes it great.


> Except it doesn't. It's much less work to verify the tests.

This is only true when there is less information in those tests. You can argue that the extra information visible in the implementation doesn't matter as long as the code does what the tests say, but the amount of uncertainty depends on the amount of information omitted from the tests. There's a threshold past which the effort of eliminating uncertainty equals the effort of just writing the code yourself, as the sketch below illustrates.

Whether or not that matters depends on the problem you're working on and your tolerance for error and uncertainty, and there's no hard and fast rule for that. But if you want to approach 100% correctness, you need to specify your intentions with close to 100% precision. The fact that humans make mistakes and miscommunicate their intentions does not change the basic fact that a human has to communicate an intention for a machine to fulfill it. The more precise the communication, the more work is involved, regardless of whether you're verifying that precision after something else generates it or generating it yourself.
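
Here's a minimal sketch of what "information omitted in the tests" means in practice. Both clamp functions are hypothetical illustrations; they pass the identical test yet diverge on every input the test is silent about:

```typescript
// divergence.ts -- two implementations, one test, untested disagreement.
import assert from "node:assert/strict";

function clampA(n: number): number {
  return Math.min(Math.max(n, 0), 100); // clamps to [0, 100]
}

function clampB(n: number): number {
  return n < 0 ? 0 : n; // only clamps the lower bound
}

// The only test anyone wrote:
assert.equal(clampA(-5), 0);
assert.equal(clampB(-5), 0);

// Both pass, yet clampA(150) === 100 while clampB(150) === 150.
// The uncertainty lives entirely in the inputs the test omitted.
console.log("both implementations pass the same test");
```

Verifying the tests alone can't distinguish the two; only reading the implementation, or writing more tests (which is more work), closes that gap.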

> I can get the same level of uncertainty in far less time with an LLM. That's what makes it great.

I have a low tolerance for uncertainty in software, so I usually can't reach a level I find acceptable with an LLM. Fallible people who understand the intentions and current function of a codebase have a capacity that a statistical amalgamation of tokens trained on fallible people's output simply does not have. People may not use that capacity to verify alignment between intention and execution well, but they have it.

Again, I'm not denying that there are plenty of problems where the level of uncertainty involved in AI-generated code is acceptable. I just think it's fundamentally true that extra precision requires extra work; there's simply no way around that.


> I have a low tolerance for uncertainty in software

I think that's what's leading you to the unusual position that "This is only true when there is less information in those tests."

I don't believe in perfection. It's rarely achieved despite one's best efforts -- it's a mirage. What we can realistically look for is a statistical level of reliability that tests help achieve.

At the end of the day, it's about delivering value. If, on average, you can deliver 5x value with an LLM because of the speed, or 1.05x value because you verified every line of code three times and caught a rare bug that neither you nor the LLM thought to test for (compared to the 1x value of a non-perfectionist developer), then I know which one I'm choosing.



