Hacker News | nojito's comments

Not for org/enterprise licenses.

There are virtually no consequences or accountability when big-tech companies share private data. For crying out loud, they were caught red-handed sharing private data from their EU endeavors.

If even sovereign states with clear laws forbidding such behavior can't keep those companies in check, no enterprise/b2b can.


All you need is a simple skills.md and maybe a couple of examples, and Codex picks up my custom toolkit and uses it.

What's your custom toolkit?

I have dozens of CLIs that are custom-built for Codex to use.
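For context on what such a file can look like: a skills.md entry can be as small as a one-line description plus usage examples. A hypothetical sketch (the tool and its subcommands below are invented for illustration, not from my actual toolkit):

```markdown
# skills.md

## repo-stats (hypothetical custom CLI)

Summarize a codebase before planning changes.

Usage:
- `repo-stats summary <path>`: line counts and language breakdown
- `repo-stats hotspots <path>`: files with the most recent churn

Example:
    repo-stats summary ./src
```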

It's eerie how similar these trends are to early phone/text usage limits.

This is just AI slop. If you follow what the actual designers of Claude/GPT tell you, it flies in the face of building out over-engineered harnesses for agents.


I agree with this. There aren't a lot of harnesses or wrappers needed for Claude Code.


You don't need a harness beyond Claude Code, but honestly it's foolish to think you shouldn't be building out extra skills to help your workflow. A TDD skill that does red-green-refactor is using Claude Code exactly how it's meant to be used. They pioneered skills.
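For anyone unfamiliar, the loop such a TDD skill drives is easy to show in plain Python (a generic illustration of red-green-refactor, not tied to any particular skill file):

```python
import re

# Red: start with a failing test that pins down the desired behavior.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# Green: the simplest implementation that makes the test pass.
# (Refactor comes after: clean up with the test as a safety net.)
def slugify(text: str) -> str:
    # Lowercase, collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

test_slugify()  # passes once the implementation is in place
```

The skill's job is just to force the agent to write the failing test first instead of jumping straight to implementation.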


Yep, not saying we don't need skills. Just harnesses.


Works better than standard Claude/GPT, which doesn't do red-green-refactor. Doesn't seem like slop when it consistently and meaningfully changes the results for the better. Really is a game-changer. You should consider trying it.


I do do TDD, but using skills in this way is an anti-pattern for a multitude of reasons.


I don't think saying it's an anti-pattern for a multitude of reasons without naming any of them is going to convince anyone it's an anti-pattern.

This is in fact precisely what skills are meant for; it's the opposite of an anti-pattern, more like best practice now. It's explicitly using the skills framework exactly as intended.


>The AI won't get the perfect system in one shot, far from it! And especially not from sloppy initial requirements that leave a lot of edge (or not-so-edge) cases unaddressed. But if you have a good requirement to start with, you have a chance to correct the AI and keep it on track; you have something to go back to and ask another AI, "is this implementation conforming to the spec, or did it miss things?"

This is an antiquated way of thinking. If you ramp up the number of agents you're using, the auto-correcting and reviewing behavior kicks in, which means much less human intervention until the final code review.


Yes, but what about the "spec-review"? Isn't that even more important? Is the system doing what we (and its users) need it to be doing?


>how do you expect to stay good at reviewing code if you never write it?

What exactly does "writing code" mean?

Are you telling me I have to write for loops and if/elses forever?


>I've never seen it develop something more than trivial correctly.

This is 100% incorrect, but the real issue is that the people who are using these LLMs for non-trivial work tend to be extremely secretive about it.

For example, I view my use of LLMs to be a competitive advantage and I will hold on to this for as long as possible.


The key part of my comment is "correctly".

Does it write maintainable code? Does it write extensible code? Does it write secure code? Does it write performant code?

My experience has been it failing most of these. The code might "work", but it's not good for anything more than trivial, well-defined functions (that probably appeared in its training data, written by humans). LLMs have a fundamental lack of understanding of what they're doing, and it's obvious when you look at the finer points of the outcomes.

That said, I'm sure you could write detailed enough specs and provide enough examples to resolve these issues, but that's the point of my original comment - if you're just writing specs instead of code you're not gaining anything.


I find "maintainable code" the hardest bias to let go of. 15+ years of coding and design patterns are hard to let go of.

But the aha moment for me was that what's maintainable by AI vs. by me by hand are in different realms. So "maintainable" has to evolve from good human design patterns to good AI patterns.

Specs are worth it IMO. Not because if I can spec, I could’ve coded anyway. But because I gain all the insight and capabilities of AI, while minimizing the gotchas and edge failures.


> But the aha moment for me was that what's maintainable by AI vs. by me by hand are in different realms. So "maintainable" has to evolve from good human design patterns to good AI patterns.

How do you square that with the idea that all the code still has to be reviewed by humans? Yourself, and your coworkers?


I picture it like semiconductors: the 5nm process is so absurdly complex that operators can't just peek into the system easily. I imagine I'm just so used to hand-crafting code that I can't imagine not being able to peek in.

So maybe it's that we won't be reviewing by hand anymore? I.e., it's LLMs all the way down. I've been trying to embrace that style of development lately, as unnatural as it feels. We're obviously not 100% there yet, but Claude Opus is a significant step in that direction, and they keep getting better and better.


Then who is responsible when (not if) that code does horrible things? We have humans to blame right now. I just don't see it happening, personally, because liability and responsibility are too important.


For some software, sure but not most.

And you don't blame humans anyway, lol. Everywhere I've worked has had "blameless" postmortems. You don't remove human review unless you have reasonable alternatives, like high test coverage and other automated reviews.


We still have performance reviews and can be fired. There's a human who is responsible.

“It’s AI all the way down” is either nonsense on its face, or the industry is dead already.


> But the aha moment for me was that what's maintainable by AI vs. by me by hand are in different realms

I don't find that LLMs are any more likely than humans to remember to update all of the places it wrote redundant functions. Generally far less likely, actually. So forgive me for treating this claim with a massive grain of salt.


Yes to all of these.

Here's the rub: I can spin up multiple agents in separate shells. One is prompted to build out <feature>, following the pattern the author/OP described. Another is prompted to review the plan/changes and keep an eye out for specific things (code smells, non-scalable architecture, duplicated code, etc.). And then another agent gets fed that review and does its own analysis. That gets passed back to the original agent once it finishes.

Less time, cleaner code, and the REALLY awesome thing is that I can do this across multiple features at the same time, even across different codebases or applications.
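The loop I'm describing can be sketched in a few lines. Here `run_agent` stands in for whatever actually sends a prompt to an agent session; it's a hypothetical hook, not a real API:

```python
def build_with_review(run_agent, feature: str, rounds: int = 2) -> str:
    """Builder/reviewer loop. `run_agent` is any callable that sends a
    prompt to an agent session and returns its reply (hypothetical hook)."""
    # Builder agent: implement the feature per the agreed plan.
    work = run_agent(f"Implement {feature} following the plan.")
    for _ in range(rounds):
        # Reviewer agent: watch for the specific problems listed above.
        review = run_agent(
            "Review these changes for code smells, duplicated code, and "
            f"non-scalable architecture:\n{work}")
        # Builder agent again: apply the review feedback.
        work = run_agent(f"Revise per this review:\n{review}\n\n{work}")
    return work
```

In practice each call runs in its own shell/session; the point is only that the review and revision steps are automated rather than done by a human between every round.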


To answer all of your questions:

Yes, if I steer it properly.

It's very good at spotting design patterns, and implementing them. It doesn't always know where or how to implement them, but that's my job.

The specs and syntactic sugar are just nice quality of life benefits.


You’d be building blocks which compound over time. That’s been my experience anyway.

The compounding is much greater than my brain can do on its own.


Building on OpenClaw is a mistake.

The real advantage is https://github.com/badlogic/pi-mono


OpenClaw is actually built on top of pi-mono (for its agent runtime, models, and tools):

https://docs.openclaw.ai/concepts/agent#pi-mono-integration

https://github.com/openclaw/openclaw/blob/main/docs/pi.md


Context.

Why would I use an MCP when I can use a CLI tool that the model was likely trained on how to use?


Can you be more specific about “context”?

And not everything has a CLI, but in any case, the comment I was replying to was suggesting building my own CLI, which presumably the LLM wasn’t trained on.

Maybe my understanding of MCP is wrong; my assumption is that it's a combination of a set of documented tools that the LLM can call (which return structured output) and a server that actually receives and processes those tool calls. Is that not right? What's the downside?
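For what it's worth, the tool-definition half of that picture boils down to a name, a description, and a JSON Schema for the arguments. A minimal sketch of the shape (the weather tool and its handler are hypothetical, not from any real server):

```python
# An MCP-style tool declaration: the model sees the name/description/schema
# and emits a call; the server validates the arguments and dispatches it.
get_weather_tool = {
    "name": "get_weather",  # hypothetical example tool
    "description": "Return current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def handle_call(tool: dict, arguments: dict) -> dict:
    # Minimal server-side check before running the real handler.
    missing = [k for k in tool["inputSchema"]["required"] if k not in arguments]
    if missing:
        return {"error": f"missing arguments: {missing}"}
    return {"content": f"(would fetch weather for {arguments['city']})"}
```

The downside people usually cite is the extra moving part (the server) versus a CLI the model can just shell out to.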


>You get to think through a problem and figure out why decision A is better than B. Learning about various domains and solving difficult problems is in itself a reward.

So just tell the LLM about what you're thinking about.

Why do you need to type out a for loop for the millionth time?


(a) It's relaxing and pleasing to do something like typing out a for loop. The repetition with minor variation stimulates our brains just the right amount; same reason people like activities like gardening, cooking, working on cars, Legos, and so on. (b) It allows you to have some time to think about what you're doing. The "easy" part of coding gives you a bit of breathing room to plan out the next "hard" section.

