mulmboy's comments | Hacker News

People more often say that to save face, by implying the issue you identified would be reasonable for the author to miss because it's subtle or tricky or whatever. It's often a proxy for embarrassment.

When mature, functional adults say it, the read is "wow, I would have missed that, good job, you did better than me".

Reading embarrassment into that is extremely childish and disrespectful.


What I'm saying is that a corporate or professional environment can make people communicate in weird ways due to various incentives. Reading into people's communication is an important skill in these kinds of environments, and looking superficially at their words can be misleading.

Because it's a good heuristic for a functional and resilient team. People don't usually mean it literally; it's more like "if I disappeared, it should be pretty painless for the team to continue along for a month or so and to find and onboard a replacement".

LLMs aren't like you or me. They can comprehend large quantities of code quickly and piece things together easily from scattered fragments, so go-to-reference and the like become much less important. Of course, things change as the number of usages of a symbol becomes large, but in most cases the LLM can just make perfect sense of things via grep.

Providing it access to refactoring as a tool also risks confusing it with too many tools.

It's the same reason that waffling for a few minutes via speech to text with tangents and corrections and chaos is just about as good as a carefully written prompt for coding agents.


> AI seems to have caught up to my own intelligence even in those narrow domains where I have some expertise. What is there left that AI can’t do that I would be able to verify?

The last few days I've been working on some particularly tricky problems, tricky both in the domain and in backwards compatibility with our existing codebase. For both of these problems, GPT 5.2 has been able to come to the same ideas as my best, which took me quite a bit of brain-racking to get to. Granted, it's required a lot of steering and context management from me, as well as judgement to discard other options. But it's really getting to the point that LLMs are a good sparring partner for (isolated technical) problems at the 99th percentile of difficulty.


You steered a sycophantic LLM to the same idea that you had already had & think that's worth bragging about?


I'm well aware that they can be sycophantic, and I structure things to avoid that, like asking "what do you think of this problem" and seeing the idea fall out rather than providing anything that would suggest it. In one of these two cases it took an idea that I had an inkling of, fleshed it out, and expanded it to be much better than I had.

And I'm not bragging. I'm expressing awe, and humility that I am finding a machine can match me on things that I find quite difficult. Maybe those things aren't so difficult after all.

By steering I mean more steering to flesh out the context of the problem and to find relevant code and perform domain-specific research. Not steering toward a specific solution.


Is it just me or is codex slow?

With claude code I'll ask it to read a couple of files and do x similar to existing thing y. It takes a few moments to read files and then just does it. All done in a minute or so.

I tried something similar with codex and it took 20 minutes reading around bits of files and this and that. I didn't bother letting it finish. Is this normal? Do I have something misconfigured? This was a couple of months ago.


What do these look like?


  1. Take every single function, even private ones.
  2. Mock every argument and collaborator.
  3. Call the function.
  4. Assert the mocks were called in the expected way.
These tests help you find inadvertent changes, yes, but they also create constant noise about changes you intend.
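
Concretely, a sketch of that pattern in Python with unittest.mock (OrderService and its collaborators are hypothetical stand-ins, not from any real codebase):

  # A sketch of the mock-heavy test style described above.
  from unittest.mock import Mock

  class OrderService:
      def __init__(self, gateway, mailer):
          self.gateway = gateway
          self.mailer = mailer

      def place_order(self, order_id):
          self.gateway.charge(order_id)
          self.mailer.send_receipt(order_id)

  def test_place_order():
      gateway = Mock()   # mock every collaborator
      mailer = Mock()
      service = OrderService(gateway, mailer)

      service.place_order(123)   # call the function

      # assert the mocks were called in the expected way;
      # the test mirrors the implementation line for line
      gateway.charge.assert_called_once_with(123)
      mailer.send_receipt.assert_called_once_with(123)

The test passes, but it just restates the implementation: rename a method or reorder the calls and it breaks, even though the observable behaviour hasn't changed.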


These tests also break encapsulation in many cases because they're not testing the interface contract, they're testing the implementation.


Juniors on one of the teams I work with only write this kind of test. It's tiring, and I have to tell them to test the behaviour, not the implementation. And yet every time they do the same thing. Or rather, their AI IDE spits these out.


You beat me to it, and yep these are exactly it.

“Mock the world then test your mocks”, I’m simply not convinced these have any value at all after my nearly two decades of doing this professionally


> Everything it does can be done reasonable well with list comprehensions and objects that support type annotations and runtime type checking (if needed).

I see this take somewhat often, and usually with a similar lack of nuance. How do you come to this? In other cases where I've seen this, it's from people who haven't worked in any context where performance or scientific computing ecosystem interoperability matters - missing a massive part of the picture. I've struggled to get through to them before. Genuine question.


It does largely avoid the issue if you configure it to allow only specific environments AND you require reviews before pushing/merging to branches in that environment.

https://docs.pypi.org/trusted-publishers/adding-a-publisher/

Publishing a malicious version would then require a full merge, which is a fairly high bar.

AWS allows something similar.


As we're seeing, properly configuring github actions is rather hard. By default force pushes are allowed on any branch.


Yes and anyone who knows anything about software dev knows that the first thing you should do with an important repo is set up branch protections to disallow that, and require reviews etc. Basic CI/CD.

This incident reflects extremely poorly on PostHog because it demonstrates a lack of thought about security beyond the surface level. It tells us that any dev at PostHog has access at any time to publish packages, without review (because we know the secret to do this is stored as a plain GHA secret, which can be read from any GHA run, which presumably runs on any internal dev's PR). The most charitable interpretation is that it's consciously justified by them because it reduces friction, in which case I would say that demonstrates poor judgement, a bad balance.

A casual audit would have revealed this and suggested something like restricting the secret to a specific GHA environment and requiring reviews to push to that env. Or something like that.


Nobody understands github. I guess someone at microsoft did but they probably got fired at some point.

You can't really fault people for this.

It's literally the default settings.


Along with a bunch of limitations that make it useless for anything but trivial use cases: https://docs.claude.com/en/docs/build-with-claude/structured...

I've found structured output APIs to be a pain across various LLMs. Now I just ask for JSON output and pick it out between the first and last curly brace. If validation fails, just retry with details about why it was invalid. This works very reliably for complex schemas and works across all LLMs without having to think about limitations.

And then you can add complex pydantic validators (or whatever, I use pydantic) with super helpful error messages to be fed back into the model on retry. Powerful pattern.
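
Roughly, the loop looks like this in Python (Invoice, extract_invoice and call_llm are made-up names for illustration; assumes pydantic v2):

  # Sketch of the extract-and-retry pattern described above.
  # call_llm() is a placeholder for whatever client/model call you use.
  import json
  from pydantic import BaseModel, ValidationError, field_validator

  class Invoice(BaseModel):
      customer: str
      total_cents: int

      @field_validator("total_cents")
      @classmethod
      def non_negative(cls, v):
          if v < 0:
              raise ValueError("total_cents must be >= 0; re-check the amounts")
          return v

  def extract_invoice(prompt, call_llm, max_retries=3):
      messages = [{"role": "user", "content": prompt}]
      for _ in range(max_retries):
          text = call_llm(messages)
          # grab everything between the first and last curly brace
          raw = text[text.find("{"): text.rfind("}") + 1]
          try:
              return Invoice.model_validate_json(raw)
          except (ValidationError, json.JSONDecodeError) as e:
              # feed the failure back so the model can correct itself
              messages.append({"role": "assistant", "content": text})
              messages.append({"role": "user",
                               "content": f"That JSON was invalid: {e}. Please resend only valid JSON."})
      raise RuntimeError("no valid JSON after retries")

The validator messages are what make the retry useful: the model sees exactly which field failed and why, rather than a generic "invalid JSON" complaint.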


Yeah, the pattern of "kick the error message back to the LLM" is powerful. Even more so with all the newer AIs trained for programming tasks.


Big missing piece - what was the impact of the degraded quality?

Was it 1% worse / unnoticeable? Did it become useless? The engineering is interesting, but I'd like to see it tied to actual impact.


Significant; check any Claude-related thread here over the last month, or the Claude Code subreddit. Anecdotally, the degradation has been so bad that I had to downgrade to a month-old version, which has helped a lot. I think part of the problem is there as well (Claude Code).

