
This would be _extremely_ valuable for desktop dev, where you don't have a DOM or an "accessibility" layer to interrogate. Think of e.g. a drawing application. You want to test that after the user starts the "draw circle" command and clicks two points, there is actually a circle on the screen. No matter how many abstractions you build over your domain model and rendering, you can't actually test that "the user sees a circle". You can verify your drawing contains a circle object. You can verify your renderer was told to draw a circle. But fifty things can go wrong before the user actually agrees they saw a circle (the color was set to transparent, the layer was hidden, the transform was incorrect, the renderer didn't swap buffers, ...).

I had Claude build a backdoor command port in the Godot application I'm working on. Using commands, Claude can interact with the screen, dump the node tree, and take screenshots. It works pretty well. Claude will definitely iterate over layout issues.

Have you written this up anywhere? I have dropped my projects due to work/family commitments but see this as potentially removing some of the friction involved.

No. I just told Claude to do it and after a couple of iterations it was working.

This is a good point. For anything without a DOM, screenshot diffing is basically your only option. Mozilla did this for Gecko layout regression testing 20+ years ago and it was remarkably effective. The interesting part now is that you can feed those screenshots to a vision model and get semantic analysis instead of just pixel diffing.
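The pixel-diffing baseline described above can be sketched in a few lines. This is a minimal, hypothetical sketch over already-decoded pixel buffers (flat lists of RGB tuples), not any particular framework's API:

```python
def max_pixel_delta(img_a, img_b):
    """Largest per-channel difference between two same-size images,
    given as flat lists of (r, g, b) tuples."""
    if len(img_a) != len(img_b):
        raise ValueError("screenshots must have the same dimensions")
    return max(
        abs(ca - cb)
        for pa, pb in zip(img_a, img_b)
        for ca, cb in zip(pa, pb)
    )

def screenshots_match(img_a, img_b, tolerance=0):
    """A tolerance of 0 is a strict pixel diff; a small tolerance
    absorbs antialiasing jitter between otherwise identical renders."""
    return max_pixel_delta(img_a, img_b) <= tolerance
```

In practice you would decode real captures (e.g. PNGs from a headless renderer) into such buffers; a vision model could then consume the same screenshots for semantic checks instead of raw pixel comparison.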

Yes, agreed. Web only for now, since it runs on headless Chromium. Desktop and mobile are the #1 request though. For mobile, the path would be driving an iOS Simulator or Android emulator; for native desktop, probably accessibility APIs or OS-level screenshots. Definitely on my radar; will see if anyone wants to contribute, since I am doing this in my free time.

MAUI is a Microsoft project, this is not?

This isn't Microsoft. This is an open source project.

.NET has been on Linux for ages.



The most infuriating part about git's default behavior is how ignorant it is of the actual reality users live in.

For example: when merging or rebasing, it's really important to know what I did myself versus what someone else did. Yet git has a really opaque left/right or mine/theirs representation, which even switches meaning depending on the operation you are doing.

This isn't even a fundamental diff/patch issue; it's just that git shrugs and assumes you want to perform some abstract operation on a DAG of things rather than, you know, rebase your code onto that of your colleagues.


Even for me (a software developer who reads these articles) it's really hard to actually know whether the software is any good. Are there unlockable features? Are there subscriptions with reasonable costs? What happens if I don't have a subscription? How often are updates shipped? What's the general consensus around the quality of the system as a whole?

It took decades for people to land on (in fairness, sometimes very handwavy) generalizations like "Japanese cars are reliable", "German cars are well built", "French cars are... French".

All this is now on its head. The landscape changes very quickly, and you don't even recognize the brands. A Chinese maker of vacuum cleaners might have sold more cars than VW in 2025, and yet you've never heard of them. A reputable car manufacturer like Honda could be a complete novice when it comes to EVs, and so on.

Even though software is extremely important for how cars work, we still don't have easy comparisons. It's mentioned in reviews/tests of cars, but it's mostly "Yeah, it feels snappy and modern, 7/10" with no real meat in the comparison. I wish there were a WLTP-style comparison scheme for car software that made it easy to compare.


I definitely have a 100% pass rate on our tests most of the time (in master, of course). By "most of the time" I mean that on any given day, you should be able to run the CI pipeline 1000 times and it would succeed every time, never finding a flaky test in one or more runs.

In the rare case that one is flaky, it's addressed. During the days when there is a flaky test, of course you don't have 100% pass rate, but on those days it's a top priority to fix.

But importantly: this is library and thick client code. It should be deterministic. There are no DB locks, docker containers, network timeouts or similar involved. I imagine that in tiered application tests you always run the risk of various layers not cooperating. Even worse if you involve any automation/ui in the mix.

Obviously there are systems it depends on (Source control, package servers) which can fail, failing the build. But that's not a _test_ failure.

If the build fails, it should be because a CI machine or a service the build depends on failed, not because an individual test randomly failed due to a race condition, timeout, test-ordering issue, or similar.


If one is flaky, then you are below 100% friend.

That's not what I mean. I mean that anything but 100% is a "stop the world this is unacceptable" kind of event. So if there is a day when there is a flaky test, it must be rare.

To explain further

There is a difference between having 99.99% of tests pass every day (unacceptable), which is also 99.99% of tests passing for the year, versus having 100% of tests passing on 99% of days and 99% of tests on a single bad day. That might also give a 99.99% test pass rate for the year, but here you were productive on 99/100 days. So "100.0 is the normal" is what I mean. Not that it's 100% pass on 100% of days.

Having 99.98% of tests pass on any random build is absolutely terrible. It means a handful of tests out of your test suite fail on almost _every single CI run_. If you require a 100% test pass as validation for PRs before merge, that means you'll never merge. If you require a 100% test pass as validation to deploy your main branch, that means you'll never deploy...

You want a 100% pass on 99% of builds. Then it doesn't matter whether 1% or 99% of tests pass on the rare failing build, so long as you have some confidence that "almost all builds pass entirely green".
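The arithmetic behind this is worth spelling out. Assuming (simplistically) that tests fail independently, a per-test pass rate of p gives a fully green suite with probability p^N. The numbers below are illustrative, not from any real suite:

```python
def suite_pass_probability(per_test_pass_rate, num_tests):
    """Probability that a run is fully green, assuming each test
    passes independently with the given rate."""
    return per_test_pass_rate ** num_tests

# 10,000 tests that each pass 99.98% of the time yield a fully green
# run only about 13.5% of the time, which is why "100% pass on 99%
# of builds" requires driving per-test flakiness to essentially zero.
p_green = suite_pass_probability(0.9998, 10_000)
```

The independence assumption understates correlated failures (a flaky shared fixture fails many tests at once), but it captures why a "mere" 0.02% per-test flake rate ruins almost every CI run.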


"most of the time" != 100% pass rate

Read my other response. It's about having 100% be the norm. There is a difference between having 99.99% all of the time, and having 100% all of the time with 99% on rare occasions.

So "100% most of the time" actually makes sense, and is probably as good as you might hope to get on a huge test suite.


This falls into the famous "hours of planning can save minutes of coding" trap. Architecture can't (all) be planned out on a whiteboard; it's the response to the difficulties you only discover as you try to implement.

If you can agree on what to build and how to build it, and then it turns out that it actually is a working plan - then you are better than me. That hasn't happened in my 20 years of software development. Most of what's planned falls down within the first few hours of implementation.

Iterative architecture meetings will be necessary. But that falls into the pit of weekly meetings.


That's actually one thing that always prevented me from following the standard pathway of "write a design document first, get it approved, then execute" during my years at Google.

I cannot write a realistic, non-hand-wavy design document without having a proof of concept working, because even if I try, I will need to convince myself that this part and this part and that part will work, and the only way to do that is to write actual code. And then you pretty much have the code ready, so why bother writing a design doc?

Some of my best (in terms of perf consequences) design documents were either completely trivial from the code complexity point of view, so that I did not actually need to write the code to see the system working, or were written after I already had a quick and dirty implementation working.


That’s why I either started with the ports and adapters pattern or quickly refactored into it on spikes.

You don’t have to choose what flavor of DDD/Clean/… you want to drink, just use some method that keeps domains and use cases separate from implementation.

Just with shapes and domain level tests, the first pass on a spec is easier (at least for me) and I also found feedback was better.

I am sure there are other patterns that do the same, but the trick is to let the problem domain drive, not to choose any particular set of rules.

Keeping the core domain as a fixed point does that for me.


I am very similar in this respect. However, once I get to a place where I am implementing something very similar to something in my past, it becomes easier to draft a doc first, because I have been down that path before.

It’s a muscle you can exercise, and doing so helps you learn what to focus on so it’ll be successful. IME a very successful approach is to focus on interfaces, especially at critical boundaries (critical for your use case first, then critical for your existing design/architecture).

Doing this often settles the design direction in a stable way early on. More than that, it often reveals a lot of the harder questions you’ll need to answer: domain constraints and usage expectations.

Putting this kind of work upfront can save an enormous amount of time and energy by precluding implementation work on the wrong things, and ruling out problematic approaches for both the problem at hand as well as a project’s longer term goals.


I've worked waterfall (defense) and while I hated it at the time I'd rather go back to it. Today we move much faster but often build the wrong thing or rewrite and refactor things multiple times. In waterfall we move glacially but what we would build sticks. Also, with so much up front planning the code practically writes itself. I'm not convinced there's any real velocity gains in agile when factoring in all the fiddling, rewrites, and refactoring.

> Most of what's planned falls down within the first few hours of implementation.

Not my experience at all. We know what computers are capable of.


> I've worked waterfall and while I hated it at the time I'd rather go back to it. Today we move much faster but build the wrong thing or rewrite and refactor things multiple times.

My experience as well. Waterfall is like: let's think about where we want this product to go, and the steps to get there. Agile is like an ADHD-addled zig-zag journey to a destination, cutting corners because we are rewriting a component for the third time, to get to a much worse product slightly faster. Now we can do that part 10x faster, cool.

The thing is, at every other level of the company, people are actually planning in terms of quarters/years, so the underlying product being given only enough thought for the next 2 weeks at a time is a mismatch.


It’s possible to manage the quarterly expectations by saying “we can improve metric X by 10% in a quarter”. It’s often possible to find an improvement that you’re very confident of making very quickly. Depending on how backwards the company is you may need to hide the fact that the 10% improvement required a one line change after a month of experimentation, or they’ll fight you on the experimentation time and expect that one line to take 5 minutes, after which you should write lots more code that adds no value.

Agile isn’t a good match for a business that can only think in terms of effort and not learning+value. That doesn’t make agile the problem.


My experience in an agile firm was that they hired a lot of experienced people and then treated them like juniors. Actively allergic to thinking ahead.

To get around the problem that deliverables took more than a few days, actual tasks would be salami sliced down into 3 point tickets that simply delivered the starting state the next ticket needed. None of these tickets being completed was an actual user observable deliverable or something you could put on a management facing status report.

Each task was so time boxed, seniors would actively be upbraided in agile ceremonies for doing obvious next steps. 8 tickets sequentially like - Download the data. Analyze the data. Load a sample of the data. Load all the data. Ok now put in data quality tests on the data. OK now schedule the daily load of the data. OK now talk to users about the type of views/aggregations/API they want on the data. OK now do a v0 of that API.

It's sort of interesting because we have fully transitioned from the agile infantilization of seniors to expecting them to replace a team of juniors with LLMs.


I have so many thoughts:

Depending on the reality, either that company doesn't understand agile very well, or you didn't understand the importance of the small steps.

A plan is not made agile by being split into many small sequential steps; what would make this agile is learning from each step and being prepared to scrap steps 2-8 if step 1 turns out to be enough. Usually this attitude results in splits that make more sense and do add user value.

OTOH I've seen many experienced folks get tripped up because it's easy to get consumed and not evaluate work vs the customer value when you're in the middle of a big task.

For example on an internationalisation project a dev thought: "Every translation key is handled the same way in Rails, let me just do them all at once"; spent weeks with the appearance of no progress because they were working through many cases often slightly more complicated than imagined. They said out loud ~ "I'm not working just for the sake of a task board, the work needs to be done, let's be better than box ticking, it's all one logically consistent piece of work".

I had to interrupt to point out that most of the pages were either about to be deleted or were only needed later. Meanwhile we had tons of work that needed this person's attention on things that were of immediate importance.

It's also important to work in a way that a high number of PRs is not a penalty. It's a smell if we're motivated to reduce the number of PRs because shipping PRs feels difficult.


> and then treated them like juniors

You shouldn't put juniors in a strict short time box either. At least not for long.

People don't grow if they can't think about the results of their work. And if your juniors can't grow, you might as well not hire any.


Heh, sounds like Goodhart's law gone wild at that place.

Maybe. It might also have nothing to do with ill-conceived attempts at evaluation. I sometimes suggest teams work in steps that small (usually because they don't have the experience to handle something bigger) and it has nothing to do with evaluating the team. It has everything to do with them learning to move precisely and avoid mistakes.

Yes - how to complete story points without actually solving any problems

I think the bigger issue is that Waterfall is often not "Waterfall".

Sure, there's a 3000-row Excel file of requirements, but during development the client still sees the product (or slides outlining how the product works), and you still have QA testing stuff as you make it. Then you make changes based on that feedback.

Agile, meanwhile, often feels like it has lost the plot: we're just going to make something and iterate it into a product people like, versus figuring out a product people will like and designing towards it.


There's an abstraction level above which waterfall makes more sense, and below which [some replacement for agile but without the rituals] makes more sense.

I think the questions to ask are: whether user-facing deliverable tasks are by nature longer than a sprint, whether the tasks have linear dependencies, whether there are coordination concerns, etc.

Sprints are just ritual though. The others... if you're that low-level, I'd say you're past waterfall, since you have well-defined tasks, while I feel a waterfall-like approach is more for initial architecture.

Agile largely came about because we thought about where we wanted the product to go, and the steps to get there, and started building, and then it turned out that the way we thought we wanted to go was wrong, and all of that planning we did was completely wasted.

If you work in an environment where you definitely do know where you want the product to go, and the customer doesn't change their mind once they've seen the first working bits, then great. But I've never worked in that kind of environment.


It helps to at least write down requirements. And not requirements like "it must use Redis", but customer, user, performance, cost, etc. requirements.

A one page requirements document is like pulling teeth apparently.


Oh yes, you want a vague idea of where you're going, and concrete plans for the next step. If you can't even get one-page requirements then something has gone very badly wrong.

> Today we move much faster but often build the wrong thing or rewrite and refactor things multiple times. In waterfall we move glacially but what we would build sticks.

That's an interesting observation. That's one of the biggest criticisms of waterfall: by the time you finish building something the requirements have changed already, so you have to rewrite it.


There is a difference between the requirements changing and a poor-quality, quickly made implementation proving to be inadequate.

Agile approaches are based on quick implementations, redone as needed.

my favorite life cycle: 1> Start with requirements identification for the entire system. 2> Pick a subset of requirements to implement and demonstrate (or deliver) to the customer. 3> Refine the requirements as needed. 4> go to 2

The key is you have an idea of overall system requirements and what is needed, in the end, for the software you are writing. Thus the re-factoring, and re-design due to things not included in the sprint do not occur. (or occur less)


This approach also accounts for the truism that "the customer doesn't know what they want until they don't see it in the final product".

> I'm not convinced there's any real velocity gains in agile when factoring in all the fiddling, rewrites, and refactoring.

That’s not the point. The point is to end up with something actually useful in the end. If the artifact I deliver does not meet requirements, it does not really matter how fast I deliver it.

The reason waterfall methodology falls flat so often is not long delivery times, but ending up with completely the wrong thing.


> If the artifact I deliver does not meet requirements, it does not really matter how fast I deliver it.

I don’t know. The faster you deliver the wrong thing, the sooner you can discover your mistake and pivot.


You summarized agile. That is the whole point: short feedback cycles. You can view it as a series of short, self-regressive waterfalls.

> > Most of what's planned falls down within the first few hours of implementation.

> Not my experience at all. We know what computers are capable of.

You must not work in a field where uncertainty is baked in, like Data Science. We call them “hypotheses”. As an example, my team recently had a week-long workshop where we committed to bodies of work on timelines and 3 out of our 4 workstreams blew up just a few days after the workshop because our initial hypotheses were false (i.e. “best case scenario X is true and we can simply implement Y; whoops, X is false, onto the next idea”)


Wait, are you perhaps saying that... "it depends"? ;-)

Every single reply in this thread is someone sharing their subjective anecdotal experience.

There are so many factors involved in how work pans out beyond planning. Even a single one of us could probably tell 10 different stories about 10 different projects that all went differently.


Yeah, which is also why I tried not to speak prescriptively, unlike some other comments in this thread…

Comparing the same work done between agile and waterfall I can accept your experience of what sounds like an org with unusually effective long term planning.

However the value of agile is in the learning you do along the way that helps you see that the value is only in 10% of the work. So you’re not comparing 100% across two methodologies, you’re comparing 100% effort vs 10% effort (or maybe 20% because nobody is perfect).

Most of the time when I see unhappiness at the agile result it’s because the assessment is done on how well the plan was delivered, as opposed to how much value was created.


I think it also depends on how people think. I can't sit in a meeting room/whiteboard/documentation editor and come up with where the big problems are (where pain points in implementation will occur, where a sudden quadratic algorithm pops up, where a cache invalidation becomes impossible, ...) even if I stare at that whiteboard or discuss with my peers for days.

But when I hammer out the first 30 minutes of code, I have that info. And if we just spent four 2-hour meetings discussing this design, it's very common that after 30 minutes of coding I have either found 5 things that make this design completely infeasible, or maybe 2 things that would have been so good to know before the meeting that the 8 hours of meetings just should not have happened.

They should have been a single 2-hour meeting, followed by 30 minutes of coding, then a second 2-hour meeting to discuss the discoveries. Others might be much better than me at discovering these things at the design stage, but to me coding is the design stage. It's when I step back and say "wait a minute, this won't work!".


Agile is for when you don't know what you're making and you're basically improvising. People forget that.

Correct, and it was applied top-down to teams that do larger infrastructure / implementations in known areas / etc.

There are costs to pouring a cement foundation without thinking through how many floors your building is going to have in advance.


But if you don't know what you are making, it is the only option!

Pair programming 100% of the time also works. It's unfortunately widely unpopular, but it works.

I also think we're going to see a resurgence of either pair programming, or the buddy system where both engineers take responsibility for the prompting and review and each commit has 2 authors. I actually wrote a post on this subject on my blog yesterday, so I'm happy to see other people saying it too. I've worked on 2-engineer projects recently and it's been way smoother than larger projects. It's just so obvious that asynchronous review cycles are way too slow nowadays, and we're DDoSing our project leaders who have to take responsibility for engineering outcomes.

For anything complicated or wide in scope, we've found it much more productive to just hop on a call and pair.

The problem is that you can only meaningfully pair program with programmers. The people involved in architecture/design meetings might not be programmers. The questions that arise when 2 programmers work might not be resolvable without involving the others.

Nonsense. I pair all the time with stakeholders. If you strip out all of the Cucumber nonsense, this is essentially what BDD is: fleshing out and refining specs by guiding people through concrete, written example scenarios.

I also often pair with infrastructure people on solving a problem, e.g. "I'm trying to do x as per the docs, but if you look at my screen I get a 1003 error code; any idea what went wrong?".

Or, people on a different team whose microservice talks to mine when debugging an issue or fleshing out an API spec.

It's true that this isn't possible in plenty of organizations due to the culture, but lots of organizations are broken in all sorts of ways that set piles of cash on fire. This one isn't unique.


You're missing the context of this thread. For the purposes of code quality/review, it can only work if the other person is a programmer.

I interpreted what they meant as "pairing doesn't work with non-coders doing non-coding design/architecture/requirements".

Not "pair programming doesn't work with non-programmers doing pure programming", because it doesn't make much sense why you'd even attempt that. They don't care, and they will get in the way.


Maybe it's time to do pair agentic engineering? Have two engineers at the screen, writing the prompts together, and deciding how to verify the results.

I’ve started pair programming with Claude and it’s been pretty fun. We make a plan together, I type the code and Claude reviews it. Then we switch.

You’ve made the analogy but I don’t think you’re actually doing an analogous thing. I think you’re just talking about code review.

You are exactly correct. As to why it’s unpopular, I believe it’s just that no one has given it a fair try. Once you have done it for at least 20 hours a week for a few weeks you will understand that typing is not and has never been the bottleneck in programming. If you have not tried it then you cannot have an opinion.

> You are exactly correct. As to why it’s unpopular, I believe it’s just that no one has given it a fair try. Once you have done it for at least 20 hours a week for a few weeks you will understand that typing is not and has never been the bottleneck in programming. If you have not tried it then you cannot have an opinion.

I haven't tried pair programming except in very ad-hoc situations, but doing it all the time sounds utterly exhausting. You're taking programming, then layering a level of constant social interaction on top of it, and removing the autonomy to just zone out a bit when you need to (to manage stress).

Basically, it sounds like turning programming into an all-day meeting.

So I think it's probably unpopular because most software engineers don't have the personality to enjoy, or even tolerate, that environment.


Yeah, I’d have a mental breakdown within weeks if I had to pair more than an hour a day, max (even that much, consistently, would probably harm my quality of life quite a bit—a little every now and then is no big deal, though). No exaggeration, it’d break me in ways that’d take a while to fix.

[edit] I’m not even anti-social, but the feeling of being watched while working is extremely draining. An hour of that is like four hours without it.


Well, as the person you are replying to said, it's hard to have an opinion when you haven't actually tried it. I don't find it like that at all. Also, it doesn't mean you get NO solo time. Pairs can decide to break up for a bit, and of course sometimes people aren't in, leaving your team with an odd number of people, so someone _has_ to solo (though sometimes we'd triple!)

But it's something you have to work at which is definitely part of the barrier. Otherwise, saying it sucks without giving it a real try is akin to saying, "I went for a run and didn't lose any weight so I feel that running is exhausting with no benefit."


> Well as the person you are replying to said, it's hard to have an opinion when you haven't actually tried it. I don't find it like that at all.

I don't need to try pair programming because I know how that level of constant social interaction makes me feel.

> Otherwise, saying it sucks without giving it a real try is akin to saying, "I went for a run and didn't lose any weight so I feel that running is exhausting with no benefit."

No, what you're doing is sort of like if you're raving about the beach, and I say I don't like bright sun, and you insist I need to try the beach to have an opinion on if I like it or not.


I wouldn't call "work" social interaction but I get ya. It's my biggest pet peeve of this industry: it has a whole lot of people who just don't want to talk to anyone. It is what it is, though.

> I wouldn't call "work" social interaction but I get ya.

IMHO, social interaction is anything where you interact with other people.

> It's my biggest pet peeve of this industry: it has a whole lot of people who just don't want to talk to anyone.

That's very black and white thinking. I like talking to other people, but too much of it is draining. Every day spending all-day or even a half-day working directly with someone else? No thanks.


It's not black and white because that is my whole point: you have to push through the terribleness at the beginning to start feeling the benefits, and most people aren't willing to. I'm a _massive_ introvert myself, btw. But like, I'm not trying to convince you of anything.

I agree. The main reason people give for not liking it is that they say _they_ find it exhausting. _Everyone_ finds it exhausting, at least at first. That mostly stops being the case after a while, though. It can still be tiring, but I found it to be a good kind of tiring because we were getting so much done. The team I used to pair on worked so incredibly quickly that we started doing 7-hour days and no one noticed (although eventually we came clean).

I find it depressing and dystopian that people are now excited about having a robot pair.


> Most of what's planned falls down within the first few hours of implementation.

Planning is priceless. But plans are worthless.

I am stealing that quote.

I cannot claim it )) I think that Eisenhower said it first.

This might be true for tech companies, but the tech department I am in at a large government organization could absolutely architect away >95% of the 'problems' we are fixing at the end of the SDLC.

“Everyone has a plan until they get punched in the mouth" - Mike Tyson

Agreed completely. I’ve worked at a couple places that want to design session everything to death and then meticulously convert that design by committee into story requirements to the point that no actual engineering is even needed from the engineer. On top of that, the usual problem occurs - turns out there actually was a lot of unknowns, and now that 2-4 hours you spent with 5-10 other people meticulously crafting the story and execution plan has been completely wasted as the requirements and design shift by extension. It infuriates me to no end that others within the org don’t see how frequently we do redo these meticulously written stories and what a waste of time that is.

I think this makes an assumption early on which is that things are serialized, when usually they are not.

If I complete a bugfix every 30 minutes, and submit them all for review, then I really don't care whether the review completes 5 hours later. By that time I have fixed 10 more bugs!

Sure, getting review feedback 5 hours later will force me to context switch back to 10 bugs ago and try to remember what that was about, and that might mean spending a few more minutes than necessary. But that time was going to be spent _anyway_ on that bug, even if the review had happened instantly.

The key to keeping speed up in slow async communication is just working on N things at the same time.


Not sure, but there might be a misunderstanding here:

The value of your bug fix is cashed out only when it reaches the customer, not when you have finished implementing it.

There is a cost of delay for value to reach the customer, and we want that delay to be as short as possible.

So it doesn't matter if you fix 10 bugs, because your 10th bug is going to reach production 5x10 hours after you fixed it (that's an exaggeration, but you get the point), which is why the article mentions latency and not touch time.

You can tell me "yes, but I also participate in the code review effort in parallel". Yes, but then you are not fixing 10 bugs; you are fixing less and reviewing more, and reviews take longer than implementation (especially with LLMs now in the loop).

It's because of the pretty counter-intuitive Little's Law: the more work in progress you have in parallel, the slower each item is completed.
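Little's Law relates average work in progress L, throughput λ, and cycle time W as L = λW. A toy sketch, using the thread's "bugfix every 30 minutes" figure purely as an illustrative assumption:

```python
def cycle_time_days(avg_wip, throughput_per_day):
    """Little's Law rearranged: W = L / lambda, the average time an
    item spends in the system (working plus waiting for review)."""
    return avg_wip / throughput_per_day

# Finishing 16 fixes per day but keeping 10 of them in flight awaiting
# review means each fix takes 10/16 = 0.625 days end to end, even
# though the hands-on time per fix was only half an hour.
```

The throughput is fixed by how fast you actually work, so the only lever that shortens end-to-end latency is reducing the amount of work sitting in the review queue.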


I made a queuing theory calculator for this: https://joshmoody.org/blog/number-of-agents/#number-of-agent...

Although you'll have to mentally replace the word "agent" with "PR" for it to make sense in this context. The math is the same. It all boils down to how much those context switches cost you. If it's a large cost, then you can get a huge productivity boost by increasing review speed.

In the "show calculations" section, the amount of wasted time caused by context switching is the delta between the numbers in the phrase "T_r adjusted from 30.0 to 35 minutes". That number is increases as context switching cost and "average agent time" (AKA "average PR review time") goes up.


Yeah, I'd argue that the beginner-friendly version of the rule is "never use exact == or != for floating-point variables", and the slightly more advanced one is "don't use it unless the value you are comparing to is the constant 0.0".

Before isnan(), the Fortran test for NaN was (x .ne. x), assuming an IEEE 754 implementation.

I wish that (still) worked reliably, but it can unfortunately get one into trouble with some compilers and some optimization modes that assume that NaNs are undefined behavior.
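Both rules above can be demonstrated in a couple of lines. A sketch in Python rather than Fortran, since the same IEEE 754 behavior applies:

```python
import math

# Exact equality on computed values is fragile: binary floats cannot
# represent 0.1 or 0.2 exactly, so their sum is not exactly 0.3.
assert 0.1 + 0.2 != 0.3

# Comparing against the constant 0.0 is the safer exception, e.g. for
# a value that was explicitly assigned rather than computed.
x = 0.0
assert x == 0.0

# The pre-isnan() idiom: NaN is the only value unequal to itself,
# which is exactly what Fortran's (x .ne. x) relied on.
nan = float("nan")
assert nan != nan and math.isnan(nan)

# The usual alternative to == is a tolerance-based comparison.
assert math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)
```

As the comment above notes, in compiled languages aggressive optimization modes (e.g. fast-math flags) can assume NaNs never occur, which is what breaks the x != x test.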
