Hacker News | derefr's comments

You're essentially describing what a silicon engineer would call independent "clock domains" (the stations) and "clock-domain-crossing signals" (the workpieces.) And, indeed, you would also tend to handle clock-domain-crossing signals by sticking an async FIFO between the two clock domains.

This approach works well, I agree. But I keep wishing that I could invert it. The architecture I feel like I keep yearning for, is a traditional CLI program that encodes most workflow knowledge/decisions as real code; but which does "just a little bit of coding agent invocation" during one specific workflow step.

Not sure how to accomplish this. Anyone have any suggestions? Are there libraries for this yet? (And how would they even work? It feels like, to do this right, there would have to be some background service that CLI software could expect to interact with via a well-known local IPC socket — similar to how e.g. the docker daemon works. But I'm unaware of any coding agent software/frameworks that expose such an IPC capability...)


I’m building this! It was originally designed for human accessibility for interactive CLIs, but it turned out to be really useful for giving agents the ability to follow structured workflows.

It runs as a background terminal that the agent can observe, and then exposes all interaction options as structured commands that can be run from the foreground CLI which then update the state of the background terminal via IPC. My hope is to establish a sort of “ARIA for terminals” standard to improve accessibility for both humans and agents. Email in profile, ping me if you’re interested in giving it a spin (just have plugins for Inquirer + Commander right now, hoping to broaden to other frameworks & TUIs soon).


I reverted this due to impending billing changes, but Claude and most LLM providers to my knowledge do offer a way to directly fire a prompt to the LLM in a "headless" or non-interactive mode. Specifically "claude -p <your_prompt_here>" is the way to do it with Claude Code. It allows for using the agent to do a one-off command with a given structured prompt. Originally Lathe would use this from the Go application to allow you to extend a tutorial directly from the UI without directly interacting with the LLM.

You'd have to exec out, so it's a little clunkier than IPC, but I think you could achieve what you want with it.
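
For illustration, a minimal Python sketch of that exec-out approach. It assumes the `claude` binary is on PATH and that `claude -p <prompt>` prints its reply to stdout. `build_command`, `run_headless` and the `runner` parameter are hypothetical names, not part of any tool; the runner is injectable only so the wrapper can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, List


def build_command(prompt: str) -> List[str]:
    """Build the argv for a one-off, non-interactive call (assumes `claude -p`)."""
    return ["claude", "-p", prompt]


def run_headless(prompt: str, runner: Callable = subprocess.run) -> str:
    """Exec out once and return whatever the agent printed to stdout."""
    result = runner(
        build_command(prompt),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Passing the prompt as a single argv element (rather than through a shell string) avoids quoting problems when the prompt itself contains quotes or newlines.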


That's almost it, yes.

But in my experience, to actually get where they're going quickly (as opposed to spending hours and hundreds of dollars stumbling around in the dark), coding agents generally need more interactive hand-holding than that. If you just fire off one non-interactive session and wait for it to come to a stop, the problem usually isn't fully+correctly solved at the point where the LLM decides to "finish." And if you then start another non-interactive session to continue the work, the new session will have lost the old session's state/memory/context, and so will stumble through many of the same mistakes / misapprehensions.

What you really want, for a CLI program with a "use coding agent to do X" workflow-step, is for the CLI program to play the role of a human in a temporary durable coding-agent conversation session: prompting the agent; then waiting for it to finish responding (and side-effecting); then either asking the agent itself to evaluate an "am I done yet" predicate with a constrained output syntax; or having the CLI program do its own out-of-band validation of the changes made to the shared state by the agent; where, in either case, if the agent isn't "done yet", then the workflow step must continue poking it — or prompt the human to make a decision on how to proceed (possibly involving providing direct input to the LLM, but this is not ideal; ideally the CLI "abstracts away" the need for the end-user to understand the intricacies of the conversation the program is having with the LLM. Even more ideally, the conversation just whizzes by and the human doesn't have to think about an LLM being involved at all.)

Basically, think of this not as the CLI program saying to an agent "answer me this question" or "edit this file for me", but rather, the CLI program popping open a mini "guided + 99%-of-the-time automated" TUI coding-agent micro-IDE "inside" the workflow, in about the same way that git pops open your EDITOR inside `git commit`.
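
A rough Python sketch of the control loop described above, to make the shape concrete. Everything here is hypothetical: `send` stands in for "deliver one message to a single durable agent session and block until it finishes responding", and `is_done` stands in for the CLI program's own out-of-band validation of the shared state. No real agent API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    done: bool             # True if validation passed within the turn budget
    turns: int             # number of prompts sent to the agent
    transcript: List[str]  # agent replies, in order


def run_agent_step(
    send: Callable[[str], str],
    is_done: Callable[[], bool],
    first_prompt: str,
    nudge: str = "The validation check still fails. Keep going.",
    max_turns: int = 5,
) -> StepResult:
    """Play the 'human' in one agent conversation until validation passes."""
    transcript: List[str] = []
    prompt = first_prompt
    for turn in range(1, max_turns + 1):
        # `send` must reuse the same session so context is not lost between turns.
        transcript.append(send(prompt))
        if is_done():
            return StepResult(True, turn, transcript)
        prompt = nudge  # not done yet: poke the agent again
    # Budget exhausted: the caller decides whether to ask the human how to proceed.
    return StepResult(False, max_turns, transcript)
```

When `done` comes back `False`, the calling workflow step is where the "prompt the human to make a decision" fallback would live.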


> Basically, think of this not as the CLI program saying to an agent "answer me this question" or "edit this file for me", but rather, the CLI program popping open a mini "guided + 99%-of-the-time automated" TUI coding-agent micro-IDE "inside" the workflow, in about the same way that git pops open your EDITOR inside `git commit`.

Isn't this simply having your mechanistic script call `claude "Prompt that is well honed to provide a mini, guided, 99%-of-the-time automated LLM action to $THE_THING"`? And, possibly including some `--allowed-tools`?


I think the GP used the word "tuned" incorrectly / to make the wrong point here.

A general-purpose OS is one on top of which you can build a stack for any use-case you can think of, and it will cope with whatever stack you lay on it about equally well, because it hasn't been forced into a particular shape where it's much better at some things but much worse at other things. A "jack of all trades, master of none" OS.

Microsoft would call all consumer and server editions of Windows "general-purpose OSes." But Windows Datacenter Edition and Windows IoT Core would be non-general-purpose OSes — the former only exists to run hypervisors/SANs, and it doesn't support "stripping off" that layer, so if you used it for anything else, that layer would always be there, bloating things up; and the latter only exists to run on embedded devices, and it doesn't support "adding back" the extra frameworks and services regular Windows has, that would be required to use it for "more" than embedded use-cases.

An OS being "tuned" for a particular substrate (what the OS is good at running on), meanwhile, has nothing to do with the OS's use-case (what can be run well on the OS.)

An analogy: each mobile OEM's spin of Android only works on that OEM's own phones, because that OEM's phones have the required hardware wired to the right SoC pins, and the Android spin ships with a BSP that defines a device tree that matches that expected wiring. Thus, those OEM Android spins are "tuned for" those phones.

But in the end, they're all just Android phones, and they can all do the same things. All of these Android spins are "general-purpose OSes." They're all made to enable you to put any Android software you like on top of them, and run it just fine. (Contrast Android spins made by industrial vendors specifically for automotive or kiosk use-cases, where a given car company or kiosk manufacturer then produces a hardware-customized-and-tuned spin of that already-appliance-purposed spin. You wouldn't use a car-infotainment Android upstream for other use-cases; you'd have to undo all the car-infotainment stuff.)

Azure Linux is exactly like a phone-OEM "tuning" of Android (and unlike a vertical-specific Android spin.) Azure Linux is also like, for another example, the vendor-specific Linux "distros" [really, tunings] that ship as (usually binary-only) images for various Single-Board Computers.

In all three cases, a "tuned" fork of an OS is still intended to run anything a user might want to run on the platform the "tuned" fork was forked off of. It exposes a general-purpose surface to the developer — just one that happens to do some of the general-purpose things you ask it to do, more performantly than a non-"tuned" OS would on the same hardware/substrate.

And, in all three cases, the "tuned" fork accomplishes that by relying on device-specific knowledge and capabilities (i.e. drivers, device-tree entries, kernel patches, etc) that have been burned into the "tuned" fork rather than upstreamed. There's still a HAL between you and that stuff; your workload doesn't need to know the "tuned" fork has been tuned. It just benefits automatically, from the OS having a deeper understanding of the hardware/substrate.


> Yeah of course, it's a Linux distribution.

That is not a given. There are Linux distributions that run anywhere but are not general-purpose. For example, the various "immutable" Linux distros that exist solely to be used as Kubernetes nodes to host containers.


I get the sense that these disassembly/decompilation projects believe that some types of IP-laden asset data can be shipped embedded into the project — not necessarily "legally", but in that they'll likely get away with doing so indefinitely — as long as:

1. those assets are stored in proprietary formats that only the game code itself understands, and

2. no tool exists in the project to extract the assets from these proprietary formats into open formats, unless that tool itself exists only in source-code form in the codebase, and requires the ROM as an input to compile it (even if in the case of such a tool, the ROM is doing nothing but serving as a "key" to unlock compilation.)

Basically, if you have to prove you have your own copy of the IP in order to make their embedded copy of the IP "legible", then it's very hard to construct an evidence-based DMCA takedown order that actually makes any coherent point about the project "distributing" said IP.

That being said, shipping assets like this at all, even if you "can get away with it", is ultimately just a kind of laziness / shortcut-taking. They do it because there's either no clear/simple/obvious way to automatically extract the given asset data from the ROM (e.g. because the relevant data is split up into various data planes + metadata bits that are stored "exploded" all over the ROM), so they just did it once by hand, committing the results; or because there's no clear/simple/obvious way to store the extracted asset data such that a regular compiler/assembler natively understands how to embed it into the binary in the particular form it was found in the original ROM. (Remember, re-assembling/compiling to the original ROM is always the test these projects use to ensure their disassembly/decompilation is preserving semantics. So they need to replicate every weird layout quirk the original dev tooling imposed upon the original ROM. And sometimes the original dev tooling included special-purpose domain-specific asset-codegen tools that aren't part of regular compiler toolchains.)

What these projects should actually be doing, is taking on the schlep: writing the extract tooling anyway, even if it's just "copy these bytes from here and these bytes from there, and spit them out as hex in an .asm file with this header"; and/or writing matching asset-codegen tooling to the tooling that likely existed in the platform SDK, to run before compile/assemble time, converting the extracted ROM asset files into a form (probably a bunch of little assembly files) that will land in the right places when linked back together to form the original ROM.
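
As a sketch of how small the "copy these bytes from here and these bytes from there" kind of tool can be, here is a hypothetical Python version. The function name, the `(offset, length)` range format, and the `.byte $XX` output syntax are all assumptions for illustration; a real project would match whatever its assembler expects.

```python
from typing import Iterable, Tuple


def extract_to_asm(
    rom: bytes,
    ranges: Iterable[Tuple[int, int]],
    header: str,
    per_line: int = 8,
) -> str:
    """Concatenate (offset, length) slices of the ROM and emit them as hex data."""
    chunks = []
    for offset, length in ranges:
        if offset < 0 or length < 0 or offset + length > len(rom):
            raise ValueError(f"range ({offset}, {length}) is outside the ROM")
        chunks.append(rom[offset:offset + length])
    data = b"".join(chunks)

    lines = [header]
    for i in range(0, len(data), per_line):
        row = data[i:i + per_line]
        lines.append("    .byte " + ", ".join(f"${b:02X}" for b in row))
    return "\n".join(lines) + "\n"
```

The generated file contains no asset data until someone runs the tool against their own ROM, which is the property the projects are after.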

And, to be clear, they mostly do do this! These projects are very good at doing this!

But sometimes — especially on a larger project with many contributors — one or two things like this aren't audited properly, and fall through the cracks. Or they start out as temporary "bootstrap" approaches made during a private phase of development to get things working + compiling to a correct image; and then not all of those get cleaned up before the repo gets made public.


Perhaps I'm mistaken but the project doesn't need a copy of the original ROM at all right?

To be clear; I don't really understand the law around this - my own country is based on case law which means that even if I wanted to open source some of my reverse engineered games (I have a few private partial implementations of some old defunct game engines in-progress), the distinct lack of prior cases means, sadly, it's prudent not to release them at all while the companies are still active.


I must be using LLMs very differently than y'all, because I can't think of a single thing I would rely on an LLM that's "dumb as a stump" to do for me.

To me, LLMs are for asking research questions + exploring design spaces + pointing at codebases to investigate bugs. And those all benefit from the model being as "smart" (in terms of both fluid intelligence and burned-in knowledge) as possible.

I'm guessing there exist problems where "intelligence past a certain point" doesn't matter, so these medium-sized models can match the performance of the bigger models. But what problems might those be?


Things that are tedious but simple but I'm unfamiliar with.

"Go add a gh action to compile and deploy this thing and run its tests" is one I've found it's good at. Yes I know how to make a gh pipeline but it's always a hassle to remember what goes where.

Cranking out unit tests is okay. It's good at summarizing things so it's not half bad at writing jsdoc/xmldoc comments.


Can you say more? I don't have any memory of Qualcomm-related scandals(?), but I just read the news; I've never really been a user of their chips.

> The game changer is the unified 128 GB memory. That is the path Apple took years ago. Instead of separate memory for the CPU and GPU, everything shares a single pool. It is increasingly popular.

> The memory is not as fast as dedicated GPU memory, but it is cheap enough while delivering enough bandwidth to run AI models locally.

So, the reason "dedicated GPU memory" is fast, isn't because it's "dedicated"; it's because the types of memory built into GPU cards — GDDR and HBM — are designed for throughput over latency.

Which is to say, GDDR and HBM memory could be shared with the CPU in UMA while still being "fast" (for GPU use-cases.) In fact, the PS4/5 and Xbox 360 / One X / Series consoles have UMA architectures that use GDDR memory as their main memory, with no regular DDR memory to be found.

What I don't understand: why don't we see UMA architectures where there's both regular DDR and GDDR/HBM memory mapped into the address space of the CPU+GPU? That seems like the best of both worlds: you'd have some memory that's "tuned" for random-access CPU usage (regular DDR), and some memory that's "tuned" for streaming GPU usage (GDDR/HBM), but either type of memory can still be put to the use it wasn't "tuned" for, just with slightly-worse performance.

I guess you'd need to do a bit of software work:

1. a bit of work in the OS kernel / malloc library to get CPU workloads to "prefer" allocating DDR memory over the GDDR/HBM memory until they've exhausted DDR memory (or maybe not, if you just tell the kernel the GDDR/HBM memory is something like a zswap thinpool);

2. and a bit of work in supported ML frameworks, to teach them about a hybrid strategy between UMA "allocate anywhere, it's all the same" and NUMA "keep assets in VRAM if possible; if you spill assets to RAM, then they must stream into VRAM on access" (i.e. "at allocation time, allocate as if the system were NUMA, VRAM first then spilling to RAM; but at execution time, use the UMA codepaths, no need to copy RAM into VRAM.")

...but once that's done, it's done.
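
To make the "prefer one pool, spill into the other" idea concrete, here is a toy Python model of the allocation policy only. The pool names, the `PREFERENCE` table, and the `allocate` function are hypothetical; this is not how any real kernel or ML framework does it, and it ignores fragmentation, freeing, and migration entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Pool:
    name: str
    capacity: int
    used: int = 0

    def try_alloc(self, size: int) -> bool:
        """Reserve `size` units if they fit; report whether it worked."""
        if self.used + size > self.capacity:
            return False
        self.used += size
        return True


# Each requester tries its "tuned" pool first, then spills into the other one.
PREFERENCE = {"cpu": ("ddr", "gddr"), "gpu": ("gddr", "ddr")}


def allocate(pools: Dict[str, Pool], requester: str, size: int) -> Optional[str]:
    """Return the name of the pool that satisfied the request, or None."""
    for name in PREFERENCE[requester]:
        if pools[name].try_alloc(size):
            return name
    return None
```

Because both pools sit in one address space in this scenario, a spilled allocation is merely slower for its requester; nothing has to be copied back at execution time.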


Theoretically, maybe? But they are completely different interfaces so it would surely get complicated. It's also approaching the current behavior in non-unified memory systems where you have two pools of memory with different performance characteristics. You'll realistically want the CPU to always use low latency memory and the GPU to use high bandwidth memory with very little moving between them.

Keeping in mind, though, that this is a jellybean part. You're supposed to be able to order "a" 5532 without specifying the supplier, because many vendors produce "a" 5532, and they're all the same. Different vendors' 5532s are supposed to be able to be treated as the same SKU — literally dumped into co-mingled stock in warehouses — with no ill consequence!

(And yes, until TI's recent move, that was true of the 5532. All the other vendors' 5532s had matching datasheet specs, including the 22V max input voltage. That's because a design built for "a" 5532 was usually built to run it up to 100% of the rated spec, and a vendor couldn't offer their part as a swap-in if it couldn't do that.)

But now, if your purchasing department (or the supplier they purchase from) happens to order TI 5532s, or if the warehouse they're sourcing from has comingled any TI 5532s into the general 5532 stock, then your product is now broken, with no real recourse except to change your entire supply chain to one that specifically excludes TI.


The EEVBlog[1] video about this has a nice example showing that only a single Chinese manufacturer offers the same thing TI now does, even with the same PNP instead of NPN topology. All the others are comparable to the original.

1: https://youtu.be/22ZmmZ67SMY


Would be nice to call it a 5532a or something like that.

> Different vendors' 5532s are supposed to be able to be treated as the same SKU — literally dumped into co-mingled stock in warehouses — with no ill consequence!

That may be true for a small webshop or a brick-and-mortar electronics store (what few of those still exist). Or be true for end users / manufacturers of equipment that includes such a part.

But (afaik) that's not how it works for large reputable distributors like Mouser, Digikey & co. You don't order a generic "5532" there; you order a 5532 from <specified manufacturer>. A part from manufacturer A may or may not be interchangeable with the same-numbered part from manufacturer B. There are even some parts that have the same number but very different functions between manufacturers. In other words: buyers and designers, do your homework.

Likewise in a design, if you specify "5532", that should read as "any manufacturer's 5532 should do". If not (or if unsure / untested), one should specify the part including its manufacturer, or a list of acceptable manufacturer/part-number combos.

Of course, significantly changing the spec of a jellybean part like the one discussed here (and one with many second sources) is just evil. If you change a part like that, give it its own part number.


Dave Jones also did some videos on the LMV321 used in his uCurrent Gold project. The AS5X variant caused issues while others worked fine.

Looking now, a document source suggests the AS5X variant in the parts list... but it's explained in the video around 19:30

https://www.youtube.com/watch?v=1VlKoR0ldIE


>You're supposed to be able to order "a" 5532 without specifying the supplier

This is not true.

>because many vendors produce "a" 5532

This is true, in the sense of a "5532-type part". But you will note that all the 5532 variants have different manufacturer's part numbers (prefixes and suffixes) to prevent this confusion. They don't just do that for branding.

>and they're all the same.

This is emphatically and trivially not true, and it tells me you haven't done the work of carefully comparing data sheet specs across suppliers. Try it, you'll learn something.

>Different vendors' 5532s are supposed to be able to be treated as the same SKU — literally dumped into co-mingled stock in warehouses — with no ill consequence!

That might happen somewhere, but authorized distributors do not do this and volume manufacturers do not do this. You might have an internal part number with an authorized suppliers list that includes more than one variant of 5532 that has been vetted for production.

>And yes, until TI's recent move, that was true of the 5532. All the other vendors' 5532s had matching datasheet specs

Again, emphatically and trivially not true. Take a careful look at the NJM and On Semi data sheets. Spec by spec. Do the work and be amazed.

>the warehouse they're sourcing from has comingled any TI 5532s into the general 5532 stock

Authorized distributors do not do this. It gets hairy when you're sourcing NOS from grey market dealers for old designs or in severe part crunches like 2020-2022 era, but that's a different story.

>no real recourse except to change your entire supply chain to one that specifically excludes TI

This concept is backwards. You would have an internal part number for 5532-type op amp, and it would have an authorized vendors list that would only include vetted parts. "Any 5532 but TI" is asking for trouble from someone else.

And parts do change or get updated and if you are buying from authorized distributors for production you and your supply chain and quality people will get product change notices. At that point it's your job (or the component engineer's, if you're fortunate enough to have one) to validate the new version or find a suitable alternate.
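
As a small illustration of the internal-part-number idea, a hypothetical Python sketch of an authorized vendors list. The internal part number, manufacturer names, and manufacturer part numbers are all made-up placeholders, not real catalog entries.

```python
from typing import Dict, FrozenSet, Tuple

# (manufacturer, manufacturer part number); all values below are placeholders.
Part = Tuple[str, str]

APPROVED: Dict[str, FrozenSet[Part]] = {
    # One internal part number for "5532-type op amp", listing only vetted parts.
    "IPN-0001": frozenset({("VendorA", "A5532X"), ("VendorB", "B5532Y")}),
}


def is_approved(internal_pn: str, manufacturer: str, mpn: str) -> bool:
    """True only if this exact manufacturer/part pair was vetted for the IPN."""
    return (manufacturer, mpn) in APPROVED.get(internal_pn, frozenset())
```

This is an allow-list rather than a deny-list: a part that changed under a product change notice is dropped from `APPROVED` until it has been re-validated, and unknown vendors are rejected by default.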


I imagine it's a lot like FPGAs:

- the hardware you need for a production use-case is relatively small, because production {models, bitstreams} have been heavily size-optimized, stripping out everything not needed to get a good result for the target use-cases

- but the hardware you need when tinkering/learning how to design {compute kernels, IP blocks} in the first place, must be quite a bit more powerful / higher-capacity, because your experiments will intentionally be the opposite of optimized: they'll be built for legibility / introspectability / debuggability at every level, which massively inflates and de-optimizes the resulting {model, bitstream}.

(And, to be clear here, "running someone else's finished model, which was designed and optimized to be used on something like a 4090, against your own prompt" is a kind of experimenting, which is cheap, in the same way that "deploying someone else's pre-baked FPGA bitstream, that was designed and synthesized for a $20 target FPGA, onto your own instance of that $20 FPGA, and then feeding your own input signals to it" is cheap. But that's not the kind of experimenting you'd be doing in this course while learning to design your own models!)

