Hacker News | Firfi's comments

In my view, edge cases like this (I think the peasant railgun is even mentioned in the DM handbook) are more of a community problem than the game's. If it can be called a problem at all, of course: some tables enjoy those shenanigans, some don't.

What players and DMs forget more often than not is the wording somewhere at the start of the DM book: the DM can overrule any rule. [to facilitate the game mood and direction that the table has agreed upon] [and a larger overarching problem is probably that there's often no such agreement before the game]


> [and a larger overarching problem is probably that there's often no such agreement before the game]

Agreed

I personally think Rule 0 enables bad DMs a lot more frequently than good ones. I think it's a bad rule.


That's the plan! D&D combat can be a slog sometimes, and when it is, that kills a lot of the fun for me as an adept of the story-first approach. I'd rather just ask a chatbot about this or that rule, or have a list of weighted actions presented to me on my turn. That's the direction I'm moving in, and a good spec is hopefully what enables it. Hopefully...

Sorry about that, everyone. I did use AI to help me with structure and English. I thought I'd proofread and edited it enough to be readable, but apparently it still smells. I'll update the wording soon.

Or you can just write in your native language, and let us machine-translate it? Just a thought. We are, perhaps, letting ourselves be held back by norms that no longer bear any load.

That's a great idea, in fact. I'll try it out next time. Maybe even a mix, because I do sometimes want to be very specific about certain expressions and experiment with wordplay.

Dungeons & Dragons rules are a spec spanning thousands of pages, not formalized, but thoroughly tested by the community. Moving them to a formal specification language (Quint) was an obvious next step. It worked and proved to also be a great LLM self-checker.


Fantastic, I'd been daydreaming about doing similar for a while!

Do I understand correctly that the Quint code is not needed 'at runtime', that it's there for model-based testing of the XState implementation?


Right. Quint is not used at runtime and is not supposed to be. It's a strong testing layer, but there's much more to it. My bigger idea is to generate whatever implementation I want from it, hopefully with an agentic loop: the MBT test is a natural feedback harness for leaving coding running overnight. So dnd-rust at some point, maybe? If someone develops a game, they'd be able to generate the core logic in Rust for Bevy, in C# for Unity, or in whatever is used there for Godot. In an ideal world, that is.
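To make the MBT loop concrete, here's a minimal pure-TypeScript sketch of the idea. Everything in it (the phase names, events, and both "sides") is invented for illustration; in the real project the model role is played by Quint and the implementation by XState, neither of which is shown here:

```typescript
// Toy model-based-testing loop: drive a reference model and an
// implementation with the same event trace and check they agree.

type Phase = "start" | "playerTurn" | "monsterTurn" | "end";
type Event = "begin" | "endTurn" | "defeat";

// "Implementation" under test: an XState-style transition table.
const impl: Record<Phase, Partial<Record<Event, Phase>>> = {
  start: { begin: "playerTurn" },
  playerTurn: { endTurn: "monsterTurn", defeat: "end" },
  monsterTurn: { endTurn: "playerTurn", defeat: "end" },
  end: {},
};

// Reference "model" (the role Quint plays in the real setup).
function modelStep(s: Phase, e: Event): Phase {
  if (s === "start" && e === "begin") return "playerTurn";
  if (s === "playerTurn" && e === "endTurn") return "monsterTurn";
  if (s === "monsterTurn" && e === "endTurn") return "playerTurn";
  if ((s === "playerTurn" || s === "monsterTurn") && e === "defeat") return "end";
  return s; // invalid events leave the state unchanged
}

// The MBT loop: any divergence between model and implementation throws.
function runTrace(events: Event[]): Phase {
  let s: Phase = "start";
  for (const e of events) {
    const next = impl[s][e] ?? s;
    const expected = modelStep(s, e);
    if (next !== expected) throw new Error(`divergence at ${s} on ${e}`);
    s = next;
  }
  return s;
}
```

The point is that `runTrace` is implementation-agnostic: swap the transition table for generated Rust or C# bindings, replay the same traces, and the harness still catches divergence.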

Yes, that's exactly what I started thinking. This is great work, thank you!

Thank you (twice!) for reading it. The idea is indeed wrapped in a scandalous topic, but the ambiguity of the PR process was also the last straw that made me write it.


I actually do recommend reviewing manually; it's just very convenient to see when a person wrote something (so much more scrutiny can be applied) vs. when the work was outsourced to AI. I feel there's another application too, though I didn't mention it because it's not that clear to me yet: you can also estimate whether a new programmer can actually code, or whether they 10x-YOLO their way along, slowly dragging codebase maintainability down.


After some vibe-coding frustrations, ups and downs, I found that splitting the code explicitly into well-curated, domain-heavy guidance code and code marked "slop" removes a lot of frustration and inefficiency.

We can be honest in our PR, “yes, this is slop,” while being technical and picky about code that actually matters.

The “guidance” code is not only great for preserving knowledge and aiding the discovery process, but it is very strong at creating a system of “checks and balances” for your AI slops to conform to, which greatly boosts vibe quality.

It helps me both technically (at least I feel so), by guiding Claude Code to do exactly what I want (or what we agreed to!), and psychologically, because there's no detachment from the knowledge of the system anymore.


Lately I've been thinking "there is no such thing as an application, there are only screens" in the context of HTMX-enhanced web applications.

If your persistence layer and long-term data structures are solid, you can accept shoddy coding in screens (e.g., a small bundle of HTTP endpoints). From that viewpoint you modernize an application one screen at a time, and if you don't like a shoddy screen you create a new one. You vibe-code screens, but schemas and update logic are carefully handwritten code, though I think deterministic code generation from a schema is the power tool for that.
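A minimal sketch of what "deterministic code generation from a schema" can look like, assuming a tiny hand-rolled schema format (the `Schema` shape, field names, and `makeValidator` are all invented for the example, not from any particular library):

```typescript
// A declarative schema from which artifacts are derived deterministically.
type Field = { name: string; type: "string" | "number"; required: boolean };
type Schema = { table: string; fields: Field[] };

const userSchema: Schema = {
  table: "users",
  fields: [
    { name: "id", type: "number", required: true },
    { name: "email", type: "string", required: true },
    { name: "nickname", type: "string", required: false },
  ],
};

// Derive a runtime validator from the schema; the same Schema value could
// just as well emit SQL DDL or TypeScript types, leaving the vibe-coded
// screens free to be sloppy while the data layer stays trustworthy.
function makeValidator(schema: Schema) {
  return (row: Record<string, unknown>): string[] => {
    const errors: string[] = [];
    for (const f of schema.fields) {
      const v = row[f.name];
      if (v === undefined) {
        if (f.required) errors.push(`${f.name}: missing`);
      } else if (typeof v !== f.type) {
        errors.push(`${f.name}: expected ${f.type}`);
      }
    }
    return errors;
  };
}

const validateUser = makeValidator(userSchema);
```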


Problem is that what "actually matters" isn't always obvious, at least not to everyone.

When they built Citicorp Center, the contractor bolted the steel instead of welding it. It was thought to be an implementation detail: bolting was cheaper, and nobody thought it actually mattered. Until the engineer who designed the building looked more carefully and discovered that, as a result, it was more vulnerable to wind loads. Expensive rework was required to open up the interior walls and weld all the bolted connections.


It seems to me we have to figure out "what matters" to get the benefits that 10x vibe-coder bros promise. I think we still have to review (despite my clickbait title); it's just that we look for different things in slop, and a different type and amount of mental strain is required. For more important libs, I guess we can "overshoot" a bit and put more time into vetting vibe code (and making it the guardrail code), while in the "next revolutionary React Todo App" the balance could lean much further toward vibe...


What is the measured LoC ratio of well-curated to "slop" code?


Just feeling and experience, really. For me, if I spent time with a vibe-coded snippet and improved it until I could say "yes, I would've written this," it's not slop anymore, even if Claude wrote it initially.

On the contrary, if I only glanced over the code and could say "OK, it doesn't look terrible, no obvious `rm -rf` and all," then even if I fixed a couple of obvious mistakes, I still consider it vibe.


I was more asking to assess the actual gain.

So the question really is: in your experience, how much code requires careful review and re-prompting vs. leaving it as "not terrible"?

I'm asking because my experience is that in practice LLMs are no better than juniors, i.e., it's more effective to just write the thing myself instead of going through multiple rounds of reviewing and re-prompting that never quite achieve what I really want.


That's one of my biggest frustrations: I wasted a lot of time on re-prompting. I made myself stick to a 100% LLM approach for a while in order to learn.

I can't speak for everyone, but for me it's hit-and-miss: if the LLM starts with "Oh, sorry, you're right," that's a STRONG signal that I have to take over right now or rethink the approach, or I get into a doom spiral of re-prompting and waste half a day on something I could've done myself by that point, with the only difference that after half a day with a coding agent I've discovered no important domain or technical knowledge.

So "how much" depends, for me, on seemingly random factors, including the time of day when Anthropic decides to serve their quantised version instead of the normal one. And on non-random ones too: how difficult the domain area is, how well you described it in the prompt, and how well you crafted your system queries. And I hate it very much! At this point, I'm trigger-happy to take over control and write the stuff the LLM can't into the "controlling package", then tell it to use that as an example / safety check.


> how well you described it in the prompt, and how well you crafted your system queries.

This is the most frustrating part of discussions about LLMs. Since there are no criteria for measuring the quality of your prompting, there is really no way to learn the skill. Assessing prompting skills based on actual results is wrong, as it does not isolate the model's capabilities.

Hence the whole thing looks a lot like ancient shamanism.


Hey folks. On PRs, I sometimes find that people aren't well versed in the intricacies of code branching. So I wrote a short article to educate fellow engineers, and I point to it now and then so I don't have to explain the same things over and over.

Specifically, many people apparently don't yet know about exhaustiveness checking, the statement/expression dichotomy, and how the TS type system can help with both.
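A quick sketch of both ideas, with an invented `Shape` union for illustration: a statement-style `switch` made exhaustive via `never`, and an expression-style handler record that is exhaustive by construction:

```typescript
type Shape =
  | { kind: "circle"; r: number }
  | { kind: "square"; side: number };

// Statement-style branching with an exhaustiveness guard: if a new
// variant is added to Shape, the `never` assignment stops compiling.
function area(s: Shape): number {
  switch (s.kind) {
    case "circle":
      return Math.PI * s.r * s.r;
    case "square":
      return s.side * s.side;
    default: {
      const _exhaustive: never = s;
      throw new Error(`unreachable: ${JSON.stringify(_exhaustive)}`);
    }
  }
}

// Expression-style alternative: a record keyed by every `kind`, so
// forgetting a handler is a type error rather than a runtime surprise.
const describe: {
  [K in Shape["kind"]]: (s: Extract<Shape, { kind: K }>) => string;
} = {
  circle: (s) => `circle of radius ${s.r}`,
  square: (s) => `square of side ${s.side}`,
};
```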


An app with tests for TypeScript validators. The test cases are based on real use cases but kept generic. It covers more advanced stuff like algebraic data types, template literals, nominal types, and recursive types. It's not a performance/size benchmark; a really nice performance benchmark app already exists (I linked it there).
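For readers unfamiliar with two of those features, here's an illustrative sketch (the `UserId` brand, its format, and the `Json` type are invented for the example and are not from the app above): a nominal type via an intersection brand, and a recursive type with a matching runtime validator.

```typescript
// Nominal (branded) type: structurally it's a string, but plain strings
// can't be passed where a UserId is expected without going through toUserId.
type UserId = string & { readonly __brand: "UserId" };

function toUserId(s: string): UserId {
  if (!/^u_[0-9]+$/.test(s)) throw new Error("not a user id");
  return s as UserId;
}

// A recursive JSON-like type and a validator that walks it.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

function isJson(v: unknown): v is Json {
  if (v === null) return true;
  const t = typeof v;
  if (t === "string" || t === "number" || t === "boolean") return true;
  if (Array.isArray(v)) return v.every(isJson);
  if (t === "object") return Object.values(v as object).every(isJson);
  return false; // functions, undefined, symbols, bigints
}
```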


Hey, sharing my colleague's @juanArias8 article on Solana Mobile here. He'll answer questions/comments in this thread if any pop up.

