
I agree with your parent that the AI writing style is incredibly frustrating. Is it really so difficult to make a pass, read every sentence of what was written, and rewrite in your own words when you see AI cliches? It makes it difficult to trust the substance when the lack of effort in form is evident.

My suspicion is that the problem here is pretty simple: people publishing articles that contain these kinds of LLM-ass LLMisms don't mind and don't notice them.

I spotted this recently on Reddit. There are tons of very obviously bot-generated or LLM-written posts, but there are also always clearly real people in the comments who just don't realize that they're responding to a bot.


I think it's because LLMs are very good at tuning into what the user wants the text to look like.

But if you're outside that, looking in, the text usually screams AI. I see this all the time with job applications, even from people who think they "rewrote it all".

You are tempted to think the LLM's suggestion is acceptable far more often than you would have been if you'd produced it yourself.

It reminds me of the Red Dwarf episode Camille. It can't be all things to all people at the same time.


People are way worse at detecting LLM-written short-form content (like comments, blogs, articles, etc.) than they believe themselves to be...

With CVs/job applications? I guarantee you, if you actually did a real blind trial, you'd be wrong so often that you'd be embarrassed.

It does become detectable over time, as you get to know someone's writing style, but it's bonkers that people still think they're able to make these detections on first contact. The only reason you can hold that opinion is because you're never notified of the countless false positives and false negatives you've had.

There is a reason why the LLMs keep doing the same linguistic phrases like it's not x, it's y and numbered lists with Emojis etc... and that's because people have been doing that forever.


It is RLHF that dominates the style of LLM-produced text, not the training corpus.

And RLHF tends towards rewarding text that at first blush looks good. And for every one person (like me) who is tired of hearing "You're making a really sharp observation here...", there are 10 who will hammer that thumbs-up button.

The end result is that the text produced by LLMs is far from representative of the original corpus, and it's not an "average" in the derisory sense people say.

But it's distinctly LLM, and I can assure you I never saw emojis in job applications until people started using ChatGPT to write their personal statements.


> There is a reason why the LLMs keep doing the same linguistic phrases like it's not x, it's y and numbered lists with Emojis etc... and that's because people have been doing that forever.

They've been doing some of these patterns for a while in certain places.

We spent the first couple of decades of the 2000s training every "business leader" to speak LinkedIn/PowerPoint-ese. But a lot of people laughed at it when it popped up outside of LinkedIn.

But the people training the models thought certain "thought leader" styles were good, so they have now pushed them much further and wider than ever before.


>They've been doing some of these patterns for a while in certain places.

This exactly. LLMs learned these patterns from somewhere, but they didn't learn them from normal people having casual discussions on sites like Reddit or HN or from regular people's blog posts. So while there is a place where LLM-generated output might fit in, it doesn't in most places where it is being published.


Yeah, even when humans write in this artificial, punched-to-the-max, mic-drop style (as I've seen it described), there's a time and a place.

LLMs default to this style whether it makes sense or not. I don't write like this when chatting with my friends, even when I send them a long message, yet LLMs always default to this style, unless you tell them otherwise.

I think that's the tell. Always this style, always to the max, all the time.


Also, with CVs people already use quite limited and established language, with little variation between professional CVs. I imagine LLMs can easily replicate that.

> people publishing articles that contain these kinds of LLM-ass LLMisms don't mind and don't notice them

That certainly seems to be the case, as demonstrated by the fact that they post them. It is also safe to assume that those who fairly directly use LLM output themselves are not going to be overly bothered by the style being present in posts by others.

> but there are also always clearly real people in the comments who just don't realize that they're responding to a bot

Or perhaps many think they might be responding to someone who has just used an LLM to reword the post. Or translate it from their first language if that is not the common language of the forum in question.

TBH I don't bother (if I don't care enough to make the effort of writing something myself, then I don't care enough to have it written at all) but I try to have a little understanding for those who have problems writing (particularly those not writing in a language they are fluent in).


> Or translate it from their first language if that is not the common language of the forum in question.

While LLM-based translations might have their own specific and recognizable style (I'm not sure), it's distinct from the typical output you get when you just have an LLM write text from scratch. I'm often using LLM translations, and I've never seen it introduce patterns like "it's not x, it's y" when that wasn't in the source.


That is true, but the "negative em-dash positive" pattern is far from the only simple smell that people use to identify LLM output. For instance, certain phrases common in US politics have quickly become common in UK press releases due to LLM-based tools being used to edit/summarise/translate content.

What is it about this kind of post that you guys are recognizing it as AI from? I don't work with LLMs as a rule, so I'm not familiar with the tells. To me it just reads like a fairly sanitized blog post.

It's not like we are 100% sure; it's possible a real human would write like this. This particular style of writing wasn't as prevalent before, it was something more niche and distinct. Now the articles don't just look like fairly sanitized blog posts - they all look the same.

I see this by far the most on Github out of all places.

I am seeing it more and more here as well to be honest.

I called one out here recently with very obvious evidence - clear LLM comments on entirely different posts 35 seconds apart with plenty of hallmarks - but soon got a reply "I'm not a bot, how unfair!". Duh, most of them are approved/generated manually, doesn't mean it wasn't directly copy-pasted from an LLM without even looking at it.

Will do better next time.

Great that you are open to feedback! I wish every blogger could hear and internalize this but I'm just a lowly HN poster with no reach, so I'll just piss into the wind here:

You're probably a really good writer, and when you are a good writer, people want to hear your authentic voice. When an author uses AI, even "just a little to clean things up" it taints the whole piece. It's like they farted in the room. Everyone can smell it and everyone knows they did it. When I'm half way through an article and I smell it, I kind of just give up in disgust. If I wanted to hear what an LLM thought about a topic, I'd just ask an LLM--they are very accessible now. We go to HN and read blogs and articles because we want to hear what a human thinks about it.


Seconding this. Your voice has value. Every time, every time, I've seen someone say "I use an LLM to make my writing better" and they post what it looked like before or other samples of their non-LLM writing, the non-LLM writing is always what I'd prefer. Without fail.

People talk about using it because they don't think their English is good enough, and then it turns out their English is fine and they just weren't confident in it. People talk about using it to make their writing "better", and their original made their point better and more concisely. And their original tends to be more memorable, as well, perhaps because it isn't homogenized.


I'm particularly fond of your fart analogy. It successfully captures the current AI zeitgeist for me.

[flagged]


I appreciate the support for the author, but the dismissal of critics as non-content producers misses that he's replying to Dan Abramov, primary author of the React documentation and a pretty good intro JavaScript course, among other things.

That reply was from Dan Abramov, feel free to go see how little work and writing he's doing.

Your comment on HN, 6 days ago:

>No one actually wants to spend their time reading AI slop comments that all sound the same.

Lol. Lmao even.


But they "wrote" it in 10% of the time. It implies there are better uses of their time than writing this article.

Then there are better uses of my time than reading it.

There is surely no difficulty, but can you provide an example of what you mean? I just don't see it here. Or at least, if I read a blog from some SaaS company in the pre-LLM era, I'd expect it to sound like this.

I get the call for "effort", but recently this feels like it's being used to critique the thing without engaging.

HN has a policy about not complaining about the website itself when someone posts some content within it. These kinds of complaints are starting to feel applicable to the spirit of that rule. Just in their sheer number and noise and potential to derail from something substantive. But maybe that's just me.

If you feel like the content is low effort, you can respond by not engaging with it?

Just some thoughts!


It's incredibly bad on this article. It stands out more because it's so wrong and the content itself could actually be interesting. Normally anything with this level of slop wouldn't even be worth reading if it wasn't slop. But let me help you see the light. I'm on mobile so forgive my lack of proper formatting.

--

Because it’s not just that agents can be dangerous once they’re installed. The ecosystem that distributes their capabilities and skill registries has already become an attack surface.

^ Okay, once can happen. At least he clearly rewrote the LLM output a little.

That means a malicious “skill” is not just an OpenClaw problem. It is a distribution mechanism that can travel across any agent ecosystem that supports the same standard.

^ Oh oh..

Markdown isn’t “content” in an agent ecosystem. Markdown is an installer.

^ Oh no.

The key point is that this was not “a suspicious link.” This was a complete execution chain disguised as setup instructions.

^ At this point my eyes start bleeding.

This is the type of malware that doesn’t just “infect your computer.” It raids everything valuable on that device

^ Please make it stop.

Skills need provenance. Execution needs mediation. Permissions need to be specific, revocable, and continuously enforced, not granted once and forgotten.

^ Here's what it taught me about B2B sales.

This wasn’t an isolated case. It was a campaign.

^ This isn't just any slop. It's ultraslop.

Not a one-off malicious upload.

A deliberate strategy: use “skills” as the distribution channel, and “prerequisites” as the social engineering wrapper.

^ Not your run-of-the-mill slop, but some of the worst slop.

--

I feel kind of sorry for making you see it, as it might deprive you of enjoying future slop. But you asked for it, and I'm happy to provide.

I'm not the person you replied to, but I imagine he'd give the same examples.

Personally, I couldn't care less if you use AI to help you write. I care about it not being the type of slurry that pre-AI was easily avoided by staying off of LinkedIn.


> being the type of slurry that pre-AI was easily avoided by staying off of LinkedIn

This is why I'm rarely fully confident when judging whether or not something was written by AI. The "It's not this. It's that" pattern is not an emergent property of LLM writing, it's straight from the training data.


I don't agree. I have two theories about these overused patterns, because they're way overrepresented.

One, they're rhetorical devices popular in oral speech, and are being picked up from transcripts and commercial sources, e.g. television ads or political talking-head shows.

Two, they're popular with reviewers while models are going through post training. Either because they help paper over logical gaps, or provide a stylistic gloss which feels professional in small doses.

There is no way these patterns are in normal written English in the training corpus in the same proportion as they're being output.


> Two, they're popular with reviewers while models are going through post training. Either because they help paper over logical gaps, or provide a stylistic gloss which feels professional in small doses.

I think this is it. It sounds incredibly confident. It will make reviewers much more likely to accept it as "correct" or "intelligent", because they're primed to believe it, and makes them less likely to question it.


Its prevalence in contexts that aren't "LinkedIn here's what I learnt about B2B sales"-peddling is an emergent property of LLM writing. Like, 99% of articles wouldn't have had a single usage of it pre-LLMs. This article has like 6 of them.

And even if you remove all of them, it's still clearly AI.

People have hated the LinkedIn-guru style for years before AI slop became mainstream. Which is why the only people who used it were... those LinkedIn gurus. Yet now it's suddenly everywhere. No one wrote articles on topics like malware in this style.

What's so revolting about it is that it just sounds like main character syndrome turned up to 11.

> This wasn’t an isolated case. It was a campaign.

This isn't a bloody James Bond movie.


I guess I just don't get the mode everyone is in where they've got their editor hats on all the time. You can go back in time on that blog 10+ years and it's all the same kind of dry, style-guided corporate speak to me, with maybe different characteristics. But still all active voice, lots of redundancy and emphasis. They are just dumb-ok blogs! I never thought it was "good," but I never put attention on it like I was reading Nabokov or something. I get we can all be hermeneuts now and decipher the true AI-ness of the given text, but isn't there a time and place and all that?

I guess I too would be exhausted if I hung on every sentence construction like that of every corporate blog post I come across. But also, I guess I am a barely literate slop enjoyer, so grain of salt and all that.

Also: as someone who doesn't use the AI like this, how does something become worse than run-of-the-mill slop? Like what happened to make it particularly bad? For something so flattening otherwise, that's kinda interesting, right?


Everyone has hated "LinkedIn-guru here's what I learnt about B2B sales"-speak for many years. Search HN for LinkedIn speak, filter by date before 2023. Why would people stop hating it now? That's the style it's written in. Maybe you just didn't know that people hated it, but most always have. I'm sure that some people hate it only because it's AI, but seriously, it's been a meme for years.

Thank you. I am in the confusing situation of being extremely good at interpreting the nuance in human writing, yet extremely bad at detecting AI slop. Perhaps the problem is that I'm still assuming everything is human-written, so I do my usual thing of figuring out their motivations and limitations as a writer and filing it away as information. For example, when I read this article I mostly got "someone trying really hard to drive home the point that this is a dangerous problem, seems to be over-infatuated with a couple of cheap rhetorical devices and overuses them. They'll probably integrate them into their core writing ability eventually." Not that different from my assessment of a lot of human writing, including my own. (I have a fondness for em-dashes and semicolons as well, so there's that.)

I haven't yet used AI for anything I've ever written. I don't use AI much in general. Perhaps I just need more exposure. But your breakdown makes this particular example very clear, so thank you for that. I could see myself reaching for those literary devices, but not that many times nor as unevenly nor quite as clumsily.

It is very possible that my own writing is too AI-like, which makes it a blind spot for me? I definitely relate to https://marcusolang.substack.com/p/im-kenyan-i-dont-write-li...


For what it’s worth, the point of React is that you can just fix that Radio component to be an input (if that makes sense) and it’ll just be an input.

React gives you boxes to put stuff into but you decide what to put into them. Then React ensures that you can change what’s in those boxes without breaking anything. That’s the power of component abstraction.


> That’s the power of component abstraction.

Yes. But React isn’t the only way to do components. Unfortunately, to the inexperienced, it is.


What are some much better ways to do components?

React is so prevalent because it's a deep local optimum.


So is a span or div element? What am I missing here?


The parent comment is seemingly blaming React for the decisions of Shadcn for some reason.

There’s nothing about React that requires you to overcomplicate your DOM (unlike many other UI frameworks).


The point I wanted to emphasize is that even if you do overcomplicate your DOM, the component abstraction is what allows you to fix it in one place. Don't like what's in your component — add `return <input />`, bam! It's fixed across the entire app now.
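As a rough sketch (the Radio component and its props here are hypothetical, just to illustrate the point): call sites only ever see the props, so swapping the internals for a native input is a one-file change.

    import * as React from "react";

    // Hypothetical Radio component (names invented for illustration).
    // Every call site only knows about these props:
    type RadioProps = {
      checked: boolean;
      onChange: () => void;
    };

    export function Radio({ checked, onChange }: RadioProps) {
      // Was: a styled <div role="radio"> pretending to be a radio button.
      // Now it's a native input; nothing outside this file has to change.
      return <input type="radio" checked={checked} onChange={onChange} />;
    }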


And how is the surrounding JS code, like the event handlers, and the CSS of the component supposed to still work now? A radio input will need at the very least additional CSS to remove the native appearance. Unlikely that was set already --> it's not that easy.


The idea is that the component's API is not the DOM. Usually this means your data should flow in a certain way: top-down.

Application code is not supposed to use the DOM as the source of truth for some boolean state that the checkbox is an input for.

You don't usually read a component's state from outside (here: the "checked" property).

Instead you define an API where data only flows top-down.

When your checkbox component follows this paradigm, it is "controlled", and if it contains a standard HTML input, that input's "checked" DOM property is bound to the data passed into the component ("props"). Clicking it won't check it anymore until you add an "onClick" callback and pass in a function that will make the "checked" prop change.

The checkbox is now "controlled" and its state was "lifted up" (meaning that it is determined not by the checkbox component itself).

"controlled" means you tell React to always force the "checked" DOM property to be the same as the "checked" prop you pass into the component. You do this by assigning to the reflected "checked" HTML attribute in JSX.

When your components only use this "top-down" data flow, they're "pure" in React lingo. Because they look like pure functions: props => DOM fragment. The machinery behind the scenes means they're not actually that (something has to coordinate the rendering).

But if you don't use internal state (e.g. useState hook) or global stores, these "impure" parts are React internals only, and you can have a mental model that views the component like a pure function.

This makes it easier to connect it with other components in a tree.

For example:

HTMLInputElement.checked can be true without a "checked" attribute being in the markup.

If you want to have some text next to it that says "checked / not checked" you have to wire stuff, and this stuff depends on your markup.

If you have a "controlled" checkbox, you have a tree, not only for the markup, but also for the data: the boolean "checked" state can now be declared one level above both the info text and the checkbox. Then the info text doesn't care at all about events anymore.

And the checkbox component only uses a callback that is also independent from the exact markup structure (e.g. a selector for the HTML input element).

You don't need to read from the checkbox to update the text. You feed both with a boolean and both can be "pure" components. The checkbox gets an "onClick" callback and its checked state is no longer internal; it's "controlled".

The wiring you have to do instead of the regular DOM events (which would read the input's state) is now to use your "onClick" callback to toggle your boolean.

Internally, in the component, you do whatever you need to read and write to the DOM. But usually that just means "what markup do I return".

Input elements and reflected attributes such as "checked" are already a relatively complex case.

And, you can escape the recommended top-down data flow by many means (refs, context, accessing centralized "stores" from within the component...), but that's often where it gets ugly. But you need to do it often when your app gets bigger (centralized data stores), or when you implement things like UI libraries (refs).
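To make the wiring concrete, here's a minimal sketch of the pattern described above. All the names are invented for illustration, and I'm using React's standard `onChange` event where the text above says "onClick":

    import * as React from "react";
    import { useState } from "react";

    // "Controlled" checkbox: it renders whatever `checked` it's given
    // and reports clicks upward. It holds no state of its own.
    function Checkbox({
      checked,
      onChange,
    }: {
      checked: boolean;
      onChange: (next: boolean) => void;
    }) {
      return (
        <input
          type="checkbox"
          checked={checked}
          onChange={(e) => onChange(e.target.checked)}
        />
      );
    }

    // The boolean is "lifted up" one level: the info text never reads
    // the DOM; both children are fed the same value from above.
    function Row() {
      const [checked, setChecked] = useState(false);
      return (
        <label>
          <Checkbox checked={checked} onChange={setChecked} />
          {checked ? "checked" : "not checked"}
        </label>
      );
    }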


Which also allows you to create an overcomplicated jack-of-all-trades component? After all, it's fun and can be justified via the "write once" argument.


It never happens if you enable the lint rule.


I remember how react team's message, around the time hooks were introduced, was how hooks were going to save us from the tyranny of `this`, which people presumably found confusing.

I often think back to that message, while adding things in a dependency array. Especially those things that I know won't change (e.g. the redux `dispatch` function pulled from the context), but the linter doesn't. Or while being admonished by the linter for reading from a ref, or writing to it.
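For what it's worth, a sketch of what I mean (the action creator is hypothetical): `dispatch` from react-redux is stable across renders, but the exhaustive-deps rule can't know that, so it still wants it in the array.

    import { useEffect } from "react";
    import { useDispatch } from "react-redux";
    import { fetchUser } from "./userSlice"; // hypothetical action creator

    function UserLoader({ userId }: { userId: string }) {
      const dispatch = useDispatch();

      useEffect(() => {
        dispatch(fetchUser(userId));
        // `dispatch` never changes identity in react-redux, but the
        // exhaustive-deps rule can't know that, so it goes in anyway.
      }, [dispatch, userId]);

      return null;
    }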


Different apps can have different notions of a profile so you'd probably have one per app.


Yes. I describe this in this part of the article: https://overreacted.io/a-social-filesystem/#:~:text=One%20ch...

It's basically event sourcing. You listen to the data you care about from the network and update the local index (DB). There are also tools like Tap (https://docs.bsky.app/blog/introducing-tap) that do the plumbing work and let you backfill automatically.
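As a very rough sketch of the shape of this (none of these names come from the AT Protocol SDKs or from Tap; they're just to show the pattern): you consume record events from the network and maintain a local projection you can query.

    // Illustrative only; not a real SDK API.
    type RecordEvent = {
      did: string;        // which repo (user) the record lives in
      collection: string; // e.g. "app.bsky.feed.post"
      rkey: string;
      action: "create" | "update" | "delete";
      record?: unknown;
    };

    // The local index: any database would do. The network stream stays
    // the source of truth; this is just a queryable projection of it.
    const index = new Map<string, unknown>();

    function applyEvent(ev: RecordEvent) {
      const key = `${ev.did}/${ev.collection}/${ev.rkey}`;
      if (ev.action === "delete") {
        index.delete(key);
      } else {
        index.set(key, ev.record);
      }
    }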


Ah, I'm sorry, I somehow skipped over that bit entirely. Need more coffee, I think


Fine-grained permissions are already shipped btw, though the documentation could be better.


I think most people's mental model is that they should be able to change their handle / display name / avatar freely, and their posts would display the new versions. So those aren't a part of the post itself.

That said, you could create an AT app that displays a version of the post using the profile at the time. You'd just need to index all profile changes into a local database, and then query that database for the "profile at that time" to display the post. So what you're describing is possible—it just requires a different aggregation. The source of truth, however, should be denormalized and reflect most recent data.
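A rough sketch of what that aggregation could look like (all names here are invented for illustration):

    // Hypothetical local index of profile changes: keep every version,
    // then pick the one that was current when the post was created.
    type ProfileVersion = {
      did: string;
      displayName: string;
      avatar: string;
      updatedAt: number; // ms since epoch
    };

    function profileAt(
      history: ProfileVersion[],
      did: string,
      postedAt: number
    ): ProfileVersion | null {
      const versions = history
        .filter((p) => p.did === did && p.updatedAt <= postedAt)
        .sort((a, b) => b.updatedAt - a.updatedAt);
      return versions[0] ?? null;
    }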


I'm using "filesystem" a bit loosely here.

The important parallel I was going for was "file format" as interface between apps (= lexicons being an interface between social apps).

If you want details on the actual data structures, check https://atproto.com/specs/repository.


It makes the landscape competitive. There isn't really a notion of "moving" to a "platform". It's more like if Twitter sucks, some team can spin out an alternative that has the existing content but takes different product decisions. And then some people can try this alternative and use it without leaving the existing network. So, it allows more experimentation in the market without having to solve the cold start problem. The lifecycle of products becomes more fluid. It's easy to spin something up, and you can also shut things down without permanently killing them.


See https://www.pfrazee.com/leaflets/3lzhui2zbxk2b for some recent thoughts from a team member.


Bnewbold's comment on GitHub is still the best resource. It's far more specific in scope and details. Paul's note is more theoretical/philosophical.



Yes, that is the one everyone is using as a guiding light for where we think Bsky is headed for their private data needs, which is group-shared / not E2EE, because key management is an unsolved problem at scale. They need to support creators with subscriber-only content. E2EE will emerge from the messaging space first, i.e. Signal/MLS-like.

Bsky is likely to put out a short-term gap filler this year for personal-private data, which kind of already exists today for Bsky only. This work would make that same feature in the per-user repo available to all apps. Right now it's hard-coded to a specific NSID and a single DB row.

