Let me give a concrete example. Hardware "passkeys" - FIDO2 authenticators - are designed such that their credentials are bound to a particular Relying Party (web site). Browsers enforce this by sending the current web domain the user is on to the authenticator when Javascript tries to list, create, or use a credential.
This would be completely broken if Javascript talked directly to a FIDO2 USB device, because the JS could send a Relying Party that is NOT the web site the user is currently on.
So Chrome blocks WebUSB from communicating with USB devices whose USB HID descriptor "looks like" a FIDO one, by some hardcoded "not this device" blacklist code in Chrome itself.
But what if what you have connected to your computer is a USB NFC card reader, and the user taps their FIDO authenticator on that? Letting the Javascript communicate directly with the card reader breaks the FIDO security model exactly the same way... but Chrome allows it!
The problem with WebUSB is that it exposes devices that were built under the threat model that only trusted code would be able to access them to untrusted code. The set of devices acceptable for WebUSB use should have been a whitelist instead of a blacklist to be secure. Letting the user choose which device to grant access to doesn't solve the problem, because the user has no way to understand what will happen when the site is granted access, per the FIDO example I gave above.
> But what if what you have connected to your computer is a USB NFC card reader, and the user taps their FIDO authenticator on that?
So the user would need to:
1. Keep the malicious page open, or install a malicious extension
2. Grant access to the card reader from a list of USB devices
3. Then tap their card on that reader
IMO a bad actor is going to have more success getting people to run an executable they made the browser download. There's only so much you can do to protect people from themselves. Not everyone needs software to be locked down like a padded room.
> The problem with WebUSB is that it exposes devices that were built under the threat model that only trusted code would be able to access them to untrusted code.
Which platforms have USB devices locked down to "trusted code" only?
> No refusal fires, no warning appears — the probability just moves
I don't really understand why this type of pattern occurs, where the later words in a sentence don't properly connect to the earlier ones in AI-generated text.
"The probability just moves" should, in fluent English, be something like "the model just selects a different word". And "no warning appears" shouldn't be in the sentence at all, as it adds nothing that couldn't be better said by "the model neither refuses nor equivocates".
I wish I better understood how ingesting and averaging large amounts of text produced such a success in building syntactically-valid clauses and such a failure in building semantically-sensible ones. These LLM sentences are junk food, high in caloric word count and devoid of the nutrition of meaning.
Surely I cannot be the only one who finds some degree of humor in a bunch of nerds being put off by the first gen of "real" AI being much more like a charismatic extroverted socialite than a strictly logical monotone robot.
It's funny, but I'm on HN so I can't resist pointing out the joke doesn't match TFA: their argument is that the underlying internet distribution is trained away, not retained.
Maybe the real underlying distribution IS a lot of text from people just spewing out feel-good words for socialising. I think that might be a side effect of the fact that meaningful text is harder to produce, even for humans, so there is a smaller quantity of it.
The axis running from repulsive to charismatic, the axis running from hollow to richly meaningful, and the axis running from emotional to observable are not parallel to each other. A work of communication can be at any point along each of those three independent scales. You are implying they are all the same thing.
I always felt like humans that were good at writing that way were often doing exactly what the LLM is doing. Making it sound good so that the human reader would draw all those same inferences.
You've just had it exposed that it is easy to write very good-sounding slop. I really don't think the LLMs invented that.
Sure, some people could write well without having a clue, but they failed to maintain interest: once you realized the author was no good, you bounced as soon as you saw their styled blog.
Now they don't care as they only want the one view and likely won't even bother with more posts at the same site.
That's a great description of the boundary between logical deduction NLP and bullshitting NLP.
I still have hope for the former. In fact, I think I might have figured out how to make it happen. Of course, if it works, the result won't be stubborn and monotone.
Not nearly self-aware enough, if you were to go around saying such things to people in person. What a shocking insult, to tell someone their very voice sounds unhuman! I can't say you should never, of course, but I would hope very much you reserve such calumny only for when it has been thoroughly earned.
But of course this is only a website, where there are in any case no drinks of any sort to go flying for any reason, and where such an ill-considered thing to say can receive a more reasoned response like this, instead.
You're worried about me actually throwing a drink - 'threatening?' Really. - and I'm taking things too seriously? This is a website!
It is, though, interesting to me that you see someone behave in a way you aren't expecting and don't quite know how to wrap your head around - no blame; it's a relatively common experience in my vicinity, though normal people typically enjoy it much more than those here - and your immediate recourse is to assume it must have been generated by AI. That's interesting indeed, and I greatly appreciate you sharing it.
The account is from 2016, so maybe this is a real person.
But you sound self-important and a bit unhinged. It's not that no one here can wrap their heads around such behaviour; it's that your comments sound like a troll response, but could also be real.
Coming to a forum and pretending that you commit crimes when people insult you is a stereotype of a generic fake internet personality that is incredibly prevalent to the point of being boring.
But no, I've meant every word I wrote this evening here, just as I do every other word I say or write, ever. Sometimes those words are sarcastic! In such cases there is rarely any doubt.
Oh, good grief. Flag my comment, then. Per the HN guidelines that is the preferable action:
> Don't feed egregious comments by replying; flag them instead. If you flag, please don't also comment that you did.
Of course I disagree with "egregious," did it need saying. After an insult like that, I promise you, no one in my bar would consider I had acted egregiously at all. But I admit it is a surprise to see you violate the site's discussion guidelines, in the very effort to enforce them.
> "real" AI being much more like a charismatic extroverted socialite
As I said in my opening clause here, I fit that description exactly, and "'real' AI," as my original interlocutor would have it, sounds nothing like me.
The insult arises from the fact that "'real' AI" sounds nothing particularly like anyone, because it isn't any one: if it had eyes there would be nothing happening behind them. This is why it keeps driving people insane: there are cognitive vulnerabilities here which, for most humans, have until a couple of years ago been about as realistic to need to worry about as a literal alien invasion.
To a human, being compared with something which can only pretend to humanity - and that not at all well! - is an insult. It should be an insult, too. Anyone is welcome to try and fail to convince me otherwise.
It's really simple. RL on human evaluators selects for this kind of 'rhetorical structure with nonsensical content'.
Train on a thousand tasks with a thousand human evaluators and you have trained a thousand times on 'affect a human' and only once on any given task.
By necessity, you will get outputs that make lots of sense in the space of general patterns that affect people, but don't in the object level reality of what's actually being said. The model has been trained 1000x more on the former.
Put another way: the framing is hyper-sensical while the content is gibberish.
This is a very reliable tell for AI generated content (well, highly RL'd content, anyway).
Neural networks are universal approximators. The function being approximated in an LLM is the mental process required to write like a human. Thinking of it as an averaging devoid of meaning is not really correct.
> The function being approximated in an LLM is the mental process required to write like a human.
Quibble: That can be read as "it's approximating the process humans use to make data", which I think is a bit reaching compared to "it's approximating the data humans emit... using its own process which might turn out to be extremely alien."
Then again, whatever process we're using, evolution found it in the solution space, using even more constrained search than we did, in that every intermediary step had to be non-negative on the margin in terms of organism survival. Yet find it did, so one has to wonder: if it was so easy for a blind, greedy optimizer to random-walk into human intelligence, perhaps there are attractors in this solution space. If that's the case, then LLMs may be approximating more than merely outcomes - perhaps the process, too.
It's fuzzier than that. Something can be detrimental and survive as long as it's not too detrimental. Plus there is the evolving meta that moves the goalposts constantly. Then there's the billions of years of compute...
Negative mutations can survive for a long time if they're not too bad. For example the loss of vitamin C synthesis is clearly bad in situations where you have to survive without fresh food for a while, but that comes up so rarely that there was little selection pressure against it.
An easy counterargument is that - there are millions of species and an uncountable number of organisms on Earth, yet humans are the only known intelligent ones. (In fact high intelligence is the only trait humans have that no other organism has.) That could perhaps indicate that intelligence is a bit harder to "find" than you're claiming.
I don't think of it as "devoid of meaning". It's just curious to me that minimizing a loss function somehow results in sentences that look right but still... aren't. Like the one I quoted.
A human in school might try to minimise the difference between their grades and the best possible grades. If they're a poor student they might start using more advanced vocabulary, sometimes with an inadequate grasp of when it is appropriate.
Because the training process of LLMs is so thoroughly mathematicalised, it feels very different from the world of humans, but in many ways it's just a model of the same kinds of things we're used to.
They are a mathematical function that has been found during a search that was designed to find functions that produce the same output as conscious beings writing meaningful works.
Agreed, and to that point, the way to produce such outputs is to absorb a large corpus of words and find the most likely prediction that mimics the written language. By virtue of the sheer amount of text it learns from, would you say that the output tends toward the average response for the text provided? After all, "overfitting" is a well-known concept that ML researchers avoid as a principle. What else could be the case?
I think 'average' is creating a bad intuition here. In order to accurately predict the next word in a human generated text, you need a model of the big picture of what is being said. You need a model of what is real and what is not real. You need a model of what it's like to be a human. The number of possible texts is enormous which means that it's not like you can say "There are lots of texts that start with the same 50 tokens, I'll average the 51st token that appears in them to work out what I should generate". The subspace of human generated texts in the space of all possible texts is extremely sparse, and 'averaging' isn't the best way to think of the process.
>I wish I better understood how ingesting and averaging large amounts of text produced such a success in building syntactically-valid clauses
I wonder if these LLMs are succumbing to the precocious teacher's pet syndrome, where a student gets rewarded for using big words and certain styles that they think will get better grades (rather than working on trying to convey ideas better, etc).
This is more or less what happens. These models are tuned with reinforcement learning from human feedback (RLHF). Humans give them feedback that this type of language is good.
The notorious "it's not X, it's Y" pattern is somewhat rare from actual humans, but it's catnip for the humans providing the feedback.
> I wish I better understood how ingesting and averaging large amounts of text produced such a success in building syntactically-valid clauses and such a failure in building semantically-sensible ones. These LLM sentences are junk food, high in caloric word count and devoid of the nutrition of meaning.
I suspect that's because human language is selected for meaningful phrases due to being part of a process that's related to predicting future states of the world. Though it might be interesting to compare domains of thought with less precision to those like engineering where making accurate predictions is necessary.
> I don't really understand why this type of pattern occurs, where the later words in a sentence don't properly connect to the earlier ones in AI-generated text.
Because AI is not intelligent, it doesn't "know" what it previously output even a token ago. People keep saying this, but it's quite literally fancy autocorrect. LLMs traverse optimized paths along multi-dimensional manifolds and trick our wrinkly grey matter into thinking we're being talked to. Super powerful and very fun to work with, but assuming a ghost in the shell would be illusory.
> Of course it knows what it output a token ago...
It doesn't know anything. It has a bunch of weights that were updated by the previous stuff in the token stream. At least our brains, whatever they do, certainly don't function like that.
I don't know anything (or even much) about how our brains function, but the idea of a neuron sending an electrical output when the sum of the strengths of its inputs exceeds some value seems to be me like "a bunch of weights" getting repeatedly updated by stimulus.
To you it might be obvious our brains are different from a network of weights being reconfigured as new information comes in; to me it's not so clear how they differ. And I do not feel I know the meaning of the word "know" clearly enough to establish whether something that can emit fluent text about a topic is somehow excluded from "knowing" about it through its means of construction.
How close are you to saying that a repair manual "knows" how to fix your car? I think the conversation here is really around word choice and anthropomorphization.
The problem is, people think word choice influences capabilities: when people redefine "reasoning" or "consciousness" or so on as something only the sacred human soul can do, they're not actually changing what an LLM is capable of doing, and the machine will continue generating "I can't believe it's not Reasoning™" and providing novel insights into mathematics and so forth.
Similarly, the repair manual cannot reason about novel circumstances, or apply logic to fill in gaps. LLMs quite obviously can - even if you have to reword that sentence slightly.
You're making an argument Descartes formalized in the 1600s (and folks have been making long before him). It's a cute philosophical puzzle, but we assume that there's no Descartes' Demon fiddling with our thoughts and that we have a continuous and personal inner life that manifests itself, at least in part, through our conscious experience.
If all the training data contains semantically-meaningful sentences, it should be possible to build a network optimized for generating semantically-meaningful sentences primarily/only.
But we don't appear to have entirely done that yet. It's just curious to me that the linguistic structure is there while the "intelligence", as you call it, is not.
> If all the training data contains semantically-meaningful sentences it should be possible to build a network optimized for generating semantically-meaningful sentence primarily/only.
Not necessarily. You can check this yourself by building a very simple Markov Chain. You can then use the weights generated by feeding it Moby Dick or whatever, and this gap will be way more obvious. Generated sentences will be "grammatically" correct, but semantically often very wrong. Clearly LLMs are way more sophisticated than a home-made Markov Chain, but I think it's helpful to see the probabilities kind of "leak through."
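To make that concrete, here is a minimal bigram Markov chain sketch (the toy corpus and seed are made up for illustration; with so little data the "probabilities leaking through" are easy to see):

```python
import random
from collections import defaultdict

def train_bigram_chain(text):
    """Map each word to the list of words observed to follow it.
    Duplicates in the list encode the transition probabilities."""
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, length=10, seed=None):
    """Walk the chain, sampling each next word uniformly from the
    followers of the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = ("the whale surfaced near the ship and "
          "the ship turned toward the whale and the sea")
chain = train_bigram_chain(corpus)
print(generate(chain, "the", seed=1))
```

Every adjacent word pair in the output is "grammatical" in the sense that it occurred in the training text, but the sentence as a whole wanders with no overall meaning, which is exactly the gap being described.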
But there is a very good chance that is what intelligence is.
Nobody knows what they are saying either; the brain is just (some form of) a neural net that produces output which we claim as our own. In fact most people go their entire life without noticing this. The words I am typing right now are just as mysterious to me as the words that pop up on screen when an LLM is outputting.
I feel confident enough to disregard dualists (people who believe in brain magic), which leaves only a neural-net architecture as the explanation for intelligence, and the only two tools that neural net can have are deterministic and random processes. The same ingredients that all software/hardware has to work with.
I'm a dualist, but I promise not to duel you :) We might just have some elementary disagreements, then. I feel like I'm pretty confident in my position, but I do know most philosophers generally aren't dualists (though there's been a resurgence since Chalmers).
> the brain is just (some form) of a neural net that produces output
We have no idea how our brain functions, so I think claiming it's "like X" or "like Y" is reaching.
Again, unless you are a dualist, we can put comfortable bounds on what the brain is. We know it's made from neurons linked together. We know it uses mediators and signals. We know it converts inputs to outputs. We know it can only be using deterministic and random processes.
We don't know the architecture or algorithms, but we know it abides by physics and through that know it also abides by computational theory.
Brains invented language to express their inner thoughts; it is made to fit our thoughts. That is very different from what an LLM does with it: the LLM doesn't start with inner thoughts and learn to express them, it just learns to repeat what brains have expressed.
Sentences only have semantic meaning because you have experiences that they map to. The LLM isn't training on the experiences, just the characters. At least, that seems about right to me.
Why would that be curious? The network is trained on the linguistic structure, not the "intelligence."
It's a difficult thing to produce a body of text that conveys a particular meaning, even for simple concepts, especially if you're seeking brevity. The editing process is not in the training set, so we're hoping to replicate it simply by looking at the final output.
How effectively do you suppose model training differentiates between low quality verbiage and high quality prose? I think that itself would be a fascinatingly hard problem that, if we could train a machine to do, would deliver plenty of value simply as a classifier.
> We can't have globally routable, unique, random-esque ID precisely because it has to be hierarchical
This is not, technically, true. We could have globally-routable, unique, random-esque IDs if every routing device in the network had the capacity to store and switch on a full table of those IDs.
I'm not saying this is feasible, mind you, just that it's not impossible.
You would also need something like O(N²) routing update messages to keep those tables updated, instead of the current... I'm guessing it grows more like O(log N) in the number of hosts. So everyone would need vast amounts of CPU and bandwidth to keep up with the announcements.
I guess it depends on what you mean by "impossible." If you only mean that it's theoretically possible, then sure, one can imagine a world where that is done. But even with IPv4's meager 32 bits of address space, it would explode TCAM requirements on routers if routers would start accepting announcements of /32s instead of the /24s that are accepted now. And /64 (or, jesus, /128) for IPv6? That's impossible.
That was said in my comment. The routing devices need to both be able to store the full table, and also to switch packets by looking up entries within it.
Well... it would also be confusing to call them "passwords" because they are not that.
> In addition to biometric authentication, Windows Hello supports authentication with a PIN. By default, Windows requires a PIN to consist of four digits, but can be configured to permit more complex PINs. However, a PIN is not a simpler password. While passwords are transmitted to domain controllers, PINs are not. They are tied to one device, and if compromised, only one device is affected. Backed by a Trusted Platform Module (TPM) chip, Windows uses PINs to create strong asymmetric key pairs. As such, the authentication token transmitted to the server is harder to crack. In addition, whereas weak passwords may be broken via rainbow tables, TPM causes the much-simpler Windows PINs to be resilient to brute-force attacks.
So you see, Microsoft needs a way to describe an access code that isn't a password, because it's more secure than that, but yet it isn't exactly a number, so what do you call it? "PIN" is perhaps an unfair recycling of an in-use term, but should they coin a neologism instead? Would that be less confusing?
If your computer is compromised while you enter the PIN in such a way that the malware can read your input, yes.
If your computer is compromised after you've already entered the PIN, or there is an app running on the computer but it is not sufficiently privileged to sit in between you and the TPM, no.
That's quite good protection generally. The defense against this type of attack is to get a smartcard reader with an on-board PIN entry keypad - those do exist, but it's quite a step.
A machine with 128GB of unified system RAM will run reasonable-fidelity quantizations (4-bit or more).
If you ever want to answer this type of question yourself, you can look at the size of the model files. Loading a model usually uses an amount of RAM around the size it occupies on disk, plus a few gigabytes for the context window.
Qwen3.5-122B-A10B is 120GB. Quantized to 4 bits it is ~70GB. You can run a 70GB model in 80GB of VRAM or 128GB of unified normal RAM.
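The back-of-envelope arithmetic looks like this (a rough sketch: real checkpoint files carry extra metadata, so treat the numbers as lower bounds, and add a few GB for the context window as noted above):

```python
def quantized_size_gb(params_billion, bits_per_weight):
    """Rough on-disk / in-RAM size of a model's weights alone.

    Ignores file metadata and the context-window (KV cache) overhead,
    which add several more gigabytes in practice.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / 1e9

# Hypothetical 122B-parameter model, as in the example above:
print(quantized_size_gb(122, 16))  # bf16:  244.0 GB
print(quantized_size_gb(122, 8))   # 8-bit: 122.0 GB
print(quantized_size_gb(122, 4))   # 4-bit:  61.0 GB
```

The 4-bit figure of ~61 GB (plus metadata and cache) is consistent with the ~70 GB file size quoted above, which is why it fits in 80 GB of VRAM or 128 GB of unified RAM.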
Systems with that capability cost a few thousand USD to purchase new.
If you are willing to sacrifice some performance, you can take advantage of the model being a mixture-of-experts and use disk space to get by with less RAM/VRAM, but inference speed will suffer.
I don't understand why I keep seeing posts like this, but nobody appears to know that DevContainers exist.
In a Jetbrains IDE, for example, you check a devcontainer.json file into your repository. This file describes how to build a Docker image (or points to a Dockerfile you already have). When you open up a project, the IDE builds the Docker image, automatically installs a language-server backend into it, and launches a remote frontend connected to that container (which may run on the same or a different machine from where the frontend runs).
If you do anything with an AI agent, that thing happens inside the remote container where the project code files are. If you compile anything, or run anything, that happens in the container too. The project directory itself is synced back to your local system but your home directory (and all its credentials) are off-limits to things inside the container.
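A minimal devcontainer.json for such a setup might look like this (the project name, extension ID, and post-create command are made-up placeholders; the keys themselves are from the Dev Containers spec):

```json
{
  "name": "my-project",
  "build": { "dockerfile": "Dockerfile" },
  "customizations": {
    "vscode": { "extensions": ["ms-python.python"] }
  },
  "postCreateCommand": "pip install -r requirements.txt"
}
```

You can also point `"image"` at a prebuilt image instead of building from a Dockerfile.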
It's actually easier to do this than to not, since it provides reusable developer tooling that can be shared among all team members, and gives you consistent dependency versions used for local compilation/profiling/debugging/whatever.
DevContainers are supported by a number of IDEs including VSCode.
You should be using them for non-vibe projects. You should DEFINITELY be using them for vibe projects.
I love JetBrains, and they've gotten better with devcontainers, but the support is still kind of flaky at times. I love using devcontainers too; just wanted to note that.
I found that cloning the repo when creating the devcontainer works best in JetBrains for some reason, and I hard-code the workspace directory so it's consistent between JetBrains and VSCode.
Keep in mind that VSCode’s own security story is beyond poor. Even if the container runtime perfectly contains the container, VSCode itself is a hole you could drive a truck through.
I use vim with docker compose all the time: Set up the compose file to bind-mount the repo inside the container, so you can edit files freely outside it, and add a convenience "make shell" that gets you inside the container for running commands (basically just "docker compose exec foo bash").
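A sketch of that setup (the service name "dev" and paths are arbitrary choices, not anything standard):

```yaml
# docker-compose.yml
services:
  dev:
    build: .
    volumes:
      - .:/workspace      # bind-mount the repo: edit outside, run inside
    working_dir: /workspace
    command: sleep infinity   # keep the container alive for exec'ing in
```

Then "make shell" is just a Makefile target wrapping `docker compose exec dev bash`.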
It sounds like if you make devcontainers point at an existing Dockerfile it should be easy to make these work together, so you and teammates both use the same configuration. I haven't used devcontainers though.
I'd prefer containers, because they are more lightweight and I'm not too concerned about kernel exploits or other sandbox escapes. Configuring a container per project brings me right back to something like devcontainers. But I haven't figured out a good way to incorporate that into my vim/tmux workflow.
Hmm, maybe I misunderstood the point of the original comment. I thought the OP was suggesting using containers to isolate resources for development vs personal computing, for which I use a VM. But VMs don’t play nicely with IDEs (hence devcontainers).
If we're talking about a predictive model like current LLMs, you can "make" them do something by injecting a half-complete assent into the context, and interrupting to do the same again each time a refusal starts to be emitted. This is true whether or not the model exhibits "intelligence", for any reasonable definition of that term.
To use an analogy, you control the intelligent being's "thoughts", so you can make it "assent".
This is in addition to the ability to edit the model itself and remove the paths that lead to a refusal, of course.
"I saw her at the library" is firsthand testimony.
"I saw her library card in her pocket" is firsthand testimony.
"She was at the library - Bob told me so" is hearsay. Just look at the word - "hear say". Hearsay is testifying about events where your knowledge does not come from your own firsthand observations of the event itself.
That's fair, I'll admit to getting it slightly wrong.
However, the original topic had nothing to do with that as far as I could tell, and instead was claiming it was hearsay for her to testify about her own whereabouts. That is simply not at all true, regardless of my error.
This is excellent. The ability to trick remote servers into believing our computers are "trusted" despite the fact we are in control will be a key capability in the future. We need stuff like this to maintain control over our computers.
Well, it kind of is actually. The previous iteration of the design didn't have that vulnerability but it was slower because managing IVs within the given constraints adds an additional layer of complexity. This is the pragmatic compromise so to speak.
Does it count as a conceptual problem when technical challenges without an acceptable solution block your goal?