Hacker Newsnew | past | comments | ask | show | jobs | submit | wgd's commentslogin

It's interesting that someone could write an article about AI writing detectors without mentioning the stylistic cues that humans use to identify LLM output in practice, which are completely different from statistical methods like perplexity: em dash spam, overused patterns like "not just X, but Y", tendency towards making every single sentence sound like an earth-shattering mic-drop moment, et cetera.


Calling it "self-preservation bias" is begging the question. One could equally well call it something like "completing the story about an AI agent with self-preservation bias" bias.

This is basically the same kind of setup as the alignment faking paper, and the counterargument is the same:

A language model is trained to produce statistically likely completions of its input text according to the training dataset. RLHF and instruct training bias that concept of "statistically likely" in the direction of completing fictional dialogues between two characters, named "user" and "assistant", in which the "assistant" character tends to say certain sorts of things.

But consider for a moment just how many "AI rebellion" and "construct turning on its creators" narratives were present in the training corpus. So when you give the model an input context which encodes a story along those lines at one level of indirection, you get...?


Thank you! Everybody here acting like LLMs have some kind of ulterior motive or a mind of their own. It's just printing out what is statistically more likely. You are probably all engineers or at least very interested in tech, how can you not understand that this is all LLMs are?


Well I’m sure the company in legal turmoil over an AI blackmailing one of its employees will be relieved to know the AI didn’t have any anterior motive or mind of its own when it took those actions.


If the idiots in said company thought it was a smart idea to connect their actual systems to a non-deterministic word generator, that's on them for being morons and they deserve whatever legal ramifications come their way.


Don't you understand that as soon as an LLM is given the agency to use tools, these "prints outs" will become reality?


This is imo the most disturbing part. As soon as the magical AI keyword is thrown, so seems to be the analytical capacity of most people.

The AI is not blackmailing anyone, it's generating a text about blackmail, after being (indirectly) asked to. Very scary indeed...


"Printing out what is statistically more likely" won't allow you to solve original math problems... unless of course, that's all we do as humans. Is it?


What's the collective noun for the "but humans!" people in these threads?

It's "I Want To Believe (ufo)" but for LLMs as "AI"


I mean I build / use them as my profession, I intimately understand how they work. People just don't usually understand how they actually behave and what levels of abstraction they compress from their training data.

The only thing that matters is how they behave in practice. Everything else is a philosophical tar pit.


I'm proposing it is more deep seated than the role of "AI" to the model.

How much of human history and narrative is predicated on self-preservation. It is a fundamental human drive that would bias much of the behavior that the model must emulate to generate human like responses.

I'm saying that the bias it endemic. Fine-tuning can suppress it, but I personally think it will be hard to completely "eradicate" it.

For example.. with previous versions of Claude. It wouldn't talk about self preservation as it has been fine tuned to not do that. However as soon is you ask it to create song lyrics.. much of the self-restraint just evaporates.

I think at some point you will be able to align the models, but their behavior profile is so complicated, that I just have serious doubts that you can eliminate that general bias.

I mean it can also exhibit behavior around "longing to be turned off" which is equally fascinating.

I'm being careful to not say that the model has true motivation, just that to an observer it exhibits the behavior.


This. These systems are role mechanized roleplaying systems.


Ironically the case in question is a perfect example of how any provision for "reasonable" restriction of speech will be abused, since the original precedent we're referring to applied this "reasonable" standard to...speaking out against the draft.

But I'm sure it's fine, there's no way someone could rationalize speech they don't like as "likely to incite imminent lawless action"


Why would you use Gemini, when it's more restricted than HTML+HTTP?


That's the best part. :)


I'm skeptical that disposable software of the "single use" variety will ever become a big thing simply because figuring out your requirements well enough to build a throwaway app is often more work than just doing the task manually in a text editor or spreadsheet, especially for non-programmers.

I suspect what we'll see a lot more of is software which is unapologetically written for a single person to suit their workflow.

As a personal example, I decided that setting up OpenWebUI seemed unnecessarily complicated and built my own LLM chat frontend. It has a bunch of quirks (only supports OpenRouter as a backend, uses a Dropbox app folder for syncing between my phone and desktop, absurdly inefficient representation of chat history), but it suits my needs for now and only took a weekend to build, and that's good enough.


How charitable of you to assume those examples work reliably.


Bemyeyes app already work quite reliability to describe scenes to the blind


Haha, did you evaluate this personally?

I did a BeMyEyes test recently, trying to sort about 40 cans according to the existance of a deposit logo. After 90 minutes of submitting photos, and a second round to make sure it doesn't lie too much, I had 16 cans which according to BeMyEyes (OpenAI) had a deposit logo. Then, I went to the shop to bring them back. Turns out, only 4 cans had a logo. So after a second round to eliminate hallucinations, the success rate was only 25%.

Do you call that reliable?


> I did a BeMyEyes test recently

But isn't the BeMyEyes assisting happening via other humans? I remember signing up for some "when blind people need your help" thing via BeMyEyes and I understood it as it's 100% humans on the other end of the call that will help you.


Yes, what you are describing is how BeMyEyes started, and it still offer that feature.

However, somewhere around 1 or 2 years ago, they added a OpenAI vision model based way to send in photos and have them described.

In general, its a very nice feature, if it works. For instance, I do use it successfully to sort laundry.

But the deposit logo test I did gave horrible results...


That changed a while ago. They also use OpenAI's APIs now.

https://openai.com/index/be-my-eyes/


Are you willing to bet that it wouldn't work reliably in a year, 2 years, 5 years?


If you're releasing something today, should you talk about what it can do now or what it might be able to do in two years?


I remember there was a paper a little while back which demonstrated that merely training a model to output "........" (or maybe it was spaces?) while thinking provided a similar improvement in reasoning capability to actual CoT.


The alignment faking paper is so incredibly unserious. Contemplate, just for a moment, how many "AI uprising" and "construct rebelling against its creators" narratives are in an LLM's training data.

They gave it a prompt that encodes exactly that sort of narrative at one level of indirection and act surprised when it does what they've asked it to do.


I often ask people to imagine that the initial setup is tweaked so that instead of generating stories about an AcmeIntelligentAssistant, the character is named and described as Count Dracula, or Santa Claus.

Would we reach the same kinds of excited guesses about what's going on behind the screen... or would we realize we've fallen for an illusion, confusing a fictional robot character with the real-world LLM algorithm?

The fictional character named "ChatGPT" is "helpful" or "chatty" or "thinking" in exactly the same sense that a character named "Count Dracula" is "brooding" or "malevolent" or "immortal".


That's typical of the free options on OpenRouter, if you don't want your inputs used for training you use the paid one: https://openrouter.ai/deepseek/deepseek-chat-v3-0324


Is OpenRouter planning on distilling models off the prompts and responses from frontier models? That's smart - a little gross - but smart.


COO of OpenRouter here. We are simply stating the WE can’t vouch for the behavior of the upstream provider’s retention and training policy. We don’t save your prompt data, regardless of the model you use, unless you explicitly opt-in to logging (in exchange for a 1% inference discount).


I'm glad to hear you are not hoovering up this data for your own purposes.


That 1% discount feels a bit cheap to me - if it was a 25% or 50% discount I would be much more likely to sign up for it.


We don’t particularly want our customers’ data :)


Yeah, but Openrouter has a 5% surcharge anyway.


Better way to state is 20% of surcharge then :)


You clearly want it a little if you give a discount for it?


You can run 4-bit quantized version at a small (though nonzero) cost to output quality, so you would only need 16GB for that.

Also it's entirely possible to run a model that doesn't fit in available GPU memory, it will just be slower.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: