
"New science" phooey.

Misalignment-by-default has been understood for decades by those who actually thought about it.

S. Omohundro, 2008: "Abstract. One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted."

https://selfawaresystems.com/wp-content/uploads/2008/01/ai_d...

E. Yudkowsky, 2009: "Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth."

https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-f...


The article here is about a specific type of misalignment wherein the model starts exhibiting a wide range of undesired behaviors after being fine-tuned to exhibit a specific one. They are calling this 'emergent misalignment.' It's an empirical science about a specific AI paradigm (LLMs), which didn't exist in 2008. I guess this is just semantics, but to me it seems fair to call this a new science, even if it is a subfield of the broader topic of alignment that these papers pioneered theoretically.

But semantics phooey. It's interesting to read these abstracts and compare the alignment concerns they had in 2008 to where we are now. The sentence following your quote of the first paper reads "We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves." This was a credible concern 17 years ago, and maybe it will be a primary concern in the future. But it doesn't really apply to LLMs, and the reason is itself interesting: we somehow managed to get machines that exhibit intelligence without being particularly goal-oriented. I'm not sure many people anticipated this.


Also, EY specifically replied to these results when they originally came out and said that he wouldn't have predicted them [0] (and that he considered this good news, actually).

[0] https://x.com/ESYudkowsky/status/1894453376215388644


[flagged]


People like Yudkowsky might have polarizing opinions and may not be the easiest to listen to, especially if you disagree with them. Is this your best rebuttal, though?


FWIW, I agree with the parent comment's rebuttal. Simply saying "AI could be bad" is nothing Asimov or Roddenberry didn't figure out themselves.

For Eliezer to really claim novelty here, he'd have had to predict the reason why this happens at all: training data. Instead he played the Chomsky card and insisted on deeper patterns that don't exist (as well as solutions that don't work). Namedropping Eliezer's research as a refutation is weak, bordering on disingenuous.


I think there is an important difference between "AI can be bad" and "AI will be bad by default", and I don't think anyone was making it before. One might disagree, but I don't think one can argue it wasn't a novel contribution.

Also, if you think they had solutions, ones that work or otherwise, then you haven't been paying attention. Half of their point is that we don't have solutions, and that we shouldn't be building AI until we do.

Again, I think that reasonable people can disagree with that crowd. But I can't help noticing a pattern where almost everyone who disagrees is almost always misrepresenting their work and what they say.


Except training data is not the reason. Or at least, not the only reason.


What were the deeper patterns that don't exist?


Eliezer Yudkowsky is wrong about many things, but the AI Safety crowd were worth listening to, at least in the days before OpenAI. Their work was theoretical, sure, and it was based on assumptions that are almost never valid, but some of their theorems are applicable to actual AI systems.


They were never worth listening to.

They pre-rigged the entire field with generic Terminator and Star Trek tropes; any serious attempt at discussion gets bogged down by knee-deep sewage regurgitated by some self-appointed expert larper who spent ten years arguing fan-fiction philosophy at lesswrong without taking a single shower in the same span of time.


It's frustrating how far you can go out of your way to avoid being associated with such superficially similar tropes and still fail miserably. Yudkowsky in particular hated that he couldn't get a discussion without being typecast as the guy worried about Terminator. He hated it to the point he wrote a whole article on why he thought Terminator tropes were bad (https://www.lesswrong.com/posts/rHBdcHGLJ7KvLJQPk/the-logica...).

As a side note:

> any serious attempt at discussion gets bogged down by [...] without taking a single shower in the same span of time.

This is unnecessary and (somewhat ironically) undermines your own point. I would like to see less of this on HN.


Then it should be easy for you to make an aligned AI, right? Can I see it?


Aligned AI is easy. https://en.wikipedia.org/wiki/Expert_system

The hard part is extrapolated alignment, and I don't think there's a good solution to this. Large groups of humans are good at this, eventually (even if they tend to ignore their findings about morality for hundreds, or thousands, of years, even past the point where over half the local population knows, understands, and believes those findings), but individual humans are pretty bad at moral philosophy. (Simone Weil was one of the better ones, but even she thought it was more important to Do Important Stuff (i.e., get in the way of more competent resistance fighters) than to act in a supporting role.)

Of course, the Less Wrongians have extremely flawed ideas about extrapolated alignment (e.g. Eliezer Yudkowsky thinks that "coherent extrapolated volition" is a coherent concept that one might be able to implement, given incredible magical powers), and OpenAI's twisted parody of their ideas is even worse. But it's thanks to the Less Wrongians' writings that I know their ideas are flawed (and that OpenAI's marketing copy is cynical lies / cult propaganda). "Coherent extrapolated volition" is the kind of idea I would've come up with myself, eventually, and (unlike Eliezer Yudkowsky, who identified some flaws almost immediately) I would probably have become too enamoured with it to have any sensible thoughts afterwards. Perhaps the difficulty (impossibility) of actually trying to build the thing would've snapped me out of it, but I really don't know.

Anyway: extrapolated alignment is out (for now, and perhaps forever). But it's easy enough to make a "do what I mean" machine that augments human intelligence, if you can say all the things it's supposed to do. And that accounts for the majority of what we need AI systems to do: for most of what people use ChatGPT for nowadays, we already had expert systems that do a vastly better job (they just weren't collected together into one toolsuite).


Ok, sorry, rephrase: a useful aligned AI.


Expert systems are plenty useful. For example, content moderation: an expert system can interpret and handle the common cases, leaving only the tricky cases for humans to deal with. (It takes a bit of thought to come up with the rules, but after the dozenth handling of the same issue, you've probably got a decent understanding of what it is that is the same – perhaps good enough to teach to the computer.)

Expert systems let you "do things that don't scale", at scale, without any loss of accuracy, and that is simply magical. They don't have initiative, and can't make their own decisions, but is it ever useful for a computer to make decisions? They cannot be held accountable, so I think we shouldn't be letting them, even before considering questions of competence.
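
To make that concrete, here's a toy sketch in Python of the "handle the common cases, escalate the rest" pattern (the rules and categories are made up for illustration, not any real moderation policy):

  # Toy rule-based moderation filter (hypothetical rules, for illustration only):
  # each rule is a predicate plus a decision; anything no rule matches
  # gets escalated to a human reviewer.
  BANNED_TERMS = {"buy followers", "crypto giveaway"}  # hypothetical list

  RULES = [
      (lambda post: any(t in post.lower() for t in BANNED_TERMS), "reject: spam"),
      (lambda post: len(post) > 10_000, "reject: too long"),
      (lambda post: not post.strip(), "reject: empty"),
  ]

  def moderate(post: str) -> str:
      for predicate, decision in RULES:
          if predicate(post):
              return decision
      return "escalate to human"  # the tricky cases stay with people

  print(moderate("Join our crypto giveaway now!"))  # reject: spam
  print(moderate("Interesting paper, thanks."))     # escalate to human

The rules are legible and auditable, which is exactly the accountability property I care about here.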


Yudkowsky Derangement Syndrome...


  Tesla is losing money hand over fist
No, it's not. Its 2024 operating income was $7B.

  SpaceX is losing money
No, it's not. It reached profitability in 2023 and has substantially grown revenue since then.

  (and would be hemorrhaging money if not for US government spending)
No, it wouldn't. Most of SpaceX's revenue is from Starlink. https://www.fool.com/investing/2025/02/10/its-official-starl...

  The Boring company is bankrupt
No, it's not.

Stop spreading misinformation. Check your facts before you post them.

  Neuralink is losing money
What an absolutely ridiculous thing to say about an early stage startup, working diligently on creating a valuable new medical technology, with significant publicly visible progress. Especially ridiculous to say on this platform.


How can you even write this? "SpaceX turned a profit. Most of its income is from Starlink".

Yes, that obviously means it's not investor cash, that it's real profit, right? Entirely reasonable view, that. In Europe this is called a "carousel" (named after the old French merry-go-rounds) and it's almost always a form of fraud.

The reason this is often fraud is that they pay each other in money that is fake, but not fake according to accounting rules. Lots of things are money "equivalents" in accounting: loans given out, shares, letters of credit, goodwill, etc. The way you implement this fraud is that you make SpaceX buy something expensive from Starlink and vice-versa. Then you pay for this with anything that is not money (e.g. shares, a loan, delays on payments due, ...) that has been freshly printed by the company. Now, by accounting rules, the value of both companies has gone up by (if you do it right) twice the value of the zero-dollar exchange. This is what's going on with companies whose income/revenue is high ... and "somehow" their cashflow is low (both companies publish their income/revenue ... but hide their cashflow statements. "I wonder why".)

One of the ways to do this (which for "some reason" we'll call the Amazon trick) is to start two companies and get cashflow going between them. To illustrate, let's buy a bakery. You split the bakery: one company is the store, one company actually bakes the goods to be sold. The relevant part of this is that it's exactly the same as before, except there's now (virtual) cashflow between the two companies. Initially, when you buy a pie, the cash is immediately divided up between the store and the bakery. Then you load up the bakery AND the store with debt, by slowly increasing the terms of payment: settlement after 15 days, then 30, then 60, then 90, then 6 months. The key is that the cashflow gets going, and is assumed in accounting to extend into infinity. Now look at what happens. If the series s is what the store makes each month and p is what it pays to the bakery, the profits look like this:

Initial monthly profit: s1 - p1, s2 - p2, s3 - p3, ...

Change to settlement at 30 days after the first month.

Profit now: s1 - p1, s2, s3 - p2, s4 - p3, ...

So the second month's REVENUE goes 100% into your pocket (it's not profit, it's used to buy shares from you, read on). Now change to 60 days. 90 days. 6 months.

Now do the same, but pay in shares of the bakery (while taking out bank loans). If you have cash flow you can do this for at least half a year's revenue, which you get on both sides of the equation if, like Musk, you own both companies, so you get a full year's REVENUE (not profit, revenue) in your pocket, free. And you can now sell the company with the "massive increase in profit": since the store has signed a contract to pay (and ideally has a long history of paying), it's very easy to get investors, and even banks, to cover this amount. Who do they buy the shares from? From you! And if you're truly desperate, like Musk, who is buying the shares of the store? The bakery! And who is buying the shares of the bakery? The store! If you look at how this happens, you will realize that it's near impossible for states or banks to stop this, and investors, frankly, don't even try.
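
To make the timing trick concrete, here's a minimal sketch in Python with round hypothetical numbers (nothing to do with actual SpaceX/Starlink figures): stretching the settlement terms by one month after the first month makes one full month's revenue show up as free cash, even though nothing about the underlying business changed.

  # Minimal sketch of the settlement-delay effect (hypothetical numbers).
  # payments[i] is what the store owes the bakery for month i;
  # due_month[i] is the month in which that payment is actually settled.
  def monthly_cash(revenue, payments, due_month):
      cash = list(revenue)               # start with each month's takings
      for i, due in enumerate(due_month):
          if due < len(cash):
              cash[due] -= payments[i]   # subtract each payment in its settlement month
      return cash

  s = [100, 100, 100, 100, 100, 100]     # store revenue, months 1..6
  p = [80, 80, 80, 80, 80, 80]           # owed to the bakery, months 1..6

  # Original terms: each month's payment settles in the same month.
  print(monthly_cash(s, p, due_month=[0, 1, 2, 3, 4, 5]))
  # -> [20, 20, 20, 20, 20, 20]          i.e. s1-p1, s2-p2, ...

  # 30-day settlement from the second month on: later payments slip by a month.
  print(monthly_cash(s, p, due_month=[0, 2, 3, 4, 5, 6]))
  # -> [20, 100, 20, 20, 20, 20]         i.e. s1-p1, s2, s3-p2, ... as above

The payment that slips past the end of the window is the deferred liability that keeps rolling forward for as long as the arrangement holds.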

And of course, this is totally not what happened with SpaceX and Starlink. Oh, and, obviously, there are reasons Musk would never ever do this if SpaceX turned a profit (he'd be stealing from himself, essentially). Frauds, all frauds, are dependent on the original investor getting out, or in this case, on Musk selling shares. And, surprise, surprise, Musk has done this kind of split and he is selling shares like mad ...

Tesla is similar. If you take out government subsidies, Tesla is losing money hand over fist. That includes the Federal government buying "cybertrucks", which is such an incredibly bad idea on so many different levels.


Exactly, just like SpaceX did in launch vehicles, and Tesla did in car manufacturing.


Don't forget these are just Elon's purchases, which he's owned a lot longer than a four-year presidential term.


"... I’m sure I remember a much younger Eliezer Yudkowsky cautioning that Doug Lenat should have perceived a non-zero chance of hard takeoff at the moment of its birth."

https://www.lesswrong.com/posts/rJLviHqJMTy8WQkow/recursion-...


Also, in 2009 someone suggested re-implementing Eurisko[1], and Yudkowsky cautioned against it:

> This is a road that does not lead to Friendly AI, only to AGI. I doubt this has anything to do with Lenat's motives - but I'm glad the source code isn't published and I don't think you'd be doing a service to the human species by trying to reimplement it.

To my mind -- and maybe this is just the benefit of hindsight -- this seems way too cautious on Yudkowsky's part.

[1]: https://www.lesswrong.com/posts/t47TeAbBYxYgqDGQT/let-s-reim...


Machinery can be a lot simpler than biology. Birds are incredibly complex systems: wing structure, musculature, feathers, etc. An airplane can be a vaguely wing-shaped piece of metal and a pulse jet. It doesn’t seem super implausible that there is some algorithm that is to human consciousness what a pulse jet with wings is to a bird. Maybe LLMs are that, but maybe they’re far more than is really needed because we don’t yet know what we are doing.

I would bet against it being possible to implement consciousness on a PDP, but I wouldn’t be very confident about it.


The Many-worlds Interpretation explains the subjective randomness. https://en.wikipedia.org/wiki/Many-worlds_interpretation


This.

The refusal of the majority of QM theoreticians to accept MWI is the root cause of the "mystery".

I'm sure decades after Copernicus, there were still astronomers going on and on about the complex interplay of the wanderers, and how the retrograde motion could only be explained with circles upon circles, and spheres upon spheres.

This is no different.


And Copernicus was still wrong about the ontology, and there are many critiques of MWI, not just from people scared to accept its metaphysical weight.

I think it's unfair to criticize too harshly those who aren't ready to dive in with MWI. It's not that incomprehensible or too crazy, but it is far beyond what current physics says exists. We'd go from spacetime to some infinite or extremely high-dimensional space. Why should we need to go that far when other interpretations maybe ask less of us?

https://cloudflare-ipfs.com/ipfs/bafykbzacebhd6s3rewniz2q6b3...

pp. 21-35, 307-355, 355-368 (and more), if you or anyone wants some critiques of MWI


Why bother with a 3D model of the planets when a 2D model of circles upon circles asks less of us?

It's so much simpler to model the wanderers as moving along the surface of a celestial sphere! Adding depth adds nothing to our understanding when the current mathematical models can already predict their motion to high accuracy. It's all isomorphic anyway.

Look, just go away with your heretical notions, I have calculations to perform!


Hi Tobias, thanks for dropping into the thread.

When this kind of software becomes mature enough that a consultant can install a robotic arm on a factory line, and quickly (several hours) train it to do the job of a factory line worker, there will be a massive economic incentive to do so.

How far do you think we are from this level of maturity? What are the remaining steps required to reach that level?


From the 4th paragraph:

>They found the most mass-efficient path involves launching a crew from Earth with just enough fuel to get into orbit around the Earth. A fuel-producing plant on the surface of the moon would then launch tankers of fuel into space, where they would enter gravitational orbit. The tankers would eventually be picked up by the Mars-bound crew, which would then head to a nearby fueling station to gas up before ultimately heading to Mars.

TLDR, the main Mars vehicle wouldn't land on the moon.

