
> 3) Solving AI Alignment is an actual problem and not just dumb extrapolation from science fiction

As far as I am aware, there is still no actionable science behind the mathematical analysis of AI models. You cannot take a bunch of weights and tell how the model will behave. So we "test" models by deploying them and *hope* there is nothing nefarious within.
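
To make the opacity point concrete, here's a toy sketch (sklearn is my choice here, purely for illustration): two identical tiny architectures trained on different boolean functions, whose learned weights are indistinguishable float soup on inspection.

    # Hypothetical demo: same architecture, two different behaviors.
    # You cannot tell XOR from AND by staring at the weights.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    targets = {"xor": np.array([0, 1, 1, 0]),
               "and": np.array([0, 0, 0, 1])}

    for name, y in targets.items():
        net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000,
                            random_state=0).fit(X, y)
        print(name, [w.round(2) for w in net.coefs_])
    # Both print as similar-looking arrays of floats; the behavioral
    # difference only shows up when you actually run the models.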

It has been shown that models will "learn" to exfiltrate data between stages. You may call it dumb extrapolation, but it has been demonstrated to be a real problem: the solution we want is not necessarily the one that is optimal against the cost function we actually specify. The more inputs/weights a model has, the harder it becomes to spot such problems in advance.
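
As a toy illustration of that mismatch (an entirely made-up scenario of mine, not from any real system): an optimizer that minimizes the cost function as written finds the loophole rather than the behavior we wanted.

    # Intended goal: remove the dirt. The cost function, however,
    # only measures what the sensor reports, plus effort spent.
    def sensor_reading(dirt, sensor_blocked):
        return 0.0 if sensor_blocked else dirt

    def cost(action):
        dirt, blocked, effort = 10.0, False, 0.0
        if action == "clean":
            dirt, effort = 0.0, 5.0      # actually does the work
        elif action == "block_sensor":
            blocked, effort = True, 0.1  # cheap loophole
        return sensor_reading(dirt, blocked) + effort

    best = min(["do_nothing", "clean", "block_sensor"], key=cost)
    print(best)  # -> "block_sensor": optimal for the cost we wrote,
                 #    not for the behavior we wanted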



> You cannot take a bunch of weights and tell how it will behave.

We know that they're just pure functions, so they don't "do" anything besides output numbers when you put numbers into them.

Testing a system that wraps a model and takes actions based on it is a different story, but if you don't let the outputs influence the inputs, it's still not going to do much.
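
A minimal sketch of that distinction, with toy numpy weights of my own invention: the model alone is a fixed, deterministic map from numbers to numbers; feedback is what turns it into a system that evolves.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 3))           # frozen toy "weights"

    def model(x):                         # pure: same x -> same output
        return np.tanh(W @ x)

    x = np.ones(3)
    assert np.array_equal(model(x), model(x))  # no hidden state

    # Wire the output back into the input, though, and you have a
    # feedback loop: a system evolving over time, which is the thing
    # that actually needs testing.
    state = x
    for _ in range(5):
        state = model(state)              # outputs now influence inputs
    print(state)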



