


I never understood the point of the pelican on a bicycle exercise: LLM coding agents don't have any way to see the output. It means the only thing this test is testing is the LLM's ability to memorise.

Edit: just to show my point, a regular human on a bicycle comes out way worse with the same model: https://i.imgur.com/flxSJI9.png


Because it exercises thinking about a pelican riding a bike (not common) and then describing that using SVG. It's quite nice imho and seems to scale with the power of the model. I'm sure Simon has some actual reasons, though.


> Because it exercises thinking about a pelican riding a bike (not common)

It is extremely common, since it's used to benchmark every single LLM.

And there's no logic to it: LLMs are never trained on graphics tasks, and they don't see the output of the code.


I mean that real-world examples of a pelican riding a bike are not common. It's common in LLM benchmarking, but that's not what I meant.


The only thing it exercises is the ability of the model to recall its pelican-on-bicycle and other SVG training data.


It's more for fun than as a benchmark.


It also measures something LLMs are good at, probably due to cheating.


I wouldn't say any LLMs are good at it. But it doesn't really matter, it's not a serious thing. It's the equivalent of "hello world" - or whatever your personal "hello world" is - whenever you get your hands on a new language.


Memorise what exactly?


The coordinates and shapes of the elements used to form a pelican. If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.

I bet their ability to form a pelican results purely from someone already having done it before.
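
For a concrete sense of what that entails, here's a rough hand-made sketch (purely illustrative, not taken from any model output; the coordinates are made up) of the kind of SVG a model has to write blind:

  # Illustrative only: a tiny hand-made SVG drawing, showing the kind of
  # explicit coordinates and shapes a model has to emit without ever
  # seeing the rendered result.
  shapes = [
      '<ellipse cx="60" cy="80" rx="30" ry="18" fill="white" stroke="black"/>',  # body
      '<circle cx="95" cy="55" r="12" fill="white" stroke="black"/>',            # head
      '<polygon points="105,53 140,60 105,64" fill="orange"/>',                  # beak
      '<circle cx="45" cy="105" r="14" fill="none" stroke="black"/>',            # back wheel
      '<circle cx="95" cy="105" r="14" fill="none" stroke="black"/>',            # front wheel
  ]
  svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="160" height="130">'
         + "".join(shapes) + "</svg>")

  with open("pelican.svg", "w") as f:
      f.write(svg)

Getting even this crude version to look right means picking plausible numbers for every cx, cy and points value without ever seeing them rendered.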


> If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.

It's called generalization and yes, they do. I bet you could find plenty of examples of it working on something that truly isn't "present in the training data".

It's funny, you're so convinced that it's not possible without direct memorization, but you forgot to account for emergent behaviors (which are frankly all over the place in LLMs - where have you been?).

At any rate, the pelican thing from simonw is clearly just for fun at this point.


The pelican on a bicycle benchmark is probably getting saturated... especially as it's become a popular way to demonstrate model ability quickly.


But where is the training set of good pelicans on bikes coming from? You think they have people jigging them up internally?


Assuming they updated the crawled training data, just having a bunch of examples of specifically pelicans on bicycles from other models is likely to make a difference.


But then how does the quality increase? Normally we hear that when models are trained on the output of other models, the style becomes very muted and various other issues start to appear. But this is probably the best pelican on a bicycle I've ever seen, by quite some margin.


Just compare it with a human on a bicycle and you'll see that LLMs are weirdly good at drawing pelicans in SVG, but not humans.


I thought a human would be a considerable step up in complexity, but I asked it first for a pelican [0] and then for a rat [1] to get out of the bird world, and it did a great job on both.

But just for thrills I also asked for a "punk rocker" [2] and the result--while not perfect--is leaps and bounds above anything from the last generation.

0 -- ok, here's the first hurdle! It's giving me "something went wrong" when I try to get a share link on any of my artifacts. So for now it'll have to be a "trust me bro" and I'll try to edit this comment soon.


... but can it create an SVG renderer for Claude's site?



