


I never understood the point of the pelican on a bicycle exercise: LLM coding agents don't have any way to see the output. It means the only thing this test is testing is the LLM's ability to memorise.

Edit: just to show my point, a regular human on a bicycle comes out way worse with the same model: https://i.imgur.com/flxSJI9.png


Because it exercises thinking about a pelican riding a bike (not common) and then describing that using SVG. It's quite nice imho and seems to scale with the power of the model. I'm sure Simon has some actual reasons, though.


> Because it exercises thinking about a pelican riding a bike (not common)

It is extremely common, since it's used to benchmark every single LLM.

And there's no logic to it: LLMs are never trained on graphics tasks, and they don't see the output of the code.


I mean that real-world examples of a pelican riding a bike are not common. It's common in LLM benchmarking, but that's not what I meant.


The only thing it exercises is the ability of the model to recall its pelican-on-bicycle and other SVG training data.


It's more for fun than as a benchmark.


It also measures something LLMs are good at, probably due to cheating.


I wouldn't say any LLMs are good at it. But it doesn't really matter, it's not a serious thing. It's the equivalent of "hello world" - or whatever your personal "hello world" is - whenever you get your hands on a new language.


Memorise what exactly?


The coordinates and shapes of the elements used to form a pelican. If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.

I bet their ability to form a pelican results purely from someone already having done it before.
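
For a concrete sense of what that entails, here's a rough hand-made sketch (purely illustrative, not taken from any model output; the coordinates are made up) of the kind of SVG a model has to write blind:

  # Illustrative only: a tiny hand-made SVG drawing, showing the kind of
  # explicit coordinates and shapes a model has to emit without ever
  # seeing the rendered result.
  shapes = [
      '<ellipse cx="60" cy="80" rx="30" ry="18" fill="white" stroke="black"/>',  # body
      '<circle cx="95" cy="55" r="12" fill="white" stroke="black"/>',            # head
      '<polygon points="105,53 140,60 105,64" fill="orange"/>',                  # beak
      '<circle cx="45" cy="105" r="14" fill="none" stroke="black"/>',            # back wheel
      '<circle cx="95" cy="105" r="14" fill="none" stroke="black"/>',            # front wheel
  ]
  svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="160" height="130">'
         + "".join(shapes) + "</svg>")

  with open("pelican.svg", "w") as f:
      f.write(svg)

Getting even this crude version to look right means picking plausible numbers for every cx, cy and points value without ever seeing them rendered.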


> If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.

It's called generalization and yes, they do. I bet you could find plenty of examples of it working on something that truly isn't "present in the training data".

It's funny, you're so convinced that it's not possible without direct memorization, but you forgot to account for emergent behaviors (which are frankly all over the place in LLMs - where have you been?).

At any rate, the pelican thing from simonw is clearly just for fun at this point.


The pelican on a bicycle benchmark is probably getting saturated... especially as it's become a popular way to demonstrate model ability quickly.


But where is the training set of good pelicans on bikes coming from? You think they have people jigging them up internally?


Assuming they updated the crawled training data, just having a bunch of examples of specifically pelicans on bicycles from other models is likely to make a difference.


But then how does the quality increase? Normally we hear that when models are trained on the output of other models, the style becomes very muted and various other issues start to appear. But this is probably the best pelican on a bicycle I've ever seen, by quite some margin.


Just compare it with a human on a bicycle and you'll see that LLMs are weirdly good at drawing pelicans in SVG, but not humans.


I thought a human would be a considerable step up in complexity, but I asked it first for a pelican [0] and then for a rat [1] to get out of the bird world, and it did a great job on both.

But just for thrills I also asked for a "punk rocker" [2] and the result--while not perfect--is leaps and bounds above anything from the last generation.

0 -- ok, here's the first hurdle! It's giving me "something went wrong" when I try to get a share link on any of my artifacts. So for now it'll have to be a "trust me bro" and I'll try to edit this comment soon.


... but can it create an SVG renderer for Claude's site?



