Gemini is great when you have gitingested the code of a PyPI package and want to use it as context. This comes in handy for tasks and repos outside the model's training data.
5.1 Codex I use for narrowly defined tasks where I can just fire and forget. For example, Codex will troubleshoot why a websocket isn't working by running its own curl requests within Cursor, or by exec'ing into the Docker container to debug at a level that would take me much longer.
Claude 4.5 Opus is a model that feels trustworthy to me for heavy refactors of codebases or for modularizing sections of code to make them more manageable. Often it seems like the model doesn't leave any details out, and functionality isn't lost or degraded.
I think Opus 4.5 did a bit better overall, but I do think frontier models will eventually converge to a point where the quality is so good it will be hard to tell the winner.
"Pelican on bicycle" is one special case, but the problem (and the interesting point) is that with LLMs, they are always generalising. If a lab focussed specially on pelicans on bicycles, they would as a by-product improve performance on, say, tigers on rollercoasters. This is new and counter-intuitive to most ML/AI people.
The gold standard for cheating on a benchmark is SFT on the benchmark data while ignoring the memorization. That's why the standard for quickly testing for benchmark contamination has always been to switch out specifics of the task.
Like replacing named concepts with nonsense words in reasoning benchmarks.
I have tried combinations of hard-to-draw vehicles and animals (crocodile, frog, pterodactyl; riding a hang glider, a tricycle, skydiving), and it did a rather good job in every case (compared to previous tests). Whatever they have done to improve on that point, they did it in a way that generalises.
It hadn't occurred to me until now that the pelican could overcome the short-legs issue by not sitting on the seat and instead putting its legs inside the frame of the bike. That's probably closer to how a real pelican would ride a bike, even if it wasn't deliberate.
You could! But as others have mentioned, the performance gain would be negligible. If you really wanted to see more of a boost from pretraining, you could try to build a bigger chunk of data to train on, either by generating synthetic data from your material or by finding information adjacent to it. Here's a good paper about that approach:
<https://arxiv.org/abs/2409.07431>
The required login/sign-up, the privacy policy, and the lack of apparent open-sourcing seem antithetical to the average Linux user. You're going after a niche of a niche of a niche with this one, good luck lol.
I'd argue that the average Linux user likely knows how to use vim for the most basic editing but isn't necessarily motivated to learn vim. Intermediate users will be able to name a few modes in vim and navigate somewhat efficiently; that's about it. Only advanced users and those who really want to master vim (in other words, hardcore nerds) will try to make the most out of vim and use as few strokes as possible to navigate/edit, which is what these tools/sites are for. That's a few "niches" there.
I think once you start trying to use the occasional macro and/or make custom keybinds, it pushes you further into the vim golf mindset. When you're saving an action to be repeated 100 times, you really gotta get it right. I learned a lot of advanced movements through macros as well, like } and ) and marks (I only recently learned that apostrophe jumps to the marked line while backtick jumps to the marked character on that line, after years of always using apostrophe).

I recently spent half an hour or so making two keybinds to insert the date/time in my preferred format at the end or start of a line and then return the cursor to where it was before. While about half of the process was the same for both binds, I ran into multiple issues with the start-of-line version. For example, `I` (insert at start of line) in neovim places your cursor after leading whitespace instead of before it, so I had to use 0 instead and insert relative to that. I also found out that marks are based on the number of characters into the line, so if you add new stuff to the start of the line and then return to your mark, you won't be on the same word: 14 characters in before, 14 characters in now. I worked around that by counting how many characters my date text plus spaces added, then appending that number and l (move right) to the end of the keybind to make up for the difference. It was pretty satisfying when it finally worked.
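For the curious, a rough sketch of what binds along those lines could look like in Vimscript (this is not my exact setup; the <leader> mappings, function names, and date format are just placeholders):

```vim
" Append a timestamp at the end of the current line, then restore the cursor.
function! s:StampEnd() abort
  let pos = getpos('.')
  execute "normal! A " . strftime('%Y-%m-%d %H:%M')
  call setpos('.', pos)
endfunction

" Prepend a timestamp at the start of the current line, then restore the cursor.
function! s:StampStart() abort
  let stamp = strftime('%Y-%m-%d %H:%M') . ' '
  let pos = getpos('.')
  " 0 goes to the true first column; I would skip past leading whitespace.
  execute "normal! 0i" . stamp
  " The saved column is a byte offset into the line, so shift it right by
  " the length of the inserted text to land back on the same spot.
  let pos[2] += len(stamp)
  call setpos('.', pos)
endfunction

nnoremap <silent> <leader>te :call <SID>StampEnd()<CR>
nnoremap <silent> <leader>ts :call <SID>StampStart()<CR>
```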
You are truly a vim master. Yes, that's exactly the reason why I used containers to host the vim instances, as using a DOM based vim library wouldn't record each stroke accurately. Thank you for trying my site out.
The fact that it doesn't change the rest of the image the way 4o image gen does is incredible. Often when I try to tweak someone's clothing using 4o, it also tweaks their face. This one seems to apply those recognizable AI artifacts only to the elements that need to be edited.