nickandbro's comments | Hacker News

Love it

For anyone who's interested:

"create me a svg of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FhqYdli5


It created a whole webpage to showcase the SVG with animation for me: https://output.jsbin.com/qeyubehate


Here's how I use the different models nowadays:

Gemini is great when you've gitingested the code of a PyPI package and want to use it as context. This comes in handy for tasks and repos outside the model's training data (see the sketch below).

5.1 Codex I use for narrowly defined tasks that I can just fire and forget. For example, Codex will troubleshoot why a websocket isn't working by running its own curl requests within Cursor or exec'ing into the Docker container, debugging at a level that would take me much longer.

Claude 4.5 Opus is the model I find trustworthy for heavy refactors of code bases or for modularizing sections of code to make them more manageable. It rarely seems to leave details out, and functionality isn't lost or degraded.
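
A minimal sketch of that gitingest step (the API is from the package's README; the repo URL is a placeholder):

    # pip install gitingest -- flattens a repo into one text blob for LLM context
    from gitingest import ingest

    # summary: repo stats, tree: file listing, content: concatenated source files
    summary, tree, content = ingest("https://github.com/example/some-pypi-package")  # placeholder URL

    # Paste `content` (or a trimmed subset) into the model's context, then ask away.
    prompt = f"Given this library's source:\n\n{content}\n\nShow me how to use its client API."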


"Create me a SVG of a PS4 controller"

Gemini 3.0 Pro: https://www.svgviewer.dev/s/CxLSTx2X

Opus 4.5: https://www.svgviewer.dev/s/dOSPSHC5

I think Opus 4.5 did a bit better overall, but I do think frontier models will eventually converge to a point where the quality is so good it will be hard to tell the winner.


I can only see the SVG code there on mobile; I don't see any way to view the output.


Click the Export tab.


What we have all been waiting for:

"Create me a SVG of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FfhmhTK1


That is pretty impressive.

So impressive it makes you wonder if someone has noticed it being used as a benchmark prompt.


Simon says if he gets a suspiciously good result he'll just try a bunch of other absurd animal/vehicle combinations to see if they trained a special case: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...



"Pelican on bicycle" is one special case, but the problem (and the interesting point) is that with LLMs, they are always generalising. If a lab focussed specially on pelicans on bicycles, they would as a by-product improve performance on, say, tigers on rollercoasters. This is new and counter-intuitive to most ML/AI people.


The gold standard for cheating on a benchmark is SFT on the test set and banking on memorization. That's why the standard way to quickly test for benchmark contamination has always been to switch out the specifics of the task, like replacing named concepts with nonsense words in reasoning benchmarks.
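
A toy sketch of that swap (the benchmark item and concept list are made up for illustration):

    import random
    import re

    def nonsense_word(rng: random.Random) -> str:
        """Pronounceable gibberish like 'bofisu'."""
        return "".join(rng.choice("bcdfglmnprst") + rng.choice("aeiou") for _ in range(3))

    def decontaminate(question: str, concepts: list[str], seed: int = 0) -> str:
        """Replace each named concept with a fresh nonsense word, keeping the logic intact."""
        rng = random.Random(seed)
        for concept in concepts:
            question = re.sub(rf"\b{re.escape(concept)}\b", nonsense_word(rng), question)
        return question

    # Made-up reasoning item; a model that memorized it loses the surface-form crutch:
    print(decontaminate(
        "All dogs are mammals. Rex is a dog. Is Rex a mammal?",
        ["dogs", "mammals", "Rex", "dog", "mammal"],
    ))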


Yes. But "the gold standard" just means "the most natural, easy and dumb way".


I have tried combinations of hard-to-draw vehicles and animals (crocodile, frog, pterodactyl; riding a hang glider, a tricycle, skydiving), and it did a rather good job in every case (compared to previous tests). Whatever they have done to improve on that point, they did it in a way that generalises.


It hadn't occurred to me until now that the pelican could overcome the short-legs issue by not sitting on the seat and instead putting its legs inside the frame of the bike. That's probably closer to how a real pelican would ride a bike, even if it wasn't deliberate.


Very aero


You could! But as others have mentioned, the performance gain would be negligible. If you really wanted to see more of a boost from pretraining, you could build a bigger chunk of data to train on, either by creating synthetic data from your material or by finding information adjacent to it. Here's a good paper about it: <https://arxiv.org/abs/2409.07431>
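
A rough sketch of the synthetic-data route, using paraphrase-style augmentation (the model name and prompt are placeholders, and this is just one of the recipes the paper covers):

    # pip install openai -- generate restatements of your material to enlarge the corpus
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def synthesize(chunk: str, n_variants: int = 3) -> list[str]:
        """Ask a model for n rewrites of a source chunk, preserving its facts."""
        variants = []
        for _ in range(n_variants):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=[{
                    "role": "user",
                    "content": "Rewrite this passage in a different style, "
                               f"preserving every fact:\n\n{chunk}",
                }],
            )
            variants.append(resp.choices[0].message.content)
        return variants

    corpus = ["...your chunked domain documents..."]  # placeholder input
    augmented = [v for chunk in corpus for v in synthesize(chunk)]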


She did man


Very cool site! I'm working on a similar project of my own:

https://vimgolf.ai

It's for learning new vim motions. I've since gotten distracted by life, but I need to actually finish it.


Neat idea! Nit: maybe offer the first few exercises without requiring a login, so I can get a feel for it before deciding to sign up.


The required login/sign-up, the privacy policy, and the lack of apparent open-sourcing seem antithetical to the average Linux user. You're going after a niche of a niche of a niche with this one. Good luck, lol.


I'd argue that the average Linux user likely knows how to use vim for the most basic editing but isn't necessarily motivated to learn it deeply. Intermediate users can name a few of vim's modes and navigate somewhat efficiently, and that's about it. Only advanced users and those who really want to master vim (in other words, hardcore nerds) will try to make the most of it and use as few strokes as possible to navigate and edit, which is what these tools/sites are for. That's a few "niches" right there.


I think once you start trying to use the occasional macro and/or make custom keybinds, it pushes you further into the vim golf mindset. When you're saving an action to be repeated 100 times, you really have to get it right. Macros taught me a lot of advanced movements too, like } and ) and marks (only recently did I learn that apostrophe jumps to the marked line while backtick jumps to the marked character on that line, after years of always using apostrophe).

I recently spent a half hour or so making two keybinds to insert the date/time in my preferred format at the end or start of a line and then return the cursor to where it was. While about half of the process was the same for both binds, I ran into multiple issues with the start-of-line version. For one, I (insert at start of line) in neovim places your cursor after leading whitespace instead of before it, so I had to use 0 and then insert relative to that.

I also found out that marks are based on the number of characters into the line, so if you add new text to the start of the line and then return to your mark, you won't be on the same word: 14 characters in before, 14 characters in now. I worked around that by counting how many characters my date text and spaces inserted, then appending that number plus l (move right) to the end of the keybind to make up for the difference. It was pretty satisfying when it finally worked.
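
A minimal sketch of what those two binds could look like (the <leader> keys, the date format, and the 20l offset are illustrative, not my exact mappings):

    " End of line: drop a mark, append a timestamp, jump back to the mark.
    nnoremap <leader>te mzA <C-r>=strftime('%Y-%m-%d %H:%M:%S')<CR><Esc>`z

    " Start of line: 0 instead of I, since I lands after leading whitespace.
    " Marks store a character offset, so after inserting 20 characters the
    " mark points 20 too early -- step right to land back on the same word.
    nnoremap <leader>ts mz0i<C-r>=strftime('%Y-%m-%d %H:%M:%S')<CR> <Esc>`z20l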


You are truly a vim master. And yes, that's exactly why I used containers to host the vim instances: a DOM-based vim library wouldn't record each keystroke accurately. Thank you for trying my site out.


Good feedback


Will you be removing the sign-up/in then?


Wonder what this means for the pelican-riding-a-bicycle test? Or will it just be good at strictly reasoning-type problems?


The fact that it doesn't change the images the way 4o image gen does is incredible. Often when I try to tweak someone's clothing using 4o, it also tweaks their face. This model seems to apply those recognizable AI artifacts only to the elements that need to be edited.


That's why Flux Kontext was such a huge deal - it gave you the power of img2img inpainting without needing to manually mask the content.

https://mordenstar.com/blog/edits-with-kontext
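
For contrast, the manual-masking workflow that Kontext lets you skip looks roughly like this with diffusers (the checkpoint ID and file names are placeholders, and a CUDA GPU is assumed):

    # pip install diffusers torch pillow -- classic mask-based img2img inpainting
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",  # placeholder checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("portrait.png").convert("RGB")  # placeholder inputs
    mask = Image.open("jacket_mask.png").convert("RGB")     # white = region to repaint

    # Only the masked region is regenerated; the face outside the mask stays put.
    result = pipe(prompt="a red leather jacket", image=init_image, mask_image=mask).images[0]
    result.save("edited.png")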


Seems strange not to include the prompts themselves, in case people are curious to try replicating it.


Well.... that's a good idea - I'll see if I can dig them up!


You can select the area you want edited in 4o, and it'll keep the rest unchanged.


GPT doesn't respect masks.


Correct. I've tried this without much success, despite OpenAI's claims.

