I've been playing this since it was first mentioned on HN a few puzzles in. It's a nice idea and pretty well executed.
I have, however, rejected making a user login. I recognise you're putting in time and energy to make something I'm just taking without payment, and it's your right to try to leverage it into something more - I wish you all the best in doing so - but asking for a user login as a gate to a feature you clearly don't need a user login for is enshittification.
Hey, thanks for the feedback. I didn't intend to pressure you to create an account. Sorry if I gave that impression.
I'm guessing you're referring to the ability to filter completed puzzles out of the archive? I added it for logged-in users first because it was simpler, but I can extend that feature so it's available to everyone. (I'll need some alternate logic to pass your IndexedDB completion data to the server endpoint when fetching the archive. It's not complex; I just haven't prioritized it yet.)
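For what it's worth, the core of that alternate logic is small. Here's a rough sketch (all names are hypothetical, not the site's actual API) of the server-side filter once the client has read its completed ids out of IndexedDB and sent them along:

```typescript
// Hypothetical sketch: an anonymous client reads its completed puzzle ids
// from IndexedDB and sends them with the archive request; the server then
// excludes those puzzles. Names are made up for illustration.

interface ArchiveEntry {
  id: string;
  title: string;
}

// Exclude puzzles whose ids the client reports as already completed.
function filterArchive(archive: ArchiveEntry[], completedIds: string[]): ArchiveEntry[] {
  const done = new Set(completedIds);
  return archive.filter((entry) => !done.has(entry.id));
}
```

For logged-in users the completed ids would come from the server's own records instead, so the same filter covers both cases.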
I'll add this to my backlog and try to get to it after the player puzzles release.
Beyond that everything is available regardless of user account right now. I do plan to require an account to submit custom puzzles when that's released. (Mostly to make moderation easier. I may relax this down the line.)
EDIT: On further thought, I realized an account is also required to view and share your profile stats, though that could also work without an account with some changes.
Anything that requires server-side storage is a good reason to ask for an account, IMO. Theoretically you could assign a pseudo-account and store the id in client storage to have a shareable profile, but then you'll have to figure out how long you'll retain idle pseudo-accounts. (Assuming that completion detail is in client storage, at least for anonymous players).
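The pseudo-account idea can be sketched in a few lines. This is a toy version with made-up names; the storage is injected so it isn't tied to the browser:

```typescript
// Hypothetical sketch: mint a random id once, keep it in client storage,
// and send it with requests so the profile is shareable without a login.
// Storage and id generation are injected; all names are invented.

interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function getOrCreatePseudoId(store: KeyValueStore, newId: () => string): string {
  const existing = store.getItem("pseudoAccountId");
  if (existing !== null) return existing; // reuse the id on every visit
  const id = newId();
  // The server still needs a retention policy for ids that go idle.
  store.setItem("pseudoAccountId", id);
  return id;
}
```

In a browser you'd pass `localStorage` and something like `() => crypto.randomUUID()`; the open question is exactly the one above, i.e. when the server gets to forget an idle id.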
A consequence of me being a freeloader too is that you don't have to change your plans to please me :-)
To me a "stop watch" is a type of watch - that's straightforward. But there are other clues that rely on cultural references I'm not familiar with, and that is, I think, inevitable in this type of game. We all have different backgrounds, and there's no universal shared understanding that would make every clue equally difficult for everyone.
I saw someone on here recently say they like to do the puzzle without looking at the clues, and I've started doing that on and off too - it changes the game in an interesting way.
It depends on the purpose of the model. AFAIK LLMs aren't particularly capable of researching answers, relying more on having 'truth' baked into their weights, so if it takes 12 months to train up a crowd-trained LLM, it'll be 12 months behind the times.
How serious a risk are poisoned weights?
Can we leverage the cryptobros into using LLM training as a proof of work?
Does Qwen3.5 know it needs to do this because the API in question has had loads of churn and much of its training data is on obsolete versions, or do you need to prompt it? How well does it handle having an API reference with sample code in its context window?
Having an LLM use a web search tool isn't the same thing as researching a topic, IMO, because it's so ephemeral and needs constant reinforcement. LLMs aren't learning machines, they're static ones.
Baidu have a lot of services I've never heard of that are highly successful in China. Their lack of interest in expanding to Western audiences doesn't seem to matter there - what's different about inference?
Time will tell. Depends on small model architecture trends and hardware availability. I wouldn't be surprised if something came slightly out of left field. Considering Taiwan is trapped into producing the same chips for the next 2 years, I wouldn't be surprised if a new player emerged.
If you fire all your SWEs they won't sit around twiddling their thumbs waiting for an AI collapse, they'll career shift. Maybe to an unemployment line and/or homelessness, maybe to something else productive, but either way they'll lose SWE skills.
If you close down all the SWE junior positions you'll strongly discourage young people training in the field. They'll do something else.
Then if you want to go back, who will you hire for it?
They are large language models. Not automated development machines. They hallucinate.
The goalposts have not shifted since 2023 or so. Make an LLM that doesn't blatantly disregard knowledge it has and instructions it has been given, over and over, and you win. If trillions of USD of investment can't do it, I'd be curious to see what can.
There are definitely automated dev systems, of which an LLM is a part. The remaining part may be called a 'harness' or whatever. The quality of the generated software is another matter.
If the AI is not good enough, then don't fire the devs. If/when the devs are no longer needed, I don't see why the need would return later, that was my point.
A harness like Claude Code does not turn an LLM into a software developer.
If that were the case, companies could just have their project managers manage Claude Code instead of developers - and they would immediately realize that using Claude Code to develop software is just as complex and geeky as it ever was. Nothing has changed in that regard.
A harness and a bunch of skills is just the new "think step by step" prompting technique. Don't just let the LLM rip and write a bunch of code, but try to get it to think before coding, avoid things like churning the code base for no reason, and generally try to prompt it to behave more like a developer than an LLM. Except it still is an LLM.
A coding agent is really not much different from a chat "agent" in this regard. You've got the base LLM, then a system prompt trying to steer it to behave in a certain way: always suggest a "next step", keep to a consistent persona, etc. None of this actually makes the LLM any smarter or turns it into a brilliant conversationalist, any more than the coding agent's system prompt magically turns it into a software developer.
If you prefer staying in denial, be my guest. But I've seen multiple instances of fully functioning software created by people who don't even know what code is. Maybe these creators are now developers, in a sense. But no SWEs were needed.
Sure, and I could buy a model rocket engine, strap it to a stick and launch it hundreds of feet into the air. Would that make me a rocket scientist? Next step Mars?
If you don't appreciate the difference between what an LLM or a coding agent can do, vs what a human can do, then I can't help you.
Hah, I like that: the main benefit of monads is turning your functional language back into an imperative one...
IMO it's because option is a monad, list is a monad, io is a monad, async is a monad, try-except is a monad, why invent different magic syntax and semantics for all of them when there's a perfectly good abstraction that covers the lot, and that lets you write functions that are agnostic to which particular monad they're in to boot.
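A toy sketch of that point, written loosely since TypeScript has no higher-kinded types (this is an illustrative encoding, not a real FP library): a monad is just `of` plus `chain`, and a function written once against that pair runs unchanged in the list monad and an option monad.

```typescript
// Toy encoding of "one abstraction covers the lot": a monad dictionary is
// `of` plus `chain` (flatMap), and code written against it is agnostic to
// which monad it's in. Loosely typed on purpose; names are invented.

type Option<A> = { tag: "some"; value: A } | { tag: "none" };

interface MonadDict {
  of(a: any): any;
  chain(m: any, f: (a: any) => any): any;
}

const arrayMonad: MonadDict = {
  of: (a) => [a],
  chain: (m: any[], f) => m.flatMap(f),
};

const optionMonad: MonadDict = {
  of: (a): Option<any> => ({ tag: "some", value: a }),
  chain: (m: Option<any>, f) => (m.tag === "some" ? f(m.value) : m),
};

// Written once, usable in any monad: pair up values and add them.
function addPairs(M: MonadDict, xs: any, ys: any): any {
  return M.chain(xs, (x: number) => M.chain(ys, (y: number) => M.of(x + y)));
}
```

`addPairs(arrayMonad, [1, 2], [10, 20])` yields all pairwise sums, while `addPairs(optionMonad, some, none)` short-circuits to none - same code, two different "magic syntaxes" avoided.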
Not the parent, but the only thing I really hate about 1Password is that I can't tell it to never offer to save a specific site's password. I can turn off all offers to save passwords, or I can have the stupid pop-up ask me multiple times a day if I want to save that password. The pop-up chases me across the site until I get rid of it. Aarrgh. Blood boiling. Rage overflowing.
I have the same issue when using Google Passwords. One specific example: Many of my bank websites require 2FA with a code via email, SMS, or token. Each time, Google Chrome asks me if I want to update the password with the 2FA token. I have no idea how to disable it. Am I doing something wrong?
I have the same complaint about LastPass. With LastPass it's doable, but I have to keep looking up how to configure a site to never save and never ask.
We're putting other providers through the gauntlet. An M4 Studio or two running the latest Qwen3 or whatever counts as state of the art in open models is also looking a little more viable all the time.
They can be super complementary! Open weight models can be your everyday standard go-to, with frontier models for the harder and bigger tasks.
Having some open weight deployment or vendor is also a good thing, because you may have domain-specific tasks where a quick finetune gets noticeably better results.
Unsloth makes it particularly easy. Open weight LLMs are incredibly powerful building blocks.