Hacker News | hsaliak's comments

This is most certainly vibed with a few optimization focused prompts. Yes - performance is a feature, but so is lack of risk.

My experience so far tells me that the default path with AI tooling is that it lets us create without learning. So the author is right in that they can pay for a seat in this revolution whenever they want.

A practitioner with more experience may be a few percentage points more productive, but the median path (grab a subscription, get the tool, prompt) will be mostly good enough.


I think this is true at the solo developer scale, but I suspect experience will be much more evident when working with a team.

I expect tools to start embedding an SLM in the ~1B parameter range locally for something like this. It will become a feature in a rapidly changing landscape, and the need for it may disappear in the future. How would you turn this into a sticky product?


Token usage and agent usage optimisation?

It seems like a real problem for me. Probably because I'm not overly inspired to pay for a Claude x5 subscription and really hate the session restrictions on a standard Pro plan, especially when the weekly allowance left at the end of the week can't be utilized because of them. Most of my tasks are basically using superpowers, and I find I get about 30-90 minutes of usage per session before I run out of tokens. The session resets about every 4 hours, by which point I generally don't get back to it until the next day; my weekly usage is about 50%, so there's a lot of wastage due to bad scheduling. A tool like this could add better AFK-style agent interoperability through batching etc., as a one-tool-fits-all scenario.

If this gets its foot in the door and gains market share, there is plenty of runway here for adding more optimized agent utilization and adding value for users.


Agreed on the need, and this space needs more exploration, which is not going to come from big cos, as they are incentivised to boost spend. I've been exploring the same problem statement, but with a different approach: https://github.com/hsaliak/std_slop/blob/main/docs/CONTEXT_M....

The comment was more about how to make their approach sticky. I feel that local SLMs can replicate what this product does.


https://github.com/hsaliak/std_slop is a sqlite-centric coding agent. It does a few things differently: (1) context is completely managed in sqlite; (2) it has a "mail model": basically, it uses the git email workflow as the agentic plan => code => review loop. You become "linus" in this mode, and the patches are guaranteed bisect-safe; (3) everything is done in a javascript control plane, with no free-form tools like read / write / patch. Those are available, but within a javascript repl, so the agent works on that. You get other benefits, such as being able to persist js functions in the database for future use that are specific to your codebase.
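To make the sqlite-centric idea concrete, here is a minimal sketch in Python (std::slop's real schema and its JS control plane will differ; the table and column names here are my own illustration):

```python
import sqlite3

# In-memory DB for the sketch; a real agent would persist this per-project.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE context (
    id INTEGER PRIMARY KEY,
    role TEXT,            -- 'user', 'assistant', 'tool'
    content TEXT
)""")
db.execute("""CREATE TABLE helpers (
    name TEXT PRIMARY KEY,
    source TEXT           -- persisted JS function source, reusable later
)""")

def add_context(role, content):
    db.execute("INSERT INTO context (role, content) VALUES (?, ?)",
               (role, content))

def save_helper(name, source):
    # Codebase-specific helpers survive across sessions because they live
    # in the database rather than in the conversation window.
    db.execute("INSERT OR REPLACE INTO helpers VALUES (?, ?)", (name, source))

add_context("user", "refactor the parser")
save_helper("find_todos",
            "function findTodos(src){ return src.match(/TODO.*/g) }")

rows = db.execute("SELECT name FROM helpers").fetchall()
print(rows)  # [('find_todos',)]
```

Because both the conversation and the helper functions are rows in the same database, "context management" becomes ordinary SQL instead of prompt surgery.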

Give it a try!


I explored this in std::slop (my clanker): https://github.com/hsaliak/std_slop. One of its differentiating features is that it only has a single tool call, run_js. The LLM produces js scripts to do its work. Naturally, I tried to teach it to add comments for these scripts and incorporate literate programming elements. This was interesting because every tool call now 'hydrated' some free-form thinking, but it comes at an output-token cost.

Output tokens are expensive! In GPT-5.4 it's ~180 dollars per million tokens! I've settled for brief descriptions that communicate the 'why' as a result. The code is the documentation, after all.
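A back-of-the-envelope check makes the trade-off clear (using the ~$180/M output-token figure above; the per-call and per-day volumes are hypothetical):

```python
PRICE_PER_M = 180.0  # dollars per million output tokens (figure from above)

def cost(tokens: int) -> float:
    """Dollar cost of emitting `tokens` output tokens at PRICE_PER_M."""
    return tokens / 1_000_000 * PRICE_PER_M

# Say literate commentary adds ~200 extra output tokens per tool call,
# and a busy agent session makes 500 tool calls in a day:
per_call = cost(200)
per_day = cost(200 * 500)
print(f"${per_call:.3f} per call, ${per_day:.2f} per day")
```

Pennies per call, but tens of dollars per day of pure commentary, which is why brief 'why'-only descriptions win.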


This is a very nice and clean implementation. Related to this: I've been exploring injecting landlock and seccomp profiles directly into the ELF binary, so that applications that are backed by some LLM, but want to 'do the right thing', can lock themselves down. This ships a custom process loader that reads the .sandbox section and applies the policies (not unlike bubblewrap, which uses namespaces). The loading can be pushed to a kernel module in the future.

https://github.com/hsaliak/sacre_bleu is very rough around the edges, but it works. In the past, apps either behaved well or had malicious intent; with these LLM-backed apps, you are going to see apps that want to behave well but cannot guarantee it. We are going to see a lot of experimentation in this space until the UX settles!
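A minimal sketch of the idea: the loader pulls a policy blob out of the binary's `.sandbox` section, parses it, and (on Linux) would translate it into landlock rulesets and a seccomp filter before handing control to the real entry point. The section name is from the project; the JSON policy format below is purely my own illustration, not sacre_bleu's actual format:

```python
import json

# Hypothetical policy blob, as it might be embedded in a .sandbox ELF section.
SANDBOX_BLOB = json.dumps({
    "landlock": {"ro_paths": ["/usr", "/lib"], "rw_paths": ["/tmp/scratch"]},
    "seccomp":  {"deny": ["ptrace", "mount", "kexec_load"]},
}).encode()

def load_policy(blob: bytes) -> dict:
    """Validate and parse the embedded policy. A real loader would then
    apply landlock path rules and install a seccomp syscall filter here,
    before exec'ing the application's actual entry point."""
    policy = json.loads(blob)
    assert set(policy) <= {"landlock", "seccomp"}, "unknown policy keys"
    return policy

policy = load_policy(SANDBOX_BLOB)
print(policy["seccomp"]["deny"])  # ['ptrace', 'mount', 'kexec_load']
```

The key property is that the policy travels inside the binary itself, so a well-intentioned app cannot "forget" to sandbox itself at runtime.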


GCP Next is Apr 22-24. Hope this continues to live after that.


I'd like to plug my coding harness (std::slop)'s mail model (a poor name, I admit): https://github.com/hsaliak/std_slop/blob/main/docs/mail_mode... I believe this solves a fundamental problem of accumulating errors along with code in your project.

This brings the Linux kernel style patch => discuss => merge-by-maintainer workflow to agents. You get bisect-safe patches that you review, provide feedback on, and approve.

While a SKILL could mimic this, being built in allows me to apply access control and 'gate' destructive actions, so the LLM is forced to follow this workflow. Overall, this works really well for me. I am able to get bisect-safe patches, review and re-roll them until I get exactly what I want, and then merge them.
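The gating described above can be sketched as a small state machine (the state names and class are mine, not std::slop's): the agent can only move a patch forward through review, and the one destructive action, the merge, is reserved for the human maintainer.

```python
from enum import Enum, auto

class State(Enum):
    PLANNED = auto()
    PATCHED = auto()
    REVIEWED = auto()
    MERGED = auto()

class MailModePatch:
    """Sketch of the gated plan => patch => review => merge loop."""
    def __init__(self, subject: str):
        self.subject = subject
        self.state = State.PLANNED
        self.feedback: list[str] = []

    def attach_patch(self):
        # The agent emits a bisect-safe patch for the planned change.
        assert self.state is State.PLANNED
        self.state = State.PATCHED

    def review(self, approved: bool, note: str = ""):
        # The human reviews; a rejection re-rolls the patch back to the agent.
        assert self.state is State.PATCHED
        if approved:
            self.state = State.REVIEWED
        else:
            self.feedback.append(note)
            self.state = State.PLANNED

    def merge(self, actor: str):
        # Gate: only the human maintainer may perform the destructive step.
        assert actor == "maintainer", "merge is gated to the human"
        assert self.state is State.REVIEWED
        self.state = State.MERGED

p = MailModePatch("parser: handle empty input")
p.attach_patch()
p.review(approved=False, note="add a test")   # re-roll with feedback
p.attach_patch()
p.review(approved=True)
p.merge("maintainer")
print(p.state)  # State.MERGED
```

Because every transition is enforced in code rather than in the prompt, the LLM cannot skip review or merge its own work.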

Sure, this may be the path to software factories, but it scales 'enough' for medium-size projects, and I've been able to build in a way that lets me maintain a strong understanding of the code that goes in.


No, it does not. None of these models have the “depth” that the frontier models have across a variety of conversations, tasks and situations. Working with them is like playing snakes and ladders: you never know when one is going to do something crazy and set you back.


The Gemini CLI situation is poor. They did not communicate earlier that AI Pro or AI Ultra accounts cannot be used with this API broadly. I specifically remember searching for this info, and seeing this made me wonder if I had missed it. It turns out it was added to the TOS 2 days ago; diff: https://github.com/google-gemini/gemini-cli/pull/20488/chang.... I'd be happy to stand corrected here.

Antigravity I understand; they are subsidizing it to promote a general IDE. But I don't understand constraining the generative AI backend that Gemini CLI hits.

Finally, it's unclear what's allowed and what's not if I purchase API access from Google Cloud here: https://developers.google.com/gemini-code-assist/docs/overvi...

The Apache License on this product is rich at this point. Just make it closed source and close the API reference. Why have it out there?


I have a Code Assist Standard license to evaluate gemini-cli (and the new models).

To this day I cannot coax gemini-cli into letting me use the models they claim you have access to. I've enabled all the preview stuff in Cloud, etc.

Still, I mostly get 2.5 and rarely get 3 or 3.1 offered.

The gemini-cli repo is a shit show.

I can access the new models using opencode, but am 429 rate limited almost immediately, such that it's like 5 minutes between calls.


It takes your query, computes the complexity of the request, and tries to route it to the appropriate model. There is a /manual command, I think, to pick the right model.

They mask the 429s well in Gemini CLI: if an endpoint is rate limited, they try another, or route to another model, etc., to keep service availability up.
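That masking behaviour amounts to a simple fallback loop; a sketch (the endpoint names and transport are hypothetical, not Gemini CLI's actual internals):

```python
def call_with_fallback(prompt, endpoints, send):
    """Try endpoints in order; on a 429, fall through to the next one so the
    user sees degraded routing rather than a hard rate-limit error."""
    for model in endpoints:
        status, text = send(model, prompt)
        if status == 429:
            continue  # masked: silently try the next model/endpoint
        return model, text
    raise RuntimeError("all endpoints rate limited")

# Fake transport for illustration: the preferred endpoint is rate limited.
responses = {"gemini-3-pro": (429, ""), "gemini-2.5-pro": (200, "ok")}
model, text = call_with_fallback(
    "hi",
    ["gemini-3-pro", "gemini-2.5-pro"],
    lambda m, p: responses[m],
)
print(model, text)  # gemini-2.5-pro ok
```

This also explains the symptom in the parent comment: the request succeeds, but on an older model than the one you asked for.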

Your experience with the 429s is consistent with mine; the 429s are the first thing they need to fix. Fix that and they have a solid model at a good price point.

I use my own coding agent (https://github.com/hsaliak/std_slop) and not being able to bring my (now cancelled) AI account with Google to it is a bummer.

I'd still use it with the Code Assist Standard license if the google cloud API subscription allows for it but I have no clarification.


> It takes your query, computes the complexity of the request, and tries to route it to the appropriate model. There is a /manual command, I think, to pick the right model.

That is what it should do, but no model newer than 2.5 is shown in /model, and it always picks a 2.5 model. I've enabled preview models in the Google Cloud project as well.

If I pass the 3 model as a start param, it shows 3 in the lower right corner, but it is still using 2.5.

I know Google has issues dealing with paying customers, but the current state is a shit show. If you go to the gemini-cli repo, it's a deluge of issues and AI slop. It seems there is a cadre of people jumping to be the first to pump an issue into Claude and get some sort of PR clout.

It might be good, but it needs more time to cook, or they need to take a step back and evaluate what they should consider a paid product.

