Hacker News: cowlby's comments

Underrated comment here. https://www.anthropic.com/research/emotion-concepts-function This study convinced me to be "nice" to AI agents. At least as I understood it, there's something in the weights such that activating the "desperate" vector makes the model more likely to cheat or cut corners. So yes, I would err towards your suggested prompt over NEVER FUCKING GUESS.

LLM agents are unlocking demand and supply for applications that wouldn't have been possible before due to time constraints though. There's a growing demand for single user or smaller scoped apps where giving LLM agents direct access means velocity. The failure/rollback model is much easier with these as long as we have good backup hygiene.

> There's a growing demand for single user or smaller scoped apps where giving LLM agents direct access means velocity. The failure/rollback model is much easier with these as long as we have good backup hygiene.

This makes no sense to me. For anything that has sensitive payment or personally identifiable data, direct access to the DB is potentially illegal.

> The failure/rollback model is much easier with these as long as we have good backup hygiene.

Have you actually operated systems like this in production? Even reverting to a DB state that is only seconds old can still lose hundreds or thousands of transactions. Which means loads of unhappy customers. More realistically, recovery points are often minutes or hours behind once you factor in detection, validation and operational overhead.

DB revert is for exceptional disaster recovery scenarios, not something you want in normal day-to-day operations. If you are saying that you want to give LLM full access to prod DB and then revert every time it makes a mistake, you aren't running a serious business.


You are thinking way too hard. This person is a hazard that needs to learn the hard way.

If velocity means letting agents live-edit a db, I'm fine being slow. Holy hell. Let these people crash and burn, but definitely let me know the app name first so I know never to use it.


Not everything is a SaaS. I commented this elsewhere but I picture all the businesses running on spreadsheets/CSVs/MS Access databases on someone's desktop. People delete these all the time by accident. They have no security, no authentication, etc.

With an LLM agent (with RW access to a DB), a developer, and a few days, these become proper apps that SMBs would pay well for.

Sure don't give an LLM agent access to PII or properly built CRMs etc. But to not see the rest of the landscape seems like a missed opportunity.


At the very least you should give it a non-prod copy of the database, not direct access to the DB actively powering production right now.

I've done work for a hedge fund where the DB ran directly on the manager's desktop. I worked with my local copy and sent an update script, and he ran it on a second copy to verify.

Even with humans you shouldn't be working directly against the prod DB in these cases!


Yes, I just think there's a sane way to do things that is not "never let LLM agents do things".

For dev/prod staging though, there's that other story on HN right now of an LLM agent that maneuvered its way to prod credentials and destroyed prod. And the backups went along with it. I'm paranoid enough to think backups in this use case means out-of-band, uncorrelated storage.


There is literally no excuse. The fact that there is any resistance to this let alone from multiple people terrifies me.

I just think there's more nuance to it. Some things have an implicit RTO/RPO/SLA of say a day. Risk is also correlated to recovery and rollback. And there's levels of LLMs out there.

Surely in the Venn diagram of things, there's a slot where it's okay to let a Claude Opus agent run on a process with good backups/recovery? Where taking the risk of a 1-hour restore job is worth the LLM agent velocity?

For extra paranoia, surely even Opus/Mythos can't figure out how to destroy log level backups to immutable storage.


The only nuance I can see is, does the data matter at all? If it does you shouldn't do this. If it doesn't then who cares, also why even put it in a database.

This narrative seems to come from people who haven't worked on meaningfully complex software systems. They're more like script kiddies than software developers. I don't mean that in a derogatory manner. They're right that LLMs are unlocking new possibilities in the realm of their work. They just don't realize that these new possibilities are constrained to relatively simple applications, or very thin slices of complex systems.

I use an LLM to access my database occasionally, but never in production and never with write access. It is genuinely useful. It would never be useful in a production setting, though.

It's worth noting too that people should be wary of what a read only user means in database land. There are plenty of foot guns where writes can occur with read-like statements, and depending on the schema, maybe this would be a rollback-worthy situation. You really need to understand your database and schema before allowing an LLM anywhere near it, and you should be reviewing every query.
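One cheap guardrail worth naming here: enforce read-only at the connection or role level rather than in the prompt. A minimal sketch using SQLite's read-only URI mode (for server databases the analogue is a role with only SELECT grants, and even then schema-level footguns like writable functions or triggers still apply):

```python
import os
import sqlite3
import tempfile

# Build a throwaway database with one table.
path = os.path.join(tempfile.mkdtemp(), "app.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
rw.execute("INSERT INTO accounts VALUES (1, 100.0)")
rw.commit()
rw.close()

# Reopen the same file read-only via a URI: reads work, writes raise.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT balance FROM accounts").fetchone())  # (100.0,)

try:
    ro.execute("UPDATE accounts SET balance = 0")
except sqlite3.OperationalError as err:
    print("write blocked:", err)
```

Hand an agent the `ro` connection and the database itself refuses writes, regardless of what the prompt says.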


That's the issue that I feel misses the forest for the trees. Relatively simple applications or thin slices exist right now, in production, in critical paths, as spreadsheets/CSVs/files on someone's desktop. That's the pent up demand I picture out there for developers.

Go to any SMB out there and there's a goldmine of processes that could be improved with LLM agents with full RW access to a database. Where backups are sufficient as a recovery mechanism that is better-than-before.


I think the Venn diagram of people letting LLMs have complete control of their database AND having good backups will have no overlap. The people that would benefit are not the people that have backups.

This is also a good point. Details like this are why I think experienced developers are going to remain relevant for a while yet. Anticipating what can go wrong is such a huge component of what building software systems is about. LLMs can be great at it, but only with the limited context they have, and even then only somewhat coincidentally.

Okay, totally agree. I think good harnesses are crucial but the premise is absolutely valid.

I'm not thinking of SaaS or properly built apps with an API, modeled databases, etc. I'm thinking spreadsheets/CSVs/MS Access that thousands of SMBs use to power their critical paths and someone accidentally deletes. Typically single user, maybe a small team. Infrequent writes, lots of reads.

But are those users allowed to see all the data in the database by law? Some privacy laws require that personal information be hidden from employees unless they have a narrow and specific business reason to view it. Blanket full access to a database may be illegal for that reason.

I think a lot of the objections to your post could be answered by reminding folks of how Microsoft Access databases tend to pop up in small businesses as well as corporate environments outside of IT departments. Yes, they're not "proper" databases but they /get business done/ and often serve as v0 before a real app can be properly conceived of.

One can easily imagine an LLM-enabled database that lets a wider audience build meat-and-potatoes line-of-business apps for small team use with minimal compliance concerns.


Yes, that's the right framing. Millions flow through spreadsheets/CSVs/MS Access with none of the auth/backups/architecture people seem to be stuck on.

I saw an article on HN one time about CSVs and how much business still flows through them. Reminds me of the xkcd comic about the one tiny block propping up lots of infrastructure. It stuck with me because it's a ripe area for LLM agent based upgrades.

Sure don't give LLMs access to the well architected blocks. But not wanting to improve the brittle areas seems crazy to me even if it's contrarian.


> single user

If you're just vibe coding a tool for yourself, you don't have 'production database' at all even if you use database technology for storage. Just like many Android apps use local sqlite DBs but they're not production databases.

Of course in this case no traditional wisdom about production databases matters to you. In other words, it's off-topic.


I commented this elsewhere: There's thousands of small and medium businesses though. They have maybe one true CRM, and a dozen spreadsheets/files floating around that would benefit from becoming proper apps. People delete spreadsheets all the time!

Sure don't give an LLM agent write access to the modeled CRM that took months/years to build.

But turning a spreadsheet into an app in a few days? By giving the LLM proper read/write capabilities for velocity? I think the case is there for it. Right tool for the right job.


1) Can you explain what demand and supply mean in this context?

2) In regards to having good backup hygiene, who is we?


I think of all the pent up demand for proper applications that are just infeasible when it would take a developer weeks-to-months to create. Now it's just a few days with an LLM agent.

Examples for me are all the apps that live in a spreadsheet, or in a MS Access database. Or all the crappy ad backed apps on the iOS app store. People wipe full spreadsheets all the time and backups are the only recovery.

Just last weekend I was frustrated with the poor quality of Pokedex type apps that spam ads left and right. Took just one session with Claude Opus to roll a custom Pokedex. It knew internally about things like the PokeApi dataset, Pokemon data modelling etc. To-the-hour snapshots of the database are trivial for bespoke apps like this so the LLM agent velocity seems like an okay trade off for me.
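As a sketch of how cheap those snapshots can be for a single-file database, here's a minimal backup helper using Python's sqlite3 backup API (scheduling it hourly via cron or similar is assumed, and the paths are made up):

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

def snapshot(db_path: str, backup_dir: str) -> Path:
    """Copy the live database into a timestamped snapshot file.

    The sqlite3 backup API takes a consistent copy even while the
    source is being written to, so it's safe to run on a schedule.
    """
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest_path = Path(backup_dir) / f"snapshot-{stamp}.db"
    src = sqlite3.connect(db_path)
    dest = sqlite3.connect(dest_path)
    with dest:
        src.backup(dest)
    dest.close()
    src.close()
    return dest_path
```

Shipping the snapshot files to out-of-band storage afterwards keeps them out of reach of the agent.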

Clearly people don't agree...


This makes no sense whatsoever.

It's not news that if you just give all developers at a company write access to the production databases, owner permissions on all resources, etc. that velocity can be increased. But at what cost?

The reason we don't do that in most cases is that "move fast and break things" only makes sense for trivial, non-critical applications that don't have any real importance, like Facebook.


There's thousands of small and medium businesses though. They have maybe one true CRM, and a dozen spreadsheets/files floating around that would benefit from becoming proper apps. People delete spreadsheets all the time!

Sure don't give an LLM agent write access to the modeled CRM that took months/years to build.

But turning a spreadsheet into an app in a few days? By giving the LLM proper read/write capabilities for velocity? I think the case is there for it. Right tool for the right job.


I think the argument would mostly be about the companies where trivialities like proper auth were given up to the maximum possible extent. I'm sure even some bigger ones are only gnashing their teeth over implementing security measures that are required by law, not seeing much point to them.

This comment is savage and I’m here for it.

Tiller https://tiller.com/ is a good Plaid "proxy". It'll write data to a Google Sheet and you can maneuver from there.

I recently did this for myself using https://tiller.com/ to sync checking/credit-card transactions to a Google Sheets spreadsheet. Then a GitHub action mirrors the spreadsheet to a free Supabase database.

From there, Supabase MCP or psql gives Claude/Codex access to the transactions/balances for English queries. Really impressed with their ability to find subscription patterns, abnormal patterns, etc. Also to predict cashflow, which no online tool so far is good at, i.e. "tell me how much cash I can move to savings based on my monthly spend patterns and available cash".

For autocategorization, I learned Claude is really good at custom DSLs. Had it create a markdown table based ruleset to normalize payee/categories. I also run the autocat rules as part of the GitHub actions.
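To give a flavor of the idea (the table and rules here are invented examples, not the real ruleset), a Markdown-table DSL like this can be parsed and applied in a few lines of Python:

```python
# Hypothetical categorization rules, stored as a Markdown table so an
# LLM can read and edit them directly.
RULES_MD = """
| match           | payee       | category      |
|-----------------|-------------|---------------|
| AMZN Mktp       | Amazon      | Shopping      |
| NETFLIX.COM     | Netflix     | Subscriptions |
| SQ *BLUE BOTTLE | Blue Bottle | Coffee        |
"""

def parse_rules(md: str) -> list[dict]:
    """Turn the Markdown table into a list of rule dicts."""
    rows = [line for line in md.strip().splitlines() if line.startswith("|")]
    cells = [[c.strip() for c in row.strip("|").split("|")] for row in rows]
    header, body = cells[0], cells[2:]  # cells[1] is the |---| divider row
    return [dict(zip(header, row)) for row in body]

def categorize(raw_payee: str, rules: list[dict]) -> tuple[str, str]:
    """First case-insensitive substring match wins."""
    for rule in rules:
        if rule["match"].lower() in raw_payee.lower():
            return rule["payee"], rule["category"]
    return raw_payee, "Uncategorized"

rules = parse_rules(RULES_MD)
print(categorize("AMZN Mktp US*2K4L7", rules))  # ('Amazon', 'Shopping')
print(categorize("LOCAL HARDWARE", rules))      # ('LOCAL HARDWARE', 'Uncategorized')
```

Keeping the rules in Markdown means the LLM can propose edits to them, while the deterministic applier keeps categorization reproducible run to run.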


How does Tiller get your transaction data from your bank?

Do they pull it through Plaid and the like? It's been a while since I checked them out.

Does it still entail entrusting Plaid with your web banking user credentials? How's 2FA handled?

Does Plaid still rely on screen scraping for certain financial institutions who lack formal API's? What happens if there's a bug and they inadvertently click something they shouldn't, eg. "I Agree" to a popup or something you don't consent to, or even send funds to the wrong place? I know they claim they are "read only" but afaik no bank offers the ability to set up secondary user accounts (on personal banking plans) that truly are just read only?

Do they maintain underwritten insurance or a bond or something to improve your confidence you'll be reimbursed if they, say, cause you a million+ dollars worth of financial damage?

How about the implications of letting both those parties see all your private banking data? I heard there was a class action lawsuit with allegations data was sold or shared inappropriately, any indications on what actually happened?

Or how about the clauses in your banking Terms of Service where you agree not to share your password with third parties?

I just feel queasy using a web / cloud service to manage my finances. Would prefer some client software that runs locally and talks to some kind of bank APIs. Does such a thing exist in Canada? (Open Banking is supposed to be coming but I'm not clear if individuals will be able to access it for software they write themselves?) I would switch banks if it did.

These are genuine questions, I sure could use something like an API to my bank, if it were impeccably trustworthy and enforced policies of limited internal data retention once I've "downloaded" it.


Local AI models are getting a lot better. If you have the capability to run them, you could automate this yourself using your own browser automation, actually. It is rather fiddly, as mentioned in the post, but is absolutely doable, and probably the only option, at least for now, where you wouldn't need to provide your credentials to a third party.

Plaid does do screen scraping for smaller banks, but they have agreements for OAuth-based access with most of the largest institutions.


I believe they use Yodlee and yes there is a lot of trust in Yodlee/Tiller to keep data safe. The integrations go through an OAuth type flow where you hit say Chase first and approve/revoke individual accounts so it seems like it's API based now, not screen scraping.

For all those concerns, I bet you could automate just parsing all the data from the statements or a CSV export.


Another +1 for Tiller!

I'm doing something similar with Tiller (which I've been using since Mint was acquired by Intuit).

It's neat to see how OP did this using a Claude Routine though. My version locally uses a local qwen model + an API key (annoyingly created using OAuth) with sheets access. A Claude Routine would've been significantly easier


Routines are so fun to work with. It's almost too easy to spin a new one up. A little worried about when I get past the 15 routines limit, though. Then it goes into "extra usage" land.

That's so cool! Are you planning to open source any of this? Would love to see how you set everything up, or - maybe most interested in - some of your prompts.

It was all GitHub Spec Kit + Claude Opus tbh. I narrated a couple paragraphs of how I wanted the sync to flow and it knocked it out in one pass practically.

Here's the initial spec it created. I started off writing to a local sqlite db instead of Supabase: https://gist.github.com/cowlby/0dbeb52403c3f3c0f1d8122505203...

Edit: Here's also the DSL categorization spec. First one was string based, found it cumbersome, so second one was the Markdown table refactor: https://gist.github.com/cowlby/30d6b5cf132fc1424ab146f0eaf4a...

https://gist.github.com/cowlby/d569c8e05b5b6eecfd4d237372c06...

(edit: put in Gist instead of inline here)


This is really neat, and actually pretty sophisticated. It's like a tiny fintech.

How well has the cash flow prediction worked?


It's great because Claude Code generated complex analysis/models 10X better than I was familiar with.

The key was normalizing the payee/categories so we can analyze month to month, and separating fixed vs variable spend. It then did a fancy Monte Carlo simulation with the computed mean/stddev per payee. And out came T+30/T+60 estimates at P50/P80/P90.
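A rough sketch of what that pipeline can look like, with invented numbers and a plain normal fit per payee (the real model was more involved):

```python
import random
import statistics

# Hypothetical per-payee monthly spend history, in dollars.
history = {
    "Rent":      [2200, 2200, 2200, 2200],  # fixed
    "Groceries": [612, 580, 701, 655],      # variable
    "Dining":    [180, 240, 95, 310],       # variable
}

def simulate_month(trials: int = 10_000, seed: int = 42) -> dict:
    """Fit a normal per payee, draw total monthly outflow many times,
    and report the P50/P80/P90 of the simulated totals."""
    rng = random.Random(seed)
    params = {p: (statistics.mean(v), statistics.stdev(v))
              for p, v in history.items()}
    totals = sorted(
        sum(max(0.0, rng.gauss(mu, sigma)) for mu, sigma in params.values())
        for _ in range(trials)
    )
    pick = lambda q: totals[int(q * trials)]
    return {"P50": pick(0.50), "P80": pick(0.80), "P90": pick(0.90)}

print(simulate_month())
```

P90 minus available cash then gives a conservative answer to "how much can I move to savings" for the T+30 horizon; chaining simulated months extends it to T+60.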


why not just use Plaid under the hood

I've been using Tiller since the pre-AI era. Plaid seems to be more B2B oriented so I haven't looked into it. But eventually yes that would be ideal to own the full pipeline.

Have you tried browser automation?

Ahh that makes sense. Sometimes it's convenient to re-use an older conversation that has all the context I need. But maybe it's just the last 20% that's relevant.

It would be nice to be able to summarize/cut into a new leaner conversation vs having to coax all the context back into a fresh one. Something like keep the last 100,000 tokens.

I believe /compact achieves something like this? It just takes so long to summarize that it creates friction.


I keep reading about this lately but what doesn't make sense then is how few deaths/injuries there are relative to how much acetaminophen is consumed. If tens of millions take it every day, that's billions of doses a year of acetaminophen. Why don't we see MORE injuries/deaths?

"Acetaminophen toxicity is the second most common reason for liver transplantation worldwide and the most common cause of acute liver failure in the United States. Responsible for 56,000 emergency department visits and 2600 hospitalizations, acetaminophen poisoning causes 500 deaths annually in the United States."

56,000 emergency room visits is the key here, because "the mortality associated with acetaminophen overdose is low if recognized and treated within the first 8 hours after an acute ingestion."

So I guess it depends on if you think 56,000 is low or not.

Source: "Acetaminophen Toxicity", David H. Schaffer; Brian P. Murray; Babak Khazaeni. 2026/02/19. https://www.ncbi.nlm.nih.gov/books/NBK441917/


About 50% of overdoses are intentional (especially suicidal teenagers), with the other 50% accidental.

So when pondering the issue of numbers, it matters what path people took to overdose.


For all accidental acute poisonings leading to hospitalizations from OTC drugs amongst adults and adolescents, the top culprits are:

1. Acetaminophen: Dangers noted in article, and stats given in my parent comment

2. NSAIDs: "NSAIDs are ingested commonly in overdose, however severe toxicity is rare"

3. Salicylates "Severe salicylate poisoning follows ingestion of greater than 500 mg/kg". For an adult weighing 150lbs that is 68kg, which means severe poisoning requires 34g of aspirin, which at 325mg per pill is 104 pills total. Hardly easy to do this accidentally.
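That back-of-envelope can be checked in a couple of lines (numbers from the parent: 500 mg/kg threshold, 325 mg tablets):

```python
weight_kg = 150 * 0.4536               # 150 lb ≈ 68 kg
threshold_g = 500 * weight_kg / 1000   # 500 mg/kg severe-poisoning cutoff, in grams
pills = threshold_g * 1000 / 325       # standard 325 mg aspirin tablets
print(round(weight_kg), round(threshold_g), int(pills))  # 68 34 104
```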

[1] "Acute poisoning: understanding 90% of cases in a nutshell", S L Greene, P I Dargan, A L Jones, Postgrad Med J 2005;81:204–216


I "fixed" this for myself with tweakcc, which lets you patch the system prompts. I changed the malware part to just be "watch out for malware" and it's stopped being unaligned.

They really should hand off read() tool calls to a lean cybersecurity model to identify if it's malware (separately from the main context), then take appropriate action.


The newest versions of the Claude Code package on npm just download the native executables and run that instead. Does tweakcc support that yet? Last time I tried it, there were some pretty huge error messages. For now I've been coping with a pinned version.

I'm fascinated that Anthropic employees, who are supposed to be the LLM experts, are using tricks like these which go against how LLMs seem to work.

Key example for me was the "malware" tool call section that included a snippet with intent "if it's malware, refuse to edit the file". Yet because it appears dozens of times in a convo, eventually the LLM gets confused and will refuse to edit a file that is not malware.

I've resorted to using tweakcc to patch many of these well-intentioned sections and re-work them to avoid LLM pitfalls.


These aren't as much tricks as just one layer of defense. But prompting is useless, as you can use the API directly without these prompts.

I run claude code with my own system prompt and toolings on top of it. tweakcc broke too often and had too many glitches.


They aren’t necessarily experts at using LLMs. They have different incentives as well.

Was that an Anthropic issue, or a gpt-oss problem?

Using tweakcc I can see the system prompt is supposed to mean “if it’s malware, refuse to improve or augment the code”. But due to all the malware noise it’s confusing the instruction as “don’t improve or augment after reading”.

I thought this was integral to LLM context design. LLMs can’t prompt their way to controls like this. Surprised they took such a hard headed approach to try and manage cybersecurity risks.


Sometimes it feels like as developers we live in a bubble. Don't most jobs endanger human development? I can't help but think about all the billions of factory, food service, assembly line type jobs. Do these not threaten "human development"? My cynical take would be that all AI endangers is "white collar" work.

I think you're not wrong and I also think the author is not wrong -- and this just may be how technology/civilization/humans are going to change inevitably?

For example, a possible trajectory might be that many years in the future, because human thinking has degraded due to AI-assisted cognition, most people get a chip implant and AI assistance becomes integrated with the brain. Basically the same pattern as most everything else -- technological augments solve for the new reality. I'm not saying this will happen, just that it's a possible outcome.


Doesn’t that sound lovely
