Totally agree. I've found that in many cases it's easier to roll your own software, or patch existing software, with AI than to open an issue, submit a PR, get it reviewed/merged, etc. Let alone buying software.
Yes, but this is the honeymoon period. A year from now, when you want to make three of the tools talk to each other and they're in three different languages, two of which you don't know, and there's no common interface or good place to put one, well, here's hoping you hung onto the design documents.
Maybe I'm just naive, but I've been making lots of my 'vibe-coded' tools interoperable already.
My assumption is that eventually the VC-backed gravy train of low-cost, good-quality LLM compute is going to dry up, and I'm going to have to make do with what I've gotten out of them.
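One cheap way to get the interoperability the parent describes, sketched here as an assumption rather than anything they say they actually do: treat every vibe-coded tool as a black-box process that speaks JSON over stdin/stdout, so the implementation language stops mattering. A minimal Python driver might look like this (the `echo_tool` one-liner stands in for any external tool and is purely illustrative):

```python
import json
import subprocess
import sys

def call_tool(cmd, payload):
    """Invoke a tool (written in any language) as a subprocess,
    sending a JSON request on stdin and parsing a JSON response
    from its stdout. The only "interface" is the JSON contract."""
    proc = subprocess.run(
        cmd,
        input=json.dumps(payload).encode(),
        stdout=subprocess.PIPE,
        check=True,
    )
    return json.loads(proc.stdout)

# Stand-in "tool" (hypothetical): a one-liner that echoes the request
# back with a field added. In practice this could be a Go binary, a
# Node script, or anything else that honors the same contract.
echo_tool = [
    sys.executable, "-c",
    "import json,sys; d=json.load(sys.stdin); d['ok']=True; print(json.dumps(d))",
]

result = call_tool(echo_tool, {"task": "summarize", "text": "hello"})
print(result)
```

The design choice here is that each tool stays a standalone executable, so there's no shared runtime to maintain when the compute gravy train ends.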
The finding that self-generated skills actually hurt (-1.3pp) while curated skills give +16.2pp is the most interesting result here imo. It's a big discrepancy, but it makes sense: it aligns with the idea that LLMs are better consumers of procedural knowledge than producers of it.
+4.5pp for software engineering is suspiciously low compared to +51.9pp for healthcare. I suspect this reflects that frontier models already have strong SWE priors from training data, so skills add less marginal value. If true, skills become most valuable precisely in the domains where models are weakest — which is where you'd actually want to deploy agents in production. That's encouraging.
> +4.5pp for software engineering is suspiciously low compared to +51.9pp for healthcare.
This stood out for me as well. I do think that LLMs have a lot of training data on software engineering topics and that perhaps explains the large discrepancy. My experience has been that if I am working with a software library or tool that is very new or not commonly used, skills really shine there. Example: Adobe React Spectrum UI library. Without skills, Opus 4.6 produces utter garbage when trying to use this library. With properly curated/created skills, it shines. Massive difference.
I feel similarly... OpenClaw has lots of vulnerabilities and it's very messy, but it also brought self-hosted, cron-based agentic workflows to your favorite messaging channel (iMessage, Telegram, Slack, WhatsApp, etc.), which shouldn't be overlooked.
Agreed, my experience and code quality with Claude Code and agentic workflows have dramatically improved since I invested in learning how to use these tools properly. Ralph Wiggum-based approaches and HumanLayer's agents/commands (in their .claude/) have boosted my productivity the most.
https://github.com/snwfdhmp/awesome-ralph
https://github.com/humanlayer
Built this because I wanted Claude Code to run untrusted snippets without touching my system, but Docker felt heavy. Uses jail.nix (bubblewrap) for isolation. Currently supports Python, Node, Bash with persistent REPL sessions.
Would love feedback on the interface design.
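For anyone unfamiliar with bubblewrap, here is a rough sketch of the kind of invocation a sandboxed REPL runner might build. Every flag below is a real `bwrap` option, but the specific policy (which paths are bound, the `/nix` mount, the `/work` session directory) is my assumption, not jail.nix's actual configuration:

```python
# Sketch (assumption, not the project's actual config): building a
# bubblewrap argv for running an untrusted interpreter.
def bwrap_argv(interpreter, workdir):
    return [
        "bwrap",
        "--unshare-all",             # fresh namespaces: no host network, pids, ipc
        "--die-with-parent",         # tear down the sandbox if the runner dies
        "--ro-bind", "/usr", "/usr", # read-only toolchain
        "--ro-bind", "/nix", "/nix", # Nix store, if present (assumption)
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", workdir, "/work",  # only the session dir is writable
        "--chdir", "/work",
        "--clearenv",                # don't leak the host environment
        interpreter,
    ]

argv = bwrap_argv("python3", "/var/lib/sandbox/session-1")
print(" ".join(argv))
```

Persistent REPL sessions would then just mean keeping the spawned process (and its `/work` bind) alive between snippets rather than re-exec'ing per call.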