
bobkb · 2026-06-09T18:25:13 1781029513

In an interesting coincidence, I ended up watching Person of Interest S4 E5 while reading the announcement. The episode showed some code supposedly belonging to an AI.

Fable 5 said the first screenshot is from “IDA Pro’s Hex-Rays decompiler” and shows a Windows driver. The second screenshot triggered the safety guardrails and pushed me into Haiku.

Apparently the code is Windows driver code.

bobkb · 2026-06-07T16:18:34 1780849114

It’s impossible to write a spec in natural language that is unambiguous, complete, and correct. Thus prompts will always generate unreliable software.

bobkb · 2026-06-07T16:11:19 1780848679

IMHO, even if we are using auditing tools, we must use deterministic tools for critical analysis like this. Such rule- and pattern-based systems may not scale beyond a certain point, but they can be accurate.

bobkb · 2026-06-07T10:20:46 1780827646

At work we are now in the process of migrating away from Figma. We had spent years perfecting our Figma-based design workflow. Currently we are moving all the designs into the code itself using Storybook. The remaining gap is reviews and feedback, which is now addressed by Chromatic.

bobkb · 2026-06-06T14:26:13 1780755973

I tried building a deliberately vague project around managing MCP servers [0]. The purpose was to find out what LLMs and agents can do. While the project didn’t get anywhere, I was amazed by how far it’s possible to navigate even with no clear direction. The ability of the “glorified auto-complete” system to pull off something of this sort was an eye-opener for me.

0. https://github.com/bobinson/aop1

bobkb · 2026-06-05T11:02:51 1780657371

False positives from the deterministic audits are a very difficult problem to address. Comparing and deduplicating findings across different methods or LLM audits seems to be the only way.
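A minimal sketch of what that comparison could look like, assuming each tool's findings can be normalized to a file, a line, and a category. The field names and the sample findings are made up for illustration; they are not the output format of any real scanner. Findings reported by more than one source are kept as "confirmed", which only raises confidence and does not prove a true positive.

```python
from collections import defaultdict


def dedupe_findings(findings):
    """Merge findings that share the same file, line, and category."""
    groups = defaultdict(set)
    for finding in findings:
        # Hypothetical normalized key; real tools may need fuzzier matching.
        key = (finding["file"], finding["line"], finding["category"])
        groups[key].add(finding["source"])
    return [
        {"file": file, "line": line, "category": category,
         "sources": sorted(sources)}
        for (file, line, category), sources in groups.items()
    ]


# Illustrative data only.
findings = [
    {"source": "rules", "file": "app/db.py", "line": 42, "category": "sql-injection"},
    {"source": "llm", "file": "app/db.py", "line": 42, "category": "sql-injection"},
    {"source": "rules", "file": "app/auth.py", "line": 10, "category": "hardcoded-secret"},
]

merged = dedupe_findings(findings)
# Findings seen by at least two independent sources.
confirmed = [m for m in merged if len(m["sources"]) >= 2]
print(confirmed)
```

Exact matching on line numbers is the simplest choice here; tools that report slightly different lines for the same issue would need a tolerance window.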

bobkb · 2026-06-04T22:08:09 1780610889

I think these audit tools can look beyond just security and cover compliance audits as well. The ability to audit real targets in staging environments makes it easy to identify issues.

bobkb · 2026-06-04T22:05:56 1780610756

Very interesting.

I have been working on and using a similar tool for a while now:

https://github.com/bobinson/vulture

I have been struggling with false positives and using Claude + MCP as a poor man’s audit tool. Over the last few days I have found better results with NVIDIA-hosted models.

bobkb · 2026-06-01T14:08:10 1780322890

When will the npm issues stop? This has become a big pain!

bobkb · 2026-05-24T19:17:03 1779650223

That’s impressive!

On sheer performance, is it comparable to Opus?

stavros · 2026-05-24T22:41:41 1779662501

Here are my stats (from DeepSeek directly, with a script I wrote). The prices are what equivalent Sonnet usage would have cost; the actual amount I paid was $10. On performance, DeepSeek V4 Pro is comparable to Sonnet for me.

    ./cost.py amount-2026-5.csv 0.3 3.75 15
    input_cache_hit_tokens: 472,971,520 tokens -> $141.8915
    input_cache_miss_tokens: 13,299,013 tokens -> $49.8713
    output_tokens: 3,334,962 tokens -> $50.0244
    cache hit rate: 97.27% (472,971,520/486,270,533)
    cache miss rate: 2.73% (13,299,013/486,270,533)
    total: $241.7872

All of this usage was with an OpenCode subagent exclusively.
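For anyone who wants to check the arithmetic, here is a minimal sketch that reproduces the figures above. It is not the actual cost.py (that script isn't shown); it assumes the three command-line numbers are USD per million tokens for cache-hit input, cache-miss input, and output, in that order. The function name and the dictionary layout are my own.

```python
def estimate_cost(tokens, prices_per_million):
    """Return (per-category cost, total cost) in USD."""
    costs = {
        name: count / 1_000_000 * prices_per_million[name]
        for name, count in tokens.items()
    }
    return costs, sum(costs.values())


# Token counts copied from the output above.
tokens = {
    "input_cache_hit_tokens": 472_971_520,
    "input_cache_miss_tokens": 13_299_013,
    "output_tokens": 3_334_962,
}
# Assumed meaning of the arguments "0.3 3.75 15": USD per million tokens.
prices = {
    "input_cache_hit_tokens": 0.3,
    "input_cache_miss_tokens": 3.75,
    "output_tokens": 15,
}

costs, total = estimate_cost(tokens, prices)
input_total = (tokens["input_cache_hit_tokens"]
               + tokens["input_cache_miss_tokens"])
hit_rate = tokens["input_cache_hit_tokens"] / input_total

for name, cost in costs.items():
    print(f"{name}: {tokens[name]:,} tokens -> ${cost:.4f}")
print(f"cache hit rate: {hit_rate:.2%}")
print(f"total: ${total:.4f}")
```

Under those assumptions the per-category costs, the cache hit rate, and the total match the numbers in the output above.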