
Fun fact: archaeological evidence from I Ching divinatory records shows a hexagram distribution different from the one produced by the yarrow stalk method. This suggests that, while it is now considered the traditional method, it was likely not the original approach.


That's a really cool fact about the archaeology!

To be honest, my reason for picking this method was simple: I was reading a book about the I Ching that described the different ways to cast hexagrams.

The Yarrow Stalk method stood out to me because it felt more mysterious: in the past, it seemed like a secret method known only to a few experts.

Also, from a coding perspective, this algorithm was just much more interesting to build than a simple coin toss!
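(For the curious: part of what makes it more interesting than a coin toss is that the yarrow stalk procedure yields an asymmetric distribution over line types, commonly given as 1/16 old yin, 3/16 old yang, 5/16 young yang and 7/16 young yin, versus 1/8, 1/8, 3/8, 3/8 for three coins. The rough sketch below just samples those probabilities directly rather than simulating the full stalk-splitting ritual.)

```python
import random

# Commonly cited yarrow stalk probabilities for each line type (out of 16).
# The full ritual repeatedly splits a bundle of 49 stalks; this sketch
# samples the resulting distribution directly instead.
YARROW_LINES = {
    "old yin (6)": 1,
    "old yang (9)": 3,
    "young yang (7)": 5,
    "young yin (8)": 7,
}

def cast_line() -> str:
    """Draw one hexagram line with yarrow stalk probabilities."""
    return random.choices(list(YARROW_LINES), weights=list(YARROW_LINES.values()))[0]

def cast_hexagram() -> list[str]:
    """A hexagram is six lines, cast bottom to top."""
    return [cast_line() for _ in range(6)]

if __name__ == "__main__":
    for i, line in enumerate(cast_hexagram(), start=1):
        print(f"line {i}: {line}")
```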


Naive question: could this have been survivorship bias? Could certain ones not have been written down or kept with the others?


I doubt it. The I Ching does not really have bad / low-interest hexagrams. Also, historians who have studied the topic seem pretty sure that the yarrow stalk method is a recent introduction (by I Ching standards; we are talking about a Bronze Age divination tool...).


So far I have seen two genuinely good arguments for the use of MCPs:

* They can encapsulate (API) credentials, keeping those out of reach of the model,

* Contrary to APIs, they can change their interface whenever they want and with little consequence.


> * Contrary to APIs, they can change their interface whenever they want and with little consequence.

I have made this argument before, but that's not entirely right. I understand that this is how everybody is doing it right now, but that in itself causes issues for more advanced harnesses. I have one that exposes MCP tools as function calls in code, and it encourages the agent to materialize composed MCP calls into scripts on the file system.

If the MCP server decides to change the tools, those scripts break. The same issue applies to what Vercel is advocating for [1].
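For illustration, here is a hypothetical example of such a materialized script; the `mcp_client` object and both tool names are made up, the point is only that the saved script hard-codes today's tool interface:

```python
# Hypothetical materialized script, roughly what an agent in such a harness
# might save to disk. The `mcp_client` wrapper and both tool names are made up.

def triage_open_bugs(mcp_client, repo: str) -> None:
    # This hard-codes today's tool names and parameters. If the MCP server
    # later renames `search_issues` or changes its arguments, the saved
    # script breaks, whereas a live agent would simply read the updated
    # tool descriptions and adapt.
    issues = mcp_client.call("search_issues", query=f"repo:{repo} is:open label:bug")
    for issue in issues:
        mcp_client.call("add_comment", issue_id=issue["id"], body="Auto-triaged.")
```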

[1]: https://vercel.com/blog/generate-static-ai-sdk-tools-from-mc...


Wouldn't the answer to this be to have the agent generate a new materialized workflow, though? You already presumably have automated the agent's ability to create these workflows based on some prompting and a set of MCP servers.


But... you have to give the MCP the creds somehow. Maybe it's via a file on disk (bad), maybe via an env var (less bad). Maybe you do it via your password CLI that you biometrically auth to, which involves a timeout of some sort for security, but that often means you can't leave an agent unattended.

In any case, how is any of this better than a CLI? CLIs have the same access models and tradeoffs, and a persistent agent will plumb the depths of your file system and environment to find a token to do a thing if your prompt was “do a thing, use tool/mcp/cli”.

So where is this encapsulation benefit?


MCP is easy to self-host. The model? A little less so.


You're not wrong, but I figured I'd point out the cons / alternatives:

> They can encapsulate (API) credentials, keeping those out of reach of the model

An alternative to MCP, which would still provide this: code (as suggested in https://www.anthropic.com/engineering/code-execution-with-mc... and https://blog.cloudflare.com/code-mode/).

Put the creds in a file, or a secret manager of some sort, and let the LLM write code to read and use the creds. The downside is that you'd need to review the code to make sure that it isn't printing (or otherwise moving) the credentials, but then again you should probably be reviewing what the LLM is doing anyway.
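As a rough sketch of that pattern (the file path, endpoint, and helper names are all placeholders): the model-written code reads the secret at runtime and returns only data, so the credential itself never needs to enter the model's context.

```python
# Sketch of the "let the LLM write code that uses the creds" pattern.
# The path and endpoint are placeholders; the point is that the secret
# is read at runtime and never echoed back into the model's context.
import json
import pathlib
import urllib.request

def load_api_key(path: str = "~/.config/myapp/credentials.json") -> str:
    creds = json.loads(pathlib.Path(path).expanduser().read_text())
    return creds["api_key"]

def fetch_report(report_id: str) -> dict:
    key = load_api_key()
    req = urllib.request.Request(
        f"https://api.example.com/reports/{report_id}",
        headers={"Authorization": f"Bearer {key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # return the data, not the credential

if __name__ == "__main__":
    print(fetch_report("42"))
```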

> * Contrary to APIs, they can change their interface whenever they want and with little consequence.

The upside is as stated, but the downside is that you're always polluting the context window with MCP tool descriptions.


What's the alternative design where the model has access to API credentials?


> What's the alternative design where the model has access to API credentials?

All sorts of ways this can happen, but it usually boils down to leaving them on disk or in an environment variable in the repo/dir(s) the agent is operating in.


What about things like rate limiting? How are those implemented? Any good reads?


Oh! That's a nice use case and not too far from stuff I have been playing with! (Happily, I do not have to deal with handwriting, just bad scans of older newspapers and texts.)

I can vouch for the fact that LLMs are great at searching in the original language, summarizing key points to let you know whether a document might be of interest, then providing you with a translation where you need one.

The fun part has been building tools to turn Claude Code and Codex CLI into capable research assistants for that type of project.


> The fun part has been building tools to turn Claude Code and Codex CLI into capable research assistants for that type of project.

What does that look like? How well does it work?

I ended up writing a research TUI with my own higher-level orchestration (basically having the thing keep working in a loop until a budget has been reached) and document extraction.


I started with a UI that sounds like it was built along the same lines as yours, which had the advantage of letting me enforce a pipeline and exhaustive searches (I don't want the 10 most promising documents, I want all of them).

But I realized I was not using it much because it was too big and inflexible (plus I keep wanting to stamp out all the bugs, which I do not have the time to do on a hobby project). So I ended up extracting it into MCPs (equipped to do full-text search and download OCR from the various databases I care about) and AGENTS.md files (defining pipelines, as well as patterns for both searching behavior and reporting of results). I also put together a sub-agent for translation (cutting away all tools besides reading and writing files, and giving it some document-specific contextual information).

That lets me use Claude Code and Codex CLI (which, anecdotally, I have found to be the better of the two for that kind of work; it seems to deal better with longer inputs produced by searches) as the driver, telling them what I am researching and maybe how I would structure the search, then letting them run in the background before checking their report and steering the search based on that.

It is not perfect (if a search surfaces 300 promising documents, it will not check all of them, and it often misunderstands things due to lacking further context), but I now find myself reaching for it regularly, and I polish out problems one at a time. The next goal is to add more data sources and to maybe unify things further.


> It is not perfect (if a search surfaces 300 promising documents, it will not check all of them, and it often misunderstands things due to lacking further context)

This has been the biggest problem for me too. I jokingly call it the LLM halting problem because it never knows the proper time to stop working on something, finishing way too fast without going through each item in the list. That’s why I’ve been doing my own custom orchestration, drip feeding it results with a mix of summarization and content extraction to keep the context from different documents chained together.

Especially working with unindexed content like colonial documents where I’m searching through thousands of pages spread (as JPEGs) over hundreds of documents for a single one that’s relevant to my research, but there are latent mentions of a name that ties them all together (like a minor member of an expedition giving relevant testimony in an unrelated case). It turns into a messy web of named entity recognition and a bunch of more classical NLU tasks, except done with an LLM because I’m lazy.
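A minimal sketch of what I mean by that drip-feeding loop; `summarize`, `extract_relevant`, and the budget handling are stand-ins for the real thing:

```python
# Rough shape of a "drip feed" orchestration loop: hand the model one
# document at a time, keep a rolling digest, and stop on an explicit budget
# instead of trusting the model to decide when it is done.

def run_research_loop(documents, summarize, extract_relevant, budget: int) -> str:
    digest = ""  # rolling context carried from document to document
    spent = 0
    for doc in documents:
        if spent >= budget:
            break
        relevant = extract_relevant(doc, context=digest)
        if relevant:
            digest = summarize(previous=digest, new_findings=relevant)
        spent += 1  # could also be counted in tokens or dollars
    return digest
```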


The paper[0] is actually about their logarithmic number system. Deep learning is given as an example, and their reference implementation is in PyTorch, but it is far from the only application.

Anything involving a large number of multiplications that produce extremely small or extremely large numbers could make use of their number representation.

It builds on existing complex number implementations, making it fairly easy to implement in software and relatively efficient. They provide implementations of a number of common operations, including the dot product (building on PyTorch's preexisting log-sum-exp, which has been numerically stabilized by experts) and matrix multiplication.

The main downside is that this is a very specialized number system: if you care about things other than chains of multiplications (say... addition?), then you should probably use classical floating-point numbers.
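To give a flavor of the core idea (positive numbers only; the paper's complex-valued representation also carries signs, which this sketch does not): store log(x) instead of x, turn long products into sums, and fall back on a numerically stable log-sum-exp when you do need an addition.

```python
# Minimal illustration of log-domain arithmetic for positive numbers only.
# The paper's representation is richer; this just shows why long chains of
# multiplications benefit.
import torch

probs = torch.full((1000,), 1e-30)           # product underflows in floating point
naive_product = torch.prod(probs)            # -> 0.0, all information lost

log_probs = torch.log(probs)                 # work in log space instead
log_product = log_probs.sum()                # log of the product, finite: 1000 * log(1e-30)

# "Addition" in log space uses log-sum-exp, which PyTorch stabilizes internally:
log_sum = torch.logsumexp(log_probs, dim=0)  # log(sum(probs)) without underflow

print(naive_product.item())   # 0.0
print(log_product.item())     # ~ -69077.55
print(log_sum.item())         # ~ log(1e-27) ~ -62.17
```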

[0]: https://arxiv.org/abs/2510.03426


I have found it useful to put the spec together with a model, having it try to find blind spots and write down the final take in clear and concise language.

A good next step is to have the model provide a detailed step-by-step plan to implement the spec.

Both steps are best done with a strong planning model like Claude Opus or ChatGPT5, having it write "for my developer", before switching to something like Claude Code.


I have found Claude Code to be significantly better, both in how good the model ends up being and in how polished it is. To the point that I do not drop down to Gemini CLI when I reach my Claude usage limit.


The first step is to acquire hardware fast enough to run one query quickly (and yes, for some model sizes you are looking at sharding the model and doing distributed runs). The next is to batch requests, which improves GPU utilization significantly.

Take a look at vLLM for an open source solution that is pretty close to the state of the art when it comes to handling many user queries: https://docs.vllm.ai/en/stable/
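A minimal sketch of the batched side with vLLM (the model name here is just an example; swap in whatever you are serving):

```python
# Minimal vLLM example: the engine batches these prompts together
# (continuous batching), which is where the big GPU utilization win comes from.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model, swap for your own
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain the KV cache in one sentence.",
    "Write a haiku about GPUs.",
    "What is continuous batching?",
]

for output in llm.generate(prompts, params):
    print(output.prompt)
    print(output.outputs[0].text)
```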


One thing I could not find on a cursory read is how accustomed those developers already were to AI tools. I would expect someone who uses them regularly to benefit, while someone who has only played with them a couple of times would likely be slowed down by the friction of learning to be productive with the tool.


In this case though you still wouldn't necessarily know if the AI tools had a positive causal effect. For example, I practically live in Emacs. Take that away and no doubt I would be immensely less effective. That Emacs improves my productivity and without it I am much worse in no way implies that Emacs is better than the alternatives.

I feel like a proper study for this would involve following multiple developers over time, tracking how their contribution patterns and social standing changes. For example, take three cohorts of relatively new developers: instruct one to go all in on agentic development, one to freely use AI tools, and one prohibited from AI tools. Then teach these developers open source (like a course off of this book: https://pragprog.com/titles/a-vbopens/forge-your-future-with...) and have them work for a year to become part of a project of their choosing. Then in the end, track a number of metrics such as leadership position in community, coding/non-coding contributions, emotional connection to project, social connections made with community, knowledge of code base, etc.

Personally, my prior is that the no-AI group would likely still be ahead overall.


FWIW, LLM tooling for Emacs is great. gptel, for example, allows you to converse with a wide range of different models from anywhere in Emacs: you can spontaneously send requests while typing some text or even browsing the M-x menu. I often do things like "summarize current paragraph in pdf document" or "create a few anki cards based on this web page content", etc.


Yes! I recently had to manually answer and close a GitHub issue telling me I might have pushed an API key to GitHub. No, "API_KEY=put-your-key-here;" is a placeholder, and I should not have to waste time writing that.


I don't use it to avoid reading man pages. Rather, as is often the case with LLMs, it is a faster way to do things I already know how to do: it looks at the commands I run in various situations and types them for me, faster than I can remember the name of a flag I use weekly with a PDF processing tool or type 5 consecutive shell commands.

Money-wise, my full usage so far (including purposely running large inputs/outputs to stress test it) has cost me... 19 cents. And I am not even using the cheapest model available. But you could also run it with a local model.

