It doesn't override git diff at all; sem is its own standalone CLI, and git diff continues to work exactly as before. You only run sem setup when you want to change your default git diff behavior; otherwise, after installing sem you can use it straight away through the sem commands.
If I were you, I'd remove the setup/unsetup commands and replace them with a note: if you want to use it for git diff, here's what to put in your config, or suggest aliasing it as git sdiff or whatever.
Ah okay, thank you! Is the MCP server manually configured, or is there documentation on the suggested way to tell an agent to use sem? My guess was that setup was how to do that.
No, setup just configures your git diff to use sem by default. You will find the sem mcp directory in the GitHub repository, and there's also a skill.md file that tells your agent how to use sem.
sem doesn't override git diff, it's a completely separate command (sem diff). Your regular git diff should work exactly as it always has after installing sem.
If you want to change git diff's default behavior, you can run sem setup.
That’s not clear at all from the docs. It shouldn’t be called “setup” then. Even after doing sem setup there should be a CLI flag to get the default diff output without unsetting up. Very annoying hijack.
Sorry if you consider that a hijack; it was just a user's request to use this as the default plugin on their git. But I will add a note to let users know. Thanks for the feedback.
This is actually the exact scenario we just spent the last few weeks optimizing for. On a 71K-file TypeScript monorepo, sem was previously choking entirely (DNF), and now completes in 6.5s with the topology cache warm. On a 100K-file generated fixture, sem impact went from 90s cold down to about 1s warm. The key was building a SQLite-backed cache that stores the dependency graph structure so repeat runs skip re-parsing unchanged files entirely.
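The thread doesn't show sem's actual cache schema, so here is a minimal Python sketch of the general idea, assuming a per-file SQLite table keyed by path that stores a content hash next to the dependency edges extracted from that file. The table name `file_deps`, the functions `build_graph` and `toy_parse`, and the toy "import" syntax are all hypothetical.

```python
import hashlib
import json
import sqlite3
from typing import Callable

# Hypothetical schema: one row per file with its content hash and the
# dependency edges parsed out of it (stored as a JSON list of paths).
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_deps (
    path TEXT PRIMARY KEY,
    content_hash TEXT NOT NULL,
    deps_json TEXT NOT NULL
)
"""


def content_hash(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def build_graph(
    conn: sqlite3.Connection,
    files: dict[str, str],
    parse_deps: Callable[[str], list[str]],
) -> tuple[dict[str, list[str]], int]:
    """Return (path -> deps, number of files actually parsed).

    Files whose hash matches the cached row reuse the stored edges;
    only new or changed files go through the (expensive) parser.
    """
    conn.execute(SCHEMA)
    graph: dict[str, list[str]] = {}
    parsed = 0
    for path, source in files.items():
        h = content_hash(source)
        row = conn.execute(
            "SELECT content_hash, deps_json FROM file_deps WHERE path = ?", (path,)
        ).fetchone()
        if row is not None and row[0] == h:
            graph[path] = json.loads(row[1])  # cache hit: skip parsing
            continue
        deps = parse_deps(source)  # cache miss: parse and store the edges
        parsed += 1
        conn.execute(
            "INSERT OR REPLACE INTO file_deps (path, content_hash, deps_json) "
            "VALUES (?, ?, ?)",
            (path, h, json.dumps(deps)),
        )
        graph[path] = deps
    conn.commit()
    return graph, parsed


def toy_parse(source: str) -> list[str]:
    # Stand-in parser: every "import X" line is an edge to X.
    return [line.split()[1] for line in source.splitlines() if line.startswith("import ")]


conn = sqlite3.connect(":memory:")
files = {"a.ts": "import b.ts\n", "b.ts": "import c.ts\n", "c.ts": ""}
graph_cold, parsed_cold = build_graph(conn, files, toy_parse)  # cold run: all 3 files parsed
files["c.ts"] = "import a.ts\n"
graph_warm, parsed_warm = build_graph(conn, files, toy_parse)  # warm run: only c.ts parsed
```

On the warm run only the edited file is re-parsed; the other two reuse their cached edges. A fuller version would also delete rows for files that no longer exist.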
Thx! Oh, and maybe don't call it sem. It's not really semantic; it's more like a big-picture view vs. the ground-level git lines. How about "bye", short for bird's-eye?
Thanks! The data artifacts angle is really interesting. In some ways the problem is even harder there, because data pipelines have less explicit structure than code, I guess.
The artifacts themselves have more structure, but diffing is hard because of size: what exactly do you show in the diff? Row-level changes? Summary statistics? How do you keep it from getting slow on bigger datasets?
Then there are plots saved as images, which have basically no structure exposed at all.
Row-level and summary-stat diffs are both diffs over values: they can tell you that something changed, but not whether the *meaning* has changed. What I'm working on is providing more information on how the meaning changes.
The questions I'd like the diffing to answer are more like: will the grain go from one-row-per-user to one-row-per-user-per-day, will a key stop being unique, will a join start fanning out and quietly double a measure, will something additive become non-additive?
This diff is over structure, but that structure is latent in the transformation that produces the data. To make things harder, if the transformation is written in a declarative language (e.g. SQL), the code doesn't even describe how things get done, only what the output should be.
What I've ended up doing is recovering the structure from the code by analyzing it, and then using *cheap* profiling rather than a full row compare.
As an example, the output of my equivalent of the impact sub-command would be something like: "this change makes account_id non-unique three models downstream"
Git is actually great, and it doesn't have as many issues as people say. Building complementary layers that make it even stronger is the best bet, I guess.
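To make those questions concrete, here is a brute-force Python sketch that checks three of them (grain change, key uniqueness, and a join fan-out inflating a sum) directly on materialized toy rows. It is only an illustration: it compares actual data, which is exactly the expensive approach the commenter is trying to avoid, and it does not cover additivity. All table and function names are made up.

```python
from collections import Counter
from typing import Any, Sequence

Row = dict[str, Any]


def is_unique(rows: Sequence[Row], key: tuple[str, ...]) -> bool:
    """True if no two rows share a value for the key columns, i.e. the key is the grain."""
    counts = Counter(tuple(row[col] for col in key) for row in rows)
    return all(n == 1 for n in counts.values())


def join_fanout(left: Sequence[Row], right: Sequence[Row], on: str) -> int:
    """Largest number of right-side matches for any left row; > 1 means the join fans out."""
    right_counts = Counter(row[on] for row in right)
    return max((right_counts.get(row[on], 0) for row in left), default=0)


def inner_join(left: Sequence[Row], right: Sequence[Row], on: str) -> list[Row]:
    """Naive nested-loop inner join, good enough for a toy example."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[on] == rrow[on]]


# Toy data: users carry an additive measure; the sessions model changes grain
# from one-row-per-user (v1) to one-row-per-user-per-day (v2).
users = [{"user_id": 1, "revenue": 10}, {"user_id": 2, "revenue": 5}]
sessions_v1 = [{"user_id": 1}, {"user_id": 2}]
sessions_v2 = [
    {"user_id": 1, "day": "mon"},
    {"user_id": 1, "day": "tue"},
    {"user_id": 2, "day": "mon"},
]

grain_was_user = is_unique(sessions_v1, ("user_id",))           # True
grain_is_user = is_unique(sessions_v2, ("user_id",))            # False: key stopped being unique
grain_is_user_day = is_unique(sessions_v2, ("user_id", "day"))  # True: the new grain
fanout = join_fanout(users, sessions_v2, "user_id")             # 2: the join now fans out

revenue_before = sum(r["revenue"] for r in inner_join(users, sessions_v1, "user_id"))  # 15
revenue_after = sum(r["revenue"] for r in inner_join(users, sessions_v2, "user_id"))   # 25
```

The revenue for user 1 is counted twice after the grain change, which is the "quietly double a measure" case.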
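One way to read "cheap profiling" is pushing a couple of aggregate queries into the database instead of diffing rows. The sketch below assumes an analysis step has already said which model and key columns to check, and uses an in-memory SQLite database with hypothetical `accounts_before`/`accounts_after` views; it is not the commenter's implementation.

```python
import sqlite3
from typing import Sequence


def profile_key(conn: sqlite3.Connection, table: str, key_cols: Sequence[str]) -> dict:
    """Check key uniqueness with two aggregate queries instead of a row-by-row compare.

    `table` and `key_cols` are assumed to come from static analysis of the
    model code (trusted identifiers), so they are interpolated directly;
    never pass user input here.
    """
    cols = ", ".join(key_cols)
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # SQLite's COUNT(DISTINCT ...) accepts a single expression, so count
    # distinct key tuples via a subquery to support composite keys.
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
    ).fetchone()[0]
    return {"rows": total, "distinct_keys": distinct, "unique": total == distinct}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (account_id INTEGER, plan TEXT);
    INSERT INTO accounts VALUES (1, 'free'), (2, 'pro');
    CREATE TABLE contacts (account_id INTEGER, email TEXT);
    INSERT INTO contacts VALUES
        (1, 'a@example.com'), (1, 'b@example.com'), (2, 'c@example.com');

    -- Model before the change: one row per account.
    CREATE VIEW accounts_before AS
        SELECT account_id, plan FROM accounts;

    -- Model after the change: a join to contacts fans out on account_id.
    CREATE VIEW accounts_after AS
        SELECT a.account_id, a.plan, c.email
        FROM accounts AS a JOIN contacts AS c ON a.account_id = c.account_id;
    """
)

before = profile_key(conn, "accounts_before", ["account_id"])
after = profile_key(conn, "accounts_after", ["account_id"])
```

Each check returns a few numbers regardless of table size; on large warehouse tables you could go further with the approximate distinct-count functions many engines provide.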
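An impact report like that can be sketched as a breadth-first walk over the model DAG, starting at the changed model. In this Python sketch the per-model facts (`dedups_key`, `expects_unique_key`) are hypothetical and hand-declared, standing in for what the commenter's tool would infer from the SQL; the model names are invented too.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Model:
    # Hypothetical per-model facts that a real tool would infer from the SQL.
    children: list[str] = field(default_factory=list)
    dedups_key: bool = False          # e.g. GROUP BY account_id: duplicates collapse here
    expects_unique_key: bool = False  # the model's declared grain is one row per key


def impact(models: dict[str, Model], changed: str, key: str) -> list[str]:
    """BFS downstream from `changed`, reporting models whose expected-unique key gains duplicates."""
    messages = []
    queue = deque([(changed, 0)])
    seen = {changed}
    while queue:
        name, depth = queue.popleft()
        model = models[name]
        if model.dedups_key:
            continue  # duplicates are collapsed here, so stop propagating along this path
        if depth > 0 and model.expects_unique_key:
            unit = "model" if depth == 1 else "models"
            messages.append(
                f"this change makes {key} non-unique {depth} {unit} downstream ({name})"
            )
        for child in model.children:
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return messages


# Hypothetical DAG: stg_accounts starts emitting duplicate account_ids.
models = {
    "stg_accounts": Model(children=["int_accounts", "dim_accounts"]),
    "int_accounts": Model(children=["int_accounts_enriched"]),
    "int_accounts_enriched": Model(children=["fct_account_revenue"]),
    "fct_account_revenue": Model(expects_unique_key=True),
    "dim_accounts": Model(children=["rpt_accounts"], dedups_key=True),
    "rpt_accounts": Model(expects_unique_key=True),
}

report = impact(models, "stg_accounts", "account_id")
```

With this DAG, the duplicates reach `fct_account_revenue` three hops downstream, while the branch through `dim_accounts` is cut off because that model collapses duplicates.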