Why? WHY?! Why the heck are you using a (D)VFS on your immutable data? What is the reasoning? That stuff is immutable and usually incremental... Just throw a proper syncing algorithm at it and sync with backups... that's all. I wonder about the logic behind this...
Docs and other files you often change are a completely different story. This is where a DVFS shines. I wrote my own very simple DVFS exactly for that case. You just create a directory, init the repo manager... and voilà. A disk-wide VFS is kinda useless, as most of your data there just sits...
I also used to use git-annex on my photos, but ended up getting frustrated with how slow it was and wrote aegis[1] to solve my use case.
I wrote a bit about why in the readme (see archiving vs backup). In my opinion, syncing, snapshots, and backup tools like restic are great but fundamentally solve a different problem from what I want out of an archive tool like aegis, git-annex, or boar[2].
I want my backups to be automatic and transparent, and for that restic is a great tool. But for my photos, my important documents, and other immutable data, I want to manually accept or reject any change that happens to them, since I might not always notice when something changes. For example, if I fat-finger an rm, or a bug in a program overwrites something and I don't notice.
I don't really need the versioning aspect too much, though sometimes I modify the photos a bit (e.g. rotating them). But all the other features are relevant for me, like having it distributed, syncing, having only part of the data on a particular node, etc.
So, what solution would be better for that? In the end it seems that other solutions provide a similar set of features. E.g. Syncthing.
But what's the downside with Git-annex over Syncthing or other solutions?
If you want two-way distributed syncing, that is a bit more complicated and error-prone, but most tools support it, even rsync. A simpler approach is to have a central primary node (whether it's a desktop or a storage box): when you sync, you copy data there and then sync it out to backups.
As I said, handling immutable (incremental) data is easy. You just copy and sync. Kinda trivial. The problem I personally had was all the important docs (and similar files) I work on. First, I wanted snapshots and history, in case of some mistake or failure. Data checksumming, because they are important. Also, full peer-to-peer syncing, because I have a desktop, servers, VMs, and a laptop, so I want to sync data around. And because I really like Git, a great VCS, I wanted something similar but for generic binary data. Hence my interest in a DVFS. At first I wanted a full-blown mountable DVFS, but that is complicated and much harder to make portable. The repository approach is easy to implement and is portable (Cygwin, Linux, UNIX, POSIX). Works like a charm.
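The repository approach described here (checksummed, snapshot-capable storage of binary files in a plain directory) can be sketched in a few lines. This is a hypothetical illustration of the idea, not the commenter's actual tool; all names (`ingest`, `snapshot`, the `objects/` and `snapshots/` layout) are made up:

```python
import hashlib
import json
import shutil
import time
from pathlib import Path

def ingest(repo: Path, src: Path) -> str:
    """Copy a file into a content-addressed object store; return its hash."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    obj = repo / "objects" / digest[:2] / digest[2:]
    if not obj.exists():  # dedup: identical content is stored only once
        obj.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, obj)
    return digest

def snapshot(repo: Path, workdir: Path) -> Path:
    """Record a manifest mapping every path in workdir to its content hash."""
    manifest = {str(p.relative_to(workdir)): ingest(repo, p)
                for p in sorted(workdir.rglob("*")) if p.is_file()}
    out = repo / "snapshots" / f"{int(time.time() * 1000)}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(manifest, indent=2))
    return out
```

Because each snapshot records content hashes, re-hashing the working tree later and comparing against a stored manifest would catch a fat-fingered rm or a silent overwrite, which is exactly the checksumming benefit mentioned above.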
As for downsides: if you think git-annex will work for you, just use it :) For me, it was far too complicated (too many moving parts) even for my DVFS use case.
For immutable data it is absolute overkill to keep 100s of GBs of data there.
I just sync :)
> Why the heck are you using (D)VFS on your immutable data?
Git-annex does not put your data in Git. What it tracks using Git is what’s available where, updating that data on an eventually consistent basis whenever two storage sites come into contact. It also borrows Git functionality for tracking moves, renames, etc. The object-storage parts, on the other hand, are essentially a separate content-addressable store from the normal one Git uses for its objects.
(The concrete form of a git-annex worktree is a Git-tracked tree of symlinks pointing to .git/annex/objects under the repo root, where the actual data is stored as read-only files, plus location-tracking data indexed by object hash in a separate branch called “git-annex”, which the git-annex commands manipulate using special merge strategies.)
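That layout can be mimicked in a handful of lines. The sketch below is a simplified stand-in, not git-annex's real key format or code: it moves a file's content into an object store under `.git/annex/objects`, makes the stored copy read-only, and leaves a relative symlink behind, which is the shape of an annexed worktree described above.

```python
import hashlib
import os
import stat
from pathlib import Path

def annex(worktree: Path, relpath: str) -> Path:
    """Replace worktree/relpath with a symlink into .git/annex/objects,
    keyed by content hash (simplified stand-in for git-annex's scheme)."""
    f = worktree / relpath
    digest = hashlib.sha256(f.read_bytes()).hexdigest()
    obj = worktree / ".git" / "annex" / "objects" / digest
    obj.parent.mkdir(parents=True, exist_ok=True)
    if not obj.exists():
        f.rename(obj)  # move content into the store
        obj.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only
    else:
        f.unlink()  # identical content is already annexed
    # relative link, so the worktree can be moved as a whole
    f.symlink_to(os.path.relpath(obj, f.parent))
    return obj
```

Git then only has to track the small symlink; in real git-annex, the "what's available where" bookkeeping lives in the separate "git-annex" branch rather than in this store.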
I am looking into using Git for the external HDDs where I back up my photos/videos, and the reasoning is simple. It's not about keeping track of changes within the files themselves, since, like you said, they (almost) never change. Rather, it's about keeping track of changes in _folders_. That is, I want to keep track of when I last copied images from my phones, cameras, etc. to my HDDs, which folders I touched, what the changes are if I reorganized existing files into a different folder structure, etc. It also acts as a rollback mechanism if I ever fat-finger and delete something accidentally. I wonder if there's a better tool for this, though.
Then I think some syncing software like rsync will probably serve you better. Not sure how often you keep changing archived folders. I split that work between TRASH-like dirs and archives. When I'm done with files, I move them out of TRASH to their proper place and that's it. I prefer the KISS approach, but whatever works for you :)
Why... not? Git just works for syncing data and version control, and we're all familiar with it. It is also secure, reliable, available everywhere, decentralized, with built-in access control, deduplication, e2ee with git-crypt... In short, it is great.
The problem is performance in some use cases, but I don't see anything fundamentally wrong with using git for sync.
Git wasn't designed for generic binary blob handling. Sure, if your repo is small and you set proper .gitattributes, it will work fine. But I would advise using a generic DVFS for such a task.
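If you do go the .gitattributes route, the usual setup is something along these lines: the `binary` macro attribute marks matching paths as opaque blobs, so Git skips text conversion and diff/merge attempts on them. The extensions below are just examples, not a complete list:

```
# treat media blobs as binary: no text conversion, no diff, no merge
*.jpg binary
*.raw binary
*.mp4 binary
*.zip binary
```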