Hacker News
TMSU is a tool for tagging your files (tmsu.org)
209 points by subleq on Dec 12, 2014 | 64 comments


I remember looking at this a few years ago. My main reason for not using it was the time it would have taken to tag all my files.

Basically, if you have so many files that it's worth tagging them, you need to spend a lot of time tagging, and if you have so few that it's reasonable to tag them all, you don't need this system. If you have a manageable number now but expect tagging to pay off in the near future, then this might be worth it.

I'd be interested in hearing from people who have found a use for TMSU.


Throwaway, but I've used it to organize downloaded porn. Downloaded videos tend to have terrible filenames, but reasonable titles or tags on the website. I have some scripts to scan through the filenames and automatically apply tags, but for the most part it still goes unused. It's a fun idea though.


A number of years ago my media filenames were getting out of hand, and after a drive crash I decided to start naming all media files manually with a basic structure. It has worked quite well and I've kept to the same basic idea for years, with more than 10k files named this way at the time I first save them.

It follows a fairly loose mental structure: Broad Category - [Sub-category -] [Series Title -] Name/Media Title (Creator, Source, Short Description/Tags)

Which usually looks something like:

* Keyboards - Keycaps - Warmaster (martinyeah, geekhack.org) - 2.jpg

* Paintings - Market In Cairo (Leopold Carl Müller, Middle Eastern desert).jpg

* Themes - Android - Themer (mycolorscreen.com) - Wrong Way (Victor Burgoa).png

* Trailers - Godzilla (2013) - Comic-Con Teaser.mp4

* Typefaces - Lettering - Gospel Waffle Sunday syrup logo time-lapse (Lauri Johnston).mp4

The basic idea has been to organize the filenames by broad to narrower info from left-to-right, which helps with quick filtering/searching and allows many categories of files to be kept in large single directories while being readable. By adding the creator and site where needed it also allows me to trace the source. Not a perfect naming system but it only takes a little longer to save while adding considerable value long-term.
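Because the levels run from broad to narrow, the names can be split back into fields with a few lines of code. A minimal Python sketch, assuming " - " separates levels and an optional trailing parenthetical holds creator/source/tags; `parse_media_name` and `matches_prefix` are hypothetical helpers based on my reading of the examples above, not a spec:

```python
import os
import re

# Matches "Title (extra, info)" and captures both parts.
_PAREN = re.compile(r"^(?P<title>.*?)\s*\((?P<info>[^()]*)\)$")


def parse_media_name(filename: str) -> list[dict]:
    """Split 'Broad - Sub - Title (Creator, Source).ext' into levels, broadest first."""
    stem, _ext = os.path.splitext(filename)
    levels = []
    for segment in stem.split(" - "):
        m = _PAREN.match(segment)
        if m:
            info = [part.strip() for part in m.group("info").split(",")]
            levels.append({"name": m.group("title"), "info": info})
        else:
            levels.append({"name": segment, "info": []})
    return levels


def matches_prefix(filename: str, *prefix: str) -> bool:
    """True if the leading levels equal the given prefix, e.g. ('Keyboards', 'Keycaps')."""
    names = [level["name"] for level in parse_media_name(filename)]
    return names[: len(prefix)] == list(prefix)
```

Prefix matching is exactly the "quick filtering" the ordering buys you: every file in a category shares the same leading levels.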


I have a large library of ebooks on many subjects. I organize them into folders, but it's very common for the topics they cover to intersect. I could tag them all with the folder name and then manually tag the subtopics each book covers.

edit: too bad it isn't in the Ubuntu repositories. I have a personal policy of avoiding programs from outside the repositories.


Try Calibre for managing ebooks, it has a built-in tag system.


I really like Calibre, but its tag system is pretty clunky. I don't think I can even tag multiple books at once.


It's definitely clunky, but you might like to know that you can tag more than one thing at once:

Select more than one book, right click -> Edit metadata -> Edit metadata in bulk


Not only can you have multiple tags, you can have icons that are conditionally displayed based on tags. Tags are stored in a single metadata file inside each ebook's directory (one directory per ebook), so they can be parsed with a script when performing bulk operations.

You can then customize a search engine like Recoll to display ebook metadata as contextual snippets in full-text search results. Best of both worlds.
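A minimal sketch of such a script, assuming each book directory holds a `metadata.opf` file in which tags appear as Dublin Core `dc:subject` elements (check this against your own library before relying on it). The helpers `read_tags` and `library_tags` are hypothetical:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Dublin Core namespace used for <dc:title>, <dc:subject>, etc. in OPF files.
DC = "{http://purl.org/dc/elements/1.1/}"


def read_tags(opf_bytes: bytes) -> tuple[str, list[str]]:
    """Return (title, tags) from one metadata.opf, treating dc:subject as tags."""
    root = ET.fromstring(opf_bytes)
    title = root.findtext(f".//{DC}title", default="")
    tags = [el.text.strip() for el in root.iter(f"{DC}subject") if el.text]
    return title, tags


def library_tags(library_root: str) -> dict[str, list[str]]:
    """Map every book directory below library_root to its tags."""
    return {
        str(opf.parent): read_tags(opf.read_bytes())[1]
        for opf in Path(library_root).rglob("metadata.opf")
    }
```

The resulting mapping could feed a bulk retagging step or a search indexer's metadata fields.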


A lot of files that end up on our machines come from the internet, and, at least on OSX (in most browsers), downloaded files are tagged with origin URLs [0].

It's a small step from there to a companion tool that crawls your filesystem for URLs, and attempts to classify them based on a keyword analysis of the origin.

[0] https://code.google.com/p/understand/wiki/MacOSMetadata
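A rough Python sketch of the classification half, assuming the origin metadata (`kMDItemWhereFroms`) is stored as a plist array of URL strings. Reading the raw attribute bytes is left out, and the keyword table is purely hypothetical:

```python
import plistlib
import re

# Hypothetical keyword -> tag table; a real tool would need a much richer model.
KEYWORD_TAGS = {"github": "code", "arxiv": "papers", "soundcloud": "music"}


def origin_urls(where_froms: bytes) -> list[str]:
    """Decode a kMDItemWhereFroms payload, assumed to be a plist array of URL strings."""
    value = plistlib.loads(where_froms)  # auto-detects XML vs. binary plists
    return [item for item in value if isinstance(item, str)]


def suggest_tags(urls: list[str]) -> set[str]:
    """Tag a file by matching keywords against the tokens of its origin URLs."""
    tags = set()
    for url in urls:
        tokens = set(re.split(r"[^a-z0-9]+", url.lower()))
        tags.update(tag for keyword, tag in KEYWORD_TAGS.items() if keyword in tokens)
    return tags
```

Splitting the URL into tokens keeps a keyword from matching inside an unrelated longer word.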


For similar reasons, I can't bring myself to use OSX's tagging feature; it just seems like so much effort without a good use case. I'm not sure it helps that the defaults are simply colours rather than real examples. And I don't really want to think about how files are shared (server, Dropbox, Google Drive, USB stick) and do all the necessary research into how tags work in each and every case. I too would be interested to hear if anyone has found an effective use for file-based tagging.


OS X tagging is sometimes handy for tasks over a short time frame. Picking out photos to upload to FB? Scan through them all and hit cmd-6 on the ones you want, maybe cmd-7 on the ones you need to crop. Sure, you could use Bridge, Aperture, etc., but having it built into the OS is nice.


I rely heavily on Spotlight to find the files I want, and about 99% of the time it works perfectly. Sometimes there is a file I am going to keep that I suspect will be hard to track down with Spotlight in the future, either because it has to be badly named (source code files), will not be easily indexable (scanned PDFs or images), or my interest in it is not directly connected to the content of the document (e.g. a description of an algorithm with an obscure name that I have a specific application in mind for).

In these cases I use the tags feature to augment the metadata so that I am pretty confident I will be able to find it quickly in future, even if I can't remember exactly what it was or why it was originally saved.


I tag all of my Firefox bookmarks. I've been meaning to export them out of Firefox and use some other kind of bookmark manager, or write my own (perhaps integrating it into Emacs).

Something like this might help, if I make each bookmark into a separate file.
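A sketch of the "one file per bookmark" step, assuming Firefox's JSON bookmark backup layout: nested `children`, with bookmark entries carrying `uri`, `title`, and a comma-separated `tags` string. Verify this against a real backup first. The helper names are hypothetical, and each file is written in the `[InternetShortcut]` `.url` format:

```python
import json
import os
import re


def iter_bookmarks(node: dict):
    """Walk the nested backup tree and yield every node that has a 'uri'."""
    if "uri" in node:
        yield node
    for child in node.get("children", []):
        yield from iter_bookmarks(child)


def export_bookmarks(backup_json: str, out_dir: str) -> list[tuple[str, list[str]]]:
    """Write one .url file per bookmark; return (path, tags) pairs for later tagging."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, bm in enumerate(iter_bookmarks(json.loads(backup_json))):
        title = bm.get("title") or bm["uri"]
        safe = re.sub(r"[^\w.-]+", "_", title)[:80]  # filesystem-safe, bounded length
        path = os.path.join(out_dir, f"{safe}-{i}.url")  # index suffix avoids collisions
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"[InternetShortcut]\nURL={bm['uri']}\n")
        tags = [t.strip() for t in bm.get("tags", "").split(",") if t.strip()]
        written.append((path, tags))
    return written
```

Each returned pair could then be passed to `tmsu tag <file> <tags...>`.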



I've read that you can import Firefox bookmark tags into Pinboard.


I haven't tried it yet, but at first glance, a lot of work and thought went into this tool: query parser, SQL, VFS layer, fingerprinting, file move detection ("repair"). The code looks pretty readable and well organised too.

Generally speaking, I can't help but feel we're not quite there yet when it comes to meta-data and flexible file system views. We're still coming up with all sorts of cataloguing systems for different types of binary files – I just implemented a Vorbis comment (music tag) parser the other day, and that's just one out of many formats.
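For a sense of what one of those formats looks like, here is a minimal Vorbis comment parser in Python. This is not the commenter's parser. It follows the published layout: a length-prefixed vendor string, a comment count, then length-prefixed `FIELD=value` UTF-8 strings, with all integers little-endian uint32:

```python
import struct


def parse_vorbis_comment(data: bytes) -> tuple[str, dict[str, list[str]]]:
    """Parse a raw Vorbis comment block into (vendor, {FIELD: [values]}).

    Expects the bare block as stored in FLAC; for Ogg Vorbis, strip the 7-byte
    packet header (0x03 followed by b"vorbis") first. Field names are
    case-insensitive, so they are upper-cased; a field may repeat.
    """
    offset = 0

    def read_u32() -> int:
        nonlocal offset
        (value,) = struct.unpack_from("<I", data, offset)  # little-endian uint32
        offset += 4
        return value

    def read_utf8() -> str:
        nonlocal offset
        length = read_u32()
        raw = data[offset : offset + length]
        if len(raw) != length:
            raise ValueError("truncated Vorbis comment")
        offset += length
        return raw.decode("utf-8")

    vendor = read_utf8()
    comments: dict[str, list[str]] = {}
    for _ in range(read_u32()):
        field, sep, value = read_utf8().partition("=")
        if sep:  # ignore malformed entries that lack '='
            comments.setdefault(field.upper(), []).append(value)
    return vendor, comments
```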

Extended attributes (attr(5), "user_xattr") are enabled by default these days on most Linux file systems, but are limited in space and I don't see a lot of tool implementations around the functionality.

Sure, there are some general brute-force search indexers, but overall I feel we still live in a Wild West meta-data age, in contrast to the many well-standardised open data formats we have.

It would be nice to be able to use a (music, movie, picture, porn, whatever) site's existing meta-data when downloading a file, or hide all work-related files from the file system when you're at home.


> Extended attributes (attr(5), "user_xattr")

Reminder: some programs will reset these extended attributes when the file is edited. Back when I was looking for a tagging program for the Mac, I read that Photoshop was one of those programs.

I would not trust putting in the effort to tag a bunch of files in a format so out of my control.


What those programs are likely actually doing is not deliberately resetting the attributes, but saving a file by (1) writing to a new file, and (2) moving the new file over the old file (thus destroying anything about the old file that they aren't aware of in the process.)

Not sure if this helps figure out a way around it.
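One way around it is for the saving program to copy the old file's attributes onto the new file before the rename. A best-effort Python sketch (the `os.*xattr` functions exist only on Linux; elsewhere this degrades to a plain atomic save). `shutil.copystat` would also copy xattrs on Linux, but it copies timestamps too, which a save should not do:

```python
import os
import shutil
import tempfile


def _copy_xattrs(src: str, dst: str) -> None:
    """Best-effort copy of extended attributes from src to dst."""
    if not hasattr(os, "listxattr"):
        return
    try:
        names = os.listxattr(src)
    except OSError:
        return  # filesystem without xattr support
    for name in names:
        try:
            os.setxattr(dst, name, os.getxattr(src, name))
        except OSError:
            pass  # e.g. security.* attributes we are not allowed to set


def save_preserving_xattrs(path: str, data: bytes) -> None:
    """Write-new-then-rename save that carries the old file's xattrs and mode over."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same dir, so the rename is atomic
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        if os.path.exists(path):
            shutil.copymode(path, tmp_path)
            _copy_xattrs(path, tmp_path)
        os.replace(tmp_path, path)  # without the copy above, the old xattrs vanish here
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```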


I'm currently working on https://keepallthethings.com, which offers a similar service, but web-based with a GUI, so it will be either more or less useful for you depending on your use case. If anyone wants to try it out, go through the sign-up process until you hit the credit card form, stop there, and send me an email (address in my profile); I'll set you up with a free account. It's just launched, so a lot of features are still on the roadmap rather than implemented, but it'd be nice to get other people's feedback and ideas for improvements as I go.


You seem to have a typo on the pricing page: On the 100GB plan it says "50c/GB for space over 50GB"


Thank you for that. I've been playing around with spacing pre-launch and missed that one.


The first use case that came to my mind has been the one I've had difficulty with for the longest - marking/unmarking my music files to sync with my Android phone. I've heard Rhythmbox is good for that but what I'm looking for is a window which shows all my music with those that are also on my phone marked somehow (maybe a differently colored icon or something like that). I should be able to right click or use a shortcut to mark/unmark the music and once done, the music on my phone is updated. Files previously present and now unmarked are deleted, files newly marked are copied.

I can see using TMSU for this with the following approach:

1. Mount the VFS, to e.g. mount point "mp"

2. Using Nemo Actions (I use Linux Mint), call a script that upon executing on the selected file will do the following:

- toggle the "sync" tag (so all files with "sync" show up under "mp/tags/sync")

- If the file already has a "sync" tag, untag it (can it be done by doing an rm on "mp/tags/sync/<filename>"?)

- Based on tag being present or not, toggle file icon to visually indicate if the file is marked/unmarked. I don't know how this can be done.

3. Once done, we can run a simple rsync script that syncs "mp/tags/sync/*" to the sdcard on the phone.

Not sure when I will get around to actually implementing this :)
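The last step is a set difference: copy what is tagged but not on the phone, and delete what is on the phone but no longer tagged (rsync's `--delete` option does the same job). A Python sketch that makes the logic explicit, assuming a flat music directory on the phone and files matched by base name. Where the `tagged` list comes from (for example the VFS's `mp/tags/sync` listing) is up to you, and the helper names are hypothetical:

```python
import os
import shutil


def plan_sync(tagged: set[str], on_phone: set[str]) -> tuple[set[str], set[str]]:
    """Return (source paths to copy, phone file names to delete), matched by base name."""
    wanted = {os.path.basename(path): path for path in tagged}
    wanted_names = set(wanted)
    to_copy = {wanted[name] for name in wanted_names - on_phone}
    to_delete = on_phone - wanted_names
    return to_copy, to_delete


def apply_sync(tagged: set[str], phone_dir: str) -> tuple[set[str], set[str]]:
    """Copy newly tagged files to phone_dir and delete ones that lost the tag."""
    to_copy, to_delete = plan_sync(tagged, set(os.listdir(phone_dir)))
    for src in to_copy:
        shutil.copy2(src, phone_dir)
    for name in to_delete:
        os.remove(os.path.join(phone_dir, name))
    return to_copy, to_delete
```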


git-annex has some interesting tagging functionality. It uses trees of symlinks to provide the virtual filesystem: http://git-annex.branchable.com/metadata/ and http://git-annex.branchable.com/tips/metadata_driven_views/


I've always wanted this for Windows. I wonder how hard it would be to port it.

In a business setting, I often see people's desktops cluttered with files. It'd be nice to give them a virtual folder on the desktop that they can drag and drop files into, with tags.


Hi, author here. Windows support is in progress.


I don't know how you plan on doing the VFS stuff on Windows, but back in the day when I still used Windows (XP), I had a shell extension that let you tag a folder or file from the context menu. It would then be linked into a tag directory using either NTFS reparse points (aka junctions?) for directories or hardlinks for files.


Taggedfrog does something very similar on Windows, except it uses a drag-and-drop GUI instead of the shell. You can have ready-made customized tag clusters (so you don't have to type the tags for each file), and it will tag anything, including web URLs.

http://lunarfrog.com/taggedfrog/

That said, I would like to be able to tag and recall from the command line in order to integrate tags with other tools. Windows already offers a limited tagging service for Office documents.


Without the virtual file system, does this work by hashing the files? That would make it very slow to use on larger files. If I were to tag my movie collection, would TMSU have to read every byte of every movie in order to show the file tags?


It seems that this also hashes the files to produce a fingerprint. Check out the fingerprinter [1] for the raw details on how it's done, but basically, if the file is below the "sparseFingerprintThreshold" of 5 MB, the whole file is hashed. Any file above that is "sparsely hashed": the first 512 KB, middle 512 KB, and last 512 KB are hashed and combined to produce the fingerprint for that file. That means at most 1.5 MB is read per file, so it shouldn't cause performance problems on large files and should be safe to use for your movie collection.

[1] https://github.com/oniony/TMSU/blob/master/src/tmsu/common/f...
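A Python sketch of the sparse scheme as described above. This is not TMSU's Go code. It assumes SHA-256, a binary "5 MB" threshold, and that the three chunks are fed into a single hash in order:

```python
import hashlib
import os

SPARSE_THRESHOLD = 5 * 1024 * 1024  # "5 MB" per the comment; binary MB assumed
CHUNK = 512 * 1024                  # size of each sampled region


def fingerprint(path: str) -> str:
    """Hash small files whole; for large ones hash only first/middle/last 512 KB."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        if size < SPARSE_THRESHOLD:
            for block in iter(lambda: f.read(1 << 16), b""):
                h.update(block)
        else:
            for offset in (0, size // 2 - CHUNK // 2, size - CHUNK):
                f.seek(offset)
                h.update(f.read(CHUNK))
    return h.hexdigest()
```

The trade-off is visible in the code: an edit that falls entirely outside the three sampled regions of a large file leaves its fingerprint unchanged.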


You may want to check out the repair command. It seems to show that TMSU keeps track of file locations on disk. If you move a file, it can find it if it hasn't been modified. If you move and modify a file, it can't find it and will mark it missing.

https://github.com/oniony/TMSU/blob/master/src/tmsu/cli/repa...
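The behaviour described (moved but unmodified files are found, moved and modified ones go missing) follows from matching on fingerprints. A Python sketch of that idea, not the actual repair command. It assumes a recorded `path -> fingerprint` table and a fresh scan of `path -> fingerprint` on disk:

```python
def repair(db: dict[str, str], on_disk: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return ({old_path: new_path} for moved files, [paths now missing]).

    db:      path -> fingerprint recorded when the file was tagged
    on_disk: path -> fingerprint for every file found by a scan
    """
    # Files on disk that no tagged entry already claims, grouped by fingerprint.
    unclaimed: dict[str, list[str]] = {}
    for path, fp in on_disk.items():
        if path not in db:
            unclaimed.setdefault(fp, []).append(path)

    moved, missing = {}, []
    for path, fp in db.items():
        if path in on_disk:
            continue  # still where we left it
        if unclaimed.get(fp):
            moved[path] = unclaimed[fp].pop(0)  # same content elsewhere: a move
        else:
            missing.append(path)  # moved and modified, or deleted
    return moved, missing
```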


Ah, I see, thanks. The FAQ also explains some of the strengths and weaknesses of this approach:

https://github.com/oniony/TMSU/wiki/FAQ


I wonder why he doesn't use EAs where available (or does he?)


What is EA?


Probably Extended Attributes (http://en.m.wikipedia.org/wiki/Extended_file_attributes)

They stick to files better, as long as your tools know about them and handle them properly. The weakness is that they can (and will) get lost in tools that do not, and there are many of those: zip/unzip, tar/untar, HTTP requests, FTP, etc.


> Weakness is that they can (will) get lost in tools that do not.

Including, frustratingly, Linux's NFS implementation. A WONTFIX apparently.


Ah. Also, depending on the FS, they are probably too limited, e.g. in size (only 4 KB for ext2/3/4).
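If you did store tags in an xattr, that size budget has to be enforced up front. A Python sketch, assuming a hypothetical `user.tags` attribute (TMSU does not use this) holding newline-separated UTF-8 tags, with the 4 KB figure from the comment as the limit. The usable space is really somewhat smaller, because that block also holds attribute names, headers, and any other xattrs:

```python
import os

XATTR_NAME = "user.tags"  # hypothetical attribute name
XATTR_LIMIT = 4096        # the ext2/3/4 figure quoted above; real headroom is smaller


def encode_tags(tags) -> bytes:
    """Serialize unique, sorted tags; refuse anything over the size budget."""
    cleaned = sorted(set(tags))
    if any(not t or "\n" in t for t in cleaned):
        raise ValueError("tags must be non-empty and contain no newlines")
    payload = "\n".join(cleaned).encode("utf-8")
    if len(payload) > XATTR_LIMIT:
        raise ValueError(f"{len(payload)} bytes exceeds the {XATTR_LIMIT}-byte budget")
    return payload


def decode_tags(payload: bytes) -> set[str]:
    return set(payload.decode("utf-8").split("\n")) if payload else set()


def write_tags(path: str, tags) -> None:
    os.setxattr(path, XATTR_NAME, encode_tags(tags))  # Linux only
```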


Tags or folders - Aren't both of them mere kludges that we use when search isn't good enough?


Funny, I'd put it the other way around. Isn't searching what we do when we don't know exactly where something is?

And – admitting I might not be a stereotypical computer user as my files have a high level of organisation – isn't searching the high-level kludge that fails relatively often, especially with hard to parse binary information?

I couldn't imagine relying more on meta-search than knowledge of where my files are located. A tag tool might be nice though, as it enables you to create an alternative hierarchy more suitable to a specific activity without completely abstracting away the physical layout.


Tags and folders are a way to structure and organize your stuff, not a kludge. Tagging things, or placing them in containers (which may themselves contain other containers), is a valuable way to add metadata and semantics to the entities.

In a way, placing things in folders is some kind of (hierarchical) tagging, in which those 'tags' (folder and file names) form a traversable path to the entity. Adding tags to entities adds another retrieval path to the objects.


Folders/directories are a necessity as long as we continue to distribute collections of files the way we do. I'm not sure what github would look like in a world without directories, nor how file naming would work. Do you have a proposal for the issues that doing away with directories brings up?


Folders are only a human factor necessity. Tags forming a graph of related files are more capable, but harder to manage mentally since they can reach any part of the system and cross boundaries.


Search doesn't help when you don't remember the filename.


Sure it does when the search also investigates the contents of the files.

For example, GMail and Outlook do this for mail items -- you don't need to know the subject, which is comparable to the file name, to retrieve an email -- and Google Desktop Search (discontinued) or Copernic does this for file systems. It is almost as if every system-retrievable item inside a file (e.g. every word in a text document) automatically becomes a searchable item of the file.
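The mechanism behind "every word automatically becomes a searchable item" is an inverted index: a map from each word to the files that contain it. A toy Python version (real indexers add format-specific text extraction, stemming, and ranking; this is not how any particular product does it):

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every lower-cased word to the set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return documents containing every word of the query (AND semantics)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```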


Outlook is a prime example of a situation where pure search doesn't help. When I'm looking for an email more than a year old, I sometimes can't remember a single correct word in it. Then I resort to some kind of tag system: folders, timestamps, etc. And sometimes I don't find it at all.


So far this only works with text-based files. For videos or photos, there's usually not much good metadata or machine-readable file content.


In an ideal world it should be enough to remember a couple of key words in the file, and possibly some relevant metadata.

In an even more ideal world I should be able to search for "all pictures of my daughter from last Christmas, unless they are out of focus".


I tag directories, not files. Way more sustainable over time, because there is no way I'm going to tag every single file.

Files tend to be better at describing themselves. To take an example from the TMSU docs, why would I ever need to tag an .mp3 file as music? If it's an MP3, and it's under my music directory, it's pretty clear what it is.
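Tagging directories instead of files works if every file inherits the tags of all its ancestor directories. A Python sketch of that lookup, with a hypothetical `dir_tags` mapping standing in for wherever the directory tags are stored:

```python
from pathlib import PurePosixPath


def effective_tags(file_path: str, dir_tags: dict[str, set[str]]) -> set[str]:
    """Union of the tags on every ancestor directory of file_path."""
    tags: set[str] = set()
    for parent in PurePosixPath(file_path).parents:
        tags |= dir_tags.get(str(parent), set())
    return tags
```

With `{"/home/me/music": {"music"}, "/home/me/music/jazz": {"jazz"}}`, a file under `/home/me/music/jazz/` picks up both tags without being tagged individually.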


> If it's an MP3, and it's under my music directory, it's pretty clear what it is.

Putting files in a directory is semantically equivalent to applying a tag. But directory structure is in no way fundamental to computing, and tagging is indeed a more general and flexible solution to organizing files.


>Putting files in a directory is semantically equivalent to applying a tag.

Yes, equivalent to a single tag. Not multiple tags.


To predict where something is, I use a hierarchical folder system based on GTD [1] to organise my documents. The initial investment (renaming files) pays off in the long run.

[1] https://github.com/we-build-dreams/folder-system


This is really cool; the virtual filesystem and query abilities make it really powerful.


How about auto-tagging your files? http://cloudfindhq.com/

We actually automatically tag files in Dropbox and Google Drive, so there's no need to move files

disclaimer: I work here!


Too bad WinFS was canceled. Seems like no GUI support for any of these tools :(


TagSpaces has excellent GUI support.


That looked excellent until I read that it stores the tags in the filename. That's just... completely not what I'm looking for, or would use, ever.


Yes, it's a trade-off: either you keep the tags in a database you depend on, or you store everything in the file name or in the file itself. Tagging inside the file only works if the file type supports it (Exif or MP3 tags), and you have to program support for each file type. Keeping tags in a database requires a sync step between move and rename actions in the file system and the database, which has to be programmed and tested for every OS. Keeping the tags in the filename means the tag system keeps working without the tooling and across operating systems, but you give up some freedom in naming your files (so it doesn't really work on DOS). Why wouldn't you ever use such a system? People accept metadata in the file itself, and the file name usually describes the data, so why not put tags in the filename, other than that it seems like an ugly hack? How are the alternatives better?
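To make the filename approach concrete, here is a Python sketch assuming a bracketed convention like the one TagSpaces uses, `name[tag1 tag2].ext`. The helper names are hypothetical, and TagSpaces' own rules may differ in edge cases:

```python
import os
import re

# "base[tag1 tag2]" at the end of the stem (extension already removed).
_TAGGED = re.compile(r"^(?P<base>.*?)\[(?P<tags>[^\[\]]*)\]$")


def split_tags(filename: str) -> tuple[str, list[str], str]:
    """Return (base name, tags, extension) for 'report[work 2014].pdf'-style names."""
    stem, ext = os.path.splitext(filename)
    m = _TAGGED.match(stem)
    if not m:
        return stem, [], ext
    return m.group("base"), m.group("tags").split(), ext


def add_tags(filename: str, *new_tags: str) -> str:
    """Return the new filename with extra tags appended (duplicates ignored)."""
    base, tags, ext = split_tags(filename)
    for tag in new_tags:
        if not tag or any(c.isspace() or c in "[]" for c in tag):
            raise ValueError(f"invalid tag: {tag!r}")
        if tag not in tags:
            tags.append(tag)
    return f"{base}[{' '.join(tags)}]{ext}" if tags else base + ext
```

The scheme also shows one of the freedoms you give up: since tags are space-separated, a tag name cannot contain a space.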


Was thinking "I like this but no Windows" so thanks for referencing TagSpaces, gonna try it right now


Thanks, I downloaded it.


It would be nice if this app allowed regular expression searches of tags, and spaces in tag names.


I like a mix of hierarchy and this, glad to see somebody is trying.


Hm. How could this be used efficiently for real work? Because many of our tools are already file-based, I don't think this is the greatest idea for use cases that require structure. If anything, tagging should be used alongside hierarchical layouts... Or users should follow the standard of what /etc, /home, /lib and friends are used for. They are essentially tags in directory form.

Take TMSU's example command:

tmsu tag summer.mp3 music big-jazz mp3

We could come up with the same behavior WITHOUT an additional program:

mkdir -p ~/music/big-jazz/mp3/

ln -s ~/Downloads/summer.mp3 ~/music/big-jazz/mp3/summer.mp3

Could easily make this into a shell script.

Just my 2 cents.
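A sketch of that script (in Python rather than shell, to match the other examples here) with one design change: instead of a single nested path `music/big-jazz/mp3`, which fixes one ordering of the tags, it creates one flat directory per tag with a symlink in each. A query for several tags then becomes the intersection of those directories' listings. Function names are hypothetical, and name clashes between different files with the same base name are not handled:

```python
import os


def tag_with_links(root: str, target: str, *tags: str) -> None:
    """Create root/<tag>/<basename> -> target for each tag (like mkdir -p + ln -s)."""
    target = os.path.abspath(target)
    for tag in tags:
        tag_dir = os.path.join(root, tag)
        os.makedirs(tag_dir, exist_ok=True)
        link = os.path.join(tag_dir, os.path.basename(target))
        if not os.path.lexists(link):  # re-tagging is a no-op
            os.symlink(target, link)


def files_with_all(root: str, *tags: str) -> set[str]:
    """Names present in every one of the given tag directories."""
    sets = []
    for tag in tags:
        tag_dir = os.path.join(root, tag)
        sets.append(set(os.listdir(tag_dir)) if os.path.isdir(tag_dir) else set())
    return set.intersection(*sets) if sets else set()
```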


This quickly becomes cumbersome when you have dozens or hundreds of tags, and doesn't address tag values.


The other problem is when a file can be in multiple categories: say you're sorting photos into folders based on who is in the photo.

alice.jpg goes into alice/, and bob.jpg goes into bob/, but where do you put groupshot.jpg?


Both?

Or a separate category?

Any grouping system would let you do both.
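With tags, groupshot.jpg simply carries both `alice` and `bob`, and queries intersect. A minimal in-memory Python sketch that also handles the tag values raised above; the `name=value` syntax is an assumption for illustration, and `TagIndex` is hypothetical:

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index: tag name -> {file: value or None}."""

    def __init__(self) -> None:
        self._tags: dict[str, dict[str, str | None]] = defaultdict(dict)

    def tag(self, path: str, *tags: str) -> None:
        for t in tags:
            name, sep, value = t.partition("=")
            self._tags[name][path] = value if sep else None

    def query(self, *terms: str) -> set[str]:
        """Files matching every term; 'name' matches any value, 'name=v' only v."""
        result: set[str] | None = None
        for term in terms:
            name, sep, value = term.partition("=")
            hits = {p for p, v in self._tags.get(name, {}).items() if not sep or v == value}
            result = hits if result is None else result & hits
        return result or set()
```

For example, after tagging alice.jpg with `alice`, bob.jpg with `bob`, and groupshot.jpg with both, `query("alice", "bob")` returns only groupshot.jpg.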



