Good pointer! I knew about CVXPY but haven't used it yet. Requiring everything to be (log-)convex constrains user programs quite a bit, but I imagine it would have worked in this case, since the objective is linear and the feasible set is convex. I won't deny the role accident and force of habit played in picking this particular implementation path; I used PyTorch because it's my daily driver :D and because I had found 'mctorch', which provided the Riemannian SGD logic.
It's incredible how many algorithms can be made differentiable end-to-end.
E.g. in https://arxiv.org/abs/1905.11885 "Differentiable ranks and sorting using optimal transport" they show how a relaxation of an array sorting procedure leads to differentiable rank statistics (quantiles etc.). The underlying theory is quite close to what I show in the blog post.
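The OT construction in that paper is the principled version, but the basic idea of a soft rank can be sketched with a much cruder relaxation (a toy of my own, not the paper's method): the rank of `x_i` is the number of `j` with `x_j < x_i`, so replace each hard pairwise comparison with a sigmoid of temperature `tau`:

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid


def soft_rank(x, tau=1.0):
    """Differentiable relaxation of 0-indexed ranks.

    rank_i = #{j : x_j < x_i}; each hard comparison is replaced by a
    sigmoid, and tau -> 0 recovers the hard ranks.
    """
    x = np.asarray(x, dtype=float)
    diff = (x[:, None] - x[None, :]) / tau  # pairwise differences
    sig = expit(diff)                       # soft "x_i > x_j" indicators
    np.fill_diagonal(sig, 0.0)              # skip self-comparisons
    return sig.sum(axis=1)


print(soft_rank([3.0, 1.0, 2.0], tau=0.01))  # ≈ [2, 0, 1], the hard ranks
```

At larger `tau` the output stays smooth in `x` (so gradients are informative), at the price of the ranks no longer being integral.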
In DETR they don't actually use a differentiable Hungarian algorithm: the matching is computed as a non-differentiable step to pair predictions with ground-truth boxes for the loss, and no gradients flow through the matching itself, IIUC.
Yes, the problem can now be solved efficiently with a scikit one-liner, but the aim of this post was speculative exploration rather than beating the benchmark :)
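For readers who just want the non-speculative route, the one-liner in question is presumably something like SciPy's `linear_sum_assignment` (an assumption on my part, the thread doesn't name it), which solves the assignment problem exactly:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: cost[i, j] = cost of assigning worker i to task j.
cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])

rows, cols = linear_sum_assignment(cost)  # the one-liner
print(cols, cost[rows, cols].sum())       # [1 0 2] 5.0
```

Exact, fast, and zero gradients involved, which is precisely what makes the differentiable detour the speculative part.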
Oh yeah, for sure. I wasn't trying to diminish the content, quite the opposite: the problem was so complex and so far from what I expected that I was lucky to have a library that could solve it for me.
An interesting connection between combinatorial optimization and differential geometry. And some PyTorch too, because when all you have is a hammer...
I built most of this a year ago, but only now managed to write it up. I enjoyed the change of pace of a purely speculative experiment (with some deep connections to a bunch of interesting theory and applications), and I hope you will enjoy reading about it too.