Hacker News | ByronBates's comments

You could try the alternative version of the database-generator: https://github.com/jmforsythe/Git-Heat-Map/pull/6 - it shouldn't crash.


Computing diffs is what takes large amounts of time, since the object database is used intensively and the object caches have limited efficiency.

I couldn't resist and threw `gitoxide` at it, and it turned out to be more than 2x as fast (even though it uses way more CPU to do that; there is definitely room for improvement).

The PR which adds the `db-gen` program: https://github.com/jmforsythe/Git-Heat-Map/pull/6


> This is fast enough for small repositories, but as the repository increases in size the overhead of parsing these plain-text files to get the graph relationships becomes too expensive. Even the fact that we need a binary search to locate the object within the packfile begins to add up.

From my experience and measurements as the author of `gitoxide`, both the parsing of commits and the binary search to find the object are negligible costs here. The majority of the time is spent in `zlib`, which has to inflate objects and deltas prior to use. This makes me wonder what `zstd` would do to these kinds of workloads.

In any case, the object database decoding performance is the bottleneck when traversing commits, and one can expect to get about 120000 commits per second on the Linux kernel pack on a modern CPU core like M1 with `gitoxide`'s implementation.

Besides that, I find the explanation of the commit-graph file interesting and how it relates to commit-graph queries. There is so much to learn, and so much still to implement :).


I wonder if zlib-ng would make a difference, since it has a lot of optimizations for modern hardware.

https://github.com/zlib-ng/zlib-ng/discussions/871


That number is already based on using `zlib-ng`. Sometimes I wonder if it's cheating since `git` might not actually use it. In any case, improving the efficiency of `zlib` along with the Rust integration for it has an immediate impact on the git object database performance of `gitoxide`.


gitoxide is a cool project, but it can't even clone yet:

https://github.com/Byron/gitoxide/discussions/300


`gitoxide` can currently perform many of the tasks required, and closing the loop for a full clone with worktree is going to happen at the end of this year.

That said, here is how to receive a pack from the remote and resolve it: `mkdir out && gix -v no-repo pack receive https://github.com/Byron/gitoxide out`.


It's a project goal to make the API so accessible and easy to use that it's straightforward to quickly throw together your very own custom commands that do exactly what you need. This should allow building these slot-in commands as the need arises, and without breaking a sweat.


Let me reply with the most tedious parts, as I wouldn't want to give the impression of labelling git as 'bad' - I have been fascinated by it for more than 10 years now and it's time to scratch my itch for good.

Please note that everything I name can certainly be fixed by some tool that already exists, and that's great. It's just that I would like to have 'this one tool' that gets most things right and be happy with it alone.

That said, here is my list:

* adding individual files to the staging area/index

* looking at the commit history and copying individual commit hashes

* pulling while assuring no merge commit is created, or: simplified trunk-based development

* resumable clones

* seeing what's going on during all remote operations - here in China GitHub is incredibly slow most of the time and connections can be flaky


Thanks! It's really interesting to see that only 2 items on your list (around adding and pulling) were on my radar as git usability issues, and not even in my top five!

The second one is a bit unclear to me:

> looking at the commit history and copying individual commit hashes

What do you mean exactly by that?


I frequently use `tig` to see the commit history, and pull out a single commit hash for use in `git checkout` or to reference it elsewhere. Probably it's just me not knowing all the `tig` hotkeys to make that easy, but it's my hope to have `gitoxide` as an obvious way to do at least the things I commonly need.


As this is part of the every-day workflow, I will think long and hard on how to make it as painless and user-friendly as possible. Having a more user-friendly experience on the command-line is one of the project goals, and it will also be me benefiting from it so the motivation will be high to achieve it. By then, that thinking process will hopefully be public enough to allow more people to chime in and get a better result that way. And yes, I think a lot is possible, and I can't wait to test some ideas of mine :)


I'm totally looking forward to that then! :)


I am the author, and will be here for a few hours in case there are any questions.

(proof: https://keybase.io/byronbates)


I'm currently using https://github.com/rust-lang/git2-rs to interact with git in my software. Are you aiming to have a stable api that could act as a pure-rust replacement for that at some point in the future?


Absolutely! My plan is to proclaim a version 1.0 once the basic workflow of clone-commit-push can be performed, which should be enough for many to start using it. From that point on, the API should remain stable, following semantic versioning as usual.

It is my hope that the quality and stability will be high enough to convince people [in the Rust ecosystem] to move away from libgit2 and instead contribute to the maintenance and development of gitoxide, rewarding everyone with better performance and improved usability.


This sounds great. I added support for signing git commits in git2-rs and would be happy to try to do something similar in gitoxide if I start to use it.


I might have missed your attention to this thread, but I'll ask and hopefully you'll be able to check later. I have to deal with a monolithic codebase that is of excessive size (tens of millions of lines of code) and has a very large number of developers committing against it.

I haven't looked at the internals of Git much, but performance becomes an issue with it in this context (we currently use a different VCS), and switching seems impractical at the moment. The issues range from local performance (30 seconds or so for git status, git checkout, etc.) to clone/fetch speeds.

I'm curious, given your experience on porting, have you noticed areas where we might be able to make improvements that might be easier based on Rust's safety improvements? Such as more threading or other areas that could improve Git for use with very large code bases?

(I plan to review the project as this would be an exciting area to work in to help with productivity in this space)


It sounds like such a project should definitely see improvements when using `gitoxide`. Even in projects of the size of the Linux kernel, checkouts can take time as those only reach about 3000 files per second. You will see only 70% of a single core being busy and a lot of kernel overhead. On my machine, using all 4 cores, this number can be 7500 when checking out in parallel, something that `gitoxide` is definitely going to do from day one. After all, there is no value in just being as fast as git.

The same applies when checking for changes on disk - right now git does not do that in parallel. These are low-hanging fruit that I plan to pick for my own sake.

There is another contributor who is very interested in increasing pack performance, which will directly impact fetch and pull speeds. Judging from my experience with packs thus far, I believe a lot of options are still to be tapped in that field as well. Rust will open it up for contributions and experiments to a greater audience, so I would hope that this will go way beyond what I can do.

Even though this project is not very contributor friendly right now, I will be working on improving this so more people can join in (see https://github.com/Byron/gitoxide/issues/8).


I’m excited. Thanks for the response. This is something I will try and find time to contribute to.


How does gitoxide perform compared to c-git? A slow operation I often encounter is `git log --graph` for example.


Unfortunately I can't yet tell, as the corresponding code does not yet exist. Object and pack lookup is competitively fast so I would hope that translates well to everyday operations like that.

Actual numbers for what's there, pack access and traversal, can be found here: https://github.com/Byron/gitoxide/issues/1 and here: https://github.com/Byron/gitoxide/issues/5


That operation is much faster in recent versions of Git, especially after a GC or “git commit-graph write” command.


Is native Windows support a goal?


Great point! Windows is already tested on CI, but I used the opportunity to make Windows support explicit in the project goals.


Hi Byron.

What in your opinion would be a good place for someone to start contributing to this?


Right now, the project is clearly missing contribution guidelines, but now that it's a bit more public these will be added soon, possibly along with some tickets that are ready for pickup. For now I am very focussed on implementation. That said, I have created a quick issue for you with a small task of high value: https://github.com/Byron/gitoxide/issues/7 Thanks a lot for your consideration <3


Great project - more power to you for driving it so far.

My 2 cents. I think rust-analyzer offers a great new dev onboarding experience. Issues labelled with has-instructions usually have a link to the relevant snippet of code in the repo and sometimes even the stub of a failing unit test. While it might be obvious to you, the author, how and where to find the necessary code, it enables a great many people to send patches.

Witness the number of contributors with <10 commits https://github.com/rust-analyzer/rust-analyzer/graphs/contri...

If you want to increase adoption, please borrow as many dev experience tricks from rust-analyzer as possible


Do you think you'd end up creating C APIs, to make this competitive with libgit2?


I am not interested in competing with libgit2, and have no plans to add and maintain C-bindings in the project. This doesn't mean, however, that other parties couldn't maintain them out-of-repo at first, and once things stabilize, it just makes sense to make them part of an organization that maintains them along with the project. After all, C-bindings open it up to be used in interpreters, which might be especially interesting for users of `GitPython`. It has been around for a decade and is definitely due for an alternative (I am the author and maintainer of that one, too).


The same author also wrote hyperfine, a tool to compare performance of various program runs.

    hyperfine './target/release/hexyl ./target/release/hexyl' 'xxd ./target/release/hexyl' 'hexdump ./target/release/hexyl'
    Benchmark #1: ./target/release/hexyl ./target/release/hexyl
      Time (mean ± σ):      1.529 s ±  0.028 s    [User: 1.476 s, System: 0.050 s]
      Range (min … max):    1.491 s …  1.581 s    10 runs

    Benchmark #2: xxd ./target/release/hexyl
      Time (mean ± σ):      70.5 ms ±   0.5 ms    [User: 68.0 ms, System: 1.2 ms]
      Range (min … max):    69.5 ms …  72.3 ms    41 runs

    Benchmark #3: hexdump ./target/release/hexyl
      Time (mean ± σ):     262.4 ms ±   2.8 ms    [User: 260.1 ms, System: 1.5 ms]
      Range (min … max):   259.8 ms … 268.8 ms    11 runs

    Summary
      'xxd ./target/release/hexyl' ran
        3.72 ± 0.05 times faster than 'hexdump ./target/release/hexyl'
       21.70 ± 0.43 times faster than './target/release/hexyl ./target/release/hexyl'

Currently hexyl seems nearly 22x slower than xxd.


... and I have already used hyperfine to benchmark hexyl as well :-)

Yes, it's a shame. But I don't think there is too much we can do about it. We have to print much more to the console due to the ANSI escape codes and we also have to do some conditional checks ON EACH BYTE in order to colorize them correctly. Surely there are some ways to speed everything up a little bit, but in the end I don't think it's a real issue. Nobody is going to look at 1MB dumps in a console hex viewer (that's 60,000 lines of output!) without restricting it to some region. And if somebody really wants to, he can probably spare 1.5 seconds to wait for the output :-)


> We have to print much more to the console due to the ANSI escape codes and we also have to do some conditional checks ON EACH BYTE in order to colorize them correctly.

A few extra comparisons and output for each byte shouldn't be that much slower; fortunately the function of this program is extremely well-defined, so we can calculate some estimates. Assuming a billion instructions per second, taking ~1.5s to hexdump ~1 million bytes means each byte is consuming ~1500 instructions to process. In reality the time above was probably measured on a faster CPU, so that number may be 2-3x higher. That is a shockingly high number just to split a byte into two nybbles (expected to be 1-3 instructions), convert the nybbles into ASCII (~3 instructions), and decide on the colour (let's be very generous and say ~100 instructions).
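To make the estimate above easy to re-run with other numbers, here is a minimal sketch of the same arithmetic. The function name `instructions_per_byte` is made up for illustration, and the inputs are the rough figures from the comment (about 1e9 instructions per second, about 1.5 s, about 1e6 bytes), not measurements.

```rust
/// Rough estimate: (instructions per second * seconds) / bytes processed.
/// Hypothetical helper, named only for this illustration.
fn instructions_per_byte(instr_per_sec: f64, seconds: f64, bytes: f64) -> f64 {
    instr_per_sec * seconds / bytes
}

fn main() {
    // Figures quoted in the comment: ~1e9 instr/s, ~1.5 s, ~1e6 bytes.
    let base = instructions_per_byte(1e9, 1.5, 1e6);
    println!("about {} instructions per byte at 1e9 instr/s", base);

    // Assumption from the comment: the real CPU may retire 2-3x more
    // instructions per second, which scales the estimate linearly.
    for factor in [2.0, 3.0] {
        let scaled = instructions_per_byte(factor * 1e9, 1.5, 1e6);
        println!("about {} instructions per byte at {}x", scaled, factor);
    }
}
```

Even the base figure is more than ten times the generous ~100-instruction budget for colour selection mentioned in the comment.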

The fact that the binary itself is >1MB is also rather surprising, especially given that the source (not familiar with Rust, but still understandable) seems quite small and straightforward.


Rust binaries can be large because unlike C, the standard library is statically linked, as well as jemalloc. Jemalloc will no longer be the default as of the next release, so that will shave off ~300k...


What's replacing Jemalloc?


The system malloc implementation. Users who want to use jemalloc have to opt in, but doing so is relatively easy (using the jemallocator crate from crates.io).


Why was this done?

Did Rust become less dependent on allocator performance, or did system allocators improve enough? IIRC glibc malloc has improved a lot over the last few years, particularly for multithreaded use, but I don't know about Windows / macOS.


So, long ago, Rust actually had a large, Erlang-like runtime. So jemalloc was used. Over time, we shed more and more of this runtime, but jemalloc stayed. We didn't have a pluggable allocator story, and so we couldn't really remove it without causing a regression for people who do need jemalloc. Additionally, jemalloc was already removed on some platforms for a long time; Windows has been shipping the system allocator for as long as I can remember.

So, now that we have a stable way to let you use jemalloc, the right default for a systems language is to use the system allocator. If jemalloc makes sense for you, you can still use it, but if not, you save a not-insignificant amount of binary size, which matters to a lot of people. See the parent I originally replied to for an example of a very common response when looking at Rust binary sizes.

It's really more about letting you choose the tradeoff than it is about specific improvements between the allocators.
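For readers wondering what the "stable way" to pick an allocator looks like, here is a minimal sketch using only the standard library. It installs a hypothetical `Counting` wrapper around the system allocator through the `#[global_allocator]` attribute; as far as I know, the jemallocator crate is plugged in through the same attribute, with its allocator type in place of `Counting`. The wrapper and the `allocations` helper are invented for this illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts calls.
struct Counting;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Delegate the actual work to the platform's malloc.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is the opt-in point: whatever type is named here
// serves every heap allocation in the program.
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Number of allocations observed so far.
fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations();
    let v: Vec<u8> = Vec::with_capacity(1024);
    black_box(&v); // keep the optimizer from removing the allocation
    let after = allocations();
    println!("{} allocation(s) observed for one Vec", after - before);
}
```

A program that declares no `#[global_allocator]` simply gets the default, which is the tradeoff being described.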


It seems I was wrong. The new hexyl version is significantly faster (see my other comment)


You may be able to speed things up by using a lookup table instead of branching.

(If it's spending a lot of time in Rust's format function you could also use a (or the same) lookup table to convert to hex/dec/oct.)
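As a rough illustration of the lookup-table idea, the sketch below classifies every possible byte once, at compile time, and then answers each query with a single array index. The `Category` enum and the classification rules are assumptions made up for this example and are not taken from hexyl's source.

```rust
/// Hypothetical colour categories, chosen only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Category {
    Null,
    AsciiPrintable,
    AsciiWhitespace,
    AsciiOther,
    NonAscii,
}

/// Branching classifier; used only to fill the table.
const fn classify(b: u8) -> Category {
    if b == 0 {
        Category::Null
    } else if b == b' ' || (b >= 0x09 && b <= 0x0d) {
        Category::AsciiWhitespace
    } else if b > 0x20 && b < 0x7f {
        Category::AsciiPrintable
    } else if b < 0x80 {
        Category::AsciiOther
    } else {
        Category::NonAscii
    }
}

/// Evaluate the branches for all 256 byte values at compile time.
const fn build_table() -> [Category; 256] {
    let mut table = [Category::Null; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = classify(i as u8);
        i += 1;
    }
    table
}

static CATEGORY: [Category; 256] = build_table();

/// Per-byte hot path: one indexed load, no branching on the byte value.
fn category(b: u8) -> Category {
    CATEGORY[b as usize]
}

fn main() {
    for b in [0x00u8, b'A', b'\n', 0x7f, 0xff] {
        println!("{:#04x} -> {:?}", b, category(b));
    }
}
```

Whether this beats the branching version in practice depends on how predictable the input is, so it is worth benchmarking rather than assuming.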


The format function is going to end up allocating a string for every single byte. That's a huge overhead.

Edit: Turns out to be about 22% overhead, see https://github.com/sharkdp/hexyl/pull/23. Also it was 2 strings per byte, not 1.
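To show what avoiding the per-byte strings can look like, here is a small sketch that appends hex digits to a reusable buffer via a 16-entry table instead of calling `format!` for each byte. The names `push_hex` and `hex_line` are hypothetical, and this is not the code from the linked PR.

```rust
/// Hex digit table: index with a nibble (0..=15) to get its ASCII digit.
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Append the two hex digits of `byte` to `out`; no temporary String is created.
fn push_hex(out: &mut Vec<u8>, byte: u8) {
    out.push(HEX[(byte >> 4) as usize]);
    out.push(HEX[(byte & 0x0f) as usize]);
}

/// Render bytes as space-separated hex pairs into one buffer.
fn hex_line(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(bytes.len() * 3);
    for (i, &b) in bytes.iter().enumerate() {
        if i > 0 {
            out.push(b' ');
        }
        push_hex(&mut out, b);
    }
    out
}

fn main() {
    let line = hex_line(&[0x00, 0xab, 0xff]);
    println!("{}", String::from_utf8_lossy(&line));

    // For comparison, the allocating variant builds a String per byte.
    let slow: Vec<String> = [0x00u8, 0xab, 0xff]
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect();
    println!("{}", slow.join(" "));
}
```

Both variants produce the same text; the difference is one buffer allocation per line versus at least one per byte.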


Thanks to that PR, hexyl is now slightly faster than hexdump. Both are about a factor of 2-3 slower than xxd:

    Benchmark #1: hexyl $(which hexyl)
      Time (mean ± σ):     169.8 ms ±   8.2 ms    [User: 152.5 ms, System: 17.1 ms]
      Range (min … max):   162.2 ms … 189.1 ms    16 runs
     
    Benchmark #2: hexdump -C $(which hexyl)
      Time (mean ± σ):     188.5 ms ±   4.4 ms    [User: 186.2 ms, System: 2.2 ms]
      Range (min … max):   184.1 ms … 198.2 ms    14 runs
     
    Benchmark #3: xxd $(which hexyl)
      Time (mean ± σ):      72.8 ms ±   2.7 ms    [User: 71.9 ms, System: 1.1 ms]
      Range (min … max):    71.0 ms …  87.8 ms    40 runs


I made a little clone for fun and got a bit carried away optimising. Now at about 3x the speed of hexyl 0.3.1:

https://github.com/sjmulder/hxl

Most of the improvement came from dropping printf, fputs, and putchar in favour of building each line directly in an array that can be fwritten in one call.
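The same idea, sketched in Rust rather than C: each output line is assembled in a fixed-size array and handed to the writer in a single `write_all` call. This is an illustration under assumptions, not hxl's code; the line layout (8 hex offset digits, two spaces, space-separated byte pairs) and the names `format_line` and `dump` are invented for the example.

```rust
use std::io::{self, Write};

const HEX: &[u8; 16] = b"0123456789abcdef";
const BYTES_PER_LINE: usize = 16;
// 8 offset digits + 2 spaces + 16 * "xx?" where '?' is a space or the final '\n'.
const LINE_CAP: usize = 8 + 2 + BYTES_PER_LINE * 3;

/// Fill `line` for one chunk and return the number of bytes used.
fn format_line(offset: u32, chunk: &[u8], line: &mut [u8; LINE_CAP]) -> usize {
    assert!(!chunk.is_empty() && chunk.len() <= BYTES_PER_LINE);
    let mut pos = 0;
    // Offset as 8 hex digits, most significant nibble first.
    for shift in (0..8u32).rev() {
        line[pos] = HEX[((offset >> (shift * 4)) & 0xf) as usize];
        pos += 1;
    }
    line[pos] = b' ';
    line[pos + 1] = b' ';
    pos += 2;
    for (i, &b) in chunk.iter().enumerate() {
        line[pos] = HEX[(b >> 4) as usize];
        line[pos + 1] = HEX[(b & 0x0f) as usize];
        // Space between bytes, newline after the last byte of the line.
        line[pos + 2] = if i + 1 == chunk.len() { b'\n' } else { b' ' };
        pos += 3;
    }
    pos
}

/// Dump `data`, issuing exactly one write call per output line.
fn dump<W: Write>(data: &[u8], out: &mut W) -> io::Result<()> {
    let mut line = [0u8; LINE_CAP];
    for (i, chunk) in data.chunks(BYTES_PER_LINE).enumerate() {
        let len = format_line((i * BYTES_PER_LINE) as u32, chunk, &mut line);
        out.write_all(&line[..len])?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    dump(b"hello, hexdump world", &mut out)
}
```

Locking stdout once and reusing the same array keeps per-byte work down to a few indexed stores, which is the effect described above.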

