I use Gemini CLI on a daily basis. It used to crash often and I'd lose the chat history. I found this tool called ai-cli-log [1] and it does something similar out of the box. I don't run Gemini CLI without it.
Fast is also cheap, especially in the world of cloud computing, where you pay by the second. The only way I could create a profitable transcription service [1] that undercuts the rest was by optimizing every little thing along the way. For instance, just yesterday I learned that the image I've put together is 2.5× smaller than the next open source variant. That means faster cold boots, which reduces cost (and provides a better service).
I've approached the same thing slightly differently. I can run it on consumer hardware vastly cheaper than the cloud and don't have to worry about image sizes at all (bare metal is 'faster'). I'm offering 20,000 minutes of transcription for free, up to a rate limit of one request every 5 seconds.
If you ever want to chat about making transcription virtually free, or at least very cheap, for everyone, let me know. I've been working on various projects related to it for a while, including an open-source, cross-platform Superwhisper alternative: https://handy.computer
> I can run it on consumer hardware vastly cheaper than the cloud
Woah, that's really cool, CJ! I've been toying with the idea of standing up a cluster of older iPhones to run Apple's Speech framework. [1] The inspiration came from this blog post [2] where the author is using it for OCR. A couple of things are holding me back: (1) the OSS models are better according to the current benchmarks, and (2) I have customers all over the world, so geographical load balancing is a real factor. With that said, I'll definitely spend some time checking out your work. Thanks for sharing!
Yep. I'm hoping that installed copies of PAPER (at least on Linux) will be somewhere under 2MB total (including populating the cache with its own dependencies, etc.). Maybe more like 1MB, although I'm approaching that line faster than I'd like. Compare 10-15MB for pip (and a bunch more for pipx) or 35MB for uv.
Well said. And it's not just the cloud. We self-host at my job, and there are real cost savings to speed there too. Being able to keep an old server in service for another year and having your staff be just a little more efficient add up quickly.
Fast doesn't necessarily mean efficient/lightweight and therefore cheaper to deploy. It may just mean that you've thrown enough expensive hardware at the problem to make it fast.
Hmm… That's a good point. I recall a few instances where I went too far, to the detriment of production. Thankfully, a trusty testing and benchmarking suite helped keep things stable. As a solo developer, I really enjoy the development process itself, so while that time is costly, I hadn't really considered the cost until you mentioned it.
I wouldn't describe it as "unusable" so much as needing to understand its constraints and how to work around them. I built a business on top of Whisper [1] and one of the early key insights was to implement a good voice activity detection (VAD) model in order to reduce Whisper's hallucinations on silence.
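To make that concrete, here's a rough sketch of the approach (simplified, not my production pipeline; it assumes the silero-vad and faster-whisper packages and a hypothetical call.wav):

    # Run VAD first, then hand Whisper only the detected speech regions,
    # so it never transcribes long stretches of silence.
    from silero_vad import load_silero_vad, read_audio, get_speech_timestamps
    from faster_whisper import WhisperModel

    vad = load_silero_vad()
    wav = read_audio("call.wav", sampling_rate=16000)  # hypothetical input

    # Sample offsets of the regions the VAD classifies as speech.
    speech = get_speech_timestamps(wav, vad)

    whisper = WhisperModel("small")
    for region in speech:
        chunk = wav[region["start"]:region["end"]].numpy()
        segments, _ = whisper.transcribe(chunk)
        for seg in segments:
            print(seg.text)

In practice you'd also pad the regions a little and tune the VAD thresholds so quiet speech doesn't get clipped.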
Thanks for noticing. It took a lot of effort to optimize the pipeline every step of the way: VAD, inference server, hardware optimization, etc. But nothing that would compromise quality. The audio is currently transcribed at its original speed. I'll be sure to publish something if I manage to speed it up without any loss in WER.
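For the curious, the experiment I have in mind is roughly this (illustrative only, nothing I've shipped): speed the audio up with ffmpeg's atempo filter, which changes tempo without shifting pitch, then measure the WER delta against the original.

    # Illustrative sketch: speed up audio before transcription.
    # Assumes ffmpeg is on PATH; filenames are hypothetical.
    import subprocess

    def speed_up(src: str, dst: str, factor: float = 1.5) -> None:
        # Recent ffmpeg accepts atempo factors from 0.5 to 100;
        # chain multiple atempo filters on older builds.
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={factor}", dst],
            check=True,
        )

    speed_up("input.wav", "fast.wav")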
It's sustainable, but not enough to retire on at this point.
> Just wondering if I can build a retirement out of APIs :)
I think it's possible, but you need to find a way to add value beyond the commodity itself (e.g., audio classification and speaker diarization in my case).
Can it do real-time transcription with diarization? I'm looking for that for a product feature I'm working on. So far I've seen Speechmatics do this well, but I haven't heard of others.
This looks promising! Are you considering adding GPU support? It would be great if I could spin up a CUDA-enabled container right on Cloudflare, as opposed to using a Cloudflare Worker to spin up a serverless GPU container on some other cloud provider.
I've yet to try jj, but in git, my flow is to start a new feature with a WIP commit and then to `--amend` it with every change. I usually have a running TODO list in the commit message body that I check off along the way.
[1] https://github.com/alingse/ai-cli-log