Hacker News | ilyakaminsky's comments

I use Gemini CLI on a daily basis. It used to crash often and I'd lose the chat history. I found this tool called ai-cli-log [1] and it does something similar out of the box. I don't run Gemini CLI without it.

[1] https://github.com/alingse/ai-cli-log


How can I submit my service to your website? Is there a simpler way than creating a PR here? https://github.com/Klavis-AI/klavis/tree/main/mcp_servers


Yes, submit a PR there and we will merge it if the quality of your server is good and we think it will be helpful to our communities. You can read our contributing guide for more detail: https://github.com/Klavis-AI/klavis/blob/main/CONTRIBUTING.m....


Shameless plug -- check out speechischeap.com

I spent three months perfecting the speaker diarization pipeline and I think you'll be quite pleased with the results.


How well does it work with multiple languages?


TIL, thanks! I asked Claude to generate a simulator [1] based on your comment. I think it came out well.

[1] https://claude.ai/public/artifacts/1b921a50-897e-4d9e-8cfa-0...


Righteous!


Fast is also cheap. Especially in the world of cloud computing where you pay by the second. The only way I could create a profitable transcription service [1] that undercuts the rest was by optimizing every little thing along the way. For instance, just yesterday I learned that the image size I've put together is 2.5× smaller than the next open source variant. That means faster cold boots, which reduces the cost (and provides a better service).

[1] https://speechischeap.com


ive approached the same thing but slightly differently. i can run it on consumer hardware for vastly cheaper than the cloud and don't have to worry about image sizes at all. (bare metal is 'faster') offering 20,000 minutes of transcription for free up to the rate limit (1 Request Every 5 Seconds)

https://geppetto.app

I contributed "whisperfile" as a result of this work:

* https://github.com/Mozilla-Ocho/llamafile/tree/main/whisper....

* https://github.com/cjpais/whisperfile

if you ever want to chat about making transcription virtually free or so cheap for everyone let me know. I've been working on various projects related to it for a while. including open source/cross-platform superwhisper alternative https://handy.computer


> i can run it on consumer hardware for vastly cheaper than the cloud

Woah, that's really cool, CJ! I've been toying with the idea of standing up a cluster of older iPhones to run Apple's Speech framework. [1] The inspiration came from this blog post [2] where the author is using it for OCR. A couple of things are holding me back: (1) the OSS models are better according to the current benchmarks and (2) I have customers all over the world, so geographical load balancing is a real factor. With that said, I'll definitely spend some time checking out your work. Thanks for sharing!

[1] https://developer.apple.com/documentation/speech

[2] https://terminalbytes.com/iphone-8-solar-powered-vision-ocr-...


ty! if there's any way I can help just lmk, always happy to lend a hand or an ear


Is S3 slow or fast? It’s both, as far as I can tell, and it represents a class of systems (mine included) that go slow to go fast.

S3 is “slow” at the level of a single request. It’s fast at the level of making as many requests as needed in parallel.

Being “fast” is sometimes critical, and often aesthetic.


We have common words for those two flavors of “fast” already: latency and throughput. S3 has high latency (arguable!), but very very high throughput.
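A rough sketch of the distinction, with `time.sleep` standing in for the network round trip. `fake_get` and the 20 ms latency are made up for illustration and are not measured S3 numbers; the point is only that per-request latency stays the same while total time drops once the requests overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.02  # assumed per-request latency, not a real S3 figure


def fake_get(key: str) -> str:
    """Hypothetical stand-in for one object GET: slow per call, independent of others."""
    time.sleep(LATENCY_S)
    return key.upper()


def fetch_serial(keys):
    # One request at a time: total time is roughly len(keys) * latency.
    return [fake_get(k) for k in keys]


def fetch_parallel(keys, workers=16):
    # Requests overlap: total time approaches a single request's latency.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_get, keys))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


keys = [f"obj-{i}" for i in range(16)]
serial_result, serial_s = timed(fetch_serial, keys)
parallel_result, parallel_s = timed(fetch_parallel, keys)
print(f"serial:   {serial_s:.3f}s for {len(keys)} requests")
print(f"parallel: {parallel_s:.3f}s for {len(keys)} requests")
```

Each call still takes about 20 ms either way (latency); what changes is how many complete per second (throughput).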


Fast is cheap everywhere. The only reasons software isn’t faster:

* developer insecurity and pattern lock in

* platform limitations. This is typically software execution context and tool chain related more than hardware related

* most developers refuse to measure things

Even really slow languages can result in fast applications.


Yep. I'm hoping that installed copies of PAPER (at least on Linux) will be somewhere under 2MB total (including populating the cache with its own dependencies etc). Maybe more like 1, although I'm approaching that line faster than I'd like. Compare 10-15 for pip (and a bunch more for pipx) or 35 for uv.


Well said. And it's not just the cloud. We self-host at my job and there are real cost savings to speed here too. Being able to continue using an old server for another year and having your staff be just a little more efficient adds up quickly.


Fast doesn't necessarily mean efficient/lightweight and therefore cheaper to deploy. It may just mean that you've thrown enough expensive hardware at the problem to make it fast.


Your CSS is broken fyi


Not in development and maintenance dollars it's not


Hmm… That's a good point. I recall a few instances where I went too far to the detriment of production. Having a trusty testing and benchmarking suite thankfully helped with keeping things more stable. As a solo developer, I really enjoy the development process, so while that bit is costly, I didn't really consider that until you mentioned it.


I wouldn't describe it as "unusable" so much as needing to understand its constraints and how to work around them. I built a business on top of Whisper [1] and one of the early key insights was to implement a good voice activity detection (VAD) model in order to reduce Whisper's hallucinations on silence.

[1] https://speechischeap.com
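To illustrate the idea only: below is a toy energy-based VAD that keeps the sample ranges loud enough to plausibly contain speech, so that silence never reaches the transcription model. This is not the VAD model used in the service above (the comment doesn't say which one that is); the frame length, threshold, and function names are all assumptions, and a production pipeline would use a trained VAD rather than a fixed RMS threshold.

```python
import math


def frame_rms(samples, frame_len):
    """Yield (start_index, rms) for consecutive non-overlapping frames."""
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        yield start, math.sqrt(sum(x * x for x in frame) / len(frame))


def speech_segments(samples, frame_len=160, threshold=0.02):
    """Return (start, end) sample ranges whose frame RMS meets the threshold.

    frame_len and threshold are illustrative values, not tuned ones.
    """
    segments = []
    for start, rms in frame_rms(samples, frame_len):
        if rms < threshold:
            continue  # treat the frame as silence and drop it
        end = min(start + frame_len, len(samples))
        if segments and segments[-1][1] == start:
            segments[-1][1] = end  # extend the current run of voiced frames
        else:
            segments.append([start, end])
    return [tuple(s) for s in segments]


# Synthetic input: silence, a 440 Hz tone standing in for speech, silence.
rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(3200)]
audio = silence + tone + silence

segments = speech_segments(audio)
print(segments)
```

Only the returned ranges would be passed on for transcription; the leading and trailing silence is skipped entirely.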


How does this make a profit? Whisper should be $0.006 to $0.010 per minute, but you charge less than $0.001? Do you 10x the audio?


Thanks for noticing. It took a lot of effort to optimize the pipeline every step of the way. VAD, inference server, hardware optimization, etc. But nothing that would compromise on quality. The audio is currently transcribed at its original speed. I'll be sure to publish something if I manage to speed it up without incurring any losses to the WER.


I've already done that [1]. A fraction of the price, 24-hour limit per file, and speedup tricks like the OP's are welcome. :)

[1] https://speechischeap.com


Nice. Don't expect you to spill the beans but is it doing OK (some customers?)

Just wondering if I can build a retirement out of APIs :)


It's sustainable, but not enough to retire on at this point.

> Just wondering if I can build a retirement out of APIs :)

I think it's possible, but you need to find a way to add value beyond the commodity itself (e.g., audio classification and speaker diarization in my case).


Can it do real-time transcription with diarization? I'm looking for that for a product feature I'm working on. Currently I've seen Speechmatics do this well, haven't heard of others.


Not yet. The gains in efficiency come from optimizing the speedup factor. Real-time audio cannot be processed any faster than 1× by definition.
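Back-of-the-envelope version of why the speedup factor matters, as a sketch. The GPU price is a made-up placeholder, not an actual cost figure, and `cost_per_audio_minute` is a hypothetical helper; "speedup" here means seconds of audio processed per second of wall-clock compute.

```python
def cost_per_audio_minute(gpu_cost_per_hour: float, speedup: float) -> float:
    """Compute cost of transcribing one minute of audio.

    speedup: audio seconds processed per wall-clock second (1.0 = real time).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    compute_minutes = 1.0 / speedup  # wall-clock minutes spent per audio minute
    return gpu_cost_per_hour / 60.0 * compute_minutes


GPU_COST_PER_HOUR = 0.60  # hypothetical price, for illustration only

for speedup in (1, 10, 50):
    cost = cost_per_audio_minute(GPU_COST_PER_HOUR, speedup)
    print(f"{speedup:>3}x speedup: ${cost:.4f} per audio minute")
```

At the assumed $0.60/hour, 1× works out to $0.01 per audio minute and 10× to $0.001. A live stream is pinned at 1×, so the compute sits reserved for the whole duration of the audio.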


This looks promising! Are you considering adding GPU support? It would be great if I could spin up a CUDA-enabled container right on Cloudflare, as opposed to using a Cloudflare worker to spin up a serverless GPU container on some other cloud provider.


I've yet to try jj, but in git, my flow is to start a new feature with a WIP commit and then to `--amend` it with every change. I usually have a running TODO list in the commit message body that I check off along the way.


Gitmoji has been around for eight years now. https://gitmoji.dev/

