I use Gemini CLI on a daily basis. It used to crash often and I'd lose the chat history. I found this tool called ai-cli-log [1] and it does something similar out of the box. I don't run Gemini CLI without it.
Fast is also cheap, especially in the world of cloud computing, where you pay by the second. The only way I could create a profitable transcription service [1] that undercuts the rest was by optimizing every little thing along the way. For instance, just yesterday I learned that the image I've put together is 2.5× smaller than the next open source variant. That means faster cold boots, which reduces cost (and provides a better service).
I've approached the same thing slightly differently. I can run it on consumer hardware vastly cheaper than the cloud and don't have to worry about image sizes at all (bare metal is 'faster'). I'm offering 20,000 minutes of transcription for free, up to a rate limit of one request every 5 seconds.
If you ever want to chat about making transcription virtually free, or at least very cheap, for everyone, let me know. I've been working on various projects related to it for a while, including an open-source, cross-platform Superwhisper alternative: https://handy.computer
> I can run it on consumer hardware vastly cheaper than the cloud
Woah, that's really cool, CJ! I've been toying with the idea of standing up a cluster of older iPhones to run Apple's Speech framework. [1] The inspiration came from this blog post [2] where the author is using it for OCR. A couple of things are holding me back: (1) the OSS models are better according to the current benchmarks, and (2) I have customers all over the world, so geographical load balancing is a real factor. With that said, I'll definitely spend some time checking out your work. Thanks for sharing!
Yep. I'm hoping that installed copies of PAPER (at least on Linux) will be somewhere under 2MB total (including populating the cache with its own dependencies, etc.). Maybe more like 1MB, although I'm approaching that line faster than I'd like. Compare 10-15MB for pip (and a bunch more for pipx) or 35MB for uv.
Well said. And it's not just the cloud. We self-host at my job, and there are real cost savings to speed there too. Being able to keep an old server in service for another year and having your staff be just a little more efficient add up quickly.
Fast doesn't necessarily mean efficient/lightweight and therefore cheaper to deploy. It may just mean that you've thrown enough expensive hardware at the problem to make it fast.
Hmm… That's a good point. I recall a few instances where I went too far, to the detriment of production. Thankfully, a trusty testing and benchmarking suite helped keep things stable. As a solo developer, I really enjoy the development process itself, so while that time is costly, I hadn't really considered the cost until you mentioned it.
I wouldn't describe it as "unusable" so much as needing to understand its constraints and how to work around them. I built a business on top of Whisper [1] and one of the early key insights was to implement a good voice activity detection (VAD) model in order to reduce Whisper's hallucinations on silence.
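To make that concrete, here's a rough sketch of the approach (simplified, not my production pipeline; it assumes the silero-vad and faster-whisper packages and a hypothetical call.wav):

    # Run VAD first, then hand Whisper only the detected speech regions,
    # so it never transcribes long stretches of silence.
    from silero_vad import load_silero_vad, read_audio, get_speech_timestamps
    from faster_whisper import WhisperModel

    vad = load_silero_vad()
    wav = read_audio("call.wav", sampling_rate=16000)  # hypothetical input

    # Sample offsets of the regions the VAD classifies as speech.
    speech = get_speech_timestamps(wav, vad)

    whisper = WhisperModel("small")
    for region in speech:
        chunk = wav[region["start"]:region["end"]].numpy()
        segments, _ = whisper.transcribe(chunk)
        for seg in segments:
            print(seg.text)

In practice you'd also pad the regions a little and tune the VAD thresholds so quiet speech doesn't get clipped.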
Thanks for noticing. It took a lot of effort to optimize the pipeline every step of the way: VAD, inference server, hardware optimization, etc. But nothing that would compromise quality. The audio is currently transcribed at its original speed. I'll be sure to publish something if I manage to speed it up without any loss in WER.
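For the curious, the experiment I have in mind is roughly this (illustrative only, nothing I've shipped): speed the audio up with ffmpeg's atempo filter, which changes tempo without shifting pitch, then measure the WER delta against the original.

    # Illustrative sketch: speed up audio before transcription.
    # Assumes ffmpeg is on PATH; filenames are hypothetical.
    import subprocess

    def speed_up(src: str, dst: str, factor: float = 1.5) -> None:
        # Recent ffmpeg accepts atempo factors from 0.5 to 100;
        # chain multiple atempo filters on older builds.
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={factor}", dst],
            check=True,
        )

    speed_up("input.wav", "fast.wav")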
It's sustainable, but not enough to retire on at this point.
> Just wondering if I can build a retirement out of APIs :)
I think it's possible, but you need to find a way to add value beyond the commodity itself (e.g., audio classification and speaker diarization in my case).
Can it do real-time transcription with diarization? I'm looking for that for a product feature I'm working on. So far I've seen Speechmatics do this well, but I haven't heard of others.
This looks promising! Are you considering adding GPU support? It would be great if I could spin up a CUDA-enabled container right on Cloudflare, as opposed to using a Cloudflare Worker to spin up a serverless GPU container on some other cloud provider.
I've yet to try jj, but in git, my flow is to start a new feature with a WIP commit and then to `--amend` it with every change. I usually have a running TODO list in the commit message body that I check off along the way.
[1] https://github.com/alingse/ai-cli-log