Hacker News | d4rkp4ttern's comments

Was a big fan of Handy until I found Hex, which, incredibly, has even faster transcription (with Parakeet V3). It’s MacOS only:

https://github.com/kitlangton/Hex


I tried this out but the brew command errors out saying it only works on macOS versions older than Sequoia.

That's unfortunate. I think I can update my version, but my elder brother has told me some bad things about performance on the newer update.


works fine on my MacOS w Tahoe

May or may not be related to a surge in VPS hosting of OpenClaws on Hetzner. It’s a very popular option right now.

Yes people are too fixated on just the model. The real question for coding use cases is - does Gemini X + Gemini CLI outperform Opus + Claude Code? With 3.0 the answer was no. I won’t waste time checking 3.1 until I hear otherwise.

I ran into this question when thinking about the approach for a recent project. Yes CLI coding tools are good agents for interactive use, but if you are building a product then you do need an agent abstraction.

You could package Claude Code into the product (via agents-sdk or claude -p) and have it use the API key (with metered billing), but in my case I didn’t find it ergonomic enough for my needs, so I ended up using my own agent framework, Langroid, for this.

https://github.com/langroid/langroid

(No it’s not based on that similarly named other framework, it’s a clean, minimal, extensible framework with good dx)
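For the claude -p route mentioned above, here is a minimal sketch of what calling the CLI in headless mode from a product backend might look like. It assumes the `claude` binary is on PATH and that it picks up the key from the `ANTHROPIC_API_KEY` environment variable. The function names and the injectable `run` parameter are illustrative only and do not come from the comment.

```python
import os
import subprocess
from typing import Callable, List


def build_command(prompt: str) -> List[str]:
    """Build a non-interactive invocation: `claude -p <prompt>` prints one reply and exits."""
    return ["claude", "-p", prompt]


def run_headless(prompt: str, api_key: str, run: Callable = subprocess.run) -> str:
    """Run one headless turn and return the reply text.

    Assumption: the CLI reads the API key from ANTHROPIC_API_KEY, so usage
    is billed to that key. `run` is injectable so the call can be faked in tests.
    """
    env = dict(os.environ, ANTHROPIC_API_KEY=api_key)
    result = run(
        build_command(prompt),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout.strip()
```

This is roughly the level of ergonomics being weighed: each call spawns a process and returns text, so anything like tool wiring or multi-agent orchestration has to be built around it.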


Big fan of Handy, and it’s cross-platform as well. Parakeet V3 gives the best experience with very fast and accurate-enough transcriptions when talking to AIs that can read between the lines. It does have stuttering issues though. My primary use of these is when talking to coding agents.

But a few weeks ago someone on HN pointed me to Hex, which also supports Parakeet-V3, and incredibly enough, is even faster than Handy because it’s a native MacOS-only app that leverages CoreML/Neural Engine for extremely quick transcriptions. Long ramblings transcribed in under a second!

It’s now my favorite fully local STT for MacOS:

https://github.com/kitlangton/Hex


I installed a few different STT apps at the same time that used Parakeet and I think they disagreed with each other. But Hex otherwise would’ve won for me I think. Wanna reformat the Mac & try again (been a while anyway).

My comment on this from a month back: https://news.ycombinator.com/item?id=46637040


Hex is great and not trying to pull you away from them - would love to get your pov when you give these a spin next time. email or DM me

I was having the same journey but landed on https://github.com/hoomanaskari/mac-dictate-anywhere

Speaking of audio + AI, here's a "learning hack" I've been trying with voice mode, and the 3 big AI labs still haven't nailed it:

While on a walk with mobile phone + earphones, dump an article/paper/HN-Post/github-repo into the mobile chat app (chat-gpt, claude or gemini), and use voice mode to have it walk you through it conversationally, so you can ask follow up questions during the walk-thru and the AI would do web-search etc. I know I could do something like this with NotebookLM, but I want to engage in the conversation, and NotebookLM does have interactive mode but it has been super-flaky to say the least.

I pay for ChatGPT Pro and the voice mode is really bad: it pretends to do web searches and makes up things, and when pushed says it didn't actually read the article. Also the voice sounds super-condescending.

Gemini Pro mobile app - similarly refuses to open links and sounds as if it's talking to a baby.

Claude mobile app was the best among these - the voice is very tolerable in terms of tone, but like the others it can't open links. It does do web searches, but it only gets some kind of page summaries and doesn't actually go into the links themselves to give me details.


I have found that the "advanced voice mode" is dumb as a box of rocks compared to their "basic" TTS version, so I disable it. I've switched to Claude, so I don't know if that's still an option, but if you are tied to ChatGPT, definitely disable it if possible!

It's amazing how good open-weight STT and TTS have gotten, so there's no need to pay for Wispr Flow, Superwhisper, Eleven-Labs etc.

Sharing my setup in case it may be useful for others; it's especially useful when working with CLI agents like Claude Code or Codex-CLI:

STT: Hex [1] (open-source), with Parakeet V3 - stunningly fast, near-instant transcription. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate back to me what it understood, and it gives back a nicely structured version -- this helps confirm understanding as well as likely helps the CLI agent stay on track. It is a MacOS native app and leverages the CoreML/Neural Engine to get extremely fast transcription (I used to recommend a similar app Handy but it has frequent stuttering issues, and Hex is actually even faster, which I didn't think was possible!)

TTS: Kyutai's Pocket-TTS [2], just 100M params, and amazing speech quality (English only). I made a voice plugin [3] based on this, for Claude Code so it can speak out short updates whenever CC stops. It uses a combination of hooks that nudge the main agent to append a speakable summary, falling back to using a headless agent in case the main agent forgets. Turns out to be surprisingly useful. It's also fun as you can customize the speaking style and mirror your vibe and "colorful language" etc.

The voice plugin gives commands to control it:

    /voice:speak stop
    /voice:speak azelma (change the voice)
    /voice:speak prompt <your arbitrary prompt to control the style>

[1] Hex https://github.com/kitlangton/Hex

[2] Pocket-TTS https://github.com/kyutai-labs/pocket-tts

[3] Voice plugin for Claude Code: https://pchalasani.github.io/claude-code-tools/plugins-detai...
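The "nudge the main agent, fall back to a headless agent" idea from the TTS part above can be sketched roughly as follows. This is not the plugin's actual code: the `<speak>` marker, the function names and the length cap are all hypothetical, and `fallback` stands in for whatever headless summarizer gets called when the main agent forgets.

```python
import re
from typing import Callable, Optional

# Hypothetical marker the main agent is nudged to append; the real plugin's
# format is not described in the comment.
SPEAK_RE = re.compile(r"<speak>(.*?)</speak>", re.DOTALL)


def extract_spoken_summary(message: str) -> Optional[str]:
    """Return the last speakable summary in the agent's message, if any."""
    matches = SPEAK_RE.findall(message)
    if not matches:
        return None
    summary = " ".join(matches[-1].split())  # collapse whitespace and newlines
    return summary or None


def summary_to_speak(
    message: str,
    fallback: Callable[[str], str],
    max_chars: int = 300,
) -> str:
    """Prefer the agent's own summary; otherwise ask the fallback summarizer."""
    summary = extract_spoken_summary(message)
    if summary is None:
        # The main agent forgot the marker: delegate to the headless agent.
        summary = fallback(message)
    return summary[:max_chars]  # keep spoken updates short
```

The returned string would then be handed to the TTS engine. Keeping the fallback as a plain callable makes it easy to swap in a cheaper model or a canned phrase.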


Same setup I’m using! Parakeet and pocket turbo. It feels good enough for daily usage.

Anyone know of something like Hex that runs on Linux?

Handy is cross-platform, including linux

+1 for Handy, it's very easy to get running and once it is you don't have to think about it again.

You can roll a script to do this, something that would consume a mic from Pipewire when triggered and then push results to clipboard. With a Parakeet ONNX model in between.
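A rough sketch of such a script, under several assumptions: `pw-record` (from PipeWire) captures the mic, `wl-copy` or `xclip` is installed for the clipboard, and `transcribe` is a placeholder for whatever runs the Parakeet ONNX model (not implemented here). The function names and the `wait_for_stop` trigger are illustrative.

```python
import shutil
import signal
import subprocess
from typing import Callable, List


def record_command(wav_path: str, rate: int = 16000) -> List[str]:
    """Build a pw-record invocation that writes mono audio to wav_path."""
    return ["pw-record", "--rate", str(rate), "--channels", "1", wav_path]


def clipboard_command() -> List[str]:
    """Use wl-copy if it is installed (Wayland), otherwise fall back to xclip."""
    if shutil.which("wl-copy"):
        return ["wl-copy"]
    return ["xclip", "-selection", "clipboard"]


def dictate(
    wav_path: str,
    transcribe: Callable[[str], str],
    wait_for_stop: Callable[[], None],
    popen: Callable = subprocess.Popen,
    run: Callable = subprocess.run,
) -> str:
    """Record until wait_for_stop() returns, transcribe, copy text to clipboard."""
    recorder = popen(record_command(wav_path))
    try:
        wait_for_stop()  # e.g. block until the hotkey is released
    finally:
        # Ask pw-record to stop so it can close the WAV file cleanly.
        recorder.send_signal(signal.SIGINT)
        recorder.wait()
    text = transcribe(wav_path)  # assumption: wraps the Parakeet ONNX model
    run(clipboard_command(), input=text.encode("utf-8"), check=True)
    return text
```

Binding `dictate` to a hotkey is left to the desktop environment. For a quick manual test, `wait_for_stop` could be as simple as `lambda: input("Press Enter to stop")`.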

I had cause to do the opposite: Hotkey -> clipboard TTS


Is Hex MacOS only?


Indeed. Over a few days of iterations I had this TUI built for fast full-text search of Claude Code or Codex sessions using Ratatui (and Tantivy for the full-text search index). I would never have dreamed of this pre coding agents.

https://pchalasani.github.io/claude-code-tools/tools/aichat/...


Attention is all everyone wants.

I think there’s a level beyond 8: not reviewing AI-generated code.

There’s a lot of discussion about whether to let AI write most of your code (which at least in some circles is largely settled by now), but when I see hype-posts about “AI is writing almost all of our code”, the top question I’m curious about is: how much of the AI-written code are they reviewing?

