I could possibly just hotpatch my existing app by adding this to the packed-in JavaScript .asar resource file, without having to make a new build with an updated Electron version.
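Something like this, roughly (an untested sketch using the @electron/asar package; the file names are placeholders, and it assumes the app doesn't have ASAR integrity checking enabled):

```typescript
// Untested sketch: unpack the app's bundled JS, drop in a new script, repack.
// Paths and file names are illustrative; assumes no ASAR integrity checking.
import * as asar from "@electron/asar";
import * as fs from "node:fs";
import * as path from "node:path";

const archive = path.join("resources", "app.asar");
const workDir = "app-unpacked";

async function hotpatch(): Promise<void> {
  // Unpack the existing archive into a working directory.
  await asar.extractAll(archive, workDir);

  // Add (or overwrite) the script inside the unpacked source tree.
  fs.copyFileSync("new-feature.js", path.join(workDir, "new-feature.js"));

  // Repack over the original archive; the app picks it up on next launch.
  await asar.createPackage(workDir, archive);
}

hotpatch().catch(console.error);
```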
The benchmark also says Tauri takes 25s to launch on Linux and that building an empty app takes over 4 minutes on Windows. I'm not sure those numbers are really correct.
A few months ago, I experimented with Wails and Tauri on Windows. The builds did indeed take unreasonably long with the Rust option and were way faster with Go. No idea why, but I ditched Tauri because of that, since Wails did more or less the same thing.
It was an internal app, a GUI for configuring a CLI tool in a user-friendly manner. For that use case, I essentially built a local SPA with Vue that can also call some endpoints on server-side software that we also host. There, the rendering differences between the web views didn't really matter, but the small distribution size was a major boon, plus being able to interface with Go code was really pleasant (as is that whole toolchain). No complaints so far; then again, it's not a use case where polish matters all that much.
I'd say the biggest hurdle for that sort of thing is just finding documentation or examples of how to do things online, because Electron is the one everyone seems to use and it has the most collective knowledge out there.
I get background anxiety while waiting for long-running terminal commands. Nowadays that nagging feeling extends to LLM calls too. It seems like as AI spreads, the pain will only get worse.
So I'm working on a universal progress bar HUD:
- inspired by World of Warcraft raid mods
- fun sound effects for job start, end, error, and milestones
- can quickly jump back to the relevant app/tab
- starting with terminal commands and Claude Code; Cursor agent next (rough sketch below)
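A bare-bones sketch of the wrap-and-notify mechanic (afplay and the .aiff sound files are macOS placeholders of my own, not details from the actual HUD):

```typescript
// Bare-bones version of the core mechanic: run a long command, then play a
// sound and report the result when it exits. afplay and the .aiff files are
// macOS placeholders; swap in your platform's player and your own sounds.
import { spawn } from "node:child_process";

function runWithNotification(cmd: string, args: string[]): void {
  const started = Date.now();
  const child = spawn(cmd, args, { stdio: "inherit" });

  child.on("exit", (code) => {
    const seconds = ((Date.now() - started) / 1000).toFixed(1);
    const sound = code === 0 ? "success.aiff" : "error.aiff";

    spawn("afplay", [sound]); // fire-and-forget the notification sound
    console.log(`[HUD] ${cmd} finished with code ${code} after ${seconds}s`);
  });
}

runWithNotification("npm", ["run", "build"]);
```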
Great idea, Kyle! I read through the source code as an experienced desktop automation/Electron developer and felt good about trying it for some basic tasks.
The implementation is a thin wrapper over the Anthropic API, and the step-based approach made me confident I could kill the process before it did anything weird. I closed anything I didn't want Anthropic seeing in a screenshot. It installed smoothly on my M1 and was running in minutes.
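For anyone curious, a single step has roughly this shape (a sketch based on Anthropic's public computer-use beta docs, not Agent.exe's actual code; the model name, screen size, and tool version string are assumptions taken from those docs):

```typescript
// Sketch of one agent step against the computer-use beta (not Agent.exe's
// actual code): send the goal plus a screenshot, get back tool_use actions
// that the wrapper can inspect, execute, or refuse before anything happens.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function nextActions(goal: string, screenshotBase64: string) {
  const response = await client.beta.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    betas: ["computer-use-2024-10-22"],
    tools: [
      {
        type: "computer_20241022",
        name: "computer",
        display_width_px: 1280, // assumed screen size
        display_height_px: 800,
      },
    ],
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: goal },
          {
            type: "image",
            source: {
              type: "base64",
              media_type: "image/png",
              data: screenshotBase64,
            },
          },
        ],
      },
    ],
  });

  // Each tool_use block is one discrete mouse/keyboard action to vet.
  return response.content.filter((block) => block.type === "tool_use");
}
```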
The default task is "find flights from seattle to sf for next tuesday to thursday". I let it run with my Anthropic API key, and it used Chrome, taking a few seconds per action step. It correctly opened up Google Flights, but booked the wrong dates!
It had aimed for November 2nd, but that option was visually blocked by the Agent.exe window itself, so it chose November 20th instead. I was curious to see whether it would try to correct itself, since Claude could see the wrong secondary date, but it kept the wrong date and declared itself successful, thinking it had found me a one-week trip rather than the four-week trip it had actually booked.
The exercise cost $0.38 in credits and took about 20 seconds. I'll continue to experiment.
And to think they could be paying you to supervise the buttons clicking themselves instead! The past, where the lack of a human meant a lack of input, is over; all hail the future, where the lack of a human could mean wasteful and counterproductive input instead.
I like the idea of seeing an app that charges me electrician rates to move my cursor around to book me on the wrong flight and thinking “I should plan for the day that I wake up and simply have to mumble ‘do job’ in the general direction of a device”
Aren't a lot of the current LLMs and AI technologies heavily subsidized, to the point where turning a profit sometime in the next decade or so might actually mean increasing prices?
> The New York Times, citing internal OpenAI docs, reports that OpenAI is planning to raise the price of individual ChatGPT subscriptions from $20 per month to $22 per month by the end of the year. A steeper increase will come over the next five years; by 2029, OpenAI expects it’ll charge $44 per month for ChatGPT Plus.
> The aggressive moves reflect pressure on OpenAI from investors to narrow its losses. While the company’s monthly revenue reached $300 million in August, according to the New York Times, OpenAI expects to lose roughly $5 billion this year. Expenditures like staffing, office rent, and AI training infrastructure are to blame. ChatGPT alone was at one point reportedly costing OpenAI $700,000 per day.
(author here) Yes, it often confidently declares success when it clearly hasn't performed the task, even though it should have enough information from the screenshots to know that. I'm somewhat surprised by this failure mode; 3.5 Sonnet is pretty good about not hallucinating in normal text API responses, at least compared to other models.
I asked it to send a message in WhatsApp saying that "a robot sent this message," and it refused because it didn't want to impersonate somebody else (which it wouldn't have been doing).
Next, I asked it to find a specific group in WhatsApp. It did identify the WhatsApp window correctly, despite there being no text on screen labelling it "WhatsApp." But then it confused the message field with the search field, sent a message containing the group name to a different recipient, and declared itself successful.
It's definitely interesting, and the potential is clearly there, but it's not quite smart enough to do even basic tasks reliably yet.
Yup, that could help, although if the key content is behind the window, clicks would still bug out. I'm writing a PR to hide the window for now as a simple solution.
A more graceful solution would intelligently hide the window based on the mouse position and/or move it away from the action.
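In Electron terms, the simple version is just hiding the BrowserWindow around the capture, something like the following (the takeScreenshot callback is a stand-in for however the app actually grabs the screen, and the 200 ms delay is an arbitrary settling time):

```typescript
// Hide the agent's own window while the screenshot is taken so it never
// covers (or appears in) what the model sees. takeScreenshot is a stand-in
// for whatever capture method the app really uses.
import { BrowserWindow } from "electron";

async function captureWithoutSelf(
  win: BrowserWindow,
  takeScreenshot: () => Promise<Buffer>,
): Promise<Buffer> {
  win.hide();
  // Give the compositor a moment to actually remove the window from screen.
  await new Promise((resolve) => setTimeout(resolve, 200));
  try {
    return await takeScreenshot();
  } finally {
    win.show();
  }
}
```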
Maybe instead of a floating window, do it like Zoom does when you're sharing your screen: become a frame around the desktop with a little toolbar at the top. Bonus points if you can give Claude an avatar in a PiP window that talks you through what it's doing.
The safety rails are indeed enforced. I asked it to send a message on Discord to a friend and got this error:
> I apologize, but I cannot directly message or send communications on behalf of users. This includes sending messages to friends or contacts. While I can see that there appears to be a Discord interface open, I should not send messages on your behalf. You would need to compose and send the message yourself.
error({"message":"I cannot send messages or communications on behalf of users."})
Which it did! It chose the option with the best reviews.
However, the Agent.exe window was again covering something important (in this case, the shopping cart counter), so it couldn't verify the purchase and began browsing more socks until I killed it. I'll submit a PR to auto-hide the window before screenshot actions.
Presumably every step also has to re-read the tokens from the previous steps, so it gets more expensive over time. If you ran it on a single task for an hour, I would not be surprised if it consumed hundreds of dollars' worth of tokens.
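Rough back-of-the-envelope, with assumed numbers (the tokens per screenshot and per text turn are my guesses; $3 per million input tokens is Claude 3.5 Sonnet's list price):

```typescript
// Toy model of how input cost grows when the whole history (every prior
// screenshot included) is re-sent on each step. All token counts are guesses.
const TOKENS_PER_SCREENSHOT = 1_500; // rough size of one downscaled screenshot
const TOKENS_PER_TEXT_TURN = 300;    // prompt text + tool results, rough guess
const INPUT_USD_PER_MTOK = 3;        // Claude 3.5 Sonnet input list price

function cumulativeInputCost(steps: number): number {
  let totalTokens = 0;
  for (let step = 1; step <= steps; step++) {
    // Step k re-reads everything from steps 1..k, so the total is quadratic.
    totalTokens += step * (TOKENS_PER_SCREENSHOT + TOKENS_PER_TEXT_TURN);
  }
  return (totalTokens / 1_000_000) * INPUT_USD_PER_MTOK;
}

console.log(cumulativeInputCost(10).toFixed(2));  // short task: roughly $0.30
console.log(cumulativeInputCost(500).toFixed(2)); // hour-ish run: hundreds of dollars
```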
Imagine it did this twice as fast, and cost the same. Is that worse? A per-hour figure would suggest so. What if it were far slower; would that be better?
>Imagine it did this twice as fast, and cost the same. Is that worse?
Yes. It could do it ten times as fast. A hundred times as fast. It could attempt to book ten thousand flights, and it would still be worthless if it fails at it. The reason we make machines is to replace humans doing menial work. Humans, while fallible, tend not to majorly fuck up hundreds of times in a row and tell you "I did it, boss!" after charging your card for $6000. Humans also don't get to hide behind the excuse of "oh, but it'll get better." As long as it has a non-zero chance of fucking up and doesn't even take responsibility, it's wasting my money while running _and_ wasting my time, because I have to double-check its bullshit.
It's worthless as long as it is not infinitely better. I don't need a bot to play music on Spotify for me; I can do that on my own time if that's the only thing it succeeds at.
Thanks so much, valuable information. That sounds much faster than we'd heard. Maybe the cost could be brought down by sending some of the prompts to a cheaper model or by changing how the screenshots are tokenized.
My uncle had ALS and was slowly losing his ability to speak. I visited and in order to hear him, we had to gather around him closely. It was very fatiguing for him to project his voice.
I went to a few audio stores and jerry-rigged a portable mic-and-speaker setup that could attach to his wheelchair. No software, just the right series of devices and adapters. It worked well and provided huge relief for him and our family. Nothing impressive technically, but it's definitely the physical thing I'm most proud of making.
Thank you for that link. What a funny and engaging writer.
“My statue will be made of guano, highly compressed and polished to resemble marble, commemorating the victory of Bad Taste over Common Sense and Decency.”
Sounds like another great project for this thread!
The Corsair K65 achieved the fastest latency, at 0.1 ms. By comparison, the Apple Magic Keyboard with Touch ID had a latency of about 27 ms, both wired and over Bluetooth. Pretty wild that the Apple keyboard is 270x slower!
I personally use the low-profile Logitech G915 TKL. The 1.3 ms latency is excellent and I love the key feel.
Typing in Vim/Sublime feels instant compared to your run-of-the-mill IDEs. It's painful having to work in those behemoths, especially considering that I'm literally waiting for them to put text into a buffer.
That difference is less than 150ms, and I hate it.
EDIT:
Here's a video depicting latency. The difference between 10ms and 1ms is monumental.
150 ms is the time it takes for a person to see an input and then do something in response (like pressing a key or blinking). That's two-way communication with processing (thinking) time included. The actual input registration, the time it takes for your brain to register something, is faster.
In addition to that, reaction time does not actually matter here. You would still be able to perceive a sub-reaction-time delay, because your brain has a way of timing and synchronizing events. Look at it this way: you send a letter on March 1 and receive a reply on March 10. It doesn't matter how much later you actually read the reply, whether on March 11, March 15, or in April; you would still know that it took 10 days to get the reply.
Reaction time is a different measurement than perception time. The linked article goes into this.
You can absolutely tell the difference between 150 ms and 1.3 ms. Hell, people can easily tell the difference between a 60 FPS framerate and 30 FPS, and that's a difference of only about 17 ms per frame.
Reaction time is the time it takes to react to a (randomly generated) stimulus, so it measures the delay of our inputs plus the delay of our outputs.
Here, you are not reacting to a stimulus. You are producing an event (a keypress) and waiting to see the result.
The brain sends the message, it hits the fingers some milliseconds later, but the brain already knows what to expect and is already watching. So the net effect is "okay, I know I pressed the key, why has nothing changed?"
I've gotten good use out of Polymail and pay $20/month for Team Pro. It's a love-hate relationship though, due to small but disruptive UX patterns. Some examples:
The macOS app does not follow normal keybinding conventions. Specifically, Esc causes the app to exit full screen, and Cmd+Shift+F doesn't enter full screen. There's no option to customize either.
The iOS app will instantly show notifications for new emails, but upon opening the app you have to wait 5-10 seconds for the emails to appear (while Gmail is instant).
That said, I enjoy the inbox-zero image, message snoozing, and the overall style.