This looks cool and could be a much needed step towards fixing the web.
Some questions:
[Tech]
1. How deep does the modification go? If I request a tweek to the YouTube homepage, do I need to re-specify or reload the tweek to have it persist across the entire site (deeply nested pages, iframes, etc.)
2. What is your test and eval setup? How confident are you that the model is performing the requested change without being overly aggressive and eliminating important content?
3. What is your upkeep strategy? How will you ensure that your system continues to WAI after site owners update their content in potentially adversarial ways? In my experience LLMs do a fairly poor job at website understanding when the original author is intentionally trying to mess with the model, or has overly complex CSS and JS.
4. Can I prompt changes that I want to see globally applied across all sites (or a category of sites)? For example, I may want a persistent toolbar for quick actions across all pages -- essentially becoming a generic extension builder.
[Privacy]
5. Where and how are results being cached? For example, if I apply tweeks to a banking website, what content is being scraped and sent to an LLM? When I reload a site, is content being pulled purely from a local cache on my machine?
[Business]
6. Is this (or will it be) open source? IMO a large component of empowering the user against enshittification is open source. As compute commoditizes it will likely be open source that is the best hope for protection against the overlords.
7. What is your revenue model? If your product essentially wrestles control from site owners and reduces their optionality for revenue, your arbitrage is likely to be equal or less than the sum of site owners' loss (a potentially massive amount to be sure). It's unclear to me how you'd capture this value though, if open source.
8. Interested in the cost and latency. If this essentially requires an LLM call for every website I visit, this will start to add up. Also curious if this means that my cost will scale with the efficiency of the sites I visit (i.e. do my costs scale with the size of the site's content).
> 1. How deep does the modification go? If I request a tweek to the YouTube homepage, do I need to re-specify or reload the tweek to have it persist across the entire site (deeply nested pages, iframes, etc.)
If you're familiar with Greasemonkey, we work similarly to its @match metadata. A given script can be scoped to a specific page (https://www.youtube.com/watch?v=cD5Ei8bMmUk), all videos (https://www.youtube.com/watch*), all of YouTube (https://www.youtube.com/*), or all domains (https:///). During generation, we try to infer the intended scope from your request (and you can also manually override it with a dropdown).
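To make the scoping concrete, here's a rough TypeScript sketch of how glob-style match patterns map to pages (illustrative only, not our actual matching code):

```ts
// A minimal sketch of Greasemonkey-style scope matching, NOT the actual
// implementation: each saved script carries a match pattern, and the
// extension decides on each navigation whether it applies.
type TweakScript = {
  match: string; // e.g. "https://www.youtube.com/watch*" or "https://*/*"
  code: string;  // the generated modification
};

// Convert a glob-style pattern into a RegExp ("*" matches anything).
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Pick the stored scripts that should run on the page being loaded.
function scriptsForUrl(url: string, scripts: TweakScript[]): TweakScript[] {
  return scripts.filter(s => patternToRegExp(s.match).test(url));
}

// A script scoped to all watch pages fires on any video, with no need to
// re-specify it per page.
const saved = [{ match: "https://www.youtube.com/watch*", code: "/* ... */" }];
console.log(scriptsForUrl("https://www.youtube.com/watch?v=cD5Ei8bMmUk", saved).length); // 1
```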
> 2. What is your test and eval setup? How confident are you that the model is performing the requested change without being overly aggressive and eliminating important content?
Oh boy, don't get me started. We have not found a way to fully automate eval yet. We can automate "is there an error?", "does it target the right selectors?", etc. But the requests are open-ended, so there are a million "correct" answers. We have a growing set of "tough" requests, and when we're shipping a major change, we sit down, generate them all, and click through to manually check pass/fail. We've built tooling around this so it is actually pretty quick, but we're definitely thinking about better automation.
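For the mechanical half, the automated checks look roughly like this (a simplified sketch, not our real harness; the open-ended "did it do what the user meant" judgment is still manual):

```ts
// Rough sketch: load the target page headlessly, inject the generated
// script, and verify (a) no runtime errors and (b) the selectors it
// targets actually resolve. Puppeteer is used here just for illustration.
import puppeteer from "puppeteer";

async function smokeTest(url: string, scriptCode: string, selectors: string[]) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const errors: string[] = [];
  page.on("pageerror", err => errors.push(String(err))); // "is there an error?"
  await page.goto(url, { waitUntil: "networkidle2" });
  await page.evaluate(scriptCode);                        // apply the generated tweek
  const missing: string[] = [];
  for (const sel of selectors) {
    if (!(await page.$(sel))) missing.push(sel);          // "does it target the right selectors?"
  }
  await browser.close();
  return { errors, missing }; // pass/fail on intent still needs a human click-through
}
```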
This is also where more users come in. Hopefully you complain to us if it doesn't work and we get a better sense of what to improve!
> 3. What is your upkeep strategy? How will you ensure that your system continues to WAI after site owners update their content in potentially adversarial ways? In my experience LLMs do a fairly poor job at website understanding when the original author is intentionally trying to mess with the model, or has overly complex CSS and JS.
Great question. The good news is that there are things like aria labels that are pretty consistent. If the model picks the right selectors, it can be pretty robust to change. Beyond that, hopefully it is as easy as one update request ("this script doesn't work anymore, please update the selectors"). Though we can't really expect each user to do that, so we are thinking about an update system where, e.g., if you install/copy script A and the original script A is later updated, you can pull that new version. The final stage of this is an intelligent system where the script heals itself (every so often, it assesses the site, checks whether selectors have changed, and fixes itself) -> that is more long-term.
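As a toy illustration of why semantic attributes help (not our generated code): try selectors in priority order, preferring aria labels over hashed class names that rotate on every redesign.

```ts
// Return the first element matched by a prioritized list of selectors.
function findRobust(selectors: string[]): Element | null {
  for (const sel of selectors) {
    const el = document.querySelector(sel);
    if (el) return el;
  }
  return null;
}

// Hypothetical example: the aria-label selector usually survives a redesign
// even when the generated class name ("yt-x7f3a" is made up) does not.
const subscribeBtn = findRobust([
  'button[aria-label="Subscribe"]', // semantic, usually stable
  "button.yt-x7f3a",                // brittle generated class, fallback only
]);
subscribeBtn?.remove();
```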
> 4. Can I prompt changes that I want to see globally applied across all sites (or a category of sites)? For example, I may want a persistent toolbar for quick actions across all pages -- essentially becoming a generic extension builder.
Yes, if the domain is https:/// it applies to all sites, so you can think of this as a meta-extension builder. E.g. I have a timer script that runs across reddit, linkedin, twitter, etc. and keeps me focused.
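The timer itself is nothing fancy; a toy version of that kind of everywhere-script might look like this (illustrative only, not the actual generated script):

```ts
// Inject a small fixed "focus timer" bar on whatever page the script runs on.
function injectFocusTimer(minutes: number): void {
  const bar = document.createElement("div");
  bar.style.cssText =
    "position:fixed;top:0;right:0;z-index:999999;padding:4px 8px;" +
    "background:#222;color:#fff;font:12px sans-serif;";
  document.body.appendChild(bar);
  const deadline = Date.now() + minutes * 60_000;
  const tick = () => {
    const msLeft = Math.max(0, deadline - Date.now());
    bar.textContent = `Focus: ${Math.ceil(msLeft / 60_000)} min left`;
    if (msLeft > 0) setTimeout(tick, 1000);
  };
  tick();
}

// Because the script is scoped to all domains, the same bar shows up on
// reddit, linkedin, twitter, and everywhere else.
injectFocusTimer(25);
```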
> 5. Where and how are results being cached? For example, if I apply tweeks to a banking website, what content is being scraped and sent to an LLM? When I reload a site, is content being pulled purely from a local cache on my machine?
There is a distinction. When you generate a tweek, the page is captured and sent to an LLM. There is no way around this. You can't generate a modification for a site you cannot see.
The result of a generation is a static script that applies to the page across reloads (unless you disable it). When you apply a tweek, everything is local; there is no dynamic server communication.
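To sketch what "everything is local" means in practice (an assumed shape using standard extension APIs, not our actual code): the saved scripts sit in local extension storage and get injected on navigation without any network request.

```ts
// Toy apply path: look up cached scripts locally and inject the ones whose
// scope matches the page. Nothing here talks to a server.
chrome.webNavigation.onCommitted.addListener(async ({ tabId, url, frameId }) => {
  if (frameId !== 0) return; // top frame only, to keep the toy simple
  const { scripts = [] } = await chrome.storage.local.get("scripts");
  for (const s of scripts as { matchRegex: string; code: string }[]) {
    if (!new RegExp(s.matchRegex).test(url)) continue; // purely local scope check
    await chrome.scripting.executeScript({
      target: { tabId },
      func: (code: string) => { new Function(code)(); }, // run the cached tweek
      args: [s.code],
    });
  }
});
```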
Hopefully that is all helpful! I need to get to other replies, but I will try to return to finish up your business questions (those are the most boring anyway)
-- Edit: I'm back! --
> 6. Is this (or will it be) open source? IMO a large component of empowering the user against enshittification is open source. As compute commoditizes it will likely be open source that is the best hope for protection against the overlords.
It is very important to me that people trust us. I can say that we don't do X, Y, Z with your data and that using our product is safe, but trust is not freely given (nor should it be). We have a privacy policy, we have SOC 2, and in theory, you could even download the extension and dig into the code yourself.
Open source is one way to build trust. However, I also recognize that many of these "overlords" you speak of are happy to abuse their power. Who's to say that we don't open our code, only to have e.g. OpenAI fork it for their own browser? Of course, we could use a restrictive license, but lawsuits haven't been particularly protective of copyright lately. I am interested in open-sourcing parts of our code (and there is certainly hunger for it in this post), but I am cognizant that there is a lot that goes into that decision.
> 7. What is your revenue model? If your product essentially wrestles control from site owners and reduces their optionality for revenue, your arbitrage is likely to be equal or less than the sum of site owners' loss (a potentially massive amount to be sure). It's unclear to me how you'd capture this value though, if open source.
The honest answer is TBD. I would push back on your claim that we wrestle control from site owners and reduce their optionality for revenue. While there likely will be users who say "hide this ad" (costing the site revenue), there are also users who say "move this sidebar from left to right" or "I use {x} button all the time but it is hidden three menus in, place it prominently for easy access". I'd argue the latter cases are not negative for site owners; they could even be positive-sum. Maybe we even see a trend that 80% of users make this UX modification on Z site. We could go to Z site and say, "Hey, you could probably make your users happy if you made this change". Maybe they'd even pay us for that insight?
Again, the honest answer is that I'm not certain about the business model. I am a lover of positive sum games. And in the moment, I am building something that I enjoy using and hopefully also provides value to others.
> 8. Interested in the cost and latency. If this essentially requires an LLM call for every website I visit, this will start to add up. Also curious if this means that my cost will scale with the efficiency of the sites I visit (i.e. do my costs scale with the size of the site's content).
As I noted above, this does not require an LLM call for every website you visit. You are correct that that would bankrupt us very quickly! An LLM is only involved when you actively start a generation/update request. There is still a cost and it does scale with the complexity of the site/request, but it is infinitely more feasible than running on every site.
In the future, we may extend functionality so that the core script that is generated can itself dynamically call LLMs on new page loads. That would enable you to do things like "filter political content from my feed", which requires test-time LLM compute to dynamically categorize items on each load (it can't be hard-coded into a once-generated static script). That would likely have to be done locally (e.g. Google actually packages Gemini Nano into the browser) for both cost and latency reasons. We're not there yet, and there is a lot you can do with the extension today, but there are definitely opportunities to build really cool stuff, way beyond Greasemonkey.
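A speculative sketch of that dynamic tier; classifyLocally below is a made-up placeholder for whatever on-device model API ends up being exposed, not a real call:

```ts
// Hypothetical on-device classifier (placeholder, not a real browser API).
declare function classifyLocally(text: string, labels: string[]): Promise<string>;

// Hide feed items the local model tags as political. This has to re-run on
// every load (and on infinite-scroll updates), which is exactly why it can't
// be baked into a once-generated static script.
async function filterFeed(itemSelector: string): Promise<void> {
  for (const item of Array.from(document.querySelectorAll<HTMLElement>(itemSelector))) {
    const label = await classifyLocally(item.textContent ?? "", ["political", "other"]);
    if (label === "political") item.style.display = "none";
  }
}

filterFeed("article"); // selector is illustrative; real feeds vary
```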
Wow, you really put me to work with this comment. Appreciate all the great questions!
I love the idea and the execution. The onboarding experience is great as well. Thanks for sharing. I am curious about SOC 2: how much effort did you put in to acquire it, and what made you decide to pursue it?
> how much effort did you put in to acquire it, and what made you decide to pursue it?
We originally started looking into it when we were in the B2B space. On our end, we already took security pretty seriously so checking all the boxes was low lift.
> We could go to Z site and say, "Hey, you could probably make your users happy if you made this change". Maybe they'd even pay us for that insight?
My honest opinion:
1. No site would pay for that insight
2. Every site should pay for that insight
Part of the problem is that a lot of companies fall into one of two categories:
1. Small companies that don't have the time/energy/inclination to make changes, even if they're simple. Often they're not even the ones who built the website, and they aren't going to want to pay the company that made the site originally to come back and tweak it based on what a small, self-selecting group of users decided to change.
2. Large companies who, even if they did care about what that small, self-selecting group of users wanted to change, have so many layers between A and Z that it's nearly impossible to get anything done without a tangible business need. No manager is going to sign off on developer and engineer time and testing because 40% of 1% of their audience moves the sidebar from one side to the other.
Also:
1. Designers are opinionated and don't want some clanker telling them what they're doing wrong, regardless of the data.
2. Your subset of users may have different goals or values; maybe the users more likely to install this extension and generate tweaks don't want to see recommended articles or video articles or 'you may like...' or whatever, but most of their users do and the change would turn out to be a bad one. Maybe it would reduce accessibility in some way that most users don't care about, etc.
If I had to pick a 'what's the value of all this', I would say that it's less "what users want from this site" and more "what users want from sites in general". For example, if you did the following:
1. Record all the prompts that people create that result in tweaks that people actually use, along with the category of site (banking, blogs, news, shopping, social media, forums); this gives you a general selection of things that people want. Promote these to other users to see how much mass appeal they have.
2. Record all the prompts that people create that result in tweaks that people don't actually use; this gives you a selection of things that people think they want but it turns out they don't.
3. Summarize those changes into reports.
Now you could produce a 'web trend report' where you can say:
1. 80% of users are making changes to reduce clutter on sites
2. 40% of users are disabling or hiding auto-play videos
3. 40% of people in countries which use right-to-left languages swap sidebars from one side to another, even on left-to-right-language websites
4. The top 'changed' sites in your industry are ... and the changes people make are ...
5. The top changes that people make to sites in your industry are ... and users who make those changes have a 40% lower bounce rate / 30% longer time-on-site / etc. than users who don't make those changes.
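Turning the recorded prompts into numbers like those is mostly bookkeeping. A rough sketch, with made-up field names:

```ts
// Hypothetical record: what kind of change was made, on what kind of site,
// and whether the resulting tweak actually stayed enabled.
type TweakRecord = {
  userId: string;
  siteCategory: string;   // "banking", "news", "social media", ...
  changeCategory: string; // "reduce clutter", "disable autoplay", ...
  stillEnabled: boolean;
};

// Share of users who made (and kept) at least one change of a given kind.
function shareOfUsers(records: TweakRecord[], changeCategory: string): number {
  const allUsers = new Set(records.map(r => r.userId));
  const matching = new Set(
    records
      .filter(r => r.changeCategory === changeCategory && r.stillEnabled)
      .map(r => r.userId),
  );
  return allUsers.size ? matching.size / allUsers.size : 0;
}

// e.g. "40% of users are disabling or hiding auto-play videos"
// shareOfUsers(records, "disable autoplay")
```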
On top of that, you could build a model trained on those user prompts that companies could pay for (somehow?) to run their sites through. It would suggest changes they could make to satisfy these apparent user needs or preferences without sacrificing their own goals for the website - e.g. users remove auto-playing videos because they're obnoxious, but the company is trying to promote its video content, so maybe this model could find a middle ground that presents the video in a way that's less obnoxious but still generates engagement.
That's what I think anyway, but I'm not in marketing or whatever.
Seems to me that the obvious business model here is that they will need to have their AI inject their own ads into the DOM. Overall though, this feels like a feature, not a business.
Clearly there’s a tension on this venture-capital-run website between some people using their computer-nerd skills to save money and improve their experience, and other people hustling a business that requires the world to pay them.
> Clearly there’s a tension on this venture-capital-run website
Yeah. If they have a problem with that, they can kill HN. You can't have hackers/smart people in your forum and decide what they will do. Moderation can try to guide it, but there is a limit when you're dealing with smart, polite people.
Very cool.
Cheers