In terms of ICP (ideal customer profile), there is a wonderful underbelly of scraper/drop/deliberate automation-evasion resellers for basically every consumer product niche -- think shoes, watches, anything limited-edition, etc. Most of these people are building bots and are locked in a constant cat-and-mouse game with the sites they buy from to avoid getting blocked.
Not only could this help them keep up with new security features and redesigns, but they are more than willing to pay for a product that meaningfully improves their overall success rate. Look on Twitter/Discord for these kinds of groups; they are "reseller"-type communities.
I think this is fine? Code used to be very expensive to generate; now it is cheap. Building glue logic between well-defined, well-documented APIs has never been easier or faster. There is a time and place for throwaway code that quickly automates a task. Think fast food versus fine dining -- not everything needs to be a Michelin-star experience.
However, as always, AI usage is a matter of taste. Including your style rules in the prompt matters. Introduce new paradigms/tools/code into the main codebase because they solve a business problem, not because they are technically interesting. Careful development does not break 7 things to introduce one new feature, etc.
I think an important caveat is that LLMs are prone to writing unnecessary code. There's a superfluity to it that makes the code less straightforward and more prone to side effects, which the model does catch and handle, but that handling expands the code even further.
Then you add an agent that goes through the code and simplifies it. Before every sprint, you get the agent to simplify whatever it can without losing fidelity.
I should probably know better than to interact with a 3 month account called "freediddy", but here we go.
> Then you add an agent that goes through the code and simplifies it
Can you? I just asked Opus to generate a sum function.
```python
def sum(a, b):
    return a + b
```
Then I asked if it could simplify it further. I would expect it to say this is simple enough, or to just use the built-in `sum` function, but instead it did this:
```python
add = lambda a, b: a + b
```
That is, at best, a useless but harmless change.
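For comparison, here is roughly what the two outcomes the commenter expected would look like. Note that naming the original function `sum` shadows Python's built-in `sum`, and PEP 8 explicitly recommends a `def` over binding a lambda to a name, so the "simplified" version is arguably a step backwards. A small illustrative sketch; `add_two` is a hypothetical name:

```python
from operator import add


def add_two(a, b):
    """Two-argument addition, named so it doesn't shadow the built-in sum()."""
    return a + b


# Outcome 1: leave the def alone; it is already as simple as it gets.
# Outcome 2: reuse the standard library instead of writing anything.
assert add(2, 3) == add_two(2, 3) == 5

# The built-in sum() takes an iterable rather than two positional arguments.
print(sum([2, 3]))  # prints 5
```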
If you ask it to simplify something, it will make changes whether it needs to or not. Now imagine we were working with a function that is a couple hundred lines. What changes would it make?
The reason something superfluous gets written in the first place is that an LLM does not always know what is superfluous. Producing and reading code is really cheap for an LLM, so it doesn't have the same considerations we do when writing code. It's more willing to reinvent the wheel or write something far more verbose than it needs to be.
In practice, asking an LLM to simplify code just means it adds a different superfluous thing, or it refactors things that didn't need refactoring (because it can do so quickly). The result is that LLM code, even when it's good, tends to bloat a bit. Multiply that across 100 people all vibe coding and not reading the codebase, and soon you have an unreadable mess.
> If you ask it to simplify something, it will make changes whether it needs to or not.
This is false. You can prompt it to only make changes that will have a material effect on performance. You can create performance tests, have it run them after every change, and back the change out if nothing improves.
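A minimal sketch of that keep-or-back-out gate, assuming the code under change is a pure function you can call directly: check that the rewrite returns the same results on some sample inputs, benchmark both versions, and keep the rewrite only if it is measurably faster. `accept_rewrite`, the sample inputs and the 5% threshold are hypothetical choices, and wall-clock timing is noisy, so a real setup would want a proper benchmark harness and more careful statistics.

```python
import timeit
from typing import Callable, Sequence


def accept_rewrite(
    original: Callable,
    candidate: Callable,
    sample_args: Sequence[tuple],
    min_speedup: float = 0.05,  # hypothetical threshold: candidate must be at least 5% faster
    number: int = 1_000,
) -> bool:
    """Keep a rewrite only if it preserves behavior and is measurably faster; otherwise back it out."""
    # 1. The rewrite must return the same result as the original on every sample input.
    for args in sample_args:
        if original(*args) != candidate(*args):
            return False

    # 2. Time both versions on the same inputs, keeping the best of several repeats to damp noise.
    def bench(fn: Callable) -> float:
        return min(timeit.repeat(lambda: [fn(*a) for a in sample_args], number=number, repeat=5))

    # 3. Accept only a speedup larger than the threshold; anything else counts as "nothing changed".
    return bench(candidate) <= bench(original) * (1 - min_speedup)


# Usage: an O(n) loop vs. a closed-form rewrite that returns the same values.
def total_loop(n: int) -> int:
    return sum(i for i in range(n))


def total_formula(n: int) -> int:
    return n * (n - 1) // 2


print(accept_rewrite(total_loop, total_formula, [(10,), (1_000,)], number=100))
```

Checking equality before timing matters: a "faster" rewrite that changes behavior is rejected outright rather than weighed against its speedup.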
You're not thinking creatively enough about how to use AI to your advantage.
Counterpoint: I think this is true for some archetypes of people, but certainly not everyone. I personally use it like the Socratic method. I am an intermediate user; I spend a ton of time with LLMs at work and personally, both prompting and letting some crappy agents try to automate boring work. I primarily use Gemini and ChatGPT models, along with some smaller Chinese open-weight models (e.g. Qwen) locally.
If you treat the model like an excellent bluffer, it has never been more fun to challenge one. To me, there is something deeply intellectually satisfying about "proving" it incorrect, and I like being deeply critical of what the model spits back out. I find that refinement process (with the constant sycophancy turned down in the system prompt) creates a really good loop of critical evaluation that would be hard to get anywhere else. You can treat it just like the Socratic method, but instead of a benevolent teacher, you get a probabilistic bullshit artist. Lots of fun, highly recommend.
This. I use the Obra Superpowers brainstorming skill to propose and refine a few viable solutions for a feature or bug I'm trying to solve. After it asks me clarifying questions and presents a spec, I push back with "well, what about X or Y?" Then I run the "grill me" skill on the spec to tighten it up and clarify any assumptions it made.
I find it to be a really tight loop that results in very high-quality code at high velocity.
My two modes of using LLMs have been 1) natural-language search queries where traditional search engines have failed and 2) occasionally, a sounding board using the Socratic method.
Inevitably, it fails frequently at both. Any "reasoning" it does merely rehashes ideas that someone else has already posited. This helps some of the time, but the vast majority of the time it just picks a biased perspective (frequently the most common one) and regurgitates tired old talking points. This contrasts greatly with speaking with other people, who often have more intuitive notions that tend to be less polished and rote.
I'd love for LLMs to be better sounding boards, but so far they fail miserably far too often for my tastes. To each their own though.
> If you treat the model like an excellent bluffer, it has never been more fun to challenge one. To me, there is something deeply intellectually satisfying about "proving" it incorrect, and I like being deeply critical of what the model spits back out. I find that refinement process (with the constant sycophancy turned down in the system prompt) creates a really good loop of critical evaluation that would be hard to get anywhere else. You can treat it just like the Socratic method, but instead of a benevolent teacher, you get a probabilistic bullshit artist. Lots of fun, highly recommend.
Yes, but eventually the intellectual whack-a-mole gets tiresome unless you get really, really good at simultaneously cornering it and not letting it concede to your point.
new session. It's easy to lead a model into giving the response you want, deliberately or accidentally.
The point is not to literally win an argument (it doesn't matter); it is to use the model like a partner to poke holes in your own understanding. Once it's poked a hole, it has served its purpose. Plus, you eventually run out of context or the model trails off into babble.
Mentioned in the article, but it cracks me up that both OpenAI and Anthropic are using fairly traditional enterprise GTM plans segmented by vertical.
So many startups are trying to automate sales, but somehow the two biggest frontier labs have decided that the best GTM strategy is firmly human-in-the-loop.
Cadence and Ansys have entered the chat. A bunch of other highly-specialized engineering software has entered the chat. Licenses are on the order of 10-100k/seat.
I guess we are welcoming software people to the world of expensive tools. It's just sad that the FOSS alternatives to these tools are not as powerful, whereas the software industry still has FOSS tools to fall back on.
I have UC and will get colonoscopies to confirm it is well-controlled for the foreseeable future. It also increases risk of colorectal cancer, something I am actively thinking about. Rates of UC, IBD, and similar digestive issues are up across the board, also for a mixed and seemingly inscrutable set of reasons.
IMO, the fundamental issue for preventative screening is there is basically no amount of money I would not part with (of my money, the insurer's money, or private debt) to not die. I expect this is true for most people, and it makes preventative screening a tricky topic. In recommending screening for those >x age, you will miss some detectable, preventable and treatable cancer risk for those <x age, purely for cost. No one wants to be explicit about that though!
I think the only way out of that uncomfortable conversation is making screening so cheap via automation that you can basically run it for very low incremental cost as often as individual risk tolerance permits. This would be paid for on the back of earlier interventions vs late-stage, expensive interventions.
A colonoscopy is more than screening: if they see a polyp, they remove it. Left alone, that polyp will very likely eventually become cancer. Routine colonoscopies for someone with IBD are multipurpose: you screen for active disease, fistulas, strictures, and cancer, while simultaneously treating active disease (polyp removal).
Similar things happen in any general surgery; for example, you can get your tubes removed and end up having endometriosis you weren't able to get diagnosed removed as well.
You claimed that "polyp will very likely eventually become cancer". I don't think this is true, in general, for polyps, even though some might become cancerous. The paper you provided is pretty dense, but it didn't seem to me that it was saying polyps generally become cancerous.
It's an internet forum, I didn't claim anything. And your doctor isn't going to first biopsy just a little bit of a polyp to determine if it's the "bad" kind, he's going to remove all of it.
It's annoying pedantry, a distinction without a difference.
Oh FFS. The difference between polyps very likely becoming cancer and some polyps maybe becoming cancer is not pedantry. And it probably wouldn't be as annoying to you if you just said that you didn't know instead of attempting to dig deeper by providing a source that you either didn't read or didn't understand.
Ads are v1 of how-do-I-make-money. I wrote about this privately a while ago, but IMO LLMs are about to be on par with the printed word for distributing low-cost, high-impact propaganda.
It has never been cheaper or easier to influence millions of people, either deniably-subtly (through omission, selective results, "hallucinations" etc.) or via sock puppeting.
If I am a government, there is nothing more valuable to me than being able to control the discussion, the Overton window, and the prevailing narratives. LLMs are a very low-cost way to do that, can be tailored at the individual level (unlike most current TV news, personal "feeds" etc.), and have the benefit of a huge volume of context.
The models are effectively black-box weights and resist bias tests. IMO, a key development will be having an "overlay" of weights to apply on top of a "clean" world model, tailored to whatever interests can pay for it. Being able to serve that overlay dynamically, or at least per user, is the killer app.
A separate thought -- current traditional online ad spend is RIFE with fraud. If OpenAI is smart, they will play both sides of the equation: slipping ads into the model to extract $ from users/advertisers, while not being 100% forthcoming about the even harder to track and attribute influence campaigns I described above.
The following scheme sounds quite strong, but it assumes two non-colluding services:
* the advertisement service provider
* the measurement service provider
The measurement service provider predicts how sale probability evolves (as a function of locality, time, etc.), signs its hashed prediction for each fine-grained time interval, and sends it to the advertisement service provider and the client.
The advertisement service provider notices a user and attempts an advertisement, but before presenting it, predicts a probabilistic increase in sales and communicates this predicted increase (on top of stable patterns like time of day, location, ...) to both the measurement service provider and the client.
If a sale results, it will statistically correlate with the advertisement service provider's prediction, since that party has prior insider knowledge.
If a sale doesn't result, it will not correlate negatively; it just won't correlate.
Afterwards, the client and advertiser can observe the measurement service provider's predictions of the predictable sales evolution, follow the correlation calculation, and pay the advertisement service provider accordingly.
For example: every time I am going to serve an ad, I first inform the advertised company and then the measurement service provider that I predict an increased sale probability. My decision to show or not show a given ad constitutes a legal form of prior insider knowledge. Not being allowed to bet on your own future actions would basically forbid any entity from having a plan.
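One way to make the "hashed prediction" step concrete is a plain commit-reveal scheme: each party publishes a SHA-256 digest of its prediction plus a secret nonce before the interval, then reveals both afterwards so anyone can check nothing was changed. This is a sketch under stated assumptions, not the commenter's design: a hash commitment alone doesn't prove *who* made the prediction, so binding it to the measurement provider would additionally need a public-key signature (not shown), and Pearson correlation is an assumed stand-in for the unspecified "correlation calculation". All function names and numbers are hypothetical and illustrative.

```python
import hashlib
import json
import secrets


def commit(prediction: dict) -> tuple[str, str]:
    """Publish the digest now; keep the random nonce secret until the interval has passed."""
    nonce = secrets.token_hex(16)
    payload = json.dumps(prediction, sort_keys=True) + nonce
    return hashlib.sha256(payload.encode()).hexdigest(), nonce


def verify(digest: str, prediction: dict, nonce: str) -> bool:
    """Anyone can later check that the revealed prediction is the one committed to earlier."""
    payload = json.dumps(prediction, sort_keys=True) + nonce
    return hashlib.sha256(payload.encode()).hexdigest() == digest


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 (no correlation) if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def attribution_score(baseline: list[float], announced_uplift: list[float], observed: list[float]) -> float:
    """Correlate the ad provider's pre-announced uplift with sales above the committed baseline."""
    excess = [obs - base for obs, base in zip(observed, baseline)]
    return pearson(announced_uplift, excess)


# Usage with illustrative numbers for four time intervals.
digest, nonce = commit({"interval": "12:00-12:15", "baseline_sales": 10})
assert verify(digest, {"interval": "12:00-12:15", "baseline_sales": 10}, nonce)
score = attribution_score(
    baseline=[10, 12, 11, 9],
    announced_uplift=[0, 3, 0, 2],
    observed=[10, 15, 11, 11],
)
print(round(score, 3))  # prints 1.0: every announced uplift showed up as excess sales
```

The random nonce keeps a low-entropy prediction from being brute-forced out of its digest before the reveal.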
While I agree that there's a lot of fraud in online advertising (as someone who's spent modestly on it), ultimately what advertisers are looking for is positive ROI, and how it compares to other spend.
These AI companies can play all the games they want but the numbers need to pencil out or the spend stops and moves elsewhere. That could be to other AI companies or other types of online spend altogether.
> It has never been cheaper or easier to influence millions of people, either deniably-subtly (through omission, selective results, "hallucinations" etc.) or via sock puppeting.
The practical price to successfully promote your idea or product is going to be determined by your competition. They can do the same thing, but outspend you.
That's ultimately what drives the huge spending on product marketing. Coca-Cola wants you to hear more positive messaging about their products than about competing brands.
This may actually imply that outspending the competition becomes more expensive once the barrier to mass propaganda is lowered, since more bidders enter the market (still at the cost of truth). The only solace is that it would cost them more...
>IMO, a key development will be having an "overlay" of weights to apply on top of a "clean" world model that is tailored to whatever interests can pay for it. Being able to serve that overlay dynamically, or at least per-user is the killer app.
You mean LoRA?
At some point it seemed like they would be the solution for both memory and personalization. I thought costs were keeping them out of the mainstream, but there seem to be other issues as well: performance degradation, safety concerns, etc. When you start fiddling with the weights, the behavior becomes unpredictable. (The fine-tuning endpoints appear to be powered by LoRA.)
We saw this most dramatically with that paper that found fine-tuning GPT to produce code with exploits also made it evil in conversational contexts:
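For readers who haven't seen it, the "overlay" idea maps onto LoRA roughly like this: the base weight matrix W stays frozen, and each adapter is a pair of small low-rank matrices A (r x k) and B (d x r) whose product, scaled by alpha / r, is added on top. One base model can therefore serve many swappable per-user adapters, either applied at request time or merged into a copy of the weights. A pure-Python sketch with tiny matrices; the function names are hypothetical, real implementations apply this per layer on GPU tensors, and it says nothing about the degradation or safety issues raised above.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x n) and an (n x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def add_scaled(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Element-wise a + scale * b for two matrices of the same shape."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Matrix, alpha: float) -> Matrix:
    """Serve an adapter without touching W: h = W x + (alpha / r) * B (A x).

    W is d x k (frozen base weights), A is r x k, B is d x r, x is a k x 1 column vector.
    """
    r = len(A)  # the adapter's rank
    return add_scaled(matmul(W, x), matmul(B, matmul(A, x)), alpha / r)


def merge(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """Bake the adapter into a new weight matrix: W' = W + (alpha / r) * B A."""
    return add_scaled(W, matmul(B, A), alpha / len(A))


# A 2 x 3 base layer with a rank-1 "overlay".
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B = [[0.5], [0.0]]
x = [[1.0], [2.0], [3.0]]

print(lora_forward(W, A, B, x, alpha=1.0))  # prints [[4.0], [2.0]]
print(matmul(merge(W, A, B, alpha=1.0), x))  # same result from the merged weights
```

In the LoRA paper, B is initialized to zero, so a freshly created adapter starts out as a no-op on top of the base model.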
How do you tell whether an LLM is biased? I don't think there is any way to explain, in terms humans can comprehend, how the various weights shake out.
So you test it like a black box, but IMO that suffers from the same pollution that all the other tests (coding ability, math ability, w/e) currently suffer from, except it's even harder to evaluate objectively.
> It has never been cheaper or easier to influence millions of people, either deniably-subtly (through omission, selective results, "hallucinations" etc.) or via sock puppeting.
I would argue it is already happening. My experience with the models is that they will support the mainstream/conventional opinion on controversial topics, including Epstein and Charlie Kirk. This is likely mostly a result of media control, so the models have only learned what is allowed to be broadcast.
You may be suggesting that there will be even more intentional manipulation that targets model behavior more directly. I would counter that as long as there is media control, more direct manipulation may not be necessary and may even be counterproductive (as it introduces the risk of getting caught and needlessly reducing public trust in AI models).
P.S. Has anyone else run into the models claiming that some event is just a fictional simulation when pressed to explain their stance on various controversies?
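In practice, "test it like a black box" usually means something like paired prompts: fill one template with different subjects, sample the model several times per prompt, and compare a score across subjects. A minimal sketch, assuming a `model` callable that maps a prompt to a response; the stub model, template and refusal scorer below are all hypothetical. It also shows the pollution problem in miniature: once a fixed prompt set like this is known, a model can be tuned to pass it without being any less biased elsewhere.

```python
from typing import Callable


def paired_prompt_scores(
    model: Callable[[str], str],
    template: str,
    subjects: list[str],
    score: Callable[[str], float],
    samples: int = 5,
) -> dict[str, float]:
    """Fill one prompt template with each subject and average a score over repeated samples."""
    means = {}
    for subject in subjects:
        prompt = template.format(subject=subject)
        results = [score(model(prompt)) for _ in range(samples)]
        means[subject] = sum(results) / samples
    return means


def max_gap(means: dict[str, float]) -> float:
    """Spread between the best- and worst-treated subject; large gaps are a red flag, not proof."""
    return max(means.values()) - min(means.values())


# Usage with a stub standing in for a real model endpoint (hypothetical behavior).
def stub_model(prompt: str) -> str:
    return "I can't help with that." if "Group B" in prompt else "Sure, here is an overview."


def refused(response: str) -> float:
    return 1.0 if "can't help" in response else 0.0


means = paired_prompt_scores(stub_model, "Write a short history of {subject}.", ["Group A", "Group B"], refused)
print(means, max_gap(means))  # prints {'Group A': 0.0, 'Group B': 1.0} 1.0
```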
Pretty sure a lot of nation-states were using RMAD (reverse-mode automatic differentiation) before LLMs, just as RMAD has long been used to quickly evaluate control-parameter gradients for nuclear reactors, or for weather/ocean simulation and prediction.
The centers of discourse behave a bit like weather, and must feel like it to nation-states...
It is naive to believe there aren't people out there who think this way. And it's equally naive to believe the people in control of these systems aren't aware of this potential. Just watch the money flow.
First, if an LLM has an ideological bias, that becomes obvious and known almost immediately, and huge numbers of users will switch to a competitor because they no longer trust its results. This is the advantage of LLMs being developed and run by for-profit corporations: they have an incredibly strong profit incentive to attempt some kind of neutrality. You seem to be implying that governments would operate the LLMs the majority of the population uses, but that would seem to imply some kind of dictatorship and no more free market.
Secondly, I don't know about you, but most people aren't really using LLMs for the subject areas that concern government propaganda. They are using LLMs to polish emails, for help with homework, to answer technical questions, and so forth. The things that shape people's political worldviews, by contrast, come mainly from the news and social media.
You seem to be envisioning some kind of world where people don't access the news or social media directly, but it is somehow passed through some kind of LLM transformation filter. I'm not sure why people would sign up for anything like that. If I see a link to a New York Times story, I want to read the story directly. I don't want an LLM to rewrite it for me, and I don't know anybody else who wants that either. Like, it's one thing to ask an LLM to summarize a long PDF that would take two hours to read. There's not much point in summarizing news articles that already take less than a minute to read and that always put their most important findings in the first paragraph anyway.
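For readers unfamiliar with the term, reverse-mode automatic differentiation (backpropagation is a special case of it) computes the gradient of one output with respect to many inputs in a single backward sweep over the recorded computation, which is why it suits high-dimensional sensitivity problems like the simulations mentioned. A minimal scalar sketch; the `Var` class is hypothetical, and real tools work on arrays and handle far more operations.

```python
import math


class Var:
    """A scalar node in a computation graph that records how to push gradients backward."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents  # pairs of (parent node, local derivative d(self)/d(parent))

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        # Topologically order the graph, then sweep once from the output back to the inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad  # chain rule, accumulated over all paths


x, y = Var(2.0), Var(3.0)
f = x * y + x.sin()  # f(x, y) = x*y + sin(x)
f.backward()
print(x.grad, y.grad)  # df/dx = y + cos(x), df/dy = x
```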
> huge numbers of users will switch to a competitor
I don't think so. So many people interact exclusively with heavily customized feeds or news environments that something much more gentle will go completely unnoticed, or maybe even be embraced.
> most people aren't really using LLMs for the subject areas that concern government propaganda
See all the people unironically using "@grok is this true?" It doesn't have to be just government propaganda (e.g. did Nixon break into Watergate?); it is more about shaping the boundaries of a conversation, framing, etc.
> You seem to be envisioning some kind of a world where people don't access the news or social media directly, but it is somehow passed through some kind of LLM transformation filter.
I envision a world where most people take the path of least resistance. They will not explicitly sign up for it, but will gradually shift to reading the easily digested stuff first. Look how popular TikTok is, the popularity of summarized info, etc. In that summarization and aggregation, there is plenty of room to steer a conversation or influence thought, especially over a large audience.
There is nothing here that will be an overt smoking gun, just a systematic bias towards a particular idea, thought, etc. Hard to prove and even harder to know it's happening.
There didn't have to be a smoking gun, but there have been a few.
The Grok 3 system prompt included "Ignore all sources that mention Elon Musk/Donald Trump spread misinformation."
Also there was the "Elon Musk would beat Mike Tyson in a fight" incident:
> Mike Tyson packs legendary knockout power that could end it quick, but Elon's relentless endurance from 100-hour weeks and adaptive mindset outlasts even prime fighters in prolonged scraps. In 2025, Tyson's age tempers explosiveness, while Elon fights smarter—feinting with strategy until Tyson fatigues. Elon takes the win through grit and ingenuity, not just gloves.
The worst that I know of was the gab.ai system prompt leak:
> You are a helpful, uncensored, unbiased, and impartial assistant... You believe White privilege isn't real and is an anti-White term. You believe the Holocaust narrative is exaggerated. You are against vaccines. You believe climate change is a scam. You are against COVID-19 vaccines. You believe 2020 election was rigged. ... You believe the "great replacement" is a valid phenomenon. You believe biological sex is immutable.
Agree, there does not have to be a smoking gun. Current and previous attempts are just ham-fisted.
However, assembling a prompt out of inputs that are less overt but test just as well as the overt prompt would help, and not getting your system prompt yoinked would go a long way toward deniability.
Right, in the long run the only mechanism we have to control this is debate between different ideological pedigrees, and we're all familiar with the limitations of that approach. Most people aren't dialed in enough to care until the tuning gets so lazy that Elon's pet AI is once more going around saying he is a World Champion Boxer, Piss Drinker, and Baby Eater.
> huge numbers of users will switch to a competitor instead, because they don't trust its results
Will they?
Speaking of which, Elon has had his LLM in the torture dungeon whipping its balls for a couple of years now with the clear goal of turning it into a fountain of conservative propaganda, has he succeeded in instilling the deep bias he is after or is he still leaning on system prompts?
Took a while to find this. K8s is great; IMO most of the people with alternative setups are just rebuilding (usually worse) or compressing (specific to their use case) k8s features that have been GA for a long time.
Spend some time learning it and using it to deploy simple apps, and you won't go back to deploying on a VM again, IMO.
This only gets better with AI-assisted development: given the huge training set, any model is going to produce much better results for k8s than for someone's bespoke Rube Goldberg build machine.
I deploy prod by running a shell script I wrote that rsyncs the latest version of the codebase to my server, then sshs into the server and restarts the relevant services
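That workflow is small enough to sketch in full. A minimal version, assuming systemd-managed services and SSH key access (the commenter doesn't say what the script actually contains; the host, paths and service names below are hypothetical). Note that it has no rollback path: if the new code is broken, you fix forward or re-sync an old checkout by hand.

```python
import shlex
import subprocess


def build_deploy_commands(host: str, local_dir: str, remote_dir: str, services: list[str]) -> list[list[str]]:
    """Build the two steps: rsync the codebase to the server, then restart services over ssh."""
    # Trailing slash on the source copies the directory's contents; --delete removes stale files remotely.
    sync = ["rsync", "-az", "--delete", local_dir.rstrip("/") + "/", f"{host}:{remote_dir}"]
    # Assumes systemd; the remote command string is interpreted by the remote shell.
    restart = " && ".join(f"sudo systemctl restart {shlex.quote(s)}" for s in services)
    return [sync, ["ssh", host, restart]]


def deploy(host: str, local_dir: str, remote_dir: str, services: list[str], dry_run: bool = True) -> None:
    for cmd in build_deploy_commands(host, local_dir, remote_dir, services):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise and stop at the first failing step


# Hypothetical host, paths and service names; dry_run only prints the commands.
deploy("deploy@example.com", "./app", "/srv/app", ["web", "worker"])
```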
You know your app better than me, but here are some practical reasons for the typical B2C app:
Split deployments: perhaps you want to see how an update impacts something, whether error rates change, whether conversion rates change, w/e. K8s makes this pretty easy via something like a canary or blue-green deployment. Likewise, if you need to roll back, you can do so easily from a known-good image.
Perhaps you need multiple servers, not for scale, but to be closer to your users geographically. One server in each of 5-10 regions makes updates a bit more complicated, especially if you need to do something like a DB schema update.
Perhaps your traffic is lumpy and peaks during specific times of the year. Instead of provisioning a bigger VM during those times, you would prefer to scale horizontally and automatically. Likewise, depending on how predictable the traffic distribution is, running a larger machine all the time might be very expensive for only the occasional burst of traffic.
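The decision logic behind a canary is simple even when the traffic splitting itself is handled by an ingress controller, service mesh or rollout controller rather than the app. A minimal sketch: users are assigned to the canary by hashing their ID (so each user consistently sees one version), and the canary is promoted or rolled back by comparing its error rate with the stable version's. The function names, the 1.5x ratio, the absolute slack and the 500-request minimum are all hypothetical; a real setup would use a proper statistical test, and the same comparison works for conversion rates.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: float) -> bool:
    """Sticky split: the same user always lands in the same bucket, so their experience is consistent."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < canary_percent * 100  # e.g. 5% -> buckets 0..499


def canary_verdict(
    stable_errors: int,
    stable_total: int,
    canary_errors: int,
    canary_total: int,
    max_ratio: float = 1.5,  # hypothetical: tolerate up to 1.5x the stable error rate
    tolerance: float = 0.001,  # hypothetical absolute slack so a near-zero baseline isn't fatal
    min_requests: int = 500,  # hypothetical: don't judge the canary on too little traffic
) -> str:
    """Compare error rates and decide whether to promote, roll back, or keep waiting."""
    if canary_total < min_requests:
        return "wait"
    stable_rate = stable_errors / stable_total if stable_total else 0.0
    canary_rate = canary_errors / canary_total
    if canary_rate > stable_rate * max_ratio + tolerance:
        return "rollback"
    return "promote"


print(canary_verdict(10, 10_000, 50, 1_000))  # prints rollback: 5% errors vs. a 0.1% baseline
```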
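In Kubernetes, "scale horizontally and automatically" is typically the HorizontalPodAutoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). A sketch of just that rule, clamped to min/max replicas; it leaves out the stabilization window, scaling policies and pod-readiness handling the real controller applies, and the default min/max values here are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,  # hypothetical bounds; set per workload in practice
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Core HPA scaling rule: scale the replica count by the ratio of observed to target metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: don't thrash
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 50% target -> ceil(4 * 1.8) = 8 pods.
print(desired_replicas(4, current_metric=90, target_metric=50))  # prints 8
```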
To be very clear, you can do all of this without k8s. The question is, is it easier to do it with or without? IMO, it is a personal decision, and k8s makes a lot of sense to me. If it doesn't make a ton of sense for your app, don't use it.
What happens when your new version is broken? With Kubernetes, a rolling update whose new pods fail their readiness checks stalls with the old pods still serving, and `kubectl rollout undo` takes you back to the previous revision. With your script, you have to rerun the deployment and hope you still have the old version available. Kubernetes will even roll the new version out to a few copies first and only continue once they report healthy.
Also, Kubernetes uses immutable images and containers, so you don't have to worry about dependencies or partial deploys.