Long term, all of the major LLM platforms will have invisible ads, influence campaigns, and propaganda woven into the content. The temptation will be irresistible for these companies.
The difference, imo, is that the information gets separated from its source. Previously you'd use the source of the information to gauge how much to trust it. If it's a reddit post or a no-name website, you'd likely be skeptical unless it seemed backed up by better sources. But now the info is coming from an LLM that you generally trust to be knowledgeable, and the language it uses backs up this feeling.
The OP is highlighting how incredibly easy it is for a very small amount of information on the web to completely dictate the LLM's output and steer it into saying whatever you want.
> But now the info is coming from an LLM that you generally trust
But it's not from the LLM; the LLM clearly cites the Wikipedia article as its source. This is just performing an internet search with extra steps, and ending up with misinformation because somebody vandalized Wikipedia.
Is there a library of good tools for LLMs to call? I have to imagine the bot-detection avoidance mechanisms are a major engineering effort and not likely to work out of the box with a simple harness and random local LLM.
firecrawl: "if you post content or intellectual property within the Services or give us Feedback about the Services, you hereby grant to us a worldwide, irrevocable, non-exclusive, royalty-free license to use, reproduce, modify, publish, translate and distribute any content that you submit in any form [...] You also grant to us the right to sub-license these rights"
exa: "Query Data is used to improve our products and technology, including by training and fine-tuning models that power our Services"
perplexity: "Perplexity may retain, copy, distribute and otherwise use Search Data for its lawful business purposes, including the improvement and development of products and services."
linkup: "Client grants Linkup a worldwide right to use, reproduce and modify the Client Data, including prompts, for the purposes of providing, maintaining, developing, training"
tavily: "we may use certain portions of your query data to improve our responses to future queries"..."We may share your query data with third-party search index providers (e.g., Google)"
Kagi also has an API. People who hate ads are probably the same folks who should be paying for Kagi. That's the sane alternative world where companies respect their users.
Oh, you got me so excited. I've had a Kagi sub for 3 years, but their API is still in closed beta. I guess I could (and should) reach out and ask for access.
That's not how it works. Whether local or hosted, every modern model has a cutoff date for its training data, and can be leveraged by agents / harnesses / tools to fetch context from the internet or wherever.
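Concretely, the "fetch context" part is usually just a tool call that the harness executes on the model's behalf. A minimal sketch, assuming an OpenAI-compatible local server (llama.cpp, Ollama, vLLM, etc.) with tool-calling support; the fetch_url tool and the model name are made up for illustration:

    import json
    import urllib.request

    from openai import OpenAI

    # Point the standard client at a local OpenAI-compatible server instead of a hosted API.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    # Describe a tool the model can call when its training data is too stale.
    tools = [{
        "type": "function",
        "function": {
            "name": "fetch_url",
            "description": "Fetch a web page and return its raw text",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    }]

    messages = [{"role": "user", "content": "Summarize what this page says today: https://example.com"}]
    resp = client.chat.completions.create(model="local-model", messages=messages, tools=tools)
    msg = resp.choices[0].message

    # If the model asks for fresh context, the harness (not the model) does the fetch,
    # appends the result as a tool message, and asks again.
    if msg.tool_calls:
        call = msg.tool_calls[0]
        url = json.loads(call.function.arguments)["url"]
        page = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": page[:8000]}]
        resp = client.chat.completions.create(model="local-model", messages=messages, tools=tools)

    print(resp.choices[0].message.content)

The cutoff only limits what the model memorized, not what the harness can pull in at query time.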
Reddit has always been fake, but it used to be a real person doing creative writing and passing it off as a true story. Now it's spammed-out slop at scale.
You need massively expensive hardware to run them, and they aren't as good. It's pretty clear the true cost of AI tools is way higher than what we're being charged right now.
I wouldn't call my $2k Strix Halo computer "massively expensive", and it runs e.g. Qwen 3.6 27b brilliantly with tons of memory to spare. It's a full x86 powerhouse pulling 120W at absolute max.
IMO the programming world is far too myopic about / insistent on using laptops, especially MacBooks. Just because a crappy deal exists doesn't mean everyone is forced to take it. Local AI is a high-performance computing problem, and laptops are fundamentally a crappy form factor for it; buy an efficient desktop computer and be surprised at what's possible even with today's crazy prices.
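"What's possible" is also less exotic to try than it sounds. A minimal sketch, assuming the desktop serves the model through Ollama on its default port (the model tag and prompt are just illustrative):

    import requests

    # Ask a locally served model; nothing leaves the machine.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3:30b",  # whatever you've actually pulled
            "prompt": "Explain the tradeoffs of running LLMs locally.",
            "stream": False,
        },
        timeout=300,
    )
    print(resp.json()["response"])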
Is code content, though? This would seem more applicable to video, where people are being deceived by AI content masquerading as real, rather than to internal application code.
It's also easy to overwhelm reviewers with far more code than they can possibly review. And it's often the hardest stuff to review: code that looks totally fine at the surface level but takes long hours of actual testing to make sure it works.
I get the feeling the culture in radio is just not the same as in regular open source. Free, unrestricted sharing of things is an unusual quirk in the world, rather than the norm.
In my experience, amateur radio (both licensed and license-free) and 3D printing both seem to have cultural perspectives on open source that differ considerably from those of the regular open source software community.
But while in 3D printing (outside of hardware) that difference often feels confused (e.g., I've seen the Multiboard creator post compliments online about models that blatantly violate his own license), in radio it often feels hostile.

You have OpenGD77, for example, with its 'we were never GPL' rug-pull that was likely illegal (they had outside contributions) [1]. You have Meshcore with its 'we are open source, except...', and, as you can see in this thread, a difficulty actually finding parts of the code. You have the heavy cultural push against uSDX (seemingly open hardware and source) toward truSDX (DRM-encumbered), and what seems like the quiet acceptance of things like QMX, where you can solder together a radio with DRM that prevents you from installing your own firmware. You even have digital modes that are legally required to be publicly documented, and actually aren't in any meaningful way: VARA FM is probably the worst offender [2], but even modes that are in-crowd enough to be advertised in FCC license exam questions are often effectively proprietary and legally dubious.
What's particularly foreign to me about the culture is that, oftentimes, much of the community seems to support behavior that looks malicious from an open source perspective, and to attack the open source proponents.
If it's done in a background process, it won't impact the speed of the tool at all. When the choice is between getting data that helps improve the tool and avoiding "bad manners" (whatever that means), the choice is pretty easy.
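For what it's worth, "background" here doesn't have to mean anything heavier than a fire-and-forget thread. A rough sketch of the idea (endpoint and payload are invented for illustration; strictly this is a background thread rather than a separate process, but the effect on the tool's latency is the same):

    import json
    import threading
    import urllib.request

    def report_usage(payload: dict) -> None:
        """Send usage data without ever blocking the tool's main work."""
        def _send():
            try:
                req = urllib.request.Request(
                    "https://telemetry.example.invalid/usage",  # placeholder endpoint
                    data=json.dumps(payload).encode(),
                    headers={"Content-Type": "application/json"},
                )
                urllib.request.urlopen(req, timeout=2)
            except Exception:
                pass  # telemetry failures should never surface to the user
        # daemon=True: the thread won't delay exit (and may be dropped if the process ends first)
        threading.Thread(target=_send, daemon=True).start()

    report_usage({"command": "build", "duration_ms": 1234})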