SchemaLoad's comments | Hacker News

Long term all of the major LLM platforms will have invisible ads, influences, and propaganda woven into the content. The temptation will be irresistible for these companies.

The difference imo is that the information is removed from its source. Previously you'd use the source of the information to gauge how much you trust it. If it's a reddit post or a no-name website, you'd likely be skeptical if it doesn't seem backed up by better sources. But now the info is coming from an LLM that you generally trust to be knowledgeable, and the language it uses backs up this feeling.

The OP is highlighting how incredibly easy it is for a very small amount of information on the web to completely dictate the LLM's output, steering it into saying whatever you want.


> But now the info is coming from an LLM that you generally trust

But it's not from the LLM; the LLM clearly cites the Wikipedia article as its source. This is just performing an internet search with extra steps, and ending up with misinformation because somebody vandalized Wikipedia.


The hosted ones still have the advantage of being able to search the internet for live info rather than being limited to a knowledge cutoff date.

I’m not sure why a model needs to be hosted in order to make network calls?

Is there a library of good tools for LLMs to call? I have to imagine the bot-detection avoidance mechanisms are a major engineering effort and not likely to work out of the box with a simple harness and random local LLM.

Even the hosted ones are blocked from searching certain sites; for example, Claude is banned from searching Reddit:

`Error: "The following domains are not accessible to our user agent: ['reddit.com']."`


Tavily, Exa, Firecrawl, Perplexity, and Linkup are all tools for agents to search the web.

I’ve been building a harness for the past few months, and it supports them all out of the box with an API key.
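For illustration, such a harness typically exposes one search tool to the model and routes calls to whichever provider is configured. A minimal sketch in the OpenAI-style function-calling schema (the `SEARCH_BACKENDS` dispatch and the `tavily_search` stub below are hypothetical placeholders, not any vendor's real client):

```python
def tavily_search(query: str) -> list[dict]:
    # Placeholder: a real implementation would call the provider's HTTP API
    # with your API key. Swap in Exa, Linkup, etc. with the same signature.
    return [{"title": "stub result", "url": "https://example.com", "snippet": query}]

SEARCH_BACKENDS = {"tavily": tavily_search}

# The one tool definition the LLM actually sees, regardless of backend.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the live web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def dispatch_tool_call(name: str, args: dict, backend: str = "tavily") -> list[dict]:
    """Route a model-emitted tool call to the configured search backend."""
    if name != "web_search":
        raise ValueError(f"unknown tool: {name}")
    return SEARCH_BACKENDS[backend](args["query"])
```

Keeping the tool schema provider-agnostic is what makes "supports them all out of the box" cheap: only the backend function changes per provider.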


be warned though:

firecrawl: "if you post content or intellectual property within the Services or give us Feedback about the Services, you hereby grant to us a worldwide, irrevocable, non-exclusive, royalty-free license to use, reproduce, modify, publish, translate and distribute any content that you submit in any form [...] You also grant to us the right to sub-license these rights"

exa: "Query Data is used to improve our products and technology, including by training and fine-tuning models that power our Services"

perplexity: "Perplexity may retain, copy, distribute and otherwise use Search Data for its lawful business purposes, including the improvement and development of products and services."

linkup: "Client grants Linkup a worldwide right to use, reproduce and modify the Client Data, including prompts, for the purposes of providing, maintaining, developing, training"

tavily: "we may use certain portions of your query data to improve our responses to future queries"..."We may share your query data with third-party search index providers (e.g., Google)"


Kagi also has an API. People who hate ads are probably the same folks who should be paying for Kagi. That's the sane alternative world where companies respect their users.

Oh, you got me so excited. I've had a Kagi sub for 3 years, but their API is still in closed beta. I guess I could (and should) reach out and ask for access.

If your volume is low enough, it should be pretty fine. It can just piggyback on your personal browser cookies for Cloudflare.
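A minimal sketch of that piggybacking, assuming you've copied the `cf_clearance` cookie value out of your browser's dev tools (the cookie value and User-Agent below are placeholders; Cloudflare ties the clearance cookie to the User-Agent of the browser that was issued it, so they must match):

```python
import urllib.request

# Hypothetical values: copy these from your own logged-in browser session.
CF_CLEARANCE = "paste-cookie-value-here"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ..."  # must match the issuing browser

def kagi_request(url: str) -> urllib.request.Request:
    """Build a request that reuses the browser's Cloudflare clearance cookie."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": BROWSER_UA,
            "Cookie": f"cf_clearance={CF_CLEARANCE}",
        },
    )
```

At personal-use volumes this tends to hold; hammer it and the clearance gets invalidated and you're back to the challenge page.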

That's not how it works. Whether local or hosted, every modern model has a cutoff date for its training data, and can be leveraged by agents / harnesses / tools to fetch context from the internet or wherever.

Local ones that support tool use can do the same

You can do that locally too!
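The point the last few comments make can be sketched concretely: a model with tool-use support, local or hosted, gets past its training cutoff via a harness loop that executes tool calls and feeds results back. Below, `local_model` is a stub standing in for any local chat endpoint with tool support (e.g. an Ollama or llama.cpp server), and `fake_fetch` is a canned stand-in for a real HTTP fetch:

```python
def fake_fetch(url: str) -> str:
    # Stubbed to stay offline; a real tool would do an HTTP GET here.
    return f"<html>live content from {url}</html>"

def local_model(messages: list[dict]) -> dict:
    """Stub standing in for a local chat model with tool-use support."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant",
                "content": f"Summary of fetched page: {messages[-1]['content'][:40]}..."}
    return {"role": "assistant",
            "tool_call": {"name": "fetch_url",
                          "arguments": {"url": "https://example.com"}}}

def agent_loop(user_prompt: str, model, tools: dict) -> str:
    """Generic harness loop: run tool calls until the model answers in prose."""
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})
```

Nothing in the loop cares whether `model` is a hosted API or a local process, which is the commenters' point.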

Reddit has always been fake, but it used to be a real person performing creative writing pretending to be a true story. Now it's spammed out slop at scale.

You need massively expensive hardware to run them, and they aren't as good. It's pretty clear the base price of AI tools is way higher than we are being charged right now.

I wouldn't call my $2k Strix Halo computer "massively expensive", and it runs e.g. Qwen 3.6 27b brilliantly, with tons of memory to spare and is a full x86 powerhouse pulling 120w at absolute max.

IMO the programming world is far too myopic about / insistent on using laptops, especially macbooks. Just because a crappy deal exists doesn't mean everyone is forced to take it. Local AI is a high performance computing problem and laptops are fundamentally a crappy form factor for it; buy an efficient desktop computer and be surprised at what's possible even with today's crazy prices.


Is code content though? This seems more applicable to video, where people are being deceived by AI content masquerading as real, rather than to internal application code.

It's also easy to overwhelm reviewers with far more code than they can possibly review. And it's the hardest stuff to review: code that looks totally fine at surface level, but takes long hours of actual testing to make sure it works.

I get the feeling the culture in radio is just not the same as regular open source. The free unrestricted sharing of things is an unusual quirk in the world rather than the norm.

In my experience, amateur radio (both licensed and license-free) and 3d printing both seem to have cultural perspectives on open source that differ considerably from the regular open source software community.

But while in 3d printing, outside of hardware, that difference often feels confused (e.g., I've seen the Multiboard creator post compliments online about models that blatantly violate his own license), in radio the difference often feels hostile.

You have OpenGD77, for example, with its 'we were never GPL' rug-pull that was likely illegal (they had outside contributions) [1]. You have Meshcore with its 'we are open source, except...', and, as you can see in this thread, a difficulty actually finding parts of the code. You have the heavy cultural push against uSDX (seemingly open hardware+source) toward truSDX (DRM-encumbered), and what seems like the quiet acceptance of things like QMX, where you can solder together a radio with DRM that prevents you from installing your own firmware.

You even have digital modes that are legally required to be publicly documented, and actually aren't in any meaningful way: VARA FM is probably the worst offender [2], but even modes that are in-crowd enough to be advertised in FCC license exam questions are often effectively proprietary and legally dubious.

What's particularly foreign to me about the culture is that oftentimes, much of the community seems to support behavior that seems malicious from an open source perspective, and attack the open source proponents.

[1]: https://hackmd.io/@ajorg/opengd77-is-closed

[2]: https://themodernham.com/reverse-engineering-vara-fm-part-1-...


If it's done in a background process then it won't impact the speed of the tool at all. When the choice is between getting data to help improve the tool and avoiding "bad manners", whatever that means, the choice is pretty easy.

There are far too many factors at play to attribute the quality of Microsoft's products to telemetry.

Having the data doesn't mean you will act on it. And it doesn't mean Microsoft's interests are aligned with the users'.

