Hacker News | leoncos's comments

Agreed. The limits of the human context window and communication bandwidth restrict the complexity of large-scale software.

LLMs will have extremely large context windows and extremely high communication bandwidth in the future. Therefore, even more complex large-scale software will emerge.


Perhaps advanced AI isn't cheaper than humans.

Assuming the intelligence of a model continuously improves with scale, the token price of the best model will become increasingly expensive.

I know that tokens are currently experiencing rapid price drops, but they will eventually encounter physical limitations.


Why? What physical limitation will dictate that we can't have 1B tokens for cheap?

Assume you build a machine that can simulate some system 1:1. Then the machine is exactly the same as the system, and the cost of running it will be no less than that of the system itself.

If you want to reduce the cost but still get something useful, you have to make some abstraction, and we all know that any abstraction is leaky.


Thermodynamics is a harsh mistress

We are very, very far from thermodynamic limits. Lots of people have done the math, and current-gen systems use ~1,000,000,000x more power than the Landauer limit, and ~100,000x more power than an ideal digital implementation on existing CMOS.
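For a sense of scale, here is a minimal sketch of the Landauer bound itself, k_B * T * ln 2 per erased bit. Room temperature (300 K) is an assumption, and the 1e9 multiplier is simply the figure quoted above taken at face value, not a measurement.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_limit_joules(temperature_k: float = 300.0) -> float:
    """Minimum energy to erase one bit at the given temperature: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


limit_j = landauer_limit_joules()  # roughly 2.9e-21 J at 300 K
# Assumption: take the "~1,000,000,000x" figure from the comment at face value.
implied_j = limit_j * 1e9          # roughly 2.9e-12 J, i.e. a few picojoules

print(f"Landauer limit at 300 K: {limit_j:.3e} J per bit erased")
print(f"1e9 x Landauer:          {implied_j:.3e} J per bit operation")
```

So the quoted gap corresponds to something on the order of picojoules per bit operation, against zeptojoules at the bound.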

Currently, most AI systems work so that there is a large pool of memory on one side, compute on the other side, and a very fat pipe between them. 90%+ of all energy goes into moving data from one side to the other, and selecting the specific element you wish to use from the large pool of RAM. The energy cost of holding that data in memory and reading it from the memory cells, and the energy cost of doing the actual computation with low-precision FP, are both trivial in comparison.

The systems are built this way because this is the most flexible architecture, and can be used for many different kinds of workloads. But the workload of a transformer in no way requires this flexibility. All the data is fairly local to the execution units that consume it. If you design a system as full PIM, where each ALU is co-located with a small storage pool that contains only the elements used by that ALU, and then tile that out to implement the full model, you cut out most of the energy cost of moving data. The cost is that you need much more silicon to implement a working system, but the benefit is not just improved energy efficiency, but also token speed and silicon efficiency.

The industry is moving towards such designs, with many startups working towards it with different approaches, Nvidia's recent acquisition* of Groq, etc. There is a well-understood path towards ~1000x higher token speeds at ~1000x better energy efficiency that requires no new innovations, just investment of money into specialization.

There are even more gains if you move the weights into ROM, but that would require you to specialize not just for a specific type of model, but also for a specific set of model weights, à la Taalas.

I find the AI discourse is diseased because on one side you get people breathlessly overestimating the current state of the industry and progress that's going to happen in the next ~2 years, and on the other side people assume that the technology as is is what it will always be and completely ignore that the industry is aware of and actively working towards many ways to improve hardware, it's just that complex leading edge silicon chips take years to take from idea to working products, and transformer inference was only very recently proven to be a market large enough to specialize for.


the first law of thermodynamics is you do not talk about thermodynamics

The last sentence in the article is correct:

"Maybe I should consider transforming my woodworking hobby into a profession."

As an AI optimist, I think all forced labor should eventually be done by AI. People can then spend their time pursuing their own hobbies. Just as many people still play Go after AlphaGo appeared, because they genuinely love the game.

In the future, coding may return to being an art form. People will no longer focus on utility alone, but instead on the enjoyment of the process of writing code itself.


> As an AI optimist, I think all forced labor should eventually be done by AI. People can then spend their time pursuing their own hobbies. Just as many people still play Go after AlphaGo appeared, because they genuinely love the game.

And what sort of economic system do you imagine will be in place to support billions of people being able to just play Go all day long? How do you imagine the large capitalistic global powers transitioning into that state?


I think that huge deflation will follow for everything except land value.

If automation makes producing food so cheap that it is almost free, then it is ridiculously easy to acquire it. The same goes for automated construction.

The way I see it the economy will point towards outer space. That’s where most jobs and flow of economy will be.

However, most people will have a 10x uplift in purchasing power compared to today, so their relative poverty will seem ridiculous for us to call poverty, but they will still think they are poor and troubled.

Generally I don’t think it will be utopia for the people living in that moment, but if you look at today from medieval times, it looks like utopia to serfs from the past. You, however, wouldn’t call it a utopia because your standards grew as fast as your purchasing power.

I think that rich and poor will be separated by access to anti-aging treatment and other bodily improvements.

The tragedy of the poor in the future will be living a measly 80-year life like one of today's millionaires, and that will be considered lower class. Those people with wrinkles we don’t want to look at because of uncomfortable pangs of guilt.


Food is already extremely cheap to produce. So cheap, that we waste approximately 50% of it, before it even gets into households. Yet we are still forced to use money to buy it in stores, instead of getting it at no or almost no cost. I think they will find ways to keep everything costing quite a bit.

A lot of this is because food is hard.

For example: people in developing countries throw away more food than people in developed countries, even though relative to their income it is worth much more. The reason is that they often do not own fridges, use a lot of rice, which spoils fast, or have difficulties with the food supply chain in other forms.

Food waste is a bad indicator for food value.


> In the future, coding may return to being an art form. People will no longer focus on utility alone, but instead on the enjoyment of the process of writing code itself.

Great, if someone will find it in them to pay me. Real bad, if not.


So you believe that your work will be done by AI and you will enjoy life more? This is not a loaded question, just trying to understand what your future ideal day / week would look like as an "ai optimist".

That’s just not economically viable. Even if it becomes viable after some singularity event, the path there will be 1000x the upheaval seen during the wipeout of manufacturing and mining.

Many behaviors are determined by hormones. Men are no exception. When calm, men tend to prefer intellectual women, but when they're impulsive after drinking in a bar, they prefer sexy women.

It is OK! Don't forget we are animals first. Obviously we need that neocortex to keep decisions in check!

Women with higher testosterone levels prefer more risks.

For example: when women start dating and fall in love, their testosterone levels go up for 2 months.


In my experience it’s usually the other way around

It is a curious thing how drinking affects people in different ways. Some people get reflective, others belligerent.

Endocrine and nervous system function take a dramatic shift when processing alcohol.

The aggressive drunks didn't get there overnight. They drink more heavily and regularly. Their nervous system is amped up even when "sober", but it takes months to years to fully recover from that state and there might even be permanent nerve damage. All this to say, they will just drink again to calm their nerves and are easily pissed off. That's alcoholism for you.

On the flip side, some of the most body-aware and even-keeled people I've ever known are recovered alcoholics. There's a reason alcohol has been a rite of passage for most of human history.


[flagged]


Nope

Great work! I really hope it can be designed to be agent-friendly. The current Codex/Claude Code sandbox functionality is very limited; it would be wonderful to use this as a sandbox.

I use bubblewrap inside Ubuntu inside WSL. https://github.com/nix-tools/bubblebox

I had issues with bubblewrap inside the container and creating namespaces. Need to dig into it a bit further.

Are you saying you're having problems with bubblewrap in general, or specifically with bubblebox that I linked to?

Feel free to submit bug reports. I actively use it on NixOS, Ubuntu WSL and nix-darwin.


Maybe you should put it in SHOW HN

When I use Codex/Claude to complete a computer vision task, such as extracting assets from an image, OpenCV is their default solution. However, I believe that using YOLO and other methods is outdated. The best solution now is to directly use Nano Banana or other AI image models. A paper has proven that image generation models can perform most CV tasks well. I believe the new OpenCV should become a wrapper for VLM or AI image models.

Whenever you can run a model like Nano Banana or other vision-LLM with the same compute and time performance/restrictions as an OpenCV or YOLO call, you can make that comparison. Until then, I would not call YOLO and OpenCV outdated, it's simply wrong. There's a time and place for big V-LLMs just as there is a time and place for more "traditional" computer vision methods.

I can get great results from a YOLO model with 30M to maybe 300M params. To get decent CV from an LLM, 8B params is the absolute minimum, closer to 30B for interesting tasks.

I might be on board about LLMs being the future of OCR (though many would disagree), but for general CV they are very inefficient for very limited benefit


They can, however, be extremely useful for curating training data. Also things like SAM and the DINO (/Grounding DINO) models.

Also, if they are better, then you can have a flow where a cheap model goes first and marginal cases go to a more complex one (and a chain of these).

The YOLO models are really shockingly good for their cost, and for how well they can work with not much training data.
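The cheap-model-first flow described above can be sketched as a confidence-threshold cascade. This is a minimal sketch, not any particular library's API: `cascade`, the 0.8 threshold, and the two stand-in models are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Tuple

# A model takes an image and returns (label, confidence in [0, 1]).
Model = Callable[[object], Tuple[str, float]]


def cascade(image, cheap_model: Model, expensive_model: Model,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Return (label, which_model_answered)."""
    label, confidence = cheap_model(image)
    if confidence >= threshold:
        return label, "cheap"          # confident enough, stop here
    label, _ = expensive_model(image)  # marginal case, escalate
    return label, "expensive"


# Stand-in models (hypothetical): a real setup might pair a small
# detector with a large vision model.
def small_detector(image):
    return ("orange", 0.95) if image == "clear.jpg" else ("orange", 0.40)


def big_vision_model(image):
    return ("lemon", 0.99)


for name in ("clear.jpg", "blurry.jpg"):
    print(name, cascade(name, small_detector, big_vision_model))
```

The threshold sets the trade-off: raising it sends more traffic to the expensive model, lowering it trusts the cheap one more often.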


>for very limited benefit

Due to how simple they are to work with they will become popular. Compare NLP before and after GPT-3. GPT-3 majorly brought down the complexity and skill needed for doing NLP tasks even if traditional NLP is much much faster. Ultimately ease of development will win out and the industry will work towards optimizing running such LLMs to make it cheap enough to run.


I've built hardware with a pi zero 2 + pi cam running a mildly fine-tuned YOLO doing local-only object detection as a USB-OTG device, in a use case where any off-device API calls would have been totally unacceptable, and where the object detection was part of the human interaction loop with a hard ceiling of 300ms on the total interaction time of which the object detection was only one process among many.

We're not going to fit Nano Banana or anything like it on a device with 512MB RAM and a GPU old enough to be irrelevant, and again, API calls just aren't on the menu.
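A back-of-envelope check on the 512MB point, as a sketch under stated assumptions: it counts weights only (no activations, OS, or camera buffers), and it borrows the parameter counts quoted elsewhere in this thread (30M to 300M for YOLO-class models, 8B as a minimal vision LLM). The bit widths are illustrative choices, not properties of any specific model.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Storage needed for the weights alone."""
    return num_params * bits_per_param // 8


DEVICE_RAM_BYTES = 512 * 2**20  # the 512MB board from the comment above

# Assumed parameter counts and precisions, for illustration only.
candidates = {
    "YOLO-class, 30M params, fp16": weight_bytes(30_000_000, 16),
    "YOLO-class, 300M params, int8": weight_bytes(300_000_000, 8),
    "vision LLM, 8B params, 4-bit": weight_bytes(8_000_000_000, 4),
}

for name, size in candidates.items():
    verdict = "fits" if size < DEVICE_RAM_BYTES else "does not fit"
    print(f"{name}: {size / 2**20:.0f} MiB of weights, {verdict} in RAM")
```

Even with aggressive 4-bit quantization, the 8B case needs several times the device's total RAM before any working memory is counted.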


> API calls just aren't on the menu

Even if they were an option, your 300ms latency requirement would exclude them anyway.


That is a very uninformed view. Real time CV is not going to be doing that anytime soon.

Great, let me know when those models can run on-server and process/analyze streams of ID images with less than 100ms of latency. You’ll need to make sure you have a massive set of training data including all manner of slightly blurred and slightly distorted ID cards

Exactly, and all on an embedded system with quite restrictive settings, not an overclocked latest-generation Intel combined with NVIDIA's 10k graphics cards.

Embedded systems can make network calls to powerful, GPU equipped servers.

Sure. Claude does that. "Cogitated for 1m 50s" doesn't work for real-time applications.

You can submit many queries in parallel to increase throughput. Smaller models and faster hardware can reduce the time per query too.

None of that gets you the 100ms response time the parent poster talked about, for something like "who is at my doorbell?" real-time uses.
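The throughput versus latency distinction in this exchange comes down to simple arithmetic. A minimal sketch, assuming a hypothetical 1.5 s per-query latency (not a measured figure for any service) and the 100 ms deadline mentioned upthread:

```python
def throughput_per_second(latency_s: float, parallel_requests: int) -> float:
    """Requests completed per second when requests run fully in parallel."""
    return parallel_requests / latency_s


def meets_deadline(latency_s: float, deadline_s: float) -> bool:
    """A single request is only useful if it finishes before the deadline."""
    return latency_s <= deadline_s


LATENCY_S = 1.5   # hypothetical time for one remote model query
DEADLINE_S = 0.1  # the 100 ms budget discussed in the thread

for parallel in (1, 8, 64):
    rate = throughput_per_second(LATENCY_S, parallel)
    ok = meets_deadline(LATENCY_S, DEADLINE_S)
    print(f"{parallel:>2} parallel: {rate:6.2f} req/s, meets deadline: {ok}")
```

Parallelism scales the first number but leaves the second untouched: each individual answer still arrives after 1.5 s.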

Ok. Claude will not work for this use case because none of the sample data (weirdly blurry ID images) is in the training data.

They really shouldn't, though.

It can offer a ton of user value. There is a whole industry built upon this idea, Internet of Things.

IoT wasn't built on "send all the data off to a hosted GenAI". It predated them by quite a few years.

The GPUs were doing video transcoding instead of GenAI.

You can run OpenCV on a GPU-less Raspberry Pi or other IoT device just fine.

And most IoT devices aren't doing video transcoding at all. You're making some very odd assertions in this thread.


>And most IoT devices aren't doing video transcoding at all.

The data gets streamed to the cloud where servers with GPUs transcode it. I'm pointing out that IoT devices historically have reached out to servers with GPUs even before GenAI.


Most IoT devices have no camera and communicate with servers that have no need for a GPU at all.

do you realize how many edge or unconnected nodes do OpenCV work?

some SBC w/ an industrial camera that is doing pick-place or go/no-go operations on a conveyor belt against a singular object type doesn't need a huge image-gen/llm model governing it.

I mean have you even considered the kind of performance an OpenCV function can get w/ just mask-matching? I mean even with a fancy YOLO model these answers get thrown out in 1.5-50ms; this is just a wholly different time scale.


100,000 pictures take a lot of time with LLMs.

It's a lot better, faster, and cheaper to use LLMs for initial labeling together with hand fine-tuning, and then train YOLO on this.

Training YOLO takes a few hours and is then very fast.


If I want to identify and measure the size of round things in my orange sorter machine, I shouldn't have to resort to an unnecessarily complicated solution just because some AI bros can't understand that not everything needs to be an AI model.

Like, the AI model tools already exist; all that would be accomplished if OpenCV pivoted would be to take it away from people who want to do low-level vision programming. It wouldn't add anything useful to the world, just destroy an excellent library.


"When I use..."

Dude, in business we think in terms of large numbers: internationally, easily billions of images processed. This wouldn't cut it.

Also, do you buy the mega expensive, super individually designed shoes from the best shoemaker there is to march through some dirt, or simply stick to gumboots?

OpenCV is used behind the scenes for much of the fancy stuff those major AI providers pretend to do. Claude is a huge system and not just an LLM anymore.


I am confused, how can functions that output images help with functions that should take images as input?

They’re multimodal LLMs trained for image generation. Turns out that if you want to generate images you gotta know what things look like.

That's not helpful my brother. If you have details share them, if not, don't pretend you are more illuminated than me.

Is the image(text) function reversible? Or are they brute force searching a nearest neighbor like word2vec/hash brute forcing.


Google recently released their paper "Image Generators are Generalist Vision Learners" about exactly this. They fine-tuned Nano Banana Pro into what they call Vision Banana, which can do segmentation etc.

https://arxiv.org/abs/2604.20329


very interesting, it seems that they use image(image,text) functions to process/filter images, effectively generating arbitrary bitmap(image), where bitmap is of the same dimension as image.

Some politicians are impeccable; if you ask them thorny questions like scandals, they always throw out a new question to change the topic.

I don't think anyone is born like that. Politicians are trained for it. I remember a podcast where they talked about Al Franken and how it was difficult to get him to stop answering questions. The goal: one, maybe two or three talking points at any given time and no matter what question anyone asks, it is your job as a politician to give a non answer and pivot to the point of the day.

Yes, I realize how easily language can be manipulated.

For example, when some people in high positions enjoy privileges, politicians will defend them by talking about their contributions, and the topic shifts from privileges to contributions. Similarly, when a few bad people emerge from a certain ethnic group, politicians will constantly emphasize these few bad people to negate the entire ethnic group and call for action against the group. The most crucial factors should be whether contributions and privileges are commensurate, and the degree of correlation between the ethnic group and individual events. But nobody discusses this.


It's especially frustrating watching congressional hearings. Since both "sides" are aware that the cameras are rolling and that they are there to score points/create soundbites (rather than actually learn from each other--god forbid) it's just both sides talking past each other and not doing the analytical work of a good conversation.

Even when I'm on one side of the argument, it's just as frustrating to hear my own side just move on to their next pre-written question/response instead of engaging with the underlying issues. I want substantive debate and discussion and possibly consensus, but that's sadly not the reality in most cases of import.


There is no "our side" and that's the problem. There are issues that a clear majority (80% plus) of voters have agreed on steadily over decades, and yet veto points (filibuster, committee chairs, holds) plus donor capture mean a motivated minority with money can block majority-supported policy indefinitely. You can always make arguments from philosophy or case law or whatever that, for example, the carried interest loophole is good for America, but an overwhelming majority of Americans support scrapping it. Why haven't we been able to do that? How many people benefit from this loophole? (Estimates are just a few thousand people who benefit, not millions, in a country of over three hundred million.) Similarly, the IRS Direct File system was a modest improvement over the status quo. Why was it scrapped? How many people benefit from this? Remember SVB? Remember how everyone who opposed TARP suddenly supported bailing out SVB depositors just because now these were companies in which they had invested? The point is there can't be a real debate when the outcome of the debate determines your paycheck.

Often they will just talk around a question too. They will be asked if they will give everyone free ice cream if elected, and they will just talk about how great ice cream is, how important ice cream is, etc, but never actually answer the question.

I'm surprised there isn't a term for doing that.


It's not slick, but I've always labeled this as: answering the question they want to answer (rather than answering the question that was asked).

Isn’t that just dodging the question?

This interviewer didn't let it go: https://www.youtube.com/watch?v=pyqnu6ywhR4

This is a basic survival skill in politics, and not just for scandals.

Let's take Bernie Sanders, because he's well-known in Vermont for being happy to go off-script and actually talk to people. During my only personal conversation with him, he was delighted to discover that a small, local event actually served excellent chicken. (Apparently politicians eat a lot of rubbery chicken.)

But at that same event, Bernie was approached by a woman asking some conspiracy-tinged question. And he very gracefully deflected and changed the topic. I think that just about anyone who interacts with the public is likely to pick up some version of this skill eventually.

