Uehreka's comments

But after that, all the other things on the page are AWESOME! I’m super stoked about the proper HDR support and all the new node improvements.


Yeah, the HDR support is very nice. ACES got their system right the second time around, thankfully.


ACES 1.x was quite old and released at a time when HDR displays were pretty much non-existent. ACES 2.x is not perfect, but trying to provide display rendering transforms that hit contradictory requirements is really hard, e.g., you need a really nice rolloff and desaturation towards white while still being able to reach the corners of the gamut.


Is this the first Blender release where you can change the working color space? I thought that you could in previous versions, but it caused issues with some nodes.

Now I want to look into it more, but I'd imagine that "Blackbody" and sky generation nodes might still assume a linear sRGB working space.


> Now I want to look into it more, but I'd imagine that "Blackbody" and sky generation nodes might still assume a linear sRGB working space.

Since people are always asking for “real world examples”, I have to point out this is a great place to use an agent like Claude Code or Codex. Clone the source, have your coding assistant run its /init routine to survey the codebase and get a lay of the land, then turn “thinking” to max and ask it “Do the Blackbody attribute for volumes and the sky generation nodes still expect to be working in linear sRGB? Or do they take advantage of the new ACES 2.0 support? Analyze the codebase, give examples and cite lines of code to support your conclusions.”

The best part: I’m probably wrong to assert that linear sRGB and ACES 2.0 are some sort of binary, but that’s exactly the kind of knowledge a good coding agent will have, and it will likely fold an explanation of the proper mental model into its response.


why ACES and not something like P3?


Display P3 (distinct from the cinema DCI-P3, because names are hard, I guess) is used as a render-target color space. ACES (and its internal color spaces) are designed as working spaces.

If you make a color space for a display, the intent is that you can (eventually) get a display which can show all those colors. However, given the shape of the human color gamut, you can't choose three color primaries that form a triangle precisely matching that gamut. With a display color space, you want to pick primaries which live inside the gamut; otherwise you'd be wasting your display on colors that people can't see. For a working space, you want to pick primaries which contain the entire human color gamut, including some colors people can't see (since it can be helpful when rendering to avoid clipping).

Beyond that, ACES isn't just one color space; it's several. ACEScg, for example, uses a linear transfer function and is useful for rendering applications. A colorist would likely transform ACEScg colors into ACEScc (or something of that ilk) so that the response curves of their coloring tools are closer to what they're used to (i.e. they have a logarithmic response similar to old-fashioned analogue telecine machines).
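
To make the linear-vs-log point concrete, here's a quick Python sketch (mine, not Blender's or the ACES reference implementation; the ACEScc constants are recalled from memory, so verify them against the spec before relying on them). Equal photographic stops land on roughly equal steps in the log encoding, which is why grading controls feel more natural there:

    import math

    def acescc_from_linear(x):
        # Main branch of the ACEScc log encoding (constants from memory).
        return (math.log2(x) + 9.72) / 17.52 if x > 2**-15 else 0.0

    for stops in range(-4, 5):
        lin = 0.18 * 2**stops  # exposure in stops around 18% grey
        print(f"{stops:+d} stops  linear={lin:8.4f}  ACEScc={acescc_from_linear(lin):.4f}")

Each doubling of linear light adds a constant ~0.057 to the ACEScc value, whereas in linear the steps grow geometrically.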


No monitor uses ACES, so it always needs to be converted to P3 to even see what you're doing, right?

Or are you saying that if some intermediate transform pushes a color beyond P3, it will get clipped? Then I understand...


Yeah, like, let’s say that in your compositing workflow you increase exposure then decrease brightness. If your working color space is too small, your highlights will clip when you increase exposure, then all land flat at the same level when you decrease brightness. If your working space is bigger than the gamut people can see, but your last step is to tone map into Display P3, you’ll appreciate the non-clipped highlights, even if your eyes could never comprehend what they looked like in the post-exposure-boost-pre-brightness-drop phase of the pipeline.
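
To put toy numbers on that, here's a minimal sketch (plain Python, no particular color pipeline implied) of the grade described above, with a small working ceiling versus a float/HDR one:

    def grade(value, working_max):
        # +2 stops of exposure, clipped at the working space's ceiling,
        # then brightness pulled back down.
        boosted = min(value * 4.0, working_max)
        return boosted * 0.25

    highlights = [0.2, 2.0, 8.0]  # scene-linear highlight values
    print([grade(v, working_max=1.0) for v in highlights])  # clipped: [0.2, 0.25, 0.25]
    print([grade(v, working_max=1e6) for v in highlights])  # preserved: [0.2, 2.0, 8.0]

In the small space the two bright highlights land on the same flat value; in the roomy space their relationship survives until the final tone-mapping step.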


From what I read, Rec2020 is about as wide as ACEScg, so using ACEScg is as likely to clip as Rec2020, no?


The key point is that your ray tracing color space and your display color space don't need to be the same thing. Even if your monitor only displays sRGB colors, it can still be useful to have more pure primaries in your rendering system.


> or you are saying if there is some intermediate transform that makes color go beyond P3 it will get clipped?

Exactly! The conversion between ACES (or any working color space) and the display color space benefits from manual tweaking to preserve artistic intent.


One issue I often run into with this stuff is the tightly coupled nature of things in the real world. I’ll fashion an example:

Let’s say you break a job down into 3 tasks: A, B and C. Doing one of those tasks is too much for an LLM to accomplish in one turn (this is something you learn intuitively through experience), but an LLM could break each task into 3 subtasks. So you do that, and start by having the LLM break task A into subtasks A1, A2 and A3. And B into B1, B2 and B3. But when you break down task C, the LLM (which needs to start with a fresh context each time since each “breakdown” uses 60-70% of the context) doesn’t know the details of task A, and thus writes a prompt for C1 that is incompatible with “the world where A1 has been completed”.

This sort of “tunnel vision” is currently an issue with scaling 2025 agents. As useful context lengths get longer it’ll get easier, but figuring out how to pack exactly the right info into a context is tough, especially when the tool you’d reach for to automate it (LLMs) is the same tool that suffers from these context limitations.

None of this means big things aren’t possible, just that the fussiness of these systems increases with the size of the task, and that fussiness leads to more requirements of “human review” in the process.
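
For what it's worth, one mitigation for the A/B/C problem above is to carry a compressed record of each breakdown into the next one, so C is planned against "the world where A and B exist." A purely hypothetical sketch (every name here is made up; call_llm stands in for whatever client you use):

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM client of choice")

    def break_down(task: str, prior_summaries: list[str]) -> tuple[str, str]:
        context = "\n".join(prior_summaries)
        subtasks = call_llm(
            f"Earlier breakdowns (summarized):\n{context}\n\n"
            f"Break this task into 3 subtasks consistent with the above:\n{task}"
        )
        summary = call_llm(f"Summarize the key decisions in under 200 tokens:\n{subtasks}")
        return subtasks, summary

    summaries: list[str] = []
    for task in ["Task A", "Task B", "Task C"]:
        subtasks, summary = break_down(task, summaries)
        summaries.append(summary)  # small enough to fit in every future context

It doesn't solve the problem, it just shrinks it: the summaries are lossy, so the tunnel vision gets narrower rather than disappearing.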


I've been experimenting with this with a custom /plan slash command for claude code, available here: https://github.com/atomCAD/agents

Planning is definitely still something that requires a human in the loop, but I have been able to avoid the problem you are describing. It does require some trickery (not yet represented in the /plan command) when the overall plan exceeds a reasonable context window size (~20k tokens). You basically have to have the AI compare combinatorially many batches of the plan against each other, to discover and correct these dependency issues.


>the LLM (which needs to start with a fresh context each time since each “breakdown” uses 60-70% of the context) doesn’t know the details of task A, and thus writes a prompt for C1 that is incompatible with “the world where A1 has been completed”.

Can't that be solved with sub-agents? The main agent oversees and combines code, and calls sub-agents for each task.


I don’t think that’s clear at all. In fact the proficiency of LLMs at a wide variety of tasks would seem to indicate that language is a highly efficient encoding of human thought, much moreso than people used to think.


Yeah, it’s amazing that the parent post misunderstands the fundamental realities of LLMs; the compression they reveal in linguistics, even if blurry, is incredible.


I recall Apple contributing a Metal backend for Cycles back when the M1 Pro/Max came out. That was a big deal, made it actually possible to do renders on a Mac in non-insane amounts of time.


I get that you’re trying to help, but trying to find a bureaucratic/technical workaround through original research and proffering it as advice is not a super helpful thing to do right now. At this phase of the game, the best advice you can give people is to follow immigration lawyers and long-time activists on social media and do what they advise.

I know we’re all used to being the problem-solvers in the room, but this is a time where those of us without specific expertise need to take direction from those who do.


I get that you yourself have not researched this problem.

These are worthless?

Notice this section: "Carry with you evidence of lawful entry or current lawful status in the United States if you have it."

https://www.nilc.org/resources/know-your-rights-expedited-re...

Edit: I went to Medellín, Colombia recently, and going through immigration, I said that I was there for my birthday. The officer then asked me, "That was May xxth?" I responded, "No, my birthday is August yyth." She handed it back and waved me through.

Anyone making a mistake with details will see greater scrutiny.


Please just take the good advice you were given. This response only proves how out of your depth you are here.

We are currently trying to get my neighbor proof of citizenship so he can get out. He is a US citizen who had his passport on him when ICE took him. Now he has no passport.


This is precisely why the passport card should be carried, but the paper passport should be left at home.


Aside: if the months are correct, the birthday is very easy to guess.


> Regardless of your thoughts on immigration and ICE, if a cop tries to pull you over, and instead you decide to speed off and barricade yourself inside a daycare, you’re probably going to get arrested.

I view ICE as wholly illegitimate. If I were on the jury in this case, I would vote to acquit Ms. Galeano no matter what the prosecution said, and many people out there who have not said this online would do the same.

There’s no need to be so fatalistic.


> I view ICE as wholly illegitimate

Though you may find this surprising, your personal opinion on the legitimacy of an organization doesn’t actually have legal standing.


> legal standing

To paraphrase a show TFA’s author introduced me to:

What are laws? We just don’t know.


Immigration Enforcement is illegitimate apparently.


> Immigration Enforcement

You make it sound so reasonable when you say it like that. But here’s the thing: If you’re gonna die on the hill that “simply enforcing immigration laws” requires invading cities and detaining people without due process, everyday people are going to come to the conclusion that it might not be worth it. It’s wild how far you guys thought you could get with that phrasing.

Like at this point, y’all have done the Abolish ICE people a huge favor. It was much easier to call them anarchist weirdos when many people had never even seen an ICE officer much less had their lives affected by their activities. But now… well, unless conservatives do in fact succeed in ending elections (which I rank unlikely) I give it a >50% chance that ICE is abolished within 10 years.


My argument here is that there are people opposed to any immigration enforcement, period.

Both parties promised me last election was the last.


Many people are woefully underqualified; we need to have a working society anyway.


Yeah, I'm not sure that baby-proofing everything as proposed here is going to result in a working society.

If we expected airplanes or cars to be able to be safely operated by people with zero understanding of how such vehicles work, nobody would be getting anywhere.

You eventually reach a level of stupidity and/or incompetence after which trying to alter the product to coddle those users becomes counterproductive.


> at the pace of current AI code development, probably one or two years before Pytorch is old history.

Ehhh, I don’t know about that.

Sure, new AI techniques and new models are coming out pretty fast, but when I go to work with a new AI project, they’re often using a version of PyTorch or CUDA from when the project began a year or two ago. It’s been super annoying having to update projects to PyTorch 2.7.0 and CUDA 12.8 so I can run them on RTX 5000 series GPUs.

All this to say: If PyTorch was going to be replaced in a year or two, we’d know the name of its killer by now, and they’d be the talk of HN. Not to mention that at this point all of the PhDs flooding into AI startups wrote their grad work in PyTorch, it has a lot of network lock-in that an upstart would have to overcome by being way better at something PyTorch can never be good at. I don’t even know what that would be.

Bear in mind that it took a few years for TensorFlow to die out due to lock-in, and we all knew about PyTorch that whole time.


> a lot of network lock-in that an upstart would have to overcome by being way better at something PyTorch can never be good at

The cost of migrating higher-level code to a newer framework is going to zero. You ask your favorite agent (or intern) to port it and check that the migration is exact. We already see this across the multitude of deep-learning frameworks.

The day an optimization trick appears that PyTorch can't do but another framework can, one that reduces your training cost 10x, PyTorch is going the way of the dodo.

The day an architecture that can't be implemented in PyTorch gets superior performance, it's bye-bye Python.

We see this with architectures which require real-time rendering, like Gaussian Splatting (Instant NeRF), or with the caching strategies for LLM sequence generation.

PyTorch has 3 main selling points:

- Abstracting away GPU- (or device-) specific code, which is necessary because of Nvidia's mess: custom optimized kernels, which you are forced to adapt to if you don't want to write custom kernels yourself.

That advantage evaporates if you don't mind writing optimized kernels because the machine writes them, or if you don't need CUDA because you can't use Nvidia hardware (because, for example, you are in China), or if you use custom silicon, like Grok, and need your own kernels anyway.

- Automatic differentiation. This is one of PyTorch's weak points, because they went for easy instead of optimal, and shut themselves off from some architectures. A language like Julia, thanks to dynamic low-level compilation, can do things PyTorch won't even dream about (but Julia has its own problems, mainly related to memory allocations). With PyTorch's introduction of the "scan" function [2], we have come full circle back to Theano, TensorFlow's/Keras's ancestor; this kind of construct is usually the pain point of the automatic differentiation strategy PyTorch chose.

The optimal solution, as all physics PhDs who have written simulations know, is writing custom adjoint code via 'source code transformation' or symbolically: it's not hard but very tedious, so it's now a great fit for your LLM (or intern, or PhD candidate running 'student gradient descent'), provided you prove or check that your gradient calculation is correct (see the gradcheck sketch after this list).

- Cluster orchestration and serialization: a model can be shared with fewer security risks than arbitrary source code, because you only share weights, and a model can be split between machines dynamically. But this is also a big weakness, because your code rots as you become dependent on versioning; you are locked to the specific version your model was trained on.

[2] https://docs.pytorch.org/xla/master/features/scan.html
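
For the "check your gradient calculation" part, a minimal PyTorch sketch (purely illustrative): a custom autograd.Function with a hand-written backward pass, verified against finite differences with gradcheck.

    import torch

    class Square(torch.autograd.Function):
        # y = x**2 with a hand-written adjoint (backward) pass.
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return x * x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * 2 * x  # the adjoint written by hand

    x = torch.randn(5, dtype=torch.double, requires_grad=True)
    print(torch.autograd.gradcheck(Square.apply, (x,)))  # True if the adjoint is correct

The same pattern scales to LLM-generated adjoints: the model (or intern) writes backward(), and gradcheck is the cheap sanity check that it matches the forward pass.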


What would stop PyTorch from implementing whatever optimization trick becomes important? Even if it requires a different API.


There are two types of stops: soft stops and hard stops.

- A soft stop is when the dynamic graph computation overhead is too much, meaning you can still calculate, but if you were to write the function manually or with a better framework, you could be 10x faster.

Typical examples involve manually unrolling a loop or doing kernel fusion. Another typical example is when you have lots of small objects, need to do loops in Python because the code doesn't vectorize well, or want to use sparsity efficiently by ignoring the zeros.

- A hard stop is when computing the function becomes impossible, because the memory needed to do the computation the non-optimal way explodes. Sometimes you can get away with just writing customized kernels.

The typical example where you can get away with it is custom attention layers.

The typical example where you can't get away with it is physics simulations. For instance, the force is the gradient of the energy, but you have n^2 interactions between the particles, so if you preserve anything more than zero memory per interaction during the forward pass, your memory consumption explodes. And typically with things like Lagrangian or Hamiltonian neural networks, where you try to discover the dynamics of an energy-conserving system, you need to be able to differentiate at least three times in a row (a checkpointing sketch follows below).

There are also energy-expending stops, where you need to find workarounds to make things work, like when you want your parameters to change shape during the optimization process (e.g., learning point clouds of growing size); these spread you thin, so they won't be standardized.
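
Here's the checkpointing sketch promised above: a toy n-body-style energy where the pairwise intermediates are recomputed during backward instead of stored, which is the cheapest way to dodge the memory explosion while staying inside PyTorch (an illustration with a made-up soft potential; a hand-written adjoint would still beat it).

    import torch
    from torch.utils.checkpoint import checkpoint

    def pair_energy(xi, xj):
        # Toy soft pairwise potential; the chunk x n distance matrix is the memory hog.
        d = torch.cdist(xi, xj)
        return (1.0 / (1.0 + d.pow(2))).sum()

    def total_energy(pos, chunk=1024):
        # Checkpointed chunks: activations are recomputed in backward rather than
        # all n^2 intermediates being kept alive at once.
        e = pos.new_zeros(())
        for i in range(0, pos.shape[0], chunk):
            e = e + checkpoint(pair_energy, pos[i:i + chunk], pos, use_reentrant=False)
        return e

    pos = torch.randn(8192, 3, requires_grad=True)
    energy = total_energy(pos)
    # create_graph=True keeps the force differentiable, for the
    # "differentiate several times in a row" case mentioned above.
    force = -torch.autograd.grad(energy, pos, create_graph=True)[0]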


The reason a lot of people don’t do this is because Claude Code lets you use a Claude Max subscription to get virtually unlimited tokens. If you’re using this stuff for your job, Claude Max ends up being like 10x the value of paying by the token, it’s basically mandatory. And you can’t use your Claude Max subscription for tools other than Claude Code (for TOS reasons. And they’ll likely catch you eventually if you try to extract and reuse access tokens).


Is using CC outside of the CC binary even needed? CC has an SDK, so could you not just use the proper binary? I've debated using it as the backend for internal chat bots and whatnot unrelated to "coding". Though maybe that's against the TOS, as I'm not using CC in the spirit of its design?


That's very much in the spirit of Claude Code these days. They renamed the Claude Code SDK to the Claude Agent SDK precisely to support this kind of usage of it: https://www.anthropic.com/engineering/building-agents-with-t...


> catch you eventually if you try to extract and reuse access tokens

What does that mean?


I’m saying if you try to use Wireshark or something to grab the session token Claude Code is using and pass it to another tool so that tool can use the same session token, they’ll probably eventually find out. All it would take is having Claude Code start passing an extra header that your other tool doesn’t know about yet, suspend any accounts whose session token is used in requests that don’t have that header and manually deal with any false positives. (If you’re thinking of replying with a workaround: That was just one example, there are a bajillion ways they can figure people out if they want to)


How do they know your requests come from Claude Code?


I imagine they can catch it pretty quickly by using machine learning to spot unlikely API access patterns. They're an AI research company after all; spotting patterns is very much in their wheelhouse.


A million ways, but e.g.: once in a while, add a "challenge" header; the next request should contain a "challenge-reply" header for said challenge. If you're just reusing the access token, you won't get it right.

Or: just have a convention/an algorithm to decide how quickly Claude should refresh the access token. If the server knows the token should be refreshed after 1000 requests and notices the refresh after 2000 requests, well, probably half of the requests were not made by Claude Code.
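
Purely hypothetical sketch of the challenge/reply idea (nothing here reflects what Anthropic actually does): the official client holds a baked-in secret, so a stolen session token alone can't produce valid replies.

    import hashlib, hmac, os

    CLIENT_SECRET = b"baked-into-the-official-client"  # hypothetical

    def server_issue_challenge() -> bytes:
        return os.urandom(16)

    def client_answer(challenge: bytes) -> str:
        return hmac.new(CLIENT_SECRET, challenge, hashlib.sha256).hexdigest()

    def server_verify(challenge: bytes, reply: str) -> bool:
        expected = hmac.new(CLIENT_SECRET, challenge, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, reply)

    c = server_issue_challenge()
    assert server_verify(c, client_answer(c))           # real client passes
    assert not server_verify(c, "token-reuser-guess")   # token reuse alone fails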


When comparing, are you using the normal token cost, or cached? I find that the vast majority of my token usage is in the 90% off cached bucket, and the costs aren’t terrible.


If I understand transformers properly, this is unlikely to work. The whole point of “Large” Language Models is that you primarily make them better by making them larger, and when you do so, they get better at both general and specific tasks (so there isn’t a way to sacrifice generality but keep specific skills when training a small model).

I know a lot of people want this (Apple really really wants this and is pouring money into it) but just because we want something doesn’t mean it will happen, especially if it goes against the main idea behind the current AI wave.

I’d love to be wrong about this, but I’m pretty sure this is at least mostly right.


I think this is a description of how things are today, but not an inherent property of how the models are built. Over the last year or so the trend seems to be moving from “more data” to “better data”. And I think in most narrow domains (which, to be clear, general coding agent is not!) it’s possible to train a smaller, specialized model reaching the performance of a much larger generic model.

Disclaimer: this is pretty much the thesis of a company I work for, distillabs.ai, but other people say similar things, e.g. https://research.nvidia.com/labs/lpr/slm-agents/


Actually, there are ways you might get on-device models to perform well. It is all about finding ways to have a smaller number of weights work efficiently.

One way is reusing weights in multiple decoder layers. This works and is used in many on-device models.

It is likely that we can get pretty high performance with this method. You can also combine this with low-parameter ways to create overlapped behavior on the same weights; people have done LoRA on top of shared weights (a rough sketch is below).

Personally I think there are a lot of potential ways that you can cause the same weights to exhibit "overloaded" behaviour in multiple places in the same decoder stack.

Edit: I believe this method is used a bit for models targeted at phones. I don't think we have seen significant work targeting, say, a 3090/4090 or a similar inference compute budget.
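
Rough sketch of the shared-weights-plus-LoRA idea (a hypothetical layout, not any particular on-device model): one decoder block's weights reused at every depth, with a tiny per-depth low-rank adapter so each pass can specialize.

    import torch
    import torch.nn as nn

    class SharedLayerDecoder(nn.Module):
        def __init__(self, d_model=512, depth=12, rank=8):
            super().__init__()
            # One set of big weights, reused `depth` times.
            self.shared = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
            # Per-depth LoRA pairs; lora_a starts at zero so the adapters begin as no-ops.
            self.lora_a = nn.ParameterList([nn.Parameter(torch.zeros(d_model, rank)) for _ in range(depth)])
            self.lora_b = nn.ParameterList([nn.Parameter(torch.randn(rank, d_model) * 0.01) for _ in range(depth)])

        def forward(self, x, memory):
            for a, b in zip(self.lora_a, self.lora_b):
                x = self.shared(x, memory) + x @ a @ b  # same weights, per-depth tweak
            return x

    model = SharedLayerDecoder()
    print(f"{sum(p.numel() for p in model.parameters())/1e6:.1f}M params for 12 layers' worth of compute")

The parameter count stays close to a single layer's, while the compute (and, hopefully, the effective depth) behaves more like a full stack.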


The issue isn't even 'quality' per se (for many tasks a small model would do fine); it's that for "agentic" workflows it _quickly_ runs out of context. Even 32GB of VRAM is really very limiting.

And when I say agentic, I mean something even like this: 'book a table from my emails', which involves looking at 5k+ tokens of emails, 5k tokens of search results, then confirming with the user, etc. It's just not feasible on most hardware right now: even if the models are 1-2GB, you'll burn through the rest in context so quickly.
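
Back-of-the-envelope KV-cache math makes the point (assumed dims for a generic 7B-class model with full multi-head attention and an fp16 cache; real on-device models use GQA and quantized caches, so the numbers shrink, but the scaling doesn't):

    layers, kv_heads, head_dim, bytes_per = 32, 32, 128, 2    # assumed, fp16
    per_token = 2 * layers * kv_heads * head_dim * bytes_per  # K and V
    context = 10_000  # emails + search results + chat
    print(f"{per_token / 1024:.0f} KiB per token, "
          f"{per_token * context / 2**30:.1f} GiB for a {context}-token context")
    # ~512 KiB/token, ~4.9 GiB of cache on top of the weights themselves.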


Yeah - the whole business model of companies like OpenAI and Anthropic, at least at the moment, seems to be that the models are so big that you need to run them in the cloud with metered access. Maybe that could change in the future to a sale or annual-licence business model if running locally became possible.

I think scale helps for general tasks where the breadth of capability may be needed, but it's not so clear that this needed for narrow verticals, especially something like coding (knowing how to fix car engines, or distinguish 100 breeds of dog is not of much use!).


> the whole business model of companies like OpenAI and Anthropic, at least at the moment, seems to be that the models are so big that you need to run them in the cloud with metered access.

That's not a business model choice, though. That's a reality of running SOTA models.

If OpenAI or Anthropic could squeeze the same output out of smaller GPUs and servers they'd be doing it for themselves. It would cut their datacenter spend dramatically.


> If OpenAI or Anthropic could squeeze the same output out of smaller GPUs and servers they'd be doing it for themselves.

First, they do this; that's why they release models at different price points. It's also why GPT-5 tries auto-routing requests to the most cost-effective model.

Second, be careful about considering the incentives of these companies. They all act as if they're in an existential race to deliver 'the' best model; the winner-take-all model justifies their collective trillion dollar-ish valuation. In that race, delivering 97% of the performance at 10% of the cost is a distraction.


> > If OpenAI or Anthropic could squeeze the same output out of smaller GPUs and servers they'd be doing it for themselves.

> First, they do this; that's why they release models at different price points.

No, those don't deliver the same output. The cheaper models are worse.

> It's also why GPT-5 tries auto-routing requests to the most cost-effective model.

These are likely the same size, just one uses reasoning and the other doesn't. Not using reasoning is cheaper, but not because the model is smaller.


But they also squeezed an 80% price cut out of o3 at some point, supposedly purely from inference or infra optimization.


> delivering 97% of the performance at 10% of the cost is a distraction.

Not if you are running RL on that model, and need to do many roll-outs.


No, I don’t think it’s a business model thing; I’m saying it may be a technical limitation of LLMs themselves. Like, there’s no way to “order a la carte” from the training process: you either get the buffet or nothing, no matter how hungry you feel.


Unless you're programming a racing sim or maybe a CRUD app for a local Kennel Club, perhaps?

I actually find that things which make me a better programmer are often those things which have the least overlap with it. Like gardening!

