Technically that only requires calculating one extra row and column of pixels.
It is indeed scale-invariant, but I think you can do better: you should have enough information to make it invariant to any linear transformation. The calculation will be more complex, but that is nothing compared to evaluating the SDF.
I do believe that it is already invariant to linear transformations the way you want, i.e. we can evaluate the corners of an arbitrary parallelogram instead of a square and get a similar coverage estimate.
Similar, maybe, but surely it can't be the same? Just pick some function like f(x,y) = x - 1 and start rotating it around your pixel centre: the average (s1+s2+s3+s4) will stay the same (since it's a linear function), but there's no way the actual coverage values will remain constant.
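A quick numeric check of this (hypothetical setup, not from the thread): hold the edge at a fixed distance of 0.6 from the centre of a unit pixel and rotate it. The corner average stays at -0.6 for every angle, while the true coverage drifts once the edge starts clipping a corner.

#include <cstdio>
#include <cmath>

int main() {
    const double PI = 3.14159265358979323846;
    const double d  = 0.6;  // distance from the pixel centre to the f = 0 line
    for (double deg = 0.0; deg <= 45.0; deg += 15.0) {
        double c = cos(deg * PI / 180.0), s = sin(deg * PI / 180.0);
        // f(x,y) = c*x + s*y - d: the four corner samples of a unit pixel
        // centred at the origin always average to f(0,0) = -d.
        long inside = 0, total = 0;
        for (double y = -0.495; y < 0.5; y += 0.01)
            for (double x = -0.495; x < 0.5; x += 0.01, ++total)
                inside += (c * x + s * y - d < 0.0);
        printf("angle %4.1f deg: corner average = %.3f, true coverage = %.3f\n",
               deg, -d, (double)inside / total);
    }
    return 0;
}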
You should be pretty close though. For a linear function you can just calculate the distance to the 0 line, which is invariant to any linear transformation that leaves that line where it is (which is what you want). This is just the function value divided by the norm of the gradient, both of which you can estimate from those 4 points. This gives something like:
dx = (s2 - s1 + s4 - s3)
dy = (s3 - s1 + s4 - s2)
f = (s1+s2+s3+s4)/4
dist = f / sqrt(dx*dx + dy*dy)
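As a compilable sketch (my naming; note the extra factor of one half on dx and dy, since each sum above adds two one-pixel finite differences):

#include <cmath>

// Signed distance (in pixels) from the pixel centre to the f = 0 line,
// estimated from the field sampled at the four pixel corners:
// s1 = (x, y), s2 = (x+1, y), s3 = (x, y+1), s4 = (x+1, y+1).
float cornerSignedDistance(float s1, float s2, float s3, float s4) {
    float dx = (s2 - s1 + s4 - s3) * 0.5f;   // average x finite difference
    float dy = (s3 - s1 + s4 - s2) * 0.5f;   // average y finite difference
    float f  = (s1 + s2 + s3 + s4) * 0.25f;  // bilinear value at the centre
    float g  = sqrtf(dx * dx + dy * dy);
    return f / fmaxf(g, 1e-6f);              // guard against flat regions
}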
My function approximates coverage of a square pixel, so indeed, if you rotate a line around it at a certain distance, that line will clip the corners at some angles and be clear of the pixel at others.
Would that be done in two passes?
1. Render the image shifted by 0.5 pixels in both directions (plus one additional row & column).
2. Apply the formula above to each pixel (4 reads, 1 write); something like the sketch below.
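As a CUDA sketch (buffer names and the final distance-to-coverage mapping are my assumptions, not from the thread):

// Pass 2: one thread per output pixel, 4 reads from the (w+1) x (h+1)
// grid of corner samples produced by pass 1, 1 write.
__global__ void resolvePass(const float* corners, float* coverage,
                            int pitch, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float s1 = corners[y * pitch + x];
    float s2 = corners[y * pitch + x + 1];
    float s3 = corners[(y + 1) * pitch + x];
    float s4 = corners[(y + 1) * pitch + x + 1];

    float dx = (s2 - s1 + s4 - s3) * 0.5f;
    float dy = (s3 - s1 + s4 - s2) * 0.5f;
    float f  = (s1 + s2 + s3 + s4) * 0.25f;
    float dist = f / fmaxf(sqrtf(dx * dx + dy * dy), 1e-6f);

    // Assumed mapping: field in pixel units, negative inside the shape.
    coverage[y * w + x] = __saturatef(0.5f - dist);
}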
You don't technically need 4 reads per pixel either; for instance, you can process a 7x7 pixel group with a 64-count (8x8) thread group. Each thread does 1 read, then fetches the other 3 values from its neighbours and computes its result. Then the 7x7 subset of the 8x8 threads writes its values, roughly as in the sketch below.
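A CUDA sketch of that scheme (same maths and assumed buffers as the naive kernel above):

// 8x8 threads cooperate: each loads one corner sample into shared
// memory (64 reads), then the 7x7 subset combines 4 neighbours each
// and writes 49 pixels. Blocks therefore advance by 7, not 8.
__global__ void resolveTiled(const float* corners, float* coverage,
                             int pitch, int w, int h)
{
    __shared__ float s[8][8];
    int lx = threadIdx.x, ly = threadIdx.y;   // 0..7
    int gx = blockIdx.x * 7 + lx;
    int gy = blockIdx.y * 7 + ly;

    // One read per thread, clamped at the edge of the (w+1) x (h+1) grid.
    s[ly][lx] = corners[min(gy, h) * pitch + min(gx, w)];
    __syncthreads();

    // Only the 7x7 subset produces output pixels.
    if (lx < 7 && ly < 7 && gx < w && gy < h) {
        float s1 = s[ly][lx],     s2 = s[ly][lx + 1];
        float s3 = s[ly + 1][lx], s4 = s[ly + 1][lx + 1];
        float dx = (s2 - s1 + s4 - s3) * 0.5f;
        float dy = (s3 - s1 + s4 - s2) * 0.5f;
        float f  = (s1 + s2 + s3 + s4) * 0.25f;
        float dist = f / fmaxf(sqrtf(dx * dx + dy * dy), 1e-6f);
        coverage[gy * w + gx] = __saturatef(0.5f - dist);
    }
}
// Launch: dim3 block(8, 8); dim3 grid((w + 6) / 7, (h + 6) / 7);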
You could integrate this into the first pass too, but then there would be duplication on the overlapped areas of each block. Depending on the complexity of the first pass, it still might be more efficient to do that than an extra pass.
Knowing that it's only the edges that are shared between thread groups, you could expand the work of each thread to cover multiple pixels, so that each group covers more pixels and fewer of them are sampled more than once; a 2x2-per-thread variant is sketched below. How far you take this depends on register pressure; it's probably not worth doing more than 4 pixels per thread, but YMMV.
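For example (same assumptions as the earlier kernels; each 8x8 thread group now covers a 16x16 pixel tile):

// Each thread produces a 2x2 block of output pixels, so an 8x8 group
// covers 16x16 pixels from a 17x17 corner tile; only the tile border
// is sampled by more than one group.
__global__ void resolveTiled2x2(const float* corners, float* coverage,
                                int pitch, int w, int h)
{
    __shared__ float s[17][17];

    // Cooperative load: 64 threads fill the 289 corner samples.
    for (int i = threadIdx.y * 8 + threadIdx.x; i < 17 * 17; i += 64) {
        int cy = i / 17, cx = i % 17;
        s[cy][cx] = corners[min((int)(blockIdx.y * 16 + cy), h) * pitch
                          + min((int)(blockIdx.x * 16 + cx), w)];
    }
    __syncthreads();

    for (int sy = 0; sy < 2; ++sy) {
        for (int sx = 0; sx < 2; ++sx) {
            int lx = threadIdx.x * 2 + sx, ly = threadIdx.y * 2 + sy;
            int gx = blockIdx.x * 16 + lx, gy = blockIdx.y * 16 + ly;
            if (gx >= w || gy >= h) continue;
            float s1 = s[ly][lx],     s2 = s[ly][lx + 1];
            float s3 = s[ly + 1][lx], s4 = s[ly + 1][lx + 1];
            float dx = (s2 - s1 + s4 - s3) * 0.5f;
            float dy = (s3 - s1 + s4 - s2) * 0.5f;
            float f  = (s1 + s2 + s3 + s4) * 0.25f;
            float dist = f / fmaxf(sqrtf(dx * dx + dy * dy), 1e-6f);
            coverage[gy * w + gx] = __saturatef(0.5f - dist);
        }
    }
}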
You certainly could imagine doing that, but as long as the initial evaluation is fairly cheap (say a texture lookup), I don't see the extra pass being worth it.
First take a sample in each corner of the pixel to be rendered (s1, s2, s3, s4), then compute the coverage estimate from those four samples.
It is a good approximation, and it keeps working no matter how you scale and stretch the field. Relative to the standard method it is expensive to calculate, but for a modern GPU it is still a very light workload to do this once per screen pixel.
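The exact expression isn't quoted in this thread, so the following is a plausible reconstruction only (my naming; assuming the field is in pixel units and negative inside the shape): a clamped ramp on the corner average.

#include <cmath>

// Hypothetical reconstruction, not a verbatim quote: the average of the
// four corner samples approximates the signed distance at the pixel
// centre, which a clamped ramp turns into a coverage estimate.
float cornerCoverage(float s1, float s2, float s3, float s4) {
    float avg = (s1 + s2 + s3 + s4) * 0.25f;  // ~ distance at centre, in pixels
    return fminf(fmaxf(0.5f - avg, 0.0f), 1.0f);
}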