Thanks for all the great feedback! I've created a Twitter thread to discuss future development and share updates. Would love to connect with you all there:
- The DDN single-shot generator architecture is more efficient than diffusion models.
- DDN is fully end-to-end differentiable, allowing for more efficient optimization when integrated with discriminative models or reinforcement learning.
Thanks for the idea, but DDN and flow can’t be flipped into each other that easily.
1. DDN doesn’t need to be invertible.
2. Its latent is discrete, not continuous.
3. As far as I know, flow keeps input and output the same size so it can compute log|detJ| (see the change-of-variables formula below). DDN’s latent is 1-D and discrete, so that condition fails.
4. To me, “hierarchical many-shot generation + split-and-prune” is simpler and more general than “invertible design + log|detJ|.”
5. Your design seems to have abandoned the characteristics of DDN. (ZSCG, 1D tree latent, lossy compression)
The two designs start from different premises and are built differently. Your proposal would change so much that whatever came out wouldn’t be DDN any more.
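For reference, the condition in point 3 is the change-of-variables objective that normalizing flows train with; the Jacobian determinant is only defined when the map keeps the dimensionality:

```latex
% Invertible map f: R^n -> R^n, latent z = f(x) with prior p_Z
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log\bigl|\det J_f(x)\bigr|
```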
Fwiw, I'm not convinced it's a Flow, and that's my niche. But there are some interesting similarities that actually make me uncertain. A deeper dive is needed.
But to address your points:
> 1. DDN doesn’t need to be invertible
The flow doesn't need to be invertible at every point in the network. As long as you can do the mapping, the condition will hold. Like the classic coupling layer is [x_0, s(x_0)*x_1 + t(x_0)], where s and t are parametrized by arbitrary neural networks. But some of your layers look more like invertible convolutions.
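For concreteness, a minimal PyTorch sketch of that kind of affine coupling layer. This is generic RealNVP-style code, not anything from the DDN repo, and the class/argument names are just illustrative:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style coupling: keep x0 untouched, transform x1 conditioned on x0.
    The layer is invertible even though s and t are arbitrary (non-invertible) nets."""
    def __init__(self, dim):
        super().__init__()
        half = dim // 2
        self.s = nn.Sequential(nn.Linear(half, half), nn.Tanh())  # log-scale net
        self.t = nn.Linear(half, half)                            # translation net

    def forward(self, x):
        x0, x1 = x.chunk(2, dim=-1)
        s = self.s(x0)
        y1 = x1 * torch.exp(s) + self.t(x0)      # elementwise affine transform of x1
        log_det = s.sum(dim=-1)                  # log|det J| = sum of log-scales
        return torch.cat([x0, y1], dim=-1), log_det

    def inverse(self, y):
        y0, y1 = y.chunk(2, dim=-1)
        x1 = (y1 - self.t(y0)) * torch.exp(-self.s(y0))
        return torch.cat([y0, x1], dim=-1)
```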
I think it is worth checking. FWIW I don't think an equivalence would undermine the novelty here.
> 2. Its latent is discrete, not continuous.
That's perfectly fine. Flows aren't restricted that way. Technically, flows trained on discrete data aren't exactly invertible anyway, since you add noise to dequantize it.
Also note that there are discrete flows. I'm not sure I've seen an implementation where each flow step is discrete, but that's more of an implementation issue.
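A minimal sketch of the dequantization I mean, for 8-bit image data; this is the standard trick, not something specific to DDN or any particular codebase:

```python
import torch

def dequantize(x_uint8):
    """Standard uniform dequantization for training continuous flows on 8-bit data:
    add Uniform[0, 1) noise to the integer values and rescale to [0, 1).
    The map back to the discrete data is only exact up to this added noise."""
    x = x_uint8.float()
    u = torch.rand_like(x)   # u ~ Uniform[0, 1)
    return (x + u) / 256.0
```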
> 3. As far as I know, flow keeps input and output the same size so it can compute log|detJ|.
You have a U-Net, right? Your full network is doing T: R^n -> R^n? Or at least excluding the extra embedding information? Either way, I think you might be interested in "Approximation Capabilities of Neural ODEs and Invertible Residual Networks". At minimum, their dimensionality discussion and reference to the Whitney Embedding Theorem is likely valuable to you (I don't think they say it by name?).
You may also want to look at RealNVP since they have a hierarchical architecture which does splitting.
Do note that NODEs are flows. You can see Ricky Chen's work on i-ResNets.
As for the Jacobian, I actually wouldn't call that a condition for a flow but it sure is convenient. The typical Flows people are familiar with use a change of variables formula via the Jacobian but the isomorphism is really the part that's important. If it were up to me I'd change the name but it's not lol.
> 5. Your design seems to have abandoned the characteristics of DDN. (ZSCG, 1D tree latent, lossy compression)
I think you're on the money here. I've definitely never seen something like your network before. Even if it turns out to not be its own class I don't think that's an issue. It's not obviously something else but I think it's worth digging into.
FWIW I think it looks more like a diffusion model. A SNODE. Because I think you're right that the invertibility conditions likely don't hold. But in either case, remember that even though you're estimating multiple distributions, that's equivalent to estimating a single distribution.
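One way to read that last remark: by the chain rule, the per-layer distributions you estimate compose into a single joint model over the whole trajectory, something like

```latex
% L per-layer conditionals define one joint distribution
p(x_1, \dots, x_L) \;=\; \prod_{l=1}^{L} p\bigl(x_l \mid x_{1:l-1}\bigr)
```

so "many distributions" and "one distribution" are the same object.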
I think the most interesting thing you could do is plot the trajectories like you'll find in Flow and diffusion papers. If you get crossing you can quickly rule out flows.
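Sketch of what I mean, assuming 2-D toy data and some way of recording each layer's intermediate output per sample; the helper and array layout here are hypothetical, not part of the DDN code:

```python
import matplotlib.pyplot as plt

def plot_trajectories(trajectories):
    """trajectories: NumPy array of shape (num_samples, num_layers, 2);
    trajectories[i, l] is sample i's intermediate 2-D output after layer l."""
    for traj in trajectories:
        plt.plot(traj[:, 0], traj[:, 1], alpha=0.5, lw=0.8)  # path across layers
        plt.scatter(traj[-1, 0], traj[-1, 1], s=8)           # final sample
    plt.xlabel("dim 0")
    plt.ylabel("dim 1")
    plt.title("Per-sample trajectories across layers")
    plt.show()

# If paths cross in state space, that's strong evidence the map isn't a
# deterministic flow, since flow/ODE trajectories can't intersect at the same time.
```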
I'm definitely going to spend more time with this work. It's really interesting. Good job!
This understanding is incorrect. The video samples all the leaf nodes of the entire tree only to visualize the distribution in latent space. In normal use, only the L outputs along a single path are generated.
Oh, it's the selected output; yes, that's what I meant, I was a bit confused. So in the initial design, when you first tried it, did you pass both to the next layer? Or is that something you later found to perform better?
Even in the earliest stages of the DDN concept, we had already decided to pass features down to the next layer.
I never even ran an ablation that disabled the stem features; I assume the network would still train without them, but since the previous layer has already computed the features, it would be wasteful not to reuse them. Retaining the stem features also lets DDN adopt the more efficient single-shot-generator architecture.
Another deeper reason is that, unlike diffusion models, DDN does not need the Markov-chain property between adjacent layers.
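For readers skimming the thread, here is a rough schematic of that "pass the selected branch down a single path" behaviour. It is a simplified sketch with made-up module names and a random picker standing in for guided selection, not the actual DDN implementation:

```python
import torch

def sample_single_path(layers, init_feat, num_candidates, pick=None):
    """Schematic DDN-style sampling: each layer produces num_candidates candidate
    outputs plus their features; exactly one candidate is kept (randomly here;
    a guidance function `pick` would stand in for ZSCG-style selection) and its
    features are reused by the next layer. Only L outputs are generated, even
    though the full latent tree has num_candidates**L leaves."""
    feat = init_feat
    path = []                              # 1-D discrete latent: chosen index per layer
    output = None
    for layer in layers:                   # each `layer` is assumed to return (candidates, cand_feats)
        candidates, cand_feats = layer(feat)
        k = pick(candidates) if pick is not None else torch.randint(num_candidates, ()).item()
        path.append(k)
        feat = cand_feats[k]               # pass the selected branch's features down
        output = candidates[k]             # this layer's selected output
    return output, path
```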