I think this is not quite the right analogy. A better analogy is procedurally generated music, because that’s what model-generated music is. But just like with LLM code generation, the input to the program is natural language (or maybe multimodal image/audio/whatever), and the program is implicitly defined by learning from examples of music.
I think a lot of the issues are the same. Like you might expect the model to go off the rails if you venture away from the bulk of the training distribution. Or maybe the most effective way to use it creatively is in some kind of interactive workflow, revising specific chunks of the project instead of vibe-coding/composing from whole cloth.
I think it’s just scale-to-the-moon rhetoric, like “what if we used 100x more compute?”. Since the units are power and not energy, I’m going with 10 GW of continuous load (for training? inference?), but I don’t think it’s meant exactly literally.
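Just to put a number on the power-vs-energy distinction (my own back-of-envelope, not anything from the original claim): running that load continuously for a year would come to

$$10\ \mathrm{GW} \times 8760\ \mathrm{h/yr} \approx 88\ \mathrm{TWh/yr},$$

which is on the order of a mid-sized country’s annual electricity consumption.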
I think this is just loose terminology: instead of squaring, they should have said “multiply by the complex conjugate”, which is what you do to a quantum mechanical wavefunction to obtain a real-valued probability density.
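To spell out the standard notation (this is just the textbook Born rule, nothing specific to the work being discussed): for a wavefunction $\psi(x)$, the probability density is

$$|\psi(x)|^2 = \psi^*(x)\,\psi(x), \qquad \int |\psi(x)|^2\,dx = 1,$$

which is real and non-negative, whereas literally squaring a complex $\psi$ would in general give you a complex number.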
This paper, “Were RNNs All We Needed?”, explores this hypothesis a bit, finding that some pre-transformer sequence models can match transformers when trained at an appropriate scale. They did have to make some modifications to unlock more parallelism, though.
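For a sense of what those modifications look like, here’s a rough sketch of the minGRU-style recurrence as I understand it from the paper (my own paraphrase in plain NumPy, written as a sequential loop for clarity; the point in the paper is that the gate and candidate no longer depend on the previous hidden state, so training can use a parallel scan instead of this loop):

```python
import numpy as np

def mingru_step(x_t, h_prev, Wz, Wh):
    """One step of a minGRU-style recurrence (my sketch, not the authors' code).

    Unlike a classic GRU, the update gate z_t and the candidate state depend
    only on the current input x_t, not on h_{t-1}, so the recurrence is linear
    in h and can be parallelized with a prefix scan at training time.
    """
    z_t = 1.0 / (1.0 + np.exp(-(Wz @ x_t)))  # update gate, input-only
    h_tilde = Wh @ x_t                       # candidate state, input-only
    return (1.0 - z_t) * h_prev + z_t * h_tilde

# toy usage: run the recurrence over a random sequence
rng = np.random.default_rng(0)
d_in, d_hidden, seq_len = 8, 16, 32
Wz = rng.normal(size=(d_hidden, d_in))
Wh = rng.normal(size=(d_hidden, d_in))
h = np.zeros(d_hidden)
for x_t in rng.normal(size=(seq_len, d_in)):
    h = mingru_step(x_t, h, Wz, Wh)
print(h.shape)  # (16,)
```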
Are there particular libraries that make your setup difficult? I just manually set the index and source following the docs (I didn't know about the auto backend feature) and pin a specific version if I really have to with `uv add "torch==2.4"`. This works pretty well for me for projects that use dgl, which relies heavily on C++ extensions and can be pretty finicky about which PyTorch versions it works with.
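Concretely, the pyproject.toml bits I mean look something like this (the index name and CUDA version here are just an example, so check the uv PyTorch guide for the URL that matches your setup):

```toml
# point torch at a specific PyTorch wheel index (example: CUDA 12.1 builds)
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu121" }
```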
This is in a conventional HPC environment, and I’ve found it way better than conda since the dependency solves are so much faster and I no longer have PyTorch silently getting downgraded to the CPU version if I install a new library. Maybe I’ve been using conda poorly, though?
I don’t think so? The double descent phenomenon also occurs in linear models under the right conditions. My understanding is that when the effective model capacity just barely matches the training data, there is only one solution that interpolates the training set perfectly (and it’s usually a poorly behaved one), but when the capacity grows far beyond that point there are many such interpolating solutions. Apply enough regularization (explicit or implicit) and you’re likely to find an interpolating solution that generalizes well.
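Here’s a tiny self-contained illustration of the linear-model version (my own toy sketch: random ReLU features plus a minimum-norm least-squares fit; the exact numbers depend on the seed, but the test error typically spikes near the interpolation threshold around features ≈ training points and then comes back down as you keep adding features):

```python
import numpy as np

# Toy double-descent demo: min-norm least squares on random ReLU features.
rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

# a fixed random projection defines the feature map; p picks how many features to use
V = rng.normal(size=(d, 2000))
def features(X, p):
    return np.maximum(X @ V[:, :p], 0.0)  # first p random ReLU features

for p in [10, 50, 90, 100, 110, 200, 500, 2000]:
    Phi_tr, Phi_te = features(X_tr, p), features(X_te, p)
    beta = np.linalg.pinv(Phi_tr) @ y_tr            # minimum-norm least-squares fit
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"features={p:5d}  test MSE={test_mse:10.3f}")
```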
This is already what the funding agencies do! The merit review process solicits outside expert assessment of the importance, feasibility, and potential impact (including economic development and societal impact) of the research, and the agencies do their best to maintain a balanced portfolio of research that looks promising for advancing national priorities.
By all means we should discuss the transparency of this process, what those national priorities are, and exactly what we (collectively, as taxpayers) think the risk-reward tradeoff should be. But let’s not pretend that the funding agencies don’t already view science as a public investment, or be too hasty about dismissing the potential medium-term economic value of research into, for example, geology and geochemistry on Mars.
This is all awesome, but a bit off topic for this thread, which focuses on AI for science.
The disconnect here is that the cost of iteration is low and it’s relatively easy to verify the quality of a generated C program (does the compiler issue warnings or errors? does it pass a test suite?) or a recipe (basic experience is probably enough to tell if an ingredient seems out of place or the proportions are wildly off).
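To make the cheap-verification point concrete, here’s roughly the kind of loop I have in mind for the C case (a sketch with a trivial stand-in program; a real setup would run an actual test suite against the binary and feed the compiler’s stderr back to the model on failure):

```python
import pathlib
import subprocess
import tempfile

def check_generated_c(source_code: str) -> bool:
    """Cheap verification of a model-generated C program: compile with warnings
    treated as errors, then run the binary and check its exit status."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "prog.c"
        binary = pathlib.Path(tmp) / "prog"
        src.write_text(source_code)
        compiled = subprocess.run(
            ["gcc", "-Wall", "-Wextra", "-Werror", "-o", str(binary), str(src)],
            capture_output=True, text=True,
        )
        if compiled.returncode != 0:
            return False  # compiler caught something: reject and re-prompt
        ran = subprocess.run([str(binary)], capture_output=True, text=True, timeout=5)
        return ran.returncode == 0

# trivial usage example
print(check_generated_c('#include <stdio.h>\nint main(void) { puts("ok"); return 0; }'))
```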
In science, verifying a prediction is often super difficult and/or expensive, because at prediction time we’re trying to shortcut around an expensive or intractable measurement or simulation. Unreliable models can really change the calculus of whether AI accelerates science or just massively inflates the burn rate.
They have some interesting analysis of the elastic deformation that happens during the rolling process (as opposed to the ball just falling or sliding). It turns out to be pretty sensitive to the elastic constants of both the ball and the wall.
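I don’t have the paper’s derivation in front of me, but the textbook Hertzian contact relations already hint at why both bodies’ elastic constants show up: the contact force depends on an effective modulus that combines the ball and the wall,

$$\frac{1}{E^*} = \frac{1-\nu_1^2}{E_1} + \frac{1-\nu_2^2}{E_2}, \qquad F = \tfrac{4}{3}\,E^{*}\sqrt{R}\,\delta^{3/2},$$

where the $E_i$ and $\nu_i$ are the Young’s moduli and Poisson ratios of the two materials, $R$ is the ball radius, and $\delta$ is the indentation depth.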