With previous attempts at this problem, the shaded examples could be quite misleading: details that appeared to be geometric were actually just painted onto the surface as part of the texture, so when you took the texture away you were left with a melted-looking blob with nowhere near as much detail as you thought. I'd reserve judgement until we see some unshaded meshes.
Seems like a tougher nut to crack than image generation was, since there aren't a bajillion high-quality 3D models lying around on the internet to use as training data; everyone is trying to do 3D model generation as a second-order system, using images as the training data again. The things that make 3D assets good (the tiny geometric details that are hard to infer without many input views of the same object, the quality of the mesh topology and UV mapping, rigging and skinning for animation, reducing materials down to PBR channels that can be fed into a renderer, and so on) aren't represented in that training data, so the model is expected to make far more logical leaps than image generators do.
I know where I could get several hundred terabytes (maybe an exabyte? It’s constantly growing) of ultra high quality STL files designed for 3D printing. I just don’t have the storage or the knowledge of how to turn those into a model that outputs new STL files.
I’d imagine it’d require a ton of tagging, although I have a good idea of how I could leverage existing APIs to tag it mostly automatically: generate three still-image thumbnails of each model, feed those through CLIP, verify that all of the thumbnails agree on what the STL depicts, and manually tag the ones that fail that test.
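For what it's worth, a minimal sketch of that agreement check, assuming you already have some way to render each STL into thumbnail PNGs (the rendering step is left out here) and using an openly available CLIP checkpoint; the label list is purely illustrative:

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical candidate tags; a real run would use a much larger vocabulary.
LABELS = ["a figurine", "a phone stand", "a vase", "a gear", "an enclosure"]

def tag_views(view_paths, labels=LABELS):
    """Score each rendered thumbnail against the candidate tags with CLIP."""
    images = [Image.open(p) for p in view_paths]
    inputs = processor(text=labels, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # shape: (num_views, num_labels)
    return [labels[i] for i in logits.argmax(dim=-1).tolist()]

def auto_tag(view_paths):
    """Accept the tag only when every thumbnail agrees; otherwise flag for manual review."""
    votes = tag_views(view_paths)
    return votes[0] if len(set(votes)) == 1 else None   # None -> manual tagging queue
```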
> since there isn't a bajillion high quality 3D models lying around on the internet to use as training data
There aren't a bajillion high-quality 3D models of everything, but there are an unbounded number of high-quality 3D models of some things, due to the existence of procedural mesh systems for things like foliage.
You could, at the very least, train an ML model to translate images of jungles into 3D meshes of the trees composing them right now.
Although I wonder if having a few very-well-understood object types like these, to serve as a base, would be enough to allow such a model to deduce more generalized rules of optics, such that it could then be trained on other object categories with much smaller training sets...
It almost seems easier, in that you have an arbitrary number of real-world objects to scan and the hardware is heavily commoditized (IIRC iPhones have this built in at high resolution now?)
In context, the conversation was beyond a dichotomy, thankfully. Having only two choices reduces the conversation to people insisting one is better, and it becomes an argument about definitions where people take turns being "right" from the viewpoint of a neutral observer.
It's proposing a solution to the author's observation that everyone is doing it in a second-order fashion and missing a significant amount of necessary data.
The implication is that rather than doing it the hard way via the already-obtained second-order dataset, it'll be easier to get a new dataset, and getting it will be significantly easier than getting the second-order dataset was, since you don't need to worry about aesthetic variety as much as about teaching what level of detail a mesh needs in order to read as "real".
I don't think they have a specific use-case for this model; they're throwing ideas at the wall again in the hopes some of them stick and eventually turn into another product. The paper doesn't discuss any of the problems that would need to be solved in order to easily generate game-ready assets, so I think it's safe to assume that it currently can't.
For games, at the very least you need to consider the polygon budget, getting reasonably good UVs, and generating materials that fit into a PBR shader pipeline, at least if it's going to work with rendering pipelines as we know them today (as opposed to rendering neural representations directly, which people are trying to do but which is totally unproven in production).
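Concretely, "fitting a PBR pipeline" usually means the generator would have to emit something like the standard metallic/roughness texture set. This is just an illustrative sketch of those channels (names vary a bit per engine), not anything from the paper:

```python
# A typical game-ready PBR material boils down to a handful of texture maps;
# anything a generator outputs has to be split into channels like these to
# plug into today's engines. File names here are placeholders.
pbr_material = {
    "base_color": "asset_basecolor.png",  # albedo, no baked-in lighting
    "normal":     "asset_normal.png",     # tangent-space surface detail
    "roughness":  "asset_roughness.png",  # microfacet roughness, 0 = mirror-like
    "metallic":   "asset_metallic.png",   # dielectric vs. metal mask
    "ao":         "asset_ao.png",         # ambient occlusion
}
```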
I'd be willing to bet you could create a diffusion model that maps unrefined meshes to UV-fixed, remeshed surfaces. If you had a large enough library of good meshes, you could just programmatically mess 'em up and use the pairs as the dataset.
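A rough sketch of that "mess 'em up" step, assuming trimesh (with a decimation backend installed); the corruption recipe is just a guess at what realistic degradation might look like:

```python
import numpy as np
import trimesh

def degrade(mesh: trimesh.Trimesh, noise=0.002, keep_faces=0.4) -> trimesh.Trimesh:
    """Produce a noisy, decimated copy of a clean mesh to pair with the original."""
    bad = mesh.copy()
    # Jitter vertices relative to the model's overall scale to destroy fine detail.
    bad.vertices += np.random.normal(scale=noise * mesh.scale, size=bad.vertices.shape)
    # Crude decimation to wreck the topology, roughly like a raw scan or generated blob.
    bad = bad.simplify_quadric_decimation(face_count=int(len(bad.faces) * keep_faces))
    return bad

clean = trimesh.load("good_model.stl")   # hypothetical mesh from the library
example = (degrade(clean), clean)        # one (bad -> good) training pair
```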
That's assuming your generator produces a normal map; the ones I've seen do not. The only texture channel they output is color, which is the one channel a model trained on images is naturally equipped to produce.
You can generate pretty reliable texture depth maps from just an image. It's going to be trash if you try to generate depth for the entire 3D model, but I presume it'll do a good job with just the texture. Then you just apply a displacement based on the depth map.
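Something along these lines, using an off-the-shelf monocular depth estimator; the model name and file paths are just placeholders:

```python
from PIL import Image
from transformers import pipeline

# Any single-image depth estimator would do; this checkpoint is only an example.
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

texture = Image.open("brick_wall_color.png")    # hypothetical albedo/texture image
depth_map = depth_estimator(texture)["depth"]   # grayscale PIL image, relative depth
depth_map.save("brick_wall_displacement.png")   # wire this into the displacement slot
```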
Only if you have multiple images of the same areas so that you can extract actual positions. And there's no guarantee that multiple pictures of the same model capture the same detail, much less in a way that can be triangulated accurately. A lot of photogrammetry algorithms discard points that don't fall within certain error bounds.
So yes, there might be a wooden frame in the middle of that window, but does it match the math from both angles? I doubt it.
I don't know much about 3D printing; I'd be very interested in learning more about this idea if you'd be so kind as to expand on it. Could I have AI spend all day auto-scanning what teens are doing on Instagram, auto-generating toys based on it, auto-generating advertisements for the toys, and auto-3D-printing on demand?
I think their suggestion was more "I have a photo of a cool horse, and now I would like a 3D model of that same horse."
Another way of looking at it: 3D artists often begin projects by taking reference images of their subject from multiple angles, then very manually turning those into a 3D model. That step could potentially be greatly sped up with an algorithm like this one. The artist could (hopefully) then focus on cleanup, rigging, etc., and have a quality asset in significantly less time.
The question is whether this actually "creates a 3D model based on the picture", or whether it "finds an existing model that looks similar to the picture and texture-maps it".
Hypothetically, sure, assuming the parent comment is right that these meshes are good enough for modelling, and that you can find any teens who want a non-digital toy.
I think a good hobbyist application for this would be something like modelling figurines for games, which is already a pretty popular 3D printing application. This would allow people with limited modelling skills to bring fantastical, unique characters to life “easily”.
OP is suggesting that this (AI model? I honestly am behind on the terminology) could replace one of the common steps of 3D printing - specifically, the step where you create a digital representation of the physical object you would want to end up with.
There are other steps to 3D printing in general, though; a super rough outline:
- Model generation
- "Slicing" - processing the 3D model into instructions that the 3D printer can handle, as well as adding any support structures or other modifications to make it printable
- Printing - the actual printing process
- Post-processing - depending on the 3D printing technology used, the desired resulting product, and the specific model/slicing settings, this can be as simple as "remove from bed and use" to "carefully snip off support structures, let cure in a UV chamber for X minutes, sand and fill, then paint"
As I said before, this AI model specifically would cover 3D model generation. If you were to use a printing technology that doesn't require support structures, and handles color directly in the printing process (I think powder bed fusion is the only real option here?), the entire process should be fairly automatable - a human might be needed to remove the part from the printer, but there might not be much post-processing to do.
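For what it's worth, the slicing step is already scriptable today. A minimal sketch, assuming PrusaSlicer's console mode is installed and on PATH; the profile file and paths are placeholders:

```python
import subprocess

def slice_to_gcode(stl_path: str, gcode_path: str, profile: str = "print_profile.ini"):
    """Run the slicer headlessly so no human is needed for the slicing step."""
    subprocess.run(
        ["prusa-slicer", "--export-gcode", stl_path,
         "--load", profile,          # hypothetical saved printer/filament/print profile
         "--output", gcode_path],
        check=True,
    )

slice_to_gcode("generated_toy.stl", "generated_toy.gcode")
```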
The rest of your desired workflow is a bit more nebulous. I don't know how you would handle "scanning what teens are doing on Instagram", at least in a way that would let you generate toys from the information. Generating and posting the advertisement shouldn't be too hard: have a standard-ish template that you fill in with a render from the model plus the description. Printing on demand is again possible, though you'll likely need a human to remove the part, check it for quality, and ship it. You could automate that last part too, but it would probably be more trouble than it's worth.
Interesting. To be clear, I don't think this is a good idea; it's kind of my nightmare post-capitalism hell. I just think it's interesting that this could be done now.
On finding out what teens want, that part is somewhat easy-ish. I guess you'd need a couple of agents: one that scans teen blogs for stories and converts them into keywords, and another that takes those keywords (#taylorswift #HaileyBieberChiaPudding #latestkdrama etc.) into Instagram; after a while your recommended page will turn into a pretty accurate representation of what teens are into. Then just have an agent look at those images and generate diffs of them. I doubt it would work for a bunch of reasons, but it's an interesting thought experiment! Thanks!
Looking forward to experimenting with this.