I agree that it's not possible to have them do 13th century architectural style ...

lukev · 2025-03-12T22:51:00 1741819860

I hope you're right. Are you aware of any image-gen models that apply chain-of-thought style reasoning (either agentic, or via reinforcement learning to shape outputs)?

For example, consider this imagery from today's challenge: https://firebasestorage.googleapis.com/v0/b/fastab-f08e9.app...

These are some incredible monoliths. If they were real, I feel like I would have heard of them. And if they are real... that's so cool. But because it's AI-generated, I have very low confidence that this ever existed at all, which is sad.

tralarpa · 2025-03-13T10:51:52 1741863112

[Spoiler] I guess it's this: https://madainproject.com/northern_stelae_park

Which is funny, because the monoliths in the AI video look more eroded than the real ones today.

This looked like a nice idea at first glance. On second glance, it's really bad, because you have to assume that anything you see in these videos could be wrong or misleading.

samplank2 · 2025-03-12T22:58:47 1741820327

No, not aware of image models that do chain-of-thought reasoning. But there are vision models that do it, so you can have them review the generated images and iterate on the prompts.
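A minimal sketch of that generate, review, iterate loop, assuming hypothetical `generate` and `review` callables that stand in for a real image model and a real vision model (no specific product or API is implied). The revision strategy here, simply appending the reviewer's feedback to the prompt, is also just an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    ok: bool       # did the (hypothetical) vision model accept the image?
    feedback: str  # what it says to change if not


@dataclass
class Result:
    prompt: str
    image: str
    accepted: bool
    attempts: int


def refine(
    prompt: str,
    generate: Callable[[str], str],
    review: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> Result:
    """Generate an image, let a vision model critique it, fold the critique into the prompt, retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        image = generate(prompt)
        verdict = review(prompt, image)
        if verdict.ok or attempt == max_attempts:
            return Result(prompt, image, verdict.ok, attempt)
        # Naive revision: append the reviewer's feedback to the prompt.
        prompt = f"{prompt}. Correction: {verdict.feedback}"
        attempt += 1


# Toy stand-ins so the sketch runs; a real setup would call an image model
# and a vision model here.
def fake_generate(prompt: str) -> str:
    return f"<image of: {prompt}>"


def fake_review(prompt: str, image: str) -> Verdict:
    if "13th-century" in image:
        return Verdict(True, "")
    return Verdict(False, "show 13th-century stone masonry, no modern buildings")


result = refine("Market square in Paris", fake_generate, fake_review)
print(result.accepted, result.attempts)  # True 2
```

Capping the number of attempts matters because a reviewer that never accepts would otherwise loop forever; in that case the last image is returned with `accepted=False`.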

nl · 2025-03-12T22:51:01 1741819861

Reasoning models aren't needed for this. The loss function for the image models needs to take the year into account.

This is entirely possible, as the incredible accuracy[1] of non-generative picture location models (a very similar problem) shows.

[1] https://paperswithcode.com/sota/image-based-localization-on-...
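One possible reading of "take the year into account", offered only as an assumption and not as nl's stated method: keep the usual generation loss and add a penalty from a separately trained, non-generative era classifier (in the spirit of the localization models mentioned above), weighted by a hypothetical coefficient λ. Here x̂ is the generated image, y the target year or era bucket, f_era the classifier and CE the cross-entropy:

```latex
\mathcal{L} = \mathcal{L}_{\text{gen}} + \lambda \, \mathrm{CE}\!\left(f_{\text{era}}(\hat{x}),\; y\right)
```

Conditioning the generator directly on y is a separate, complementary option.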

littlestymaar · 2025-03-12T22:52:05 1741819925

Why not use img2vid starting from a historically accurate picture or painting?

samplank2 · 2025-03-12T23:33:46 1741822426

This does use img2vid but with AI generated images. Using real pictures or paintings could definitely be fun too.

peishang · 2025-03-13T01:55:33 1741830933

You might look into era-specific LoRAs if they exist and, if not, consider training a few to better capture architectural detail from that specific time frame.
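For context on why "a ton of LoRAs" is cheap: with LoRA (low-rank adaptation) the base weights stay frozen and only a small low-rank update is trained per adapter. A minimal pure-Python sketch of one adapted linear layer, assuming the common convention of zero-initialising B and scaling the update by alpha / r; the class name and toy numbers are illustrative, not any library's API.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """A frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen base weights, shared by every adapter
        self.scale = alpha / rank
        # A (rank x d_in) gets a small random init; B (d_out x rank) starts at zero,
        # so a fresh adapter leaves the base layer's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def merged_weight(self) -> Matrix:
        """Fold the adapter into W, i.e. W + scale * B @ A, without mutating W."""
        rank = len(self.A)
        return [
            [
                w + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(rank))
                for j, w in enumerate(row)
            ]
            for i, row in enumerate(self.weight)
        ]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
layer = LoRALinear(W, rank=1, alpha=1.0)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))  # [1.0, 2.0]: zero-initialised B means no change yet

# Pretend training has updated the adapter (values made up for illustration).
layer.A = [[1.0, 0.0, 0.0]]
layer.B = [[0.5], [0.0]]
print(layer.forward(x))                  # [1.5, 2.0]
print(matvec(layer.merged_weight(), x))  # [1.5, 2.0]
```

For a d × d layer, a rank-r adapter trains 2dr values instead of d²; with d = 1024 and r = 8 that is 16,384 versus 1,048,576. That is what would make storing one adapter per place × era combination, over a single shared base model, plausible.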

samplank2 · 2025-03-13T02:07:50 1741831670

Good idea! It would be fun to have a ton of LoRAs for different places × eras.