After generating 5000 images with these tools, I believe the killer app will be the one that gives the artist the most control. I want to define a view and a scene and be able to manipulate both in real time.
Like,
View: 50mm lens, wide angle
Scene: rectangular room with window -> show preview
Scene: add table -> show preview
Scene: move table left -> show preview
Scene: add mug on table -> show preview
View: center on mug
Right now, there’s little control and it’s a lot of random guessing, “Hmm what happens if I add these two terms?”
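Something like this could be prototyped today by compiling a tiny scene description down to a prompt and re-rendering a preview after every edit. A purely hypothetical sketch in Python; the Scene class and its naive prompt compilation are my own invention, not any existing tool:

    from dataclasses import dataclass, field

    @dataclass
    class Scene:
        view: str = ""
        objects: list = field(default_factory=list)

        def add(self, obj):
            # Each edit appends an object; a real tool would re-render a preview here.
            self.objects.append(obj)
            return self.to_prompt()

        def to_prompt(self):
            # Naive compilation to a flat prompt; a real tool would keep actual
            # spatial layout instead of flattening everything into words.
            return ", ".join([self.view] + self.objects)

    scene = Scene(view="50mm lens, wide angle")
    scene.add("rectangular room with a window")   # -> show preview
    scene.add("wooden table on the left")         # -> show preview
    scene.add("mug on the table")                 # -> show preview
    print(scene.to_prompt())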
Have you seen the img2img results? You draw a crappy Microsoft Paint-style image, give it some text describing how you want it to actually look, and it does the transformation.
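For reference, the basic img2img loop is only a few lines with Hugging Face diffusers. A rough sketch; the argument names assume a recent diffusers release, and the checkpoint id is just the usual SD 1.5 one, which may have moved:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # The rough "MS Paint" drawing to transform.
    sketch = Image.open("rough_sketch.png").convert("RGB").resize((512, 512))

    result = pipe(
        prompt="cozy cabin at sunset, oil painting",
        image=sketch,
        strength=0.6,        # how far the model may wander from the sketch
        guidance_scale=7.5,
    ).images[0]
    result.save("out.png")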
Natural language alone is one of the worst ways to control image generation. The model knows how to generate anything, but its own "language" is nothing like yours. It's like writing in Finnish, contorting it so that it comes out as coherent Chinese poetry after a pass through Google Translate. You end up stuffing your input with assorted garbage and still not getting the result you want. img2img gives much better results because you can express your intent with higher-order tools than text alone.
What would be best is to properly integrate models like that into painting software like Krita. Imagine a brush that only affects freckles, blue teapots, fingers, or sharp corners (or anything else you can name in a prompt). Or a brush that learns your personal style and transfers it onto a rough sketch you make, speeding up the process. Many possibilities.
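The closest existing approximation of that "brush" is masked inpainting: paint a mask over the region you care about and let the prompt only apply there. A rough sketch with diffusers' inpainting pipeline; again, names assume a recent diffusers release and the usual SD inpainting checkpoint:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("portrait.png").convert("RGB").resize((512, 512))
    # White pixels mark the brushed region; black pixels are left untouched.
    mask = Image.open("brush_mask.png").convert("RGB").resize((512, 512))

    out = pipe(prompt="freckles", image=image, mask_image=mask).images[0]
    out.save("portrait_freckles.png")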
I think they are already making an img2img plugin for Photoshop. Watch the demo, it's kind of impressive. [0] It's just a rudimentary prototype of what's possible with a properly trained model, but it already looks like a drop-in replacement for photobashing (as an example).
It's all about generation time. If generation were faster, the UI could preemptively show you a lot of variations based on suggested keywords, and you could also click things and get immediate results.
Currently it takes my mid-range PC (2070 Super) 10 seconds per image, which is too slow. You would need to get generation time below one second to be really productive. I guess you can already achieve that with something like triple 3090s?
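Rough arithmetic on that (the step count is an assumption, and extra GPUs mostly add throughput rather than cutting the latency of a single image):

    # Back-of-the-envelope only; assumes a typical 50-step sampler.
    steps = 50
    per_step = 10.0 / steps        # ~0.2 s per denoising step on the 2070 Super above
    target = 1.0                   # desired end-to-end seconds per image

    needed_steps = target / per_step
    print(per_step, needed_steps)  # 0.2, 5.0
    # So sub-second generation needs ~10x fewer steps or ~10x faster per-step
    # hardware; three GPUs running in parallel mainly raise throughput
    # (three images at a time), not the latency of any single image.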
I think the ideal UX will be the ability to mark up images with little comments and have the model adapt accordingly. The prompt interface is bad, one of the biggest reasons being that you have virtually no control over the spatial placement of your additions. Being able to say "add an elephant here and remove this lamp" will be big. Being able to do so with a doodle of an elephant to suggest posing will be even better.
Reminds me of the holodeck scene where Picard (edit: Geordi) reconstructs a table with what I, at the time, thought was a pretty vague set of specifications.
Turns out Star Trek predicted 2020s-style AI behaviour rather well. Considering nuclear war is then due in 2026, that's disconcerting.
An odd one, that. After all the lore (geddit?) about Data and his brother being unique and special for their unrivalled artificial intelligence, it turned out all you have to do to exceed that is just vaguely ask a standard-issue ship computer to do so.
I think the size of the Enterprise and its fusion reactor is quite an unfair advantage. Was Data really supposed to be smarter than the Enterprise, especially when it can read Data's mind state in order to fulfill the prompt?
I suppose the EMH is (or at least was, pre-mobile emitter from the future) a thin client for the Voyager computer.
Still, it seems odd that apparently only Data, Moriarty, and the Doctor have demonstrated that the Federation can actually make pretty general AI with the tools it already has on starships (and conveniently always on the ship with all those film crews on it making the Historical Records).
Surely under the crust of some Demon-class planet there's a bank of millions of times that power being used for... something.
There's probably a rule against making AI that you're allowed to break in the delta quadrant though.
There’s no direct canon confirmation, but it seems quite plausible that it was, in fact, the Bynars who provided the technological leaps necessary for the Enterprise computer to generate Moriarty and other proto-sentient characters. Riker and Picard both comment on the realism and perception of Minuet, created by the Bynars on the holodeck after their upgrades.
And there is a direct canon line from Moriarty through to the EMH and later sentient holograms via Lt. Barclay.
I make tools for artists and am afraid to incorporate AI generation, because I am pretty sure everyone will then just discount work created with my tool, assuming all of it was AI generated, and then no artists will want to use it.
What I am actually leaning towards is a tool for users to "enhance" art with AI, but only if the artist allows it.