Discover more from The Algorithmic Bridge
OpenAI Has Just Killed Prompt Engineering With DALL-E 3
You can now get high-quality images that depict complex scenes by default
Change of plans. I have the third part of the OpenAI series ready to publish, but OpenAI just announced an unexpected release: DALL-E 3.
It’s significant for three reasons.
First, OpenAI has more than enough resources to take on Google and its other competitors on the language front while also developing excellent image models.
Second, the state of the art for image generation has taken another leap forward. If not quality-wise—we’ll have to test how well it holds up against Midjourney—then surely in the raw skill of the systems (i.e., what they can do at an arbitrary quality level).
Third, if OpenAI’s depiction of the model's capabilities is faithful, prompt engineering for AI art is done. I will comment a bit on the relevance of this one, but first, let's see what OpenAI says about its new release.
What DALL-E 3 can do
“DALL·E 3 understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into exceptionally accurate images.”
This has been one of the main shortcomings of models like Midjourney and Stable Diffusion (and the previous DALL-E versions, too): it was very hard to write a prompt that accurately conveyed your mental image of a scene to the model. OpenAI seems to have solved exactly that with DALL-E 3.
Here's an example from the blog post:
“DALL·E 3 can accurately represent a scene with specific objects and the relationships between them.”
Neither Midjourney nor Stable Diffusion allows you to do this—solitary characters and objects are easy and the quality is high, but scenes where different objects have to follow the specific relationships described in the prompt? That was an unsolved challenge.
Sam Altman predicted a while ago that prompt engineering was a temporary phase of generative AI. I agreed back then but argued that it could take a lot of time to get the models to the point where we wouldn't need to translate our ideas into a language they could understand. It seems that milestone, at least for image generation models, has been achieved.
This means that the entry barriers that somewhat “gatekept” the ability to create amazing images with AI are being demolished fast. Visual creativity is being democratized.
What this means for traditional artists and the creative community is a question we should discuss. On the one hand, it's great to be able to create great art without deep prompt engineering expertise (for instance, the Stable Diffusion + ControlNet workflow currently trending on social media isn't straightforward at all). On the other, I can't help but feel that we—we, humanity—lose something every time we take a step in this direction.
What do you think?
Some other details about DALL-E 3
On top of that, text in images is no longer an issue (although other models had already solved that one), and ditto for six- and seven-finger hands.
DALL-E 3 is in research preview, but ChatGPT Plus and Enterprise users will get access in October. It will later be available to everyone else through OpenAI Labs. As with DALL-E 2, the images are the property of the creator and can be printed and commercialized.
OpenAI has also connected DALL-E 3 with ChatGPT so that the latter can act as a creative partner. Many people had already experimented with pairing a chatbot and an image model, so the idea isn't new, but OpenAI has significantly reduced the friction in going from an idea to an image.
OpenAI has also taken an important step toward protecting the livelihoods of living artists and finding common ground with them (and avoiding future lawsuits): DALL·E 3 will decline requests to copy a living artist's style (arguably the strongest criticism from the creative community), and artists will be able to “opt their images out from training of our future image generation models.”
There you have it, DALL-E 3—a nice surprise for the middle of the week.