Date: 2022-11-06

“Robots telling stories at night around a campfire.” Credit: Author via Midjourney v4

DALL·E API, Midjourney v4, and the benefits of hiding prompts

OpenAI has finally made DALL·E available through an API. The news comes quite late given the popularity of Stable Diffusion (SD), but it will still spark the emergence of new generative AI companies. The reason is that DALL·E, in contrast to SD, removes the burden of “good prompting” from the user: OpenAI automatically adds hidden modifiers to the prompt to make the images more appealing.

Levelsio (creator of InteriorAI and AvatarAI) tweeted about this recently: “most of us already automated prompt writing away with a front end interface with big buttons and selectors. Regular people don't have the time to figure out prompts.” I agree: although prompt engineering will be ubiquitous, the skill required to obtain good results will go down over time as companies hide the complexity of prompts behind the scenes.
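To make the idea concrete, here is a minimal sketch of how a front end with “big buttons and selectors” might turn a user's choices into a full prompt. Everything in it is an assumption for illustration: the style names, the modifier strings, and the `build_prompt` helper are hypothetical, and this is not how DALL·E, Midjourney, or Levelsio's products actually build their prompts.

```python
from dataclasses import dataclass

# Hypothetical presets: each UI button maps to prompt modifiers the user never sees.
STYLE_MODIFIERS = {
    "photo": "photorealistic, 35mm, natural lighting",
    "illustration": "digital illustration, clean lines, vibrant colors",
    "cinematic": "cinematic lighting, dramatic composition, shallow depth of field",
}

# Hypothetical modifiers appended to every request behind the scenes.
HIDDEN_SUFFIX = "highly detailed, high quality"


@dataclass
class Selection:
    subject: str          # the only text the user types
    style: str = "photo"  # the button the user clicked


def build_prompt(selection: Selection) -> str:
    """Combine the user's short input with the hidden modifiers."""
    subject = selection.subject.strip()
    if not subject:
        raise ValueError("subject must not be empty")
    if selection.style not in STYLE_MODIFIERS:
        raise ValueError(f"unknown style: {selection.style!r}")
    # The final prompt is what would be sent to the image model.
    return ", ".join([subject, STYLE_MODIFIERS[selection.style], HIDDEN_SUFFIX])


if __name__ == "__main__":
    print(build_prompt(Selection("a penguin in Venice", style="cinematic")))
```

The user only supplies “a penguin in Venice” and clicks a button; the longer prompt that does the real work stays on the company's side, which is the kind of hidden complexity described above.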

Midjourney does something very similar to always generate beautiful images, and the team just took it to the next level with the release of v4. The new version is significantly better than anything I’ve seen. Here’s a side-by-side comparison of “a penguin in Venice” between Midjourney v3 and v4:

Midjourney v4 was trained from scratch (it doesn’t use SD or the DALL·E API).

It can handle more complex prompts, it’s better with small details, and, maybe most importantly, it’s better with multi-object scenes:

It doesn’t seem to have mastered compositionality, though: