DALL·E 2 Will Disrupt Art Deeper Than Photography Did
The end of art or a new beginning?
Are we going to soon witness the creative genius of a virtual Picasso or Leonardo da Vinci?
We could be on the way with the latest trend in the AI field: visual generative models. While systems like GPT-3 create text from text, others like DALL·E 2 — a wordplay between Spanish painter Salvador Dalí and Pixar’s cute robot WALL·E — can create visual art from words. As you can see from the cover image of this article, this tech gives a new meaning to the idiom “a picture is worth a thousand words.”
OpenAI, the company behind DALL·E 2, announced the model in April. In just a couple of months, it has set off all the alarms: AI is getting close to human-level visual creativity. It can create anything you can imagine — and anything you can’t — from a single sentence.
And OpenAI isn’t alone. A month ago Google Brain announced Imagen, a similar — and even slightly better — model. And just yesterday, another team from Google Brain released yet another model called Parti.
And that’s without mentioning all the projects coming from low-budget research groups that lack the resources of the Googles and OpenAIs of the world. Some examples are brand-new research labs like Midjourney, free apps like Dream, open-source alternatives like Hugging Face’s DALL·E mini (now called Craiyon), and the Colab notebooks (e.g. Disco Diffusion) that originate from the work of a few independent researchers like Katherine Crowson and Ryan Murdock, who started the movement among digital artists.
Now that the tech is proven and the uncertainty removed, the whole AI community is in a race to build the next best text-to-image model. And they aren’t stopping anytime soon. This will have important consequences. Today, I’ll focus on how this technology will disrupt the visual arts. DALL·E 2, the main star of today’s issue, will accompany us on this journey.
Looking inside DALL·E 2
In case you’re not familiar with DALL·E 2, here are some of its early creations. This is a mere glimpse, as right now there are 3 million+ DALL·E 2 images circulating on the internet (if you want to try it, sign up on this waitlist — and be patient).
The images you see above are generated in a matter of seconds. You write a description in English and DALL·E 2 does the rest. How is this possible? The short explanation: This is the result of a combination of powerful computers, tons of data from the internet, and smart algorithm design.
But that’s too general. So here’s a practical, ELI5 (explain like I’m 5) way to describe DALL·E 2’s workings. Anyone can understand it with this high-level analogy. I just need one thing from you: take a pencil and a piece of paper and analyze your thinking process while doing these two exercises.
It won’t be difficult, I promise.
First, think of drawing something simple. For example, a house surrounded by a tree and the sun in the background sky. Don’t draw it yet! Only visualize what the drawing would look like. The house. The tree. The sun. All together. The mental imagery that appeared in your mind just now is the human analogy of DALL·E 2’s internal visual representation. It has learned to match text descriptions with visual imagery by training on a lot of caption-image pairs from the internet. At this point, neither you nor DALL·E 2 know exactly how the drawing will turn out, but you know the main features that should appear. Going from a sentence to abstract imagery is the first step in DALL·E 2’s creation process.
Now, you can draw. It doesn’t need to be good, don’t worry! Translating the imagery you have in your mind into a real drawing is the second step of DALL·E 2’s creation process. You could now easily redo your drawing from the same description with similar features but a totally different final look, right? DALL·E 2 can, too. It can keep the essential features and vary the rest, creating distinct original images each time.
Of course, this is a superficial analogy, as there are important differences between human mental processes and DALL·E 2’s statistical processes. AIs don’t have brains. But it serves our purposes for now.
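For readers who prefer code, here is a purely illustrative sketch of that two-step structure. Nothing here is DALL·E 2’s real implementation — the actual system uses learned neural networks (a text encoder, a “prior,” and a diffusion decoder) — so every function below is a toy stand-in, chosen only to make the factoring concrete.

```python
import numpy as np

# Toy stand-ins for DALL·E 2's two steps. The real system uses learned
# neural networks; these functions only mimic the *shape* of the process.

def encode_text(caption: str, dim: int = 64) -> np.ndarray:
    """Step 1: caption -> abstract internal representation ('mental imagery').
    Toy stand-in: a deterministic, caption-seeded random vector."""
    rng = np.random.default_rng(abs(hash(caption)) % (2**32))
    return rng.standard_normal(dim)

def decode_image(embedding: np.ndarray, seed: int) -> np.ndarray:
    """Step 2: abstract representation -> concrete image.
    Different seeds keep the essential features but vary the final look."""
    rng = np.random.default_rng(seed)
    base = embedding.reshape(8, 8)                    # the shared "essence"
    return base + 0.1 * rng.standard_normal((8, 8))   # per-sample variation

z = encode_text("a house with a tree and the sun in the sky")
img_a = decode_image(z, seed=0)
img_b = decode_image(z, seed=1)  # same caption, a distinct image
```

The point of the sketch is the factoring: one step maps text to an abstract representation, another maps that representation to pixels, and re-running only the second step yields distinct images that share the same essential features — exactly what the pencil-and-paper exercise was meant to show.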
In case you want to explore other DALL·E 2 creations, the best resources are DALL·E 2’s official Instagram page, the subreddit r/dalle2, and the hashtag #dalle on Twitter (but be careful not to confuse it with DALL·E mini, which is significantly lower quality).
Also, if you want to read more on DALL·E 2 — what it is, how it works, and its strengths and limitations — here’s a free link to my Towards Data Science non-technical analysis, “DALL·E 2, Explained: The Promise and Limitations of a Revolutionary AI.”
Beyond photography: DALL·E 2 is a meta-tool
DALL·E 2 (or Imagen or Parti) isn’t public, so for now its reach remains limited. But it won’t be for long. Notably, this technology has enormous potential to disrupt industries and jobs, as well as our understanding of human creativity and art.
Google and OpenAI may not release the models to everyone (OpenAI is now granting access to 1,000 people per week; to date, over 10,000 of the 1 million+ people who signed up on the waitlist can use DALL·E 2), but the fact that this tech is possible is enough to inspire copycats that may not be so careful. DALL·E 2 and its cousins are sparking a cultural debate that will only grow in importance.
Is AI going to change our relationship with art? Are models like DALL·E 2 just the beginning of an ever-improving form of technology that will eventually surpass our creativity? Can these systems disrupt art like photography did last century? These questions point to an impending cultural shift that will go as far as to turn upside down our understanding of what it means to be human.
When I first encountered this emerging AI art scene a few months ago, I realized how this tech could produce an effect very similar to what photography did to art in the 20th century. AI could become a new tool for artists — or their demise.
In a previous article, I wrote:
“You don’t take a photograph, you make it.”
— Ansel Adams
Photography wasn’t considered an art in the beginning. It was promptly embraced by the general public but was perceived as a mechanistic way to record the fleeting moments of life. It was a skill, a matter of technical ability, not an art.
Yet it changed our relationship with art as we knew it. It liberated painters from the task of capturing reality as it is. It allowed them to move, free of ties, in the vast latent space that art offers. They could focus on the abstract, on painting ideas to transmit feelings that photos fail to catch. They could break the rules and explore the limits of what’s possible, finding meaning in questions that only exist within us.
Photography also democratized access to pictures — taking a snapshot of reality was suddenly a matter of minutes, not days. At the same time, at least in the view of the elites, it cheapened the unique talent-based beauty of hand-made paintings. Not everyone was happy, but technology always finds a way. Photography came to stay and most will now agree with Adams’ quote above: Photography is art.
Now it’s artificial intelligence’s turn to disrupt the world of visual art. AI art generators will also push artists to redefine their relationship with art.
Artists might feel threatened by systems like DALL·E 2 in the beginning, as they were by photography a hundred years before. However, photography didn’t stop artists from finding new ways to express themselves. On the contrary, it provided new opportunities. They could “focus on the abstract, on painting ideas to transmit feelings that photos fail to catch.”
It’s reasonable to expect that AI will have similar effects. But, if we take a closer look, there’s a key difference between a camera and DALL·E 2 that makes the latter especially worrisome.
To understand why, we need to think about the reasons artists moved away from realistic styles when photography came along. Cameras competed head-on with artists at capturing reality. They were cheaper, faster tools that didn’t require anywhere near the skill of a brush and canvas to produce high-fidelity pictures. Photography effectively displaced artists, who had to reinvent themselves by searching the unexplored corners of art.
Now, looking back with the benefit of hindsight, we know artists took advantage of the insights photography provided and learned to experiment freely in new creative playgrounds. But DALL·E 2 may pose a greater threat than cameras. When I was analyzing some of its creations I realized something. Look at these:
Because it was trained on millions of images from the internet, DALL·E 2 can create paintings in any style you can possibly imagine.
You can use it to create photorealistic images, but also pen drawings, pixel art, digital art… You can ask it to paint in any historical style like surrealism, cubism, baroque, etc. You can ask it to paint in the style of particular painters like Dalí, Picasso, or da Vinci.
You can even ask it to paint in made-up styles that you think have a particular idiosyncrasy. DALL·E 2 will probably produce a nice result with the features you are looking for, as long as you’re clear in your request.
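Style steering is nothing more than prompt text. As a trivial illustration (the helper below is hypothetical, not part of any real API — it only shows how a styled request is assembled), you can think of a prompt as a subject plus a style cue:

```python
# Hypothetical helper: a styled prompt is just the subject plus a style cue.
def build_prompt(subject: str, style: str) -> str:
    return f"{subject}, in the style of {style}"

styles = ["a pencil drawing", "pixel art", "surrealism", "Salvador Dalí"]
prompts = [build_prompt("a house with a tree at sunset", s) for s in styles]
# e.g. "a house with a tree at sunset, in the style of pixel art"
```

Swapping the style string is the entire cost of “changing medium” — which is exactly why clarity in the request matters more than any technical skill.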
Photography displaced artists and DALL·E 2 might, too. But here’s the critical difference between the two: cameras don’t learn. Even if artists go on to develop new techniques and styles to differentiate themselves from AI-generated art — which until recently had a very idiosyncratic look — a new DALL·E 3 or 4 will appear and learn from those new styles to displace artists further.
AI can learn and that’s what makes DALL·E 2 and other similar models especially threatening.
They’re not painting tools, they’re learning tools. Meta-tools.
DALL·E 2 seems to have a powerful ability to paint, but that’s a misunderstanding. What these systems have is the meta-ability of learning to paint — anything in any style, already invented or to be invented. This is a crucial difference and it’s what distinguishes DALL·E 2 from any previous disruptive single-purpose technology.
Artists can escape a camera because it’s static in its abilities. It can take pictures and nothing else. Photography replaced a single art style and became an art in itself. But DALL·E 2 can chase down artists wherever they go in the vast space of uncharted creative territory. AI wouldn’t replace just one style; it would replace all styles.
Even if the problem of disrupting art isn’t new, the way AI could do it is. And that’s scary.
A silver lining
But, like any other tool — even if a meta-tool — DALL·E 2 has limitations. And it’s with those limitations in mind that we have to think about visual generative models, and AI in general.
The most important one is that it doesn’t understand that the objects it paints are a reference to physical objects in the world. If you ask DALL·E 2 to draw a chair it can do it in all colors, forms, and styles. And still, it doesn’t know we use chairs for sitting down. Art is only meaningful in its relationship with either the external physical world or our internal subjective world — and DALL·E 2 doesn’t have access to the former and lacks the latter.
It’s true it can combine simple concepts or words into complex creations that mix them into a semantically coherent whole. For instance, after being prompted with “a shipping container with solar panels on top and a propeller on one end that can drive through the ocean by itself. The self-driving shipping container is driving under the Golden Gate Bridge during a beautiful sunset with dolphins jumping all around it”, DALL·E 2 created this amazing piece:
But it has very limited generalization capabilities. Like any other deep learning model (or, more broadly, any AI), DALL·E 2 can, at best, interpolate within its training distribution. It can’t understand the underlying realities its paintings are based on, nor generalize them to truly novel situations, styles, or ideas it has never seen before.
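A toy numerical analogue of this limitation (not DALL·E 2 itself — just an assumed stand-in to make the idea tangible): a flexible model fit on a narrow slice of data predicts well inside that slice and fails badly outside it.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Training distribution": noisy samples of sin(2*pi*x) on [0, 1] only.
x_train = np.linspace(0.0, 1.0, 50)
y_train = np.sin(2 * np.pi * x_train) + 0.01 * rng.standard_normal(50)

# A flexible model: degree-7 polynomial least-squares fit.
coeffs = np.polyfit(x_train, y_train, deg=7)

def predict(x: float) -> float:
    return float(np.polyval(coeffs, x))

# Inside the training range, the fit is accurate...
err_inside = abs(predict(0.5) - np.sin(2 * np.pi * 0.5))
# ...but far outside it, the very same model is wildly wrong.
err_outside = abs(predict(3.0) - np.sin(2 * np.pi * 3.0))
```

In the analogy, the interval [0, 1] plays the role of the model’s training distribution: requests that recombine what it has seen come out coherent, while genuinely out-of-distribution requests degrade — no matter how capable the model looks from the inside.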
This limitation is key because we humans can do it very easily.
We do it all the time.
Picasso and Braque invented cubism inspired by previous ideas. But they didn’t just combine them into a straightforward product. They started a multidimensional art movement that went well beyond the ideas that motivated it. Cubism was larger, far more than the simple sum of its predecessors.
Yes, Picasso was an exceptional artist, but the core argument rings true for all of us. We can employ our unmatched cognitive prowess to go beyond what exists. That’s the essence of human creativity.
AI may be able to follow us wherever we go in the abstract space of creative invention, but we’ll still get there first.
We’re the pioneers.
The original creators.
And this doesn’t seem to be changing anytime soon.
As somebody deeply interested in the digital domain but coming from the arts/design field, I find the discussion about art generated by AI tech a little disturbing. If I had no idea of the means of making the visuals, for the most part they look... well, horrible. I have yet to see artworks created using AI tech that I would say look like anything but bizarre kitsch. It's as if a whole lot of pre-existing visual elements, some linked to the desired theme, some less so, have been put into a blender, given a quick whizz, then dumped on the page.
Perhaps that's how computing engineers see the creative process — a near-random rearrangement of existing stuff. It's as if someone went to the refrigerator, randomly selected anything in there (it's all food, right?), whizzed it up, cooked it a bit — and there's your meal, human.
In the products of DALL·E and GPT-3, I see little of the elegant restraint of a simple line on a page that evokes human experience — the kind you might see in, for instance, Eugène Delacroix's 1854 drawing of a mounted man attacking a panther, where the fury and passion of the moment are condensed into a few scribbles, brilliantly capturing its essence.
In contrast, most of AI tech's attempts at the art caper seem to ponderously and ham-fistedly pile more stuff into the mix, randomly chopping lumps out of it and scrambling the whole lot into the nearest proximate cliché.
I know it's very early days for this intriguing new field of technology, and the analogy with photography is very apt. But, oh dear, there's still a very long way to go.
Fascinating and really frightening.
I agree that humans will probably have to exploit implicit aspects, in particular cultural ones. Maybe jokes are going to become compulsory.