How Today's AI Art Debate Will Shape the Creative Landscape of the 21st Century
How the singular features of AI systems, combined with a lack of regulation, make this situation uniquely challenging.
AI art systems are in vogue. Although they’ve existed for a few years now, 2022 will be remembered as the year the AI art revolution began. AI tech companies — big and small, for-profit and non-profit — have been developing text-to-image generative models that are sending shockwaves across a creative world that, not long ago, felt safe from AI.
DALL·E 2 (OpenAI) is arguably the model that sparked this transformative trend, but there are many more. Some are private models that companies have announced but never released, like Imagen and Parti (Google Brain) or Make-A-Scene (Meta AI). Other models are in the stage of open beta (anyone can access them through a form or waitlist), like the aforementioned DALL·E 2 or Midjourney. And the vast majority fall into the category of open-source models. Here we find the popular DALL·E mini (now renamed Craiyon), the Colab notebooks that started everything (The Big Sleep, VQGAN-Clip, and Disco Diffusion), and soon-to-be-released models like Stable Diffusion (Stability.ai).
With such a broad array of options, anyone has the chance to experience the emergent AI art scene firsthand. Most people will use these models recreationally to see what the fuss is about. But others plan to use them professionally and/or commercially (full disclosure: I’m in this group). It’s here where this story starts and why it’s paramount that we talk about it. Let’s contextualize with some recent events.
OpenAI announced on July 20th that they’d be releasing their flagship model DALL·E 2 as an open beta. This means anyone on the waitlist (1M+ people and counting) will soon have access. More importantly, they’ll allow using the generated images commercially. Midjourney and Stable Diffusion, the two other models of comparable — or even better — quality, will allow it too. Midjourney paid members can sell their generations and Stable Diffusion is open source — which means that very soon, developers will have the tools to build paid apps on top of it.
A whole new industry is appearing before our eyes because, even if the best models are open-source (Stable Diffusion is generally agreed to be state-of-the-art now), most users will gladly pay a price for no-code ready-to-use services. Soon, these paid apps will become ubiquitous and everyone working in the visual creative space will face the decision to either learn/pay to use them or risk becoming irrelevant.
Digital AI-savvy artists that know how to leverage these models to enhance their toolkits and improve their creations are notably happy. A new world full of opportunities lies before them. But other artists (who seem to fall on the more traditional side, although the boundaries aren’t well defined), don’t share the sentiment.
OReilly argues that OpenAI scraped the web without compensating artists to feed an AI model that would then become a service those very same artists would have to pay for. “A bullshit deal,” he says.
Award-winning artist Karla Ortiz sides with OReilly. She argued in a Twitter thread on August 1st that companies like Midjourney — to which she refers directly — should give artists the option to “opt-out” from being used explicitly in prompts intended to “mimic [their] work.”
She closed the thread with a cooperative tone that opens the door to a very needed debate: “I think its an exciting time but this new tech must be handled ethically, transparently and carefully!”
Concept artist and illustrator RJ Palmer also raised concerns on Twitter. He wrote a short thread on Stable Diffusion a few days ago. “What makes this AI different is that it’s explicitly trained on current working artists. [It] even tried to recreate the artist’s logo of the artist it ripped off,” he said. “As an artist I am extremely concerned.”
Are OReilly, Ortiz, and Palmer just conservative artists who oppose and reject the natural progress of technology? I don’t think so. The situation is much more complex than that. To give you a sense of the scale of their concerns — and the degree to which other people agree with them — Palmer’s first Tweet amassed 85K likes in a single day.
That’s not to say they’re necessarily right in what they argue, but they’re right in that there’s something to argue about. We can’t move forward without having this conversation if we want to collaboratively define a welcoming future for all.
It’s not clear at all what we should do with this technology. Should it be regulated? Should anyone have access to it for free? If it’s paid, should companies compensate artists whose work they used to train the models? Should people be able to reproduce artists’ styles with a couple of words? Too many unanswered questions for companies, users, and regulators to ignore.
(It’s worth repeating that I use AI art tools professionally, so my personal interest somewhat conflicts with the stance I’m taking here. But I think finding an equilibrium through debate is the right approach to solving such a complex issue.)
This longish introduction acts as a starting point to frame this debate. In the next sections, I’ll help you make sense of the state-of-the-art of AI text-to-image models and will explain what makes these conversations uniquely challenging (AI’s unique features make these models qualitatively different from previous creative tools — which refutes the argument that AI art tools are analogous to cameras). I’ll explore the singular privilege that AI companies have due to the lack of adequate regulation in their space. And I’ll disentangle the question of inspiration vs. copying and what differentiates humans from AI generative models in this regard.
It’s a longer article than usual, but very much worth a read. These conversations will reshape and redefine the creative landscape around the world in the coming years and are necessary to bridge the gap between the current chaotic state of affairs and a calmer future at the intersection of freedom for AI advocates and justice for affected artists.
What makes AI art models different from other creative tools
One of the arguments I’ve found says that AI art models aren’t fundamentally different from a camera, editing software like Photoshop, or even a brush and a canvas — they’re just another creative tool in the artist’s toolkit. A natural continuation of this argument defends that AI art should be treated like any other form of art.
I agree with this, but only in part (here’s my essay on why “DALL·E 2 will disrupt art deeper than photography did”). I believe AI-generated art is art in some way (art is defined not only by the intention of the artist but also by the sensations it evokes in the beholder). And it’s also true that AI art models are tools (as a reader pointed out in one of my latest articles, they don’t have volition, agency, or human-like creativity). Still, all AI art models have two features that essentially differentiate them from any previous creative tool — and put them into a new category.
First, opacity. You’ve probably heard the black-box analogy for neural networks (although AI art models can differ structurally, all are neural networks). The opacity argument can be summarized like this: we don’t know precisely or reliably how they do what they do and we can’t look inside to find out. We don’t know how AI systems represent associations of language and images, how they remember what they’ve seen during training, or how they process the inputs to create those impressive original visual creations.
Here are two examples that showcase just how “black-boxy” these systems can be.
Here I prompted Midjourney with “Limerence --stylize 20000.” (Limerence means being infatuated with another person.) I used only one word (limerence) and one parameter (--stylize, which makes the AI depart from your input and explore the creative space more freely). Where did that woman come from? The dress? The flowers? The style? The colors? I don’t know. The creators of Midjourney don’t know. And Midjourney itself can’t explain it.
For this one I used “a red fox looking at the sky, distant mountains, huge green trees, bright stars, night, in a symbolic and meaningful style, artstation, 8K, --stylize 7500.” Where’s the red fox looking at the sky? Why is the main element a tree with red blooming flowers? Same answer.
I used the “stylize” parameter to illustrate how extreme the opaque nature of AI can get. The degree to which these models show opacity depends on many factors, but they’re never anywhere near full transparency. The point I’m making here is that, whatever the AI model does internally, we can’t know.
This opacity places generative visual models in particular (and AI systems in general) in a new category of tools. A camera, in contrast, can be completely understood. Every element. Every mode. Every output. You can always inspect the settings of a camera to analyze a photo you just took or learn what you should tweak to get a different result.
In the case of AI art, the intention I may have when I use a particular prompt is largely lost in a sea of parameters within the model. However it transforms my words into those paintings, I can’t look inside to study or analyze it. AI art models are (so far) uninterpretable tools.
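To make the opacity point concrete, here’s a toy sketch in NumPy (a hypothetical two-layer network, nothing like the architecture of any real text-to-image model): even with complete access to every parameter, the numbers themselves don’t explain why a given input produces a given output.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: we have full access to every weight...
W1 = rng.normal(size=(4, 8))   # input -> hidden
W2 = rng.normal(size=(8, 3))   # hidden -> output

def forward(x):
    hidden = np.tanh(x @ W1)   # every input dimension gets mixed nonlinearly
    return hidden @ W2         # the output blends all hidden units at once

x = np.array([1.0, 0.0, -1.0, 0.5])
y = forward(x)

# We can print all 56 parameters, yet none of them tells us *why*
# this input produced this output: the "reason" is smeared across
# the whole network. Real models have billions of such numbers.
print(W1.size + W2.size)  # 56
print(y.shape)            # (3,)
```

Scaling this from 56 parameters to billions is what turns “hard to read” into “practically uninterpretable”: the weights are fully visible, but the mapping they implement is not.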
The second feature, which perfectly complements opacity, is the inherent and untamable stochasticity of deep learning-based AI. This means that if you use the same prompt with the same model a thousand times, you’ll get a thousand different outputs (even if using the “seed” parameter). Here are sixteen red and blue cats Midjourney created using the same seed. They are similar, but not the same. The differences are due to the stochasticity of the AI.
AI art models are non-deterministic. Cameras, Photoshop, or a brush are deterministic in this sense; you can always trace back from output to input. And, if you apply the same input repeatedly — settings, clicks, or hand movements — you’ll always get the same output. This leaves the artist’s intention intact, unaffected by the tool.
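The contrast can be sketched in a few lines of Python (a toy illustration with invented function names, not any model’s actual sampler): a deterministic tool maps the same input to the same output every time, while a generator that injects fresh noise at each call does not.

```python
import random

def deterministic_tool(setting):
    # Like a camera: identical settings always yield identical output.
    return setting * 2

def stochastic_generator(prompt_strength):
    # Like the sampling step of a generative model: fresh noise
    # is drawn on every call, so identical inputs diverge.
    noise = random.gauss(0.0, 1.0)
    return prompt_strength + noise

# The deterministic tool is perfectly repeatable:
assert deterministic_tool(21) == deterministic_tool(21)

# The generator almost surely isn't, even with identical input:
outputs = {round(stochastic_generator(5.0), 6) for _ in range(10)}
print(len(outputs) > 1)  # True (ten noisy draws virtually never collide)
```

A seed can pin down one particular trajectory, but rerunning without it samples a new path — which is the sense in which the prompt alone never determines the image.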
Can I say I created the above images with the help of a tool? I don’t know how the AI helped me (opacity) and couldn’t repeat them even if I wanted to (stochasticity). It’s fairer to argue that the AI did the work and my help was barely an initial push.
Savvy AI artists may force the system hard enough to leave marks of their intention on the final piece and, in some cases, finding the right prompt can take a significant amount of work. But that won’t be the case for most people, and this conversation shouldn’t focus only on the best but also the worst-case scenarios.
A very anomalous space of non-accountability for tech companies
But why does it matter if AI art models are different from other tools? We can simply adapt to their unique features — as we always do — and move on, right? Well, maybe. We have to go beyond the technical domain to understand why this case is especially complex.
The argument with which I started the previous section (are AI art models like photography?) is tricky because AI models are owned by companies that live in “a very anomalous space of non-accountability,” as tech ethics expert Gemma Galdón says. If you buy a camera, you own it and the photos you take with it, but laws regarding AI models are still in the making. There’s a notable lack of regulation on what these companies can or can’t do regarding the training and deployment of these models.
As OReilly says, “DALL·E has harvested vast amounts of human creativity — which it did not pay for and does not own or credit.” He’s right. OpenAI (like Google, Meta, Midjourney, or Stability.ai) has scraped the web to amass tons of data to feed DALL·E 2. Where does that data come from? Under which copyright does it fall? How can people license their AI-generated creations? Do they ask for permission from the original creators? Do they compensate them? Many questions with unsatisfactory answers.
OpenAI dismissed these concerns and did it anyway. Now, DALL·E is a service that artists will have to pay for if they don’t want to become irrelevant soon (other options are available, but all ready-to-use apps will most likely remain paid services). DALL·E or Midjourney won’t replace all artists, but they will certainly impact the demand. And we’re just at the very beginning of this technology. How good will future AI art models become? We don’t know, but the amount of money behind these initiatives should give you a hint.
What we have thus far is a handful of companies that are wealthy and in most cases opaque, building models that are opaque and stochastic. But I don’t want to put all companies in the same bag. They aren’t equal.
Google and Meta’s intentions are kept 100% private. Do they want to eventually embed these models into their already popular products or services? How did they train them? OpenAI’s alleged mission is to build AGI to benefit everyone. Did they need to disrupt an already hard-to-break-into field without compensating artists, while making those artists pay for a service built on top of their efforts? Have Midjourney or Stability.ai, who lean more towards freedom for the user, considered what their models will be used for once anyone can get their hands on them? Fewer guardrails mean more misinformation and more bias.
Modus operandi and goals vary, but all these tech companies have one thing in common: They all take advantage of a notorious lack of regulation. They decide. And we adapt.
This is a recipe for disaster, because companies, and by extension users and even the models themselves, become “injudgeable”. How can something be judged when the rules we’d use to judge it are nonexistent? To the opacity and stochasticity of AI art models, we have to add the injudgeability of tech companies that own those models. This further opens the doors to plagiarism, copying, and infringement of copyright — and copyleft — laws.
Inspiration or copying?
OReilly finishes his rant with an interesting argument: “because it’s a black box, passing off DALL·E images as one’s work it’s always going to be akin to plagiarism.” We know AI art models can reproduce existing styles with high fidelity, but does that make them plagiarizers automatically? Let’s find out.
To disentangle the “copying vs inspiration” question let’s use the example Ortiz writes about in her thread. She says one very common practice among users is to explicitly ask the AI model to reproduce the style of popular artists (alive or deceased) to enhance their creations. AI artists defend this behavior by arguing that humans also take inspiration from others. Artists learn by studying greater artists. They reproduce and mimic other people’s work until they can grow past that and develop their personal style. Asking the AI for another artist’s style is no different.
However, not everyone thinks that parallelism is valid. Ortiz says that she’s seen “folks on [Midjourney’s] discord not only use my name, but trying to figure out how to train [their] ai to mimic my work.” I’ve also seen it and I’ve done it myself.
Let me show you just how much this can influence the output. Here are three prompts to Midjourney: “Two people,” “Two people, in the style of Picasso,” and “Two people, in the style of Dali.” Now, you tell me which is which.
Ortiz has a point. Picasso and Dali are very famous, but this has been repeatedly proved with less well-known artists (as Palmer showed in his thread). Many of them are alive and currently working and may not want to participate in the AI art movement. As far as I know, no company has asked permission to use the artists’ work for this purpose — which, although it may not be illegal, isn’t in any way fair.
(As a note, I want to say that we have to distinguish between the companies — which are scraping the web indiscriminately to feed their models with artworks from many, many different artists — and people who are actively trying to make those models less opaque by analyzing how they respond to different prompts. Palmer’s examples point to the second case. Given that this technology exists, it’s better to have more information about how it works, not less.)
Let’s get to the question: Is this copying or inspiration? In the strict sense, Midjourney, DALL·E 2, and Stable Diffusion have learned from those artists. If we don’t go deeper into what it means for an AI to learn, there’s no reason to believe this is copying — they learned from artists and are therefore inspired by them. But of course, this argument isn’t valid, as we can’t equate the way an AI learns with the way humans do.
But we can compare AI art models with human artists in the terms I’ve defined throughout the article. The two features I talked about, opacity and stochasticity, which made AI art models stand out from other creative tools, are also shared by humans. Human brains are also opaque — we can’t dissect one and analyze how it learns or paints — and stochastic — the number of factors that affect the result is so vast, and the lack of appropriate measuring tools so decisive, that we can consider a human brain non-deterministic at this level of analysis.
This places human brains in the same category as AI art models. Also, and this is key to consider, both AI art models and humans have the ability to copy, reproduce, or plagiarize. Even if not to the same degree — expert artists can reproduce styles they’re familiar with, but AI’s superior memory and computational capability make copying a style child’s play — both can do it.
Humans have a harder time copying, but we certainly can. If we don’t do it, it’s because we choose to limit the amount of influence we allow previous artists to have on our work. We constrain it enough for it to remain in the realm of inspiration without crossing the lines of illegality.
And that’s precisely what makes us different from AI art models. Unlike them, we’re judgeable because those “lines of illegality” exist to keep us in check. Regulation for humans is mature enough to precisely define the boundaries of what we can or can’t do. Vague arguments (e.g., “this is just inspiration”) that can be used both ways are meaningless when the law settles the debates before they start (in the case of humans) and also when its non-existence makes debate impossible (in the case of AI art models). It’s precisely for that reason that companies and users can do whatever they want with these non-regulated AI art models.
We aren’t going to decide what’s inspiration and what’s plagiarism just by looking at a prompt and an image. We have to first define impartial rules of use that take into consideration the unique characteristics of AI art models, the pace of progress of these tools, and whether or not artists want to be part of the emerging AI art scene.
Artists, illustrators, and designers are facing increasing competition from people who can reproduce or copy styles with impunity because the corresponding regulation is non-existent. AI companies can train, deploy, and offer these AI art models as paid services for the same reason.
AI art models that are opaque, stochastic, very capable of copying, and injudgeable can’t be subject to current frameworks of thought. They belong to a new category of tools. The singular nature of AI art models and the lack of regulation is an explosive mix that makes this situation uniquely challenging. And, as Ortiz argues, extremely necessary to talk about, debate, and discuss until we arrive at a consensus that creates common ground.
That’s how we’ve always done it with other technologies. Hopefully, it won’t be different with AI art models. But they advance very fast — regulatory bodies will have to speed up if they want to catch up before the chaos engulfs us all.