GPT-4Chan: 'The worst AI ever'
The limits of open-source AI.
The democratization of large language models is key for the healthy advancement of AI, as I’ve previously argued. Now, the AI community is starting to make it a reality through open-source AI models (code, weights, and datasets) for training and open platforms for inference. Notable institutions like EleutherAI and Hugging Face, and projects like BigScience (which is training BLOOM, a 176B-parameter open multilingual model), are growing in prominence due to their unique role in this regard.
Even Meta, often criticized as indifferent to the effects of its tech, has open-sourced a pre-trained large language model the size of GPT-3, called OPT (Open Pretrained Transformer), as well as the code to train it. And just three days ago, Google announced it’s also entering the open-source space by sharing the trained weights of the Switch Transformer models, including Switch-C, a 1.6T-parameter sparse model.
Delegating the power to decide who can and can’t use these incredible tools to a few powerful and well-resourced companies like OpenAI was not sustainable (it’s worth remembering that although Meta and Google are open-sourcing a couple of models, most of their AI remains private).
It’s ironic that OpenAI is precisely the latest to feel the consequences of proprietary AI. DALL·E mini, a Hugging Face-hosted open-source cousin of DALL·E 2, is now significantly more viral than the original. To give you some numbers, DALL·E 2 generated 3 million images from April 6th to May 18th, whereas DALL·E mini is generating 50 million a day. That’s roughly a 700x difference in daily output. Although OpenAI is now adding 1000 users per week, they can’t compete against open source, even if we factor in the obvious difference in quality (DALL·E 2 produces far better images than DALL·E mini).
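For readers who want to check the 700x figure, here is a minimal sketch of the arithmetic in Python. It only uses the numbers quoted above; the year 2022 is my assumption (it does not change the day count), and the variable names are purely illustrative.

```python
from datetime import date

# Figures quoted in the paragraph above.
dalle2_total_images = 3_000_000
dalle_mini_images_per_day = 50_000_000

# Assumption: the period falls in 2022 (the year does not affect the result).
period_start = date(2022, 4, 6)
period_end = date(2022, 5, 18)

# Length of the DALL·E 2 period in days.
days = (period_end - period_start).days

# Average DALL·E 2 output per day over that period.
dalle2_images_per_day = dalle2_total_images / days

# How many times more images DALL·E mini produces per day.
ratio = dalle_mini_images_per_day / dalle2_images_per_day

print(f"{days} days, about {dalle2_images_per_day:,.0f} images/day, ratio about {ratio:.0f}x")
```

The comparison is between daily averages, so it says nothing about peak usage on either side.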
It now seems clear that the advantages of breakthrough AI, like DALL·E 2 or GPT-3, should benefit and be available to all. However, defending open-source AI isn’t as simple as it seems. And today I bring you a very recent case that reflects this very fact.
I considered not publishing this article because I could bring unwanted attention to this specific example just by writing about it. Although my aim is precisely to help avoid cases like this in the future, publishing could achieve the opposite. On second thought, and this is why I finally decided to publish, I realized that while the downsides will be limited to this particular case, the benefits of raising awareness will be evergreen (if I achieve the goal of making you, the reader, think about this).
When open source goes wrong
The case features ML researcher and YouTuber Yannic Kilcher who, a few weeks ago, decided to fine-tune EleutherAI’s pre-trained GPT-J-6B model on a dataset comprising 3.5 years' worth of 4chan posts from a particularly toxic board of the site, /pol/.
As a “prank and light-hearted trolling,” as Kilcher later told Motherboard, he decided to release the model on 4chan to see how users would react. In the span of 48 hours, several bots powered by the model generated over 30,000 posts. Kilcher seemed delighted that GPT-4chan “perfectly encapsulated the mix of offensiveness, nihilism, trolling, and deep distrust” typical of the /pol/ board on which it was unleashed.
He published a YouTube video entitled “This is the worst AI ever” where he explained the whole process and finally uploaded the model to Hugging Face so anyone could download it and use it for research — and maybe release it on another social network where it could perpetuate the noxiousness.
Given that he was perfectly aware of the extreme toxicity of the model, and even acknowledged it at the end of the video when he said “for many reasons this model isn't ready to be deployed anywhere,” I wonder why he released it to the world. The model could have a genuine research interest, but that wouldn’t compensate for the potential repercussions if the model fell into the hands of people with malicious intentions.
In some way, training the model to see how it compares to existing models — which are generally intended to reduce toxicity — could be a fruitful scientific experiment. Kilcher himself evaluated the model on a few benchmarks and found that GPT-4chan surpassed GPT-3 and GPT-J on TruthfulQA (which probably says more about the benchmark than the models).
He could even defend his decision to share a limited number of outputs on the same 4chan board the model was trained on (as it “just outputs the same style”), as long as it was framed as a rigorous experiment.
But why make it public and certainly lose control of the downstream consequences?
The limits of open source
AI safety and ethics researchers voiced concerns over the model on Hugging Face and Twitter. Dr. Lauren Oakden-Rayner drew an analogy between AI practices and those well established in the medical field. For her, Kilcher’s experiment fell outside the healthy boundaries that should always be present when doing open-source science. “[Kilcher’s experiment] would never pass a human research ethics board,” she argued on Twitter.
Other scientific fields implement guidelines to prevent harm when sharing knowledge or research material, but such safeguards are seldom considered when it comes to AI models. Dr. Oakden-Rayner says “open science and software are wonderful principles but must be balanced against potential harm,” and I very much agree. Borrowing from medical research, she proposes a solution that involves completing a registration form, describing the intended research, and signing an agreement on data use.
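To make the three steps concrete, here is a minimal sketch in Python of what such a checklist could look like. Everything in it (`AccessRequest`, `missing_steps`, `may_download`) is hypothetical and of my own making; it is not Dr. Oakden-Rayner’s design nor any real platform’s API.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical record of one researcher's request for a gated model."""
    registration_completed: bool
    research_description: str
    data_use_agreement_signed: bool


def missing_steps(request: AccessRequest) -> list[str]:
    """Return the steps from the proposal that the request has not satisfied."""
    missing = []
    if not request.registration_completed:
        missing.append("registration form")
    if not request.research_description.strip():
        missing.append("description of intended research")
    if not request.data_use_agreement_signed:
        missing.append("data use agreement")
    return missing


def may_download(request: AccessRequest) -> bool:
    """Allow access only when every step is complete (assumed policy)."""
    return not missing_steps(request)


# Illustrative usage with made-up requests.
complete = AccessRequest(True, "Study of toxicity detection", True)
incomplete = AccessRequest(True, "   ", False)
print(may_download(complete))
print(missing_steps(incomplete))
```

Note that a check like this only verifies that the paperwork exists. It cannot verify that the stated intent is honest, which is the hard part of any gating mechanism.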
João Gui, a researcher at Cohere (a company dedicated precisely to training and deploying large language models, or LLMs, while keeping high safety and ethics standards), argued that, given that the model had already been downloaded over a thousand times, it was better to remove the weights completely before more harm was done.
In response to these concerns, Hugging Face, which originally removed just the inference widget, decided to block all downloads of the model indefinitely. They acknowledged GPT-4chan as an unprecedented case and tried to come up with a gating method intended to provide access only for research purposes. Just yesterday, they announced the model will remain blocked as they “couldn’t identify a licensing / gating mechanism that would ensure others use the model exclusively for research purposes.”
They gave priority to the potential harms the model could generate over allowing reproducibility of the model, which in this case seems more than reasonable.
Some questions arise from this situation: Is it okay to ban a model that was fine-tuned on an open-source dataset using an open-source pre-trained model? GPT-4chan is obviously an extreme case, as it was explicitly trained on a toxic dataset, but what about more nuanced cases? Can the AI community ensure gating mechanisms will be enough to prevent harm from language models? What happens when the harm is not that obvious, as in carefully designed political disinformation campaigns?
Professor Gary Marcus expressed these concerns best when he said: “Whatever one thinks, Yannic Kilcher has exposed how easy it is to make LLMs do really unpleasant things … No one has a clue how to stop this.”
Safety > Openness > Privacy and control
We have three options.
First, we could keep this tech closed. This is what big tech companies have been doing and it isn’t the way toward democratic AI. OpenAI decides who does and doesn’t use their systems (GPT-3 first, now DALL·E 2). Google or Microsoft simply keep their models private (with limited exceptions), eventually embedding them into their technology without letting the AI community inspect their limitations and deficiencies.
Another option is, as EleutherAI and Hugging Face advocate, open-sourcing AI models, particularly those that require huge amounts of compute and data to create. For instance, GPT-J, which was the basis of GPT-4chan, is a model created by EleutherAI as the first step towards an open-source GPT-3. They did it with the best intentions in mind, but as it happens, someone took advantage and created something evil out of good intentions. Open-sourcing everything may be better than keeping everything private but, as we’ve seen today, it can also be dangerous.
There’s a huge array of options between absolute privacy/control and absolute openness. It’s somewhere in there that we’ll find the best approach for large AI models. Creating ethics review mechanisms that allow only well-intentioned downstream uses is a great idea. Dr. Oakden-Rayner proposes that “model custodians” (Hugging Face in this case) should be held responsible for the use of the data and models they host. Good gating mechanisms would prevent most misuse of their models.
Even if we combined open source with ethics and safety criteria to decide who gets to use what (that is, putting safety over openness over privacy, which makes sense), there can’t always be a guarantee that no harm will eventually be done (even if only in very indirect ways that leave us with no means to trace the causes). That makes some of the risks of this tech inevitable. The AI community should aim at reducing both the degree of harm and its inevitability.
If the inevitable risks are assessed as too big or too uncontrollable, it’d be better to ban the tech temporarily, or even completely, than to open it partially and turn the risk into real harm. However, this only makes sense if the ban comes from a place of open-source friendliness (i.e., OpenAI not releasing GPT-2 initially doesn’t count, but Hugging Face blocking GPT-4chan a posteriori does).
A firewall is better than no wall until better measures, methods, or mechanisms are in place. In the end, the medical community wouldn’t publish chemical data of a deadly virus in the name of open source, and the physics community wouldn’t publish exact instructions to build a hydrogen bomb in the name of open source.
We should never forget that AI technology is powerful, more so thanks to the internet, and it should be subject to the same rules that apply to other established fields. The lack of accountability it enjoys now is an anomaly, a by-product of the speed at which it develops, but in no way desirable. The freedom some think they have now can easily become the suffering of others.
While I respect that some people might have concerns about downstream consequences and (as a result) may very well decide not to publish stuff for that reason, I see it as more of a matter of personal priorities than one of ethics. If I found some way to use GPT-4chan for harm, and I did so, then the responsibility for that harm would lie solely with me. At its core, sharing an AI model is just telling other people how to use math for a certain purpose. So while acting (or, in this case, not acting) out of concern for others' interests is a noble thing to do out of personal choice, I wouldn't call it anything close to a moral duty in this case, because nobody has a positive right to not have other people know things. I realize that they *do* have a right to not have harm done to them, but merely releasing a model to the public doesn't violate this right: while one might be of the opinion that releasing the model does "a harm to society", my view is that society as a whole doesn't have rights; only individuals do. So no one's rights are actually violated until the point that a specific individual person suffers harm as a result of it, which doesn't happen merely by sharing a model alone.