OpenAI Has the Key To Identify ChatGPT's Writing
They'll add a secret watermark to the AI's creations. Will they share the means to see it?
This article is a must-read for institutions that want to prevent plagiarism, impersonation, and abuse of ChatGPT. Here’s what OpenAI has conceived as a solution to those problems.
ChatGPT is the tech news of the week—or the year.
For me, it’s the world’s best chatbot. Alex Cohen says it’s “the most incredible tech to emerge in the last decade” and Genevieve Roch-Decter claims it’s one of the “most disruptive technologies ever created.”
Given how much it fails—and its inherent limitations—that may be a stretch, but it’s undeniable that ChatGPT is revolutionary. If not technically, then at least as measured by its unprecedented popularity and speed of adoption—helped by the free website and the friendly UI/UX—and the impression it’s made on everyone who has tried it.
Predictions range from ChatGPT being the Google slayer to the tipping point for the outdated education system, to maybe the greatest consumer product since the iPhone.
Are these forecasts accurate or mere exaggerations?
We don’t know yet. But regardless of what you think of OpenAI’s latest creation, the topic I’ve picked for today’s post will surely affect how you view its future and the future of AI writing tools in general.
The Algorithmic Bridge is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Blurring the frontier between human and AI creation
As a writer, one of my greatest concerns is ChatGPT’s ability to generate human-looking writing that we can’t differentiate from real human work. Imagine not being able to tell apart Shakespeare’s poetry from the industrialized product of a hollow AI.
If we extrapolate this argument to the future it points to (however distant), we find a society that can no longer trust the written word—for anything, in any of its forms.
This prospect entails, without a doubt, a breach in the fabric of society. It’d be a phenomenon of historical consequences, maybe as significant as the invention of the printing press—although in the opposite direction.
But we may still have time.
I’ve argued before that AIs like ChatGPT can’t create highly engaging literature or thought-provoking essays. As statistical charlatans, they can only regurgitate what they encounter in the “safest spots” of their latent space (the AI analog of our mental representation of concepts), which implies they haven’t quite captured the finest features of the most exquisite human prose.
Yet I’m well aware a crisp style or a unique voice isn’t always a synonym for success. Sheer quantity can’t substitute for quality, but it surely can occupy the limited space of our attention. That’s why I take my fellow writers’ fears very seriously—I believe that some of them, if not all, are in real danger of experiencing a sharp reduction in demand for their skills.
This is the double-edged nature of technology. It creates new worlds full of possibilities while at the same time erasing others we take for granted.
Is there anything we can do to protect ourselves from this? And here I don’t mean just writers, but everyone: AI-generated text threatens the priceless gift of knowing when we access each other’s minds.
In my essay on ChatGPT I argued for a solution:
“Human writing has characteristics that can, using the right tools, reveal authorship. As LMs become masters of prose, they may develop some kind of writing idiosyncrasy (as a feature and not a bug).
Maybe we could find the AI’s styleme (like a fingerprint hidden in language) not simply to distinguish ChatGPT from a human, but to distinguish its style from all others.”
As it turns out, OpenAI’s Scott Aaronson (also a CS professor at the University of Texas) is implementing a mechanism to simulate a stylema in ChatGPT’s creative process. It seems I wasn’t so far off after all.
OpenAI’s plan to identify ChatGPT’s outputs
On November 28th, Aaronson wrote in his blog (“My Projects at OpenAI” entry):
“My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT. Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT. We want it to be much harder to take a GPT output and pass it off as if it came from a human. This could be helpful for preventing academic plagiarism … mass generation of propaganda … or impersonating someone’s writing style in order to incriminate them.”
As Aaronson says, an invisible “conceptual” watermark is what they need to make it “much harder to take a GPT output and pass it off as if it came from a human.” This feature could prevent misinformation, plagiarism, impersonation, cheating, etc., because what most malicious use cases share is that the user has to “conceal ChatGPT’s involvement.”
OpenAI already has a “working prototype” that he says “seems to work pretty well”:
“Empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.”
This means a couple of paragraphs are enough to tell whether a given text came from ChatGPT.
(Note that Aaronson doesn’t refer to ChatGPT explicitly but to a generic “GPT”. My guess is that all of OpenAI’s language models will integrate the watermarking scheme, likely including the next iteration of ChatGPT.)
Although the specifics of how the mechanism works are too technical to cover here (if you’re interested, check out Aaronson’s blog. It’s very good!), it’s worth mentioning a few relevant details buried in the jargon:
First, users won’t have the means to see the watermark (DALL-E’s, by contrast, was visible and easily removable) unless OpenAI shares the key. I doubt anyone will find a direct way to remove it.
However, second, although the watermark is hard to bypass with trivial approaches (e.g. removing/inserting words or rearranging paragraphs), it isn’t impossible to defeat (e.g. Aaronson mentions that paraphrasing ChatGPT’s outputs with another AI would remove the watermark just fine).
Third, only OpenAI knows the key. They can share it with whoever they want so that third parties, too, can assess the provenance of a given piece of text.
Finally, what I consider the most critical aspect of this: the watermark won’t work with open-source models, because anyone could go into the code and remove the function (the watermark isn’t inside the model; it’s a “wrapper” over it).
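To make the idea concrete, here’s a toy sketch of how a keyed statistical watermark can work—this is not OpenAI’s actual scheme (whose details Aaronson hasn’t published), and every name in it is hypothetical. A secret key deterministically marks some (context, next-token) pairs as “favored,” generation leans toward favored tokens, and whoever holds the key can later measure how often a text lands on them. Unwatermarked text hits favored tokens about half the time; watermarked text hits them far more often.

```python
import hashlib

# Hypothetical secret key; in the real scheme, only OpenAI would hold it.
SECRET_KEY = b"demo-key"

def favored(prev_token: str, candidate: str) -> bool:
    """Keyed test: deterministically marks ~half of all (context, token) pairs."""
    digest = hashlib.sha256(
        SECRET_KEY + prev_token.encode() + b"|" + candidate.encode()
    ).digest()
    return digest[0] % 2 == 0

def watermark_pick(prev_token: str, candidates: list[str]) -> str:
    """Generation step: prefer a favored candidate when one exists."""
    for c in candidates:
        if favored(prev_token, c):
            return c
    return candidates[0]  # rare fallback: no candidate is favored

def favored_fraction(tokens: list[str]) -> float:
    """Detection step: fraction of transitions that land on favored tokens.
    Unwatermarked text hovers near 0.5; watermarked text runs much higher."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(favored(p, t) for p, t in pairs) / max(1, len(pairs))
```

A real watermark operates on the model’s probability distribution over tokens rather than a fixed candidate list, and uses proper cryptography, but the statistical logic—bias the sampling with a secret key, then count—is the same. It also shows why the scheme needs “a few hundred tokens”: the detector is a statistical test, and short texts don’t accumulate enough signal.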
I’ll come back to this last point later because it’s a matter of time before open-source companies catch up—and, despite open-source’s virtues, it’s not always a good thing if done carelessly.
I’ll end this section with Aaronson’s reflections on AI and writing style:
“…I can think of writers—Shakespeare, Wodehouse, David Foster Wallace—who have such a distinctive style that, even if they tried to pretend to be someone else, they plausibly couldn’t. Everyone would recognize that it was them. So, you could imagine trying to build an AI in the same way. That is, it would be constructed from the ground up so that all of its outputs contained indelible marks, whether cryptographic or stylistic, giving away their origin. The AI couldn’t easily hide and pretend to be a human or anything else it wasn’t.”
Maybe this is the goal we should strive for: the birth of the AI stylema.
The watermarking scheme may change how we view ChatGPT
Although Aaronson’s long-term vision is appealing, a “statistical watermark” isn’t the same thing as having an indelible style. However, given that the goal is finding a way to recognize text from ChatGPT (or other language models), it’d suffice.
The watermark could resolve ChatGPT’s most dangerous hypotheticals. If OpenAI decided not just to implement it but to share the key with institutions, companies, and universities (I don’t know whether they plan to), they could prevent harmful uses.
But people may not like it.
ChatGPT amazed people despite its potential downstream issues. That’s because it allows them to do so much without coming across—or realizing they’ve come across—the model’s deficiencies.
For instance, an interesting, non-harmful application of ChatGPT is creativity enhancement. The creative process is by definition boundless—there’s no inherent wrongness—so people may transform what otherwise are critical flaws into a new form of co-creativity. Other popular uses people have found are letting ChatGPT take over parts of their jobs or replacing Google search (hopefully, with subsequent fact-checking).
The watermark wouldn’t hinder any of that as long as the use of these AI systems becomes generally accepted (ChatGPT’s products may be of lower value otherwise) and we learn to fit them into our current workflows (I use Grammarly to correct writing and it doesn’t feel like cheating in any sense).
But I don’t think this will satisfy people. Users don’t like to be told what they can or can’t do. For instance, they don’t like OpenAI’s safety filters—it was one of the first features people attacked (I’ve found it annoying at times but I get the reasons why they exist). They want freedom of choice, even if their actions could enter illegal or unethical territory.
It wouldn’t surprise me if most people despised the watermark idea.
If we take what’s happened on the visual side of generative AI as a reference, there’s little common ground between advocates of letting people choose individually and supporters of carefully studying the broader implications of the emerging tech.
What happened when OpenAI set up guardrails around DALL-E’s generations? In the name of democracy, Stability.ai redefined the generative AI landscape with the release of the first high-quality open-source AI art model: Stable Diffusion.
History will repeat itself. Open source will, once again, be the main character.
Open source: hero or villain?
An open-source ChatGPT would allow people to maximize its potential without restrictions of any kind. Many would happily choose freedom, even at a high cost. As soon as an open-source version of ChatGPT comes out, everyone will turn their backs on OpenAI.
It may still take some time, though. CarperAI (a group within Stability.ai) recently said that building an open-source ChatGPT would require more resources than they possess.
But, whenever it happens, a paid, proprietary, watermarked ChatGPT won’t be able to compete with a “free” (running models on GPUs still costs money), open-source, unwatermarked version—even one of lower quality.
My current concerns—temporarily allayed by OpenAI’s watermarking solution—will surface again.
By then, this powerful technology will be out of control—not just free of any hypothetical watermark, but also free of the strong filters OpenAI has set up so ChatGPT is better aligned with humans and less prone to generate misinformation and make up facts.
In case it doesn’t look like it, I want to note that I’m generally in favor of open source. I don’t think this groundbreaking, world-changing technology should be proprietary. Yet I think that mindlessly open-sourcing a ChatGPT-like model wouldn’t just fail to help with that problem—it would add others on top.
Unless we work to solve the tech’s flaws from the ground up, I don’t think it’ll ever be ready to be, free of ties, in the hands of anyone. Premature open-sourcing could be worse than privacy and control.
The good aspects of making ChatGPT open-source wouldn’t compensate for the problems it’d create.
I believe that openness is preferable to privacy and control and, at the same time, that safety should be the top priority. As I wrote in an essay on this topic:
“There’s a huge array of options between absolute privacy/control and absolute openness. It’s somewhere in there that we’d find the best approach for large AI models. Creating ethics review mechanisms that ensure only good intentions in downstream uses are allowed is a great idea.
A firewall is better than no wall until better measures, methods, or mechanisms are in place. In the end, the medical community wouldn’t publish the chemical data of a deadly virus in the name of open source, and the physics community wouldn’t publish exact instructions to build a hydrogen bomb in the name of open source.
We should never forget that AI technology is powerful, more so thanks to the internet, and it should be constrained to the same rules that apply to other established fields. The space of non-accountability it enjoys now is an anomaly, a by-product of the speed at which it develops, but in no way desirable. The freedom some think they have now can easily be the suffering of others.”
It’s paramount that OpenAI implements the watermark. It wouldn’t hinder use; it would simply reveal ChatGPT’s involvement in any given piece of work. I don’t see how that can be bad, except for people with malicious intentions—or people who believe their freedom is above other people’s wellbeing.
Open source is critical for the democratization of technology, but we always have to weigh the upsides against the downsides and make sure the balance favors openness. For now, even in the hands of a private company, ChatGPT’s potential harm is moderately under control. It won’t be if, before its intrinsic flaws are solved, someone releases an open-source version.
I won’t end this piece on a high note.
I have to confess that I defend this stance out of responsibility, not out of hope. I don’t think open-source movements can be stopped (maybe just for a year or two) and I think—much to my regret—that it’d be better if we, for now, let OpenAI control ChatGPT.
Eventually, open-source AI will take us into uncharted territory. We may have to say goodbye to our ability to trust the written word. We’ll have to redefine our sha(tte)red reality from there.