The AI-Generated Education Issue That No One Detected (Double Pun Intended)
Let's end the conversation on AI detectors once and for all
Here’s an opinion I hold, apparently unpopular among college professors: Falsely accusing a student of cheating with AI is way worse than being tricked by every student who actually does.
Here’s another one, apparently unpopular among AI detector builders: A detector that works most of the time but not always (without us knowing a priori which way the coin will fall) is way worse than one that doesn’t work at all.
Am I alone in thinking this?
Teachers are giving in to the easier option, possibly under the illusion that AI text detectors like GPTZero are good enough (I use “AI detector” here to mean an AI tool designed to distinguish AI-made from human-made text).
They are rightfully anxious and worried: The school year starting this fall will be harder than usual; no one can predict the impact generative AI will have on education. We are all figuring it out on the go.
This preemptive action by teachers—indiscriminately using faulty detection tools at their disposal—however well-intended, is just adding yet another problem on top of the upcoming AI-cheating epidemic.
The systematization of this issue, as anecdotal accounts suggest is already happening, is not only a profound social and educational failure but truly frustrating and disheartening for students who are willing, against peer pressure, to do things the right way (not just to abide by the rules but first and foremost to learn).
But, shouldn't AI detectors detect AI? Why is any of this happening in the first place?
Teachers: Don’t use AI detectors
Let's begin with the obvious but again, maybe not-so-obvious, hard truth: We can't trust AI detectors and they shouldn’t even be commercialized or promoted.
There’s an urge to find a remedy to the wave of AI tools that are making essay writing—well, pretty much any writing task that involves generation and creativity—obsolete. So it’s understandable that teachers have found relief in AI detectors.
In some cases, they are well aware detectors aren’t perfect and, after an appropriate cost-benefit analysis, consider it worth taking the risk of some false positives (i.e., unwarranted accusations) if that ensures they get no false negatives (i.e., no cheater goes unpunished). But I'm sure—gut feeling, no data—that this is the minority; in most cases, teachers are naively well-intentioned: They simply fail to factor in the possibility that AI detectors may be flawed, not flawless.
This is partly the teachers’ fault for not making the extra effort to understand the technology they’re using, but also the fault of the media and detector-making companies, which tend to exaggerate the abilities of these detectors in an attempt to both win the fight over our attention and convince us that the problem of AI-generated text flooding our channels of information isn’t that serious. And, of course, teachers really do wish detectors worked perfectly.
I think it's safe to assume that most teachers don't just want detectors to catch AI cheaters but to do so infallibly and reliably. That's, of course, a laudable position, but not as achievable as it may seem from the superficial and logical-but-wrong assumption that the semantic relation between “generator” and “detector” implies equal but opposing capabilities.
Not only are generators a few steps ahead of detectors technologically, but the task they perform becomes acceptable, quality-wise, at a much lower threshold. The precision required for a detector to be considered a usable tool is nearly 100%; we’re much more generous with generators and their confabulatory creativity.
So, the answer is a hard NO. AI detectors are not a solution. They don’t always catch cheaters. They sometimes falsely accuse non-cheaters. And there's no way around this, as we’ll soon see. They work way, way worse than generators at the task they’re supposed—and badly designed—to do.
All attempts to build an AI detector have failed
This didn’t stop researchers from trying, though. After ChatGPT was released and in parallel to reasonable worry, anticipation for a solution grew among teachers, fueled more by the anxiety of not being able to tell apart AI-made from human-made writing than by the promised efficacy of the solution.
After testing several approaches, the worst omens came true and the evidence settled on the unreliability and inaccuracy of detectors: neither universities, companies, nor independent researchers have managed, in the seven months since OpenAI turned the world upside down, to create an AI detector that matches the quality and ability of AI generators, even those much worse than ChatGPT.
(If you want a more in-depth analysis of how exactly detectors work and fail, I recommend checking this overview by AI researcher Sebastian Raschka, where he reviews the four main types of detectors and explains how they differ. For a hands-on assessment, I loved this article by Benj Edwards on Ars Technica.)
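To give a flavor of how the most common “zero-shot” detectors work under the hood, here’s a toy sketch in Python. It stands in for a language model’s perplexity score with a character-bigram model trained on a tiny reference corpus; real detectors use a full LLM, so everything below—the corpus, the threshold, the function names—is an illustrative assumption, not anyone’s actual implementation:

```python
import math
from collections import Counter

def bigram_model(corpus: str):
    """Character-bigram log-probabilities with add-one smoothing
    (a toy stand-in for a language model's next-token probabilities)."""
    pairs = Counter(zip(corpus, corpus[1:]))
    firsts = Counter(corpus[:-1])
    vocab_size = len(set(corpus))
    def logp(a: str, b: str) -> float:
        return math.log((pairs[(a, b)] + 1) / (firsts[a] + vocab_size))
    return logp

def avg_neg_logp(text: str, logp) -> float:
    """Mean negative log-probability, a crude perplexity proxy:
    lower means the model finds the text more predictable."""
    scores = [-logp(a, b) for a, b in zip(text, text[1:])]
    return sum(scores) / len(scores)

# Tiny 'human' reference corpus (hypothetical; real detectors use an LLM).
reference = "the quick brown fox jumps over the lazy dog and runs away"
logp = bigram_model(reference)

def looks_ai_generated(text: str, threshold: float) -> bool:
    """Zero-shot detection heuristic: flag text whose perplexity proxy is
    suspiciously LOW, i.e., text the model itself finds highly predictable."""
    return avg_neg_logp(text, logp) < threshold
```

The core idea is the same at scale: text a model finds too predictable gets flagged as machine-written, which is exactly why polished or formulaic human writing can trigger false positives.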
Five researchers from the University of Maryland published on June 28th what is possibly the most important study on the topic so far. Here’s their dismal conclusion [paragraph breaks mine]:
“In this paper, both empirically and theoretically, we show that several AI-text detectors are not reliable in practical scenarios.
Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to evade paraphrasing attacks, are still vulnerable to recursive paraphrasing.
We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases.
For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier.”
AI text generators like ChatGPT are trained, then fine-tuned, and then reinforced on human-written text, so as they improve, it’s expected that some human-written text will inevitably be flagged as AI-written and vice versa. (Isn’t that exactly what companies like OpenAI and Google intended: that we couldn’t tell AI and human writing apart?)
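The paper’s “impossibility result” is easy to illustrate with a toy model of my own (a simplification, not the authors’ formal construction): treat detector scores for human and AI text as two overlapping Gaussian distributions and watch what happens to the best possible classifier as they converge.

```python
import math

def best_detector_accuracy(mu: float) -> float:
    """Bayes-optimal accuracy at telling 'human' scores ~ N(0, 1) from
    'AI' scores ~ N(mu, 1) with equal priors: threshold at mu / 2,
    accuracy = Phi(mu / 2), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(mu / 2.0 / math.sqrt(2.0)))

# As the AI score distribution converges to the human one (mu -> 0),
# even the best possible detector degrades toward coin-flip accuracy.
for mu in (2.0, 1.0, 0.5, 0.1):
    print(f"separation {mu:.1f} -> best possible accuracy "
          f"{best_detector_accuracy(mu):.3f}")
```

As the separation shrinks—that is, as generators get better at imitating humans—the optimal detector’s accuracy slides toward 0.5, a coin flip, no matter how cleverly it’s built.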
Watermarking doesn’t work either
Maybe the solution lies in making AI-generated text intrinsically detectable by algorithms. Can AI tools be watermarked? Can companies design a sort of digital styleme into their products? As I wrote on December 2, just three days after ChatGPT was released:
“… human writing has characteristics that can, using the right tools, reveal authorship. As LMs become masters of prose, they may develop some kind of writing idiosyncrasy (as a feature and not a bug).
Maybe we could find the AI’s styleme (like a fingerprint hidden in language) not simply to distinguish ChatGPT from a human, but to distinguish its style from all others.”
But again, no luck. From the U. of Maryland study:
“We show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden LLM text signatures and add them to human-generated text to be detected as text generated by the LLMs, potentially causing reputational damage to their developers.”
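To see why such spoofing is possible, consider a minimal “green list” watermark of the kind these schemes use: the generator hashes the previous token to split the vocabulary in half and then favors one half, and the detector simply counts how often that happened. The ten-word vocabulary and the specific hashing scheme below are toy assumptions for illustration:

```python
import hashlib
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "slow"]

def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Deterministically split the vocabulary, seeding the shuffle with a
    hash of the previous token; a watermarking generator favors this half."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    shuffled = VOCAB[:]
    random.Random(seed).shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * fraction)])

def green_fraction(tokens: list) -> float:
    """Detector: how often is each token on its predecessor's green list?
    Unwatermarked text hovers near 0.5; watermarked text scores much higher."""
    hits = sum(t in green_list(p) for p, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

def spoof(start: str, length: int) -> list:
    """Spoofing attack: an adversary who has inferred the hidden scheme
    picks only green tokens, so arbitrary text 'detects' as watermarked."""
    out = [start]
    for _ in range(length):
        out.append(sorted(green_list(out[-1]))[0])  # any green token works
    return out
```

Once an adversary reverse-engineers the green lists—which the study shows is feasible—they can hand-pick green tokens so that human-composed text “detects” as watermarked, which is exactly the reputational attack the authors describe.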
Despite the discouraging evidence, AI companies are reportedly committed to making AI-generated content internally detectable (OpenAI tried this initially but, as far as I know, they didn’t achieve the desired results).
Their incentive is strong: they need human-made text data to train their future models. If they can't distinguish the output of their own models from the rest of the internet, they risk heading toward a “model collapse” scenario in the not-so-far future. They already use synthetic data, but it only works if it's highly curated and high-quality.
Random AI-generated text spread all over the internet by millions of daily users is bad for teachers but also for the companies making the models. Yet, regardless of such a strong incentive to get it right, not even the most talent-dense, deep-pocketed, AI-savvy companies have been capable of solving this problem infallibly and reliably (OpenAI removed its AI detector for lack of accuracy, which I consider the adequate move; the page now returns a “not found” error).
We were all counting on AI detectors to stop the seemingly unstoppable great AI flood, and teachers in particular to prevent the AI-cheating epidemic, but I think it’s safe to say that we can dismiss them.
As Wharton professor and AI enthusiast Ethan Mollick told Ars Technica:
“I can speak from the perspective of an educator working with AI to say that, as of now, AI writing is undetectable and likely to remain so, AI detectors have high false positive rates, and they should not be used as a result."
But avoiding a bad strategy is not a strategy: what can teachers do to prepare for the AI-driven forced overhaul of the education system? I will explore this question in my next article, as a follow-up to this one.