13 free AI detectors tested, why they don’t work, and what to do instead

Reading Time: 9 minutes

(Image by Maria Korolov via Midjourney.)

In the battle against AI-generated spam, some people are turning to AI detectors to weed out AI-generated content.

Of the 13 detectors I tried, only two — Hive Moderation’s Text Detection and Eric Mitchell’s DetectGPT — were able to consistently identify both human and AI-generated text. Of course, since the detectors are free, the spammers can generate new variations on their text until they get something that passes.

Instead of trying to filter out AI-generated content, focus on publishing content that can’t be generated by a simple AI query. After all, if an AI can generate an article, then your readers can just go to the AI and get that same article from its source. And, once the AI is built into their search engines, they will do just that.

Similarly, if you’re an educator, don’t assign homework that can be answered by an AI. Do better.

I’m going to list the top free AI detectors, run a few AI-generated and human-generated stories through them, and see how they do. That’s just in case you’re not convinced, and think that AI detectors have some use.

Second, I’m going to explain what publications can do instead, to provide value beyond what the AI can provide.

AI detectors and human text

First, I sent pieces of a short story I wrote, myself, from scratch, through each of the AI detectors. Then I tried out several pieces of writing created by ChatGPT in the same style.

I used my story Gift of the Meatheads as the human example.

The results:

Content at Scale’s AI Detector: “highly likely to be human.”
Copyleaks’ AI Content Detector: “This is human text.”
Corrector’s AI Content Detector: “Fake: 0.02%”
Crossplag’s AI Content Detector: “mainly written by a human.”
Eric Mitchell’s DetectGPT: “unlikely to be from GPT-2.”
Giant Language Model Test Room: Mostly highlighted in green and yellow, which means AI-generated content.
GPTZero: “may include parts written by AI.”
Hive Moderation’s Text Detection: “not likely to contain AI generated text.”
Kazan AI GPT3 Detector: “Probably 98.56% real.”
OpenAI’s AI Text Classifier: “very unlikely AI-generated.”
Originality.AI: “97% original.”
Sapling’s AI Content Detector: “Fake: 82.8%”
Writer.com’s AI Content Detector: “100% human-generated content.”

So the results mostly lean towards human, but a couple of the apps said it was fake, and a couple said it was partially AI generated.

That’s in line with OpenAI’s own published results — they said that their checker gives 9% false positives.

AI detectors and ChatGPT text

I then fed the start of my story into ChatGPT, asked it to analyze the writing style and gave it some instructions for writing a new scene.

Here’s the beginning of what it wrote:

Mitchell let out a deep sigh as he plopped down in his comfortable chair, feeling the familiar weight of his grandfather’s gaze upon him from the portrait on the wall.
“You know, grandpa,” he said, leaning back and closing his eyes, “I think I need to get out more. I feel like I’m not really living life to the fullest.”
His grandfather’s AI-generated face scowled. “What do you mean, not living life to the fullest? You’ve got everything you need right here, with me.”
Mitchell rolled his eyes. “Come on, grandpa, you know what I mean. I’m not getting any younger, and I don’t want to wake up one day and regret not experiencing more of the world.”
His grandfather’s expression softened slightly. “I understand where you’re coming from, kid. But trust me, the world out there can be a cruel and heartless place. Better to stay safe and warm at home with your family.”
Mitchell shook his head. “But I don’t have a family, grandpa. Not really. It’s just me and you.”

I copy and pasted the raw output from ChatGPT into the same detectors as I used above:

Content at Scale’s AI Detector: “likely both AI and human.”
Copyleaks’ AI Content Detector: “AI content detected.”
Corrector’s AI Content Detector: “Fake: 0.02%”
Crossplag’s AI Content Detector: “mainly written by an AI.”
Eric Mitchell’s DetectGPT: “likely to be from GPT-2.”
Giant Language Model Test Room: Mostly highlighted in green and yellow, which means AI-generated content.
GPTZero: “may include parts written by AI.”
Hive Moderation’s Text Detection: “likely to contain AI generated text.”
Kazan AI GPT3 Detector: “Probably 94.71% fake.”
OpenAI’s AI Text Classifier: “very unlikely AI-generated.”
Originality.AI: “100% original.”
Sapling’s AI Content Detector: “Fake: 0.0%”
Writer.com’s AI Content Detector: “100% human-generated content.”

As you can see, the results are all over the place, even though the text was 100 percent AI-generated. Sapling knew it was fake — but it also classified the human-written text as fake, too.

Then I asked ChatGPT to rewrite the text with a little bit more humor and cynicism.

Here were the results:

Content at Scale’s AI Detector: “highly likely to be human.”
Copyleaks’ AI Content Detector: “This is human text.”
Corrector’s AI Content Detector: “Fake: 58.98%”
Crossplag’s AI Content Detector: “mainly written by a human.”
Eric Mitchell’s DetectGPT: “likely to be from GPT-2.”
Giant Language Model Test Room: Mostly highlighted in green and yellow, which means AI-generated content.
GPTZero: “may include parts written by AI.”
Hive Moderation’s Text Detection: “likely to contain AI generated text.”
Kazan AI GPT3 Detector: “Probably 99.84% real.”
OpenAI’s AI Text Classifier: “unclear if it is AI-generated.”
Originality.AI: “84% original.”
Sapling’s AI Content Detector: “Fake: 100.0%”
Writer.com’s AI Content Detector: “100% human-generated content.”

AI detectors and ChatGPT-assisted text

Finally, I took the same text produced by ChatGPT and edited it to sound a bit more like my voice. Mostly, I took out most of the adjectives and adverbs and replaced the sappy stuff with something more bitter and cynical.

This is how most writers — most real writers, that is, not spammers — will be using ChatGPT and other AI writing assistants. They’ll either use it to generate rough drafts as a starting point for their own writing or to clean up their own, human-written text.

Here are the results:

Content at Scale’s AI Detector: “likely both AI and human.”
Copyleaks’ AI Content Detector: “AI content detected.”
Corrector’s AI Content Detector: “Fake: 96.72%”
Crossplag’s AI Content Detector: “mainly written by an AI.”
Eric Mitchell’s DetectGPT: “likely to be from GPT-2.”
Giant Language Model Test Room: Mostly highlighted in green and yellow, which means AI-generated content.
GPTZero: “Your text may include parts written by AI.”
Hive Moderation’s Text Detection: “likely to contain AI generated text.”
Kazan AI GPT3 Detector: “Probably 97.66% real.”
OpenAI’s AI Text Classifier: “unlikely AI-generated.”
Originality.AI: “84% original.”
Sapling’s AI Content Detector: “Fake: 0.4%”
Writer.com’s AI Content Detector: “100% human-generated content.”

Again, all over the place. For some of them, the AI scores actually went up after I rewrote the section.

Of course, real spammers aren’t going to bother to put in the work to rewrite the text, right?

No, they won’t put in the work — not manually, at least. They’ll use tools specifically designed for spammers.

Sorry, for “content marketers.”

These tools suggest story ideas to you, generate the stories, fill them full of search-engine friendly keywords, then run them against all the detectors out there to make sure they pass.

Yup. The tools automatically run the stories through the detectors, and modify them until they can pass for human in all the detectors, while still keeping the text sounding natural. I’m not going to list the tools here. The spammers — I mean, “content marketers” — already know what the tools are, and if they don’t, I don’t want to give them any ideas.

If you do want to use these detectors, keep in mind that the results seem to be almost random — and the bad guys have access to them, as well.

Best results: Hive Moderation’s Text Detection, Eric Mitchell’s DetectGPT.

Mixed results: GPTZero, Giant Language Model Test Room, Crossplag’s AI Content Detector, Corrector’s AI Content Detector, Copyleaks’ AI Content Detector, Content at Scale’s AI Detector.

Worst results: Sapling’s AI Content Detector, Kazan AI GPT3 Detector, OpenAI’s AI Text Classifier, Originality.AI, Writer.com’s AI Content Detector.

Keep in mind that just because a particular detector worked or didn’t work for me, doesn’t mean it will be accurate on future results. All of the 13 detectors on this list are free, which means that the spammers can use them as well, and keep regenerating text until they get something that passes.

Of course, my results are on a small scale and anecdotal. Maybe someone else would have better results with these detectors? Well, a team of researchers at the University of Maryland published a paper in March where they conducted empirical testing, and also evaluated the underlying technologies. Their conclusion? “It is impossible to have a detector that is always reliable for detecting AI-generated text.”

For a sufficiently good language model, even the best-possible detector can only perform marginally better than random, they said. And watermarking schemes are also doomed to fail because it’s easy enough to modify text just enough to break the watermark.

What to do instead

If you’re an educator, figure out another way to test your students’ knowledge that doesn’t involve at-home essays.

Essay-writing is about to become a skill of the past, just like handwriting and long division. Yes, students still need to learn how to do it. But there’s no need to keep practicing it over and over again because it’s not a skill that they’re going to need to use a lot after graduation. They’ll have apps that do it for them.

Sure, some people will learn the skills because they enjoy them. There are entire YouTube channels out there with people doing calligraphy and creating photo-realistic drawings, two other skills that modern technology has made obsolete. One of my favorite channels, Primitive Technology, features a barefoot guy in Australia who lights fires by rubbing two sticks together.

Instead, focus on higher-order learning and more complex skills, things that ChatGPT can’t do for you. And if you can’t think of anything that ChatGPT can’t do, then teach students how to use ChatGPT.

And if they already know how to use ChatGPT better than the teachers do, because they figured it out the day it was released so they could cheat on their homework, well, I don’t know what to say. Maybe… bring back home economics?

If you’re an editor at a non-fiction publication, you’re probably already used to seeing spammy “guest post requests” in your inbox.

As the editor of Hypergrid Business, I get dozens a day. I mark them as spam and delete them.

In order to get through to me, a writer has to do three things:

Show that they’ve read our guidelines
Demonstrate some actual expertise in our subject area, which is OpenSim, VR and AI
Have published articles in which they’ve interviewed real people or did other actual original reporting — or have an interest in learning how to do this

I also recommend that writers follow my PEANUT principles — Personal, Emotional, Authoritative, Novel, Unique, and Trustworthy — which are an expanded form of Google’s EEAT principles for content ranking and AI-generated text.

Default ChatGPT-generated text doesn’t offer new value. It’s just a retread of everything written before. In this, it’s no different than the “content farm” filler content we’ve been seeing for the last two decades. And, before that, we had vendor-generated text that was barely above advertorial.

For editors, this kind of content is appealing because it’s free and inoffensive. Plus, sometimes, you even get paid to run it.

But it destroys the credibility of your publication with readers, and, over time, search engines catch on and start penalizing you. Plus, these days, people who want this kind of generic content willcan just get it from the chat built right into the search engine.

Once you lose credibility, you have to work even harder to get back in everyone’s good graces. Often, you can’t.

I’ve seen a number of technology publications go into death spirals by accepting vendor copy to jack up revenues, seeing their readership drop, and resorting to even more vendor copy in order to pay the bills. It’s a vicious cycle.

It’s like when you’re hungry and eat something carb-heavy to quiet the rumbling in your stomach — and find yourself even hungrier an hour later. Next thing you know, you’ve gained 100 pounds. You go on a crash diet, lose it all, but now your metabolism is even more wacked and you gain it all back and then some. Yeah, that happens to tech publications a lot. But that first hit of the sugar-glazed donut — I mean, “contributed content” — is so, so sweet. And those SEO keywords are so, so tempting.

Anyway, the more experienced editors know to stay away from sugary junk like this and only use trusted writers with deep industry sources and the ability to find and report actual news.

According to Digiday, lifestyle publishers like Bustle Digital Group and Leaf Group are already moving resources away from SEO-driven content into original and personal stories.

If you’re an editor at a fiction publication, well, this is all going to be new to you. You’re going to have more in common with educators than non-fiction editors, in that you’ll now be getting substance-free submissions you have no experience dealing with that are created with a tool that you yourself don’t know how to use.

I recommend that fiction editors look at my PEANUT Policy for fiction writers. The writing should be personal, emotional, novel — all the usual things.

And not worry so much about whether it was AI generated, or how much AI assistance the writers got. They could be using AI to write a rough draft that they themselves re-write in their own voice. Or they could be writing the rough draft, and having the AI clean it up. Or they could be using AI in a million other ways. Where do you draw the line?

And there’s no way to tell, anyway, since the detectors are pretty much useless and the commercial spam tools test their text against the detectors anyway. And if you trust writers to self-report — well, as Clarkesworld just found out, the liars will lie.

As fiction magazine editors, we might have to just learn to live with a constant flood of spammy submissions or set up paywalls to keep out the bots. Or maybe we will create other hoops for submitters to jump through to bring the number of submissions down to a manageable level.

I sincerely hope that no publications shut down as a result. As bot-generated content proliferates, and AI content farms buy up all the ads and floor social media platforms, writers will have to work harder to find their audiences.

Sci-fi and fantasy magazines will play a critical role in helping readers discover great new voices. The appetite for speculative fiction is at an all-time high, and the number of publications is woefully inadequate. We can’t afford to lose any of them.

Maria Korolov

MetaStellar editor and publisher Maria Korolov is a science fiction novelist, writing stories set in a future virtual world. And, during the day, she is an award-winning freelance technology journalist who covers artificial intelligence, cybersecurity and enterprise virtual reality. See her Amazon author page here and follow her on Twitter, Facebook, or LinkedIn, and check out her latest videos on the Maria Korolov YouTube channel. Email her at maria@metastellar.com. She is also the editor and publisher of Hypergrid Business, one of the top global sites covering virtual reality.

13 free AI detectors tested, why they don’t work, and what to do instead

AI detectors and human text

AI detectors and ChatGPT text

AI detectors and ChatGPT-assisted text

What to do instead

Maria Korolov

Related

Leave a Comment Cancel Reply

WEEKLY READER NEWSLETTER

AI detectors and human text

AI detectors and ChatGPT text

AI detectors and ChatGPT-assisted text

What to do instead

Maria Korolov

Share this:

Related

Leave a Comment Cancel Reply