
AI By Eye: Improving Your AI Writing Detection Skills—And Where Detection Tools Fit


By Max Spero, CEO and Co-Founder, Pangram

When it comes to checking whether a student assignment is original work, teachers often ask us whether it is best to scan the assignment for AI first and then have the teacher review it, or to have the teacher review it first and then run it through an AI detection tool. In principle, either order works, but recent research from the University of Maryland offers an interesting take on the relationship between teachers, student writers, AI-generated content, and tools, including so-called plagiarism checkers.

At Pangram, our initial advice is: look at the work first, then use a review tool. That way, an originality score provided by the technology does not influence your human judgment. But with the right process, there is no harm in doing it the other way around: scan first, then assess. If the tool returns a high probability of AI use, read that paper a little more thoroughly. Of course, this only makes sense if the tool you rely on has virtually no false positives.

Over the past two years, my team and I have talked with hundreds of professors and college and university leaders about the use of AI, its impact on learning and academic integrity, and the technology and techniques for detecting AI use in writing. Most have been surprised to learn that people can actually spot text created by AI rather accurately. The paper, from researchers at the University of Maryland, Microsoft, and the University of Massachusetts Amherst, may shed light on why some people know when a writer is trying to pass off writing by an LLM as their own.

Yes, you can detect AI by eye—5 lessons from the UMD research

The preprint (pending peer review) describes research in which University of Maryland researcher Jenna Russell and her team hired nine native English speakers online to annotate articles of under a thousand words each, either human- or AI-written, assigned at random. The human-written articles were sourced from renowned American publications such as The New York Times, National Geographic, and AP, while the AI-written articles were generated to match the length and structure of the originals and kept their titles and subtitles. For some of the experiments, the AI-generated content was also put through paraphrasing and “humanization” prompts. (The prompts used in the research, including the generation prompt, are available in Russell’s GitHub repository.) Each rater then had to provide a verdict, a 1-to-5 confidence score, a couple of supporting highlights as pieces of evidence, and a short paragraph justifying the call. The annotators produced a total of 1,790 verdicts on 300 articles, mostly from five annotators under a within-subjects design. Five AI detectors (GPTZero, Fast-DetectGPT, Binoculars, IBM RADAR, and Pangram) were also used to generate verdicts.

Preprint: People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text (arXiv:2501.15654)

The UMD study found that annotators who are regular users of LLMs like ChatGPT were nearly infallible at spotting AI in other people’s work. In fact, the humans who participated in the research had a better accuracy rate than every AI detection technology in the study except one. The detector that matched the accuracy of the human annotators was ours, Pangram. I believe this is because we designed our AI writing detector to replicate, as closely as possible, the steps an effective human reviewer takes when detecting generative AI.

Here are five lessons, with practical tips, that we humans, particularly educators, can apply when considering the source of a piece of text.

Lesson 1: Review the text with the specific intention of looking for AI

The annotators were instructed to specifically look for AI-generated writing. Their assignment was to separate the authentic from the imitation. Other research has shown that educators are unlikely to catch AI if they are not specifically looking for it, or perhaps if they are not quite familiar with what AI text feels like.

The tip: Be on the lookout for GenAI, not to be suspicious by default, but to have a more refined evaluation technique. When reading, practice looking for AI writing, and start to get familiar with the differences. This is where running papers through a detector after your eyeball review helps.

Lesson 2: Get a second (or third) opinion from colleagues or from AI

Annotators were not acting alone. The researchers asked them to review submissions and then give a verdict of AI or not AI. A final verdict was then produced by a majority vote of the five human experts. Having multiple sets of eyes improved accuracy.
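
Aggregating verdicts this way is straightforward. Here is a minimal sketch in Python of a majority vote over annotator verdicts; the labels and sample data are hypothetical illustrations, not drawn from the study’s dataset:

```python
from collections import Counter

def majority_verdict(verdicts):
    """Return the most common verdict among annotators, e.g. 'AI' or 'Human'."""
    label, _count = Counter(verdicts).most_common(1)[0]
    return label

# Hypothetical example: five annotators review the same article.
annotator_verdicts = ["AI", "AI", "Human", "AI", "Human"]
print(majority_verdict(annotator_verdicts))  # prints "AI"
```

With an odd number of annotators, as in the study, a simple majority always produces a single verdict with no ties.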

The tip: Ask a colleague or a well-seasoned TA to read the papers you’ve reviewed. This isn’t necessarily practical in every setting, but if the stakes are high and a handful of papers are in question, having another person’s perspective can be critical. Equally helpful is a solid AI detector. Do your research and find one that suits your needs, and ideally provides independent, third-party evidence regarding its accuracy.

Lesson 3: Use LLMs and get familiar with how they work

The annotators were frequent users of LLMs. They knew what AI text sounded like because they had spent time getting to know the models and their output. Generative AI models use distinctive patterns in words, phrases, sentences, and overall structure. They are not particularly creative when it comes to connecting ideas. Their logic is not always sound, and even when it is, it has recognizable features. We’ve shared many of these tips in this useful guide to spotting AI writing patterns.

The tip: Ask ChatGPT, or the AI of your preference, to do the assignments you give your students, several times, to see the patterns you can then check for in submissions. Vary the prompts to make the output more formal, more conversational, or less academic, or even to add spelling errors. Over time this will help you gain a sense of common AI artifacts and make them easier to spot in student work.
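
If you prefer to do this systematically rather than one chat at a time, a short script can generate several variants of the same assignment at once. The following is a minimal sketch, assuming you have the OpenAI Python SDK installed and an API key configured; the assignment text, prompt styles, and model name are placeholders to adapt to your own course:

```python
from openai import OpenAI  # assumes: pip install openai, OPENAI_API_KEY set in the environment

client = OpenAI()

# Placeholder assignment and style variations; substitute your own.
ASSIGNMENT = "Write a 500-word essay on the causes of the French Revolution."
STYLES = [
    "Answer in a formal academic tone.",
    "Answer in a casual, conversational tone.",
    "Answer like a rushed undergraduate and include a few spelling errors.",
]

# Generate one response per style so recurring patterns stand out side by side.
for style in STYLES:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": style},
            {"role": "user", "content": ASSIGNMENT},
        ],
    )
    print(f"--- {style} ---")
    print(response.choices[0].message.content)
```

Reading a handful of these outputs side by side makes recurring phrasings and structures far easier to notice than reviewing a single sample.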

Lesson 4: Be clear when and how AI is allowed in assignments and assessments, and be specific about the consequences

For assignments in which AI use is limited or prohibited, if you have been learning about AI and plan on getting a second or third review, we also highly recommend telling your students that you are doing so. Clearly setting expectations and letting students know that you are checking, and even how you are checking, can deter unauthorized AI use in the first place.

The tip: Engage in real conversations about AI with your students, and set clear rules and boundaries on when generative AI content is allowed. Many institutions are implementing an ethics pledge for students.

Lesson 5: If using an AI detector, beware of those connected to humanizers and other cheating tools

Not all AI detectors are good. Not all of them are even designed to detect AI accurately. Some provide misleading results, either unintentionally or by design.

The tip: Stay away from AI detectors from companies that also sell services to improve, fix, or modify writing, as these are “humanizers.” They sell “correction” or “improvement” services only when they detect a high amount of AI. Not only do these companies want to scare students into using their services, they are cheating services in disguise, and they may also be using student work for their models without notice, let alone compensation.

To sum up

If you’re a teacher or a professional involved in assessment or text review duties, we suggest learning all you can about how GenAI and LLMs work, how AI processes and produces assignments, and how it answers questions. The goal is to be in a position where you can confidently trust your gut based on solid experience, while having a source for second or even third opinions.

It may feel counterintuitive for me, as CEO of an AI writing detection company, to talk about humans detecting AI. The fact of the matter is, we want to know as much as possible about the real and very human relationship between teacher and student, and how it is reflected in writing education. Our vision is to become an ally of teachers, one that augments their abilities rather than replaces them, and to honor and empower their effectiveness as educators; we believe this is what will set us apart from what the market is offering right now. It is a vision shared by a growing number of users and investors.

About the author

Max Spero is the co-founder and CEO of Pangram. He and his friend and fellow Stanford alum, Bradley Emi, founded their first AI company in 2023. Today, Pangram Labs is the most accurate AI detector on the market. Read a comparison of multiple AI detectors including Pangram here.
