Decoding the Machine.
Understanding the terminology behind Large Language Models helps you better detect and ethically use them.
B. Burstiness
METRIC
Simple Definition
A measure of the variation in sentence structure and length throughout a document. AI-generated text tends to be monotonous (low burstiness). Humans tend to write in bursts: short sentences followed by long, complex ones.
CORE METRIC
While Perplexity measures word choice, Burstiness measures structural rhythm. AI can fake perplexity, but faking burstiness is much harder for current models.
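One rough way to put a number on structural rhythm is the coefficient of variation of sentence length (standard deviation divided by mean). The sketch below is only an illustrative proxy under that assumption. It is not the formula our detector uses, and the function names `sentence_lengths` and `burstiness` are made up for this example.

```python
import re
import statistics

def sentence_lengths(text: str) -> list[int]:
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (illustrative proxy only)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Stop. The storm rolled in over the hills before anyone "
          "had time to close the windows. We ran.")

print(burstiness(uniform))  # identical sentence lengths, so no variation
print(burstiness(varied))   # mixed short and long sentences score higher
```

A naive regex split like this will miscount abbreviations and decimals, so treat the score as a teaching aid rather than a measurement.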
F. False Positive
ERROR TYPE
Simple Definition
An error where the detection tool incorrectly identifies human-written text as AI-generated. This is the worst-case scenario in academic settings.
SYSTEM SAFETY
Our system is tuned to minimize false positives. We’d rather miss an AI paper (False Negative) than accuse an innocent student.
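The trade-off between false positives and false negatives usually comes down to where the decision threshold sits. Here is a minimal sketch with invented scores (0 = human-like, 1 = AI-like). The numbers, the thresholds, and the `error_rates` helper are assumptions for illustration, not our production settings.

```python
def error_rates(scores_human, scores_ai, threshold):
    """Return (false_positive_rate, false_negative_rate).

    A text is flagged as AI when its score is >= threshold.
    """
    false_positives = sum(s >= threshold for s in scores_human)
    false_negatives = sum(s < threshold for s in scores_ai)
    return (false_positives / len(scores_human),
            false_negatives / len(scores_ai))

# Hypothetical detector scores; these values are made up.
human_scores = [0.05, 0.10, 0.20, 0.35, 0.60]
ai_scores = [0.40, 0.55, 0.70, 0.85, 0.95]

for threshold in (0.5, 0.7):
    fpr, fnr = error_rates(human_scores, ai_scores, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.7 removes the one false accusation at the cost of missing one more AI text, which is the direction of tuning described above.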
G. Generative AI (GenAI)
The broad category of Artificial Intelligence that can create new content (including text, images, audio, and code) in response to prompts. Unlike traditional AI, which classifies data (e.g., spam filters), GenAI creates new data.
H. Hallucinations
RISK FACTOR
Simple Definition
When an AI confidently states a fact that is completely false or made up. It’s like an AI “dreaming” or lying because it prioritizes sounding fluent over being factual.
Technical Deep Dive
Hallucinations occur because LLMs do not “know” facts; they predict tokens. If the most probable next token produces a false statement, the model generates it anyway, because it has no built-in mechanism for checking the claim.
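A toy illustration of that mechanism: the lookup table below stands in for a model, and its probabilities are invented purely for the example. Greedy decoding picks the most probable continuation, and nothing in the loop checks whether the result is true.

```python
# Toy next-token table. The probabilities are invented for illustration;
# a real LLM computes them with a neural network over a large vocabulary.
NEXT_TOKEN_PROBS = {
    ("The", "capital", "of", "Australia", "is"): {
        "Sydney": 0.6,    # fluent and plausible, but wrong
        "Canberra": 0.4,  # the correct answer
    },
}

def greedy_next_token(context: tuple[str, ...]) -> str:
    """Return the most probable next token. No fact check happens here."""
    candidates = NEXT_TOKEN_PROBS[context]
    return max(candidates, key=candidates.get)

context = ("The", "capital", "of", "Australia", "is")
print(" ".join(context), greedy_next_token(context))
```

In this made-up table the wrong city wins simply because it was assigned the higher probability, which is the failure mode described above in miniature.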
DETECTION TIP
While our detector focuses on syntax, factual errors are a huge “tell” for human readers. If a paper cites a book that doesn’t exist, it’s a hallmark of AI hallucination.
L. Large Language Model (LLM)
CORE TECH
Simple Definition
A type of AI trained on massive amounts of text data. Think of it as a super-advanced “autocomplete” that predicts the next likely word in a sentence based on billions of examples it has read.
Technical Deep Dive
LLMs utilize the Transformer architecture, which allows them to pay attention to different parts of a sentence simultaneously (self-attention mechanism) to understand context, rather than reading sequentially. Examples include GPT-5, Claude 4, and Llama.
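To make self-attention a little more concrete, here is a stripped-down sketch of scaled dot-product attention in plain Python. It assumes queries, keys, and values are all the raw input vectors. Real Transformers apply learned projection matrices and use multiple heads, which this example leaves out, and the token vectors are made up.

```python
import math

def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors."""
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # Score this token against every token in the sequence at once.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        # Each output is a weighted mix of all the token vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, vectors))
                        for j in range(d)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0]]  # two made-up 2-dimensional token vectors
for row in self_attention(tokens):
    print([round(x, 3) for x in row])
```

Every output vector draws on the whole sequence in one step, which is the "simultaneous" attention the paragraph above refers to.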
WHY IT MATTERS FOR DETECTION
Because LLMs are probabilistic engines, they tend to choose the most statistically probable words. Our detector looks for this “smoothness”—human writing is often jagged and unpredictable compared to an LLM.
M. Machine Learning (ML)
A subset of AI where computers learn from data without being explicitly programmed for every rule. Instead of writing code that says “if X then Y”, engineers feed data to the machine and let it figure out the patterns (the rules) itself.
N. Natural Language Processing (NLP)
FIELD
Simple Definition
The branch of AI focused on helping computers understand, interpret, and manipulate human language. It’s the “bridge” between computer code (0s and 1s) and human speech (words and meaning).
HOW WE USE IT
We use NLP techniques to “read” your essay not just for keywords, but for syntactic structures, grammatical dependencies, and semantic coherence.
P. Perplexity
METRIC
Simple Definition
A measurement of how “surprised” a language model is by a piece of text. Low perplexity means the text is predictable (more likely AI). High perplexity means the text is less predictable and more varied (more likely human).
CORE METRIC
This is one of the two main signals GPTZero uses. We visualize this on the scanner page as the “Complexity” bar.
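In the standard definition, perplexity is the exponential of the average negative log-probability a model assigns to each token. The sketch below assumes you already have those per-token probabilities. The two probability lists are invented, and this is not the scoring code behind the “Complexity” bar.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the tokens."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities a model assigned to each token it saw.
predictable = [0.9, 0.8, 0.9, 0.85]  # the model expected these tokens
surprising = [0.2, 0.05, 0.3, 0.1]   # the model did not expect these

print(perplexity(predictable))  # low: close to 1
print(perplexity(surprising))   # noticeably higher
```

A handy sanity check: if every token has probability 0.5, the perplexity is 2, as if the model were choosing between two equally likely options at each step.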
T. Token
The basic unit of text that an LLM processes. A token is not always a full word; it can be part of a word (like “ing” or “pre”). For example, the word “learning” might be split into “learn” and “ing”.
Fact: 1,000 tokens is roughly equal to 750 words in English. This is why API pricing is often per-token.
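The rule of thumb above makes for easy back-of-the-envelope arithmetic. In this sketch the helper names and the price are hypothetical; check your provider's pricing and tokenizer for real numbers.

```python
WORDS_PER_TOKEN = 0.75  # from the rule of thumb: 1,000 tokens ~ 750 words

def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_cost(word_count: int, price_per_1k_tokens: float) -> float:
    """Rough cost given a (hypothetical) price per 1,000 tokens."""
    return estimate_tokens(word_count) / 1000 * price_per_1k_tokens

essay_words = 1500
print(estimate_tokens(essay_words))       # about 2,000 tokens
print(estimate_cost(essay_words, 0.002))  # the price is a placeholder value
```

Actual token counts depend on the tokenizer and the language, so the estimate can be noticeably off for code or non-English text.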
T. Temperature
A parameter in AI models that controls how random the choice of the next token is. High temperature (e.g., 0.9) makes the AI more creative but more prone to hallucinations. Low temperature (e.g., 0.1) makes it focused and nearly deterministic.
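Under the hood, temperature is commonly applied by dividing the model's raw scores (logits) by the temperature before turning them into probabilities with softmax. The sketch below shows that step with invented logits; the function name is ours, not a library API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens

for t in (0.1, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

At 0.1 almost all of the probability lands on the top-scoring token, while at 0.9 the alternatives keep a real chance of being sampled, which is where the extra variety comes from.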