DICTIONARY V2.0

Decoding the Machine.

Understanding the terminology behind Large Language Models helps you detect AI-generated text and use these models ethically.

B – Burstiness | F – False Positive | G – GenAI | H – Hallucinations | L – LLM | M – Machine Learning | N – NLP | P – Perplexity | T – Token

B. Burstiness

METRIC

Simple Definition

A measure of how much sentence structure and length vary across a document. AI-generated text tends to be monotonous (low burstiness), while humans tend to write in bursts: short sentences followed by long, complex ones.

CORE METRIC

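While Perplexity measures how predictable the word choices are, Burstiness measures structural rhythm. A model can be prompted to use less predictable wording, but sustaining human-like variation in sentence length and structure tends to be much harder for current models.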
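One simple way to put a number on structural rhythm is the coefficient of variation of sentence length (standard deviation divided by mean). The sketch below is an illustrative proxy only, not the formula any particular detector uses; the function names and the naive sentence splitter are assumptions.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split on '.', '!' and '?' and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (an assumed proxy metric)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0


uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = (
    "Stop. The storm that had been building over the hills "
    "all afternoon finally broke. We ran."
)

print(burstiness(uniform))  # every sentence has the same length
print(burstiness(varied))   # short and long sentences mixed
```

The uniform sample scores zero because all three sentences have four words; the varied sample scores higher because a one-word sentence sits next to a thirteen-word one.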

F. False Positive

ERROR TYPE

Simple Definition

An error where the detection tool incorrectly identifies human-written text as AI-generated. This is the worst-case scenario in academic settings.

SYSTEM SAFETY

Our system is tuned to minimize false positives. We’d rather miss an AI-written paper (a false negative) than accuse an innocent student.
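The trade-off between false positives and false negatives can be illustrated with a decision threshold on a detector score. This is a minimal sketch with invented labels and scores; the function names and the thresholds are hypothetical and do not describe how this system is actually tuned.

```python
def false_positive_rate(labels, scores, threshold):
    """Share of human-written texts (label 0) flagged as AI at this threshold."""
    human_scores = [s for y, s in zip(labels, scores) if y == 0]
    flagged = sum(1 for s in human_scores if s >= threshold)
    return flagged / len(human_scores) if human_scores else 0.0


def false_negative_rate(labels, scores, threshold):
    """Share of AI-generated texts (label 1) that slip under the threshold."""
    ai_scores = [s for y, s in zip(labels, scores) if y == 1]
    missed = sum(1 for s in ai_scores if s < threshold)
    return missed / len(ai_scores) if ai_scores else 0.0


# Hypothetical data: 1 = AI-generated, 0 = human-written; scores are made up.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.55, 0.70, 0.60, 0.80, 0.90, 0.95]

for threshold in (0.5, 0.75):
    fpr = false_positive_rate(labels, scores, threshold)
    fnr = false_negative_rate(labels, scores, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.75 removes every false positive at the cost of missing one of the four AI-generated texts.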

G. Generative AI (GenAI)

The broad category of Artificial Intelligence that can create new content, including text, images, audio, and code, in response to prompts. Unlike traditional AI, which classifies data (e.g., spam filters), GenAI creates new data.

H. Hallucinations

RISK FACTOR

Simple Definition

When an AI confidently presents something as fact that is completely false or made up. It’s like an AI “dreaming” or lying, except there is no intent behind it: the model prioritizes sounding fluent over being factual.

Technical Deep Dive

Hallucinations occur because LLMs do not “know” facts; they predict tokens. If the most probable next token produces a false statement, the model will generate it anyway, because it has no built-in mechanism for checking the output against reality.
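A toy illustration of that mechanism: greedy decoding simply picks the highest-probability token. The probabilities below are invented for the example and do not come from any real model; `greedy_next_token` is a hypothetical helper.

```python
# Hypothetical next-token probabilities after the prompt
# "The capital of Australia is" (numbers invented for illustration).
next_token_probs = {"Sydney": 0.55, "Canberra": 0.40, "Melbourne": 0.05}


def greedy_next_token(probs: dict[str, float]) -> str:
    """Return the most probable token; nothing here checks whether it is true."""
    return max(probs, key=probs.get)


completion = greedy_next_token(next_token_probs)
print("The capital of Australia is", completion)
```

In this made-up distribution the fluent but wrong answer outranks the correct one (Canberra), so it is the one that gets generated.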

DETECTION TIP

While our detector focuses on syntax, factual errors are a strong “tell” for human readers. If a paper cites a book that doesn’t exist, that is a hallmark of AI hallucination.

L. Large Language Model (LLM)

CORE TECH

Simple Definition

A type of AI trained on massive amounts of text data. Think of it as a super-advanced “autocomplete” that predicts the next likely word in a sentence based on billions of examples it has read.

Technical Deep Dive

LLMs utilize the Transformer architecture, which allows them to pay attention to different parts of a sentence simultaneously (self-attention mechanism) to understand context, rather than reading sequentially. Examples include GPT-5, Claude 4, and Llama.
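The self-attention idea can be sketched in a few lines of plain Python. This is a deliberately simplified version, assuming queries, keys, and values are all the raw input vectors; a real Transformer applies learned projection matrices and multiple heads, which are omitted here.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Simplification: no learned weights, single head.
    """
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # Score this token against every token in the sequence at once.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all token vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        )
    return outputs


# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print([round(x, 3) for x in row])
```

Every output row blends information from all three tokens, which is what lets the model use context from anywhere in the sentence rather than only from the preceding word.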

WHY IT MATTERS FOR DETECTION

Because LLMs are probabilistic engines, they tend to choose the most statistically probable words. Our detector looks for this “smoothness”—human writing is often jagged and unpredictable compared to an LLM.


M. Machine Learning (ML)

A subset of AI where computers learn from data without being explicitly programmed for every rule. Instead of writing code that says “if X then Y”, engineers feed data to the machine and let it figure out the patterns (the rules) itself.
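As a minimal sketch of "learning the rule from data", the code below finds a spam cutoff from labeled examples instead of hard-coding it. The data, the single feature (number of links), and the `learn_threshold` helper are all hypothetical; real systems use far richer features and models.

```python
def learn_threshold(values, labels):
    """Pick the cutoff that classifies the most training examples correctly."""

    def accuracy(t):
        return sum((v >= t) == bool(y) for v, y in zip(values, labels)) / len(values)

    # Try every observed value as a candidate cutoff and keep the best one.
    return max(sorted(set(values)), key=accuracy)


# Hypothetical training data: links per email, and whether it was spam (1) or not (0).
link_counts = [0, 1, 1, 2, 5, 6, 8, 9]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = learn_threshold(link_counts, is_spam)
print(f"Learned rule: flag as spam if links >= {threshold}")
```

Nobody wrote "if links >= 5" by hand; the cutoff came out of the examples, and different data would produce a different rule.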

N. Natural Language Processing (NLP)

FIELD

Simple Definition

The branch of AI focused on helping computers understand, interpret, and manipulate human language. It’s the “bridge” between computer code (0s and 1s) and human speech (words and meaning).

HOW WE USE IT

We use NLP techniques to “read” your essay not just for keywords, but for syntactic structures, grammatical dependencies, and semantic coherence.

P. Perplexity

METRIC

Simple Definition

A measurement of how “surprised” a language model is by a piece of text. Low perplexity means the model found the text predictable (more likely AI-generated). High perplexity means the model found the word choices unexpected (more likely human-written).

CORE METRIC

This is one of the two main signals GPTZero uses. We visualize this on the scanner page as the “Complexity” bar.
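In its standard form, perplexity is the exponential of the average negative log-probability a model assigns to each token. The sketch below applies that definition to hand-picked probabilities; the numbers are invented, and in practice they would come from a language model scoring the text.

```python
import math


def perplexity(token_probs):
    """exp of the average negative log-probability of the tokens."""
    n = len(token_probs)
    avg_neg_log = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log)


# Hypothetical per-token probabilities assigned by a model.
predictable = [0.9, 0.8, 0.9, 0.85]  # the model saw each word coming
surprising = [0.2, 0.05, 0.3, 0.1]   # the model was often caught off guard

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

A useful sanity check: if every token has probability 0.5, the perplexity is exactly 2, as if the model were choosing between two equally likely options at each step.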

T. Token

The basic unit of text that an LLM processes. A token is not always a full word; it can be part of a word (like “ing” or “pre”). For example, the word “learning” might be split into “learn” and “ing”.

Fact: 1,000 tokens is roughly equal to 750 words in English. Because tokens, not words, are the unit a model actually processes, API pricing is often per-token.
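The rule of thumb above can be turned into a rough estimator. This is only an approximation based on word counts; real token counts depend on the specific tokenizer, and the price used in the example is a made-up placeholder, not any provider's actual rate.

```python
WORDS_PER_TOKEN = 0.75  # from the rule of thumb: 1,000 tokens ~ 750 English words


def estimate_tokens(text: str) -> int:
    """Approximate token count from a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    """Approximate cost given a (hypothetical) price per 1,000 tokens."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens


essay = " ".join(["word"] * 750)  # a stand-in for a 750-word essay
print(estimate_tokens(essay))
print(estimate_cost(essay, price_per_1k_tokens=0.002))  # placeholder price
```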

T. Temperature

A parameter in AI models that controls randomness when choosing the next token. High temperature (e.g., 0.9) makes the AI more creative but more prone to hallucinations. Low temperature (e.g., 0.1) makes it more focused and close to deterministic.
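Temperature is commonly applied by dividing the model's raw scores (logits) by the temperature before the softmax step. The sketch below shows the effect on a made-up set of three logits; the values are illustrative only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

low = softmax_with_temperature(logits, 0.1)
high = softmax_with_temperature(logits, 0.9)

print([round(p, 3) for p in low])   # almost all probability on the top token
print([round(p, 3) for p in high])  # probability spread across the candidates
```

At 0.1 the top token is chosen almost every time; at 0.9 the lower-ranked tokens get a real chance of being sampled, which is where both the variety and the extra risk come from.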