Precision in Numbers.
We believe in full transparency. Our detection engine is rigorously tested against millions of samples from ChatGPT, Gemini, Claude, and human writers.
OVERALL ACCURACY
99.8%
+0.4% from V3.0
FALSE POSITIVE RATE
0.02%
Industry Standard: ~1.5%
TRAINING DATASET
10M+
Documents Analyzed
Why False Positives Matter
In academic settings, accusing a student of using AI when they didn’t (a False Positive) is a serious ethical failure. We optimize our models to minimize False Positives, even if it means occasionally missing a sophisticated AI text (False Negative).
- Conservative Scoring: We only flag text as “AI” when confidence exceeds 98%.
- Multi-Model Verification: Text is run through three separate detection architectures (BERT, RoBERTa, Custom) before a verdict.
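To make the two safeguards above concrete, here is a minimal sketch of how a 98% threshold and a three-detector check could combine into one verdict. The page does not say how the three scores are merged, so the unanimity rule below is an assumption, and the names `decide_verdict`, `Verdict`, and the score keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# From the text above: flag as "AI" only when confidence exceeds 98%.
AI_THRESHOLD = 0.98


@dataclass
class Verdict:
    label: str                 # "AI" or "Not flagged"
    scores: Dict[str, float]   # per-detector confidence that the text is AI


def decide_verdict(scores: Dict[str, float],
                   threshold: float = AI_THRESHOLD) -> Verdict:
    """Return "AI" only if every detector's confidence exceeds the threshold.

    ASSUMPTION: unanimity across detectors. The real combination rule
    (voting, averaging, a meta-classifier) is not stated on this page.
    """
    if not scores:
        raise ValueError("at least one detector score is required")
    flagged = all(score > threshold for score in scores.values())
    return Verdict("AI" if flagged else "Not flagged", dict(scores))


# Hypothetical scores from the three architectures named above.
result = decide_verdict({"bert": 0.99, "roberta": 0.995, "custom": 0.97})
print(result.label)
```

Requiring agreement from every detector trades recall for fewer false positives, which matches the priority stated above: a single dissenting model is enough to withhold the “AI” label.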
[Chart: F1 score comparison (higher is better)]
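For readers unfamiliar with the metrics quoted on this page, the sketch below shows how accuracy, false positive rate, and F1 score are conventionally derived from a confusion matrix. The counts in the example are invented for illustration and are not our benchmark results; the function name `detection_metrics` is hypothetical.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics, with "AI" as the positive class.

    tp: AI text correctly flagged       fp: human text wrongly flagged
    tn: human text correctly passed     fn: AI text missed
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        # Share of human-written documents that were wrongly flagged as AI.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (NOT measured results).
metrics = detection_metrics(tp=90, fp=1, tn=99, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

F1 is the harmonic mean of precision and recall, so a detector cannot score well by being strong on only one of the two.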
Known Limitations
Short Text Snippets
Detection becomes unreliable for texts under 50 words. Short passages contain too little text to measure statistical signals such as burstiness, so no stable pattern emerges.
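As an illustration of the 50-word limit, the sketch below gates analysis on word count and computes a simple burstiness proxy. Both the whitespace-based word count and the definition of burstiness used here (variation in sentence length relative to its mean) are assumptions made for the example, not a description of our production features; all function names are hypothetical.

```python
import re
import statistics
from typing import Optional

# From the text above: detection is unreliable under 50 words.
MIN_WORDS = 50


def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """True if the text has at least `min_words` whitespace-separated words."""
    return len(text.split()) >= min_words


def sentence_length_burstiness(text: str) -> Optional[float]:
    """Coefficient of variation of sentence lengths (in words).

    ASSUMPTION: one common informal proxy for burstiness. Returns None
    when there are fewer than two sentences to compare.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return None
    return statistics.pstdev(lengths) / statistics.mean(lengths)


sample = "Short note. It has only a handful of words."
if not is_long_enough(sample):
    print("Too short for a reliable verdict")
else:
    print(sentence_length_burstiness(sample))
```

With only one or two sentences, a statistic like this swings widely from one text to the next, which is why short snippets cannot support a confident verdict.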
Mixed Content
“AI-Polished” human text (a human writes, AI fixes the grammar) often triggers false positives. We label such text “Mixed” to warn users.
Paraphrasing Tools
Heavy use of paraphrasing tools such as QuillBot can disrupt the patterns our detectors rely on. We offer dedicated “Quillbot Mode” models, but they are experimental.
Code & Math
Logic-based content (code, formulas) follows strict rules, so human-written and AI-generated versions look very similar and are hard to tell apart. Our tool is optimized for prose.
Download Technical Whitepaper
Read the full IEEE-formatted paper on our methodology, dataset composition, and peer-reviewed results.