AI Policy
All papers ingested into gitgap are interrogated for AI-generated content. Papers that make no declaration are treated as claiming to be AI-free. The interrogation is not punitive; it is transparency infrastructure. Detection results are stored as provenance, not as exclusion gates.
Why non-declared = AI-free
Most journals and research systems have no AI disclosure field. When a paper is silent on AI usage, there are two interpretations: the authors didn't use AI, or the authors used AI and chose not to disclose it. There is no neutral reading of silence.
gitgap treats silence as an AI-free claim because:
- It is the most defensible interpretation under academic integrity standards — disclosure is the author's obligation
- It creates a consistent, auditable policy — every paper is treated the same regardless of source
- It builds a provenance record that has value even when the detection result is "no signals" — the interrogation happened, the result is stored
How interrogation works
Pass 1 — Disclosure scan (always runs, ~10ms)
The full text is scanned for explicit AI disclosure: specific tool names (ChatGPT, GPT-4, Claude) and explicit disclosure phrases ("AI-assisted writing", "language model was used to", etc.).
If disclosure is found: ai_declared = 'yes'. No flag is set.
The AI detection score is not computed — the paper has already answered the question.
Generic terms like "language model" are not in the disclosure list — papers that study language models are not disclosing AI-assisted writing. Only explicit, specific tool names and disclosure phrases trigger the declared state.
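Pass 1 can be sketched as a simple pattern match. The pattern list below is illustrative, not the production list; the only requirement it encodes is the one stated above: specific tool names and explicit disclosure phrases, never generic terms like "language model" alone.

```python
import re

# Illustrative disclosure list -- the real list lives in the gitgap codebase.
# Generic terms like "language model" are deliberately absent: papers that
# study language models are not disclosing AI-assisted writing.
DISCLOSURE_PATTERNS = [
    r"\bChatGPT\b",
    r"\bGPT-4\b",
    r"\bClaude\b",
    r"\bAI-assisted writing\b",
    r"\blanguage model was used to\b",
]

def scan_disclosure(text: str) -> bool:
    """Pass 1: True if the text explicitly discloses AI usage."""
    return any(re.search(p, text, re.IGNORECASE) for p in DISCLOSURE_PATTERNS)
```

A match sets ai_declared = 'yes' and short-circuits the rest of the pipeline.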
Pass 2 — Heuristic analysis (always runs if not declared)
Four signals are scored independently and combined into a composite score:
| Signal | What it measures | AI pattern |
|---|---|---|
| Sentence length uniformity | Coefficient of variation in sentence length across the text | AI text: CV ≈ 0.15–0.35. Human academic: CV ≈ 0.45–0.85 |
| Hedge phrase density | Frequency of over-hedged language per 100 words | "it is important to note", "it should be noted", "it is evident that" |
| AI signature vocabulary | Presence of statistically AI-heavy phrases | "delve into", "underscore", "in the realm of", "as mentioned above", "pivotal role" |
| Generic transition density | Frequency of structured-list transition phrases | "furthermore,", "moreover,", "in conclusion,", "firstly,", "notably," |
Composite score < 0.40: result recorded, no escalation, no flag.
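The four signals can be sketched as follows. The phrase lists come from the table above; the uniformity cutoff, density caps, and equal weighting are illustrative assumptions, since the document does not specify how the production composite is combined.

```python
import re
import statistics

# Phrase lists from the signal table; thresholds and weights below are
# illustrative assumptions, not the production values.
HEDGE_PHRASES = ["it is important to note", "it should be noted", "it is evident that"]
AI_VOCAB = ["delve into", "underscore", "in the realm of", "as mentioned above", "pivotal role"]
TRANSITIONS = ["furthermore,", "moreover,", "in conclusion,", "firstly,", "notably,"]

def sentence_length_cv(text: str) -> float:
    """Coefficient of variation of sentence length (words per sentence)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.stdev(lengths) / mean if mean else 0.0

def per_100_words(text: str, phrases: list) -> float:
    """Phrase hits per 100 words."""
    words = len(text.split())
    hits = sum(text.lower().count(p) for p in phrases)
    return hits * 100 / words if words else 0.0

def composite_score(text: str) -> float:
    # Signal 1: uniform sentence lengths (AI text: CV ~ 0.15-0.35).
    uniformity = 1.0 if sentence_length_cv(text) < 0.35 else 0.0
    # Signals 2-4: phrase densities, capped at 1.0 (caps are assumptions).
    hedges = min(per_100_words(text, HEDGE_PHRASES), 1.0)
    vocab = min(per_100_words(text, AI_VOCAB), 1.0)
    transitions = min(per_100_words(text, TRANSITIONS) / 2, 1.0)
    return (uniformity + hedges + vocab + transitions) / 4
```

Varied, phrase-free human prose scores near zero; uniform, hedge-heavy prose clears the 0.40 escalation threshold.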
Pass 3 — LLM escalation (when heuristic ≥ 0.40, if API key is configured)
Papers that cross the heuristic threshold are escalated to an LLM judge. The LLM analyzes the text holistically — not just individual signals, but the overall prose character, specificity of empirical claims, and naturalness of sentence variation.
The LLM result replaces the heuristic score if it is higher (the LLM has broader
signal coverage). The method field records whether LLM escalation ran.
LLM escalation is optional — if no API key is configured, the heuristic score stands. This keeps the system functional in air-gapped or low-cost deployments.
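The escalation rule reduces to a small decision function. Here llm_judge stands in for the configured LLM call (a hypothetical callable returning a 0-1 score); None models the no-API-key deployment.

```python
def final_score(text: str, heuristic: float, llm_judge=None, threshold: float = 0.40):
    """Pass 3 sketch: escalate to the LLM judge only when the heuristic
    crosses the threshold and an API key (llm_judge) is configured.
    Returns (score, method) -- the method field records what ran."""
    if heuristic < threshold or llm_judge is None:
        return heuristic, "heuristic"
    llm = llm_judge(text)
    # The LLM result replaces the heuristic score only if it is higher.
    return max(llm, heuristic), "heuristic+llm"
```

Below-threshold papers never reach the LLM, and air-gapped deployments fall back to the heuristic score.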
What the flag means
ai_flag = 1 when: final score ≥ 0.62 AND no disclosure was found.
A flag is a transparency marker, not an exclusion gate. Flagged papers:
- Remain in the gap index
- Display a visible marker on the gap detail page ("⚠ AI signals detected — no usage declared")
- Are queryable via the API for downstream analysis
- Are not removed — the gap declaration may still be valid regardless of how the prose was written
The flag is a signal to the keeper reviewer and to downstream users. A keeper can still mark a flagged gap as pass — the provenance record shows both the AI detection result and the keeper's independent judgment.
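The flag rule itself is a two-condition check, directly from the definition above:

```python
FLAG_THRESHOLD = 0.62

def ai_flag(final_score: float, ai_declared: str) -> int:
    """Transparency marker, not an exclusion gate: flag only when the
    final score crosses the threshold AND no disclosure was found."""
    return 1 if final_score >= FLAG_THRESHOLD and ai_declared != "yes" else 0
```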
What is not flagged
- Papers with explicit AI disclosure — these are correctly categorized as AI-assisted, not flagged
- Papers with low heuristic scores — scored and stored, result is "no signals"
- Papers where the gap declaration itself is strong regardless of prose quality
Retroactive interrogation
Papers ingested before this system was implemented have ai_interrogated_at = NULL.
A retroactive batch run is available via the admin panel. This will process all existing
papers and populate detection scores — useful for establishing a baseline across the
existing corpus.
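The batch run amounts to selecting every paper with a NULL ai_interrogated_at and scoring it. This sketch assumes a SQLite store and a papers table; apart from ai_interrogated_at, the table and column names are assumptions, and interrogate stands in for the full pipeline described above.

```python
import sqlite3
from datetime import datetime, timezone

def retroactive_batch(db_path: str, interrogate) -> int:
    """Score every paper never interrogated; return how many were processed.
    Schema names other than ai_interrogated_at are illustrative."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, full_text FROM papers WHERE ai_interrogated_at IS NULL"
    ).fetchall()
    for paper_id, text in rows:
        score = interrogate(text)  # runs the passes described above
        conn.execute(
            "UPDATE papers SET ai_score = ?, ai_interrogated_at = ? WHERE id = ?",
            (score, datetime.now(timezone.utc).isoformat(), paper_id),
        )
    conn.commit()
    conn.close()
    return len(rows)
```

Already-interrogated papers are untouched, so the run is safe to repeat.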
For journals and authors
If your paper or journal uses AI-assisted writing and you want to ensure it is correctly categorized rather than flagged:
- Include an explicit AI disclosure in the paper text — acknowledgments, methods, or a dedicated disclosure statement
- Name the specific tool used (ChatGPT, Claude, Gemini, etc.) — generic phrases are less reliably detected
- When submitting via the API, the ingest endpoint accepts an ai_declared field; set it to "yes" to bypass interrogation and record the disclosure directly
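A minimal ingest payload might look like the following. Only the ai_declared field is documented above; the other field names and the endpoint itself are assumptions.

```python
import json

# Hypothetical ingest payload: only ai_declared is documented policy.
payload = {
    "title": "Example paper",
    "full_text": "...",
    "ai_declared": "yes",  # bypasses interrogation, records the disclosure
}
body = json.dumps(payload)
```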