AI Policy
All papers ingested into gitgap are interrogated for AI-generated content. Papers that make no declaration are treated as claiming to be AI-free. The interrogation is not punitive; it is transparency infrastructure. Detection results are stored as provenance, not as exclusion gates.
Why non-declared = AI-free
Most journals and research systems have no AI disclosure field. When a paper is silent on AI usage, there are two interpretations: the authors didn't use AI, or the authors used AI and chose not to disclose it. There is no neutral reading of silence.
gitgap treats silence as an AI-free claim because:
- It is the most defensible interpretation under academic integrity standards — disclosure is the author's obligation
- It creates a consistent, auditable policy — every paper is treated the same regardless of source
- It builds a provenance record that has value even when the detection result is "no signals" — the interrogation happened, the result is stored
How interrogation works
Pass 1 — Disclosure scan (always runs, ~10ms)
The full text is scanned for explicit AI disclosure: specific tool names (ChatGPT, GPT-4, Claude) and explicit disclosure phrases ("AI-assisted writing", "language model was used to", etc.).
If disclosure is found: ai_declared = 'yes'. No flag is set.
The AI detection score is not computed — the paper has already answered the question.
Generic terms like "language model" are not in the disclosure list — papers that study language models are not disclosing AI-assisted writing. Only explicit, specific tool names and disclosure phrases trigger the declared state.
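Pass 1 can be sketched as a simple pattern match. The pattern list below is illustrative, not the production list; the only requirement it encodes is the one stated above: specific tool names and explicit disclosure phrases, never generic terms like "language model" alone.

```python
import re

# Illustrative disclosure list -- the real list lives in the gitgap codebase.
# Generic terms like "language model" are deliberately absent: papers that
# study language models are not disclosing AI-assisted writing.
DISCLOSURE_PATTERNS = [
    r"\bChatGPT\b",
    r"\bGPT-4\b",
    r"\bClaude\b",
    r"\bAI-assisted writing\b",
    r"\blanguage model was used to\b",
]

def scan_disclosure(text: str) -> bool:
    """Pass 1: True if the text explicitly discloses AI usage."""
    return any(re.search(p, text, re.IGNORECASE) for p in DISCLOSURE_PATTERNS)
```

A match sets ai_declared = 'yes' and short-circuits the rest of the pipeline.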
Pass 2 — Heuristic analysis (always runs if not declared)
Four signals are scored independently and combined into a composite score:
| Signal | What it measures | AI pattern |
|---|---|---|
| Sentence length uniformity | Coefficient of variation in sentence length across the text | AI text: CV ≈ 0.15–0.35. Human academic: CV ≈ 0.45–0.85 |
| Hedge phrase density | Frequency of over-hedged language per 100 words | "it is important to note", "it should be noted", "it is evident that" |
| AI signature vocabulary | Presence of statistically AI-heavy phrases | "delve into", "underscore", "in the realm of", "as mentioned above", "pivotal role" |
| Generic transition density | Frequency of structured-list transition phrases | "furthermore,", "moreover,", "in conclusion,", "firstly,", "notably," |
Composite score < 0.40: result recorded, no escalation, no flag.
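The four signals can be sketched as follows. The phrase lists come from the table above; the uniformity cutoff, density caps, and equal weighting are illustrative assumptions, since the document does not specify how the production composite is combined.

```python
import re
import statistics

# Phrase lists from the signal table; thresholds and weights below are
# illustrative assumptions, not the production values.
HEDGE_PHRASES = ["it is important to note", "it should be noted", "it is evident that"]
AI_VOCAB = ["delve into", "underscore", "in the realm of", "as mentioned above", "pivotal role"]
TRANSITIONS = ["furthermore,", "moreover,", "in conclusion,", "firstly,", "notably,"]

def sentence_length_cv(text: str) -> float:
    """Coefficient of variation of sentence length (words per sentence)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.stdev(lengths) / mean if mean else 0.0

def per_100_words(text: str, phrases: list) -> float:
    """Phrase hits per 100 words."""
    words = len(text.split())
    hits = sum(text.lower().count(p) for p in phrases)
    return hits * 100 / words if words else 0.0

def composite_score(text: str) -> float:
    # Signal 1: uniform sentence lengths (AI text: CV ~ 0.15-0.35).
    uniformity = 1.0 if sentence_length_cv(text) < 0.35 else 0.0
    # Signals 2-4: phrase densities, capped at 1.0 (caps are assumptions).
    hedges = min(per_100_words(text, HEDGE_PHRASES), 1.0)
    vocab = min(per_100_words(text, AI_VOCAB), 1.0)
    transitions = min(per_100_words(text, TRANSITIONS) / 2, 1.0)
    return (uniformity + hedges + vocab + transitions) / 4
```

Varied, phrase-free human prose scores near zero; uniform, hedge-heavy prose clears the 0.40 escalation threshold.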
Pass 3 — LLM escalation (when heuristic ≥ 0.40, if API key is configured)
Papers that cross the heuristic threshold are escalated to an LLM judge. The LLM analyzes the text holistically — not just individual signals, but the overall prose character, specificity of empirical claims, and naturalness of sentence variation.
The LLM result replaces the heuristic score if it is higher (the LLM has broader
signal coverage). The method field records whether LLM escalation ran.
LLM escalation is optional — if no API key is configured, the heuristic score stands. This keeps the system functional in air-gapped or low-cost deployments.
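The escalation rule reduces to a small decision function. Here llm_judge stands in for the configured LLM call (a hypothetical callable returning a 0-1 score); None models the no-API-key deployment.

```python
def final_score(text: str, heuristic: float, llm_judge=None, threshold: float = 0.40):
    """Pass 3 sketch: escalate to the LLM judge only when the heuristic
    crosses the threshold and an API key (llm_judge) is configured.
    Returns (score, method) -- the method field records what ran."""
    if heuristic < threshold or llm_judge is None:
        return heuristic, "heuristic"
    llm = llm_judge(text)
    # The LLM result replaces the heuristic score only if it is higher.
    return max(llm, heuristic), "heuristic+llm"
```

Below-threshold papers never reach the LLM, and air-gapped deployments fall back to the heuristic score.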
What the flag means
ai_flag = 1 when: final score ≥ 0.62 AND no disclosure was found.
A flag is a transparency marker, not an exclusion gate. Flagged papers:
- Remain in the gap index
- Display a visible marker on the gap detail page ("⚠ AI signals detected — no usage declared")
- Are queryable via the API for downstream analysis
- Are not removed — the gap declaration may still be valid regardless of how the prose was written
The flag is a signal to the keeper reviewer and to downstream users. A keeper can still mark a flagged gap as pass — the provenance record shows both the AI detection result and the keeper's independent judgment.
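The flag rule itself is a two-condition check, directly from the definition above:

```python
FLAG_THRESHOLD = 0.62

def ai_flag(final_score: float, ai_declared: str) -> int:
    """Transparency marker, not an exclusion gate: flag only when the
    final score crosses the threshold AND no disclosure was found."""
    return 1 if final_score >= FLAG_THRESHOLD and ai_declared != "yes" else 0
```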
What is not flagged
- Papers with explicit AI disclosure — these are correctly categorized as AI-assisted, not flagged
- Papers with low heuristic scores — scored and stored, result is "no signals"
- Papers where the gap declaration itself is strong regardless of prose quality
Retroactive interrogation
Papers ingested before this system was implemented have ai_interrogated_at = NULL.
A retroactive batch run is available via the admin panel. This will process all existing
papers and populate detection scores — useful for establishing a baseline across the
existing corpus.
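The batch run amounts to selecting every paper with a NULL ai_interrogated_at and scoring it. This sketch assumes a SQLite store and a papers table; apart from ai_interrogated_at, the table and column names are assumptions, and interrogate stands in for the full pipeline described above.

```python
import sqlite3
from datetime import datetime, timezone

def retroactive_batch(db_path: str, interrogate) -> int:
    """Score every paper never interrogated; return how many were processed.
    Schema names other than ai_interrogated_at are illustrative."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, full_text FROM papers WHERE ai_interrogated_at IS NULL"
    ).fetchall()
    for paper_id, text in rows:
        score = interrogate(text)  # runs the passes described above
        conn.execute(
            "UPDATE papers SET ai_score = ?, ai_interrogated_at = ? WHERE id = ?",
            (score, datetime.now(timezone.utc).isoformat(), paper_id),
        )
    conn.commit()
    conn.close()
    return len(rows)
```

Already-interrogated papers are untouched, so the run is safe to repeat.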
For journals and authors
If your paper or journal uses AI-assisted writing and you want to ensure it is correctly categorized rather than flagged:
- Include an explicit AI disclosure in the paper text — acknowledgments, methods, or a dedicated disclosure statement
- Name the specific tool used (ChatGPT, Claude, Gemini, etc.) — generic phrases are less reliably detected
- When submitting via the API, the ingest endpoint accepts an ai_declared field; set it to "yes" to bypass interrogation and record the disclosure directly
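A minimal ingest payload might look like the following. Only the ai_declared field is documented above; the other field names and the endpoint itself are assumptions.

```python
import json

# Hypothetical ingest payload: only ai_declared is documented policy.
payload = {
    "title": "Example paper",
    "full_text": "...",
    "ai_declared": "yes",  # bypasses interrogation, records the disclosure
}
body = json.dumps(payload)
```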