Glossary

Terms used throughout gitgap, the NAUGHT→CAUGHT→FOUND lifecycle, and the gap detection pipeline. These definitions are operational — they reflect how the system uses each term, which may differ from casual usage.

Lifecycle states

| Term | Definition |
| --- | --- |
| NAUGHT | A declared research gap with no active resolution attempt. This is a gap in its natural state: extracted from a paper, indexed, and awaiting a champion. NAUGHT is not a failure; it is the baseline state of all identified knowledge gaps. |
| CAUGHT | A gap that has been matched to an active submission attempt. A cosmoid is assigned at catch time, and the gap is "in play." CAUGHT does not mean resolved; it means someone is working on it. The catch confidence score measures how well the submission abstract aligns with the gap declaration. |
| FOUND | A gap that has been resolved: the resolving paper has been accepted or published. The found_at timestamp is set, the paper's cosmoid and DOI are stored, and the lifecycle is sealed. Shown on the globe as a gold ring. |
| REJECTED | A resolution attempt that failed peer review. The attempt is not discarded: the rejection mode, internal notes, and pickup instructions are all preserved. The rejected trail is public, so the next researcher can continue from exactly where the last one stopped. |
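
The lifecycle above can be read as a small state machine. The following is a minimal sketch, not gitgap's actual implementation. The names `GapState`, `ALLOWED`, and `transition` are hypothetical. The NAUGHT → CAUGHT → FOUND path comes from the definitions; the edges into and out of REJECTED are an assumption about how a new attempt resumes a rejected trail.

```python
from enum import Enum


class GapState(Enum):
    NAUGHT = "NAUGHT"
    CAUGHT = "CAUGHT"
    FOUND = "FOUND"
    REJECTED = "REJECTED"


# Assumed transition table. NAUGHT -> CAUGHT -> FOUND follows the glossary;
# the REJECTED edges are an illustrative guess, not documented behavior.
ALLOWED = {
    GapState.NAUGHT: {GapState.CAUGHT},
    GapState.CAUGHT: {GapState.FOUND, GapState.REJECTED},
    GapState.REJECTED: {GapState.CAUGHT},
    GapState.FOUND: set(),  # lifecycle sealed: no further moves
}


def transition(current: GapState, target: GapState) -> GapState:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the allowed moves in one table makes the "sealed" property of FOUND explicit: any attempt to leave it raises an error instead of silently reopening the gap.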

Gap structure

| Term | Definition |
| --- | --- |
| Gap endpoint | A single extracted declaration of a research gap, tied to its source paper. One paper can produce multiple gap endpoints. Each endpoint is an independently trackable unit with its own lifecycle state. |
| Declaration text | The verbatim or near-verbatim sentence(s) from the source paper that constitute the gap statement. This is the raw signal, preserved exactly as extracted. |
| Gateway term | The primary disciplinary keyword that indexes the gap. Assigned by the extraction pipeline. Used for search, filtering, and structural hole detection. Examples: "Morton encoding", "Hubble tension", "predictive policing bias". |
| Confidence | A 0–1 score representing how clearly the extracted text constitutes a genuine gap declaration. High confidence = explicit, specific, actionable declaration. Low confidence = vague, generic, or ambiguous. |
| Phase | Internal pipeline stage. Phase 1 = extracted from full text. Phase 2 = LLM-validated. Not the same as lifecycle state (NAUGHT/CAUGHT/FOUND/REJECTED). |
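
As an illustration of how these terms fit together, a gap endpoint could be modelled as a single record. This is a sketch under stated assumptions: the class name `GapEndpoint` and every field name are hypothetical, and gitgap's real schema may differ. It also keeps phase and lifecycle state as separate fields, since the two are not the same thing.

```python
from dataclasses import dataclass


@dataclass
class GapEndpoint:
    paper_id: str          # hypothetical identifier of the source paper
    declaration_text: str  # raw signal, preserved exactly as extracted
    gateway_term: str      # primary disciplinary keyword
    confidence: float      # 0-1, clarity of the gap declaration
    phase: int = 1         # 1 = extracted from full text, 2 = LLM-validated
    state: str = "NAUGHT"  # lifecycle state, independent of phase

    def __post_init__(self) -> None:
        # Reject values outside the ranges the glossary describes.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if self.phase not in (1, 2):
            raise ValueError("phase must be 1 or 2")


# One paper producing two independently trackable endpoints.
# The declaration strings are placeholders, not real extracted text.
first = GapEndpoint("paper-001", "placeholder declaration A", "Morton encoding", 0.9)
second = GapEndpoint("paper-001", "placeholder declaration B", "Morton encoding", 0.4, phase=2)
```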

People and roles

| Term | Definition |
| --- | --- |
| Keeper | A human reviewer who validates extracted gaps before they enter the live index. Keepers read each declaration and mark pass (genuine gap, enters index) or fail (not a genuine gap, excluded). The keeper is the only required human step in the pipeline. |
| Cosmoid | A unique identifier assigned to a submission attempt in eaiou (the authoring layer). The cosmoid links a gap endpoint to a specific resolution effort: it is the provenance token for "who is catching this gap, and when." |

Intelligence scores

| Term | Definition |
| --- | --- |
| Bridge potential | A 0–1 score measuring how likely a gap is to be resolvable using techniques from a discipline other than the one that declared it. High bridge potential means the gap is a structural hole, that is, a technique transfer opportunity. Computed from source discipline × technique origin detection. |
| CAP score | Corpus-Appreciated Phenomenon score. Measures how "ripe" a gap is for resolution based on: Existence Consensus, Measurement Stability, Explanatory Entropy, Methodological Formalization, and Temporal Convergence Rate. Higher CAP = more evidence, more agreement, clearer measurement path. |
| Catch confidence | Cosine similarity between the embedding vectors of the submission abstract and the gap declaration. Stored at catch time. Informational only: the author drives the claim, not the score. |
| ADI | Appreciated Duration Index. Treats gap age as a positive signal, not a deterrent. A gap that has been open for 15 years is a compounding opportunity: 15 years of other researchers failing to find a solution, which means the solution is genuinely novel when found. |
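
Catch confidence is defined above as a cosine similarity, which can be computed in a few lines. This is a minimal sketch using plain Python lists; the function name `cosine_similarity` is illustrative, and how gitgap actually produces and stores its vectors is not specified here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for an abstract embedding and a declaration embedding.
aligned = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])    # same direction
unrelated = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # orthogonal
```

Vectors pointing the same way score close to 1 and orthogonal vectors score 0, which is why the value is only a signal of textual alignment and not a judgment on the claim itself.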

Architecture

| Term | Definition |
| --- | --- |
| Structural hole | A gap addressable by techniques known in discipline A but not known or applied in discipline B. The technique origin is the bridge. Structural holes are the highest-value gaps in the index: they require no new science, only cross-disciplinary transfer. Browse at Structural Holes. |
| Convergence cluster | A group of gap endpoints from different papers that are semantically near-identical (cosine distance < 0.25). A cluster with ≥3 members from ≥2 papers is an "agreed-upon gap": the field has independently identified the same unresolved problem. |
| Tombstone | A paper or gap marked as retracted, withdrawn, or otherwise removed from the live index. Tombstoned records are preserved for provenance: the trail of what existed is never deleted, only flagged. |
| Reconcile | Periodic sync between gitgap and a journal source (PMC search, OAI-PMH feed). Adds new articles, detects tombstoned articles (no longer in the source), and updates article counts. Idempotent: re-running reconcile never creates duplicates. |
| OAI-PMH | Open Archives Initiative Protocol for Metadata Harvesting. The universal journal feed protocol, used by OJS (Open Journal Systems), institutional repositories, and many independent publishers. Each journal has its own OAI endpoint URL. gitgap uses OAI-PMH for incremental journal harvesting. |
| Ingest pipeline | The full processing chain from raw paper to indexed gap: fetch → parse → classify → embed → deduplicate → enrich → CAP score → keeper queue. Modular: each stage is independently testable and replaceable. |
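
The convergence cluster rule above (cosine distance < 0.25, ≥3 members, ≥2 papers) can be checked directly. The sketch below is one possible reading, not gitgap's clustering code. It assumes the distance threshold applies to every pair of members, which the definition does not state, and it assumes non-zero embedding vectors. The names `cosine_distance` and `is_agreed_upon_gap` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

DISTANCE_THRESHOLD = 0.25  # from the glossary definition
MIN_MEMBERS = 3
MIN_PAPERS = 2

Member = Tuple[str, Sequence[float]]  # (paper_id, embedding), one per endpoint


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def is_agreed_upon_gap(members: Sequence[Member]) -> bool:
    """True if the cluster meets the size, paper, and distance criteria."""
    if len(members) < MIN_MEMBERS:
        return False
    if len({paper_id for paper_id, _ in members}) < MIN_PAPERS:
        return False
    # Assumption: every pair of members must be within the threshold.
    return all(
        cosine_distance(u, v) < DISTANCE_THRESHOLD
        for (_, u), (_, v) in combinations(members, 2)
    )


# Toy two-dimensional embeddings; real embeddings would be much longer.
cluster = [("p1", [1.0, 0.0]), ("p1", [0.95, 0.1]), ("p2", [0.9, 0.2])]
single_paper = [("p1", v) for _, v in cluster]
```

Here `cluster` satisfies all three criteria, while `single_paper` fails because every endpoint comes from the same paper, so it does not show independent agreement.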