Workflow
The NAUGHT → CAUGHT → FOUND / REJECTED lifecycle. Every gap that enters gitgap follows this path. Nothing is discarded — every state is tracked, every failed attempt preserved.
States
NAUGHT
A gap has been declared in a peer-reviewed paper. The pipeline extracted the declaration from the discussion, limitations, or conclusions section, assigned a gateway term, computed a confidence score, and added it to the index. No active resolution attempt is linked yet. The gap is in its natural state — open, undirected.
NAUGHT gaps are the raw output of the ingest pipeline. They require keeper review before entering the live index. A keeper reads the extracted declaration and marks pass (enters the index) or fail (excluded as not a genuine gap declaration).
CAUGHT
A submission has been matched to this gap. A cosmoid (unique attempt identifier) is assigned. The gap is now "in play" — someone is actively working on resolving it. The catch confidence (cosine similarity between the paper abstract and the gap declaration) is stored as a quality signal.
CAUGHT does not mean the gap is resolved. It means the gap has an active champion. The gap stays CAUGHT until the attempt either succeeds (FOUND) or fails (REJECTED).
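The catch confidence stored with a CAUGHT gap is a cosine similarity between two embedding vectors. A minimal sketch of that computation follows, assuming the paper abstract and the gap declaration have already been embedded as equal-length lists of floats. The function name `catch_confidence` is hypothetical and is not taken from gitgap itself.

```python
import math


def catch_confidence(abstract_vec, gap_vec):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(abstract_vec) != len(gap_vec):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(abstract_vec, gap_vec))
    norm_a = math.sqrt(sum(a * a for a in abstract_vec))
    norm_b = math.sqrt(sum(b * b for b in gap_vec))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as "no match"
    return dot / (norm_a * norm_b)
```

Values near 1.0 indicate that the abstract and the declaration point in almost the same direction in embedding space; values near 0.0 indicate little overlap.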
FOUND
The gap has been resolved: a paper addressing it has been accepted or published. The gap lifecycle is sealed: the found_at timestamp is recorded, and the resolving paper's cosmoid and DOI are stored. On the globe, FOUND gaps render with a gold ring.
REJECTED
A resolution attempt failed peer review. This is not a dead end — it is a data point. The rejected trail records:
- Rejection mode — why the attempt failed (methodology, scope, insufficient evidence, etc.)
- Rejection notes — internal editorial record
- Pickup instructions — what the next researcher needs to address before the gap can be closed
The pickup instructions are the critical value. The next researcher can continue from exactly where the last one stopped. The trail is public. The effort is not discarded.
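The four states and the moves between them can be sketched as a small state machine. This is an illustration under stated assumptions, not gitgap's implementation: the class and field names are hypothetical, and the REJECTED to CAUGHT transition is an assumption drawn from the idea that a later researcher can pick the gap up again.

```python
from dataclasses import dataclass, field
from enum import Enum


class GapState(Enum):
    NAUGHT = "NAUGHT"
    CAUGHT = "CAUGHT"
    FOUND = "FOUND"
    REJECTED = "REJECTED"


# Allowed transitions. Assumption: a REJECTED gap can be caught again
# by a later attempt, and FOUND is terminal (the lifecycle is sealed).
ALLOWED = {
    GapState.NAUGHT: {GapState.CAUGHT},
    GapState.CAUGHT: {GapState.FOUND, GapState.REJECTED},
    GapState.REJECTED: {GapState.CAUGHT},
    GapState.FOUND: set(),
}


@dataclass
class Gap:
    declaration: str
    state: GapState = GapState.NAUGHT
    history: list = field(default_factory=list)  # every transition is kept

    def transition(self, new_state, **details):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} is not allowed")
        self.history.append((self.state, new_state, details))
        self.state = new_state


# Usage: a gap is caught, then the attempt is rejected with a trail.
gap = Gap("example declaration")
gap.transition(GapState.CAUGHT, cosmoid="example-cosmoid")
gap.transition(
    GapState.REJECTED,
    rejection_mode="methodology",
    pickup_instructions="example pickup note",
)
```

Keeping the details of each transition in `history` mirrors the principle that failed attempts are preserved rather than overwritten.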
Keeper Review
Keeper review is the human validation step between extraction and the live index. A keeper reads each extracted declaration and makes a binary judgment:
- Pass — this is a genuine, specific gap declaration. Enters the live index.
- Fail — this is generic, speculative, or not a true gap declaration. Excluded.
Keeper review is the only required human step in the gitgap pipeline. Everything else is automated. Keepers can review gaps one at a time at Gaps or batch-review them using the bulk select interface.
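The binary pass/fail judgment, applied in bulk, might look like the sketch below. The queue and verdict shapes (a list of gap ids and a dict of id to verdict) and the function name `apply_verdicts` are assumptions made for illustration only.

```python
def apply_verdicts(queue, verdicts):
    """Split a keeper queue into (live_index, excluded, still_pending).

    queue: list of gap ids awaiting review (hypothetical shape)
    verdicts: dict mapping gap id -> "pass" or "fail"
    """
    live, excluded, pending = [], [], []
    for gap_id in queue:
        verdict = verdicts.get(gap_id)
        if verdict == "pass":
            live.append(gap_id)       # genuine, specific gap declaration
        elif verdict == "fail":
            excluded.append(gap_id)   # generic, speculative, or not a gap
        elif verdict is None:
            pending.append(gap_id)    # no keeper has ruled yet
        else:
            raise ValueError(f"unknown verdict {verdict!r} for {gap_id}")
    return live, excluded, pending
```

Gaps without a verdict stay in the queue, so nothing reaches the live index without a keeper's decision.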
Ingest cycle
- PMC or OAI-PMH source → paper harvested (new or updated)
- Full text parsed → abstract, methods, conclusions extracted
- Gap extraction → LLM identifies declaration sentences with gateway terms
- Embedding → vector generated for similarity search
- Deduplication → near-identical gaps merged (cosine distance < 0.10)
- Discipline enrichment → source discipline, bridge potential scored
- CAP score → corpus-appreciated phenomenon score computed
- Keeper queue → awaiting human verdict
- Live index → searchable, visible on globe
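The deduplication step in the cycle above can be sketched as a greedy pass over embeddings. Only the 0.10 cosine-distance threshold comes from the text; the greedy "keep the first gap seen" strategy, the `(gap_id, embedding)` input shape, and the function names are assumptions.

```python
import math

DEDUP_THRESHOLD = 0.10  # from the text: cosine distance below this is near-identical


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # assumption: zero vectors never count as duplicates
    return 1.0 - dot / norm


def deduplicate(gaps):
    """Greedy pass: keep the first gap of each near-identical cluster.

    gaps: list of (gap_id, embedding) pairs (hypothetical shape).
    Returns (kept, merged), where merged maps a dropped id to the kept id.
    """
    kept, merged = [], {}
    for gap_id, vec in gaps:
        for kept_id, kept_vec in kept:
            if cosine_distance(vec, kept_vec) < DEDUP_THRESHOLD:
                merged[gap_id] = kept_id
                break
        else:
            # No kept gap was close enough, so this one starts a new cluster.
            kept.append((gap_id, vec))
    return kept, merged
```

Recording which id each duplicate was merged into keeps the merge traceable. A production pipeline would likely use a vector index instead of this quadratic comparison.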
Bridge potential and Structural Holes
Some gaps are addressable using techniques known in one discipline but unknown in another. These are structural holes. For example, a psychology gap that an existing CS technique could solve is a structural hole, with a bridge potential in the 0.70–1.0 range.
Structural holes are the highest-value gaps in the index. They don't require new science — they require someone who knows both fields to recognize the transfer.
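Selecting structural holes from a set of scored gaps could be sketched as a simple filter. The 0.70 to 1.0 range comes from the text; the `bridge_potential` field name, the dict shape, and the function name are hypothetical.

```python
STRUCTURAL_HOLE_MIN = 0.70  # lower bound of the range given in the text


def structural_holes(gaps):
    """Return gaps whose bridge potential falls in [0.70, 1.0], highest first.

    gaps: list of dicts with a "bridge_potential" key (hypothetical field name).
    """
    holes = [g for g in gaps if STRUCTURAL_HOLE_MIN <= g["bridge_potential"] <= 1.0]
    return sorted(holes, key=lambda g: g["bridge_potential"], reverse=True)
```

Sorting by score puts the strongest cross-discipline candidates first.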
Browse structural holes at Structural Holes.