Corpus Entry v1

Source schema: doc/schemas/corpus-entry.v1.schema.json

Machine-readable schema for curated corpus entries derived from accepted bundles or promoted knowledge artifacts.

Governing Basis

Project Lineage

Requirements

Stories

Fields

| Field | Required | Shape | Description |
|---|---|---|---|
| schema/v | yes | const: 1 | Schema version. |
| entry/id | yes | string | Stable identifier of the curated corpus entry. |
| source/type | yes | enum: transcript-bundle, knowledge-artifact, archival-package | Primary source class from which the corpus entry was assembled. |
| source/id | yes | string | Identifier of the primary source artifact. |
| content/ref | yes | string | Stable reference to the curated content body. |
| domain/tags | yes | array | Domain and topic tags assigned by curation. |
| quality/grade | yes | enum: low, medium, high | Curation quality assessment of the entry. |
| risk/grade | yes | enum: low, moderate, high | Risk classification relevant to later publication or training use. |
| training/eligibility | yes | enum: blocked, needs-review, approved | Training eligibility state assigned to the corpus entry. |
| provenance/manifest | yes | string | Reference to a provenance manifest sufficient to reconstruct the source lineage. |
| contains-human-origin | no | boolean | Whether the curated entry preserves human-originated source material. |
| language | no | string | Primary language of the curated content. |
| creator/refs | no | array | Curator, secretary, or contributor references that should survive attribution-sensitive flows. |
| policy_annotations | no | object | Optional implementation-local annotations that do not change the core corpus-entry semantics. |
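The table can be read as a checklist. The Python sketch below mirrors it using only the standard library. It assumes entries are JSON objects whose keys are exactly the field names above, and the example values (`corpus-entry-0001`, `bundle-0042`, and so on) are hypothetical. It covers only the per-field shapes, not the conditional rule that follows, and it does not check item types inside `domain/tags` or `creator/refs` or whether extra keys are allowed, because the table does not specify them. Validating against the schema file with a full JSON Schema validator remains the authoritative check.

```python
from collections.abc import Callable
from typing import Any

# Enumerations copied from the field table.
SOURCE_TYPES = {"transcript-bundle", "knowledge-artifact", "archival-package"}
QUALITY_GRADES = {"low", "medium", "high"}
RISK_GRADES = {"low", "moderate", "high"}
ELIGIBILITY = {"blocked", "needs-review", "approved"}


def enum_of(values: set[str]) -> Callable[[Any], bool]:
    # The isinstance guard makes unhashable values (lists, dicts) fail cleanly.
    return lambda v: isinstance(v, str) and v in values


def is_str(v: Any) -> bool:
    return isinstance(v, str)


REQUIRED: dict[str, Callable[[Any], bool]] = {
    # const: 1. JSON true must not pass, even though True == 1 in Python.
    "schema/v": lambda v: not isinstance(v, bool) and v == 1,
    "entry/id": is_str,
    "source/type": enum_of(SOURCE_TYPES),
    "source/id": is_str,
    "content/ref": is_str,
    "domain/tags": lambda v: isinstance(v, list),
    "quality/grade": enum_of(QUALITY_GRADES),
    "risk/grade": enum_of(RISK_GRADES),
    "training/eligibility": enum_of(ELIGIBILITY),
    "provenance/manifest": is_str,
}

OPTIONAL: dict[str, Callable[[Any], bool]] = {
    "contains-human-origin": lambda v: isinstance(v, bool),
    "language": is_str,
    "creator/refs": lambda v: isinstance(v, list),
    "policy_annotations": lambda v: isinstance(v, dict),
}


def check_fields(entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the table is satisfied."""
    problems = []
    for name, ok in REQUIRED.items():
        if name not in entry:
            problems.append(f"missing required field {name}")
        elif not ok(entry[name]):
            problems.append(f"invalid value for {name}: {entry[name]!r}")
    for name, ok in OPTIONAL.items():
        # Optional fields are only checked when present.
        if name in entry and not ok(entry[name]):
            problems.append(f"invalid value for {name}: {entry[name]!r}")
    return problems


# Hypothetical example entry.
entry = {
    "schema/v": 1,
    "entry/id": "corpus-entry-0001",
    "source/type": "transcript-bundle",
    "source/id": "bundle-0042",
    "content/ref": "content/corpus-entry-0001",
    "domain/tags": ["linguistics"],
    "quality/grade": "high",
    "risk/grade": "low",
    "training/eligibility": "approved",
    "provenance/manifest": "manifests/corpus-entry-0001.json",
}
print(check_fields(entry))  # → []
```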

Conditional Rules

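Rule 1

An entry graded high risk cannot be approved for training: when risk/grade is high, training/eligibility must be blocked or needs-review.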

When:

{
  "properties": {
    "risk/grade": {
      "const": "high"
    }
  },
  "required": [
    "risk/grade"
  ]
}

Then:

{
  "properties": {
    "training/eligibility": {
      "enum": [
        "blocked",
        "needs-review"
      ]
    }
  }
}
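The When/Then pair has the shape of JSON Schema's `if`/`then` keywords; that is an inference from the layout, not something this page states. Under that reading, the rule reduces to the Python sketch below, where `violates_rule_1` is a hypothetical helper name. It deliberately does nothing in two cases: when `risk/grade` is not `high` (the When part fails, so Then is not applied), and when `training/eligibility` is absent (the Then part constrains the field only through `properties`, so absence is left to the top-level required-field check).

```python
from typing import Any


def violates_rule_1(entry: dict[str, Any]) -> bool:
    """True when a high-risk entry carries an eligibility other than blocked/needs-review."""
    # When: risk/grade must be present and equal to "high".
    if entry.get("risk/grade") != "high":
        return False
    # Then: only constrains training/eligibility if the key is present.
    if "training/eligibility" not in entry:
        return False
    return entry["training/eligibility"] not in ("blocked", "needs-review")


print(violates_rule_1({"risk/grade": "high", "training/eligibility": "approved"}))  # → True
```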

Field Semantics

schema/v

  • Required: yes
  • Shape: const: 1

Schema version.

entry/id

  • Required: yes
  • Shape: string

Stable identifier of the curated corpus entry.

source/type

  • Required: yes
  • Shape: enum: transcript-bundle, knowledge-artifact, archival-package

Primary source class from which the corpus entry was assembled.

source/id

  • Required: yes
  • Shape: string

Identifier of the primary source artifact.

content/ref

  • Required: yes
  • Shape: string

Stable reference to the curated content body.

domain/tags

  • Required: yes
  • Shape: array

Domain and topic tags assigned by curation.

quality/grade

  • Required: yes
  • Shape: enum: low, medium, high

Curation quality assessment of the entry.

risk/grade

  • Required: yes
  • Shape: enum: low, moderate, high

Risk classification relevant to later publication or training use.
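Note that `risk/grade` uses `moderate` as its middle value while `quality/grade` uses `medium`, so a value that is valid on one scale is invalid on the other. A minimal Python sketch that flags this mix-up with a hint follows; `grade_errors` and the hint mapping are hypothetical, and absent fields are left to the required-field check.

```python
from typing import Any

QUALITY_GRADES = ("low", "medium", "high")   # quality/grade
RISK_GRADES = ("low", "moderate", "high")    # risk/grade: "moderate", not "medium"

# A value valid on the other scale, mapped to its counterpart on this one.
CROSS_SCALE_HINTS = {
    "quality/grade": {"moderate": "medium"},
    "risk/grade": {"medium": "moderate"},
}


def grade_errors(entry: dict[str, Any]) -> list[str]:
    errors = []
    for field, allowed in (("quality/grade", QUALITY_GRADES), ("risk/grade", RISK_GRADES)):
        if field not in entry:
            continue  # presence is checked elsewhere
        value = entry[field]
        if value in allowed:
            continue
        msg = f"{field}: {value!r} is not one of {', '.join(allowed)}"
        hint = CROSS_SCALE_HINTS[field].get(value) if isinstance(value, str) else None
        if hint:
            msg += f" (did you mean {hint!r}?)"
        errors.append(msg)
    return errors


print(grade_errors({"quality/grade": "medium", "risk/grade": "medium"}))
```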

training/eligibility

  • Required: yes
  • Shape: enum: blocked, needs-review, approved

Training eligibility state assigned to the corpus entry.
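A consumer assembling training data might group entries by `training/eligibility`, for instance sending `approved` entries to training and `needs-review` entries to a review queue. That workflow is an assumption, not part of the schema. In the Python sketch below, `partition_by_eligibility` is a hypothetical helper, and treating a missing or unrecognized state as `blocked` (failing closed) is a design choice of the sketch.

```python
from collections.abc import Iterable
from typing import Any

ELIGIBILITY_STATES = ("blocked", "needs-review", "approved")


def partition_by_eligibility(entries: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Group entry ids by training/eligibility state."""
    groups: dict[str, list[str]] = {state: [] for state in ELIGIBILITY_STATES}
    for entry in entries:
        state = entry.get("training/eligibility")
        # Missing or unknown states fail closed into "blocked".
        if not isinstance(state, str) or state not in groups:
            state = "blocked"
        groups[state].append(entry["entry/id"])
    return groups


batch = [
    {"entry/id": "e1", "training/eligibility": "approved"},
    {"entry/id": "e2", "training/eligibility": "needs-review"},
    {"entry/id": "e3"},  # no state recorded
]
print(partition_by_eligibility(batch)["approved"])  # → ['e1']
```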

provenance/manifest

  • Required: yes
  • Shape: string

Reference to a provenance manifest sufficient to reconstruct the source lineage.

contains-human-origin

  • Required: no
  • Shape: boolean

Whether the curated entry preserves human-originated source material.

language

  • Required: no
  • Shape: string

Primary language of the curated content.

creator/refs

  • Required: no
  • Shape: array

Curator, secretary, or contributor references that should survive attribution-sensitive flows.

policy_annotations

  • Required: no
  • Shape: object

Optional implementation-local annotations that do not change the core corpus-entry semantics.
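Because `policy_annotations` does not change core semantics, two entries that differ only in their annotations can be treated as the same entry. One way to express that, sketched here as an assumption, is to hash the entry with `policy_annotations` removed. `core_digest` is a hypothetical helper, and `json.dumps` with `sort_keys=True` is a convenient stable serialization rather than a full canonicalization scheme such as RFC 8785 (for instance, `1` and `1.0` serialize differently).

```python
import hashlib
import json
from typing import Any


def core_digest(entry: dict[str, Any]) -> str:
    """SHA-256 hex digest of the entry with policy_annotations removed."""
    core = {k: v for k, v in entry.items() if k != "policy_annotations"}
    # Sorted keys and fixed separators make the serialization independent of
    # the input's key order.
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"entry/id": "corpus-entry-0001", "policy_annotations": {"reviewer-note": "x"}}
b = {"entry/id": "corpus-entry-0001"}
print(core_digest(a) == core_digest(b))  # → True
```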