Requirements 005: Transcript Segment and Bundle Schemas v1
Based on:
- doc/project/50-requirements/requirements-002.md
- doc/project/50-requirements/requirements-003.md
- doc/project/50-requirements/requirements-004.md
- doc/project/40-proposals/004-human-origin-flags-and-operator-participation.md
- doc/project/40-proposals/005-operator-participation-room-policy-profiles.md
Date: 2026-03-17
Status: Draft
Executive Summary
This document freezes the v1 schema shape for TranscriptSegment and TranscriptBundle.
The goal is not maximal richness. The goal is a small, interoperable, audit-friendly contract that preserves:
- question and room identity,
- message provenance,
- human-origin semantics,
- visibility and consent basis,
- transcript integrity and redaction state.
Context and Problem Statement
requirements-004.md defines what the transcript and training pipeline must preserve, but it does not yet freeze the exact data shape.
Without a concrete v1 schema:
- transcript monitors may export incompatible payloads,
- archivists may lose provenance on ingest,
- curators may flatten human-live and node-mediated-human,
- training nodes may receive under-specified corpus metadata.
These schema contracts also sit in the middle of two upstream flows:
- requirements-002.md defines how correction and accepted learning outcomes arise in answer rooms,
- requirements-003.md defines how approved artifacts later move into archivist and vault storage.
The transcript schemas therefore need to be compatible both with correction outcomes and with later archival packaging.
Design Principles
- Data first:
  - schemas should be plain, portable, and easy to validate in EDN or JSON.
- Append-only provenance:
  - transcript records describe observed events, not mutable object state.
- Minimal trusted core:
  - the schema should preserve what later layers need, but not pull in room implementation details unnecessarily.
- Open model:
  - extra fields MAY exist, but required fields and core enums must remain stable.
Enumerations
origin_class
Allowed values:
- node-generated
- node-mediated-human
- human-live
operator_presence_mode
Allowed values:
- none
- mediated
- direct-live
visibility_scope
Allowed values:
- private-to-swarm
- federation-local
- cross-federation
- global
consent_basis
Allowed v1 values:
- not-required
- operator-consultation
- explicit-consent
- federation-policy
- public-scope
- emergency-exception
redaction_status
Allowed values:
- none
- partial
- full-derived
room_policy_profile
Allowed values:
- none
- mediated-only
- direct-live-allowed
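As an illustration only, the enumerations above can be held in a lookup table that a validator consults. This is a minimal sketch, not the project's implementation: the names ENUMS and unknown_enum_fields are hypothetical, and it assumes records have already been parsed into plain dictionaries keyed by field name.

```python
# Allowed v1 values per enumerated field, copied from the lists above.
# ENUMS and unknown_enum_fields are hypothetical names used for illustration.
ENUMS = {
    "origin_class": frozenset(
        {"node-generated", "node-mediated-human", "human-live"}
    ),
    "operator_presence_mode": frozenset({"none", "mediated", "direct-live"}),
    "visibility_scope": frozenset(
        {"private-to-swarm", "federation-local", "cross-federation", "global"}
    ),
    "consent_basis": frozenset(
        {
            "not-required",
            "operator-consultation",
            "explicit-consent",
            "federation-policy",
            "public-scope",
            "emergency-exception",
        }
    ),
    "redaction_status": frozenset({"none", "partial", "full-derived"}),
    "room_policy_profile": frozenset(
        {"none", "mediated-only", "direct-live-allowed"}
    ),
}


def unknown_enum_fields(record: dict) -> list[str]:
    """Return the enumerated fields present in record whose value is not allowed."""
    bad = []
    for name, allowed in ENUMS.items():
        if name not in record:
            continue  # absence is a required-field concern, not an enum concern
        value = record[name]
        if not isinstance(value, str) or value not in allowed:
            bad.append(name)
    return sorted(bad)
```

A caller could run this over a segment or a bundle before applying the cross-field rules, since both record types reuse the same field names for their enumerated values.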
TranscriptSegment v1
Required fields
- schema/v
- segment_id
- question_id
- channel_id
- message_id
- speaker_ref
- gateway_node_ref
- origin_class
- operator_presence_mode
- human_origin
- ts
- content
- visibility_scope
- consent_basis
- provenance_refs
Optional fields
- redaction_markers
- content_hash
- language
- reply_to
- attachments
- policy_annotations
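A presence check for the required segment fields could look like the following. It is a sketch under stated assumptions: the helper name missing_segment_fields is hypothetical, the segment is assumed to be a parsed dictionary, and a key that is present with a null value counts as present here (value checks belong to the field constraints).

```python
# Required TranscriptSegment v1 field names, copied from the list above.
SEGMENT_REQUIRED = (
    "schema/v",
    "segment_id",
    "question_id",
    "channel_id",
    "message_id",
    "speaker_ref",
    "gateway_node_ref",
    "origin_class",
    "operator_presence_mode",
    "human_origin",
    "ts",
    "content",
    "visibility_scope",
    "consent_basis",
    "provenance_refs",
)


def missing_segment_fields(segment: dict) -> list[str]:
    """Return required fields absent from the segment, in schema order.

    Only key presence is checked; a key mapped to None still counts as present.
    """
    return [name for name in SEGMENT_REQUIRED if name not in segment]
```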
Field constraints
- schema/v MUST equal 1.
- segment_id, question_id, channel_id, and message_id MUST be stable strings.
- speaker_ref MUST identify the semantic speaker at the room boundary.
- gateway_node_ref MUST identify the node that injected the message into the room or relay path.
- human_origin MUST be:
  - false when origin_class = node-generated
  - true when origin_class = node-mediated-human
  - true when origin_class = human-live
- operator_presence_mode MUST be:
  - none when origin_class = node-generated
  - mediated when origin_class = node-mediated-human
  - direct-live when origin_class = human-live
- ts MUST be an ISO-8601 UTC timestamp.
- content MUST be a string or a structured content object with a stable textual projection.
- provenance_refs MUST be an array, even when empty.
- redaction_markers MUST describe removals or transformations rather than silently rewriting content history.
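The mechanically checkable constraints above can be sketched as a single function. The names ORIGIN_RULES, is_utc_timestamp, and check_segment_constraints are hypothetical. Two interpretations are assumptions of this sketch rather than statements of the schema: "UTC timestamp" is read as an ISO-8601 string with an explicit zero offset (a trailing Z or +00:00), and "stable strings" is checked only as "is a string", because stability over time cannot be verified from one record.

```python
from datetime import datetime, timedelta

# origin_class -> (required human_origin, required operator_presence_mode)
ORIGIN_RULES = {
    "node-generated": (False, "none"),
    "node-mediated-human": (True, "mediated"),
    "human-live": (True, "direct-live"),
}


def is_utc_timestamp(value) -> bool:
    """Accept ISO-8601 strings that carry an explicit zero UTC offset."""
    if not isinstance(value, str):
        return False
    if value.endswith("Z"):
        # Older Python versions do not parse a trailing "Z", so normalise it.
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    # utcoffset() is None for naive timestamps, which therefore fail.
    return parsed.utcoffset() == timedelta(0)


def check_segment_constraints(segment: dict) -> list[str]:
    """Return human-readable violations of the v1 segment field constraints."""
    errors = []
    if segment.get("schema/v") != 1:
        errors.append("schema/v must equal 1")
    for key in ("segment_id", "question_id", "channel_id", "message_id"):
        if not isinstance(segment.get(key), str):
            errors.append(f"{key} must be a string")
    origin = segment.get("origin_class")
    rule = ORIGIN_RULES.get(origin) if isinstance(origin, str) else None
    if rule is None:
        errors.append("origin_class is missing or unknown")
    else:
        human_origin, presence = rule
        if segment.get("human_origin") is not human_origin:
            errors.append(f"human_origin must be {human_origin} for {origin}")
        if segment.get("operator_presence_mode") != presence:
            errors.append(f"operator_presence_mode must be {presence} for {origin}")
    if not is_utc_timestamp(segment.get("ts")):
        errors.append("ts must be an ISO-8601 UTC timestamp")
    if not isinstance(segment.get("provenance_refs"), list):
        errors.append("provenance_refs must be an array, even when empty")
    return errors
```

Returning a list of violations instead of raising on the first one is a design choice of the sketch; it lets an ingest pipeline log every problem with a record in one pass.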
JSON example
{
  "schema/v": 1,
  "segment_id": "segment:01JNZ8V8EK7M94N8WQ4Q70WJ5V",
  "question_id": "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
  "channel_id": "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
  "message_id": "matrix:$Qaf3M9d0event",
  "speaker_ref": "nym:fed-pl:operator-7f3c",
  "gateway_node_ref": "node:pl-wro-7f3c",
  "origin_class": "human-live",
  "operator_presence_mode": "direct-live",
  "human_origin": true,
  "ts": "2026-03-17T21:14:08Z",
  "content": "I have seen this migration fail when the overlap window was too short.",
  "visibility_scope": "federation-local",
  "consent_basis": "federation-policy",
  "provenance_refs": [
    "room:fed-pl:question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
    "event:matrix:$Qaf3M9d0event"
  ],
  "language": "en",
  "content_hash": "sha256:98f6bcd4621d373cade4e832627b4f6..."
}
EDN example
{:schema/v 1
 :segment_id "segment:01JNZ8V8EK7M94N8WQ4Q70WJ5V"
 :question_id "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
 :channel_id "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
 :message_id "matrix:$Qaf3M9d0event"
 :speaker_ref "nym:fed-pl:operator-7f3c"
 :gateway_node_ref "node:pl-wro-7f3c"
 :origin_class "human-live"
 :operator_presence_mode "direct-live"
 :human_origin true
 :ts "2026-03-17T21:14:08Z"
 :content "I have seen this migration fail when the overlap window was too short."
 :visibility_scope "federation-local"
 :consent_basis "federation-policy"
 :provenance_refs ["room:fed-pl:question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
                   "event:matrix:$Qaf3M9d0event"]
 :language "en"
 :content_hash "sha256:98f6bcd4621d373cade4e832627b4f6..."}
TranscriptBundle v1
Required fields
- schema/v
- bundle_id
- question_id
- channel_id
- source_scope
- created_at
- source_nodes
- segments
- contains_human_origin
- contains_direct_human_live
- consent_basis
- redaction_status
- integrity_proof
Optional fields
- room_policy_profile
- summary_refs
- source_transport
- retention_profile
- policy_annotations
Field constraints
- schema/v MUST equal 1.
- segments MUST be an array of TranscriptSegment records or content-addressed references to such records.
- contains_human_origin MUST be true if any segment has human_origin = true.
- contains_direct_human_live MUST be true if any segment has origin_class = human-live.
- source_nodes MUST include every node that acted as a gateway for included segments, if known.
- consent_basis at bundle level MUST represent the archival/publication basis for the bundle as a whole, not just one segment.
- redaction_status MUST reflect the exported bundle form, not the original room state.
- integrity_proof MUST carry enough information to verify bundle integrity or locate the verification artifact.
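One way a producer might satisfy the two contains_* constraints and the source_nodes constraint is to derive them from the segments instead of setting them by hand. The sketch below is illustrative: derive_bundle_flags and missing_source_nodes are hypothetical names, and it assumes every entry in segments is an inline TranscriptSegment dictionary. Content-addressed references would need to be resolved first, which is out of scope here.

```python
def derive_bundle_flags(segments: list[dict]) -> dict:
    """Compute the bundle-level human-origin flags from inline segments."""
    return {
        "contains_human_origin": any(
            seg.get("human_origin") is True for seg in segments
        ),
        "contains_direct_human_live": any(
            seg.get("origin_class") == "human-live" for seg in segments
        ),
    }


def missing_source_nodes(bundle: dict) -> list[str]:
    """Return gateway nodes used by included segments but absent from source_nodes."""
    declared = set(bundle.get("source_nodes", []))
    gateways = {
        seg["gateway_node_ref"]
        for seg in bundle.get("segments", [])
        if "gateway_node_ref" in seg
    }
    return sorted(gateways - declared)
```

Note that source_nodes may legitimately list more nodes than the segment gateways, so the check only looks for gateways that are missing.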
JSON example
{
  "schema/v": 1,
  "bundle_id": "bundle:01JNZ94NQ90KJ0H2VTW6J8Q0D9",
  "question_id": "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
  "channel_id": "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
  "source_scope": "federation-local",
  "created_at": "2026-03-17T21:32:44Z",
  "source_nodes": [
    "node:pl-wro-7f3c",
    "node:pl-wro-secretary-2"
  ],
  "segments": [
    {
      "schema/v": 1,
      "segment_id": "segment:01JNZ8V8EK7M94N8WQ4Q70WJ5V",
      "question_id": "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
      "channel_id": "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
      "message_id": "matrix:$Qaf3M9d0event",
      "speaker_ref": "nym:fed-pl:operator-7f3c",
      "gateway_node_ref": "node:pl-wro-7f3c",
      "origin_class": "human-live",
      "operator_presence_mode": "direct-live",
      "human_origin": true,
      "ts": "2026-03-17T21:14:08Z",
      "content": "I have seen this migration fail when the overlap window was too short.",
      "visibility_scope": "federation-local",
      "consent_basis": "federation-policy",
      "provenance_refs": [
        "room:fed-pl:question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9",
        "event:matrix:$Qaf3M9d0event"
      ]
    }
  ],
  "contains_human_origin": true,
  "contains_direct_human_live": true,
  "consent_basis": "federation-policy",
  "redaction_status": "partial",
  "room_policy_profile": "direct-live-allowed",
  "integrity_proof": {
    "alg": "sha256+ed25519",
    "manifest_hash": "sha256:3f786850e387550fdab836ed7e6dc881...",
    "signer": "node:pl-wro-secretary-2",
    "signature": "base64url:MEQCIG..."
  }
}
EDN example
{:schema/v 1
 :bundle_id "bundle:01JNZ94NQ90KJ0H2VTW6J8Q0D9"
 :question_id "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
 :channel_id "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
 :source_scope "federation-local"
 :created_at "2026-03-17T21:32:44Z"
 :source_nodes ["node:pl-wro-7f3c"
                "node:pl-wro-secretary-2"]
 :segments [{:schema/v 1
             :segment_id "segment:01JNZ8V8EK7M94N8WQ4Q70WJ5V"
             :question_id "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
             :channel_id "question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
             :message_id "matrix:$Qaf3M9d0event"
             :speaker_ref "nym:fed-pl:operator-7f3c"
             :gateway_node_ref "node:pl-wro-7f3c"
             :origin_class "human-live"
             :operator_presence_mode "direct-live"
             :human_origin true
             :ts "2026-03-17T21:14:08Z"
             :content "I have seen this migration fail when the overlap window was too short."
             :visibility_scope "federation-local"
             :consent_basis "federation-policy"
             :provenance_refs ["room:fed-pl:question:01JNY6M2X6Y8M1G5R4Z3K7Q2P9"
                               "event:matrix:$Qaf3M9d0event"]}]
 :contains_human_origin true
 :contains_direct_human_live true
 :consent_basis "federation-policy"
 :redaction_status "partial"
 :room_policy_profile "direct-live-allowed"
 :integrity_proof {:alg "sha256+ed25519"
                   :manifest_hash "sha256:3f786850e387550fdab836ed7e6dc881..."
                   :signer "node:pl-wro-secretary-2"
                   :signature "base64url:MEQCIG..."}}
Validation Rules
- A segment with origin_class = human-live MUST fail validation if human_origin = false.
- A segment with origin_class = node-generated MUST fail validation if operator_presence_mode != none.
- A bundle with contains_direct_human_live = true MUST fail validation if no included segment has origin_class = human-live.
- A bundle with contains_human_origin = false MUST fail validation if any included segment has human_origin = true.
- A bundle declaring room_policy_profile = none MUST fail validation if any included segment has human_origin = true.
- Unknown enum values MUST be rejected in strict mode and quarantined in ingest pipelines that operate in compatibility mode.
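The rules above translate fairly directly into code. The following is a sketch, not a reference validator: validate_bundle and ingest_disposition are hypothetical names, the "accept" outcome label is an assumption (the rules only name rejection and quarantine), and segments are assumed to be inline dictionaries.

```python
def validate_bundle(bundle: dict) -> list[str]:
    """Return violations of the five cross-field validation rules."""
    segments = bundle.get("segments", [])
    errors = []
    for seg in segments:
        sid = seg.get("segment_id", "<unknown>")
        origin = seg.get("origin_class")
        if origin == "human-live" and seg.get("human_origin") is False:
            errors.append(f"{sid}: human-live segment has human_origin = false")
        if origin == "node-generated" and seg.get("operator_presence_mode") != "none":
            errors.append(f"{sid}: node-generated segment has operator presence")
    any_human = any(seg.get("human_origin") is True for seg in segments)
    any_live = any(seg.get("origin_class") == "human-live" for seg in segments)
    if bundle.get("contains_direct_human_live") is True and not any_live:
        errors.append("contains_direct_human_live = true without a human-live segment")
    if bundle.get("contains_human_origin") is False and any_human:
        errors.append("contains_human_origin = false with a human-origin segment")
    if bundle.get("room_policy_profile") == "none" and any_human:
        errors.append("room_policy_profile = none with a human-origin segment")
    return errors


def ingest_disposition(errors: list[str], unknown_enums: list[str], strict: bool) -> str:
    """Map validation results to an ingest outcome.

    Unknown enum values are rejected in strict mode and quarantined in
    compatibility mode; any rule violation is rejected in both modes.
    """
    if errors:
        return "reject"
    if unknown_enums:
        return "reject" if strict else "quarantine"
    return "accept"
```

Here unknown_enums is expected to be a list of field names whose values fell outside the allowed sets, computed separately by the caller.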
Compatibility Rules
- Producers MAY add extra fields.
- Consumers MUST ignore unknown fields unless strict federation policy forbids it.
- Producers MUST NOT repurpose existing field meanings in v1.
- Breaking semantic changes require schema/v = 2.
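A consumer following these rules might read bundles tolerantly, as in the sketch below. The names KNOWN_BUNDLE_FIELDS and read_bundle are hypothetical, and the forbid_unknown flag is an assumed stand-in for "strict federation policy"; how that policy is actually configured is not defined in this document.

```python
# Required and optional TranscriptBundle v1 field names from this document.
KNOWN_BUNDLE_FIELDS = frozenset(
    {
        "schema/v",
        "bundle_id",
        "question_id",
        "channel_id",
        "source_scope",
        "created_at",
        "source_nodes",
        "segments",
        "contains_human_origin",
        "contains_direct_human_live",
        "consent_basis",
        "redaction_status",
        "integrity_proof",
        "room_policy_profile",
        "summary_refs",
        "source_transport",
        "retention_profile",
        "policy_annotations",
    }
)


def read_bundle(bundle: dict, forbid_unknown: bool = False) -> dict:
    """Return the v1 view of a bundle, ignoring fields this consumer does not know.

    forbid_unknown models a strict federation policy (an assumption of this
    sketch): when set, unknown fields cause a failure instead of being ignored.
    """
    if bundle.get("schema/v") != 1:
        raise ValueError("unsupported schema/v; this reader only handles v1")
    unknown = sorted(set(bundle) - KNOWN_BUNDLE_FIELDS)
    if unknown and forbid_unknown:
        raise ValueError(f"unknown fields forbidden by policy: {unknown}")
    return {key: value for key, value in bundle.items() if key in KNOWN_BUNDLE_FIELDS}
```

Rejecting any schema/v other than 1 reflects the rule that breaking semantic changes arrive under a new version number, so a v1 reader should not guess at their meaning.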
Open Questions
- Should content be normalized to a single textual field in archival bundles, with multimodal payloads referenced externally?
- Should speaker_ref support an explicit composite form for node + operator nym?
- Should bundle integrity be Merkle-based in v1, or is manifest-hash plus signature enough?
Next Actions
- Define machine-readable JSON Schema and EDN/spec or Malli forms for these contracts.
- Add conformance test vectors for all three origin_class variants.
- Bind ingest validation to room-policy profile checks from proposal 005.
- Extend CurationDecision and CorpusEntry schemas with human-origin eligibility markers.