Proposal 033: Workflow Fan-Out and Temporal Orchestration¶
Based on:
doc/project/30-stories/story-006-voluntary-swarm-exchange.mddoc/project/30-stories/story-006-buyer-node-components.mddoc/project/40-proposals/021-service-offers-orders-and-procurement-bridge.mddoc/project/40-proposals/025-seed-directory-as-capability-catalog.mddoc/project/40-proposals/029-workflow-template-catalog.md
Status¶
Draft
Date¶
2026-04-06
Executive Summary¶
The current Arca workflow engine supports a single vertical slice: one ordered
sequence of steps, each targeting one provider, executed synchronously within one
workflow run. This is not enough for story-006, which requires a buyer-side
orchestrator to solicit multiple providers in parallel, collect their responses
within a deadline, and select the best outcome.
This proposal introduces two orthogonal but composable primitives:
- Fan-out — a workflow step that dispatches one request to multiple discovered targets simultaneously and waits for responses according to a configurable aggregation policy.
- Temporal orchestration — step-level timeout, retry, and deadline declarations that give the host deterministic control over how long any part of a workflow may wait before escalating or failing.
Both primitives are expressed as declarative fields in WorkflowDefinition step
records; neither requires Arca to become a protocol authority or to acquire
signing or settlement responsibilities.
Problem Statement¶
story-006 has the following buyer-side lifecycle shape:
Buyer submits service-order
→ discover providers from Seed Directory / offer catalog
→ solicit N providers in parallel ← fan-out
→ collect responses within deadline ← temporal
→ select best offer ← fan-in / aggregation
→ open procurement contract ← existing substrate
The current Arca engine can only express the last step. The middle three are entirely absent from the data model. The gap is not algorithmic complexity — it is missing expressibility in the workflow definition contract.
What is absent today¶
| Need | Current state |
|---|---|
| Send to N targets at once | Not expressible — targets are always singular |
| Discover targets at runtime from catalog | Not expressible — targets are hardcoded |
| Wait for first/any/all/quorum responses | Not expressible |
| Timeout a waiting step | Not expressible |
| Retry a failed step with backoff | Not expressible |
| Deadline across the whole workflow run | Not expressible |
Goals¶
- Make fan-out and temporal constraints expressible in
WorkflowDefinitionsteps without breaking the existing sequential step format. - Keep
Arcaa hosted workflow module: the host resolves targets, enforces deadlines, and controls dispatch;Arcaproposes workflow intent. - Preserve the existing single-target sequential path as the zero-config default — all new fields are optional.
- Define contracts precisely enough to implement without speculative over-engineering.
Non-Goals¶
- Full workflow DSL or process algebra.
- Sub-workflow nesting or recursive fan-out (post-MVP).
Arcaacquiring signing or settlement authority.- Replacing or modifying the existing procurement substrate.
- Multi-hop fan-out chains (max chain depth = 1 in MVP).
Proposed Data Model¶
1. Fan-Out Target Descriptor¶
A step's target field is extended from a single participant reference to a
FanOutTarget descriptor. When absent, the existing single-target behaviour
is preserved.
"target": {
"resolve": "capability",
"capability_id": "offer-catalog",
"filter": {
"service_type": "text/summarise"
},
"limit": 8
}
Alternative form — static list (for testing or known-participant flows):
"target": {
"resolve": "static",
"participants": [
"participant:did:key:z6Mk...",
"participant:did:key:z6Mk..."
]
}
Fields:
| Field | Type | Description |
|---|---|---|
resolve |
"capability" | "static" |
How targets are discovered |
capability_id |
String | Required when resolve = "capability"; queried against Seed Directory |
filter |
Object | Optional additional filter passed to catalog query |
limit |
u32 | Maximum number of targets to solicit (default: unbounded) |
participants |
[String] | Required when resolve = "static" |
When resolve = "capability", the host resolves targets by querying the local
Seed Directory cache for participants holding a current, valid capability
passport for capability_id. This resolution happens at step execution time,
not at workflow definition time.
2. Fan-In Policy¶
When target expands to more than one participant, the step needs a rule for
deciding when to proceed.
"fan_in": {
"policy": "any_one",
"min_responses": 1
}
policy |
Meaning |
|---|---|
any_one |
Proceed as soon as one response arrives. Default. |
all |
Proceed only when every dispatched target has responded. |
quorum |
Proceed when min_responses responses have arrived. |
best_of |
Collect all responses until deadline, then select by score_field. |
Additional fields:
| Field | Type | Description |
|---|---|---|
min_responses |
u32 | Required for quorum; minimum acceptable response count |
score_field |
String | Required for best_of; JSON path in response to sort by |
score_order |
"asc" | "desc" |
Sort direction for best_of (default: "desc") |
3. Temporal Constraints¶
Temporal constraints are declared per-step as a timing object.
"timing": {
"timeout": "PT30S",
"retry": {
"max_attempts": 3,
"backoff": "PT5S",
"backoff_multiplier": 2.0
},
"on_timeout": "skip"
}
| Field | Type | Description |
|---|---|---|
timeout |
ISO 8601 duration | Maximum wall-clock time to wait for this step to complete |
retry.max_attempts |
u32 | How many times to retry before giving up |
retry.backoff |
ISO 8601 duration | Initial wait between retries |
retry.backoff_multiplier |
f32 | Exponential multiplier (1.0 = constant, 2.0 = doubling) |
on_timeout |
"fail" | "skip" | "abort_workflow" |
What to do when timeout expires without a result |
A workflow-level deadline can also be declared at the root:
"deadline": "PT5M"
This is an absolute maximum duration for the entire workflow run from first step
to completion. When elapsed, any pending steps are cancelled and the run is
marked deadline_exceeded.
4. Extended Step Record (full shape)¶
A step in WorkflowDefinition.plan.steps with the new optional fields:
{
"step_id": "solicit-providers",
"service_type": "text/summarise",
"input": { "text": "{{input.text}}" },
"target": {
"resolve": "capability",
"capability_id": "offer-catalog",
"filter": { "service_type": "text/summarise" },
"limit": 5
},
"fan_in": {
"policy": "any_one"
},
"timing": {
"timeout": "PT30S",
"on_timeout": "skip"
}
}
Backward compatibility: when target, fan_in, and timing are all absent,
the step behaves exactly as it does today.
Execution Model¶
Fan-out and temporal constraints are enforced by the host (the Node daemon),
not by Arca. Arca proposes a WorkflowDefinition; the host executes it.
Fan-out dispatch (host responsibilities)¶
- Resolve
target→ produce list ofparticipant_idvalues. - For each target, create a child dispatch record (see below) and send the step's service request via the existing peer session mechanism.
- Collect responses into a
FanInBufferkeyed by(step_id, participant_id). - Apply
fan_in.policyto decide when the step is done. - Cancel outstanding dispatches after
timing.timeoutor when policy is satisfied, whichever comes first.
Dispatch tracking¶
The host maintains a WorkflowStepDispatch record per dispatched target:
WorkflowStepDispatch {
dispatch_id: String, // "dispatch:<uuid>"
run_id: String, // parent workflow run
step_id: String,
target: String, // participant_id
dispatched_at: String, // RFC 3339
status: Pending | Responded | Timeout | Cancelled,
response: Option<JsonValue>,
responded_at: Option<String>,
}
These records are appended to the commit log under
state/workflow-dispatch/<dispatch_id>.
Temporal enforcement¶
The daemon runs a background tick (reusing the existing workflow eviction task or alongside it) that:
- Checks all
Pendingdispatches against theirtiming.timeout. - On timeout: updates dispatch status to
Timeout, applieson_timeoutpolicy to the parent step, triggers retry if configured. - Checks all active workflow runs against the run-level
deadline. - On deadline exceeded: cancels all pending dispatches for the run, marks run
deadline_exceeded.
Arca contract¶
Arca sees only the completed step output (the selected fan-in result) —
it does not see the raw fan-out dispatches or intermediate responses.
The host produces one canonical step_output from the fan-in buffer and
presents it to Arca as if the step had a single response.
This preserves the existing Arca ↔ host interface contract.
Task Fulfillment Policy¶
Step execution completion and task fulfillment are distinct concepts:
execution.completedmeans the selected provider or dispatcher returned a syntactically usable response.task.fulfilledmeans the workflow has a declared reason to treat the domain task as done.
Arca MUST NOT treat hidden middleware mutation of a step JSON payload as an
implicit fulfillment authority. If a component outside the current step output
decides fulfillment, the workflow definition MUST declare that decision source
explicitly. This keeps domain intent visible in workflow data and prevents
middleware-chain side effects from becoming an invisible task-completion
mechanism.
A step may therefore carry a fulfillment block such as:
{
"fulfillment": {
"policy": "external_decision",
"decision_source": {
"kind": "capability",
"capability_id": "publication.verify",
"input": {
"commit_sha": { "from": "/steps/publish/output/publish_commit" },
"branch": "publish/main"
}
},
"result_match": {
"path": "/verification/status",
"fulfilled_values": ["fulfilled"],
"not_fulfilled_values": ["not_fulfilled", "rejected"]
},
"on_not_fulfilled": "pause",
"on_error": "fail"
}
}
The decision source may be a Sensorium connector, another middleware capability, an operator or requester confirmation, or an automatic policy. The important invariant is that it is named in the workflow definition.
The responder may be Arca-aware and return a dedicated decision envelope:
{
"schema": "arca-task-fulfillment-decision.v1",
"fulfillment/status": "fulfilled",
"reason": "commit-visible-on-origin",
"evidence": {
"commit_sha": "8fa2...",
"ref": "origin/publish/main"
}
}
It may also be deliberately dull and Arca-agnostic:
{ "ok": true }
In the second case the workflow-owned result_match adapter derives
task.fulfilled from the simple JSON shape. The domain-specific component owns
the meaning of ok; Arca owns only the declared matching rule and the
resulting workflow transition.
Workflow Run Status Extensions¶
Add two new terminal statuses to WorkflowRunStatus:
| Status | Meaning |
|---|---|
deadline_exceeded |
The run-level deadline was reached before all steps completed |
step_timeout |
A step timed out and on_timeout = "fail" or "abort_workflow" |
Interaction With Existing Proposals¶
| Proposal | Relation |
|---|---|
| 021 — service-offers-orders | Fan-out is the dispatch mechanism for multi-provider solicitation defined there |
| 025 — seed-directory as capability catalog | resolve = "capability" queries the Seed Directory; no new discovery primitive needed |
| 029 — workflow template catalog | Templates may include target, fan_in, timing fields; no template changes required beyond accepting new fields |
| 032 — key delegation passports | Capability passport verification used during target resolution; orthogonal |
Hard MVP Scope¶
The following is the minimum viable scope for the first implementation sprint:
| Feature | MVP |
|---|---|
target.resolve = "static" |
Yes |
target.resolve = "capability" |
Yes |
fan_in.policy = "any_one" |
Yes |
fan_in.policy = "all" |
Yes |
fan_in.policy = "quorum" |
Deferred |
fan_in.policy = "best_of" |
Yes |
timing.timeout + on_timeout |
Yes |
timing.retry |
Deferred |
Run-level deadline |
Yes |
WorkflowStepDispatch commit log records |
Yes |
| Background deadline enforcement tick | Yes |
Arca sees only canonical fan-in output |
Yes |
Deferred items are post-MVP and MUST NOT be implemented before the MVP scope is clean and tested.
Open Questions¶
-
Fan-in output shape — when
policy = "any_one", the step output is the first response. Whenpolicy = "all", should it be an array of responses or should the host pick one? Proposed:allreturns an array under aresponseskey;Arcais responsible for selecting from it. -
Dispatch cancellation on peer unavailability — if a target is unreachable at dispatch time, should it count against
min_responsesforquorum? Proposed: unreachable targets are excluded from the denominator. -
Capability resolution caching — should
resolve = "capability"always re-query Seed Directory at step execution time, or is a short TTL cache acceptable? Proposed: reuse the existingDelegationCache/PassportCachesync interval; no additional query at dispatch time.