Proposal 063: Inquirium as a Model Inquiry Organ
Based on:
- doc/project/40-proposals/019-supervised-local-http-json-middleware-executor.md
- doc/project/40-proposals/045-sensorium-local-enaction-stratum.md
- doc/project/40-proposals/048-sensorium-os-connector-action-classes.md
- doc/project/40-proposals/049-json-e-middleware-transformer-executor.md
- doc/project/40-proposals/055-bounded-deferred-operation-contract.md
- doc/project/60-solutions/019-middleware/019-middleware.md
- node:model-runtime/README.md
- node:nse/README.md
Refined by:
- doc/project/40-proposals/064-inquirium-implementation-recommendations.md
Status
Accepted
Date
2026-05-19
Executive Summary
Orbiplex should introduce Inquirium as the node organ for model-backed inquiry and inference. Sensorium remains the node's organ of contact with the world: observation, directive mediation, external signals, OS actions, and sensorimotor outcomes. Inquirium owns a different domain: acts of asking models to generate, classify, embed, summarize, rerank, transform, or otherwise infer from a bounded context.
This proposal does not remove the existing model-runtime work. It reframes it
as the lower execution substrate of Inquirium: runtime catalogs, model profiles,
provider adapters, local process lifecycle, health checks, transport mappings,
resource policies, egress policies, and model selection hooks. Workflow
components should not call model-runtime directly. They should call Inquirium
capabilities. Inquirium may then use model-runtime to choose and invoke a
concrete runtime.
The core decision is:
Sensorium exchanges signals with the world. Inquirium performs bounded acts of model inquiry.
model-runtime is not a user-facing organ; it is the execution substrate under Inquirium.
This preserves stratification. JSON-e Flow, Arca, role middleware, Scheduler, Monus, Semantic Index, and Sensorium can all request inference without learning provider-specific model protocols. Python model servers and vendor APIs remain implementation details behind runtime adapters, not semantic authorities inside Orbiplex.
Context and Problem Statement
Early Orbiplex model-integration thinking tended to treat LLM use as a possible Sensorium connector. That works for narrow cases where a model wrapper is just one finite OS action, for example a script that prepares a Whisper redaction draft or summarizes a local artifact.
It becomes semantically wrong as soon as model use needs host-level operational knowledge:
- sampling parameters such as temperature, top-p, top-k, seed, and max tokens;
- context-window limits and truncation policy;
- embedding dimensions and vector normalization;
- locality and egress policy;
- provider cost, quota, rate limits, and retry behavior;
- model profile selection;
- prompt and response retention rules;
- redaction and trace disclosure policy;
- KV/cache policy and later learning loops;
- health, lifecycle, and protocol differences between model servers.
Those are not merely connector details. They are part of the contract of an inference act. If they live inside a Sensorium connector, Sensorium becomes a large model orchestration system rather than the thin enaction stratum defined by proposal 045. If they live inside workflow definitions, every workflow learns too much about provider mechanics. If they live inside Python workers, host policy and audit become accidental.
The current Node code already contains a useful lower layer:
- node:model-runtime defines model/runtime/profile configuration contracts;
- node:model-runtime-http contains local and remote HTTP adapter logic;
- the daemon contains a model runtime supervisor for lifecycle and invocation;
- node:nse defines a select-llm-model hook for local policy-driven model selection.
That code is a useful substrate, but its name is too low-level to be the semantic boundary seen by workflows and operators. The missing layer is the organ that turns model runtime mechanics into an Orbiplex inference contract.
Proposed Model / Decision
Introduce Inquirium as a node-local organ for model inquiry and inference.
Inquirium is responsible for:
- accepting bounded model inquiry requests from host-granted callers;
- validating request shape, model profile, policy, and output contract;
- applying host policy for locality, egress, retention, redaction, tracing, and resource ceilings;
- selecting a model profile or delegating model selection to NSE;
- invoking a concrete runtime through model-runtime;
- normalizing provider-specific responses into stable result contracts;
- returning synchronous results or canonical deferred operations;
- emitting audit records and redacted traces.
Inquirium is not responsible for:
- observing external reality directly;
- executing arbitrary OS actions;
- replacing Sensorium connectors;
- becoming a general workflow engine;
- giving models authority to mutate node state;
- deciding crisis activation, routing, publication, or governance outcomes on its own;
- acting as a model marketplace or model discovery service;
- acting as a retrieval/RAG system (context selection is a caller responsibility or a separate stratum; Inquirium consumes prepared context);
- acting as an agent orchestrator (a small Flow IR exists for multi-step inference, but cross-domain agency lives in workflow/Arca/role middleware);
- acting as a tool registry or tool authority (Inquirium consumes allowed/tools from policy; it does not own the tool catalog);
- acting as a training platform (inquirium.train.adapt is a bounded operation that produces artifacts; lifecycle/governance of training programs lives elsewhere).
Runtime Boundary Decision
Inquirium should not be implemented as one model runtime middleware, and it should not copy Sensorium's connector ontology.
The settled shape is:
Inquirium Core = semantic inference contract and policy boundary.
model-runtime = lower execution substrate for profiles, lifecycle, health, and adapters.
runtime/provider adapters = replaceable execution details.
This means Inquirium is one node organ at the capability and policy layer, but the execution layer below it is plural. A caller asks for a bounded model inquiry such as generation, classification, embedding, summarization, reranking, structured transformation, image generation, image editing, transcription, or speech synthesis. The caller does not choose or learn provider mechanics.
Sensorium and Inquirium may look similar as layered host organs, but their domains differ:
- Sensorium connectors mediate contact with the external world.
- Inquirium runtime adapters mediate execution by model providers and local model servers.
A model is not treated as a sensor or an effector merely because it is reached through an adapter. Its output is an inference artifact over supplied context, not a direct observation of the world and not an authorized action by itself.
The practical rule is:
Workflow/middleware calls Inquirium capabilities.
Inquirium validates, authorizes, selects, traces, and normalizes.
model-runtime invokes the selected runtime adapter.
Provider adapters absorb provider-specific protocols.
This keeps the trusted Inquirium core small and auditable while allowing new model families, transports, and vendors to be added without changing the workflow-facing contract.
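The practical rule above can be sketched as a call path in which each layer touches only what it owns. This is a minimal illustration, not the Inquirium API: the class names `InquiriumCore`, `ModelRuntime`, and `EchoAdapter`, the runtime id, and the fake provider payload shape are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    operation: str      # inference act, e.g. "generate"
    profile_ref: str    # host-defined profile, never a provider name
    text: str

class EchoAdapter:
    """Hypothetical provider adapter: absorbs the provider-specific payload shape."""
    def invoke(self, model_name: str, prompt: str) -> dict:
        # A real adapter would speak HTTP or stdio here; this one fakes a reply.
        return {"choices": [{"content": f"[{model_name}] {prompt}"}]}

class ModelRuntime:
    """Lower substrate: maps a selected runtime id to an adapter and model binding."""
    def __init__(self, runtimes: dict):
        self._runtimes = runtimes   # runtime id -> (adapter, model name)

    def invoke(self, runtime_id: str, prompt: str) -> dict:
        adapter, model_name = self._runtimes[runtime_id]
        return adapter.invoke(model_name, prompt)

class InquiriumCore:
    """Validates, selects, and normalizes; callers never see provider payloads."""
    def __init__(self, runtime: ModelRuntime, profiles: dict):
        self._runtime = runtime
        self._profiles = profiles   # profile ref -> runtime id

    def handle(self, req: Request) -> dict:
        if req.operation != "generate":
            return {"outcome": "rejected", "reason": "unsupported-operation"}
        runtime_id = self._profiles.get(req.profile_ref)
        if runtime_id is None:
            return {"outcome": "rejected", "reason": "unknown-profile"}
        raw = self._runtime.invoke(runtime_id, req.text)
        # Normalize the provider response into a stable result shape.
        return {
            "outcome": "completed",
            "result": {"text": raw["choices"][0]["content"]},
            "model/used": {"runtime/id": runtime_id, "profile/ref": req.profile_ref},
        }

runtime = ModelRuntime({"runtime:local-a": (EchoAdapter(), "model-a")})
core = InquiriumCore(runtime, {"local-small-fast": "runtime:local-a"})
ok = core.handle(Request("generate", "local-small-fast", "hello"))
bad = core.handle(Request("generate", "openai-direct", "hello"))
```

In this sketch the caller names only an operation and a profile; an unknown profile is rejected at the core before any runtime is touched.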
Adapter Hosting Classification
An Inquirium runtime adapter may be middleware in the implementation and hosting sense, but it is not generic middleware in the semantic sense. The precise classification has two independent axes:
| Axis | Classification |
|---|---|
| Implementation / hosting | An adapter may be hosted through a middleware executor such as command_stdio, local_http_json, supervised http_local_json, or an in-process handler. |
| Semantic role / authority | The same component is an Inquirium runtime adapter: a narrow execution translator below Inquirium Core and model-runtime. |
The preferred term is middleware-hosted runtime adapter. This avoids saying
that "an Inquirium adapter is middleware" as an identity claim. It means the
adapter may use the middleware hosting fabric, init/report, health, lifecycle,
trace, and executor contracts, while its domain authority remains constrained by
Inquirium Core and model-runtime.
A middleware-hosted runtime adapter may report status, health, conformance,
supported protocol family, and adapter manifest data. It must not claim arbitrary
workflow hooks, local routes, peer dispatch surfaces, or host capabilities merely
because it is implemented as middleware. Its normal conversation partners are
Inquirium Core, model-runtime, provider workers, and explicit host capability
surfaces for scoped leases, artifacts, status, and diagnostics.
Adapter, Runtime, and Model Cardinality
Inquirium should not multiply adapter code merely to run multiple models through the same protocol. Reuse belongs at the adapter implementation and adapter instance layers; routing, audit, conformance, and policy belong at the runtime candidate layer.
The cardinality rule is:
one adapter implementation -> many adapter instances
one adapter instance -> many runtime candidates
one runtime candidate -> one selected model binding and policy bundle
An adapter implementation is the code for a protocol family, such as a local HTTP model server protocol or an OpenAI-compatible HTTP protocol. An adapter instance is a configured use of that implementation: base URL, auth reference, egress class, shared HTTP client, process supervision, queues, health probes, and rate limits. A runtime candidate is the host-visible routable option: adapter instance plus model binding plus operation support, default parameters, resource policy, trace policy, retention policy, and conformance state.
This means a single local-model-server adapter instance may expose two runtime candidates for two loaded or loadable models. A single OpenAI-compatible adapter implementation may back several adapter instances when egress, credentials, trust boundary, rate limits, or failure domain differ; each instance may then expose several runtime candidates for different remote models. Creating one adapter implementation per model is justified only when the model requires a different protocol, isolation boundary, security policy, or response normalization contract.
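The cardinality rule can be made concrete with three small record types. This is a sketch under assumptions: the field names, the loopback URL, and the candidate ids are illustrative and are not taken from the existing model-runtime configuration contracts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdapterImplementation:
    protocol_family: str            # code for one protocol family

@dataclass(frozen=True)
class AdapterInstance:
    implementation: AdapterImplementation
    base_url: str                   # configured endpoint (placeholder value below)
    egress_class: str

@dataclass(frozen=True)
class RuntimeCandidate:
    candidate_id: str
    instance: AdapterInstance
    model_binding: str              # exactly one model binding per candidate
    operations: frozenset

impl = AdapterImplementation("local-http-model-server")
local = AdapterInstance(impl, "http://127.0.0.1:11434", "local-only")

# One adapter instance, two host-visible runtime candidates.
candidates = [
    RuntimeCandidate("runtime:local-a", local, "model-a", frozenset({"generate"})),
    RuntimeCandidate("runtime:local-b", local, "model-b", frozenset({"embed"})),
]

def candidates_for(operation: str) -> list:
    """Routing works on candidates, not on adapters."""
    return [c.candidate_id for c in candidates if operation in c.operations]
```

Both candidates share the same adapter instance, yet each remains separately routable and auditable because the model binding lives on the candidate.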
Adapters may implement the mechanics of loading, unloading, spawning, pooling, or
queueing model workers. The authority to materialize a routable runtime remains
with the host and model-runtime. When a spawn or load operation succeeds, the
result should become an explicit runtime/ref or runtime.instance/ref with
health, model identity, policy, and conformance visible to Inquirium.
Relation to Agent-Orchestrator Layering
Many agent orchestrators converge on a pragmatic split between provider, model, execution runtime, and channel. Inquirium should preserve the useful part of that pattern while making the strata more explicit: provider-facing concerns belong to adapter implementations and adapter instances, model identity belongs to model bindings, host-visible routing belongs to runtime candidates, and conversation ingress/egress belongs outside Inquirium unless explicitly exposed through host capabilities.
The important difference is authority. In an agent orchestrator, the execution runtime often owns a prepared model loop, tool-call handling, and turn state. In Orbiplex, Inquirium is not the agent loop. Inquirium owns bounded inquiry semantics, policy enforcement, runtime selection, normalization, tracing, and auditing. Flow, Arca, Sensorium, and the host own broader orchestration. This keeps inference execution reusable without letting a model adapter become an implicit workflow engine.
Control Plane and Data Plane
Inquirium should be stricter than a raw provider API, but it does not need to proxy every byte of model input through the host.
The rule is:
Host/Inquirium control plane = mandatory.
Host/Inquirium data plane = optional.
For ordinary bounded inference, passing compact request material through Inquirium is acceptable and often useful. For large local data operations it is the wrong abstraction. Post-training, fine-tuning, batch embedding, large-scale reranking, vision/audio processing, and local dataset transforms may need direct runtime access to data already present on disk or in a local object store.
That direct access must still be host-authorized. The host grants scoped data handles, not ambient authority. A worker may read samples directly from an approved dataset path, content-addressed artifact set, local object store prefix, or query handle, but only under an explicit lease and runtime policy.
The intended shape is:
Caller -> Inquirium:
request operation, purpose, profile, dataset/artifact refs, output contract
Inquirium/host:
authorize capability
classify data access
issue scoped read/write leases
choose runtime/profile
enforce sandbox, egress, resource, trace, and retention policy
record manifest/provenance/status
Runtime worker:
read granted inputs directly
write checkpoints/adapters/embeddings/metrics/artifacts to granted outputs
return bounded status and result refs
This is different from bypassing the host. The host remains the authority for who may perform which operation, against which data, under which locality, egress, budget, sandbox, retention, and audit policy. What changes is only the data path: large samples, tensors, media files, and dataset shards need not move through the host process as payloads.
The hard constraints are:
- no ambient filesystem access;
- no ambient network access;
- no worker-selected provider or model profile;
- no untracked dataset reads;
- no raw sample, prompt, or response logging by default;
- no mutation of Memarium, Agora, identity, publication, routing, or governance state by the worker.
The preferred primitives are:
- dataset handles;
- artifact refs;
- content-addressed manifests;
- scoped path or object-store leases;
- sandbox profiles;
- egress classes;
- resource budgets;
- output artifact refs;
- metadata-only audit by default.
For remote post-training or remote batch processing, the same direct data-plane pattern may be allowed only under an explicit egress grant, data classification decision, destination policy, and operator or policy approval appropriate to the sensitivity class. Local trusted workers can use lighter leases, but they still must not receive ambient authority.
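The "scoped lease, no ambient authority" constraint can be illustrated with a fail-closed path check. This is a sketch only: `Lease`, `LeaseDenied`, `check_access`, and the example paths are hypothetical, and a real host would also enforce sandboxing below the path layer.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

@dataclass(frozen=True)
class Lease:
    lease_id: str
    root: PurePosixPath     # scoped path prefix granted by the host
    mode: str               # "read" or "write"

class LeaseDenied(Exception):
    pass

def check_access(leases: list, path: str, mode: str) -> str:
    """Return the id of the lease that covers `path`, or fail closed."""
    target = PurePosixPath(path)
    # Reject relative paths and parent traversal before any prefix comparison.
    if not target.is_absolute() or ".." in target.parts:
        raise LeaseDenied(f"unsafe path: {path}")
    for lease in leases:
        covered = target == lease.root or lease.root in target.parents
        if lease.mode == mode and covered:
            return lease.lease_id
    raise LeaseDenied(f"no {mode} lease covers {path}")

leases = [
    Lease("lease:in-1", PurePosixPath("/data/datasets/reviews"), "read"),
    Lease("lease:out-1", PurePosixPath("/data/artifacts/job-7"), "write"),
]
```

A worker in this sketch can read shards under the granted dataset prefix and write under the granted output prefix; every other path, including a write into the read-only prefix, raises `LeaseDenied`.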
Provider-managed conversation state
Some providers retain conversation state, prompt cache, or KV cache on their
side and offer "continue this session" semantics. Inquirium treats this as a
distinct trust decision, not as a transparent optimization. A profile must
explicitly declare provider-managed-memory: allowed | denied (default
denied). Where allowed, the trace records that the provider holds
session state, and retention/egress policy applies to the fact that state
exists at the provider, not only to the local artifacts. A profile without
this declaration falls back to per-call statelessness and forbids the
runtime from using provider continuation APIs.
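The default-denied rule can be sketched as follows. The profile key `provider-managed-memory` comes from the text above; the function names, the `provider_session` field, and the trace key are assumptions made for illustration.

```python
def provider_memory_allowed(profile: dict) -> bool:
    """Only an explicit 'allowed' enables provider continuation; anything else denies."""
    return profile.get("provider-managed-memory", "denied") == "allowed"

def plan_call(profile: dict, session_id):
    allowed = provider_memory_allowed(profile)
    return {
        # The continuation handle reaches the runtime only when the profile allows it.
        "provider_session": session_id if allowed else None,
        # The trace records whether state is held on the provider side.
        "trace": {"provider-holds-session-state": allowed and session_id is not None},
    }
```

A profile with no declaration, or with an unrecognized value, yields a stateless call plan in this sketch.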
Layering
flowchart TD
subgraph Orchestration["Orchestration and Consumers"]
JSONE["JSON-e Flow"]
Role["Role middleware"]
Arca["Arca"]
Scheduler["Scheduler"]
Monus["Monus"]
Sensorium["Sensorium"]
SemanticIndex["Semantic Index"]
end
Inquirium["Inquirium Core\nmodel inquiry contract, policy, trace"]
NSE["NSE select-llm-model\nlocal policy hook"]
Runtime["model-runtime\ncatalog, lifecycle, health, adapters"]
subgraph Providers["Concrete model execution surfaces"]
LocalHTTP["local HTTP model server\nOllama / vLLM / llama.cpp / MLX"]
Stdio["command_stdio worker\nPython / CLI wrapper"]
RemoteAPI["remote HTTP API\nOpenAI-compatible / Anthropic-like / other"]
end
JSONE --> Inquirium
Role --> Inquirium
Arca --> Inquirium
Scheduler --> Inquirium
Monus --> Inquirium
Sensorium --> Inquirium
SemanticIndex --> Inquirium
Inquirium --> NSE
Inquirium --> Runtime
Runtime --> LocalHTTP
Runtime --> Stdio
Runtime --> RemoteAPI
The important dependency direction is one-way:
Sensorium may call Inquirium.
Inquirium must not depend on Sensorium.
model-runtime must not depend on Inquirium or Sensorium.
Providers must not know Orbiplex domain semantics.
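One way to keep the dependency direction honest is a simple check over a declared dependency graph. This is a sketch: the component names mirror the layers above, while the `ALLOWED_DEPENDENCIES` table and the `violations` helper are hypothetical and not part of any existing build tooling.

```python
# Allowed edges, read as "key may depend on each value".
ALLOWED_DEPENDENCIES = {
    "sensorium": {"inquirium"},
    "inquirium": {"model-runtime", "nse"},
    "model-runtime": {"provider-adapter"},
    "provider-adapter": set(),
    "nse": set(),
}

def violations(actual: dict) -> list:
    """Return (source, target) pairs that break the one-way rule, sorted."""
    return sorted(
        (src, dst)
        for src, deps in actual.items()
        for dst in deps
        if dst not in ALLOWED_DEPENDENCIES.get(src, set())
    )

observed = {
    "sensorium": {"inquirium"},
    "inquirium": {"sensorium", "model-runtime"},   # upward edge: not allowed
    "model-runtime": {"inquirium"},                # upward edge: not allowed
}
found = violations(observed)
```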
Naming
| Term | Meaning |
|---|---|
| Inquirium | The node organ for bounded model-backed inquiry and inference. |
| Inquirium Core | The host-owned component that validates, authorizes, selects, invokes, normalizes, traces, and audits inference requests. |
| Model Runtime | The lower substrate for model execution surfaces: profiles, lifecycle, health checks, transports, provider mappings, and resource/egress policy. |
| Model Profile | A host-defined policy bundle describing desired capability, locality, cost tier, context behavior, and output class. |
| Adapter Implementation | The reusable code/package that speaks one protocol family or execution interface. |
| Adapter Instance | A configured and optionally supervised use of an adapter implementation: endpoints, credentials, process lifecycle, pools, limits, and health. |
| Runtime Adapter | The concrete protocol adapter for local HTTP, command stdio, remote HTTP API, or later transports. |
| Middleware-hosted Runtime Adapter | A runtime adapter implemented through the middleware hosting fabric while retaining the narrow Inquirium adapter role and authority boundary. |
| Runtime Candidate | A host-visible routable execution option: adapter instance plus model binding, operation support, policies, health, and conformance. |
| Runtime Instance | A materialized live process, loaded model, session, or worker instance created under host/model-runtime supervision. |
| Model Binding | The configured provider-facing model name or handle, mapped to a model/ref, optional digest/hash, defaults, and constraints. |
| Provider Worker | A concrete server, process, or API implementing model execution. It is not an Orbiplex organ and does not own policy. |
Responsibilities by Layer
The most common implementation drift puts decisions in the wrong layer. The following matrix is the contract:
| Decision / concern | Inquirium Core | model-runtime | Runtime adapter | Provider worker |
|---|---|---|---|---|
| Caller capability + purpose grant | owns | — | — | — |
| Operation semantics (generate, embed, …) | owns | — | — | — |
| Policy: locality, egress, retention, trace | owns | reads | applies | — |
| Profile selection (caller-requested or NSE) | owns | exposes candidates | — | — |
| Request/result schema normalization | owns | — | maps to/from provider | — |
| Audit/provenance/artifact manifest | owns | — | — | — |
| Lifecycle, supervision, health probes | calls | owns | implements | — |
| Resource ceilings, sandbox profile | enforces | owns | applies | runs under |
| Transport mapping (HTTP, stdio, …) | — | calls | owns | — |
| Provider protocol details, payload shape | — | — | owns | speaks |
| Actual inference computation | — | — | dispatches | owns |
The rule of thumb: if a decision concerns what an inference act means or who may do it, it belongs in Inquirium Core. If it concerns how a process stays alive, it belongs in model-runtime. If it concerns which bytes go over which wire, it belongs in the runtime adapter. Provider workers execute and return; they own no Orbiplex authority.
Relationship to Sensorium
Sensorium remains the sensorimotor contact surface with the world. Its connectors adapt external systems into observations, directives, diagnostic records, and artifacts. Sensorium can use Inquirium when it needs model-assisted classification, summarization, or interpretation of an admitted observation.
Inquirium is different:
- Sensorium answers: "What signal or effect crosses the node/world boundary?"
- Inquirium answers: "What bounded model inquiry should be performed over this context?"
This avoids making LLMs look like sensors. A model may operate on Sensorium signals, Memarium facts, Agora records, workflow context, or user-provided messages. Its domain is not contact with the world; its domain is inference over given context.
Relationship to Middleware
Inquirium should be exposed through host capabilities and stable request/result contracts, not as a generic supervised HTTP middleware interface.
An Inquirium adapter may nevertheless be implemented as middleware. This is an
execution-hosting choice, not a semantic promotion. The adapter remains a
runtime adapter: it translates request/result data and provider protocol details
for Inquirium and model-runtime. It does not become a role middleware, workflow
engine, Sensorium connector, generic route owner, or authority over model
selection and retention policy.
Middleware can consume Inquirium:
- JSON-e Flow may call inquirium.generate or inquirium.classify as a host-capability step.
- Role middleware may request a model judgment but must not delegate authority to the model.
- Arca may use Inquirium for planning support, draft review, or task fulfillment evidence.
- Monus may use Inquirium to form local concern drafts from selected inputs.
- Semantic Index may use Inquirium embedding profiles.
This keeps middleware composition declarative while preventing each middleware from inventing its own provider adapter and retention policy.
Inquirium requests carry the correlation_id of the calling workflow/saga
(per the temporal storage convention in proposal 062) and propagate it into
every emitted event (request.started, runtime.selected, usage.metrics,
artifact.produced, …). This lets operators reconstruct a multi-component
saga — e.g. Whisper intake → Inquirium summarize → Artifact Delivery — by
joining on one identifier without per-component reconstruction logic.
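The correlation rule can be sketched with an in-memory event log. The event names `request.started`, `runtime.selected`, and `usage.metrics` come from the text above; the `EventLog` class, the `whisper.intake` event, and the saga ids are assumptions made for the example.

```python
import uuid

class EventLog:
    """Minimal stand-in for an event sink shared by several components."""
    def __init__(self):
        self.events = []

    def emit(self, kind: str, correlation_id: str, **fields):
        self.events.append({"event": kind, "correlation_id": correlation_id, **fields})

def run_inquiry(log: EventLog, correlation_id: str) -> None:
    # Each inquiry has its own request id but reuses the caller's correlation id.
    request_id = f"inq:req:{uuid.uuid4()}"
    log.emit("request.started", correlation_id, request_id=request_id)
    log.emit("runtime.selected", correlation_id, request_id=request_id,
             runtime_id="runtime:local-a")
    log.emit("usage.metrics", correlation_id, request_id=request_id,
             input_tokens=12, output_tokens=5)

log = EventLog()
log.emit("whisper.intake", "saga-42")   # another component in the same saga
run_inquiry(log, "saga-42")
run_inquiry(log, "saga-99")             # unrelated saga
saga = [e["event"] for e in log.events if e["correlation_id"] == "saga-42"]
```

Filtering on one identifier recovers the full cross-component sequence without any component-specific join logic.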
Public Capability Surface
The first capability vocabulary should be small and verb-oriented:
| Capability | Purpose |
|---|---|
| inquirium.generate | Generate text or structured JSON from messages/context under a model profile. |
| inquirium.classify | Classify an input against a bounded label set or schema. |
| inquirium.embed | Produce embeddings under an embedding profile. Embedding output inherits the retention/egress class of its input: vectors are a lossy encoding of source material, not neutral numbers, and are subject to the same disclosure boundary. |
| inquirium.summarize | Produce a summary under an explicit summary contract. |
| inquirium.rerank | Rank candidate items against a query or purpose. |
| inquirium.transform | Apply a bounded structured transformation where generation is constrained by an output schema. |
| inquirium.image.generate | Produce an image artifact under an explicit image generation contract. |
| inquirium.image.edit | Produce a derived image artifact from source image/context under an explicit edit contract. |
| inquirium.audio.transcribe | Produce text or structured transcript artifacts from audio input. |
| inquirium.audio.synthesize | Produce speech/audio artifacts from text or structured speech input. |
| inquirium.train.adapt | Run bounded local or approved-remote adaptation/post-training over granted dataset handles and produce model artifacts. |
| inquirium.batch.embed | Produce embeddings for a granted dataset/artifact set without proxying every sample through the host. |
| inquirium.runtime.status | Operator/status surface for model profiles and runtime health. |
The capability names are intentionally not provider-specific. A caller asks for
an act of inference, not for ollama, vllm, mlx, openai, or a Python
script.
The MVP may implement only a smaller subset of this vocabulary. The important contract rule is that each operation is named by the inference act and output class, not by the provider or transport that happens to execute it.
Request Contract Shape
The exact schemas are future work, but the first request contract should carry these concepts:
{
"schema": "inquirium-request.v1",
"request/id": "inq:req:...",
"operation": "generate",
"profile/ref": "local-small-fast",
"purpose": "story-draft/review",
"input": {
"messages": [],
"context_refs": []
},
"constraints": {
"max/output-tokens": 512,
"temperature": 0.2,
"response/schema-ref": "story-review.v1"
},
"retention": {
"persist_prompt": false,
"persist_response": false,
"trace_level": "metadata-only"
},
"idempotency/key": "..."
}
The contract should distinguish:
- caller intent (purpose);
- model profile (profile/ref);
- operation class (operation);
- input material and references;
- inference constraints;
- retention and trace policy;
- idempotency and correlation.
Provider-specific fields may exist only behind host-owned profile/runtime configuration, not as required fields on every caller request.
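A shape check for this request contract might look like the sketch below. Since the exact schemas are future work, everything beyond the field names shown in the example request is an assumption: the `REQUIRED` tuple (which leaves out `profile/ref` because its mandatory status is an open question), the `PROVIDER_KEYS` blocklist, and the set of trace levels other than `metadata-only`.

```python
REQUIRED = ("schema", "request/id", "operation", "purpose", "input")
OPERATIONS = {"generate", "classify", "embed", "summarize", "rerank", "transform"}
# Hypothetical provider-specific keys that must not appear at the request boundary.
PROVIDER_KEYS = {"base_url", "api_key", "provider", "model"}
# Only "metadata-only" appears in the proposal; the other levels are assumed.
TRACE_LEVELS = {"metadata-only", "redacted", "full"}

def validate_request(req: dict) -> list:
    """Return a list of problems; an empty list means the shape is acceptable."""
    problems = [f"missing:{key}" for key in REQUIRED if key not in req]
    if req.get("schema") != "inquirium-request.v1":
        problems.append("bad-schema")
    if "operation" in req and req["operation"] not in OPERATIONS:
        problems.append("unknown-operation")
    problems += [f"provider-field:{key}" for key in sorted(PROVIDER_KEYS.intersection(req))]
    # An omitted retention block falls back to the most conservative trace level.
    trace_level = req.get("retention", {}).get("trace_level", "metadata-only")
    if trace_level not in TRACE_LEVELS:
        problems.append("bad-trace-level")
    return problems
```

Returning a list of problems rather than raising on the first one keeps rejection reasons available for audit records.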
Result Contract Shape
The result should normalize provider responses without pretending that model output is fact:
{
"schema": "inquirium-result.v1",
"request/id": "inq:req:...",
"operation": "generate",
"outcome": "completed",
"result": {
"text": "...",
"json": {}
},
"model/used": {
"model/id": "model:bielik",
"runtime/id": "runtime:ollama-bielik-local",
"profile/ref": "local-small-fast"
},
"usage": {
"input/tokens": 1200,
"output/tokens": 240
},
"trace": {
"trace/ref": "trace:...",
"redaction/profile": "metadata-only"
}
}
For longer operations Inquirium should return deferred-operation.v1 and later
complete with deferred-operation-status.v1, reusing proposal 055 rather than
inventing a second async contract.
Model Selection
Model selection is host policy, not caller authority. A caller may request a profile or capability class. Inquirium may:
- accept the requested profile;
- use NSE select-llm-model to choose among healthy candidates;
- defer because no safe runtime is currently ready;
- reject because the caller, purpose, locality, egress, cost, or retention policy does not permit the requested operation.
The model selector should see redacted request metadata and candidate runtime metadata. It should not receive full prompt bodies unless explicitly allowed by the relevant trace/retention policy.
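The accept, select, defer, and reject outcomes can be sketched as one decision function over redacted metadata. This is an illustration under assumptions: the candidate and policy field names are invented, and the "cheapest healthy candidate" rule merely stands in for the NSE select-llm-model hook, whose real interface is not shown here.

```python
def select_runtime(request_meta: dict, candidates: list, policy: dict) -> dict:
    """Decide accept/defer/reject from metadata only; no prompt body is passed in."""
    if request_meta["purpose"] not in policy["allowed_purposes"]:
        return {"decision": "reject", "reason": "purpose-not-permitted"}
    permitted = [
        c for c in candidates
        if c["profile"] == request_meta["profile/ref"]
        and (c["locality"] == "local" or policy["allow_egress"])
    ]
    if not permitted:
        return {"decision": "reject", "reason": "no-permitted-runtime"}
    healthy = [c for c in permitted if c["healthy"]]
    if not healthy:
        # A permitted runtime exists but is not ready: defer instead of rejecting.
        return {"decision": "defer", "reason": "no-ready-runtime"}
    # Stand-in for the NSE hook: prefer the lowest cost tier.
    chosen = min(healthy, key=lambda c: c["cost_tier"])
    return {"decision": "accept", "runtime/id": chosen["id"]}

candidates = [
    {"id": "runtime:local-a", "profile": "local-small-fast", "locality": "local",
     "healthy": True, "cost_tier": 1},
    {"id": "runtime:remote-b", "profile": "local-small-fast", "locality": "remote",
     "healthy": True, "cost_tier": 0},
]
policy = {"allowed_purposes": {"story-draft/review"}, "allow_egress": False}
meta = {"purpose": "story-draft/review", "profile/ref": "local-small-fast"}
decision = select_runtime(meta, candidates, policy)
```

Here the cheaper remote candidate is never considered because egress is not permitted; policy filtering happens before any preference ordering.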
Model identity is content-addressed, not name-addressed. A profile may
declare allowed-model-hashes[] and/or disallowed-model-hashes[]; a
runtime whose currently-loaded model reports a hash outside the allowed
set produces a terminal model-hash-denied outcome, never a silent
substitution. This is the explicit defense against the "same model id,
different weights" failure mode where a provider rotates a model under a
stable name. Inquirium does not assume that model/id alone identifies
the artifact.
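The hash rule can be sketched as a small gate evaluated before invocation. The profile keys `allowed-model-hashes` and `disallowed-model-hashes` and the `model-hash-denied` outcome come from the text above; the function name and the example hash strings are placeholders.

```python
def check_model_hash(profile: dict, reported_hash: str) -> str:
    """Return "ok" or the terminal "model-hash-denied" outcome; never substitute silently."""
    allowed = profile.get("allowed-model-hashes")
    disallowed = profile.get("disallowed-model-hashes", [])
    if reported_hash in disallowed:
        return "model-hash-denied"
    # An allow-list, when declared, is exhaustive.
    if allowed is not None and reported_hash not in allowed:
        return "model-hash-denied"
    return "ok"

profile = {"allowed-model-hashes": ["sha256:aaa"], "disallowed-model-hashes": ["sha256:bad"]}
```

A provider that rotates weights under a stable model name would report a new hash and be denied in this sketch, even though model/id is unchanged.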
Runtime Execution
model-runtime should remain the lower layer that knows how to use concrete
execution surfaces:
- local HTTP model servers such as Ollama, vLLM, llama.cpp server, MLX server, or custom local APIs;
- command_stdio workers, including Python wrappers around model libraries;
- remote HTTP APIs such as OpenAI-compatible or Anthropic-like endpoints;
- diffusion, vision, audio, embedding, and reranking servers exposed through local or remote runtime adapters;
- future runtime kinds such as GPU pools, edge devices, or federated model providers.
This lower layer may supervise processes, probe health, map requests and responses, enforce resource ceilings, and apply egress restrictions. It should not know whether the caller is Arca, Sensorium, Monus, or a role middleware.
Python and Model Libraries
The proposal explicitly allows model execution to live in Python, external servers, or vendor APIs. Rust is not expected to link directly to every model library.
The rule is:
Python worker = execution detail.
Runtime adapter = host-owned protocol boundary.
Inquirium = inference contract and policy boundary.
Workflow = consumer of inquiry results.
Python workers should receive normalized model-runtime requests and return bounded model-runtime responses. They should not receive ambient authority over Orbiplex host capabilities, local identity, publication, routing, or storage.
When a worker needs large local inputs, it may receive host-issued data leases and artifact handles instead of inline samples. This keeps Python, CLI, or external model tooling useful for post-training and batch jobs without turning those workers into host policy authorities.
Authority Boundary
Model output is never authority by itself. Inquirium may produce:
- candidate text;
- candidate structured JSON;
- classification scores;
- embeddings;
- summaries;
- recommendations;
- routing suggestions;
- crisis-signal evidence.
The host, workflow, policy engine, or human operator must still decide whether to act on that output. For example:
- a model may recommend a route, but Artifact Delivery policy decides whether the route is allowed;
- a model may classify a crisis-related signal, but crisis activation policy decides the operational mode;
- a model may summarize audit records, but the audit ledger remains the source of truth;
- a model may draft a Whisper redaction, but publication remains gated by the existing Whisper/Sensorium/operator path.
This is the same architectural rule as with Sensorium: the organ produces bounded evidence or effects, not unilateral governance.
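The route example can be sketched as a policy gate that treats the model's suggestion as evidence only. The function name, the `decided_by` label, and the route identifiers are hypothetical; the real Artifact Delivery policy interface is not described in this proposal.

```python
def decide_route(model_suggestion: dict, allowed_routes: set) -> dict:
    """Policy decides; the model's suggestion is recorded as evidence only."""
    route = model_suggestion.get("route")
    permitted = route in allowed_routes
    return {
        "route": route if permitted else None,
        "outcome": "allowed" if permitted else "denied",
        "decided_by": "artifact-delivery-policy",
        # The suggestion is kept for audit regardless of the decision.
        "evidence": {
            "source": "inquirium",
            "suggested": route,
            "score": model_suggestion.get("score"),
        },
    }

allowed_routes = {"route:local-archive", "route:operator-review"}
accepted = decide_route({"route": "route:operator-review", "score": 0.91}, allowed_routes)
refused = decide_route({"route": "route:public-broadcast", "score": 0.99}, allowed_routes)
```

A high model score does not change the outcome: the second suggestion is denied because the route is not in the policy's allowed set.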
Trade-offs
Benefits
- Keeps Sensorium focused on local enaction and signal exchange.
- Gives model-backed inference a proper domain boundary.
- Reuses existing model-runtime and NSE work without exposing it as a workflow-facing surface.
- Prevents workflow definitions from learning provider-specific protocols.
- Allows Python model libraries without putting Python in charge of host policy.
- Allows LLMs, embedding models, rerankers, diffusion models, vision models, and audio models to share one host policy boundary without forcing one provider API or one runtime shape.
- Allows large local model operations to use direct, leased data paths without forcing the host to proxy every sample, tensor, image, audio file, or dataset shard.
- Makes sampling, context, retention, trace, egress, and resource limits explicit parts of an inference contract.
- Creates a natural home for future model lifecycle concerns such as KV cache, prompt retention, model provenance, evaluation, and training loops.
Costs
- Introduces a new named organ and therefore another concept operators must learn.
- Requires clear documentation so Inquirium does not become a synonym for "anything AI".
- Requires request/result schemas and capability grants before it should be used by general middleware.
- May overlap with simple Sensorium OS script wrappers during migration.
Migration Impact
Existing Sensorium OS scripts that call a local model can remain valid as finite, bounded OS actions. They should be treated as compatibility or bootstrapping paths, not as the long-term model interface.
The desired migration is:
JSON-e Flow -> sensorium.directive.invoke -> sensorium-os script -> model
to:
JSON-e Flow -> inquirium.generate/classify/embed -> model-runtime -> model
Sensorium may still call Inquirium internally when a Sensorium action needs model assistance, but the model domain no longer lives inside Sensorium.
Failure Modes and Mitigations
| Failure mode | Mitigation |
|---|---|
| Inquirium becomes a general AI agent with ambient authority. | Capabilities are operation-specific; model output is evidence, not authority; actions remain in workflow/policy/Sensorium/AD layers. |
| Inquirium becomes a monolithic runtime middleware. | Keep Inquirium as the semantic contract and policy layer; keep provider lifecycle, health, and invocation in model-runtime adapters. |
| A runtime adapter gains broad middleware authority because it is middleware-hosted. | Treat hosting as a separate axis from semantic authority; allow only adapter manifest, status, health, conformance, provider invocation, and explicitly granted lease/artifact/status capabilities. |
| Multiple models hide behind one runtime candidate. | Keep one host-visible runtime candidate per model binding and policy bundle, even when they share one adapter instance and one server process. |
| Inquirium copies Sensorium connectors and treats models as sensors/effectors. | Document the ontology split: Sensorium mediates world contact; Inquirium mediates bounded inference over supplied context. |
| Provider-specific parameters leak into every caller request. | Keep provider details in model profiles and runtime configs; expose only stable inference constraints at the request boundary. |
| Direct data-plane operations bypass host policy. | Require host-issued dataset/artifact leases, sandbox profiles, egress classes, resource budgets, manifests, provenance, and metadata-only audit records. |
| Post-training workers gain ambient local filesystem or network access. | Grant scoped paths/object-store prefixes/query handles only; fail closed when a lease, sandbox profile, or egress policy is missing. |
| Sensorium and Inquirium both offer model actions. | Mark Sensorium model wrappers as finite OS compatibility actions; document Inquirium as the canonical model inquiry surface. |
| Python workers bypass host policy. | Workers run behind model-runtime adapters and receive only normalized requests; no direct host capability authority. |
| Prompts or outputs leak into logs/traces. | Inquirium owns trace and retention policy; default to metadata-only traces and explicit opt-in persistence. |
| Models are treated as truth. | Result contracts label outputs as candidates, scores, or generated artifacts; downstream components decide acceptance. |
| Runtime health and model selection become hidden magic. | Expose inquirium.runtime.status, model profile diagnostics, selected runtime metadata, and NSE trace summaries. |
| Long model calls block workflow threads. | Reuse deferred-operation.v1 and bounded host poll/resume paths. |
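The fail-closed mitigation for data-plane workers in the table above can be illustrated with a small gate. This is a sketch under assumptions: `DataPlaneGrant`, `GrantDenied`, `require_grant`, and `path_allowed` are hypothetical names, and the minimal lease schema is still an open question in this proposal.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataPlaneGrant:
    """Hypothetical bundle of host-issued controls for one operation."""
    lease_id: Optional[str] = None
    sandbox_profile: Optional[str] = None
    egress_class: Optional[str] = None
    scoped_paths: tuple[str, ...] = ()


class GrantDenied(Exception):
    """Raised when a required control is absent (fail closed)."""


def require_grant(grant: DataPlaneGrant) -> DataPlaneGrant:
    # Refuse the operation unless lease, sandbox profile, and egress policy
    # are all present; a missing control never falls back to a default.
    required = ("lease_id", "sandbox_profile", "egress_class")
    missing = [name for name in required if not getattr(grant, name)]
    if missing:
        raise GrantDenied("missing: " + ", ".join(missing))
    return grant


def path_allowed(grant: DataPlaneGrant, path: str) -> bool:
    # Lexical check only: normalizes ".." segments but does not resolve
    # symlinks, which a real sandbox would have to handle separately.
    normalized = posixpath.normpath(path)
    for scope in grant.scoped_paths:
        root = posixpath.normpath(scope)
        if normalized == root or normalized.startswith(root + "/"):
            return True
    return False
```

An empty `scoped_paths` tuple allows nothing, so a worker with a valid lease but no scoped paths still has no filesystem reach.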
Open Questions¶
- Should the first Inquirium implementation be in-process Rust in the daemon, a supervised local middleware module, or a Rust organ embedded similarly to sensorium-core?
- What is the minimal schema set for MVP: only inquirium-request.v1 and inquirium-result.v1, or separate operation-specific schemas for generate, classify, embed, summarize, and rerank?
- Should profile/ref be mandatory for all callers, or may callers request an operation class and let host policy choose the profile entirely?
- Which model parameters are stable request-level constraints and which belong only to host-owned runtime/profile configuration?
- How should Inquirium expose prompt/context redaction failures: hard reject, degraded request, or deferred operator remediation?
- Should embeddings be stored only by consuming components such as Semantic Index, or should Inquirium have a local embedding cache?
- How much of NSE select-llm-model input may include prompt-derived metadata under default privacy policy?
- What is the first practical consumer: Semantic Index embeddings, Whisper redaction, story-role generation, Monus summaries, or operator diagnostics?
- What is the minimal lease schema for direct data-plane operations: scoped paths, object-store prefixes, artifact refs, query handles, or all of them?
- Which operation classes may use direct data-plane leases in MVP: batch embeddings, local post-training, large media transforms, or only one pilot?
- Which sensitivity-class taxonomy does Inquirium apply at the data boundary, and where does that taxonomy live? Inquirium references "sensitivity class", "data classification", and "retention profile" repeatedly, but the canonical enumeration is a cross-cutting concern (Memarium spaces, Pseudonym Vault classes, Whisper privacy levels). The open decision is whether Inquirium imports an existing taxonomy by reference, or whether a new node-wide data-classification proposal becomes a prerequisite.
Next Actions¶
- Define the canonical Inquirium capability vocabulary and decide which operation classes are MVP.
- Add schemas for inquirium-request.v1 and inquirium-result.v1, or split them into operation-specific schemas if the contracts diverge.
- Refactor documentation around model use so Sensorium OS model wrappers are described as compatibility actions, not the canonical LLM boundary.
- Define a small InquiriumCore implementation surface that wraps the existing daemon model-runtime supervisor without exposing provider details to callers.
- Add host capability authorization for the first Inquirium operation.
- Connect NSE select-llm-model as the optional model selection policy hook.
- Add operator diagnostics for configured profiles, candidate runtimes, health, and last selection decisions.
- Migrate one existing bounded model-adjacent flow from Sensorium OS script invocation to Inquirium.
Tracking¶
| ID | Work item | Status | Notes |
|---|---|---|---|
| P063-01 | Establish Inquirium as a separate model inquiry organ | accepted | This proposal defines the boundary and is now accepted for implementation planning. |
| P063-02 | Reframe model-runtime as Inquirium substrate | accepted | Existing Node crates are useful lower layers, but should not be workflow-facing. |
| P063-03 | Define Inquirium capability vocabulary | todo | Start with generate, classify, embed, summarize, rerank, transform, and runtime status. |
| P063-04 | Define request/result schemas | todo | Decide between one generic contract and operation-specific schemas. |
| P063-05 | Implement Inquirium Core wrapper over model-runtime | todo | Should preserve model-runtime independence from Inquirium and Sensorium. |
| P063-06 | Integrate NSE model selection | todo | Use select-llm-model as optional policy, not as domain truth. |
| P063-07 | Add host capability gate and audit | todo | Inference authority must be granted explicitly; model output is not authority. |
| P063-08 | Migrate one existing model-adjacent path | todo | Candidate paths: Whisper redaction, Semantic Index embedding, story-role generation, or Monus summary. |