Bounded Local Server Runtime¶

Status: Implemented (MVP) Based on: internal node code audit finding A01 Date: 2026-04-29

Executive Summary¶

A shared sync TCP/HTTP mini-server primitive (bounded-server crate) replaces ad-hoc thread::spawn accept loops in the daemon health endpoint, Seed Directory, Agora service, and daemon peer listener. Python supervised middleware uses the analogous bounded_http.py helper around the stdlib ThreadingHTTPServer. Every server surface governed by these primitives gets bounded accepts or active-request permits, fast overload rejection, and first-class shutdown drain or daemon-supervised shutdown with metrics. The daemon health endpoint also exposes a local health_server.max_connections configuration knob so integration tests and constrained deployments can verify the same overload behavior used in production.

Context and Problem Statement¶

Before this solution, four server surfaces used unbounded or separately bounded thread::spawn accept/session loops without a shared resource contract:

Daemon health endpoint (daemon/src/endpoint_routes.rs)
Seed Directory (seed-directory/src/lib.rs)
Agora service (agora-service/src/main.rs)
Daemon peer listener (daemon/src/peer_supervisor.rs)
Bundled Python middleware (middleware-modules/*/service.py)

A small number of slow or malicious clients could exhaust the process thread limit, memory, or both. Shutdown was not coordinated — active handlers could continue running after the accept loop exited.

Proposed Model¶

A single bounded-server crate providing:

BoundedServerConfig — resource contract: max_connections, per-socket read/write timeouts, handler timeout, shutdown grace period, backoff intervals.
BoundedServerHandle — running server: local_addr, stop(), wait() for externally owned stop tokens, stop_token(), and metrics() snapshot.
ConnectionHandler trait — single-method handler interface. Each accepted connection acquires a permit; the permit is released automatically when the handler returns.
Overload behaviour: when all permits are held, new connections receive a fast HTTP 503 before the TCP socket is closed. No thread is spawned for rejected connections.
Shutdown: the stop signal closes the accept loop. Workers poll a shared stop flag on a 250ms interval and exit when the receiver disconnects. The server joins worker threads and returns final metrics. This sync primitive does not preempt a handler thread; handlers must enforce read/write deadlines and return.
Thread pool: a fixed-size pool of max_connections worker threads pulls from a bounded sync_channel. The channel depth equals max_connections, doubling as an overload signal when try_send fails.
Metrics: active_connections, accepted_total, rejected_over_capacity_total, handler_errors_total, shutdown_drain_ms, last_accept_error, last_handler_error.
Daemon resource pressure surface: /v1/runtime-metrics exposes a resource_pressure object with OS parallelism, best-effort process thread count where available, supervised middleware HTTP totals, and scheduler active/queued/registered job counts. Node UI mirrors that summary on the Status page.

For Python middleware, BoundedThreadingHTTPServer keeps the familiar stdlib BaseHTTPRequestHandler programming model but gates active requests with ORBIPLEX_MIDDLEWARE_MAX_CONNECTIONS (default: 32). When no permit is available it writes a minimal HTTP 503 response and closes the socket without starting another handler thread.

Production Invariant¶

No production local server path may create one unbounded thread per accepted connection or request. A sync server must either use the Rust bounded-server primitive, the Python BoundedThreadingHTTPServer helper, or an equivalent bounded executor with explicit overload behavior. Reload and restart paths must use cooperative bounded drain. If a supervised process cannot be stopped within its configured stop/kill budget, the daemon/operator surface must report the remaining process rather than hiding it as a successful shutdown.

Trade-offs¶

Benefit	Risk / Constraint
Single contract for all sync mini-servers	Thread-per-connection still used; bounded but not async
No new dependency (pure std)	`sync_channel` + `Arc<Mutex<Receiver>>` adds contention at high scale
Fast 503 rejection before thread spawn	503 body may not reach client if TCP buffers are full
First-class shutdown drain	Worker exit latency up to 250ms while idle; active handlers must return cooperatively
Simple `ConnectionHandler` trait	Does not support streaming responses natively

Failure Modes and Mitigations¶

Channel disconnect: workers notice RecvTimeoutError::Disconnected and exit gracefully. The server join thread waits for all workers.
Worker panic: handler panics are caught at the per-connection boundary, counted as handler_errors_total, stored as last_handler_error, and the worker continues serving later connections.
Accept error: logged to last_accept_error metric; accept loop sleeps for accept_error_backoff and retries.
Slow handler during shutdown: the server does not preempt handlers. It waits for them to finish. The shutdown_grace value is the intended operational budget and is reflected in metrics, but handler code remains responsible for honoring socket deadlines and returning.

Deferred Questions¶

Should a later async runtime add a handler deadline that the server enforces (e.g., via set_read_timeout already applied, but no preemption)?
Should the peer supervisor's session worker lifecycle also share a richer cooperative shutdown token, or is the listener-level bounded gate sufficient for M23-M25/A01 closure?

Next Actions¶

[x] Implement bounded-server crate with contract tests.
[x] Migrate daemon health endpoint.
[x] Migrate Seed Directory server.
[x] Migrate Agora service.
[x] Migrate daemon peer listener.
[x] Migrate bundled Python middleware to BoundedThreadingHTTPServer.
[x] Expose daemon resource pressure in runtime metrics and Node UI status.
[x] Add bounded Python HTTP helper test for fast 503 rejection.
[x] Add integration test that verifies 503 under load in a real daemon context.
[ ] Post-MVP: keep an explicit audit inventory for any production local listener that is not backed by bounded-server, BoundedThreadingHTTPServer, or a documented equivalent bounded adapter.