Bounded Local Server Runtime¶
Status: Implemented (MVP) Based on: internal node code audit finding A01 Date: 2026-04-29
Executive Summary¶
A shared sync TCP/HTTP mini-server primitive (bounded-server crate) replaces
ad-hoc thread::spawn accept loops in the daemon health endpoint, Seed
Directory, Agora service, and daemon peer listener. Python supervised
middleware uses the analogous bounded_http.py helper around the stdlib
ThreadingHTTPServer. Every server surface governed by these primitives gets
bounded accepts or active-request permits, fast overload rejection, and
first-class shutdown drain or daemon-supervised shutdown with metrics.
Context and Problem Statement¶
Before this solution, four server surfaces used unbounded or separately bounded
thread::spawn accept/session loops without a shared resource contract:
- Daemon health endpoint (
daemon/src/endpoint_routes.rs) - Seed Directory (
seed-directory/src/lib.rs) - Agora service (
agora-service/src/main.rs) - Daemon peer listener (
daemon/src/peer_supervisor.rs) - Bundled Python middleware (
middleware-modules/*/service.py)
A small number of slow or malicious clients could exhaust the process thread limit, memory, or both. Shutdown was not coordinated — active handlers could continue running after the accept loop exited.
Proposed Model¶
A single bounded-server crate providing:
BoundedServerConfig— resource contract:max_connections, per-socket read/write timeouts, handler timeout, shutdown grace period, backoff intervals.BoundedServerHandle— running server:local_addr,stop(),wait()for externally owned stop tokens,stop_token(), andmetrics()snapshot.ConnectionHandlertrait — single-method handler interface. Each accepted connection acquires a permit; the permit is released automatically when the handler returns.- Overload behaviour: when all permits are held, new connections receive a fast HTTP 503 before the TCP socket is closed. No thread is spawned for rejected connections.
- Shutdown: the stop signal closes the accept loop. Workers poll a shared stop flag on a 250ms interval and exit when the receiver disconnects. The server joins worker threads and returns final metrics. This sync primitive does not preempt a handler thread; handlers must enforce read/write deadlines and return.
- Thread pool: a fixed-size pool of
max_connectionsworker threads pulls from a boundedsync_channel. The channel depth equalsmax_connections, doubling as an overload signal whentry_sendfails. - Metrics:
active_connections,accepted_total,rejected_over_capacity_total,handler_errors_total,shutdown_drain_ms,last_accept_error,last_handler_error. - Daemon resource pressure surface:
/v1/runtime-metricsexposes aresource_pressureobject with OS parallelism, best-effort process thread count where available, supervised middleware HTTP totals, and scheduler active/queued/registered job counts. Node UI mirrors that summary on the Status page.
For Python middleware, BoundedThreadingHTTPServer keeps the familiar stdlib
BaseHTTPRequestHandler programming model but gates active requests with
ORBIPLEX_MIDDLEWARE_MAX_CONNECTIONS (default: 32). When no permit is
available it writes a minimal HTTP 503 response and closes the socket without
starting another handler thread.
Production Invariant¶
No production local server path may create one unbounded thread per accepted
connection or request. A sync server must either use the Rust bounded-server
primitive, the Python BoundedThreadingHTTPServer helper, or an equivalent
bounded executor with explicit overload behavior. Reload and restart paths must
use cooperative bounded drain. If a supervised process cannot be stopped within
its configured stop/kill budget, the daemon/operator surface must report the
remaining process rather than hiding it as a successful shutdown.
Trade-offs¶
| Benefit | Risk / Constraint |
|---|---|
| Single contract for all sync mini-servers | Thread-per-connection still used; bounded but not async |
| No new dependency (pure std) | sync_channel + Arc<Mutex<Receiver>> adds contention at high scale |
| Fast 503 rejection before thread spawn | 503 body may not reach client if TCP buffers are full |
| First-class shutdown drain | Worker exit latency up to 250ms while idle; active handlers must return cooperatively |
Simple ConnectionHandler trait |
Does not support streaming responses natively |
Failure Modes and Mitigations¶
- Channel disconnect: workers notice
RecvTimeoutError::Disconnectedand exit gracefully. The server join thread waits for all workers. - Worker panic: handler panics are caught at the per-connection boundary,
counted as
handler_errors_total, stored aslast_handler_error, and the worker continues serving later connections. - Accept error: logged to
last_accept_errormetric; accept loop sleeps foraccept_error_backoffand retries. - Slow handler during shutdown: the server does not preempt handlers.
It waits for them to finish. The
shutdown_gracevalue is the intended operational budget and is reflected in metrics, but handler code remains responsible for honoring socket deadlines and returning.
Deferred Questions¶
- Should a later async runtime add a handler deadline that the server enforces (e.g., via
set_read_timeoutalready applied, but no preemption)? - Should the peer supervisor's session worker lifecycle also share a richer cooperative shutdown token, or is the listener-level bounded gate sufficient for M23-M25/A01 closure?
Next Actions¶
- [x] Implement
bounded-servercrate with contract tests. - [x] Migrate daemon health endpoint.
- [x] Migrate Seed Directory server.
- [x] Migrate Agora service.
- [x] Migrate daemon peer listener.
- [x] Migrate bundled Python middleware to
BoundedThreadingHTTPServer. - [x] Expose daemon resource pressure in runtime metrics and Node UI status.
- [x] Add bounded Python HTTP helper test for fast 503 rejection.
- [ ] Add integration test that verifies 503 under load in a real daemon context.