# Clawnet refactor (protocol + auth unification)
# Hi
Hi Peter — great direction; this unlocks simpler UX + stronger security.
# Purpose
Single, rigorous document for:
- Current state: protocols, flows, trust boundaries.
- Pain points: approvals, multi‑hop routing, UI duplication.
- Proposed new state: one protocol, scoped roles, unified auth/pairing, TLS pinning.
- Identity model: stable IDs + cute slugs.
- Migration plan, risks, open questions.
# Goals (from discussion)
- One protocol for all clients (mac app, CLI, iOS, Android, headless node).
- Every network participant authenticated + paired.
- Role clarity: nodes vs operators.
- Central approvals routed to where the user is.
- TLS encryption + optional pinning for all remote traffic.
- Minimal code duplication.
- Single machine should appear once (no UI/node duplicate entry).
# Non‑goals (explicit)
- Remove capability separation (still need least‑privilege).
- Expose full gateway control plane without scope checks.
- Make auth depend on human labels (slugs remain non‑security).
# Current state (as‑is)
# Two protocols
# 1) Gateway WebSocket (control plane)
- Full API surface: config, channels, models, sessions, agent runs, logs, nodes, etc.
- Default bind: loopback. Remote access via SSH/Tailscale.
- Auth: token/password via `connect`.
- No TLS pinning (relies on loopback/tunnel).
- Code:
  - `src/gateway/server/ws-connection/message-handler.ts`
  - `src/gateway/client.ts`
  - `docs/gateway/protocol.md`
# 2) Bridge (node transport)
- Narrow allowlist surface, node identity + pairing.
- JSONL over TCP; optional TLS + cert fingerprint pinning.
- TLS advertises fingerprint in discovery TXT.
- Code:
  - `src/infra/bridge/server/connection.ts`
  - `src/gateway/server-bridge.ts`
  - `src/node-host/bridge-client.ts`
  - `docs/gateway/bridge-protocol.md`
# Control plane clients today
- CLI → Gateway WS via `callGateway` (`src/gateway/call.ts`).
- macOS app UI → Gateway WS (`GatewayConnection`).
- Web Control UI → Gateway WS.
- ACP → Gateway WS.
- Browser control uses its own HTTP control server.
# Nodes today
- macOS app in node mode connects to Gateway bridge (`MacNodeBridgeSession`).
- iOS/Android apps connect to Gateway bridge.
- Pairing + per‑node token stored on gateway.
# Current approval flow (exec)
- Agent uses `system.run` via Gateway.
- Gateway invokes node over bridge.
- Node runtime decides approval.
- UI prompt shown by mac app (when node == mac app).
- Node returns `invoke-res` to Gateway.
- Multi‑hop, UI tied to node host.
# Presence + identity today
- Gateway presence entries from WS clients.
- Node presence entries from bridge.
- mac app can show two entries for same machine (UI + node).
- Node identity stored in pairing store; UI identity separate.
# Problems / pain points
- Two protocol stacks to maintain (WS + Bridge).
- Approvals on remote nodes: prompt appears on node host, not where user is.
- TLS pinning only exists for bridge; WS depends on SSH/Tailscale.
- Identity duplication: same machine shows as multiple instances.
- Ambiguous roles: UI + node + CLI capabilities not clearly separated.
# Proposed new state (Clawnet)
# One protocol, two roles
Single WS protocol with role + scope.
- Role: node (capability host)
- Role: operator (control plane)
- Optional scope for operator:
  - `operator.read` (status + viewing)
  - `operator.write` (agent run, sends)
  - `operator.admin` (config, channels, models)
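The role and scope vocabulary above could be typed roughly as follows. This is a minimal sketch: the type names and the `hasScope` helper are illustrative assumptions rather than existing code, and it assumes scopes are granted explicitly (no scope implies another), which the text does not settle.

```typescript
// Hypothetical types; names are illustrative and not taken from the codebase.
type Role = "node" | "operator";
type OperatorScope = "operator.read" | "operator.write" | "operator.admin";

interface ConnectionGrant {
  role: Role;
  // Scopes only apply to operator connections; node connections carry none.
  scopes: ReadonlySet<OperatorScope>;
}

// True only when the connection is an operator holding exactly this scope.
// Assumption: grants are explicit, so admin does not silently include write.
function hasScope(grant: ConnectionGrant, needed: OperatorScope): boolean {
  return grant.role === "operator" && grant.scopes.has(needed);
}

// Example: a read-only operator connection.
const viewer: ConnectionGrant = {
  role: "operator",
  scopes: new Set<OperatorScope>(["operator.read"]),
};
```

If a hierarchy (admin implies write implies read) turns out to be preferable, it would belong inside `hasScope` so that call sites stay unchanged.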
# Role behaviors
Node
- Can register capabilities (`caps`, `commands`, permissions).
- Can receive `invoke` commands (`system.run`, `camera.*`, `canvas.*`, `screen.record`, etc).
- Can send events: `voice.transcript`, `agent.request`, `chat.subscribe`.
- Cannot call config/models/channels/sessions/agent control plane APIs.
Operator
- Full control plane API, gated by scope.
- Receives all approval requests (resolving them requires the `operator.approvals` scope).
- Does not directly execute OS actions; routes to nodes.
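One way to picture the role split is a deny-by-default gate at the gateway boundary. The sketch below is an assumption about shape only: `passesRoleGate` and the example method name `config.set` are hypothetical, while the three event names come from the node list above.

```typescript
type Role = "node" | "operator";

// Events a node connection may send, as listed above.
const NODE_EVENT_ALLOWLIST: ReadonlySet<string> = new Set([
  "voice.transcript",
  "agent.request",
  "chat.subscribe",
]);

// First gate applied to every inbound request.
// Nodes are deny-by-default; operators continue to a separate scope check.
function passesRoleGate(role: Role, method: string): boolean {
  if (role === "node") {
    return NODE_EVENT_ALLOWLIST.has(method);
  }
  return true;
}
```

Under this sketch a node calling a control plane method such as the hypothetical `config.set` is rejected before any scope logic runs.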
# Key rule
Role is per‑connection, not per device. A device may open both roles, separately.
# Unified authentication + pairing
# Client identity
Every client provides:
- `deviceId` (stable, derived from device key).
- `displayName` (human name).
- `role` + `scope` + `caps` + `commands`.
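As a rough illustration, the identity fields could travel in a single connect payload. The field names follow the list above; the interface name, the `validateConnect` helper, and its two rules are assumptions made for this sketch.

```typescript
// Hypothetical shape of the identity a client presents on connect.
interface ConnectParams {
  deviceId: string; // stable, derived from the device key
  displayName: string; // human-readable, never used for auth decisions
  role: "node" | "operator";
  scope: string[]; // e.g. ["operator.read"]; assumed empty for nodes
  caps: string[]; // claimed capabilities, verified server-side
  commands: string[]; // claimed commands, verified server-side
}

// Returns a list of problems; an empty list means the payload is well-formed.
function validateConnect(params: ConnectParams): string[] {
  const errors: string[] = [];
  if (params.deviceId.trim() === "") {
    errors.push("deviceId is required");
  }
  if (params.role === "node" && params.scope.length > 0) {
    errors.push("node connections take no operator scope");
  }
  return errors;
}
```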
# Pairing flow (unified)
- Client connects unauthenticated.
- Gateway creates a pairing request for that `deviceId`.
- Operator receives prompt; approves/denies.
- Gateway issues credentials bound to:
- device public key
- role(s)
- scope(s)
- capabilities/commands
- Client persists token, reconnects authenticated.
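The pairing steps above can be sketched as a small in-memory state machine. Everything here is illustrative: `PairingStore`, `IssuedCredential`, and keying requests by `deviceId` plus role are assumptions, and a real gateway would persist requests and issue signed credentials rather than plain objects.

```typescript
type Role = "node" | "operator";
type PairingStatus = "pending" | "approved" | "denied";

interface PairingRequest {
  deviceId: string;
  publicKey: string;
  role: Role;
  status: PairingStatus;
}

// What the gateway binds an issued credential to.
interface IssuedCredential {
  deviceId: string;
  publicKey: string;
  role: Role;
  scopes: string[];
}

class PairingStore {
  private requests = new Map<string, PairingRequest>();

  // An unauthenticated connect creates (or reuses) a pending request.
  open(deviceId: string, publicKey: string, role: Role): PairingRequest {
    const key = `${deviceId}:${role}`;
    const existing = this.requests.get(key);
    if (existing !== undefined && existing.status === "pending") return existing;
    const request: PairingRequest = { deviceId, publicKey, role, status: "pending" };
    this.requests.set(key, request);
    return request;
  }

  // An operator decision settles the request; only approval yields a credential.
  decide(deviceId: string, role: Role, approve: boolean, scopes: string[]): IssuedCredential | null {
    const request = this.requests.get(`${deviceId}:${role}`);
    if (request === undefined || request.status !== "pending") return null;
    request.status = approve ? "approved" : "denied";
    if (!approve) return null;
    return { deviceId, publicKey: request.publicKey, role, scopes };
  }
}
```

A second `decide` call for the same request returns `null`, so a settled pairing cannot be re-approved without a fresh request.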
# Device‑bound auth (avoid bearer token replay)
Preferred: device keypairs.
- Device generates keypair once.
- `deviceId = fingerprint(publicKey)`.
- Gateway sends nonce; device signs; gateway verifies.
- Tokens are issued to a public key (proof‑of‑possession), not a string.
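A minimal sketch of the challenge-response above, using Node's built-in `node:crypto`. The choice of Ed25519 keys and a SHA-256 fingerprint over the DER-encoded public key is an assumption for illustration; the document does not fix an algorithm.

```typescript
import { createHash, generateKeyPairSync, randomBytes, sign, verify } from "node:crypto";

// Device side: generate the keypair once and persist the private key.
// Assumption: Ed25519; any signature scheme with the same flow would do.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// deviceId = fingerprint(publicKey): here, SHA-256 over the SPKI DER encoding.
const publicKeyDer = publicKey.export({ type: "spki", format: "der" });
const deviceId = createHash("sha256").update(publicKeyDer).digest("hex");

// Gateway side: issue a fresh random nonce for each connection attempt.
const nonce = randomBytes(32);

// Device side: sign the nonce. Ed25519 takes no separate digest, hence null.
const signature = sign(null, nonce, privateKey);

// Gateway side: verify against the public key registered at pairing time.
const proven = verify(null, nonce, publicKey, signature);
```

Because the nonce is fresh per attempt, a captured signature cannot be replayed on a later connection, which is the property a plain bearer token lacks.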
Alternatives:
- mTLS (client certs): strongest, more ops complexity.
- Short‑lived bearer tokens only as a temporary phase (rotate + revoke early).
# Silent approval (SSH heuristic)
Define it precisely to avoid a weak link. Prefer one:
- Local‑only: auto‑pair when client connects via loopback/Unix socket.
- Challenge via SSH: gateway issues a nonce that is only readable over SSH; client proves SSH access by fetching it and returning it.
- Physical presence window: after a local approval on gateway host UI, allow auto‑pair for a short window (e.g. 10 minutes).
Always log + record auto‑approvals.
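For the physical presence option, the check reduces to a time comparison plus an audit entry. This sketch assumes the 10 minute example value and uses hypothetical names (`tryAutoPair`, `auditLog`); it is not a description of existing behavior.

```typescript
// The "10 minutes" example window from the list above.
const AUTO_PAIR_WINDOW_MS = 10 * 60 * 1000;

interface AutoApprovalRecord {
  deviceId: string;
  reason: string;
  at: number;
}

// Every auto-approval is recorded so it can be reviewed and revoked later.
const auditLog: AutoApprovalRecord[] = [];

// lastLocalApprovalAt: time of the most recent approval made on the gateway
// host UI, or null if there has been none. Times are epoch milliseconds.
function tryAutoPair(deviceId: string, lastLocalApprovalAt: number | null, now: number): boolean {
  if (lastLocalApprovalAt === null) return false;
  const elapsed = now - lastLocalApprovalAt;
  if (elapsed < 0 || elapsed > AUTO_PAIR_WINDOW_MS) return false;
  auditLog.push({ deviceId, reason: "presence-window", at: now });
  return true;
}
```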
# TLS everywhere (dev + prod)
# Reuse existing bridge TLS
Use current TLS runtime + fingerprint pinning:
- `src/infra/bridge/server/tls.ts`
- fingerprint verification logic in `src/node-host/bridge-client.ts`
# Apply to WS
- WS server supports TLS with same cert/key + fingerprint.
- WS clients can pin fingerprint (optional).
- Discovery advertises TLS + fingerprint for all endpoints.
- Discovery is locator hints only; never a trust anchor.
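The optional pinning check could look like the sketch below. It is not the logic in `src/node-host/bridge-client.ts`; the function names are hypothetical, and fingerprints are assumed to be hex strings with optional colon separators.

```typescript
// Strips colons and whitespace, lowercases, and rejects non-hex input.
function normalizeFingerprint(fingerprint: string): string {
  const hex = fingerprint.replace(/[:\s]/g, "").toLowerCase();
  if (!/^[0-9a-f]+$/.test(hex)) {
    throw new Error("fingerprint must be hex");
  }
  return hex;
}

type PinResult = "pinned-match" | "pinned-mismatch" | "unpinned";

// observed: fingerprint of the certificate the server actually presented.
// pinned: value stored at pairing time, or undefined when pinning is off.
function checkPin(observed: string, pinned: string | undefined): PinResult {
  if (pinned === undefined) return "unpinned";
  return normalizeFingerprint(observed) === normalizeFingerprint(pinned)
    ? "pinned-match"
    : "pinned-mismatch";
}
```

The pinned value should come from pairing or local config. A fingerprint learned from discovery can help locate the endpoint but, per the rule above, should not be what the client trusts.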
# Why
- Reduce reliance on SSH/Tailscale for confidentiality.
- Make remote mobile connections safe by default.
# Approvals redesign (centralized)
# Current
Approval happens on node host (mac app node runtime). Prompt appears where node runs.
# Proposed
Approval is gateway‑hosted, UI delivered to operator clients.
# New flow
- Gateway receives `system.run` intent (agent).
- Gateway creates approval record: `approval.requested`.
- Operator UI(s) show prompt.
- Approval decision sent to gateway: `approval.resolve`.
- Gateway invokes node command if approved.
- Node executes, returns `invoke-res`.
# Approval semantics (hardening)
- Broadcast to all operators; only the active UI shows a modal (others get a toast).
- First resolution wins; gateway rejects subsequent resolves as already settled.
- Default timeout: deny after N seconds (e.g. 60s), log reason.
- Resolution requires `operator.approvals` scope.
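These rules can be sketched as a small registry in which the first valid resolution settles the record. `ApprovalRegistry` and its method names are hypothetical; timestamps are passed in explicitly to keep the example deterministic, and the 60 second default mirrors the example value above.

```typescript
type Decision = "approved" | "denied";

interface ApprovalRecord {
  id: string;
  command: string;
  // null while pending; a timeout is recorded separately but means deny.
  outcome: Decision | "timeout" | null;
  expiresAt: number;
}

class ApprovalRegistry {
  private records = new Map<string, ApprovalRecord>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number = 60 * 1000) {
    this.timeoutMs = timeoutMs;
  }

  // Creates the pending record that operator UIs are notified about.
  request(id: string, command: string, now: number): ApprovalRecord {
    const record: ApprovalRecord = { id, command, outcome: null, expiresAt: now + this.timeoutMs };
    this.records.set(id, record);
    return record;
  }

  // Returns true only for the call that settles the record. Unknown ids,
  // missing scope, late answers, and repeat answers all return false.
  resolve(id: string, decision: Decision, scopes: ReadonlySet<string>, now: number): boolean {
    if (!scopes.has("operator.approvals")) return false;
    const record = this.records.get(id);
    if (record === undefined || record.outcome !== null) return false;
    if (now >= record.expiresAt) {
      record.outcome = "timeout";
      return false;
    }
    record.outcome = decision;
    return true;
  }
}
```

In this shape the gateway would invoke the node only when `resolve` returns `true` for an `approved` decision; every other path leaves the command unexecuted.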
# Benefits
- Prompt appears where user is (mac/phone).
- Consistent approvals for remote nodes.
- Node runtime stays headless; no UI dependency.
# Role clarity examples
# iPhone app
- Node role for: mic, camera, voice chat, location, push‑to‑talk.
- Optional operator.read for status and chat view.
- Optional operator.write/admin only when explicitly enabled.
# macOS app
- Operator role by default (control UI).
- Node role when “Mac node” enabled (system.run, screen, camera).
- Same deviceId for both connections → merged UI entry.
# CLI
- Operator role always.
- Scope derived by subcommand:
  - `status`, `logs` → read
  - `agent`, `message` → write
  - `config`, `channels` → admin
  - approvals + pairing → `operator.approvals` / `operator.pairing`
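The subcommand mapping could be a plain lookup table, as sketched here. The subcommand names `approvals` and `pairing` are assumptions (the text names the features, not the exact CLI verbs), as are the type and function names.

```typescript
type CliScope =
  | "operator.read"
  | "operator.write"
  | "operator.admin"
  | "operator.approvals"
  | "operator.pairing";

// Subcommand -> minimum scope the CLI requests when it connects.
const SCOPE_BY_SUBCOMMAND = new Map<string, CliScope>([
  ["status", "operator.read"],
  ["logs", "operator.read"],
  ["agent", "operator.write"],
  ["message", "operator.write"],
  ["config", "operator.admin"],
  ["channels", "operator.admin"],
  ["approvals", "operator.approvals"], // assumed subcommand name
  ["pairing", "operator.pairing"], // assumed subcommand name
]);

// Unknown subcommands map to undefined so the caller can refuse to connect
// rather than fall back to a broad scope.
function scopeFor(subcommand: string): CliScope | undefined {
  return SCOPE_BY_SUBCOMMAND.get(subcommand);
}
```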
# Identity + slugs
# Stable ID
Required for auth; never changes. Preferred:
- Keypair fingerprint (public key hash).
# Cute slug (lobster‑themed)
Human label only.
- Example: `scarlet-claw`, `saltwave`, `mantis-pinch`.
- Stored in gateway registry, editable.
- Collision handling: `-2`, `-3`.
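Collision handling with numeric suffixes is small enough to show in full. `uniqueSlug` is a hypothetical helper; it assumes the registry can supply the set of slugs already in use.

```typescript
// Returns base if free, otherwise base-2, base-3, ... (first unused suffix).
function uniqueSlug(base: string, taken: ReadonlySet<string>): string {
  if (!taken.has(base)) return base;
  for (let n = 2; ; n++) {
    const candidate = `${base}-${n}`;
    if (!taken.has(candidate)) return candidate;
  }
}
```

Since slugs are labels only, a rename or collision never affects authentication; the `deviceId` stays the key.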
# UI grouping
Same deviceId across roles → single “Instance” row:
- Badge: `operator`, `node`.
- Shows capabilities + last seen.
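Merging presence entries into one row per device might look like this. The `PresenceEntry` and `InstanceRow` shapes are assumptions for the sketch; capabilities are omitted to keep it short.

```typescript
interface PresenceEntry {
  deviceId: string;
  role: "node" | "operator";
  lastSeen: number; // epoch milliseconds
}

interface InstanceRow {
  deviceId: string;
  roles: string[]; // rendered as badges
  lastSeen: number; // most recent across all of the device's connections
}

// Collapses per-connection presence into one row per deviceId.
function groupByDevice(entries: PresenceEntry[]): InstanceRow[] {
  const rows = new Map<string, InstanceRow>();
  for (const entry of entries) {
    const row: InstanceRow =
      rows.get(entry.deviceId) ?? { deviceId: entry.deviceId, roles: [], lastSeen: 0 };
    if (!row.roles.includes(entry.role)) row.roles.push(entry.role);
    row.lastSeen = Math.max(row.lastSeen, entry.lastSeen);
    rows.set(entry.deviceId, row);
  }
  return Array.from(rows.values());
}
```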
# Migration strategy
# Phase 0: Document + align
- Publish this doc.
- Inventory all protocol calls + approval flows.
# Phase 1: Add roles/scopes to WS
- Extend `connect` params with `role`, `scope`, `deviceId`.
- Add allowlist gating for node role.
# Phase 2: Bridge compatibility
- Keep bridge running.
- Add WS node support in parallel.
- Gate features behind config flag.
# Phase 3: Central approvals
- Add approval request + resolve events in WS.
- Update mac app UI to prompt + respond.
- Node runtime stops prompting UI.
# Phase 4: TLS unification
- Add TLS config for WS using bridge TLS runtime.
- Add pinning to clients.
# Phase 5: Deprecate bridge
- Migrate iOS/Android/mac node to WS.
- Keep bridge as fallback; remove once stable.
# Phase 6: Device‑bound auth
- Require key‑based identity for all non‑local connections.
- Add revocation + rotation UI.
# Security notes
- Role/allowlist enforced at gateway boundary.
- No client gets “full” API without operator scope.
- Pairing required for all connections.
- TLS + pinning reduces MITM risk for mobile.
- SSH silent approval is a convenience; still recorded + revocable.
- Discovery is never a trust anchor.
- Capability claims are verified against server allowlists by platform/type.
# Streaming + large payloads (node media)
WS control plane is fine for small messages, but nodes also do:
- camera clips
- screen recordings
- audio streams
Options:
- WS binary frames + chunking + backpressure rules.
- Separate streaming endpoint (still TLS + auth).
- Keep bridge longer for media‑heavy commands, migrate last.
Pick one before implementation to avoid drift.
# Capability + command policy
- Node‑reported caps/commands are treated as claims.
- Gateway enforces per‑platform allowlists.
- Any new command requires operator approval or explicit allowlist change.
- Audit changes with timestamps.
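Treating node-reported commands as claims could be sketched as a filter against a per-platform allowlist. The platform keys and the contents of each allowlist below are invented for illustration; only the command names `system.run` and `screen.record` appear in this document.

```typescript
// Hypothetical per-platform allowlists; real policy would live in gateway config.
const COMMAND_ALLOWLIST = new Map<string, ReadonlySet<string>>([
  ["macos", new Set(["system.run", "screen.record"])],
  ["ios", new Set(["screen.record"])],
]);

interface ClaimResult {
  accepted: string[];
  rejected: string[];
}

// Splits a node's claimed commands into those the gateway will honor and
// those it refuses. Unknown platforms get an empty allowlist (deny all).
function verifyClaims(platform: string, claimed: string[]): ClaimResult {
  const allowed = COMMAND_ALLOWLIST.get(platform) ?? new Set<string>();
  const result: ClaimResult = { accepted: [], rejected: [] };
  for (const command of claimed) {
    (allowed.has(command) ? result.accepted : result.rejected).push(command);
  }
  return result;
}
```

Rejected claims are a natural input to the audit trail, since they show a node asking for more than its platform is allowed.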
# Audit + rate limiting
- Log: pairing requests, approvals/denials, token issuance/rotation/revocation.
- Rate‑limit pairing spam and approval prompts.
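Rate limiting for pairing requests could be as simple as a sliding window per key. The class name, the choice of key (remote address or `deviceId`), and the limits are all assumptions; the document does not specify a policy.

```typescript
// Sliding-window limiter: at most `limit` attempts per key in any `windowMs`.
class PairingRateLimiter {
  private attempts = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the attempt and returns true if it is within the limit.
  allow(key: string, now: number): boolean {
    const previous = this.attempts.get(key) ?? [];
    // Keep only attempts that are still inside the window.
    const recent = previous.filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.attempts.set(key, recent);
    return allowed;
  }
}
```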
# Protocol hygiene
- Explicit protocol version + error codes.
- Reconnect rules + heartbeat policy.
- Presence TTL and last‑seen semantics.
# Open questions
Single device running both roles: token model
- Recommend separate tokens per role (node vs operator).
- Same deviceId; different scopes; clearer revocation.
Operator scope granularity
- read/write/admin + approvals + pairing (minimum viable).
- Consider per‑feature scopes later.
Token rotation + revocation UX
- Auto‑rotate on role change.
- UI to revoke by deviceId + role.
Discovery
- Extend current Bonjour TXT to include WS TLS fingerprint + role hints.
- Treat as locator hints only.
Cross‑network approval
- Broadcast to all operator clients; active UI shows modal.
- First response wins; gateway enforces atomicity.
# Summary (TL;DR)
- Today: WS control plane + Bridge node transport.
- Pain: approvals + duplication + two stacks.
- Proposal: one WS protocol with explicit roles + scopes, unified pairing + TLS pinning, gateway‑hosted approvals, stable device IDs + cute slugs.
- Outcome: simpler UX, stronger security, less duplication, better mobile routing.