Skip to content

# Clawnet refactor (protocol + auth unification)

# Hi

Hi Peter — great direction; this unlocks simpler UX + stronger security.

# Purpose

Single, rigorous document for:

  • Current state: protocols, flows, trust boundaries.
  • Pain points: approvals, multi‑hop routing, UI duplication.
  • Proposed new state: one protocol, scoped roles, unified auth/pairing, TLS pinning.
  • Identity model: stable IDs + cute slugs.
  • Migration plan, risks, open questions.

# Goals (from discussion)

  • One protocol for all clients (mac app, CLI, iOS, Android, headless node).
  • Every network participant authenticated + paired.
  • Role clarity: nodes vs operators.
  • Central approvals routed to where the user is.
  • TLS encryption + optional pinning for all remote traffic.
  • Minimal code duplication.
  • Single machine should appear once (no UI/node duplicate entry).

# Non‑goals (explicit)

  • Remove capability separation (still need least‑privilege).
  • Expose full gateway control plane without scope checks.
  • Make auth depend on human labels (slugs remain non‑security).

# Current state (as‑is)

# Two protocols

# 1) Gateway WebSocket (control plane)

  • Full API surface: config, channels, models, sessions, agent runs, logs, nodes, etc.
  • Default bind: loopback. Remote access via SSH/Tailscale.
  • Auth: token/password via connect.
  • No TLS pinning (relies on loopback/tunnel).
  • Code:
    • src/gateway/server/ws-connection/message-handler.ts
    • src/gateway/client.ts
    • docs/gateway/protocol.md

# 2) Bridge (node transport)

  • Narrow allowlist surface, node identity + pairing.
  • JSONL over TCP; optional TLS + cert fingerprint pinning.
  • TLS advertises fingerprint in discovery TXT.
  • Code:
    • src/infra/bridge/server/connection.ts
    • src/gateway/server-bridge.ts
    • src/node-host/bridge-client.ts
    • docs/gateway/bridge-protocol.md

# Control plane clients today

  • CLI → Gateway WS via callGateway (src/gateway/call.ts).
  • macOS app UI → Gateway WS (GatewayConnection).
  • Web Control UI → Gateway WS.
  • ACP → Gateway WS.
  • Browser control uses its own HTTP control server.

# Nodes today

  • macOS app in node mode connects to Gateway bridge (MacNodeBridgeSession).
  • iOS/Android apps connect to Gateway bridge.
  • Pairing + per‑node token stored on gateway.

# Current approval flow (exec)

  • Agent uses system.run via Gateway.
  • Gateway invokes node over bridge.
  • Node runtime decides approval.
  • UI prompt shown by mac app (when node == mac app).
  • Node returns invoke-res to Gateway.
  • Multi‑hop, UI tied to node host.

# Presence + identity today

  • Gateway presence entries from WS clients.
  • Node presence entries from bridge.
  • mac app can show two entries for same machine (UI + node).
  • Node identity stored in pairing store; UI identity separate.

# Problems / pain points

  • Two protocol stacks to maintain (WS + Bridge).
  • Approvals on remote nodes: prompt appears on node host, not where user is.
  • TLS pinning only exists for bridge; WS depends on SSH/Tailscale.
  • Identity duplication: same machine shows as multiple instances.
  • Ambiguous roles: UI + node + CLI capabilities not clearly separated.

# Proposed new state (Clawnet)

# One protocol, two roles

Single WS protocol with role + scope.

  • Role: node (capability host)
  • Role: operator (control plane)
  • Optional scope for operator:
    • operator.read (status + viewing)
    • operator.write (agent run, sends)
    • operator.admin (config, channels, models)

# Role behaviors

Node

  • Can register capabilities (caps, commands, permissions).
  • Can receive invoke commands (system.run, camera.*, canvas.*, screen.record, etc).
  • Can send events: voice.transcript, agent.request, chat.subscribe.
  • Cannot call config/models/channels/sessions/agent control plane APIs.

Operator

  • Full control plane API, gated by scope.
  • Receives all approvals.
  • Does not directly execute OS actions; routes to nodes.

# Key rule

Role is per‑connection, not per device. A device may open both roles, separately.


# Unified authentication + pairing

# Client identity

Every client provides:

  • deviceId (stable, derived from device key).
  • displayName (human name).
  • role + scope + caps + commands.

# Pairing flow (unified)

  • Client connects unauthenticated.
  • Gateway creates a pairing request for that deviceId.
  • Operator receives prompt; approves/denies.
  • Gateway issues credentials bound to:
    • device public key
    • role(s)
    • scope(s)
    • capabilities/commands
  • Client persists token, reconnects authenticated.

# Device‑bound auth (avoid bearer token replay)

Preferred: device keypairs.

  • Device generates keypair once.
  • deviceId = fingerprint(publicKey).
  • Gateway sends nonce; device signs; gateway verifies.
  • Tokens are issued to a public key (proof‑of‑possession), not a string.

Alternatives:

  • mTLS (client certs): strongest, more ops complexity.
  • Short‑lived bearer tokens only as a temporary phase (rotate + revoke early).

# Silent approval (SSH heuristic)

Define it precisely to avoid a weak link. Prefer one:

  • Local‑only: auto‑pair when client connects via loopback/Unix socket.
  • Challenge via SSH: gateway issues nonce; client proves SSH by fetching it.
  • Physical presence window: after a local approval on gateway host UI, allow auto‑pair for a short window (e.g. 10 minutes).

Always log + record auto‑approvals.


# TLS everywhere (dev + prod)

# Reuse existing bridge TLS

Use current TLS runtime + fingerprint pinning:

  • src/infra/bridge/server/tls.ts
  • fingerprint verification logic in src/node-host/bridge-client.ts

# Apply to WS

  • WS server supports TLS with same cert/key + fingerprint.
  • WS clients can pin fingerprint (optional).
  • Discovery advertises TLS + fingerprint for all endpoints.
    • Discovery is locator hints only; never a trust anchor.

# Why

  • Reduce reliance on SSH/Tailscale for confidentiality.
  • Make remote mobile connections safe by default.

# Approvals redesign (centralized)

# Current

Approval happens on node host (mac app node runtime). Prompt appears where node runs.

# Proposed

Approval is gateway‑hosted, UI delivered to operator clients.

# New flow

  1. Gateway receives system.run intent (agent).
  2. Gateway creates approval record: approval.requested.
  3. Operator UI(s) show prompt.
  4. Approval decision sent to gateway: approval.resolve.
  5. Gateway invokes node command if approved.
  6. Node executes, returns invoke-res.

# Approval semantics (hardening)

  • Broadcast to all operators; only the active UI shows a modal (others get a toast).
  • First resolution wins; gateway rejects subsequent resolves as already settled.
  • Default timeout: deny after N seconds (e.g. 60s), log reason.
  • Resolution requires operator.approvals scope.

# Benefits

  • Prompt appears where user is (mac/phone).
  • Consistent approvals for remote nodes.
  • Node runtime stays headless; no UI dependency.

# Role clarity examples

# iPhone app

  • Node role for: mic, camera, voice chat, location, push‑to‑talk.
  • Optional operator.read for status and chat view.
  • Optional operator.write/admin only when explicitly enabled.

# macOS app

  • Operator role by default (control UI).
  • Node role when “Mac node” enabled (system.run, screen, camera).
  • Same deviceId for both connections → merged UI entry.

# CLI

  • Operator role always.
  • Scope derived by subcommand:
    • status, logs → read
    • agent, message → write
    • config, channels → admin
    • approvals + pairing → operator.approvals / operator.pairing

# Identity + slugs

# Stable ID

Required for auth; never changes. Preferred:

  • Keypair fingerprint (public key hash).

# Cute slug (lobster‑themed)

Human label only.

  • Example: scarlet-claw, saltwave, mantis-pinch.
  • Stored in gateway registry, editable.
  • Collision handling: -2, -3.

# UI grouping

Same deviceId across roles → single “Instance” row:

  • Badge: operator, node.
  • Shows capabilities + last seen.

# Migration strategy

# Phase 0: Document + align

  • Publish this doc.
  • Inventory all protocol calls + approval flows.

# Phase 1: Add roles/scopes to WS

  • Extend connect params with role, scope, deviceId.
  • Add allowlist gating for node role.

# Phase 2: Bridge compatibility

  • Keep bridge running.
  • Add WS node support in parallel.
  • Gate features behind config flag.

# Phase 3: Central approvals

  • Add approval request + resolve events in WS.
  • Update mac app UI to prompt + respond.
  • Node runtime stops prompting UI.

# Phase 4: TLS unification

  • Add TLS config for WS using bridge TLS runtime.
  • Add pinning to clients.

# Phase 5: Deprecate bridge

  • Migrate iOS/Android/mac node to WS.
  • Keep bridge as fallback; remove once stable.

# Phase 6: Device‑bound auth

  • Require key‑based identity for all non‑local connections.
  • Add revocation + rotation UI.

# Security notes

  • Role/allowlist enforced at gateway boundary.
  • No client gets “full” API without operator scope.
  • Pairing required for all connections.
  • TLS + pinning reduces MITM risk for mobile.
  • SSH silent approval is a convenience; still recorded + revocable.
  • Discovery is never a trust anchor.
  • Capability claims are verified against server allowlists by platform/type.

# Streaming + large payloads (node media)

WS control plane is fine for small messages, but nodes also do:

  • camera clips
  • screen recordings
  • audio streams

Options:

  1. WS binary frames + chunking + backpressure rules.
  2. Separate streaming endpoint (still TLS + auth).
  3. Keep bridge longer for media‑heavy commands, migrate last.

Pick one before implementation to avoid drift.

# Capability + command policy

  • Node‑reported caps/commands are treated as claims.
  • Gateway enforces per‑platform allowlists.
  • Any new command requires operator approval or explicit allowlist change.
  • Audit changes with timestamps.

# Audit + rate limiting

  • Log: pairing requests, approvals/denials, token issuance/rotation/revocation.
  • Rate‑limit pairing spam and approval prompts.

# Protocol hygiene

  • Explicit protocol version + error codes.
  • Reconnect rules + heartbeat policy.
  • Presence TTL and last‑seen semantics.

# Open questions

  1. Single device running both roles: token model

    • Recommend separate tokens per role (node vs operator).
    • Same deviceId; different scopes; clearer revocation.
  2. Operator scope granularity

    • read/write/admin + approvals + pairing (minimum viable).
    • Consider per‑feature scopes later.
  3. Token rotation + revocation UX

    • Auto‑rotate on role change.
    • UI to revoke by deviceId + role.
  4. Discovery

    • Extend current Bonjour TXT to include WS TLS fingerprint + role hints.
    • Treat as locator hints only.
  5. Cross‑network approval

    • Broadcast to all operator clients; active UI shows modal.
    • First response wins; gateway enforces atomicity.

# Summary (TL;DR)

  • Today: WS control plane + Bridge node transport.
  • Pain: approvals + duplication + two stacks.
  • Proposal: one WS protocol with explicit roles + scopes, unified pairing + TLS pinning, gateway‑hosted approvals, stable device IDs + cute slugs.
  • Outcome: simpler UX, stronger security, less duplication, better mobile routing.

Released under the MIT License.