Skip to content

# Command Queue (2026-01-16)

We serialize inbound auto-reply runs (all channels) through a tiny in-process queue to prevent multiple agent runs from colliding, while still allowing safe parallelism across sessions.

# Why

  • Auto-reply runs can be expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
  • Serializing avoids competing for shared resources (session files, logs, CLI stdin) and reduces the chance of upstream rate limits.

# How it works

  • A lane-aware FIFO queue drains each lane with a configurable concurrency cap (default 1 for unconfigured lanes; main defaults to 4, subagent to 8).
  • runEmbeddedPiAgent enqueues by session key (lane session:<key>) to guarantee only one active run per session.
  • Each session run is then queued into a global lane (main by default) so overall parallelism is capped by agents.defaults.maxConcurrent.
  • When verbose logging is enabled, queued runs emit a short notice if they waited more than ~2s before starting.
  • Typing indicators still fire immediately on enqueue (when supported by the channel) so user experience is unchanged while we wait our turn.

# Queue modes (per channel)

Inbound messages can steer the current run, wait for a followup turn, or do both:

  • steer: inject immediately into the current run (cancels pending tool calls after the next tool boundary). If not streaming, falls back to followup.
  • followup: enqueue for the next agent turn after the current run ends.
  • collect: coalesce all queued messages into a single followup turn (default). If messages target different channels/threads, they drain individually to preserve routing.
  • steer-backlog (aka steer+backlog): steer now and preserve the message for a followup turn.
  • interrupt (legacy): abort the active run for that session, then run the newest message.
  • queue (legacy alias): same as steer.

Steer-backlog means you can get a followup response after the steered run, so streaming surfaces can look like duplicates. Prefer collect/steer if you want one response per inbound message. Send /queue collect as a standalone command (per-session) or set messages.queue.byChannel.discord: "collect".

Defaults (when unset in config):

  • All surfaces → collect

Configure globally or per channel via messages.queue:

json5
{
  messages: {
    queue: {
      mode: "collect",
      debounceMs: 1000,
      cap: 20,
      drop: "summarize",
      byChannel: { discord: "collect" },
    },
  },
}

# Queue options

Options apply to followup, collect, and steer-backlog (and to steer when it falls back to followup):

  • debounceMs: wait for quiet before starting a followup turn (prevents “continue, continue”).
  • cap: max queued messages per session.
  • drop: overflow policy (old, new, summarize).

Summarize keeps a short bullet list of dropped messages and injects it as a synthetic followup prompt. Defaults: debounceMs: 1000, cap: 20, drop: summarize.

# Per-session overrides

  • Send /queue <mode> as a standalone command to store the mode for the current session.
  • Options can be combined: /queue collect debounce:2s cap:25 drop:summarize
  • /queue default or /queue reset clears the session override.

# Scope and guarantees

  • Applies to auto-reply agent runs across all inbound channels that use the gateway reply pipeline (WhatsApp web, Telegram, Slack, Discord, Signal, iMessage, webchat, etc.).
  • Default lane (main) is process-wide for inbound + main heartbeats; set agents.defaults.maxConcurrent to allow multiple sessions in parallel.
  • Additional lanes may exist (e.g. cron, subagent) so background jobs can run in parallel without blocking inbound replies.
  • Per-session lanes guarantee that only one agent run touches a given session at a time.
  • No external dependencies or background worker threads; pure TypeScript + promises.

# Troubleshooting

  • If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
  • If you need queue depth, enable verbose logs and watch for queue timing lines.

Released under the MIT License.