Concurrent Sessions Plan -- Issue #1493 Step 3

Goal

Enable multiple sessions per bot to run simultaneously. A bot can have 100 sessions, each talking to its own Claude Code process through gate-runtime. The frontend shows which sessions are working, lets you switch instantly, and accumulates messages in real-time from all sessions.

Current Architecture (Single-Session)

The Problem: One Process Per Bot

The current system enforces one Claude Code process per bot and one active session per bot:

User -> sendMessage(botId) -> runtime.send(botId)
                               |
                         bots.get(botId)   <-- ONE BotRuntime per botId
                               |
                         bot.proc (stdin)  <-- ONE ChildProcess
                               |
                         stdout -> handleMessage(bot, msg)
                               |
                         _onEvent(botId, msg)
                               |
                         bridge.handleEvent(botId, event)
                               |
                         emitBotEvent(botId, { event: 'chat', payload: { sessionKey, chatSessionId, ... } })
                               |
                         SSE -> frontend handleChatEventForBot(botId, payload)

Single-Session Assumptions (Must Change)

| Layer / file | Symbol | Assumption |
|---|---|---|
| runtime.ts | const bots = new Map<string, BotRuntime>() | One BotRuntime per botId. Contains one proc, one runId, one sessionId. |
| runtime.ts | activateSession() | Calls stop() when switching sessions -- kills the old process. |
| runtime.ts | send() -> sendToProcess() | Writes to bot.proc.stdin -- one process, one stdin. |
| runtime.ts | onTurnComplete() | Drains bot.messageQueue -- one queue for all messages. |
| index.ts | sendMessage() | Calls ensureCurrentRuntimeSession() -- one session per bot. |
| index.ts | activateSession() | Sets bot.runtimeSessionId, bot.chatSessionId -- singular. |
| session-registry.ts | ensureCurrentRuntimeSession() | Finds or creates ONE current session. is_current column is a single boolean. |
| session-registry.ts | createFreshRuntimeSession() | Stops the previous session before creating a new one. |
| bridge.ts | streams = new Map<string, ...>() | One stream accumulator per botId. |
| bridge.ts | _getSessionMeta(botId) | Returns one chatSessionId, one sessionKey. |
| bot-state.ts | botWorkingState.get(botId) | One working state per bot. |
| bot-state.ts | setBotWorking(botId, runId, sessionKey) | One runId, one sessionKey per bot. |
| ws-manager.ts | SSE events keyed by botId | All events for a bot go to all SSE clients for that bot. |
| Frontend | ChatState.currentMsg | One streaming bubble. |
| Frontend | bot.currentRunId | One active run per bot. |
| Frontend | bot.currentStreamText | One stream accumulator per bot. |
| Frontend | showTyping(botId) / hideTyping(botId) | One typing indicator per bot. |

New Architecture (Multi-Session)

Core Concept: Session-Keyed Processes

Instead of Map<botId, BotRuntime>, use Map<sessionKey, BotRuntime> where sessionKey = "gate:{chatSessionId}".

Each session gets its own Claude Code process, its own stdin/stdout, its own queue.

User -> sendMessage(botId, sessionKey) -> runtime.send(sessionKey)
                                           |
                                     sessions.get(sessionKey)  <-- BotRuntime per SESSION
                                           |
                                     session.proc (stdin)      <-- own ChildProcess
                                           |
                                     stdout -> handleMessage(session, msg)
                                           |
                                     _onEvent(botId, msg)   <-- botId + sessionKey tagged
                                           |
                                     bridge.handleEvent(botId, event)  <-- includes sessionKey
                                           |
                                     emitBotEvent(botId, { ..., sessionKey, chatSessionId })
                                           |
                                     SSE -> frontend routes by sessionKey
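
The "gate:{chatSessionId}" key format could be wrapped in two small helpers so no caller builds or parses the string by hand. This is a minimal sketch: toSessionKey and parseSessionKey are hypothetical names, not existing functions, and it assumes chatSessionId is a positive integer.

```typescript
// Hypothetical helpers for the "gate:{chatSessionId}" key format.
const SESSION_KEY_PREFIX = "gate:";

function toSessionKey(chatSessionId: number): string {
  return `${SESSION_KEY_PREFIX}${chatSessionId}`;
}

// Returns the numeric chatSessionId, or null when the key is not in the
// expected format.
function parseSessionKey(sessionKey: string): number | null {
  if (!sessionKey.startsWith(SESSION_KEY_PREFIX)) return null;
  const raw = sessionKey.slice(SESSION_KEY_PREFIX.length);
  // Assumption: chat session ids are positive integers.
  if (!/^[1-9]\d*$/.test(raw)) return null;
  return Number(raw);
}
```

Returning null instead of throwing lets callers fall back to bot-level routing when they receive a key they do not recognize.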

Phase 1: Backend -- Multi-Process Runtime

File: lib/gate-runtime/runtime.ts

  1. Rename key from botId to sessionKey:

    • const sessions = new Map<string, BotRuntime>() (rename bots -> sessions)
    • Each BotRuntime still has a botId field for identification
    • Registration: register(sessionKey, config) where sessionKey = "gate:{chatSessionId}"
    • send(sessionKey, text, opts) -- routes to the correct process
  2. Keep bot-level helpers:

    • getState(botId) -> scan all sessions for this bot, return 'working' if any are
    • getSessionsForBot(botId) -> return all BotRuntime entries for this bot
    • stopAll(botId) -> stop all processes for a bot
  3. Remove single-session locks:

    • activateSession() no longer calls stop() -- sessions coexist
    • newSession() no longer kills old process -- just registers a new one
    • onTurnComplete() drains queue per-session, not per-bot
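
The rekeyed map and the bot-level helpers listed above could look roughly like this. It is a sketch under stated assumptions: SessionRuntime is a simplified stand-in for BotRuntime, the config shape passed to register() is invented for illustration, and stop is a placeholder for killing the child process.

```typescript
// Sketch only: SessionRuntime is a simplified stand-in for BotRuntime.
type RuntimeState = "idle" | "working";

interface SessionRuntime {
  botId: string;
  sessionKey: string;
  state: RuntimeState;
  stop: () => void; // placeholder for killing the child process
}

// Hypothetical config shape; the real register() config is not shown in this plan.
interface SessionConfig {
  botId: string;
  stop?: () => void;
}

const sessions = new Map<string, SessionRuntime>();

function register(sessionKey: string, config: SessionConfig): SessionRuntime {
  const existing = sessions.get(sessionKey);
  if (existing) return existing; // registering twice must not create a second runtime
  const runtime: SessionRuntime = {
    botId: config.botId,
    sessionKey,
    state: "idle",
    stop: config.stop ?? (() => {}),
  };
  sessions.set(sessionKey, runtime);
  return runtime;
}

function getSessionsForBot(botId: string): SessionRuntime[] {
  return Array.from(sessions.values()).filter((s) => s.botId === botId);
}

// The bot counts as working if any of its sessions is working.
function getState(botId: string): RuntimeState {
  return getSessionsForBot(botId).some((s) => s.state === "working") ? "working" : "idle";
}

// Stops and unregisters every session of a bot; returns how many were stopped.
function stopAll(botId: string): number {
  const owned = getSessionsForBot(botId);
  for (const s of owned) {
    s.stop();
    sessions.delete(s.sessionKey);
  }
  return owned.length;
}
```

The bot-level helpers scan the whole map. If the scan ever becomes a hot path, a secondary index from botId to sessionKeys would avoid it.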

File: lib/gate-runtime/index.ts

  1. Route sendMessage by session:

    • sendMessage(botId, message, bot, opts) -> resolve session from opts.chatSessionId
    • If session doesn't have a registered process, register and spawn one
    • activateSession becomes per-sessionKey
    • Bot-level state queries (getState, isRegistered) aggregate across all sessions
  2. syncRuntimeSessionEvent already tags by bot + session -- works as-is.

File: lib/gate-runtime/session-registry.ts

  1. Remove single-session enforcement:
    • ensureCurrentRuntimeSession -> create or find session by chatSessionId, don't enforce is_current
    • createFreshRuntimeSession -> DON'T stop previous session, just create a new one
    • Remove is_current column dependency for session selection (keep for "last active" display)

File: lib/gate-runtime/bridge.ts

  1. Session-keyed stream accumulators:
    • streams = new Map<string, ...>() keyed by sessionKey instead of botId
    • _toolNames keyed by sessionKey
    • Every event emitted includes both botId and sessionKey

File: lib/bot-state.ts

  1. Multi-session working state:
    • botWorkingState -> Map<string, { runId, sessionKey, since }> keyed by composite botId:sessionKey
    • isBotWorking(botId) -> return true if ANY session is working
    • isBotWorkingSession(botId, sessionKey) -> check specific session
    • setBotWorking(botId, runId, sessionKey) -> track per-session
    • clearBotWorking(botId, runId, sessionKey) -> clear per-session
    • getAllWorkingSessions(botId) -> return all working sessions for a bot
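
The function list above could be implemented along these lines. This is a sketch, not the final bot-state.ts: the botId field stored on each entry is an addition so bot-level scans do not need to parse the composite key, and ignoring a clear whose runId does not match is an assumption about the intended behavior.

```typescript
// Sketch of per-session working state. The botId field on each entry is an
// addition for illustration so bot-level scans do not have to parse the key.
interface WorkingEntry {
  botId: string;
  runId: string;
  sessionKey: string;
  since: number; // epoch milliseconds
}

const botWorkingState = new Map<string, WorkingEntry>();

const stateKey = (botId: string, sessionKey: string): string => `${botId}:${sessionKey}`;

function setBotWorking(botId: string, runId: string, sessionKey: string): void {
  botWorkingState.set(stateKey(botId, sessionKey), { botId, runId, sessionKey, since: Date.now() });
}

// Assumption: a clear is ignored when runId does not match, so a late event
// from an older run cannot clear a newer one.
function clearBotWorking(botId: string, runId: string, sessionKey: string): boolean {
  const key = stateKey(botId, sessionKey);
  const entry = botWorkingState.get(key);
  if (!entry || entry.runId !== runId) return false;
  return botWorkingState.delete(key);
}

function getAllWorkingSessions(botId: string): WorkingEntry[] {
  return Array.from(botWorkingState.values()).filter((e) => e.botId === botId);
}

// True if ANY session of the bot is working.
function isBotWorking(botId: string): boolean {
  return getAllWorkingSessions(botId).length > 0;
}

function isBotWorkingSession(botId: string, sessionKey: string): boolean {
  return botWorkingState.has(stateKey(botId, sessionKey));
}
```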

Phase 2: CS Ownership -- Code Sessions Belong to Bot Sessions

Critical problem: Code Sessions (CS) are currently owned by bot_id only. The code monitor pings sendMessageToBot(botId, ...) which routes to whatever session is "current". With concurrent sessions, this ping goes to the WRONG session -- the bot session that happens to be active, not the one that started the CS.

Current state (broken for concurrent):

code_sessions.bot_id = 'molly'           <-- CS knows its bot
code_monitor -> sendMessageToBot('molly') <-- pings "the bot"
gate-runtime -> routes to is_current=1    <-- wrong session!

New state (session-aware):

code_sessions.bot_id = 'molly'
code_sessions.chat_session_id = 42       <-- CS knows its bot SESSION
code_monitor -> sendMessageToBot('molly', chatSessionId=42)
gate-runtime -> routes to session 42      <-- correct session

Changes needed:

  1. DB migration: Add chat_session_id INTEGER column to code_sessions table

    • Nullable for backward compat (existing CS get null = legacy bot-level routing)
    • Set on CS creation: startSession(botId, { chatSessionId }) passes the owning session
  2. CS creation: When a bot session starts a CS, tag it with chat_session_id

    • POST /api/bots/:botId/code/start accepts chatSessionId param
    • lib/code-session.ts startSession() stores it in DB
    • The frontend sends chatSessionId from ChatState.currentChatSessionIds[botId]
  3. Monitor routing: deliverMonitorPing() routes to the owning session

    • lib/code-monitor.ts: read code_sessions.chat_session_id for the CS being monitored
    • Pass chatSessionId to sendMessageToBot(botId, msg, ..., chatSessionId)
    • Gate-runtime routes the message to that specific session's process (not "current")
    • Fallback: if chat_session_id is null (legacy CS), route to current session as before
  4. CS transfer: POST /api/bots/:botId/code/session/transfer must also update chat_session_id

    • When transferring a CS between bots, the new bot's current session becomes the owner
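
The monitor routing rule, including the legacy fallback, reduces to one small decision. The sketch below assumes a row shape with only the columns named above; resolvePingTarget and the getCurrentChatSessionId callback are hypothetical names used for illustration.

```typescript
// Sketch only: row shape limited to the columns this plan mentions.
interface CodeSessionRow {
  id: string;
  bot_id: string;
  chat_session_id: number | null; // null for CS created before the migration
}

interface PingTarget {
  botId: string;
  chatSessionId: number | null; // null means the bot has no session to route to
  legacyFallback: boolean;
}

// Hypothetical helper: picks the session a monitor ping should be delivered to.
function resolvePingTarget(
  cs: CodeSessionRow,
  getCurrentChatSessionId: (botId: string) => number | null,
): PingTarget {
  if (cs.chat_session_id !== null) {
    // Session-aware CS: always route to the owning session.
    return { botId: cs.bot_id, chatSessionId: cs.chat_session_id, legacyFallback: false };
  }
  // Legacy CS: fall back to the bot's current session, as before.
  return {
    botId: cs.bot_id,
    chatSessionId: getCurrentChatSessionId(cs.bot_id),
    legacyFallback: true,
  };
}
```

Surfacing legacyFallback makes it possible to log how often the old routing path is still taken before removing it.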

Phase 3: Frontend -- Tree Structure

Remove the standalone sidebar sessions section. Sessions are now nested under each bot in the sidebar tree:

Sidebar
  |-- [Search button]  (Ctrl+K — searches bots, sessions, CS, chat history)
  |-- Bot: Molly
  |     |-- Session: "Fix auth bug" (active, working)     <-- click = switch chat to this session
  |     |     |-- CS: claude-a8f2 (running, project-x)    <-- click = open terminal
  |     |     '-- CS: claude-b3c1 (idle, project-y)
  |     |-- Session: "Refactor DB" (idle)
  |     |     '-- CS: claude-d4e5 (completed)
  |     |-- Session: "New session" (new, empty)
  |     |-- [+ New session]
  |     '-- [5 more sessions...]                          <-- collapsed, click to search
  |-- Bot: Klara
  |     |-- Session: "Deploy pipeline" (working)
  |     '-- [+ New session]
  '-- [+ Add bot]

Key changes:

  1. Remove #sidebarSessions section from HTML and loadSidebarSessions() from JS

    • Sessions are no longer a standalone sidebar section
    • They render as children of each bot list item
  2. Extend renderBotListItem() to show bot sessions as expandable children:

    • Each bot item expands to show up to 5 sessions (sorted: working first, then by activity)
    • Each session shows: status icon, label, message count
    • Clicking a session switches the chat view to that session (not the bot -- the SESSION)
    • CS items render nested under their owning session (not flat under the bot)
  3. CS renders under its owning session (not flat under the bot):

    • renderSessionsInSidebar() groups CS by chat_session_id
    • CS with null chat_session_id (legacy) renders under the bot's current/latest session
    • Each CS item: status dot, name/project, click to open terminal
  4. Active state is per-session, not per-bot:

    • Clicking a session makes it the active chat view AND the active session for that bot
    • The bot list item shows which session is active (highlighted child)
    • Multiple bots can have working sessions simultaneously
    • Only the currently viewed session shows the typing indicator in the chat area
  5. Session-level message routing (unchanged from Phase 1 plan):

    • bot.sessionMessages: Record<string, ChatMessage[]> -- keyed by sessionKey (e.g. "gate:1")
    • SSE events route by sessionKey to correct message array
    • Switching sessions re-renders from memory (instant)
    • Lazy-load history from API for sessions not yet in memory
  6. Search modal (Ctrl+K):

    • Already searches bots and code sessions
    • Extend to also search bot sessions (by label, first message)
    • Extend to search chat history (by message content)
    • Results show: type icon, label, metadata, click to navigate

Phase 4: DB & Status

  1. gate_runtime_sessions.status tracks per-session state:

    • new -> just created, no Claude process yet
    • working -> Claude process active, processing a message
    • idle -> Claude process alive but idle (waiting for timeout)
    • stopped -> Claude process killed (idle timeout or manual)
    • errored -> last turn errored
  2. is_current becomes "last user-interacted session":

    • Multiple sessions can be working simultaneously
    • is_current = 1 just means "which session the user last sent a message to"
    • Used for UI default selection, not for runtime exclusivity

Event Flow (Concurrent)

Bot has 3 sessions: S1 (working), S2 (idle), S3 (working)

S1 stdout -> handleMessage -> bridge -> emitBotEvent(botId, { sessionKey: "gate:1", state: "delta" })
S3 stdout -> handleMessage -> bridge -> emitBotEvent(botId, { sessionKey: "gate:3", state: "delta" })

Frontend SSE receives both:
  - delta for S1: append to sessionMessages["gate:1"]
  - delta for S3: append to sessionMessages["gate:3"]

User is viewing S1:
  - S1 delta -> render streaming bubble
  - S3 delta -> accumulate silently, update sidebar status icon

User switches to S3:
  - S1 streaming bubble paused (still accumulating in background)
  - S3 messages rendered from sessionMessages["gate:3"]
  - S3 streaming continues in real-time
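
The frontend side of this flow can be sketched as a router that always accumulates and only reports whether the event belongs to the session on screen. Message payloads are simplified to strings here, and routeChatEvent, switchSession, and viewedSessionKey are illustrative names rather than existing frontend code.

```typescript
// Sketch only: messages are plain strings instead of ChatMessage objects.
interface ChatEvent {
  sessionKey: string;
  state: "delta" | "final";
  text: string;
}

interface BotView {
  viewedSessionKey: string | null;
  sessionMessages: Record<string, string[]>;
}

// Always accumulates; the return value tells the caller whether to update the
// chat area ("render") or only the sidebar status icon ("background").
function routeChatEvent(bot: BotView, event: ChatEvent): "render" | "background" {
  const messages = bot.sessionMessages[event.sessionKey] ?? [];
  messages.push(event.text);
  bot.sessionMessages[event.sessionKey] = messages;
  return event.sessionKey === bot.viewedSessionKey ? "render" : "background";
}

// Switching is instant because it reads from memory only.
function switchSession(bot: BotView, sessionKey: string): string[] {
  bot.viewedSessionKey = sessionKey;
  return bot.sessionMessages[sessionKey] ?? [];
}
```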

Implementation Order

  1. Runtime multi-process (runtime.ts) -- key change: Map keyed by sessionKey
  2. Index.ts routing -- sendMessage resolves to sessionKey
  3. Bridge session-keying -- stream accumulators per session
  4. Bot-state multi-session -- working state per session
  5. Session-registry unlocking -- remove stop-previous enforcement
  6. Frontend message routing -- accumulate per-session
  7. Frontend UI -- session status indicators, instant switching
  8. Cleanup -- idle timeout per-session, max concurrent sessions limit

Safety & Limits


Gap Analysis (Critical Issues Found)

Audit of every file in the event chain found 8 critical gaps the plan must address:

GAP 1: messageQueue has no session routing (runtime.ts)

File: lib/gate-runtime/runtime.ts lines 86-90, 603-605

The QueuedMessage interface has { text, opts, runId } but no sessionKey or chatSessionId. When multiple sessions queue messages, onTurnComplete() drains FIFO without checking which session the message belongs to.

Example failure: Session 1 finishes, drains queue, sends Session 2's message to Session 1's process.

Fix: Add sessionKey to QueuedMessage. Queue per-session: Map<sessionKey, QueuedMessage[]>. Drain only the queue for the session that just completed.
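
A minimal sketch of the per-session queue. The opts field of QueuedMessage is omitted for brevity, and enqueue and dequeueFor are hypothetical names.

```typescript
// Sketch only: opts is omitted from QueuedMessage for brevity.
interface QueuedMessage {
  text: string;
  runId: string;
  sessionKey: string;
}

// One FIFO queue per session.
const messageQueues = new Map<string, QueuedMessage[]>();

function enqueue(message: QueuedMessage): void {
  const queue = messageQueues.get(message.sessionKey) ?? [];
  queue.push(message);
  messageQueues.set(message.sessionKey, queue);
}

// Called when a session's turn completes: takes the next message for THAT
// session only, never one that belongs to another session.
function dequeueFor(sessionKey: string): QueuedMessage | undefined {
  const queue = messageQueues.get(sessionKey);
  if (!queue) return undefined;
  const next = queue.shift();
  if (queue.length === 0) messageQueues.delete(sessionKey);
  return next;
}
```

With this shape, the failure in the example cannot occur: when Session 1 completes, dequeueFor("gate:1") never sees Session 2's message, even if it was queued first.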

GAP 2: lastSessionCost is per-bot (runtime.ts)

File: lib/gate-runtime/runtime.ts lines 108, 704-705

BotRuntime.lastSessionCost is a single number. Cost is computed as turnCost = msg.total_cost_usd - bot.lastSessionCost. With concurrent sessions, each session has its own cumulative cost from Claude Code, but they share one lastSessionCost. Result: negative costs, wrong cost events.

Fix: Track lastSessionCost per runtimeSessionId: Map<number, number>.
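
A sketch of the per-session cost delta. recordTurnCost is a hypothetical name, and the sketch assumes each session's reported total (msg.total_cost_usd) is cumulative and starts from zero.

```typescript
// Last cumulative cost seen per runtimeSessionId.
const lastSessionCost = new Map<number, number>();

// Hypothetical helper: returns the cost of the turn that just finished.
// Assumption: totalCostUsd is cumulative per session and starts at 0.
function recordTurnCost(runtimeSessionId: number, totalCostUsd: number): number {
  const previous = lastSessionCost.get(runtimeSessionId) ?? 0;
  lastSessionCost.set(runtimeSessionId, totalCostUsd);
  return totalCostUsd - previous;
}
```

For example, if session 1 reports a total of 0.50, session 2 then reports 0.25, and session 1 reports 0.75, the turn costs are 0.50, 0.25, and 0.25. With a single shared lastSessionCost, the second turn would have been computed as 0.25 - 0.50, a negative cost.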

GAP 3: Context tokens are per-bot (runtime.ts)

File: lib/gate-runtime/runtime.ts lines 109-113, 531-590

cumulativeInputTokens, cumulativeOutputTokens, contextWindow are fields on BotRuntime (shared). readContextFromConversation() reads one JSONL file and overwrites shared state.

Fix: Track context per session. Each BotRuntime (now per-session) carries its own context state -- this is solved by the rekey to Map<sessionKey, BotRuntime>.

GAP 4: activeProjectId is per-bot (message-capture.ts)

File: lib/message-capture.ts line 58

const activeProjectId = new Map<string, string>() keyed by botId only. Concurrent sessions could overwrite each other's project context.

Fix: Key by botId:chatSessionId composite.

GAP 5: Work rules refresh counter is per-bot

File: lib/work-rules-refresh.ts, called from lib/gate-runtime/index.ts line 363

incrementMessageCount(botId) increments a single counter per bot. Multiple sessions increment the same counter simultaneously, causing premature or skipped refreshes.

Fix: Accept chatSessionId parameter, track per-session. Or: accept per-bot counting as intentional (total messages across all sessions triggers refresh). Document the decision.

GAP 6: CLAUDE.md write collision (config.ts)

File: lib/gate-runtime/config.ts generateClaudeMd()

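All sessions for a bot write to the same CLAUDE.md in the shared workspace. If five sessions start at once, five writeFileSync() calls target the same path, and a Claude Code process that reads the file mid-write can see truncated or partial content.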

Fix: Atomic write: write to temp file, then fs.renameSync() (atomic on same filesystem). Or: write once on first session, skip if already fresh (check hash/mtime).
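
The first option (temp file plus rename) could be sketched as follows. writeFileAtomic is a hypothetical helper name. The temp file is created next to the target so the rename stays on one filesystem, which is what makes it atomic on POSIX systems.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: readers see either the old file or the complete new
// file, never a partially written one.
function writeFileAtomic(targetPath: string, content: string): void {
  const dir = path.dirname(targetPath);
  // Unique temp name in the SAME directory, so rename does not cross filesystems.
  const suffix = `${process.pid}.${Date.now()}.${Math.random().toString(36).slice(2)}`;
  const tmpPath = path.join(dir, `.${path.basename(targetPath)}.${suffix}.tmp`);
  try {
    fs.writeFileSync(tmpPath, content, "utf8");
    fs.renameSync(tmpPath, targetPath); // atomically replaces the target
  } catch (err) {
    // Best-effort cleanup so failed writes do not leave temp files behind.
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
}
```

The rename does not order concurrent writers; the last rename wins. That is acceptable only if every session generates the same CLAUDE.md content for a given bot, which this sketch assumes.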

GAP 7: bot-activity SSE is one state per bot

File: lib/ws-manager.ts broadcastBotStatus(), lib/bot-state.ts setBotWorking()

The dashboard and chat frontend receive one bot-activity event per bot with a single state. If Session 1 finishes (emits idle) while Session 2 is still working, the bot shows as idle.

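Fix: Aggregate status per bot before broadcasting. Report the bot as working while any session is working (isBotWorking(botId)), include a workingSessions count in the bot-activity event, and emit idle only when the last working session finishes.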

GAP 8: Code Sessions have no session ownership

File: lib/drizzle/schema.ts code_sessions table, lib/code-monitor.ts, lib/code-session.ts

code_sessions table has bot_id but NO chat_session_id. The monitor pings sendMessageToBot(botId) which routes to whatever session is "current". With concurrent sessions, the ping hits the wrong session -- the user's active chat, not the session that started the CS.

Example failure: Session 1 starts a CS. User switches to Session 2. CS needs input. Monitor pings Session 2. Session 2 has no idea what the CS is about.

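Fix: Add a nullable chat_session_id column to code_sessions, set it when the CS is created, and have the monitor pass it to sendMessageToBot so the ping reaches the owning session. Phase 2 describes these changes in detail.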

Files to Modify (Complete List)

| File | Change | Gap |
|---|---|---|
| lib/gate-runtime/runtime.ts | Rekey Map by sessionKey, per-session queue, per-session cost/context | GAP 1, 2, 3 |
| lib/gate-runtime/index.ts | Route sendMessage by session, aggregate bot-level queries | - |
| lib/gate-runtime/bridge.ts | Session-keyed stream accumulators | - |
| lib/gate-runtime/session-registry.ts | Remove single-session enforcement | - |
| lib/gate-runtime/config.ts | Atomic CLAUDE.md writes | GAP 6 |
| lib/bot-state.ts | Multi-session working state, aggregate bot status | GAP 7 |
| lib/ws-manager.ts | Include workingSessions count in bot-activity | GAP 7 |
| lib/message-capture.ts | Key activeProjectId by bot+session | GAP 4 |
| lib/work-rules-refresh.ts | Decide: per-session or per-bot counting | GAP 5 |
| lib/code-session.ts | Accept + store chatSessionId on CS creation | GAP 8 |
| lib/code-monitor.ts | Pass chatSessionId when pinging bot | GAP 8 |
| lib/drizzle/schema.ts | Add chat_session_id to code_sessions | GAP 8 |
| api/code.ts | Accept chatSessionId in start endpoint | GAP 8 |
| api/bot-runtime.ts | GET sessions shows per-session status + CS children | - |
| clients/chat/src/state.ts | Add sessionMessages to BotState | - |
| clients/chat/src/chat-events.ts | Route events by sessionKey | - |
| clients/chat/src/ui.ts | Tree: Bot > Sessions > CS. Remove standalone sidebar sessions | - |
| clients/chat/src/code-sessions.ts | Group CS under owning session, not flat under bot | - |
| clients/chat/src/search-modal.ts | Add session + chat history search | - |
| clients/chat/src/history.ts | Instant session switching from memory | - |
| clients/chat/src/messaging.ts | Send with sessionKey, don't kill old session | - |
| clients/chat/index.html | Remove #sidebarSessions, tree renders in bot list | - |
| clients/dashboard/src/stores/bot-store.ts | Per-session state tracking (optional) | GAP 7 |