Concurrent Sessions Plan -- Issue #1493 Step 3

Goal

Enable multiple sessions per bot to run simultaneously. A bot can have many sessions (e.g. 100); each active session talks to its own Claude Code process through gate-runtime. The frontend shows which sessions are working, lets you switch between them instantly, and accumulates messages in real time from all sessions.

Current Architecture (Single-Session)

The Problem: One Process Per Bot

The current system enforces one Claude Code process per bot and one active session per bot:

User -> sendMessage(botId) -> runtime.send(botId)
                               |
                         bots.get(botId)   <-- ONE BotRuntime per botId
                               |
                         bot.proc (stdin)  <-- ONE ChildProcess
                               |
                         stdout -> handleMessage(bot, msg)
                               |
                         _onEvent(botId, msg)
                               |
                         bridge.handleEvent(botId, event)
                               |
                         emitBotEvent(botId, { event: 'chat', payload: { sessionKey, chatSessionId, ... } })
                               |
                         SSE -> frontend handleChatEventForBot(botId, payload)

Single-Session Assumptions (Must Change)

File / layer	Code site	Assumption
runtime.ts	`const bots = new Map<string, BotRuntime>()`	One `BotRuntime` per `botId`. Contains one `proc`, one `runId`, one `sessionId`.
runtime.ts	`activateSession()`	Calls `stop()` when switching sessions -- kills the old process.
runtime.ts	`send()` -> `sendToProcess()`	Writes to `bot.proc.stdin` -- one process, one stdin.
runtime.ts	`onTurnComplete()`	Drains `bot.messageQueue` -- one queue for all messages.
index.ts	`sendMessage()`	Calls `ensureCurrentRuntimeSession()` -- one session per bot.
index.ts	`activateSession()`	Sets `bot.runtimeSessionId`, `bot.chatSessionId` -- singular.
session-registry.ts	`ensureCurrentRuntimeSession()`	Finds or creates ONE current session. `is_current` column is a single boolean.
session-registry.ts	`createFreshRuntimeSession()`	Stops the previous session before creating a new one.
bridge.ts	`streams = new Map<string, ...>()`	One stream accumulator per `botId`.
bridge.ts	`_getSessionMeta(botId)`	Returns one `chatSessionId`, one `sessionKey`.
bot-state.ts	`botWorkingState.get(botId)`	One working state per bot.
bot-state.ts	`setBotWorking(botId, runId, sessionKey)`	One runId, one sessionKey per bot.
ws-manager.ts	SSE events keyed by `botId`	All events for a bot go to all SSE clients for that bot.
Frontend	`ChatState.currentMsg`	One streaming bubble.
Frontend	`bot.currentRunId`	One active run per bot.
Frontend	`bot.currentStreamText`	One stream accumulator per bot.
Frontend	`showTyping(botId)` / `hideTyping(botId)`	One typing indicator per bot.

New Architecture (Multi-Session)

Core Concept: Session-Keyed Processes

Instead of Map<botId, BotRuntime>, use Map<sessionKey, BotRuntime> where sessionKey = "gate:{chatSessionId}".

Each session gets its own Claude Code process, its own stdin/stdout, its own queue.

User -> sendMessage(botId, sessionKey) -> runtime.send(sessionKey)
                                           |
                                     sessions.get(sessionKey)  <-- BotRuntime per SESSION
                                           |
                                     session.proc (stdin)      <-- own ChildProcess
                                           |
                                     stdout -> handleMessage(session, msg)
                                           |
                                     _onEvent(botId, msg)   <-- botId + sessionKey tagged
                                           |
                                     bridge.handleEvent(botId, event)  <-- includes sessionKey
                                           |
                                     emitBotEvent(botId, { ..., sessionKey, chatSessionId })
                                           |
                                     SSE -> frontend routes by sessionKey

Phase 1: Backend -- Multi-Process Runtime

File: lib/gate-runtime/runtime.ts

Rename key from botId to sessionKey:
- const sessions = new Map<string, BotRuntime>() (rename bots -> sessions)
- Each BotRuntime still has a botId field for identification
- Registration: register(sessionKey, config) where sessionKey = "gate:{chatSessionId}"
- send(sessionKey, text, opts) -- routes to the correct process
Keep bot-level helpers:
- getState(botId) -> scan all sessions for this bot, return 'working' if any session is working
- getSessionsForBot(botId) -> return all BotRuntime entries for this bot
- stopAll(botId) -> stop all processes for a bot
Remove single-session locks:
- activateSession() no longer calls stop() -- sessions coexist
- newSession() no longer kills old process -- just registers a new one
- onTurnComplete() drains queue per-session, not per-bot
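A minimal sketch of the rekeyed registry and the bot-level helpers, assuming a trimmed-down BotRuntime that only carries botId, sessionKey, state and a stop() callback (the real one also holds proc, runId, queue and context state). sessionKeyFor() is a hypothetical helper for the "gate:{chatSessionId}" convention.

```typescript
// Hypothetical, trimmed-down BotRuntime: the real one also holds proc, runId,
// messageQueue, context state, etc.
type RuntimeState = "idle" | "working";

interface BotRuntime {
  botId: string;      // kept so bot-level helpers can aggregate
  sessionKey: string; // "gate:{chatSessionId}"
  state: RuntimeState;
  stop(): void;       // would kill this session's Claude Code process
}

// Keyed by sessionKey (was: bots = new Map<botId, BotRuntime>()).
const sessions = new Map<string, BotRuntime>();

// Hypothetical helper for the "gate:{chatSessionId}" convention.
function sessionKeyFor(chatSessionId: number): string {
  return `gate:${chatSessionId}`;
}

function register(sessionKey: string, runtime: BotRuntime): void {
  // No stop() on sibling sessions: sessions of the same bot coexist.
  sessions.set(sessionKey, runtime);
}

function getSessionsForBot(botId: string): BotRuntime[] {
  return Array.from(sessions.values()).filter((s) => s.botId === botId);
}

// Bot-level state: "working" if any of the bot's sessions is working.
function getState(botId: string): RuntimeState {
  return getSessionsForBot(botId).some((s) => s.state === "working") ? "working" : "idle";
}

function stopAll(botId: string): void {
  // getSessionsForBot returns a copy, so deleting from the Map here is safe.
  for (const s of getSessionsForBot(botId)) {
    s.stop();
    sessions.delete(s.sessionKey);
  }
}
```

Keeping botId on each entry means bot-level queries are a scan over the Map; with at most a handful of live processes per bot that scan is cheap.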

File: lib/gate-runtime/index.ts

Route sendMessage by session:
- sendMessage(botId, message, bot, opts) -> resolve session from opts.chatSessionId
- If session doesn't have a registered process, register and spawn one
- activateSession becomes per-sessionKey
- Bot-level state queries (getState, isRegistered) aggregate across all sessions
syncRuntimeSessionEvent already tags by bot + session -- works as-is.
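The routing rule above can be sketched as follows. This is illustrative only: spawnSession() is a hypothetical stand-in for "register and spawn a process" that records stdin writes in an array, and the `bot` argument from the planned sendMessage signature is omitted.

```typescript
// Hypothetical sketch: spawnSession() stands in for whatever registers a session
// and starts its Claude Code process; here it only records stdin writes.
interface SendOpts {
  chatSessionId: number;
}

interface SessionHandle {
  botId: string;
  sessionKey: string;
  send(text: string): void;
}

const live = new Map<string, SessionHandle>();
const stdinLog: string[] = []; // stands in for writes to each process's stdin
let spawnCount = 0;

function spawnSession(botId: string, sessionKey: string): SessionHandle {
  spawnCount++;
  return {
    botId,
    sessionKey,
    send: (text: string) => {
      stdinLog.push(`${sessionKey} <- ${text}`);
    },
  };
}

// Resolves the target session from opts.chatSessionId and spawns lazily.
function sendMessage(botId: string, message: string, opts: SendOpts): string {
  const sessionKey = `gate:${opts.chatSessionId}`;
  let handle = live.get(sessionKey);
  if (!handle) {
    handle = spawnSession(botId, sessionKey);
    live.set(sessionKey, handle);
  }
  handle.send(message);
  return sessionKey;
}
```

A second message to an already-live session reuses its process; only a session without a registered process triggers a spawn.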

File: lib/gate-runtime/session-registry.ts

Remove single-session enforcement:
- ensureCurrentRuntimeSession -> create or find session by chatSessionId, don't enforce is_current
- createFreshRuntimeSession -> DON'T stop previous session, just create a new one
- Remove is_current column dependency for session selection (keep for "last active" display)
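A sketch of the unlocked registry, using an in-memory array as a stand-in for gate_runtime_sessions. ensureRuntimeSession() and markUserInteracted() are hypothetical names, and column names other than status, is_current and chat_session_id are assumptions.

```typescript
// In-memory stand-in for gate_runtime_sessions rows (illustrative field names).
type SessionStatus = "new" | "working" | "idle" | "stopped" | "errored";

interface RuntimeSessionRow {
  id: number;
  botId: string;
  chatSessionId: number;
  status: SessionStatus;
  isCurrent: boolean; // only "last user-interacted", never runtime-exclusive
}

const rows: RuntimeSessionRow[] = [];
let nextId = 1;

// Find or create by chatSessionId; no other session is stopped.
function ensureRuntimeSession(botId: string, chatSessionId: number): RuntimeSessionRow {
  let row = rows.find((r) => r.botId === botId && r.chatSessionId === chatSessionId);
  if (!row) {
    row = { id: nextId++, botId, chatSessionId, status: "new", isCurrent: false };
    rows.push(row);
  }
  return row;
}

// Moves the "last active" marker only; every session keeps its status.
function markUserInteracted(botId: string, chatSessionId: number): void {
  for (const r of rows) {
    if (r.botId === botId) r.isCurrent = r.chatSessionId === chatSessionId;
  }
}
```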

File: lib/gate-runtime/bridge.ts

Session-keyed stream accumulators:
- streams = new Map<string, ...>() keyed by sessionKey instead of botId
- _toolNames keyed by sessionKey
- Every event emitted includes both botId and sessionKey
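A sketch of per-session stream accumulation, assuming simplified event shapes: each runtime event carries a new text chunk, and the bridge emits both the chunk and the text accumulated so far for that session. The field names are hypothetical.

```typescript
// Hypothetical event shapes; real bridge events carry more fields.
interface RuntimeEvent {
  botId: string;
  sessionKey: string;
  kind: "delta" | "final";
  text: string; // new chunk from this session's stdout
}

interface ChatEvent {
  botId: string;
  sessionKey: string;
  state: "delta" | "final";
  chunk: string; // just the new text
  text: string;  // everything accumulated for this session's current turn
}

// One stream accumulator per sessionKey (was: per botId).
const streams = new Map<string, string>();

function handleEvent(ev: RuntimeEvent, emit: (e: ChatEvent) => void): void {
  const soFar = (streams.get(ev.sessionKey) ?? "") + ev.text;
  if (ev.kind === "final") {
    streams.delete(ev.sessionKey); // turn over: reset only this session
  } else {
    streams.set(ev.sessionKey, soFar);
  }
  // Every emitted event carries both botId and sessionKey.
  emit({ botId: ev.botId, sessionKey: ev.sessionKey, state: ev.kind, chunk: ev.text, text: soFar });
}
```

Interleaved output from two sessions of the same bot lands in separate accumulators, so one session's text never bleeds into the other's bubble.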

File: lib/bot-state.ts

Multi-session working state:
- botWorkingState -> Map<string, { runId, sessionKey, since }> keyed by composite botId:sessionKey
- isBotWorking(botId) -> return true if ANY session is working
- isBotWorkingSession(botId, sessionKey) -> check specific session
- setBotWorking(botId, runId, sessionKey) -> track per-session
- clearBotWorking(botId, runId, sessionKey) -> clear per-session
- getAllWorkingSessions(botId) -> return all working sessions for a bot
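A sketch of the composite-keyed working state. Because sessionKey itself contains a colon ("molly:gate:42"), this version also stores botId in the value so nothing has to split the key. The runId check in clearBotWorking() is an assumed guard against stale clears, not something the plan specifies.

```typescript
interface WorkingEntry {
  botId: string;
  runId: string;
  sessionKey: string;
  since: number;
}

// Keyed by the composite `${botId}:${sessionKey}`, e.g. "molly:gate:42".
// botId is also stored in the value so nothing has to split the key on ":".
const botWorkingState = new Map<string, WorkingEntry>();

function workingKey(botId: string, sessionKey: string): string {
  return `${botId}:${sessionKey}`;
}

function setBotWorking(botId: string, runId: string, sessionKey: string): void {
  botWorkingState.set(workingKey(botId, sessionKey), { botId, runId, sessionKey, since: Date.now() });
}

function clearBotWorking(botId: string, runId: string, sessionKey: string): void {
  const k = workingKey(botId, sessionKey);
  // Assumed guard: ignore a clear for a run that is no longer this session's run.
  if (botWorkingState.get(k)?.runId === runId) botWorkingState.delete(k);
}

function isBotWorkingSession(botId: string, sessionKey: string): boolean {
  return botWorkingState.has(workingKey(botId, sessionKey));
}

function getAllWorkingSessions(botId: string): WorkingEntry[] {
  return Array.from(botWorkingState.values()).filter((e) => e.botId === botId);
}

function isBotWorking(botId: string): boolean {
  return getAllWorkingSessions(botId).length > 0;
}
```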

Phase 2: CS Ownership -- Code Sessions Belong to Bot Sessions

Critical problem: Code Sessions (CS) are currently owned by bot_id only. The code monitor pings sendMessageToBot(botId, ...) which routes to whatever session is "current". With concurrent sessions, this ping goes to the WRONG session -- the bot session that happens to be active, not the one that started the CS.

Current state (broken for concurrent):

code_sessions.bot_id = 'molly'           <-- CS knows its bot
code_monitor -> sendMessageToBot('molly') <-- pings "the bot"
gate-runtime -> routes to is_current=1    <-- wrong session!

New state (session-aware):

code_sessions.bot_id = 'molly'
code_sessions.chat_session_id = 42       <-- CS knows its bot SESSION
code_monitor -> sendMessageToBot('molly', chatSessionId=42)
gate-runtime -> routes to session 42      <-- correct session

Changes needed:

DB migration: Add chat_session_id INTEGER column to code_sessions table
- Nullable for backward compat (existing CS get null = legacy bot-level routing)
- Set on CS creation: startSession(botId, { chatSessionId }) passes the owning session
CS creation: When a bot session starts a CS, tag it with chat_session_id
- POST /api/bots/:botId/code/start accepts chatSessionId param
- lib/code-session.ts startSession() stores it in DB
- The frontend sends chatSessionId from ChatState.currentChatSessionIds[botId]
Monitor routing: deliverMonitorPing() routes to the owning session
- lib/code-monitor.ts: read code_sessions.chat_session_id for the CS being monitored
- Pass chatSessionId to sendMessageToBot(botId, msg, ..., chatSessionId)
- Gate-runtime routes the message to that specific session's process (not "current")
- Fallback: if chat_session_id is null (legacy CS), route to current session as before
CS transfer: POST /api/bots/:botId/code/session/transfer must also update chat_session_id
- When transferring a CS between bots, the new bot's current session becomes the owner
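A sketch of the monitor routing and transfer rules, assuming a simplified code_sessions row. The SendToBot type is a hypothetical simplification of sendMessageToBot(botId, msg, ..., chatSessionId) with the elided middle arguments left out; transferCodeSession() is likewise a hypothetical name.

```typescript
// Row shape mirrors the planned code_sessions columns (id is illustrative).
interface CodeSessionRow {
  id: string;
  bot_id: string;
  chat_session_id: number | null; // null = legacy CS from before the migration
}

// Simplified stand-in for sendMessageToBot(botId, msg, ..., chatSessionId).
type SendToBot = (botId: string, msg: string, chatSessionId?: number) => void;

function deliverMonitorPing(cs: CodeSessionRow, msg: string, sendMessageToBot: SendToBot): void {
  if (cs.chat_session_id !== null) {
    // Route to the bot session that started this CS.
    sendMessageToBot(cs.bot_id, msg, cs.chat_session_id);
  } else {
    // Legacy fallback: gate-runtime picks the current session, as before.
    sendMessageToBot(cs.bot_id, msg);
  }
}

// CS transfer: the receiving bot's current session becomes the owner.
function transferCodeSession(cs: CodeSessionRow, toBotId: string, toCurrentChatSessionId: number): CodeSessionRow {
  return { ...cs, bot_id: toBotId, chat_session_id: toCurrentChatSessionId };
}
```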

Phase 3: Frontend -- Tree Structure

Remove the standalone sidebar sessions section. Sessions are now nested under each bot in the sidebar tree:

Sidebar
  |-- [Search button]  (Ctrl+K -- searches bots, sessions, CS, chat history)
  |-- Bot: Molly
  |     |-- Session: "Fix auth bug" (active, working)     <-- click = switch chat to this session
  |     |     |-- CS: claude-a8f2 (running, project-x)    <-- click = open terminal
  |     |     '-- CS: claude-b3c1 (idle, project-y)
  |     |-- Session: "Refactor DB" (idle)
  |     |     '-- CS: claude-d4e5 (completed)
  |     |-- Session: "New session" (new, empty)
  |     |-- [+ New session]
  |     '-- [5 more sessions...]                          <-- collapsed, click to search
  |-- Bot: Klara
  |     |-- Session: "Deploy pipeline" (working)
  |     '-- [+ New session]
  '-- [+ Add bot]

Key changes:

Remove #sidebarSessions section from HTML and loadSidebarSessions() from JS
- Sessions are no longer a standalone sidebar section
- They render as children of each bot list item
Extend renderBotListItem() to show bot sessions as expandable children:
- Each bot item expands to show up to 5 sessions (sorted: working first, then by activity)
- Each session shows: status icon, label, message count
- Clicking a session switches the chat view to that session (not the bot -- the SESSION)
CS renders under its owning session (not flat under the bot):
- renderSessionsInSidebar() groups CS by chat_session_id
- CS with null chat_session_id (legacy) renders under the bot's current/latest session
- Each CS item: status dot, name/project, click to open terminal
Active state is per-session, not per-bot:
- Clicking a session makes it the active chat view AND the active session for that bot
- The bot list item shows which session is active (highlighted child)
- Multiple bots can have working sessions simultaneously
- Only the currently viewed session shows the typing indicator in the chat area
Session-level message routing (unchanged from Phase 1 plan):
- bot.sessionMessages: Record<string, ChatMessage[]> -- keyed by sessionKey ("gate:{chatSessionId}")
- SSE events route by sessionKey to correct message array
- Switching sessions re-renders from memory (instant)
- Lazy-load history from API for sessions not yet in memory
Search modal (Ctrl+K):
- Already searches bots and code sessions
- Extend to also search bot sessions (by label, first message)
- Extend to search chat history (by message content)
- Results show: type icon, label, metadata, click to navigate
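The sidebar ordering and CS grouping rules can be sketched like this. The view-model shapes and both function names are hypothetical; the logic follows the rules listed above (working first, then by activity, at most 5 shown; legacy CS under the current or most recently active session).

```typescript
// Hypothetical sidebar view models; field names are assumptions.
interface SidebarSession {
  chatSessionId: number;
  label: string;
  working: boolean;
  lastActivity: number; // epoch ms
  isCurrent: boolean;
}

interface SidebarCS {
  id: string;
  chat_session_id: number | null; // null = legacy CS
}

const MAX_VISIBLE_SESSIONS = 5;

// Working sessions first, then most recently active; the rest collapse
// into the "[N more sessions...]" entry.
function visibleSessions(all: SidebarSession[]): { shown: SidebarSession[]; hiddenCount: number } {
  const sorted = all.slice().sort((a, b) => {
    if (a.working !== b.working) return a.working ? -1 : 1;
    return b.lastActivity - a.lastActivity;
  });
  return {
    shown: sorted.slice(0, MAX_VISIBLE_SESSIONS),
    hiddenCount: Math.max(0, sorted.length - MAX_VISIBLE_SESSIONS),
  };
}

// Groups CS under their owning session; legacy CS go under the current
// session, or the most recently active one if none is marked current.
function groupCodeSessions(all: SidebarSession[], codeSessions: SidebarCS[]): Map<number, SidebarCS[]> {
  const byActivity = all.slice().sort((a, b) => b.lastActivity - a.lastActivity);
  const fallback = all.find((s) => s.isCurrent) ?? byActivity[0];
  const groups = new Map<number, SidebarCS[]>();
  for (const cs of codeSessions) {
    const owner = cs.chat_session_id ?? (fallback ? fallback.chatSessionId : undefined);
    if (owner === undefined) continue; // bot has no sessions to attach to
    const list = groups.get(owner) ?? [];
    list.push(cs);
    groups.set(owner, list);
  }
  return groups;
}
```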

Phase 4: DB & Status

gate_runtime_sessions.status tracks per-session state:
- new -> just created, no Claude process yet
- working -> Claude process active, processing a message
- idle -> Claude process alive but idle (waiting for timeout)
- stopped -> Claude process killed (idle timeout or manual)
- errored -> last turn errored
is_current becomes "last user-interacted session":
- Multiple sessions can be working simultaneously
- is_current = 1 just means "which session the user last sent a message to"
- Used for UI default selection, not for runtime exclusivity

Event Flow (Concurrent)

Bot has 3 sessions: S1 (working), S2 (idle), S3 (working)

S1 stdout -> handleMessage -> bridge -> emitBotEvent(botId, { sessionKey: "gate:1", state: "delta" })
S3 stdout -> handleMessage -> bridge -> emitBotEvent(botId, { sessionKey: "gate:3", state: "delta" })

Frontend SSE receives both:
  - delta for S1: append to sessionMessages["gate:1"]
  - delta for S3: append to sessionMessages["gate:3"]

User is viewing S1:
  - S1 delta -> render streaming bubble
  - S3 delta -> accumulate silently, update sidebar status icon

User switches to S3:
  - S1 streaming bubble is no longer rendered (S1 deltas keep accumulating in the background)
  - S3 messages rendered from sessionMessages["gate:3"]
  - S3 streaming continues in real-time
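A sketch of the frontend side of this flow, assuming each SSE "chat" payload carries its sessionKey plus only the new chunk of text. visibleBubble and unseen are hypothetical stand-ins for the DOM streaming bubble and the sidebar status flag.

```typescript
// Hypothetical frontend state and payload shape.
interface ChatPayload {
  sessionKey: string;
  state: "delta" | "final";
  chunk: string;
}

interface ChatMessage {
  text: string;
  streaming: boolean;
}

const sessionMessages: Record<string, ChatMessage[]> = {};
let viewedSessionKey = "gate:1";
let visibleBubble = "";           // stands in for the chat area's streaming bubble
const unseen = new Set<string>(); // sessions to flag in the sidebar

function handleChatEvent(p: ChatPayload): void {
  let msgs = sessionMessages[p.sessionKey];
  if (!msgs) {
    msgs = [];
    sessionMessages[p.sessionKey] = msgs;
  }
  const last = msgs.length > 0 ? msgs[msgs.length - 1] : undefined;
  if (last && last.streaming) {
    last.text += p.chunk;
    last.streaming = p.state === "delta";
  } else {
    msgs.push({ text: p.chunk, streaming: p.state === "delta" });
  }
  // Always accumulate; only the viewed session touches the chat area.
  if (p.sessionKey === viewedSessionKey) {
    visibleBubble = msgs[msgs.length - 1].text;
  } else {
    unseen.add(p.sessionKey);
  }
}

// Instant switch: re-render from memory. A session not yet in memory would
// be lazy-loaded from the API (not shown).
function switchTo(sessionKey: string): ChatMessage[] {
  viewedSessionKey = sessionKey;
  unseen.delete(sessionKey);
  const msgs = sessionMessages[sessionKey] || [];
  visibleBubble = msgs.length > 0 ? msgs[msgs.length - 1].text : "";
  return msgs;
}
```

Replaying the S1/S3 scenario: S3 deltas only mark the sidebar while S1 is viewed; after switchTo("gate:3") the bubble shows S3's text, and further S1 deltas accumulate without touching it.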

Implementation Order

1. Runtime multi-process (runtime.ts) -- key change: Map keyed by sessionKey
2. index.ts routing -- sendMessage resolves to sessionKey
3. Bridge session-keying -- stream accumulators per session
4. Bot-state multi-session -- working state per session
5. Session-registry unlocking -- remove stop-previous enforcement
6. Frontend message routing -- accumulate per session
7. Frontend UI -- session status indicators, instant switching
8. Cleanup -- idle timeout per session, max concurrent processes limit

Safety & Limits

Max concurrent PROCESSES per bot: 5 (configurable). Prevents runaway process spawning.
Min sessions per bot: 1. Auto-created on first load if none exist.
Idle timeout: 10 min per process (existing). When it fires, the session's process is killed but the session data is preserved.
Memory: Each Claude Code process uses ~50-100MB. 5 concurrent = 250-500MB per bot.
Auth: All sessions for a bot share the same auth profile and credentials.
Workspace: All sessions for a bot share the same workspace directory. Claude Code handles isolation with worktrees.
Session count: Unlimited sessions can EXIST (history preserved). Only 5 can have live PROCESSES at once.
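A sketch of the per-bot process cap check, using the numbers above. The plan does not say what happens when the cap is reached (queue, reject, or stop an idle process), so longestIdle() is only one hypothetical eviction candidate; LiveProcess and all function names are assumptions.

```typescript
const MAX_LIVE_PROCESSES_PER_BOT = 5; // configurable per the plan

interface LiveProcess {
  botId: string;
  sessionKey: string;
  idleSinceMs: number | null; // null = currently working
}

function canSpawn(botId: string, live: LiveProcess[], max = MAX_LIVE_PROCESSES_PER_BOT): boolean {
  return live.filter((p) => p.botId === botId).length < max;
}

// Hypothetical eviction candidate: the bot's process that has been idle longest.
function longestIdle(botId: string, live: LiveProcess[]): LiveProcess | undefined {
  let best: LiveProcess | undefined;
  let bestSince = Infinity;
  for (const p of live) {
    if (p.botId === botId && p.idleSinceMs !== null && p.idleSinceMs < bestSince) {
      best = p;
      bestSince = p.idleSinceMs;
    }
  }
  return best;
}

// Memory envelope from the plan: ~50-100 MB per Claude Code process.
function memoryRangeMb(processes: number): [number, number] {
  return [processes * 50, processes * 100];
}
```

With the default cap, memoryRangeMb(5) gives [250, 500], matching the per-bot figure above.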

Gap Analysis (Critical Issues Found)

Audit of every file in the event chain found 8 critical gaps the plan must address:

GAP 1: `messageQueue` has no session routing (runtime.ts)

File: lib/gate-runtime/runtime.ts lines 86-90, 603-605

The QueuedMessage interface has { text, opts, runId } but no sessionKey or chatSessionId. When multiple sessions queue messages, onTurnComplete() drains FIFO without checking which session the message belongs to.

Example failure: Session 1 finishes, drains queue, sends Session 2's message to Session 1's process.

Fix: Add sessionKey to QueuedMessage. Queue per-session: Map<sessionKey, QueuedMessage[]>. Drain only the queue for the session that just completed.
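A sketch of the per-session queue. opts is omitted from QueuedMessage for brevity, and the assumption that each completed turn dequeues exactly one message (to start the next turn) is ours, not the plan's.

```typescript
// opts omitted for brevity; sessionKey is the new field.
interface QueuedMessage {
  text: string;
  runId: string;
  sessionKey: string;
}

const queues = new Map<string, QueuedMessage[]>();

function enqueue(msg: QueuedMessage): void {
  const q = queues.get(msg.sessionKey) ?? [];
  q.push(msg);
  queues.set(msg.sessionKey, q);
}

// Called when one session's turn completes. Assumes one queued message starts
// the next turn; only this session's queue is touched.
function onTurnComplete(
  sessionKey: string,
  sendToProcess: (sessionKey: string, msg: QueuedMessage) => void,
): void {
  const q = queues.get(sessionKey);
  const next = q ? q.shift() : undefined;
  if (q && q.length === 0) queues.delete(sessionKey);
  if (next) sendToProcess(sessionKey, next);
}
```

In the failure example above, Session 1 completing now only looks at "gate:1", so Session 2's queued message stays put until Session 2's own turn ends.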

GAP 2: `lastSessionCost` is per-bot (runtime.ts)

File: lib/gate-runtime/runtime.ts lines 108, 704-705

BotRuntime.lastSessionCost is a single number. Cost is computed as turnCost = msg.total_cost_usd - bot.lastSessionCost. With concurrent sessions, each session has its own cumulative cost from Claude Code, but they share one lastSessionCost. Result: negative costs, wrong cost events.

Fix: Track lastSessionCost per runtimeSessionId: Map<number, number>.
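A sketch of the per-session cost delta, assuming totalCostUsd is the cumulative msg.total_cost_usd that Claude Code reports for a session. turnCost() is a hypothetical helper name.

```typescript
// Per-session replacement for the single BotRuntime.lastSessionCost number.
const lastSessionCost = new Map<number, number>(); // runtimeSessionId -> last cumulative cost

// totalCostUsd is the session's cumulative cost (msg.total_cost_usd);
// returns the cost of this turn alone.
function turnCost(runtimeSessionId: number, totalCostUsd: number): number {
  const previous = lastSessionCost.get(runtimeSessionId) ?? 0;
  lastSessionCost.set(runtimeSessionId, totalCostUsd);
  return totalCostUsd - previous;
}
```

With cumulative reports of S1 at $0.50, S2 at $0.25, then S1 at $1.25, this returns 0.50, 0.25 and 0.75. A single shared lastSessionCost would instead compute 0.25 - 0.50 = -0.25 for S2's first turn, which is the negative cost described above.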

GAP 3: Context tokens are per-bot (runtime.ts)

File: lib/gate-runtime/runtime.ts lines 109-113, 531-590

cumulativeInputTokens, cumulativeOutputTokens, contextWindow are fields on BotRuntime (shared). readContextFromConversation() reads one JSONL file and overwrites shared state.

Fix: Track context per session. Each BotRuntime (now per-session) carries its own context state -- this is solved by the rekey to Map<sessionKey, BotRuntime>.

GAP 4: `activeProjectId` is per-bot (message-capture.ts)

File: lib/message-capture.ts line 58

const activeProjectId = new Map<string, string>() keyed by botId only. Concurrent sessions could overwrite each other's project context.

Fix: Key by botId:chatSessionId composite.

GAP 5: Work rules refresh counter is per-bot

File: lib/work-rules-refresh.ts, called from lib/gate-runtime/index.ts line 363

incrementMessageCount(botId) increments a single counter per bot. Multiple sessions increment the same counter simultaneously, causing premature or skipped refreshes.

Fix: Accept chatSessionId parameter, track per-session. Or: accept per-bot counting as intentional (total messages across all sessions triggers refresh). Document the decision.

GAP 6: CLAUDE.md write collision (config.ts)

File: lib/gate-runtime/config.ts generateClaudeMd()

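All sessions for a bot write to the same CLAUDE.md in the shared workspace. writeFileSync() truncates the file before writing, so a Claude Code process that reads CLAUDE.md while another session is regenerating it can see a partial file, and overlapping writes from separate processes can leave it corrupted.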

Fix: Atomic write: write to temp file, then fs.renameSync() (atomic on same filesystem). Or: write once on first session, skip if already fresh (check hash/mtime).
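A sketch combining both fixes using only Node's built-in fs, os and path modules. writeFileAtomic() is a hypothetical helper, and it compares file contents directly as a simple stand-in for the hash/mtime freshness check; the temp file sits next to the target so the rename stays on one filesystem.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Writes `content` to `target` via temp file + rename, so a reader sees either
// the old file or the new one. Returns false if the file was already fresh.
function writeFileAtomic(target: string, content: string): boolean {
  // "Skip if already fresh": direct content comparison instead of a hash.
  if (existsSync(target) && readFileSync(target, "utf8") === content) return false;
  // Temp file in the same directory, so the rename never crosses filesystems.
  const tmp = `${target}.${process.pid}.${Date.now()}.${Math.random().toString(36).slice(2)}.tmp`;
  writeFileSync(tmp, content);
  renameSync(tmp, target); // atomic replace on POSIX filesystems
  return true;
}

// Example: regenerate CLAUDE.md in a scratch workspace.
const workspace = mkdtempSync(join(tmpdir(), "gate-ws-"));
const claudeMd = join(workspace, "CLAUDE.md");
```

Unique temp names (pid, timestamp, random suffix) keep two writers from sharing a temp file; the last rename wins, but no reader ever sees a half-written CLAUDE.md.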

GAP 7: `bot-activity` SSE is one state per bot

File: lib/ws-manager.ts broadcastBotStatus(), lib/bot-state.ts setBotWorking()

The dashboard and chat frontend receive one bot-activity event per bot with a single state. If Session 1 finishes (emits idle) while Session 2 is still working, the bot shows as idle.

Fix:

- bot-activity event includes a workingSessions: number field (count of working sessions)
- setBotWorking / clearBotWorking check if ANY session is still working before emitting idle
- Dashboard bot card: show "2/5 sessions working" or similar aggregate
- Chat sidebar: per-session status icons (already planned)
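A sketch of the aggregated bot-activity event, using a hypothetical botId -> working sessionKeys map in place of bot-state.ts. The event shape and function names are assumptions; the rule is the one above: emit idle only when no session of the bot is still working.

```typescript
interface BotActivityEvent {
  botId: string;
  state: "working" | "idle";
  workingSessions: number;
}

type Emit = (e: BotActivityEvent) => void;

// botId -> sessionKeys currently working (stands in for bot-state.ts).
const workingByBot = new Map<string, Set<string>>();

function activityFor(botId: string): BotActivityEvent {
  const set = workingByBot.get(botId);
  const n = set ? set.size : 0;
  return { botId, state: n > 0 ? "working" : "idle", workingSessions: n };
}

function markSessionWorking(botId: string, sessionKey: string, emit: Emit): void {
  let set = workingByBot.get(botId);
  if (!set) {
    set = new Set<string>();
    workingByBot.set(botId, set);
  }
  set.add(sessionKey);
  emit(activityFor(botId));
}

// Emits "idle" only once no other session of this bot is still working.
function markSessionDone(botId: string, sessionKey: string, emit: Emit): void {
  const set = workingByBot.get(botId);
  if (set) set.delete(sessionKey);
  emit(activityFor(botId));
}
```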

GAP 8: Code Sessions have no session ownership

File: lib/drizzle/schema.ts code_sessions table, lib/code-monitor.ts, lib/code-session.ts

code_sessions table has bot_id but NO chat_session_id. The monitor pings sendMessageToBot(botId) which routes to whatever session is "current". With concurrent sessions, the ping hits the wrong session -- the user's active chat, not the session that started the CS.

Example failure: Session 1 starts a CS. User switches to Session 2. CS needs input. Monitor pings Session 2. Session 2 has no idea what the CS is about.