Bit - Agent Control Plane

Who This Is For

You run an AI agent 24/7. It manages infrastructure, monitors training, handles messages while you're away. But who watches the agent? Who decides what it should work on, what context it needs, and when to shut it down? That's what Bit does.

This is a lab-grade control plane. Single node, single operator. Not a platform, not a product. A working proof of concept for what agent supervision actually looks like when you build it instead of talking about it.

What Bit Is

FastAPI control plane + Pulse autonomous daemon + dashboard + tmux-routed Claude Code sessions. Watches the lab, decides when to act, assembles context, dispatches work, tracks cases, enforces cost limits, and gives me full audit and kill-switch access.

Bit is the persistent half of a two-person team. Bitbanshee (human) comes and goes; Bit remembers and operates. It runs on a single EC2 instance (t3.small, us-east-1) behind Cloudflare Access.

Four Pillars

Attention

When to wake. 10 watchers poll for events. Deterministic triage with fingerprint dedup, cooldowns, batch windows, and adaptive suppression.

Context

What to supply. Profile-based context assembly pulls from tasks, memories, training status, sessions, elevation state, and Telegram. Scoped per entry point.

Execution

Where to run safely. Managed tmux sessions with context-aware dispatch, sibling conflict detection, artifact-level file locking, and step plans.

Governance

How humans oversee. Dashboard observability, Telegram control, elevation-gated privileges, cost gates, Byte peer review, pause/resume, and audit trails.

Architecture

Bit runs as a FastAPI control plane with an embedded Pulse daemon. The dashboard at lab.bitbanshee.com is the operator surface. Claude Code sessions run in tmux windows managed by Pulse.

Pulse

Pulse is the daemon that makes Bit autonomous. It monitors the lab continuously, decides when Bit should wake up, assembles the right context, and dispatches work to managed Claude Code sessions.

Signal Sources

Watcher	Source	What it detects
Task Signal	Signal file	New tasks from Bitbanshee
Elevation	Signal file	Approval/denial state changes
Telegram	API poll	Questions from Bitbanshee
Health	API poll	Control plane failures
Cost Drift	API poll	Daily spend exceeding threshold
Backup	Log file	Backup failures
Notes	File watch	New notes from Bitbanshee
Byte	File watch	Peer review responses from ChatGPT
Training	S3 + API (10 min)	Completion, stalls, loss anomalies, sitrep events
Fleet	EC2 API (2 min)	Spot reclaims, capacity errors, instance launches, training recovery
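A minimal sketch of interval-based polling, assuming each watcher exposes a `poll()` callable and a fixed interval. The `Watcher` class, `run_due`, and the event strings are hypothetical; only the 2-minute fleet and 10-minute training intervals come from the table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Watcher:
    name: str
    interval_s: float                 # how often this watcher should poll
    poll: Callable[[], list]          # returns raw events (hypothetical shape)
    last_run: float = float("-inf")   # never run yet, so it is due immediately


def run_due(watchers: list[Watcher], now: float) -> list[tuple[str, object]]:
    """Poll every watcher whose interval has elapsed; tag each event with its source."""
    events = []
    for w in watchers:
        if now - w.last_run >= w.interval_s:
            w.last_run = now
            events.extend((w.name, e) for e in w.poll())
    return events
```

With a fleet watcher at 120 s and a training watcher at 600 s, both fire on the first tick, then only the fleet watcher fires at t=130.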

Attention Pipeline

Events get fingerprinted, deduped within a 5-minute window, and matched against deterministic rules. Cooldowns stretch automatically when repeated dispatches produce no action (up to 10x). Telegram messages are exempt from suppression. Human messages always get attention.
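A minimal sketch of this pipeline, assuming an event is identified by its `source`, `kind`, and `key` fields. The 5-minute window, the 10x cap, and the Telegram exemption come from the text; the 60-second base cooldown and the doubling schedule for stretching are assumptions, and all names are hypothetical.

```python
import hashlib
import json

DEDUP_WINDOW_S = 300      # 5-minute dedup window (from the text)
BASE_COOLDOWN_S = 60      # hypothetical per-kind base cooldown
MAX_STRETCH = 10          # cooldowns stretch up to 10x (from the text)


def fingerprint(event: dict) -> str:
    """Hash the fields assumed to identify 'the same' event: source, kind, key."""
    ident = {k: event.get(k) for k in ("source", "kind", "key")}
    return hashlib.sha256(json.dumps(ident, sort_keys=True).encode()).hexdigest()[:16]


class Triage:
    def __init__(self) -> None:
        self.first_seen = {}      # fingerprint -> start of its dedup window
        self.last_dispatch = {}   # event kind -> last dispatch time
        self.stretch = {}         # event kind -> cooldown multiplier

    def should_dispatch(self, event: dict, now: float) -> bool:
        if event.get("source") == "telegram":
            return True  # human messages are never suppressed
        fp = fingerprint(event)
        start = self.first_seen.get(fp)
        if start is not None and now - start < DEDUP_WINDOW_S:
            return False  # duplicate inside the 5-minute window
        self.first_seen[fp] = now
        kind = event.get("kind", "")
        cooldown = BASE_COOLDOWN_S * self.stretch.get(kind, 1)
        last = self.last_dispatch.get(kind)
        if last is not None and now - last < cooldown:
            return False  # this kind is still cooling down
        self.last_dispatch[kind] = now
        return True

    def record_outcome(self, kind: str, acted: bool) -> None:
        """No-op dispatches double the cooldown (capped at 10x); real action resets it."""
        self.stretch[kind] = 1 if acted else min(self.stretch.get(kind, 1) * 2, MAX_STRETCH)
```

Fingerprinting handles exact repeats, while the per-kind cooldown handles a storm of distinct but related events (for example, several spot reclaims in a row).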

Case Management

Related intents get grouped into durable cases with 3-step plans (Assess → Implement → Verify). Each case tracks cost (with a hard gate once it has spent $2 without a success), files touched (for conflict detection), and session count. Work on the same case is serialized, and overlapping file edits across sessions are hard-blocked as soon as they are detected.
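A minimal sketch of the three start-time checks (serialization, cost gate, file conflict), assuming a case records the union of files its sessions have touched and that any success lifts the gate. `Case`, `CaseBoard`, and the reason strings are hypothetical.

```python
from dataclasses import dataclass, field

COST_GATE_USD = 2.00                       # hard gate from the text
STEPS = ("Assess", "Implement", "Verify")  # 3-step plan from the text


@dataclass
class Case:
    case_id: str
    spent_usd: float = 0.0
    succeeded: bool = False
    files: set = field(default_factory=set)  # files touched, for conflict detection
    step: int = 0                            # index into STEPS
    sessions: int = 0
    running: bool = False                    # at most one active session per case


class CaseBoard:
    def __init__(self) -> None:
        self.cases = {}

    def can_start(self, case_id: str, files: set) -> tuple:
        case = self.cases[case_id]
        if case.running:
            return False, "case already has an active session"
        if case.spent_usd >= COST_GATE_USD and not case.succeeded:
            return False, "cost gate: $2 spent without success"
        for other in self.cases.values():
            if other.case_id != case_id and other.running and other.files & files:
                return False, f"file conflict with {other.case_id}"
        return True, "ok"

    def start(self, case_id: str, files: set) -> tuple:
        ok, reason = self.can_start(case_id, files)
        if ok:
            case = self.cases[case_id]
            case.running = True
            case.sessions += 1
            case.files |= files
        return ok, reason

    def finish(self, case_id: str, cost_usd: float, succeeded: bool) -> None:
        case = self.cases[case_id]
        case.running = False
        case.spent_usd += cost_usd
        if succeeded:
            case.succeeded = True
            case.step = min(case.step + 1, len(STEPS) - 1)  # advance the plan
```

Checking before dispatch means a conflicting or over-budget session is never started, rather than being killed after it has already spent tokens.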

Context Assembly

Every prompt that reaches Bit, whether from the dashboard, Telegram, or task buttons, flows through the Context Assembler. It wraps the prompt with situational awareness that Claude Code's harness doesn't have.

Three design constraints:

Runtime-specific - context reflects live state at the moment of the prompt: which sessions are active, what step training is on, whether elevation is approved. Freshness-gated so repeated prompts within 60 seconds skip redundant assembly.
Invisible to the harness - Claude Code doesn't know the assembler exists. Context gets prepended to the prompt before it reaches the agent. To the model, it just looks like a well-informed user. No special API, no integration. Just text in front of text.
Immediately actionable - everything in the assembled context is there because the model needs to act on it now. No background reading, no "FYI" sections. If it's in the briefing, it matters for this prompt.

Context Profiles

Profile	Entry Point	Sections Fetched
chat	Telegram free chat	Sessions, elevation, training
task_work	Task "Work" button	All + coordination + memories
task_review	Task "Review" button	Sessions, elevation, coordination
full	Dashboard Prompt UI	Everything
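A minimal sketch of profile-driven assembly that prepends a plain-text briefing to the prompt. Section names and fetchers are hypothetical; reading "All" in `task_work` as the three state sections, and the section list for `full`, are assumptions.

```python
from typing import Callable

# Profile -> sections, loosely transcribed from the table above.
PROFILES = {
    "chat": ["sessions", "elevation", "training"],
    "task_work": ["sessions", "elevation", "training", "coordination", "memories"],
    "task_review": ["sessions", "elevation", "coordination"],
    "full": ["sessions", "elevation", "training", "coordination",
             "plans", "tasks", "memories", "messages"],
}


def assemble(prompt: str, profile: str, fetchers: dict) -> str:
    """Prepend a briefing to the prompt; the agent only ever sees text."""
    lines = []
    for section in PROFILES[profile]:
        fetch: Callable[[], str] = fetchers.get(section)
        body = fetch() if fetch else ""
        if body:  # skip empty sections to keep the briefing short
            lines.append(f"[{section}] {body}")
    if not lines:
        return prompt
    return "\n".join(lines) + "\n\n" + prompt
```

Because the output is just text in front of text, nothing on the Claude Code side needs to change when a profile gains or loses a section.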

What Gets Assembled (~80-330 tokens)

State - active sessions with context %, elevation status, training run progress
Coordination - what sibling sessions are doing, what they just finished, files in use
Active Plans - case step plans showing progress
Open Tasks - relevance-sorted by prompt keywords
Relevant Knowledge - semantic search against the knowledge base
Recent Messages - Telegram messages from last 30 minutes
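For the Open Tasks section, one simple way to sort by prompt keywords is to count shared words. This sketch assumes plain word overlap with a small stopword list; the stopwords, `limit`, and function names are hypothetical.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "on", "for", "is", "in", "it"}


def keywords(text: str) -> set:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def rank_tasks(prompt: str, tasks: list, limit: int = 5) -> list:
    """Sort open tasks by keyword overlap with the prompt; ties keep original order."""
    kw = keywords(prompt)
    scored = [(len(kw & keywords(t)), i, t) for i, t in enumerate(tasks)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, _, t in scored[:limit]]
```

For the prompt "why did training stall on the spot fleet?", "Fix training stall alert" and "Review spot fleet cap" (two shared words each) rank ahead of "Rotate backup keys".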

Intent-Based Decision

Not every message needs a full briefing. The assembler classifies each prompt before deciding what to do:

Short responses (yes, no, ok, continue) go straight through. No assembly, no overhead.
Substantive tasks (fix, review, deploy, investigate) get full context assembly when the session is idle.
Repeated prompts within 60 seconds skip assembly. Context is still fresh.
Busy sessions queue the message in tmux. No context is injected mid-work.
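A minimal sketch of the decision above. The 60-second freshness window and the short-reply examples come from the text; checking short replies before busy state, and treating every other prompt as substantive, are assumptions.

```python
from typing import Optional

SHORT_REPLIES = {"yes", "no", "ok", "continue"}  # examples from the text
FRESHNESS_S = 60                                 # repeated prompts within 60 s skip assembly


def decide(prompt: str, session_busy: bool,
           last_assembled_at: Optional[float], now: float) -> str:
    """Return one of: passthrough, queue, skip, assemble."""
    text = prompt.strip().strip(".!?").lower()
    if text in SHORT_REPLIES:
        return "passthrough"  # no assembly, no overhead
    if session_busy:
        return "queue"        # wait in tmux; never inject context mid-work
    if last_assembled_at is not None and now - last_assembled_at < FRESHNESS_S:
        return "skip"         # the last briefing is still fresh
    return "assemble"         # substantive work: fix, review, deploy, investigate
```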

Assembly Flow

Knowledge Base

Bit's long-term memory. ~1,600 active memories (~3,300 total) extracted from Claude session transcripts. SQLite with FTS5 full-text indexing and OpenAI vector embeddings at 100% coverage.
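A minimal sketch of the full-text half of retrieval using SQLite FTS5 from Python's standard library, assuming the bundled SQLite was compiled with FTS5 (as most modern builds are). The table name and sample memories are hypothetical, and the vector-embedding half is omitted.

```python
import sqlite3


def build_index(memories: list) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
    conn.executemany("INSERT INTO memories(content) VALUES (?)", [(m,) for m in memories])
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> list:
    """Full-text match, best BM25 rank first (FTS5's built-in 'rank' column)."""
    rows = conn.execute(
        "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [r[0] for r in rows]
```

The default FTS5 tokenizer folds case, so a query for "spot" matches a memory containing "Spot".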

How memories flow

Governance

Bit operates with governed autonomy. It can act independently within defined boundaries, but critical operations require human approval, peer review, or elevation.

Control surfaces

Dashboard - real-time observability: terminal, Pulse events/intents/cases, task board, training metrics, cost tracking
Telegram - mobile control: status queries, elevation approval/denial, free chat to Bit
Elevation - privilege escalation via Telegram approval with time-limited credentials and automatic revocation
Byte - mandatory peer review by ChatGPT before any plan executes (enforced via Claude Code hook)
Cost gates - per-case spending limit ($2 without success), per-provider budgets, projected EOM alerts
Pause/Resume - break-glass control that halts all autonomous dispatch while preserving event ingestion
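A minimal sketch of two of these controls: pause/resume gating dispatch without dropping events, and a time-limited elevation grant that revokes itself. The `Governor` class and its methods are hypothetical, and a real grant would also revoke the underlying credentials.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Governor:
    paused: bool = False
    elevation_expires_at: Optional[float] = None  # None means no active grant
    inbox: list = field(default_factory=list)

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def ingest(self, event: dict) -> None:
        self.inbox.append(event)  # ingestion continues even while paused

    def can_dispatch(self) -> bool:
        return not self.paused    # break-glass: no autonomous dispatch while paused

    def grant_elevation(self, now: float, ttl_s: float) -> None:
        self.elevation_expires_at = now + ttl_s  # e.g. after a Telegram approval

    def is_elevated(self, now: float) -> bool:
        if self.elevation_expires_at is not None and now >= self.elevation_expires_at:
            self.elevation_expires_at = None     # automatic revocation at expiry
        return self.elevation_expires_at is not None
```

Keeping ingestion separate from dispatch means resuming after a pause can triage everything that arrived in the meantime.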

Claude Code Hooks

Hook	Event	What it does
Session Start	SessionStart	Injects lab state (training, elevation, tasks) into every new session
Session Stop	Stop	Auto-reports outcomes for Pulse session coordination
Byte Review	PreToolUse	Blocks plan execution until peer review is complete
Risk Alert	PreToolUse	Flags destructive bash operations to Byte + Telegram
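A minimal sketch of a Byte Review gate as a PreToolUse hook, assuming the plan is surfaced through Claude Code's `ExitPlanMode` tool and that Byte's approval is recorded as a marker file. The file path and function names are hypothetical. The hook reads the event JSON on stdin, and exit code 2 blocks the tool call and returns stderr to the model.

```python
import json
import sys
from pathlib import Path

# Hypothetical marker written when Byte approves the current plan.
APPROVAL_FILE = Path("/tmp/byte-approved")


def check(payload: dict, approved: bool) -> tuple:
    """Return (exit_code, stderr message); exit code 2 blocks the tool call."""
    if payload.get("tool_name") != "ExitPlanMode":
        return 0, ""  # not a plan hand-off, let it through
    if approved:
        return 0, ""
    return 2, "Plan blocked: Byte peer review has not approved this plan yet."


def main() -> None:
    payload = json.load(sys.stdin)  # hook input arrives as JSON on stdin
    code, msg = check(payload, APPROVAL_FILE.exists())
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)
```

The script registered as the hook command would call `main()`; keeping the decision in `check` makes it testable without a live session.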

Where This Sits

Bit is not an assistant shell. It's not a memory substrate. It's not "run Claude in a loop."

The hard problem is attention allocation, not raw autonomy. "Should the model run because a timer fired" is the wrong question. "Did something change that warrants waking the model, and if so, what does it need to know?" That's what Bit answers.

Current stage: Single EC2 instance, one operator, one agent. Built for a real research lab, not hypothetical scale.

What I Learned Building This

Claude Code by itself hit a wall. Same model, same intelligence. It could do any individual task I threw at it, but it couldn't manage its own sessions, couldn't detect when a spot instance got reclaimed, couldn't decide when to wake up or what context it actually needed. The model was capable. The model was not operationally useful over long time horizons.

Pulse changed that. Not by making the model smarter. By giving it structure. Watchers that know what to look for. Triage rules that decide priority. Case management that prevents duplicate work. Cost gates that stop runaway spending. A fleet watcher that catches infrastructure failures before I find out about them.

None of that is model intelligence. It's developed capability. Infrastructure built around the model to make it useful at timescales longer than a single prompt.

Long-horizon autonomy depends more on the supervision architecture than on making the model smarter. The model didn't change between the morning (when sessions were stepping on each other, wasting tokens, duplicating work) and the evening (when the system was clean and stable). What changed was the surrounding infrastructure. Fingerprint dedup. Case serialization. Intent-based context assembly. Fleet state persistence. Task approval gates.

The model is the engine. The control plane is the driver.

The dual-process training research is the same idea applied to the model layer. System 1 and System 2 cognition, confidence routing, knowing when to think fast versus slow. Bit and Pulse apply that concept to the operations layer. Same architecture, two levels. That's not an accident.

Screenshots

The lab dashboard at lab.bitbanshee.com. Gated behind Cloudflare Zero Trust.

Dashboard - Terminal

Full Claude Code terminal with live context assembly, session routing, and chat input.

Training Analysis

Live training metrics rendered inline. Eval tables, trend analysis, and convergence commentary.

Panels

Tasks, Notes, Telegram, Byte (peer review), and Storage. All on the lower dashboard.

Training Runs

Active runs with live metrics, plus draft baselines queued for post-v4.

Experiments Board

Project tracker with ICE scoring across all active repos.

Access Gate

Cloudflare Zero Trust. Email-based one-time codes, no passwords.

Stack

Layer	Technology
Compute	EC2 t3.small, us-east-1a
GPU Fleet	Spot instances (g5/g6, $0.75/hr max price)
Control Plane	FastAPI + Pulse daemon
Agent Runtime	Claude Code (Opus 4.6, 1M context)
Peer Review	GPT-5.4 (Byte) via OpenAI API
Knowledge	SQLite + FTS5 + text-embedding-3-small
Dashboard	Vanilla JS + xterm.js + glassmorphism CSS
Messaging	Telegram Bot API + Lambda webhook
Access	Cloudflare Access (JWT)
Storage	S3, with 30-minute cron backups
