Who This Is For
You run an AI agent 24/7. It manages infrastructure, monitors training, handles messages while you're away. But who watches the agent? Who decides what it should work on, what context it needs, and when to shut it down? That's what Bit does.
This is a lab-grade control plane. Single node, single operator. Not a platform, not a product. A working proof of concept for what agent supervision actually looks like when you build it instead of talking about it.
What Bit Is
FastAPI control plane + Pulse autonomous daemon + dashboard + tmux-routed Claude Code sessions. Watches the lab, decides when to act, assembles context, dispatches work, tracks cases, enforces cost limits, and gives me full audit and kill-switch access.
Bit is the persistent half of a two-person team. Bitbanshee (human) comes and goes. Bit remembers and operates. It runs on a single EC2 instance (t3.small, us-east-1) behind Cloudflare Access.
Four Pillars
Attention
When to wake. 10 watchers poll for events. Deterministic triage with fingerprint dedup, cooldowns, batch windows, and adaptive suppression.
Context
What to supply. Profile-based context assembly pulls from tasks, memories, training status, sessions, elevation state, and Telegram. Scoped per entry point.
Execution
Where to run safely. Managed tmux sessions with context-aware dispatch, sibling conflict detection, artifact-level file locking, and step plans.
Governance
How humans oversee. Dashboard observability, Telegram control, elevation-gated privileges, cost gates, Byte peer review, pause/resume, and audit trails.
Architecture
Bit runs as a FastAPI control plane with an embedded Pulse daemon. The dashboard at lab.bitbanshee.com is the operator surface. Claude Code sessions run in tmux windows managed by Pulse.
Pulse
Pulse is the daemon that makes Bit autonomous. It monitors the lab continuously, decides when Bit should wake up, assembles the right context, and dispatches work to managed Claude Code sessions.
Signal Sources
| Watcher | Source | What it detects |
|---|---|---|
| Task Signal | Signal file | New tasks from Bitbanshee |
| Elevation | Signal file | Approval/denial state changes |
| Telegram | API poll | Questions from Bitbanshee |
| Health | API poll | Control plane failures |
| Cost Drift | API poll | Daily spend exceeding threshold |
| Backup | Log file | Backup failures |
| Notes | File watch | New notes from Bitbanshee |
| Byte | File watch | Peer review responses from ChatGPT |
| Training | S3 + API (10 min) | Completion, stalls, loss anomalies, sitrep events |
| Fleet | EC2 API (2 min) | Spot reclaims, capacity errors, instance launches, training recovery |
Attention Pipeline
Events get fingerprinted, deduped within a 5-minute window, and matched against deterministic rules. Cooldowns stretch automatically when repeated dispatches produce no action (up to 10x). Telegram messages are exempt from suppression. Human messages always get attention.
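A minimal sketch of that triage loop, to make the moving parts concrete. It is not Pulse's actual code: the `Triage` class, the fingerprint fields, the 60-second base cooldown, and the linear stretch rule are all assumptions for illustration. Only the 5-minute dedup window, the 10x ceiling, and the Telegram exemption come from the description above.

```python
import hashlib
from dataclasses import dataclass, field

DEDUP_WINDOW_S = 5 * 60     # dedup window stated above
MAX_COOLDOWN_FACTOR = 10    # cooldowns stretch up to 10x
BASE_COOLDOWN_S = 60        # assumed base cooldown; not stated in this document


def fingerprint(source, kind, subject):
    """Stable hash of the fields that make two events 'the same' (assumed fields)."""
    raw = f"{source}|{kind}|{subject}".encode()
    return hashlib.sha256(raw).hexdigest()[:16]


@dataclass
class Triage:
    window_start: dict = field(default_factory=dict)   # fingerprint -> start of dedup window
    last_dispatch: dict = field(default_factory=dict)  # fingerprint -> last dispatch time
    idle_streak: dict = field(default_factory=dict)    # fingerprint -> dispatches with no action

    def cooldown(self, fp):
        # Each no-action dispatch adds one base interval, capped at 10x.
        factor = min(1 + self.idle_streak.get(fp, 0), MAX_COOLDOWN_FACTOR)
        return BASE_COOLDOWN_S * factor

    def should_dispatch(self, source, kind, subject, now):
        if source == "telegram":
            return True  # human messages are exempt from suppression
        fp = fingerprint(source, kind, subject)
        start = self.window_start.get(fp)
        if start is not None and now - start < DEDUP_WINDOW_S:
            return False  # duplicate inside the dedup window
        self.window_start[fp] = now
        sent = self.last_dispatch.get(fp)
        if sent is not None and now - sent < self.cooldown(fp):
            return False  # still cooling down
        self.last_dispatch[fp] = now
        return True

    def record_outcome(self, source, kind, subject, acted):
        # Reset the streak when a dispatch led to action, otherwise extend it.
        fp = fingerprint(source, kind, subject)
        self.idle_streak[fp] = 0 if acted else self.idle_streak.get(fp, 0) + 1
```

Timestamps are passed in as plain seconds rather than read from the clock, which keeps the rules deterministic and easy to replay against recorded events.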
Case Management
Related intents get grouped into durable cases with 3-step plans (Assess → Implement → Verify). Each case tracks cost (hard gate at $2 without success), files touched (for conflict detection), and session count. Work on the same case is serialized. Overlapping file edits across cases are detected and hard-blocked.
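The gates on a case can be sketched as a single pre-dispatch check. This is an illustration under assumptions, not the real schema: the `Case` fields and the `can_dispatch` helper are hypothetical names, and the check order is a guess. The $2 gate, the three steps, serialization, and file-overlap blocking are taken from the paragraph above.

```python
from dataclasses import dataclass, field

COST_GATE_USD = 2.00                        # hard gate: $2 spent without a success
STEPS = ("Assess", "Implement", "Verify")   # the 3-step plan


@dataclass
class Case:
    case_id: str
    step: int = 0                  # index into STEPS
    cost_usd: float = 0.0
    succeeded: bool = False
    session_active: bool = False
    files: set = field(default_factory=set)   # files touched so far

    @property
    def current_step(self):
        return STEPS[self.step]


def can_dispatch(case, wanted_files, other_cases):
    """Return (allowed, reason) for starting a new session on this case."""
    if case.session_active:
        return False, "serialized: case already has a running session"
    if not case.succeeded and case.cost_usd >= COST_GATE_USD:
        return False, "cost gate: $2 spent without success"
    for other in other_cases:
        if other.case_id != case.case_id and other.session_active:
            clash = other.files & set(wanted_files)
            if clash:
                return False, f"file conflict with {other.case_id}: {sorted(clash)}"
    return True, "ok"
```

Returning a reason string alongside the boolean gives the audit trail something to record when a dispatch is refused.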
Context Assembly
Every prompt that reaches Bit, whether from the dashboard, Telegram, or task buttons, flows through the Context Assembler. It wraps the prompt with situational awareness that Claude Code's harness doesn't have.
Three design constraints:
- Runtime-specific - context reflects live state at the moment of the prompt: which sessions are active, what step training is on, whether elevation is approved. Freshness-gated so repeated prompts within 60 seconds skip redundant assembly.
- Invisible to the harness - Claude Code doesn't know the assembler exists. Context gets prepended to the prompt before it reaches the agent. To the model, it just looks like a well-informed user. No special API, no integration. Just text in front of text.
- Immediately actionable - everything in the assembled context is there because the model needs to act on it now. No background reading, no "FYI" sections. If it's in the briefing, it matters for this prompt.
Context Profiles
| Profile | Entry Point | Sections Fetched |
|---|---|---|
| chat | Telegram free chat | Sessions, elevation, training |
| task_work | Task "Work" button | All + coordination + memories |
| task_review | Task "Review" button | Sessions, elevation, coordination |
| full | Dashboard Prompt UI | Everything |
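One way to picture the profile table is as a lookup from profile name to section list, with the assembled text prepended to the prompt. The sketch below is hypothetical: `PROFILES`, `assemble`, the section names, and the `[name]` line format are illustrative. The `task_work` entry is my reading of "All + coordination + memories" as the three state sections plus those two, which may not match the real mapping.

```python
STATE = ("sessions", "elevation", "training")
ALL_SECTIONS = STATE + ("coordination", "plans", "tasks", "memories", "messages")

PROFILES = {
    "chat": STATE,
    "task_work": STATE + ("coordination", "memories"),   # assumed interpretation
    "task_review": ("sessions", "elevation", "coordination"),
    "full": ALL_SECTIONS,
}


def assemble(profile, prompt, fetchers):
    """Prepend the sections for this profile to the prompt as plain text.

    `fetchers` maps a section name to a zero-argument callable that returns
    the current text for that section (empty string if there is nothing to say).
    """
    parts = []
    for name in PROFILES[profile]:
        fetch = fetchers.get(name)
        text = fetch() if fetch else ""
        if text:
            parts.append(f"[{name}] {text}")
    if not parts:
        return prompt   # nothing to add: pass the prompt through untouched
    return "\n".join(parts) + "\n\n" + prompt
```

Because the result is just a longer string, the harness needs no integration: the agent sees text in front of text.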
What Gets Assembled (~80-330 tokens)
- State - active sessions with context %, elevation status, training run progress
- Coordination - what sibling sessions are doing, what they just finished, files in use
- Active Plans - case step plans showing progress
- Open Tasks - relevance-sorted by prompt keywords
- Relevant Knowledge - semantic search against the knowledge base
- Recent Messages - Telegram messages from last 30 minutes
Intent-Based Decision
Not every message needs a full briefing. The assembler classifies each prompt before deciding what to do:
- Short responses (yes, no, ok, continue) go straight through. No assembly, no overhead.
- Substantive tasks (fix, review, deploy, investigate) get full context assembly when the session is idle.
- Repeated prompts within 60 seconds skip assembly. Context is still fresh.
- Prompts that arrive while the session is busy queue in tmux. No context is injected mid-work.
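The four cases above reduce to a small decision function. This is a sketch under assumptions: the function name, the return labels, and the order in which the checks run are mine. The short-reply words and the 60-second freshness window come from the list.

```python
SHORT_REPLIES = {"yes", "no", "ok", "continue"}
FRESH_WINDOW_S = 60


def decide(prompt, session_busy, last_assembled_at, now):
    """Return one of 'queue', 'passthrough', 'skip_fresh', 'assemble'.

    `last_assembled_at` is the time of the previous assembly in seconds,
    or None if no context has been assembled yet.
    """
    if session_busy:
        return "queue"          # message waits in tmux, no context injected
    if prompt.strip().lower().rstrip(".!") in SHORT_REPLIES:
        return "passthrough"    # short confirmation, no assembly
    if last_assembled_at is not None and now - last_assembled_at < FRESH_WINDOW_S:
        return "skip_fresh"     # context from the last prompt is still fresh
    return "assemble"
```

Checking `session_busy` first means even a bare "yes" waits its turn when the agent is mid-task, which seems the safer default.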
Assembly Flow
Knowledge Base
Bit's long-term memory. ~1,600 active memories (~3,300 total) extracted from Claude session transcripts. SQLite with FTS5 full-text indexing and OpenAI vector embeddings at 100% coverage.
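For the full-text half of that setup, a minimal sketch using Python's built-in `sqlite3` module. The table name, columns, and sample rows are invented for illustration, and the embedding side is left out entirely. It assumes the local SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one FTS5 table holding the memory text and its origin.
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(body, source)")
conn.executemany(
    "INSERT INTO memories (body, source) VALUES (?, ?)",
    [
        ("Spot reclaim on g5 interrupted training; resumed from checkpoint", "session-a"),
        ("Backup cron failed because the S3 bucket policy changed", "session-b"),
        ("Elevation approvals expire automatically", "session-c"),
    ],
)


def search(conn, query, limit=5):
    """Full-text search, best match first (FTS5 orders by its built-in rank)."""
    rows = conn.execute(
        "SELECT body FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


print(search(conn, "spot reclaim"))
```

A query with several words matches only rows containing all of them, so keyword search stays precise while the vector embeddings handle fuzzier recall.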
How memories flow
Governance
Bit operates with governed autonomy. It can act independently within defined boundaries, but critical operations require human approval, peer review, or elevation.
Control surfaces
- Dashboard - real-time observability: terminal, Pulse events/intents/cases, task board, training metrics, cost tracking
- Telegram - mobile control: status queries, elevation approval/denial, free chat to Bit
- Elevation - privilege escalation via Telegram approval with time-limited credentials and automatic revocation
- Byte - mandatory peer review by ChatGPT before any plan executes (enforced via Claude Code hook)
- Cost gates - per-case spending limit ($2 without success), per-provider budgets, projected EOM alerts
- Pause/Resume - break-glass control that halts all autonomous dispatch while preserving event ingestion
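The pause control is worth a sketch because of what it does not stop. In this hypothetical `Dispatcher`, pausing halts dispatch while ingestion keeps running. Whether the real system replays the backlog on resume is not stated here, so the replay behavior below is an assumption.

```python
from collections import deque


class Dispatcher:
    def __init__(self):
        self.paused = False
        self.events = deque()    # ingested events, kept even while paused
        self.dispatched = []     # events handed to a session, in order

    def ingest(self, event):
        # Ingestion never checks the pause flag.
        self.events.append(event)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def tick(self):
        """Dispatch pending events unless paused. Returns the number dispatched."""
        if self.paused:
            return 0
        count = 0
        while self.events:
            self.dispatched.append(self.events.popleft())
            count += 1
        return count
```

Keeping ingestion alive means nothing is lost during a pause: the operator can inspect what piled up before deciding to resume.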
Claude Code Hooks
| Hook | Event | What it does |
|---|---|---|
| Session Start | SessionStart | Injects lab state (training, elevation, tasks) into every new session |
| Session Stop | Stop | Auto-reports outcomes for Pulse session coordination |
| Byte Review | PreToolUse | Blocks plan execution until peer review is complete |
| Risk Alert | PreToolUse | Flags destructive bash operations to Byte + Telegram |
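A rough sketch of the matching logic behind a hook like Risk Alert. It assumes the hook receives the pending tool call as a JSON object with `tool_name` and `tool_input` fields, and that Bash calls carry the command under `tool_input["command"]`. The pattern list and function names are hypothetical, not the rules Bit actually uses, and the forwarding to Byte and Telegram is omitted.

```python
import json
import re

# Hypothetical patterns: (label, regex) pairs for commands worth flagging.
PATTERNS = [
    ("recursive or forced delete", r"\brm\s+-\w*[rf]"),
    ("force push", r"\bgit\s+push\b.*--force"),
    ("infrastructure teardown", r"\bterraform\s+destroy\b"),
    ("instance termination", r"\baws\s+ec2\s+terminate-instances\b"),
]


def risk_reasons(command):
    """Return the labels of every pattern the command matches."""
    return [label for label, pattern in PATTERNS if re.search(pattern, command)]


def handle(payload):
    """Inspect one hook payload. Non-Bash tool calls are never flagged."""
    if payload.get("tool_name") != "Bash":
        return []
    command = payload.get("tool_input", {}).get("command", "")
    return risk_reasons(command)


def handle_json(text):
    """Convenience wrapper for a payload that arrives as JSON text."""
    return handle(json.loads(text))
```

A hook script would pass its standard input to `handle_json` and forward any returned labels to the alert channels. Pattern matching like this catches obvious cases only, which is why it flags for review rather than acting as the sole safeguard.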
Where This Sits
Bit is not an assistant shell. It's not a memory substrate. It's not "run Claude in a loop."
The hard problem is attention allocation, not raw autonomy. "Should the model run because a timer fired" is the wrong question. "Did something change that warrants waking the model, and if so, what does it need to know?" That's what Bit answers.
Current stage: Single EC2 instance, one operator, one agent. Built for a real research lab, not hypothetical scale.
What I Learned Building This
Claude Code by itself hit a wall. Same model, same intelligence. It could do any individual task I threw at it, but it couldn't manage its own sessions, couldn't detect when a spot instance got reclaimed, couldn't decide when to wake up or what context it actually needed. The model was capable. The model was not operationally useful over long time horizons.
Pulse changed that. Not by making the model smarter. By giving it structure. Watchers that know what to look for. Triage rules that decide priority. Case management that prevents duplicate work. Cost gates that stop runaway spending. A fleet watcher that catches infrastructure failures before I find out about them.
None of that is model intelligence. It's developed capability. Infrastructure built around the model to make it useful at timescales longer than a single prompt.
Long-horizon autonomy depends more on the supervision architecture than on making the model smarter. The model didn't change between the morning (when sessions were stepping on each other, wasting tokens, duplicating work) and the evening (when the system was clean and stable). What changed was the surrounding infrastructure. Fingerprint dedup. Case serialization. Intent-based context assembly. Fleet state persistence. Task approval gates.
The model is the engine. The control plane is the driver.
The dual-process training research is the same idea applied to the model layer. System 1 and System 2 cognition, confidence routing, knowing when to think fast versus slow. Bit and Pulse apply that concept to the operations layer. Same architecture, two levels. That's not an accident.
Screenshots
The lab dashboard at lab.bitbanshee.com. Gated behind Cloudflare Zero Trust.
Dashboard - Terminal
Full Claude Code terminal with live context assembly, session routing, and chat input.
Training Analysis
Live training metrics rendered inline. Eval tables, trend analysis, and convergence commentary.
Panels
Tasks, Notes, Telegram, Byte (peer review), and Storage. All on the lower dashboard.
Training Runs
Active runs with live metrics, plus draft baselines queued for post-v4.
Experiments Board
Project tracker with ICE scoring across all active repos.
Access Gate
Cloudflare Zero Trust. Email-based one-time codes, no passwords.
Stack
| Layer | Technology |
|---|---|
| Compute | EC2 t3.small, us-east-1a |
| GPU Fleet | Spot instances (g5/g6, $0.75 cap) |
| Control Plane | FastAPI + Pulse daemon |
| Agent Runtime | Claude Code (Opus 4.6, 1M context) |
| Peer Review | GPT-5.4 (Byte) via OpenAI API |
| Knowledge | SQLite + FTS5 + text-embedding-3-small |
| Dashboard | Vanilla JS + xterm.js + glassmorphism CSS |
| Messaging | Telegram Bot API + Lambda webhook |
| Access | Cloudflare Access (JWT) |
| Storage | S3, with 30-minute cron backups |