Day-2 operations
Everything on this page assumes a running registry with real capsule data. Test each runbook against a scratch registry (CAPSOL_DATA_DIR=/tmp/capsol-drill capsol serve) before you need it in anger.
Upgrades
capsol upgrades are in-place: stop the process, install the new version, start it. Registry state lives entirely in `CAPSOL_DATA_DIR` (default `./registry-data`) and is migrated forward automatically at boot; migrations are additive and idempotent.
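To make "additive and idempotent" concrete, here is a minimal sketch of the property. It is not capsol's migration code: `migrate_record` and the `expires_at` field are hypothetical names chosen for illustration.

```python
def migrate_record(record: dict, defaults: dict) -> dict:
    """Add any missing fields; never overwrite or remove existing ones."""
    migrated = dict(record)  # work on a copy so the input is left untouched
    for field, default in defaults.items():
        # setdefault only writes when the field is absent, which makes the
        # migration additive and safe to run again on every boot.
        migrated.setdefault(field, default)
    return migrated


# Hypothetical pre-upgrade record and a hypothetical new field.
old = {"id": "c1", "token_hash": "abc"}
once = migrate_record(old, {"expires_at": None})
twice = migrate_record(once, {"expires_at": None})
```

Running the function a second time returns an identical record, and a record that already carries a value for the new field keeps it.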
0.15 → 0.16 specifically
1. Take a backup first (next section). 0.16 adds fields; it does not rewrite existing records, but a backup makes the rollback trivial.
2. Install and restart: `npm install -g capsol@0.16`, then restart `capsol serve` (or redeploy the container).
3. What changes for clients:
   - Nothing breaks. `/mcp/:capsuleId` URLs, existing enrollment tokens, and existing share credentials keep working. Credentials issued before 0.16 keep their recorded expiry (none), so no agent is locked out by the upgrade.
   - New OAuth access tokens now expire after 30 days and come with a refresh token. Long-lived clients should use the `refresh_token` grant; see API reference.
   - Dashboard sessions need a re-login once (cookies are now `SameSite=Strict` and carry a CSRF token).
4. Verify: `curl -s localhost:4000/health` shows the new version; `curl -s localhost:4000/ready` reports `status: "ready"`; an agent can still read its capsule.
5. Rollback: stop the process, reinstall the previous version, and restore the backup taken in step 1 if anything migrated unexpectedly.
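The verify step can be scripted. The following is a minimal sketch, assuming `/health` returns a JSON object with a `version` field and `/ready` returns a JSON object with a `status` field. The page only says that `/health` shows the version and `/ready` reports `status: "ready"`, so the exact response shapes and the helper names are assumptions.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def upgrade_problems(health: dict, ready: dict, expected_version: str) -> list:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    # Assumption: /health exposes the running version under "version".
    if health.get("version") != expected_version:
        problems.append(
            f"version is {health.get('version')!r}, expected {expected_version!r}"
        )
    # From the page: /ready reports status "ready" once the registry is up.
    if ready.get("status") != "ready":
        problems.append(f"readiness is {ready.get('status')!r}, expected 'ready'")
    return problems
```

Against a live registry, call `upgrade_problems(fetch_json("http://localhost:4000/health"), fetch_json("http://localhost:4000/ready"), expected)` with whatever version string the new release reports. The third check, that an agent can still read its capsule, needs a real agent credential and is left out of the sketch.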
Backup and restore
All registry state is files. There is no database.
| What | Where | Contains |
|---|---|---|
| Registry index | CAPSOL_DATA_DIR/*.json | capsules, shares, principals, credential hashes, grants, settings, policy |
| Capsule content | CAPSOL_DATA_DIR/boxes/<id>/.capsol/ | knowledge entries, blobs, audit logs, memory |
| Admin key | ~/.capsol/admin.key | generated bootstrap key (skip if you set CAPSOL_ADMIN_KEY yourself) |
| Secret key | wherever CAPSOL_SECRET_KEY_FILE points | decrypts OIDC/SMTP settings |
Backup
```bash
# Hot backup is safe: writes are atomic temp+rename per file.
tar -czf capsol-backup-$(date +%Y%m%d-%H%M%S).tgz \
  -C "$(dirname "$CAPSOL_DATA_DIR")" "$(basename "$CAPSOL_DATA_DIR")"
```

Store the tarball, the admin key, and the secret key in separate places: the tarball contains only credential hashes, but it does contain all capsule content.
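If the backup runs from a scheduler that is easier to drive from Python than from shell, the same archive can be produced with the standard library. This is a sketch under the same layout as the `tar` command above; `backup_registry` is a hypothetical helper name, not part of capsol.

```python
import tarfile
import time
from pathlib import Path


def backup_registry(data_dir: str, dest_dir: str) -> Path:
    """Write a timestamped .tgz of data_dir into dest_dir and return its path."""
    src = Path(data_dir).resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(dest_dir) / f"capsol-backup-{stamp}.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the parent directory, like
        # `tar -C "$(dirname ...)" "$(basename ...)"` does.
        tar.add(src, arcname=src.name)
    return archive
```

Keep `dest_dir` outside `data_dir` so the archive does not try to include itself. On a busy registry a temporary file may disappear between directory listing and read, which `tarfile` reports as `FileNotFoundError`; rerunning the backup is the simplest way to handle that.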
Restore drill
```bash
systemctl stop capsol   # or: fly scale count 0 / docker stop
mv "$CAPSOL_DATA_DIR" "$CAPSOL_DATA_DIR.broken-$(date +%s)"
tar -xzf capsol-backup-....tgz -C "$(dirname "$CAPSOL_DATA_DIR")"
systemctl start capsol
curl -s localhost:4000/ready   # expect "ready"
```

Run the drill quarterly. A backup you have never restored is a hypothesis.
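Between full drills, a cheaper sanity check is to unpack a backup into a scratch directory and confirm that every registry index file still parses as JSON. This sketch does not replace the drill: it only checks the top-level `*.json` files and says nothing about capsule content. `verify_backup` is a hypothetical helper name.

```python
import json
import tarfile
import tempfile
from pathlib import Path


def verify_backup(archive: str) -> dict:
    """Extract into a scratch dir; return {file name: error} for unreadable index files."""
    errors = {}
    with tempfile.TemporaryDirectory() as scratch:
        # Only run this on archives you created yourself.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        # The registry index is the set of *.json files at the top of the data dir,
        # which sits one level down inside the archive.
        for path in sorted(Path(scratch).glob("*/*.json")):
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (ValueError, OSError) as exc:
                errors[path.name] = str(exc)
    return errors
```

An empty result means every index file parsed; any entry names a file to investigate before you rely on that archive.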
Key rotation runbooks
Rotate the admin key
- Generate: `openssl rand -hex 24` (or any ≥16-byte secret), or delete `~/.capsol/admin.key` and let first-run regenerate one with a checksum.
- Set it: `CAPSOL_ADMIN_KEY=<new>` in the environment (preferred for hosted), or write it to `~/.capsol/admin.key`.
- Restart the registry. Old dashboard sessions die immediately: the cookie is `sha256(key)` and stops matching.
- Update anything that calls `/v1/*` with the old key (CI, scripts, `capsol grants --key`).
Agent connections are not affected: MCP credentials are independent of the admin key.
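Why rotation ends every dashboard session can be shown in a few lines. This is a sketch, assuming the cookie value is the hex-encoded SHA-256 of the UTF-8 key; the page states only that the cookie is `sha256(key)`, so the encoding and the function names are assumptions.

```python
import hashlib
import hmac
import secrets


def new_admin_key() -> str:
    """24 random bytes as 48 hex characters, the same shape as `openssl rand -hex 24`."""
    return secrets.token_hex(24)


def session_cookie(admin_key: str) -> str:
    # Assumption: the cookie is the hex SHA-256 digest of the UTF-8 encoded key.
    return hashlib.sha256(admin_key.encode("utf-8")).hexdigest()


def session_valid(cookie: str, current_key: str) -> bool:
    """Compare a presented cookie with the current key's digest in constant time."""
    return hmac.compare_digest(cookie, session_cookie(current_key))


old_key, new_key = new_admin_key(), new_admin_key()
cookie = session_cookie(old_key)  # issued before rotation
```

After the restart, `session_valid(cookie, new_key)` is false for every cookie derived from the old key, which is why no explicit session purge is needed.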
Rotate the secret key
The secret key encrypts the OIDC client secret and SMTP URL inside `settings.json`. Rotating it makes those two ciphertexts unreadable, so:
- Note the current OIDC client secret and SMTP URL (from your IdP / mail provider; they are not recoverable from capsol after rotation).
- Generate: `openssl rand -base64 48` → set `CAPSOL_SECRET_KEY` (or `CAPSOL_SECRET_KEY_FILE`).
- Restart.
- Re-enter the OIDC client secret and SMTP URL in dashboard → Settings.
Revoke a compromised agent in under 60 seconds
```bash
# 1. Find the connection (10s)
curl -s -H "Authorization: Bearer $ADMIN" localhost:4000/v1/connections | \
  python3 -c "import json,sys; [print(c['connection_id'], c['label'], c['role']) for c in json.load(sys.stdin)['connections']]"

# 2. Revoke it (5s). Revocation is immediate; the next MCP call gets 401 token_revoked
curl -s -X PATCH -H "Authorization: Bearer $ADMIN" -H "Content-Type: application/json" \
  -d '{"status":"revoked"}' localhost:4000/v1/shares/<connection_id>
```

Or in the dashboard: Connections → row → revoke. If the credential was OAuth-issued and you only have the token, `POST /oauth/revoke` with `token=<value>` revokes both the access token and the refresh token. Pausing (`{"status":"paused"}`) is the reversible variant.
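The same two calls can be wrapped for use from an incident script. This is a sketch using only the endpoints and JSON fields shown in the runbook above; the base URL, the helper names, and the idea of returning the HTTP status are assumptions, and the response body of the `PATCH` is not documented on this page.

```python
import json
from urllib.request import Request, urlopen

BASE = "http://localhost:4000"  # assumption: registry on the default local port


def find_connections(payload: dict, label: str) -> list:
    """Filter a decoded /v1/connections payload by exact label."""
    return [c for c in payload["connections"] if c["label"] == label]


def set_status_request(admin_key: str, connection_id: str, status: str) -> Request:
    """Build the PATCH that marks a connection as revoked or paused."""
    if status not in ("revoked", "paused"):
        raise ValueError(f"unsupported status: {status}")
    return Request(
        f"{BASE}/v1/shares/{connection_id}",
        data=json.dumps({"status": status}).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
    )


def send(req: Request, timeout: float = 5.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it keeps the dangerous step explicit: inspect `req.full_url` and `req.data` first, then call `send(req)`.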
Operator lifecycle
- Add: dashboard Operators → invite by email (or `POST /v1/operators/invites`). The link is single-use, expires in 7 days, and binds the invited role on OIDC sign-in. Direct creation (`POST /v1/operators`) is admin-only.
- Change role: `PATCH /v1/operators/:id {"role": "approver"}`. The new role applies on the operator’s next login.
- Offboard: `PATCH /v1/operators/:id {"status": "disabled"}`. Existing sessions stop working immediately and new logins are refused. Their owned capsules stay; an admin can reassign ownership via `PATCH /v1/capsules/:id`.
- No passwords: operator access is OIDC or break-glass only. If your IdP is down, the admin key still works (and is logged as `break-glass`).
- Pre-0.19 capsules are unowned; an admin claims them with `PATCH /v1/capsules/:id {"claim_ownership": true}`.
Incident playbook
“The registry won’t start”
- Read stderr. Boot failures are structured JSON with a `remediation` field.
- `CAPSOL_SECRET_KEY required in production` → set a persistent key: `openssl rand -base64 48`. The process exits non-zero by design rather than booting with an ephemeral key.
- `Cannot create ~/.capsol` / `Cannot write` → the message includes the exact `mkdir`/`chmod` fix.
- `EADDRINUSE` → with `PORT` set, the port is honored exactly and busy means exit; without it, capsol tries 4000–4010 automatically.
- Corrupt JSON in `CAPSOL_DATA_DIR` (e.g. a disk-full partial write; rare, because writes are atomic) → restore the affected file from backup; each `*.json` file is independent.
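When stderr is captured by a supervisor, the `remediation` hints can be pulled out mechanically. This is a sketch, assuming each boot failure is emitted as one JSON object per line; only the `remediation` field name comes from this page, and any other field is treated as opaque.

```python
import json


def remediations(stderr_text: str) -> list:
    """Collect the `remediation` field from every JSON line in captured stderr."""
    found = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain-text lines such as stack traces
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if isinstance(record, dict) and "remediation" in record:
            found.append(str(record["remediation"]))
    return found
```

With systemd, for example, the input could be the output of `journalctl -u capsol`, provided the unit is named `capsol` as in the restore drill.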
“An agent reports 401”
Ask the agent for the `error_code`; every 401 carries one (see the table):
| Code | Meaning | Operator action |
|---|---|---|
| `token_expired` | 30-day OAuth TTL elapsed | None; the client should use its refresh token |
| `token_revoked` | Credential revoked/rotated | Re-approve access if unintended |
| `grant_revoked` | You (or policy) revoked the grant | Expected; re-approve if needed |
| `connection_paused` | Connection paused | Reactivate from Connections |
| `invalid_token` | Unknown credential | Client misconfigured; re-enroll or re-run OAuth |
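For a support bot or on-call script, the table above translates directly into a lookup. This is a sketch, assuming the 401 body is a JSON object with a top-level `error_code` field; the page says every 401 carries a code but does not show the response shape, and `triage` is a hypothetical helper name.

```python
import json

# Operator actions, copied from the table above.
OPERATOR_ACTIONS = {
    "token_expired": "None; the client should use its refresh token",
    "token_revoked": "Re-approve access if unintended",
    "grant_revoked": "Expected; re-approve if needed",
    "connection_paused": "Reactivate from Connections",
    "invalid_token": "Client misconfigured; re-enroll or re-run OAuth",
}


def triage(body: str) -> str:
    """Map a 401 response body to the operator action from the table."""
    try:
        # Assumption: the body is a JSON object with a top-level "error_code".
        code = json.loads(body).get("error_code")
    except (json.JSONDecodeError, AttributeError):
        code = None
    fallback = f"error_code {code!r} is not in the table; ask for the full response"
    return OPERATOR_ACTIONS.get(code, fallback)
```

Unknown or missing codes fall through to a message that asks for more detail instead of guessing an action.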
“I think a token leaked”
- Revoke it (runbook above); this takes under 60 seconds.
- Check the capsule audit log (`/v1/capsules/:id/logs` or dashboard Activity) for what that connection read or wrote.
- Rotate the admin key if the leak vector could have included it.
- Audit logs are plain JSONL without tamper-evidence; treat them as a best-effort record, not forensic proof (see Security).
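If you prefer to read the audit files on disk, a short filter narrows them to one connection. This is a sketch: the page says the logs are JSONL, but the record schema is not shown here, so the `connection_id` key is an assumption to check against a real log line first.

```python
import json
from pathlib import Path
from typing import Iterator


def entries_for_connection(logs_dir: str, connection_id: str) -> Iterator[dict]:
    """Yield audit records for one connection, visiting files in name order."""
    for path in sorted(Path(logs_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate a partially written last line
                # Assumption: records carry the connection under "connection_id".
                if isinstance(record, dict) and record.get("connection_id") == connection_id:
                    yield record
```

Point `logs_dir` at the capsule's log directory and iterate; because the function is a generator, large log sets are streamed instead of loaded at once.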
Retention and log growth
- Per-capsule audit logs are daily JSONL files under `boxes/<id>/.capsol/logs/`. Nothing rotates them automatically; prune by age with a cron job (`find ... -name '*.jsonl' -mtime +90 -delete`) after archiving if you need history.
- Signals are stored as capsule entries under `notes://signals/` with a TTL (default 1 day) and are garbage-collected opportunistically on signal writes.
- Anonymous capsules expire after 7 days and are cleaned hourly.
- The registry `*.json` index files do not grow unboundedly with traffic; they grow only with the number of capsules, connections, grants, and enrollments.
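The age-based pruning of audit logs can also be done from Python, with a dry run before anything is deleted. This is a sketch that follows the `boxes/<id>/.capsol/logs/` layout described above; `prune_logs` is a hypothetical helper name and archiving is left to you.

```python
import time
from pathlib import Path


def prune_logs(data_dir: str, max_age_days: int = 90, dry_run: bool = True) -> list:
    """Return (and, unless dry_run, delete) audit log files older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for path in Path(data_dir).glob("boxes/*/.capsol/logs/*.jsonl"):
        # Compare the modification time against the cutoff, like find's -mtime.
        if path.stat().st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return sorted(stale)
```

Run it once with the default `dry_run=True` and review the returned paths, then again with `dry_run=False`. The cutoff differs slightly from `find -mtime +90`, which rounds file age down to whole days and therefore matches only files at least 91 days old.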