Guard Rails: Capability-Based Permission System
Overview
Mosaic Stack implements two complementary safety systems:
| System | Purpose | Scope |
|---|---|---|
| Quality Rails | Ensure output quality | Code reviews, linting, tests, token budgets |
| Guard Rails | Control agent capabilities | What agents CAN and CANNOT do |
This document describes the Guard Rails system—a capability-based permission model that limits what agents, integrations, and plugins can do within the platform.
Core Principle
Prepare freely, execute with approval.
Agents should be able to read, analyze, organize, and draft—but destructive, irreversible, or sensitive actions require explicit human approval.
Permission Model
Capability Structure
Capabilities follow a resource:action pattern:
<resource>:<action>
Examples:
email:read
email:draft
email:send
calendar:read
calendar:create_draft
calendar:send_invite
git:commit
git:push
git:force_push
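A minimal sketch of how such strings could be parsed and validated. The `Capability` type and both helper names are hypothetical and do not come from Mosaic Stack itself.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface Capability {
  resource: string;
  action: string;
}

// Splits "resource:action" into its parts and rejects malformed input.
function parseCapability(value: string): Capability {
  const [resource, action, ...rest] = value.split(":");
  if (!resource || !action || rest.length > 0) {
    throw new Error(`Invalid capability "${value}", expected <resource>:<action>`);
  }
  return { resource, action };
}

// Inverse of parseCapability.
function formatCapability(cap: Capability): string {
  return `${cap.resource}:${cap.action}`;
}

console.log(parseCapability("git:force_push"));
```

Rejecting strings with a missing or extra segment keeps malformed capabilities from ever reaching a permission check.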
Permission Levels
| Level | Description | Example Actions |
|---|---|---|
| read | View/query data | Read emails, view calendar, list files |
| organize | Non-destructive mutations | Label, sort, archive, tag |
| draft | Create pending items | Compose email drafts, stage commits |
| execute | Perform actions | Send email, push code, transfer funds |
| admin | Destructive/irreversible | Delete, force push, revoke access |
Default Stance
By default, agents receive:
- ✅ All read permissions for their domain
- ✅ All organize permissions for their domain
- ✅ All draft permissions for their domain
- ❌ No execute permissions (must be explicitly granted)
- ❌ No admin permissions (must be explicitly granted with additional confirmation)
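The default stance can be sketched as a function from permission level to a default decision. The type and function names are assumptions made for illustration; only the five levels and their defaults come from the text above.

```typescript
type PermissionLevel = "read" | "organize" | "draft" | "execute" | "admin";

// Hypothetical result labels for the default stance.
type DefaultDecision =
  | "granted"
  | "needs_explicit_grant"
  | "needs_grant_and_confirmation";

function defaultDecision(level: PermissionLevel): DefaultDecision {
  switch (level) {
    case "read":
    case "organize":
    case "draft":
      return "granted"; // Prepare freely.
    case "execute":
      return "needs_explicit_grant";
    case "admin":
      return "needs_grant_and_confirmation";
  }
}
```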
Example: Email Integration
integration: email
agent: jarvis
capabilities:
  granted:
    - email:read       # Read inbox, threads, attachments
    - email:search     # Search across mailbox
    - email:organize   # Label, archive, mark read/unread
    - email:draft      # Compose and save drafts
  denied:
    - email:send       # Cannot send emails
    - email:delete     # Cannot permanently delete
  requires_approval:
    - email:send       # Human must click "Send"
    - email:delete     # Human must confirm deletion
Workflow Example
User: "Reply to John's email about the meeting"
Agent Actions:
1. email:read → Reads John's email (allowed)
2. email:search → Finds related context (allowed)
3. email:draft → Composes reply draft (allowed)
4. email:send → BLOCKED
Agent Response:
"I've drafted a reply to John. Review it in your drafts
and click Send when ready."
[Link to draft in email client]
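The workflow above can be sketched as a loop that stops at the first step the agent may not perform on its own. The `Map` representation and `runWorkflow` are hypothetical. Because the email example lists `email:send` under both `denied` and `requires_approval`, this sketch assumes `requires_approval` is the effective status.

```typescript
type Status = "granted" | "denied" | "requires_approval";

// Statuses follow the email example; the Map representation is an assumption.
const emailPolicy = new Map<string, Status>([
  ["email:read", "granted"],
  ["email:search", "granted"],
  ["email:organize", "granted"],
  ["email:draft", "granted"],
  ["email:send", "requires_approval"],
  ["email:delete", "requires_approval"],
]);

interface WorkflowResult {
  completed: string[];
  blockedAt?: string;
}

function runWorkflow(steps: string[], policy: Map<string, Status>): WorkflowResult {
  const completed: string[] = [];
  for (const step of steps) {
    // Unknown capabilities are treated as denied (assumption: deny by default).
    const status = policy.get(step) ?? "denied";
    if (status !== "granted") {
      return { completed, blockedAt: step };
    }
    completed.push(step);
  }
  return { completed };
}

const result = runWorkflow(
  ["email:read", "email:search", "email:draft", "email:send"],
  emailPolicy
);
console.log(result.completed, result.blockedAt);
```

Here the first three steps complete and the run stops at `email:send`, which is the point where the agent hands the draft back to the user.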
Example: Git Integration
integration: git
agent: code-assistant
capabilities:
  granted:
    - git:read            # View repos, commits, diffs
    - git:branch          # Create/switch branches
    - git:commit          # Create commits (local)
    - git:push_feature    # Push to feature branches
  denied:
    - git:push_main       # Cannot push to main/master
    - git:force_push      # Never force push
    - git:delete_branch   # Cannot delete branches
  requires_approval:
    - git:push_main       # Requires PR approval
    - git:merge           # Requires code review
Example: Calendar Integration
integration: calendar
agent: jarvis
capabilities:
  granted:
    - calendar:read          # View events, availability
    - calendar:analyze       # Find conflicts, suggest times
    - calendar:draft         # Create draft events
  denied:
    - calendar:send_invite   # Cannot send invitations
    - calendar:delete        # Cannot delete events
    - calendar:modify        # Cannot modify existing events
  requires_approval:
    - calendar:send_invite   # Human confirms before sending
    - calendar:accept        # Human confirms RSVP
Example: Financial Integration
integration: finance
agent: finance-assistant
capabilities:
  granted:
    - finance:read         # View transactions, balances
    - finance:categorize   # Categorize transactions
    - finance:report       # Generate reports
    - finance:draft        # Prepare transfer requests
  denied:
    - finance:transfer     # Cannot move money
    - finance:pay          # Cannot make payments
    - finance:modify       # Cannot edit transactions
  requires_approval:
    - finance:transfer     # Multi-factor approval required
    - finance:pay          # Human must authorize
Example: Home Automation
integration: home
agent: jarvis
capabilities:
  granted:
    - home:read      # View device states
    - home:climate   # Adjust thermostat
    - home:lights    # Control lighting
    - home:media     # Control entertainment
  denied:
    - home:unlock    # Cannot unlock doors
    - home:disarm    # Cannot disarm security
    - home:garage    # Cannot open garage
  requires_approval:
    - home:unlock    # Requires biometric + PIN
    - home:disarm    # Requires security code
Implementation Architecture
Capability Check Flow
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Agent     │────▶│  Guard Rail  │────▶│  Resource   │
│  Request    │     │   Gateway    │     │  Service    │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    ▼             ▼
               ┌─────────┐  ┌──────────┐
               │ Allowed │  │  Denied  │
               └─────────┘  └──────────┘
                    │             │
                    ▼             ▼
               ┌─────────┐  ┌──────────┐
               │ Execute │  │  Queue   │
               │ Action  │  │ Approval │
               └─────────┘  └──────────┘
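The flow in the diagram could look roughly like the following sketch. `createGateway`, the `decide` callback, and the in-memory queue are all hypothetical stand-ins; a real gateway would presumably consult the capability tables and persist approval requests.

```typescript
type Decision = "allowed" | "denied";

interface QueuedApproval {
  id: string;
  agentId: string;
  capability: string;
}

interface GatewayOutcome {
  executed: boolean;
  approvalId?: string;
}

// Hypothetical gateway: `decide` stands in for the real capability lookup.
function createGateway(decide: (agentId: string, capability: string) => Decision) {
  const approvalQueue: QueuedApproval[] = [];

  function handle(agentId: string, capability: string, action: () => void): GatewayOutcome {
    if (decide(agentId, capability) === "allowed") {
      action(); // Allowed branch: forward the request to the resource service.
      return { executed: true };
    }
    // Denied branch: the action is not run; an approval request is queued instead.
    const id = `approval-${approvalQueue.length + 1}`;
    approvalQueue.push({ id, agentId, capability });
    return { executed: false, approvalId: id };
  }

  return { handle, approvalQueue };
}
```

Because every request passes through `handle`, the resource service is never reached unless the check returned "allowed".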
Database Schema
-- Capability definitions
CREATE TABLE capabilities (
    id UUID PRIMARY KEY,
    resource VARCHAR(100) NOT NULL,
    action VARCHAR(100) NOT NULL,
    level VARCHAR(20) NOT NULL,      -- read, organize, draft, execute, admin
    description TEXT,
    risk_level VARCHAR(20),          -- low, medium, high, critical
    UNIQUE(resource, action)
);

-- Agent capability grants
CREATE TABLE agent_capabilities (
    id UUID PRIMARY KEY,
    agent_id UUID REFERENCES agents(id),
    capability_id UUID REFERENCES capabilities(id),
    status VARCHAR(20) NOT NULL,     -- granted, denied, requires_approval
    granted_by UUID REFERENCES users(id),
    granted_at TIMESTAMP,
    expires_at TIMESTAMP,
    conditions JSONB,                -- Additional constraints
    UNIQUE(agent_id, capability_id)
);

-- Approval queue for requires_approval capabilities
CREATE TABLE capability_approvals (
    id UUID PRIMARY KEY,
    agent_id UUID REFERENCES agents(id),
    capability_id UUID REFERENCES capabilities(id),
    request_context JSONB,           -- What the agent wants to do
    status VARCHAR(20),              -- pending, approved, denied, expired
    requested_at TIMESTAMP,
    decided_at TIMESTAMP,
    decided_by UUID REFERENCES users(id),
    decision_reason TEXT
);

-- Audit log for all capability checks
CREATE TABLE capability_audit_log (
    id UUID PRIMARY KEY,
    agent_id UUID REFERENCES agents(id),
    capability_id UUID REFERENCES capabilities(id),
    result VARCHAR(20),              -- allowed, denied, queued
    context JSONB,
    timestamp TIMESTAMP DEFAULT NOW()
);
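As an illustration of how a row from `agent_capabilities` might be interpreted, the sketch below resolves the effective status for one agent and capability. The camelCase field names, the in-memory array, and the rule that a missing or expired row counts as denied are assumptions, not documented behavior.

```typescript
type GrantStatus = "granted" | "denied" | "requires_approval";

// Mirrors a subset of the agent_capabilities columns (hypothetical shape).
interface AgentCapabilityRow {
  agentId: string;
  capabilityId: string;
  status: GrantStatus;
  expiresAt: Date | null; // null means the grant does not expire
}

function effectiveStatus(
  rows: AgentCapabilityRow[],
  agentId: string,
  capabilityId: string,
  now: Date
): GrantStatus {
  const row = rows.find((r) => r.agentId === agentId && r.capabilityId === capabilityId);
  if (!row) {
    return "denied"; // Assumption: no row means no permission.
  }
  if (row.expiresAt !== null && row.expiresAt.getTime() <= now.getTime()) {
    return "denied"; // Assumption: an expired grant no longer applies.
  }
  return row.status;
}
```

The `UNIQUE(agent_id, capability_id)` constraint in the schema is what makes a single `find` sufficient here.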
API Design
// Check if agent has capability
async function checkCapability(
  agentId: string,
  resource: string,
  action: string,
  context?: Record<string, unknown>
): Promise<CapabilityResult> {
  // Returns: { allowed: boolean, reason?: string, approvalId?: string }
}

// Request approval for blocked capability
async function requestApproval(
  agentId: string,
  resource: string,
  action: string,
  context: Record<string, unknown>
): Promise<ApprovalRequest> {
  // Creates approval request, notifies user
}

// Grant capability to agent
async function grantCapability(
  agentId: string,
  capabilityId: string,
  grantedBy: string,
  options?: {
    expiresAt?: Date;
    conditions?: Record<string, unknown>;
  }
): Promise<void>;
Configuration
Per-Integration Defaults
Each integration defines sensible defaults:
# integrations/email/defaults.yaml
integration: email
default_capabilities:
  granted:
    - email:read
    - email:search
    - email:organize
    - email:draft
  denied:
    - email:send
    - email:delete
  requires_approval:
    - email:send
Per-Agent Overrides
Users can customize per agent:
# agents/jarvis/capabilities.yaml
agent: jarvis
overrides:
  email:
    # Jarvis can send to known contacts
    email:send:
      status: granted
      conditions:
        recipient_in: known_contacts
    # But still needs approval for new recipients
    email:send_new:
      status: requires_approval
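One way to read the override above is "use the override when its condition holds, otherwise fall back to the integration default". The sketch below implements that reading. The `Map` structures, `resolveStatus`, and the sample contact list are hypothetical, and the fallback stands in for the `email:send_new` entry rather than modelling it as a separate capability.

```typescript
type Status = "granted" | "denied" | "requires_approval";

interface OverrideRule {
  status: Status;
  recipientIn?: string; // Name of a contact list, e.g. "known_contacts"
}

// Assumption: email:send defaults to requires_approval.
const integrationDefaults = new Map<string, Status>([
  ["email:read", "granted"],
  ["email:draft", "granted"],
  ["email:send", "requires_approval"],
]);

const agentOverrides = new Map<string, OverrideRule>([
  ["email:send", { status: "granted", recipientIn: "known_contacts" }],
]);

// Hypothetical sample data.
const contactLists = new Map<string, Set<string>>([
  ["known_contacts", new Set(["john.smith@example.com"])],
]);

function resolveStatus(capability: string, recipient?: string): Status {
  const rule = agentOverrides.get(capability);
  if (rule) {
    const list = rule.recipientIn ? contactLists.get(rule.recipientIn) : undefined;
    const conditionMet =
      rule.recipientIn === undefined ||
      (recipient !== undefined && list?.has(recipient) === true);
    if (conditionMet) {
      return rule.status;
    }
  }
  // No override applies: fall back to the integration default, else deny.
  return integrationDefaults.get(capability) ?? "denied";
}
```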
User Experience
Approval Notifications
When an agent hits a requires_approval capability:
- Agent informs user what it wants to do
- Draft/preview created for user review
- Notification sent via preferred channel (app, email, SMS)
- User approves/denies with optional feedback
- Agent proceeds or adjusts based on decision
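The approval steps above imply a small state machine over the `pending`, `approved`, `denied`, and `expired` statuses from the schema. The following is a sketch under that assumption; the field names mirror the `capability_approvals` columns, while the function names and the time-to-live parameter are hypothetical.

```typescript
type ApprovalStatus = "pending" | "approved" | "denied" | "expired";

interface ApprovalRequest {
  id: string;
  status: ApprovalStatus;
  requestedAt: Date;
  decidedAt?: Date;
  decidedBy?: string;
  decisionReason?: string;
}

// Records a user's decision; only pending requests can be decided.
function decideApproval(
  request: ApprovalRequest,
  decision: "approved" | "denied",
  userId: string,
  now: Date,
  reason?: string
): ApprovalRequest {
  if (request.status !== "pending") {
    throw new Error(`Approval ${request.id} is already ${request.status}`);
  }
  const decided: ApprovalRequest = {
    ...request,
    status: decision,
    decidedAt: now,
    decidedBy: userId,
  };
  if (reason !== undefined) {
    decided.decisionReason = reason; // Optional feedback for the agent.
  }
  return decided;
}

// Marks a pending request as expired once it is older than ttlMs.
function expireIfStale(request: ApprovalRequest, now: Date, ttlMs: number): ApprovalRequest {
  const age = now.getTime() - request.requestedAt.getTime();
  if (request.status === "pending" && age > ttlMs) {
    return { ...request, status: "expired" };
  }
  return request;
}
```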
Approval UI
┌────────────────────────────────────────────────┐
│  🤖 Jarvis needs your approval                 │
├────────────────────────────────────────────────┤
│                                                │
│  Action: Send email                            │
│  To: john.smith@example.com                    │
│  Subject: Re: Project Update                   │
│                                                │
│ ┌────────────────────────────────────────────┐ │
│ │ Hi John,                                   │ │
│ │                                            │ │
│ │ Thanks for the update. I've reviewed...    │ │
│ │ [Preview truncated - click to expand]      │ │
│ └────────────────────────────────────────────┘ │
│                                                │
│  [Deny]  [Edit Draft]  [✓ Approve & Send]      │
│                                                │
└────────────────────────────────────────────────┘
Security Considerations
Defense in Depth
Guard Rails are one layer of security:
- Authentication - Who is the agent?
- Authorization - What can the agent do? (Guard Rails)
- Rate Limiting - How often can they do it?
- Audit Logging - What did they do?
- Anomaly Detection - Is this behavior unusual?
Capability Escalation Prevention
- Agents cannot grant capabilities to themselves
- Agents cannot grant capabilities to other agents
- Capability grants require human authorization
- Critical capabilities require multi-factor confirmation
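These rules can be sketched as a single validation step run before any grant is written. The `Principal` shape, the `mfaVerified` flag, and `canGrant` are hypothetical names. The risk levels are the ones listed in the schema comment.

```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";

// Hypothetical description of whoever is attempting the grant.
interface Principal {
  id: string;
  kind: "human" | "agent";
  mfaVerified: boolean;
}

interface GrantCheck {
  ok: boolean;
  reason?: string;
}

function canGrant(grantor: Principal, riskLevel: RiskLevel): GrantCheck {
  // Covers both self-grants and agent-to-agent grants: agents never grant.
  if (grantor.kind !== "human") {
    return { ok: false, reason: "capability grants require human authorization" };
  }
  if (riskLevel === "critical" && !grantor.mfaVerified) {
    return { ok: false, reason: "critical capabilities require multi-factor confirmation" };
  }
  return { ok: true };
}
```

Checking the grantor's kind rather than comparing agent IDs means the same rule blocks an agent from granting to itself and to any other agent.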
Time-Limited Grants
For sensitive operations, capabilities can be time-limited:
capability_grant:
  agent: jarvis
  capability: email:send
  expires_in: 1h
  max_uses: 5
  reason: "Processing inbox backlog"
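A sketch of how `expires_in` and `max_uses` might be enforced together. The `TimedGrant` shape, the supported duration units (`s`, `m`, `h`), and the function names are assumptions made for illustration.

```typescript
interface TimedGrant {
  capability: string;
  expiresAt: Date;
  remainingUses: number;
}

// Parses durations such as "30s", "15m", or "1h" into milliseconds.
// The set of supported units is an assumption.
function parseDuration(text: string): number {
  const match = /^(\d+)([smh])$/.exec(text);
  if (!match) {
    throw new Error(`Unsupported duration: ${text}`);
  }
  const unitMs = match[2] === "s" ? 1_000 : match[2] === "m" ? 60_000 : 3_600_000;
  return Number(match[1]) * unitMs;
}

function createGrant(capability: string, expiresIn: string, maxUses: number, now: Date): TimedGrant {
  return {
    capability,
    expiresAt: new Date(now.getTime() + parseDuration(expiresIn)),
    remainingUses: maxUses,
  };
}

// Consumes one use; returns false once the grant has expired or run out.
function consumeGrant(grant: TimedGrant, now: Date): boolean {
  if (now.getTime() >= grant.expiresAt.getTime() || grant.remainingUses <= 0) {
    return false;
  }
  grant.remainingUses -= 1;
  return true;
}
```

With the values from the example, a grant created by `createGrant("email:send", "1h", 5, now)` would allow five sends within the hour and refuse anything after that.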
Future Enhancements
Contextual Permissions
Grant capabilities based on context:
email:send:
  granted_when:
    - recipient_in: known_contacts
    - thread_initiated_by: user
    - content_reviewed: true
  denied_when:
    - contains_sensitive_data: true
    - recipient_is_external: true
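Since this feature is only proposed, its evaluation semantics are not specified. The sketch below assumes that any matching `denied_when` condition denies, that all `granted_when` conditions must hold to grant, and that anything else falls back to requiring approval. It also simplifies each condition to a precomputed boolean flag. All names are hypothetical.

```typescript
type ContextFlags = Record<string, boolean>;

interface ContextualRule {
  grantedWhen: string[]; // Assumption: every listed flag must be true.
  deniedWhen: string[];  // Assumption: any true flag denies.
}

function evaluateRule(
  rule: ContextualRule,
  flags: ContextFlags
): "granted" | "denied" | "requires_approval" {
  if (rule.deniedWhen.some((flag) => flags[flag] === true)) {
    return "denied"; // Deny conditions take precedence.
  }
  if (rule.grantedWhen.every((flag) => flags[flag] === true)) {
    return "granted";
  }
  return "requires_approval"; // Assumption: unmet conditions defer to a human.
}

// Flag names are simplified stand-ins for the conditions in the example.
const sendRule: ContextualRule = {
  grantedWhen: ["recipient_in_known_contacts", "thread_initiated_by_user", "content_reviewed"],
  deniedWhen: ["contains_sensitive_data", "recipient_is_external"],
};
```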
Learning Mode
Track what approvals are commonly granted to suggest permission adjustments:
"You've approved 47 email sends from Jarvis to your team.
Would you like to auto-approve emails to @yourcompany.com?"
Delegation Chains
Allow users to delegate approval authority:
delegation:
  from: jason
  to: melanie
  capabilities:
    - calendar:send_invite
  scope: family_calendar
  expires: 2026-03-01
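A delegation like this one could be checked with a sketch along the following lines. The `Delegation` interface mirrors the YAML keys. `canApproveOnBehalf` is a hypothetical name, and treating `expires` as an exclusive UTC cutoff is an assumption.

```typescript
interface Delegation {
  from: string;
  to: string;
  capabilities: string[];
  scope: string;
  expires: Date;
}

// Returns true when `approver` may approve `capability` within `scope`
// on behalf of the delegating user at time `now`.
function canApproveOnBehalf(
  delegation: Delegation,
  approver: string,
  capability: string,
  scope: string,
  now: Date
): boolean {
  return (
    delegation.to === approver &&
    delegation.capabilities.includes(capability) &&
    delegation.scope === scope &&
    now.getTime() < delegation.expires.getTime() // Assumption: exclusive cutoff.
  );
}

const familyDelegation: Delegation = {
  from: "jason",
  to: "melanie",
  capabilities: ["calendar:send_invite"],
  scope: "family_calendar",
  expires: new Date("2026-03-01T00:00:00Z"),
};
```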