feat(arch): Add Guard Rails capability-based permission system design

Guard Rails complement Quality Rails by controlling what agents can do:
- Capability-based permissions (resource:action pattern)
- Read/organize/draft allowed by default
- Execute/admin require explicit grants
- Human-in-the-loop approval for sensitive actions

Examples: email (read/draft ✅, send ❌), git (commit ✅, force push ❌)

Also:
- Add .admin-credentials and .env.bak.* to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-01 00:25:53 -06:00

Guard Rails: Capability-Based Permission System

Overview

Mosaic Stack implements two complementary safety systems:

| System | Purpose | Scope |
| --- | --- | --- |
| Quality Rails | Ensure output quality | Code reviews, linting, tests, token budgets |
| Guard Rails | Control agent capabilities | What agents CAN and CANNOT do |

This document describes the Guard Rails system—a capability-based permission model that limits what agents, integrations, and plugins can do within the platform.

Core Principle

Prepare freely, execute with approval.

Agents should be able to read, analyze, organize, and draft—but destructive, irreversible, or sensitive actions require explicit human approval.

Permission Model

Capability Structure

Capabilities follow a resource:action pattern:

<resource>:<action>

Examples:
  email:read
  email:draft
  email:send
  calendar:read
  calendar:create_draft
  calendar:send_invite
  git:commit
  git:push
  git:force_push
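
A minimal sketch of how a capability string in this pattern could be parsed and validated. The helper `parseCapability` and the naming rule (lowercase letters, digits, and underscores) are assumptions for illustration, not part of the design above.

```typescript
// Parsed form of a "<resource>:<action>" capability string.
interface ParsedCapability {
  resource: string;
  action: string;
}

// Assumed naming rule: lowercase letters, digits, and underscores,
// starting with a letter (matches names such as "force_push").
const CAPABILITY_PART = /^[a-z][a-z0-9_]*$/;

// Hypothetical helper: split a capability string and reject malformed input.
function parseCapability(value: string): ParsedCapability {
  const separator = value.indexOf(":");
  const resource = value.slice(0, Math.max(separator, 0));
  const action = value.slice(separator + 1);
  if (
    separator < 0 ||
    !CAPABILITY_PART.test(resource) ||
    !CAPABILITY_PART.test(action)
  ) {
    throw new Error(`Invalid capability: "${value}"`);
  }
  return { resource, action };
}

const parsed = parseCapability("git:force_push");
console.log(parsed.resource, parsed.action); // git force_push
```

Rejecting strings with a missing or extra colon keeps typos in configuration files from silently matching nothing.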

Permission Levels

| Level | Description | Example Actions |
| --- | --- | --- |
| read | View/query data | Read emails, view calendar, list files |
| organize | Non-destructive mutations | Label, sort, archive, tag |
| draft | Create pending items | Compose email drafts, stage commits |
| execute | Perform actions | Send email, push code, transfer funds |
| admin | Destructive/irreversible | Delete, force push, revoke access |

Default Stance

By default, agents receive:

✅ All read permissions for their domain
✅ All organize permissions for their domain
✅ All draft permissions for their domain
❌ No execute permissions (must be explicitly granted)
❌ No admin permissions (must be explicitly granted with additional confirmation)
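
The default stance can be expressed as a lookup from permission level to status. This is a sketch only: `DEFAULT_STANCE`, `needsExtraConfirmation`, and `isPermitted` are hypothetical names, and the rule that an explicit grant overrides the default is taken from the list above.

```typescript
type PermissionLevel = "read" | "organize" | "draft" | "execute" | "admin";
type DefaultStance = "granted" | "denied";

// Default stance per level, as listed above.
const DEFAULT_STANCE: Record<PermissionLevel, DefaultStance> = {
  read: "granted",
  organize: "granted",
  draft: "granted",
  execute: "denied",
  admin: "denied",
};

// Admin grants need an additional confirmation on top of the explicit grant.
function needsExtraConfirmation(level: PermissionLevel): boolean {
  return level === "admin";
}

// Hypothetical helper: an explicit grant wins over the default stance.
function isPermitted(level: PermissionLevel, explicitlyGranted: boolean): boolean {
  return explicitlyGranted || DEFAULT_STANCE[level] === "granted";
}

console.log(isPermitted("draft", false)); // true
console.log(isPermitted("execute", false)); // false
```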

Example: Email Integration

integration: email
agent: jarvis

capabilities:
  granted:
    - email:read # Read inbox, threads, attachments
    - email:search # Search across mailbox
    - email:organize # Label, archive, mark read/unread
    - email:draft # Compose and save drafts

  denied:
    - email:send # Cannot send emails
    - email:delete # Cannot permanently delete

  requires_approval:
    - email:send # Human must click "Send"
    - email:delete # Human must confirm deletion

Workflow Example

User: "Reply to John's email about the meeting"

Agent Actions:
1. email:read     → Reads John's email (allowed)
2. email:search   → Finds related context (allowed)
3. email:draft    → Composes reply draft (allowed)
4. email:send     → BLOCKED (requires approval)

Agent Response:
"I've drafted a reply to John. Review it in your drafts
and click Send when ready."

[Link to draft in email client]
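
The workflow above can be sketched as a loop that runs planned capabilities in order and stops at the first one the agent has not been granted. `runPlan` is a hypothetical helper, and the sketch assumes that anything outside the granted list is blocked.

```typescript
interface PlanOutcome {
  completed: string[];
  blockedAt?: string;
}

// Run the planned capabilities in order; stop at the first one not granted.
function runPlan(
  granted: ReadonlySet<string>,
  plan: readonly string[]
): PlanOutcome {
  const completed: string[] = [];
  for (const capability of plan) {
    if (!granted.has(capability)) {
      return { completed, blockedAt: capability };
    }
    completed.push(capability);
  }
  return { completed };
}

// Grants from the email integration example for agent "jarvis".
const jarvisEmailGrants = new Set([
  "email:read",
  "email:search",
  "email:organize",
  "email:draft",
]);
const outcome = runPlan(jarvisEmailGrants, [
  "email:read",
  "email:search",
  "email:draft",
  "email:send",
]);
console.log(outcome.blockedAt); // email:send
```

The agent can use `blockedAt` to tell the user which step is waiting for them, as in the response shown above.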

Example: Git Integration

integration: git
agent: code-assistant

capabilities:
  granted:
    - git:read # View repos, commits, diffs
    - git:branch # Create/switch branches
    - git:commit # Create commits (local)
    - git:push_feature # Push to feature branches

  denied:
    - git:push_main # Cannot push to main/master
    - git:force_push # Never force push
    - git:delete_branch # Cannot delete branches

  requires_approval:
    - git:push_main # Requires PR approval
    - git:merge # Requires code review

Example: Calendar Integration

integration: calendar
agent: jarvis

capabilities:
  granted:
    - calendar:read # View events, availability
    - calendar:analyze # Find conflicts, suggest times
    - calendar:draft # Create draft events

  denied:
    - calendar:send_invite # Cannot send invitations
    - calendar:delete # Cannot delete events
    - calendar:modify # Cannot modify existing events

  requires_approval:
    - calendar:send_invite # Human confirms before sending
    - calendar:accept # Human confirms RSVP

Example: Financial Integration

integration: finance
agent: finance-assistant

capabilities:
  granted:
    - finance:read # View transactions, balances
    - finance:categorize # Categorize transactions
    - finance:report # Generate reports
    - finance:draft # Prepare transfer requests

  denied:
    - finance:transfer # Cannot move money
    - finance:pay # Cannot make payments
    - finance:modify # Cannot edit transactions

  requires_approval:
    - finance:transfer # Multi-factor approval required
    - finance:pay # Human must authorize

Example: Home Automation

integration: home
agent: jarvis

capabilities:
  granted:
    - home:read # View device states
    - home:climate # Adjust thermostat
    - home:lights # Control lighting
    - home:media # Control entertainment

  denied:
    - home:unlock # Cannot unlock doors
    - home:disarm # Cannot disarm security
    - home:garage # Cannot open garage

  requires_approval:
    - home:unlock # Requires biometric + PIN
    - home:disarm # Requires security code

Implementation Architecture

Capability Check Flow

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Agent     │────▶│  Guard Rail  │────▶│  Resource   │
│  Request    │     │   Gateway    │     │   Service   │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    ▼             ▼
              ┌─────────┐   ┌──────────┐
              │ Allowed │   │  Denied  │
              └─────────┘   └──────────┘
                    │             │
                    ▼             ▼
              ┌─────────┐   ┌──────────┐
              │ Execute │   │  Queue   │
              │ Action  │   │ Approval │
              └─────────┘   └──────────┘
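
A rough in-memory sketch of the flow in the diagram: allowed requests are forwarded to the resource service, everything else is queued for approval, and every check is logged. `createGateway` and its types are hypothetical, and a real gateway would persist the queue and log rather than hold them in arrays.

```typescript
type GatewayResult = "allowed" | "queued";

interface AuditEntry {
  agentId: string;
  capability: string;
  result: GatewayResult;
}

interface Gateway {
  auditLog: AuditEntry[];
  approvalQueue: string[]; // capabilities waiting for a human decision
  handle(agentId: string, capability: string, action: () => void): GatewayResult;
}

// `grants` maps an agent id to the capabilities it has been granted.
function createGateway(
  grants: ReadonlyMap<string, ReadonlySet<string>>
): Gateway {
  const auditLog: AuditEntry[] = [];
  const approvalQueue: string[] = [];
  return {
    auditLog,
    approvalQueue,
    handle(agentId: string, capability: string, action: () => void): GatewayResult {
      const allowed = grants.get(agentId)?.has(capability) ?? false;
      const result: GatewayResult = allowed ? "allowed" : "queued";
      if (allowed) {
        action(); // forward to the resource service
      } else {
        approvalQueue.push(capability); // wait for a human decision
      }
      auditLog.push({ agentId, capability, result }); // log every check
      return result;
    },
  };
}
```

Routing every request through a single `handle` call is what keeps the audit log complete, whether or not the action ran.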

Database Schema

-- Capability definitions
CREATE TABLE capabilities (
  id UUID PRIMARY KEY,
  resource VARCHAR(100) NOT NULL,
  action VARCHAR(100) NOT NULL,
  level VARCHAR(20) NOT NULL,  -- read, organize, draft, execute, admin
  description TEXT,
  risk_level VARCHAR(20),      -- low, medium, high, critical
  UNIQUE(resource, action)
);

-- Agent capability grants
CREATE TABLE agent_capabilities (
  id UUID PRIMARY KEY,
  agent_id UUID REFERENCES agents(id),
  capability_id UUID REFERENCES capabilities(id),
  status VARCHAR(20) NOT NULL,  -- granted, denied, requires_approval
  granted_by UUID REFERENCES users(id),
  granted_at TIMESTAMP,
  expires_at TIMESTAMP,
  conditions JSONB,             -- Additional constraints
  UNIQUE(agent_id, capability_id)
);

-- Approval queue for requires_approval capabilities
CREATE TABLE capability_approvals (
  id UUID PRIMARY KEY,
  agent_id UUID REFERENCES agents(id),
  capability_id UUID REFERENCES capabilities(id),
  request_context JSONB,        -- What the agent wants to do
  status VARCHAR(20),           -- pending, approved, denied, expired
  requested_at TIMESTAMP,
  decided_at TIMESTAMP,
  decided_by UUID REFERENCES users(id),
  decision_reason TEXT
);

-- Audit log for all capability checks
CREATE TABLE capability_audit_log (
  id UUID PRIMARY KEY,
  agent_id UUID REFERENCES agents(id),
  capability_id UUID REFERENCES capabilities(id),
  result VARCHAR(20),           -- allowed, denied, queued
  context JSONB,
  timestamp TIMESTAMP DEFAULT NOW()
);

API Design

// Result of a capability check
interface CapabilityResult {
  allowed: boolean;
  reason?: string;
  approvalId?: string;
}

// Pending approval request (assumed shape, mirroring capability_approvals)
interface ApprovalRequest {
  id: string;
  status: "pending" | "approved" | "denied" | "expired";
}

// Check if agent has capability
declare function checkCapability(
  agentId: string,
  resource: string,
  action: string,
  context?: Record<string, unknown>
): Promise<CapabilityResult>;

// Request approval for blocked capability: creates approval request, notifies user
declare function requestApproval(
  agentId: string,
  resource: string,
  action: string,
  context: Record<string, unknown>
): Promise<ApprovalRequest>;

// Grant capability to agent
declare function grantCapability(
  agentId: string,
  capabilityId: string,
  grantedBy: string,
  options?: {
    expiresAt?: Date;
    conditions?: Record<string, unknown>;
  }
): Promise<void>;

Configuration

Per-Integration Defaults

Each integration defines sensible defaults:

# integrations/email/defaults.yaml
integration: email
default_capabilities:
  granted:
    - email:read
    - email:search
    - email:organize
    - email:draft
  denied:
    - email:send
    - email:delete
  requires_approval:
    - email:send

Per-Agent Overrides

Users can override these defaults for individual agents:

# agents/jarvis/capabilities.yaml
agent: jarvis
overrides:
  email:
    # Jarvis can send to known contacts
    email:send:
      status: granted
      conditions:
        recipient_in: known_contacts
    # But still needs approval for new recipients
    email:send_new:
      status: requires_approval
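
One possible way to resolve defaults and overrides into a single status per request. This is a sketch under stated assumptions: an override applies only when its condition holds, a failed condition falls back to the integration default, and capabilities with no entry are denied. The defaults above list `email:send` under both `denied` and `requires_approval`, so the usage below assumes `requires_approval`. All names are hypothetical.

```typescript
type CapabilityStatus = "granted" | "denied" | "requires_approval";

interface OverrideRule {
  status: CapabilityStatus;
  recipientIn?: string; // name of a contact list, e.g. "known_contacts"
}

// Resolve the effective status of a capability for one request.
function resolveStatus(
  capability: string,
  recipient: string,
  defaults: ReadonlyMap<string, CapabilityStatus>,
  overrides: ReadonlyMap<string, OverrideRule>,
  contactLists: ReadonlyMap<string, ReadonlySet<string>>
): CapabilityStatus {
  const rule = overrides.get(capability);
  if (rule !== undefined) {
    const conditionHolds =
      rule.recipientIn === undefined ||
      (contactLists.get(rule.recipientIn)?.has(recipient) ?? false);
    if (conditionHolds) {
      return rule.status; // override applies
    }
  }
  // Fall back to the integration default; unknown capabilities are denied.
  return defaults.get(capability) ?? "denied";
}

const emailDefaults = new Map<string, CapabilityStatus>([
  ["email:read", "granted"],
  ["email:send", "requires_approval"],
]);
const jarvisOverrides = new Map<string, OverrideRule>([
  ["email:send", { status: "granted", recipientIn: "known_contacts" }],
]);
const lists = new Map<string, ReadonlySet<string>>([
  ["known_contacts", new Set(["john.smith@example.com"])],
]);

console.log(
  resolveStatus("email:send", "john.smith@example.com", emailDefaults, jarvisOverrides, lists)
); // granted
```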

User Experience

Approval Notifications

When an agent hits a requires_approval capability:

1. Agent informs user what it wants to do
2. Draft/preview created for user review
3. Notification sent via preferred channel (app, email, SMS)
4. User approves/denies with optional feedback
5. Agent proceeds or adjusts based on decision
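
The decision step can be sketched as a small state transition on an approval record. The statuses match the `capability_approvals` table. The names `ApprovalTicket`, `decide`, and `nextAgentStep` are hypothetical, and the rule that only pending requests can be decided is an assumption.

```typescript
type TicketStatus = "pending" | "approved" | "denied" | "expired";

interface ApprovalTicket {
  id: string;
  status: TicketStatus;
  decidedBy?: string;
  decisionReason?: string;
}

// Record a human decision. Only pending requests can be decided.
function decide(
  ticket: ApprovalTicket,
  decision: "approved" | "denied",
  decidedBy: string,
  decisionReason?: string
): ApprovalTicket {
  if (ticket.status !== "pending") {
    throw new Error(`Request ${ticket.id} is already ${ticket.status}`);
  }
  const decided: ApprovalTicket = { ...ticket, status: decision, decidedBy };
  if (decisionReason !== undefined) {
    decided.decisionReason = decisionReason; // optional user feedback
  }
  return decided;
}

// What the agent does next, following the last step of the list above.
function nextAgentStep(ticket: ApprovalTicket): "proceed" | "adjust" | "wait" {
  if (ticket.status === "approved") return "proceed";
  if (ticket.status === "denied") return "adjust";
  return "wait"; // pending or expired: nothing to act on yet
}
```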

Approval UI

┌────────────────────────────────────────────────┐
│ 🤖 Jarvis needs your approval                  │
├────────────────────────────────────────────────┤
│                                                │
│ Action: Send email                             │
│ To: john.smith@example.com                     │
│ Subject: Re: Project Update                    │
│                                                │
│ ┌────────────────────────────────────────────┐ │
│ │ Hi John,                                   │ │
│ │                                            │ │
│ │ Thanks for the update. I've reviewed...    │ │
│ │ [Preview truncated - click to expand]      │ │
│ └────────────────────────────────────────────┘ │
│                                                │
│  [Deny]  [Edit Draft]  [✓ Approve & Send]     │
│                                                │
└────────────────────────────────────────────────┘

Security Considerations

Defense in Depth

Guard Rails are one layer of security:

1. Authentication - Who is the agent?
2. Authorization - What can the agent do? (Guard Rails)
3. Rate Limiting - How often can they do it?
4. Audit Logging - What did they do?
5. Anomaly Detection - Is this behavior unusual?

Capability Escalation Prevention

- Agents cannot grant capabilities to themselves
- Agents cannot grant capabilities to other agents
- Capability grants require human authorization
- Critical capabilities require multi-factor confirmation
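
These rules could be enforced at the point where a grant is written. The sketch below is illustrative only: `Grantor`, `authorizeGrant`, and the `multiFactorConfirmed` flag are hypothetical, and `critical` is taken from the `risk_level` values in the schema.

```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";

interface Grantor {
  id: string;
  kind: "human" | "agent";
  multiFactorConfirmed: boolean;
}

interface GrantDecision {
  ok: boolean;
  reason?: string;
}

// Decide whether `grantor` may grant a capability of the given risk level.
function authorizeGrant(grantor: Grantor, riskLevel: RiskLevel): GrantDecision {
  // Covers both self-grants and grants to other agents.
  if (grantor.kind === "agent") {
    return { ok: false, reason: "agents cannot grant capabilities" };
  }
  if (riskLevel === "critical" && !grantor.multiFactorConfirmed) {
    return {
      ok: false,
      reason: "critical capabilities require multi-factor confirmation",
    };
  }
  return { ok: true };
}
```

Because the check looks only at who is granting, an agent is refused regardless of which agent would receive the capability.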

Time-Limited Grants

For sensitive operations, capabilities can be time-limited:

capability_grant:
  agent: jarvis
  capability: email:send
  expires_in: 1h
  max_uses: 5
  reason: "Processing inbox backlog"
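
A minimal sketch of how `expires_in` and `max_uses` might be enforced at check time. `TimedGrant`, `createTimedGrant`, and `tryUse` are hypothetical names, and the duration is passed in milliseconds rather than parsed from a string such as `1h`.

```typescript
interface TimedGrant {
  capability: string;
  expiresAt: Date;
  maxUses: number;
  uses: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Build a grant that expires `expiresInMs` after `now`.
function createTimedGrant(
  capability: string,
  now: Date,
  expiresInMs: number,
  maxUses: number
): TimedGrant {
  return {
    capability,
    expiresAt: new Date(now.getTime() + expiresInMs),
    maxUses,
    uses: 0,
  };
}

// Consume one use. Returns false once the grant is expired or exhausted.
function tryUse(grant: TimedGrant, now: Date): boolean {
  const expired = now.getTime() >= grant.expiresAt.getTime();
  if (expired || grant.uses >= grant.maxUses) {
    return false;
  }
  grant.uses += 1;
  return true;
}

// Equivalent of `expires_in: 1h, max_uses: 5` from the example above.
const issuedAt = new Date("2026-02-01T00:00:00Z");
const backlogGrant = createTimedGrant("email:send", issuedAt, HOUR_MS, 5);
console.log(tryUse(backlogGrant, new Date("2026-02-01T00:10:00Z"))); // true
```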

Future Enhancements

Contextual Permissions

Grant capabilities based on context:

email:send:
  granted_when:
    - recipient_in: known_contacts
    - thread_initiated_by: user
    - content_reviewed: true
  denied_when:
    - contains_sensitive_data: true
    - recipient_is_external: true
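
The rule above leaves open how the two lists combine. The sketch below picks one reading and labels it as an assumption: any matching `denied_when` condition blocks the action, all `granted_when` conditions must hold to grant it, and anything else falls back to approval. Conditions are simplified to named facts that are true for the request. All names are hypothetical.

```typescript
type ContextualStatus = "granted" | "denied" | "requires_approval";

interface ContextualRule {
  grantedWhen: string[]; // facts that must all be true
  deniedWhen: string[]; // facts, any of which blocks the action
}

// `facts` holds the names of the conditions that are true for this request.
function evaluateRule(
  rule: ContextualRule,
  facts: ReadonlySet<string>
): ContextualStatus {
  if (rule.deniedWhen.some((fact) => facts.has(fact))) {
    return "denied"; // denials take precedence
  }
  if (rule.grantedWhen.every((fact) => facts.has(fact))) {
    return "granted";
  }
  return "requires_approval"; // assumed fallback
}

const emailSendRule: ContextualRule = {
  grantedWhen: [
    "recipient_in_known_contacts",
    "thread_initiated_by_user",
    "content_reviewed",
  ],
  deniedWhen: ["contains_sensitive_data", "recipient_is_external"],
};
```

Checking denials first means a sensitive message stays blocked even when every grant condition is met.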

Learning Mode

Track which approvals are commonly granted and suggest permission adjustments:

"You've approved 47 email sends from Jarvis to your team.
Would you like to auto-approve emails to @yourcompany.com?"
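
A possible starting point for generating such suggestions: count approved requests per capability and recipient domain, then surface those above a threshold. The record shape, `suggestAutoApprovals`, and the threshold are hypothetical, and a real version would probably also weigh denials.

```typescript
interface ApprovalEvent {
  capability: string;
  recipient: string;
  approved: boolean;
}

interface Suggestion {
  capability: string;
  domain: string;
  approvals: number;
}

// Suggest auto-approval for capability/domain pairs approved at least `threshold` times.
function suggestAutoApprovals(
  history: readonly ApprovalEvent[],
  threshold: number
): Suggestion[] {
  const counts = new Map<string, Suggestion>();
  for (const entry of history) {
    if (!entry.approved) continue; // denied requests do not count
    const domain = entry.recipient.slice(entry.recipient.lastIndexOf("@") + 1);
    const key = `${entry.capability}|${domain}`;
    const current =
      counts.get(key) ?? { capability: entry.capability, domain, approvals: 0 };
    current.approvals += 1;
    counts.set(key, current);
  }
  return Array.from(counts.values()).filter((s) => s.approvals >= threshold);
}
```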

Delegation Chains

Allow users to delegate approval authority:

delegation:
  from: jason
  to: melanie
  capabilities:
    - calendar:send_invite
  scope: family_calendar
  expires: 2026-03-01
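
A delegation like the one above could be checked when someone other than the owner tries to approve a request. This is a sketch under assumptions: `DelegationGrant` and `canApproveFor` are hypothetical, and the delegation is treated as valid only before its `expires` date.

```typescript
interface DelegationGrant {
  from: string;
  to: string;
  capabilities: string[];
  scope: string;
  expires: Date;
}

// True when `approver` may approve `capability` in `scope` on behalf of `owner`.
function canApproveFor(
  delegation: DelegationGrant,
  approver: string,
  owner: string,
  capability: string,
  scope: string,
  now: Date
): boolean {
  return (
    delegation.from === owner &&
    delegation.to === approver &&
    delegation.capabilities.includes(capability) &&
    delegation.scope === scope &&
    now.getTime() < delegation.expires.getTime()
  );
}

const familyDelegation: DelegationGrant = {
  from: "jason",
  to: "melanie",
  capabilities: ["calendar:send_invite"],
  scope: "family_calendar",
  expires: new Date("2026-03-01T00:00:00Z"),
};

console.log(
  canApproveFor(
    familyDelegation,
    "melanie",
    "jason",
    "calendar:send_invite",
    "family_calendar",
    new Date("2026-02-15T00:00:00Z")
  )
); // true
```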
