Co-authored-by: Jason Woltje <jason@diversecanvas.com> Co-committed-by: Jason Woltje <jason@diversecanvas.com>
# Why Hard Rails Matter

## The Problem We Discovered

In AI-assisted development, we found:

1. **Process adherence fails** - Agents claim to do code review but miss critical issues
2. **Manual review is insufficient** - Even AI-assisted review missed hardcoded passwords and SQL injection
3. **Scale breaks quality** - 50 issues in a single patch release despite explicit QA processes

### Real-World Case Study

**Production patch validation:**

After explicit code review and QA processes, we discovered **50 issues**, including:

**Security Issues (9):**
- 4 hardcoded passwords committed to the repository
- 1 SQL injection vulnerability
- World-readable `.env` files
- XSS vulnerabilities (CSP `unsafe-inline`)

**Type Safety Issues (11):**
- TypeScript strict mode DISABLED (`"strict": false`)
- ESLint explicitly ALLOWING `any` types (`no-explicit-any: 'off'`)
- Missing return types
- Type assertion overuse

**Silent Failures (9):**
- Errors swallowed in try/catch blocks
- Functions returning wrong types on error
- No error logging
- Network failures treated as `false` instead of errors

**Test Coverage Gaps (10):**
- No test coverage requirements
- No testing framework setup
- Code shipped with 0% coverage

**Build Failures (2):**
- Code committed that doesn't compile
- Tests committed that fail

**Dependency Issues (6):**
- Critical CVEs not caught
- Version conflicts between packages

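To make the "Silent Failures" category above concrete, here is a minimal sketch of the pattern and one possible alternative. The names `ProbeResult`, `isServiceUpSilently`, and `checkService` are hypothetical and do not come from the audited code base; the `probe` callback stands in for any network call.

```typescript
// Hypothetical result type: the check either produced an answer or failed with a reason.
type ProbeResult =
  | { ok: true; reachable: boolean }
  | { ok: false; error: string };

// Anti-pattern: a thrown network error is swallowed and reported as `false`,
// so callers cannot tell "service is down" from "the check itself failed".
function isServiceUpSilently(probe: () => boolean): boolean {
  try {
    return probe();
  } catch {
    return false; // error swallowed, nothing logged
  }
}

// Alternative: surface the failure as data so the caller has to handle it.
function checkService(probe: () => boolean): ProbeResult {
  try {
    return { ok: true, reachable: probe() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

With the second form, a timeout shows up as `{ ok: false, error: "..." }` rather than as an ordinary `false`, which is the distinction the silent-failure items describe.
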
## The Solution: Mechanical Enforcement

Don't **ask** agents to:
- "Please do code review"
- "Make sure to run tests"
- "Check for security issues"

Instead, **BLOCK** commits that:
- Have type errors
- Contain hardcoded secrets
- Don't pass tests
- Have security vulnerabilities

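As one illustration of blocking rather than asking, the sketch below shows roughly what a hardcoded-secret check could look like. It is a deliberately simple heuristic written for this document, not the scanner any real project uses; `SecretFinding`, `SECRET_PATTERN`, `scanForSecrets`, and `commitAllowed` are hypothetical names.

```typescript
// One match reported by the scan: where it is and what was matched.
interface SecretFinding {
  line: number; // 1-based line number
  text: string; // the offending line, trimmed
}

// Simplistic heuristic (an assumption for illustration): a name that looks like
// a credential, assigned a non-empty quoted string literal.
const SECRET_PATTERN =
  /(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*["'][^"']+["']/i;

function scanForSecrets(source: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  source.split("\n").forEach((text, index) => {
    if (SECRET_PATTERN.test(text)) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

// A hook built on this check would refuse the commit whenever anything is found.
function commitAllowed(source: string): boolean {
  return scanForSecrets(source).length === 0;
}
```

A value read from the environment (for example `process.env.DB_PASSWORD`) does not match the pattern, while a quoted literal assigned to `password` does. A real scanner would need far broader rules than this single regular expression.
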
### Why This Works

**Example: Type Safety**

❌ **Process-based (fails):**
```
Human: "Please avoid using 'any' types"
Agent: "I'll make sure to use proper types"
*Agent uses any types anyway*
```

✅ **Mechanically enforced (works):**
```
Agent writes: const x: any = 123;
Git hook runs: ❌ Error: no-explicit-any
Commit blocked
Agent must fix to proceed
```

The agent doesn't get to **claim** it followed the process. The automated gate **determines** if code is acceptable.

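The idea that the gate, not the agent, decides can be sketched as a small runner. This is an illustrative model only: `QualityGate`, `GateVerdict`, `runGates`, and the toy `noExplicitAny` check are hypothetical, and the text match below is a stand-in for the real ESLint rule, not a reimplementation of it.

```typescript
// A gate is any automated check that returns a list of problems (empty = pass).
interface QualityGate {
  name: string;
  check: (source: string) => string[];
}

interface GateVerdict {
  allowed: boolean;
  failures: string[]; // "gate-name: problem" entries
}

// Runs every gate; the commit is allowed only if no gate reports a problem.
function runGates(gates: QualityGate[], source: string): GateVerdict {
  const failures: string[] = [];
  for (const gate of gates) {
    for (const problem of gate.check(source)) {
      failures.push(`${gate.name}: ${problem}`);
    }
  }
  return { allowed: failures.length === 0, failures };
}

// Toy stand-in for the no-explicit-any rule: a plain text match, not a real linter.
const noExplicitAny: QualityGate = {
  name: "no-explicit-any",
  check: (source) =>
    /:\s*any\b/.test(source) ? ["explicit 'any' type found"] : [],
};
```

Calling `runGates([noExplicitAny], "const x: any = 123;")` yields a verdict with `allowed: false`, regardless of what the author of the code claims about it.
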
## Design Principles

### 1. Fail Fast

Detect issues at the earliest possible stage: at commit time or in CI, not in code review and not in production.

**Timeline:**
- ⚡ Commit time: Type errors, lint errors, secrets → **BLOCKED**
- 🔄 CI time: Build failures, test failures, CVEs → **BLOCKED**
- 👀 Code review: Architecture, design, business logic
- 🚀 Production: (Issues should never reach here)

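The timeline above can also be read as a lookup table from issue kind to the earliest stage expected to catch it. The sketch below encodes exactly the entries listed; the identifiers `EnforcementStage`, `STAGE_BY_ISSUE`, `stageFor`, and `blocksMechanically` are hypothetical.

```typescript
type EnforcementStage = "commit" | "ci" | "review";

// Mirrors the timeline: each issue kind maps to the earliest stage that catches it.
const STAGE_BY_ISSUE: Record<string, EnforcementStage> = {
  "type error": "commit",
  "lint error": "commit",
  "secret": "commit",
  "build failure": "ci",
  "test failure": "ci",
  "cve": "ci",
  "architecture": "review",
  "design": "review",
  "business logic": "review",
};

// Returns the stage for a known issue kind, or undefined for unknown kinds.
function stageFor(issueKind: string): EnforcementStage | undefined {
  return Object.prototype.hasOwnProperty.call(STAGE_BY_ISSUE, issueKind)
    ? STAGE_BY_ISSUE[issueKind]
    : undefined;
}

// Only the commit and CI stages block mechanically; review relies on human judgment.
function blocksMechanically(stage: EnforcementStage): boolean {
  return stage === "commit" || stage === "ci";
}
```
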
### 2. Non-Negotiable

No agent can bypass enforcement. No "skip hooks" flag. No emergency override.

If the code doesn't pass gates, it doesn't get committed. Period.

### 3. Portable

Same enforcement across:
- All projects
- All developers (human + AI)
- All environments (local, CI, production)

### 4. Minimal Friction

Auto-fix where possible:
- Prettier formats code automatically
- `eslint --fix` corrects simple issues
- Block only when an issue can't be auto-fixed

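The "fix what can be fixed, block on the rest" policy can be sketched as a simple partition. The types and the function name `applyAutoFixPolicy` are hypothetical, and the `fixable` flag is assumed to come from whichever tool reported the violation.

```typescript
interface LintViolation {
  rule: string;
  fixable: boolean; // true if a formatter or auto-fixer can repair it
}

interface FixOutcome {
  autoFixed: LintViolation[]; // repaired without involving the author
  blocking: LintViolation[];  // must be fixed by hand before committing
  commitAllowed: boolean;
}

// Splits violations into auto-fixed and blocking; only the latter stop a commit.
function applyAutoFixPolicy(violations: LintViolation[]): FixOutcome {
  const autoFixed = violations.filter((v) => v.fixable);
  const blocking = violations.filter((v) => !v.fixable);
  return { autoFixed, blocking, commitAllowed: blocking.length === 0 };
}
```

Under this model a formatting-only change never blocks, while a single non-fixable violation is enough to stop the commit.
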
### 5. Clear Feedback

When enforcement blocks a commit, tell the agent:
- ❌ What's wrong (type error, lint violation, etc.)
- 📍 Where it is (file:line)
- ✅ How to fix it (expected type, remove 'any', etc.)

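A feedback message carrying the three parts listed above could be assembled as follows. This is a minimal sketch; `GateFailure` and `formatFailure` are hypothetical names, and the exact layout is an assumption rather than the output of any particular tool.

```typescript
interface GateFailure {
  rule: string; // what is wrong
  file: string; // where it is
  line: number;
  hint: string; // how to fix it
}

// Renders one failure as three lines: what, where, and how to fix.
function formatFailure(failure: GateFailure): string {
  return [
    `❌ ${failure.rule}`,
    `📍 ${failure.file}:${failure.line}`,
    `✅ ${failure.hint}`,
  ].join("\n");
}
```

Keeping the location in `file:line` form makes the message easy for both humans and agents to act on.
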
## Impact Prediction

Based on the 50-issue production analysis above:

| Phase | Enforcement | Issues Prevented (cumulative) |
|-------|-------------|-------------------------------|
| **Phase 1** | Pre-commit + strict mode + ESLint | 25 of 50 (50%) |
| **Phase 2** | + CI expansion + npm audit | 35 of 50 (70%) |
| **Phase 3** | + OWASP + coverage gates | 45 of 50 (90%) |

**The remaining 10%** requires human judgment:
- Architecture decisions
- Business logic correctness
- User experience
- Performance optimization

## Agent Behavior Evolution

### Before Quality Rails
```
Agent: "I've completed the feature and run all tests"
Reality: Code has type errors, no tests written, hardcoded password
Result: 50 issues discovered in code review
```

### After Quality Rails
```
Agent writes code with 'any' type
Git hook: ❌ no-explicit-any
Agent rewrites with proper type
Git hook: ✅ Pass

Agent writes code with hardcoded password
Git hook: ❌ Secret detected
Agent moves to environment variable
Git hook: ✅ Pass

Agent commits without tests
CI: ❌ Coverage below 80%
Agent writes tests
CI: ✅ Pass
```

**The agent learns:** Good code passes gates, bad code is rejected.

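The coverage step in the transcript above amounts to a threshold comparison. The sketch below assumes line coverage and the 80% threshold shown in the transcript; `CoverageSummary`, `coveragePercent`, and `coverageGate` are hypothetical names, and treating an empty code base as 0% is an assumption of this example.

```typescript
interface CoverageSummary {
  coveredLines: number;
  totalLines: number;
}

// Percentage of lines covered; an empty code base counts as 0% in this sketch.
function coveragePercent(summary: CoverageSummary): number {
  if (summary.totalLines === 0) return 0;
  return (summary.coveredLines * 100) / summary.totalLines;
}

// Returns a pass or fail message in the style of the transcript above.
function coverageGate(summary: CoverageSummary, thresholdPercent = 80): string {
  const percent = coveragePercent(summary);
  return percent >= thresholdPercent
    ? `✅ Pass (${percent.toFixed(1)}%)`
    : `❌ Coverage below ${thresholdPercent}% (${percent.toFixed(1)}%)`;
}
```

Code committed without any tests fails this gate immediately, and the message states the threshold the agent has to reach.
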
## Why This Matters for AI Development

AI agents are **consistently bad** at self-enforcement:
- They claim to follow processes
- They **believe** they're following processes
- Their output proves otherwise

But AI agents are **good** at responding to mechanical feedback:
- Clear error messages
- Specific line numbers
- Concrete fix requirements

Quality Rails exploits this strength and avoids the weakness.

## Conclusion

**Process compliance:** Agents claim → Output fails
**Mechanical enforcement:** Gates determine → Output succeeds

This is not philosophical; it is pragmatic, based on 50 real issues from production code.

Quality Rails exists because **process-based quality doesn't work at scale with AI agents.**

Mechanical enforcement does.