Track orchestrator agent task completions #372

Closed
opened 2026-02-15 05:28:40 +00:00 by jason.woltje · 1 comment

Summary

When the orchestrator dispatches a task to a coding harness (Claude Code, Codex CLI, OpenCode) and the task completes, emit a TaskCompletionEvent capturing the full execution context.

Context

The orchestrator manages the agent task lifecycle: dispatch → monitor → collect results. That makes it the natural place to capture end-to-end task metrics that per-LLM-call tracking (a separate issue) cannot see: total duration across multiple LLM calls, retry behavior, quality gate enforcement results, and context window management.
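A minimal sketch of why task_duration_ms belongs at the orchestrator level. It assumes hypothetical dispatch / is_done / collect callables that stand in for the real harness integration. The timer starts at dispatch and stops after results are collected, so it covers every LLM call the harness makes in between. Per-call tracking cannot produce this number.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_and_time(
    dispatch: Callable[[], None],
    is_done: Callable[[], bool],
    collect: Callable[[], T],
    poll_interval_s: float = 0.01,
) -> tuple[T, int]:
    """Run one dispatch -> monitor -> collect cycle; return (result, task_duration_ms).

    Hypothetical helper: the three callables are placeholders for the real
    harness integration, not names from the project.
    """
    # Monotonic clock: measures elapsed real time and ignores system clock changes.
    start = time.monotonic()
    dispatch()
    while not is_done():  # monitor phase; the harness may make many LLM calls here
        time.sleep(poll_interval_s)
    result = collect()
    task_duration_ms = int((time.monotonic() - start) * 1000)
    return result, task_duration_ms
```

A per-call tracker would see each request in isolation. This wrapper sees only the envelope, which is exactly the end-to-end duration the event needs.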

Requirements

Event Fields to Capture

| Field | Source |
|-------|--------|
| `task_type` | From task definition (implementation, debugging, etc.) |
| `complexity` | From task definition or auto-assessed |
| `harness` | Which CLI tool ran (claude_code, opencode, etc.) |
| `model` / `provider` | From harness config |
| `task_duration_ms` | Wall-clock: dispatch → completion |
| `estimated_*_tokens` | From prediction API (pre-task) |
| `actual_*_tokens` | From harness output/logs |
| `estimated_cost_usd_micros` | From prediction API |
| `actual_cost_usd_micros` | Computed from actual tokens |
| `quality_gate_passed` | Orchestrator's quality coordinator result |
| `quality_gates_run` | Which gates enforced (build, lint, test, typecheck, security) |
| `quality_gates_failed` | Which gates failed |
| `context_compactions` | From harness output |
| `context_rotations` | From harness output |
| `context_utilization_final` | From harness output (0.0-1.0) |
| `outcome` | success, failure, partial, timeout |
| `retry_count` | How many retries before final outcome |
| `language` | Primary language of the task |
| `repo_size_category` | tiny, small, medium, large, huge |
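The field list above could be encoded as a frozen dataclass. This is a sketch, not the project's actual schema, and it makes three assumptions. First, the estimated_*_tokens / actual_*_tokens wildcards are expanded to input and output counts. Second, model / provider is split into two fields. Third, the validation checks are illustrative. The cost helper shows the arithmetic for actual_cost_usd_micros, assuming prices are quoted in USD per million tokens. Because 1 USD = 10^6 micros, the two factors of 10^6 cancel and the cost in micros is simply tokens × price.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL = "partial"
    TIMEOUT = "timeout"


class RepoSizeCategory(str, Enum):
    TINY = "tiny"
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"
    HUGE = "huge"


@dataclass(frozen=True)
class TaskCompletionEvent:
    task_type: str
    complexity: str
    harness: str
    model: str
    provider: str
    task_duration_ms: int
    # Assumption: the table's *_tokens wildcards expand to input/output counts.
    actual_input_tokens: int
    actual_output_tokens: int
    actual_cost_usd_micros: int
    quality_gate_passed: bool
    quality_gates_run: tuple[str, ...]
    quality_gates_failed: tuple[str, ...]
    context_compactions: int
    context_rotations: int
    context_utilization_final: float
    outcome: Outcome
    retry_count: int
    language: str
    repo_size_category: RepoSizeCategory
    # Estimates come from the prediction API and may be unavailable.
    estimated_input_tokens: Optional[int] = None
    estimated_output_tokens: Optional[int] = None
    estimated_cost_usd_micros: Optional[int] = None

    def __post_init__(self) -> None:
        # Illustrative sanity checks, not requirements stated in the issue.
        if not 0.0 <= self.context_utilization_final <= 1.0:
            raise ValueError("context_utilization_final must be in [0.0, 1.0]")
        if self.retry_count < 0:
            raise ValueError("retry_count must be non-negative")
        if not set(self.quality_gates_failed) <= set(self.quality_gates_run):
            raise ValueError("quality_gates_failed must be a subset of quality_gates_run")


def cost_usd_micros(input_tokens: int, output_tokens: int,
                    input_usd_per_mtok: float, output_usd_per_mtok: float) -> int:
    """Cost in micro-dollars, assuming prices in USD per million tokens.

    tokens / 1e6 * price_usd * 1e6 micros/USD simplifies to tokens * price_usd.
    """
    return round(input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok)
```

Take 10,000 input tokens at a placeholder price of $3 per million and 2,000 output tokens at $15 per million. Then `cost_usd_micros(10_000, 2_000, 3.0, 15.0)` returns 60_000 micros, which is $0.06. Storing integer micros avoids floating-point drift when costs are summed across many events.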

Integration Points

  • Task completion handler in the coordinator
  • Retry logic (increment retry_count on each retry)
  • Quality gate enforcement (record gate results)
  • Timeout handler (outcome=timeout)
  • Killswitch activation (outcome=failure with context)
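The retry, quality-gate, timeout, and killswitch integration points can be folded into a single attempt loop. This is a sketch under stated assumptions. AttemptTimeout, KillswitchActivated, AttemptResult, TaskTelemetry, and run_with_retries are all hypothetical names. The loop retries after gate failures and timeouts, and the killswitch ends the task immediately. The issue does not define when `partial` applies, so this loop never produces it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Outcome(str, Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL = "partial"
    TIMEOUT = "timeout"


class AttemptTimeout(Exception):
    """Hypothetical: raised when one attempt exceeds its time budget."""


class KillswitchActivated(Exception):
    """Hypothetical: raised when the killswitch aborts the task."""


@dataclass
class AttemptResult:
    gates_run: list[str]
    gates_failed: list[str]


@dataclass
class TaskTelemetry:
    outcome: Outcome
    retry_count: int
    quality_gate_passed: bool = False
    quality_gates_run: list[str] = field(default_factory=list)
    quality_gates_failed: list[str] = field(default_factory=list)
    failure_context: Optional[str] = None


def run_with_retries(attempt: Callable[[], AttemptResult], max_retries: int) -> TaskTelemetry:
    retry_count = 0
    while True:
        try:
            result = attempt()
        except KillswitchActivated as exc:
            # Killswitch: stop at once and record outcome=failure with context.
            return TaskTelemetry(Outcome.FAILURE, retry_count,
                                 failure_context=f"killswitch: {exc}")
        except AttemptTimeout:
            if retry_count >= max_retries:
                return TaskTelemetry(Outcome.TIMEOUT, retry_count)
        else:
            passed = not result.gates_failed
            if passed or retry_count >= max_retries:
                # Record the quality gate results of the final attempt.
                return TaskTelemetry(
                    Outcome.SUCCESS if passed else Outcome.FAILURE,
                    retry_count,
                    quality_gate_passed=passed,
                    quality_gates_run=list(result.gates_run),
                    quality_gates_failed=list(result.gates_failed),
                )
        retry_count += 1  # reached only when another attempt will be made
```

Incrementing only immediately before a further attempt keeps retry_count equal to "retries before final outcome". A first-try success therefore reports 0.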

Acceptance Criteria

  • Every completed agent task emits a TaskCompletionEvent
  • Failed tasks tracked with appropriate outcome
  • Retried tasks increment retry_count
  • Quality gate results accurately recorded
  • Context window metrics captured from harness output
  • Token/cost estimates populated from predictions when available
  • Unit tests for event construction
  • No impact on task execution performance (tracking must be non-blocking)
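One way to meet the non-blocking criterion is to hand events to a bounded queue drained by a background thread. With this design, emit() returns immediately, and a full queue sheds events instead of stalling the task. FireAndForgetEmitter and its caller-supplied sink are hypothetical, not the project's telemetry client.

```python
import logging
import queue
import threading
from typing import Any, Callable

log = logging.getLogger(__name__)


class FireAndForgetEmitter:
    """Hands events to a daemon thread so the task path never waits on telemetry.

    Hypothetical helper: `sink` stands in for whatever transport ships events.
    """

    def __init__(self, sink: Callable[[Any], None], max_pending: int = 1000) -> None:
        self._sink = sink
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # best-effort counter of events shed because the queue was full
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, event: Any) -> None:
        try:
            self._queue.put_nowait(event)  # never blocks the caller
        except queue.Full:
            self.dropped += 1  # shed load rather than slow the task down

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:  # a failing sink must not kill the worker thread
                log.exception("telemetry sink failed")
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued event has been handed to the sink (tests/shutdown only)."""
        self._queue.join()
```

In unit tests, flush() makes delivery deterministic: emit events, call flush(), then assert on what the sink received. Task-path code should never call flush(), because it blocks.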
jason.woltje added the ai label 2026-02-15 05:28:40 +00:00
jason.woltje added this to the M10-Telemetry (0.0.10) milestone 2026-02-15 05:31:19 +00:00
jason.woltje (issue author) commented:

Completed in commit 36e6cdd on feature/m10-telemetry. Added _emit_task_telemetry to both Coordinator and OrchestrationLoop, with agent-to-telemetry field mapping; emission is non-blocking (fire-and-forget). Tests pass.
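The comment mentions an agent-to-telemetry field mapping. Below is a minimal sketch of that idea using a declarative rename table plus one value conversion. It assumes hypothetical agent-side keys (agent_type, elapsed_ms, attempts, status) that are not taken from commit 36e6cdd. The conversion is needed because an attempts counter includes the first try, while retry_count does not.

```python
from typing import Any, Mapping

# Hypothetical agent-side keys -> TaskCompletionEvent field names.
AGENT_TO_TELEMETRY: dict[str, str] = {
    "agent_type": "harness",
    "elapsed_ms": "task_duration_ms",
    "attempts": "retry_count",
    "status": "outcome",
}


def map_agent_fields(agent_result: Mapping[str, Any]) -> dict[str, Any]:
    """Rename known agent keys to telemetry fields; unknown keys are ignored."""
    event: dict[str, Any] = {}
    for src, dst in AGENT_TO_TELEMETRY.items():
        if src in agent_result:
            event[dst] = agent_result[src]
    # Total attempts include the first try; retry_count counts only the retries.
    if "retry_count" in event:
        event["retry_count"] = max(0, event["retry_count"] - 1)
    return event
```

Keeping the mapping in one table means the Coordinator and OrchestrationLoop call sites can share it instead of each hand-copying fields.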
1 participant
Due date: none set
Dependencies: none set
Reference: mosaic/stack#372