jarvis/skills/software/observability-operability.md
2026-03-24 00:11:34 -05:00

Observability and Operability

Purpose

Make systems easier to understand, debug, and run by improving signals, diagnostics, and operational readiness around important behavior.

When to use

  • A system is hard to diagnose in production or staging
  • New functionality needs useful logs, metrics, traces, or alerts
  • Operational ownership is unclear during failures or rollout
  • Reliability work needs better visibility before deeper changes

Inputs to gather

  • Critical workflows, failure modes, and current diagnostic signals
  • Existing logging, metrics, tracing, dashboards, and alerts
  • Operator needs during rollout, incident response, and debugging
  • Noise constraints and performance or cost considerations

How to work

  • Instrument the questions a responder will need answered during failure.
  • Prefer signals tied to user-impacting behavior over vanity metrics.
  • Make logs structured (machine-parseable fields rather than free text) and actionable (enough context to suggest a next step) when possible.
  • Add observability close to important boundaries and state transitions.
  • Keep signal quality high by avoiding low-value noise.
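
As one way to picture structured logs at a state transition, here is a minimal sketch using only the Python standard library. The logger name `orders`, the event name `order_state_changed`, the helper `transition`, and every field name are hypothetical and chosen for illustration; a real system would use its own vocabulary and may already have a logging library that does this.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Context passed as extra={"fields": {...}} is attached to the record.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Return a logger (hypothetical name) that writes JSON lines to `stream`."""
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def transition(logger, order_id, old, new, reason=None):
    """Log a state change with the context a responder is likely to ask for."""
    level = logging.ERROR if new == "failed" else logging.INFO
    logger.log(
        level,
        "order_state_changed",
        extra={"fields": {"order_id": order_id, "from": old, "to": new, "reason": reason}},
    )


if __name__ == "__main__":
    log = build_logger(sys.stdout)
    transition(log, "o-123", "pending", "paid")
    transition(log, "o-123", "paid", "failed", reason="inventory_timeout")
```

Each line answers "which entity, from what state, to what state, and why", so a responder can filter by `order_id` or by `to` without parsing free text.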

Output expectations

  • Improved observability or an operability plan for the target area
  • Clear explanation of what new signals reveal
  • Notes on alerting, dashboard, or rollout support when relevant
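
When alerting notes are relevant, it can help to state the alert condition as something executable. The sketch below assumes an alert on the failure ratio of recent user-facing requests; the class name `ErrorRateWindow`, the window size, the threshold, and the minimum sample count are all illustrative assumptions, not recommended values.

```python
from collections import deque


class ErrorRateWindow:
    """Track the most recent request outcomes and report the failure ratio."""

    def __init__(self, size: int = 100, threshold: float = 0.05, min_samples: int = 20):
        self.outcomes = deque(maxlen=size)  # True = success, False = failure
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_alert(self) -> bool:
        # Require a minimum sample count so one early failure does not page anyone.
        enough = len(self.outcomes) >= self.min_samples
        return enough and self.error_rate() > self.threshold
```

The minimum sample count is one simple guard against noisy alerts on low traffic; production alerting systems typically express the same idea as a rule over a time window.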

Quality checklist

  • Signals help detect and diagnose meaningful failures.
  • Instrumentation is focused and not excessively noisy.
  • Operational usage is considered, not just implementation convenience.
  • Added visibility maps to critical user or system outcomes.
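
One way to keep instrumentation from becoming noisy is to suppress repeats of the same message within a short interval. The following is a sketch built on the standard `logging.Filter` hook; the class name `RepeatLimiter` and the 60-second default are assumptions for illustration.

```python
import logging
import time


class RepeatLimiter(logging.Filter):
    """Drop a record if the same message template was emitted recently."""

    def __init__(self, interval_s: float = 60.0, clock=time.monotonic):
        super().__init__()
        self.interval_s = interval_s
        self.clock = clock  # injectable so the behavior can be tested
        self._last_emitted = {}

    def filter(self, record: logging.LogRecord) -> bool:
        # record.msg is the unformatted template, so "retrying %s" with
        # different arguments is treated as the same message.
        key = (record.name, record.levelno, record.msg)
        now = self.clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.interval_s:
            return False  # suppress the repeat
        self._last_emitted[key] = now
        return True
```

Attach it with `logger.addFilter(RepeatLimiter())`. Suppressing repeats trades completeness for readability, so it suits high-volume warnings better than audit-style events.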

Handoff notes

  • Mention which incidents or debugging tasks the new observability should make easier.
  • Pair with debugging workflow, incident response, or performance optimization when diagnosis is the main bottleneck.