Initial commit from agent

2026-03-24 00:11:34 -05:00
commit 0c777488d3
69 changed files with 4253 additions and 0 deletions

@@ -0,0 +1,45 @@
# Observability and Operability
## Purpose
Make systems easier to understand, debug, and run by improving signals, diagnostics, and operational readiness around important behavior.
## When to use
- A system is hard to diagnose in production or staging
- New functionality needs useful logs, metrics, traces, or alerts
- Operational ownership is unclear during failures or rollout
- Reliability work needs better visibility before deeper changes
## Inputs to gather
- Critical workflows, failure modes, and current diagnostic signals
- Existing logging, metrics, tracing, dashboards, and alerts
- Operator needs during rollout, incident response, and debugging
- Noise constraints and performance or cost considerations
## How to work
- Instrument the questions a responder will need answered during failure.
- Prefer signals tied to user-impacting behavior over vanity metrics.
- Make logs structured (consistent, machine-parseable fields) and actionable when possible.
- Add observability close to important boundaries and state transitions.
- Keep signal quality high by avoiding low-value noise.
## Output expectations
- Improved observability or an operability plan for the target area
- Clear explanation of what new signals reveal
- Notes on alerting, dashboard, or rollout support when relevant
## Quality checklist
- Signals help detect and diagnose meaningful failures.
- Instrumentation is focused and not excessively noisy.
- Operational usage is considered, not just implementation convenience.
- Added visibility maps to critical user or system outcomes.
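
One way to check a signal against this list is to tie it directly to a workflow outcome and gate alerting on sample size. The sketch below is illustrative only: `OutcomeTracker`, the outcome labels, and the threshold parameters are assumptions, and in practice the counts would come from an existing metrics system rather than an in-memory counter.

```python
from collections import Counter


class OutcomeTracker:
    """Count outcomes of one user-facing workflow (hypothetical helper)."""

    def __init__(self, workflow: str) -> None:
        self.workflow = workflow
        self.counts = Counter()

    def record(self, outcome: str) -> None:
        # Outcome labels such as "success" and "failure" are illustrative.
        self.counts[outcome] += 1

    def failure_ratio(self) -> float:
        """Return the share of recorded outcomes that were failures."""
        total = sum(self.counts.values())
        return self.counts["failure"] / total if total else 0.0

    def should_alert(self, threshold: float, min_samples: int) -> bool:
        """Alert only with enough traffic, to avoid noisy low-volume pages."""
        total = sum(self.counts.values())
        return total >= min_samples and self.failure_ratio() > threshold
```

The `min_samples` guard reflects the noise concern: a single failure out of two requests should not trigger the same response as a sustained failure rate under real load.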
## Handoff notes
- Mention what incidents or debugging tasks the new observability should make easier.
- Pair with debugging workflow, incident response, or performance optimization when diagnosis is the main bottleneck.