2 Commits

Author SHA1 Message Date
6337182226 feat: Add enterprise system resilience and graceful degradation
Resolves CRITICAL #1 from code-health-report-2026-04-13.md

Changes:
- Add tenacity dependency for retry logic
- Create lib/resilience.py with:
  - resilient_http_call decorator (3 retries, exponential backoff 2s→4s→8s)
  - CircuitBreaker class (opens after 5 consecutive failures)
  - handle_404_gracefully decorator for safe resource lookups
- Apply retry decorators to all HTTP clients:
  - workday_client.py: get(), raas()
  - entra_client.py: get(), get_all_pages()
  - helix_client.py: get(), post()
  - intune_client.py: get()
  - lansweeper_client.py: gql()
  - fedex_client.py: post()
- Add graceful degradation to audit tools:
  - audit_user_drift(): Wrap Workday, AD, Entra calls separately
  - audit_device_drift(): Wrap Lansweeper, Intune, Helix calls separately
  - Both now return systems_available and systems_failed fields
- Create check_system_health() tool for proactive monitoring
- Add comprehensive unit tests for resilience module

Benefits:
- HTTP clients now automatically retry transient failures (5xx, timeouts)
- Circuit breaker prevents hammering failing services (fast-fail after threshold)
- Audit tools continue with partial data if some systems unavailable
- Health check tool enables proactive system monitoring before bulk audits
2026-04-13 10:54:06 -04:00
b23cd1f2e2 Added new 'feature-add' prompt 2026-04-13 10:34:49 -04:00