feat: Add enterprise system resilience and graceful degradation

Resolves CRITICAL #1 from code-health-report-2026-04-13.md

Changes:
- Add tenacity dependency for retry logic
- Create lib/resilience.py with:
  - resilient_http_call decorator (3 retries, exponential backoff 2s→4s→8s)
  - CircuitBreaker class (opens after 5 consecutive failures)
  - handle_404_gracefully decorator for safe resource lookups
- Apply retry decorators to all HTTP clients:
  - workday_client.py: get(), raas()
  - entra_client.py: get(), get_all_pages()
  - helix_client.py: get(), post()
  - intune_client.py: get()
  - lansweeper_client.py: gql()
  - fedex_client.py: post()
- Add graceful degradation to audit tools:
  - audit_user_drift(): Wrap Workday, AD, Entra calls separately
  - audit_device_drift(): Wrap Lansweeper, Intune, Helix calls separately
  - Both now return systems_available and systems_failed fields
- Create check_system_health() tool for proactive monitoring
- Add comprehensive unit tests for resilience module
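
Illustrative sketch of the circuit-breaker behavior listed above (open after 5 consecutive failures, then fail fast). This is not the code in lib/resilience.py. The reset_timeout value, the clock parameter, the call() method, and CircuitOpenError are assumptions made for the example, and it uses only the standard library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open (hypothetical name)."""


class CircuitBreaker:
    """Minimal sketch: opens after N consecutive failures, then fails fast."""

    def __init__(self, failure_threshold=5, reset_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # assumed value; not stated in this commit
        self._clock = clock                 # injectable so tests can control time
        self._failures = 0
        self._opened_at = None

    @property
    def is_open(self):
        # Open only while the reset timeout has not yet elapsed.
        if self._opened_at is None:
            return False
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func, *args, **kwargs):
        if self.is_open:
            # Fail fast without touching the downstream service.
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success resets the consecutive-failure count and closes the circuit.
        self._failures = 0
        self._opened_at = None
        return result
```

Once the timeout elapses, the next call is allowed through as a trial; a further failure reopens the circuit immediately because the failure count is still at or above the threshold.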

Benefits:
- HTTP clients now automatically retry transient failures (5xx, timeouts)
- Circuit breaker prevents hammering failing services (fast-fail after threshold)
- Audit tools continue with partial data when some systems are unavailable
- Health check tool enables proactive system monitoring before bulk audits
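
Illustrative sketch of the partial-data pattern described above: each system is queried in its own try/except, so one outage does not abort the audit. The helper name collect_from_systems, the fetchers mapping, and the shape of each systems_failed entry are assumptions for the example. Only the systems_available and systems_failed field names come from this change.

```python
def collect_from_systems(fetchers):
    """Call each fetcher independently and record which systems responded.

    fetchers: dict mapping a system name to a zero-argument callable
    (hypothetical interface, not the audit tools' real signature).
    """
    data, available, failed = {}, [], []
    for name, fetch in fetchers.items():
        try:
            data[name] = fetch()
            available.append(name)
        except Exception as exc:
            # Record the failure and keep going with the remaining systems.
            failed.append({"system": name, "error": str(exc)})
    return {
        "data": data,
        "systems_available": available,
        "systems_failed": failed,
    }


def workday_down():
    raise TimeoutError("Workday timed out")


# One system is down; the report still carries the data that was reachable.
report = collect_from_systems({
    "workday": workday_down,
    "entra": lambda: [{"upn": "a@example.com"}],
})
```

A caller can check systems_failed before trusting a "no drift" result, since missing data from one system may hide differences.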