From a23a8581eef35a3309edc5f90df2b4a250144e5b Mon Sep 17 00:00:00 2001
From: nathan
Date: Sat, 11 Apr 2026 23:56:43 -0400
Subject: [PATCH] docs: reorganize documentation into KBAs/ and SOPs/ subdirectories

- documentation/KBAs/: Created subdirectory for Knowledge Base Articles
- documentation/SOPs/: Created subdirectory for Standard Operating Procedures
- documentation/README.md: Updated to reflect new structure with section descriptions
- Moved KBA-001 to KBAs/ folder
- Created SOP-001 (Migrate Stack from UI to Git) in SOPs/ folder
- Fixed all cross-reference links to use correct relative paths (../)

Improves documentation organization by separating troubleshooting guides (KBAs) from procedural guides (SOPs), making it easier to navigate and maintain the knowledge base as it grows.
---
 ...Komodo-GitOps-Stack-Deployment-Failures.md |   3 +-
 documentation/README.md                       |  34 +-
 .../SOP-001-Migrate-Stack-from-UI-to-Git.md   | 464 ++++++++++++++++++
 3 files changed, 489 insertions(+), 12 deletions(-)
 rename documentation/{ => KBAs}/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md (97%)
 create mode 100644 documentation/SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md

diff --git a/documentation/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md b/documentation/KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md
similarity index 97%
rename from documentation/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md
rename to documentation/KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md
index 5615003..b74e796 100644
--- a/documentation/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md
+++ b/documentation/KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md
@@ -222,8 +222,7 @@ docker exec komodo-periphery-{node-name} cp \
 
 ## Related Documentation
 
-- [TECHNICAL_RUNBOOK.md](TECHNICAL_RUNBOOK.md) - Infrastructure overview and emergency procedures
-- [LLM_HANDOVER.md](LLM_HANDOVER.md) - Quick-start context for AI sessions
+- [TECHNICAL_RUNBOOK.md](../TECHNICAL_RUNBOOK.md) - Infrastructure overview and emergency procedures
 - Repository Memory: `/memories/repo/critical-fixes.md`
 
 ---
diff --git a/documentation/README.md b/documentation/README.md
index d704fb0..657a272 100644
--- a/documentation/README.md
+++ b/documentation/README.md
@@ -15,20 +15,33 @@ This directory contains all technical documentation for the Castaldi Family Home
 
 ---
 
-## Knowledge Base Articles (KBA)
+## Knowledge Base Articles (KBAs/)
+
+Structured troubleshooting articles following the incident → resolution format.
 
 ### GitOps & Deployment
 
-- **[KBA-001: Komodo GitOps Stack Deployment Failures](KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md)**
+- **[KBA-001: Komodo GitOps Stack Deployment Failures](KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md)**
   Troubleshooting guide for Git-linked stack pull/deploy failures, canonicalize errors, and Docker image tag issues.
 
 ---
 
+## Standard Operating Procedures (SOPs/)
+
+Step-by-step guides for operational tasks and migrations.
+
+### Stack Management
+
+- **[SOP-001: Migrate Stack from UI to Git](SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md)**
+  Complete guide for converting Komodo stacks from UI-defined to Git-based deployment, including secrets management and verification steps.
+
+---
+
 ## Document Conventions
 
-- **KBA-XXX:** Structured troubleshooting articles following the incident → resolution format
-- **Runbooks:** Procedural guides for operations and emergency response
-- **Handover Documents:** Context summaries for knowledge transfer
+- **KBA-XXX:** Troubleshooting articles with clear problem/solution format (stored in `KBAs/`)
+- **SOP-XXX:** Procedural guides for operational tasks (stored in `SOPs/`)
+- **Runbooks:** Infrastructure reference and emergency procedures (root level)
 
 ---
 
@@ -36,11 +49,12 @@ This directory contains all technical documentation for the Castaldi Family Home
 
 When documenting new issues or procedures:
 
-1. **KBA Format:** Use for troubleshooting scenarios with clear problem/solution
-2. **Update Runbook:** Add new emergency procedures to TECHNICAL_RUNBOOK.md
-3. **Update Repository Memory:** Store critical lessons in `/memories/repo/`
-4. **Commit Messages:** Use conventional commits (e.g., `docs: Add KBA-002 for...`)
+1. **KBAs:** Create in `KBAs/` folder for troubleshooting scenarios with clear diagnosis → resolution
+2. **SOPs:** Create in `SOPs/` folder for repeatable operational procedures and migrations
+3. **Update Runbook:** Add new emergency procedures to TECHNICAL_RUNBOOK.md
+4. **Update Repository Memory:** Store critical lessons in `/memories/repo/`
+5. **Commit Messages:** Use conventional commits (e.g., `docs(kba): add KBA-002 for...`)
 
 ---
 
-**Last Updated:** April 12, 2026
+**Last Updated:** April 11, 2026
diff --git a/documentation/SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md b/documentation/SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md
new file mode 100644
index 0000000..e98699b
--- /dev/null
+++ b/documentation/SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md
@@ -0,0 +1,464 @@
+# SOP-001: Migrate Komodo Stack from UI-Defined to Git-Based
+
+**Status:** Active
+**Created:** April 11, 2026
+**Last Updated:** April 11, 2026
+**Owner:** Nathan Castaldi
+**Applies To:** All Komodo-managed Docker Compose stacks
+
+---
+
+## Purpose
+
+Convert an existing Komodo stack from UI-defined (manual configuration) to Git-based (GitOps workflow) to enable:
+- Version control and change tracking
+- Automated deployments via webhooks
+- Easier rollback and disaster recovery
+- Multi-environment consistency
+
+---
+
+## Prerequisites
+
+### Required Access
+
+- [ ] SSH access to the target node (e.g., `waldorf`, `heimdall`, `watchtower`)
+- [ ] Komodo UI access (`komodo.castaldifamily.com`)
+- [ ] Gitea repository access (`git.castaldifamily.com/nathan/homelab`)
+- [ ] Git configured on your local machine
+
+### Required Infrastructure
+
+- [ ] Komodo Periphery has `/etc/komodo/repos` volume mounted (see [KBA-001](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md))
+- [ ] Gitea webhooks configured and verified
+- [ ] NFS mounts operational (if using shared storage)
+
+---
+
+## Pre-Migration Data Collection
+
+Before making any changes, **capture the current stack configuration** from Komodo UI:
+
+### Step 1: Export Stack Configuration
+
+1. **Navigate to Komodo UI** → Stacks → Select your stack
+2. **Copy the following information:**
+
+   | Field | Value | Notes |
+   |-------|-------|-------|
+   | **Stack Name** | _______________ | (e.g., `plex`, `tunarr`) |
+   | **Node Name** | _______________ | (e.g., `waldorf`, `heimdall`) |
+   | **Container Name(s)** | _______________ | From compose services |
+   | **Image(s)** | _______________ | **Critical:** Note exact tag |
+   | **Ports** | _______________ | Host:Container mappings |
+   | **Volumes** | _______________ | Full paths (host → container) |
+   | **Environment Variables** | _______________ | **Secrets go here, NOT in Git** |
+   | **Network Mode** | _______________ | (bridge/host/custom) |
+   | **Restart Policy** | _______________ | (unless-stopped/always/no) |
+   | **Labels** | _______________ | Traefik, Komodo, etc. |
+   | **Devices** | _______________ | GPU passthrough, /dev/dri, etc. |
+
+3. **Copy the complete `compose.yaml`** from the stack editor
+4. **Take a screenshot** of the environment variables section (contains secrets)
+
+---
+
+## Migration Procedure
+
+### Step 2: Create Repository Directory Structure
+
+On your **local machine** (or via Working Copy on iPad):
+
+```bash
+# Navigate to homelab repo
+cd ~/homelab # Or your local path
+
+# Create the directory structure
+mkdir -p nodes/{node-name}/{stack-name}
+
+# Example:
+mkdir -p nodes/waldorf/sonarr
+```
+
+**Directory naming convention:**
+- Node name: `heimdall`, `waldorf`, `watchtower`
+- Stack name: Lowercase, matches service (e.g., `plex`, `tunarr`, `sonarr`)
+
+---
+
+### Step 3: Create compose.yaml
+
+Create `nodes/{node-name}/{stack-name}/compose.yaml` with the **sanitized** configuration:
+
+```yaml
+services:
+  {service-name}:
+    image: {registry}/{image}:{tag} # ⚠️ NO 'v' prefix on tag
+    container_name: {container-name}
+    restart: unless-stopped # or 'always'
+    ports:
+      - {host-port}:{container-port}
+    environment:
+      - TZ=America/New_York
+      # ⚠️ DO NOT store passwords/tokens here
+      # Use Komodo UI Environment Variables instead
+    volumes:
+      - /mnt/appdata/{service}/config:/config
+      - /mnt/media/data:/data # If applicable
+    # Optional: GPU passthrough
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    # Optional: Device passthrough (Intel QuickSync, etc.)
+    devices:
+      - /dev/dri:/dev/dri
+    # Optional: Traefik labels
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.{service}.rule=Host(`{service}.castaldifamily.com`)"
+      - "komodo.managed=true"
+```
+
+**⚠️ Critical Configuration Rules:**
+
+1. **Image Tags:**
+   - ✅ Correct: `image: chrisbenincasa/tunarr:1.2.11`
+   - ❌ Wrong: `image: chrisbenincasa/tunarr:v1.2.11` (no `v` prefix)
+
+2. **Secrets Management:**
+   - ❌ **NEVER** commit passwords, API keys, or claim tokens to Git
+   - ✅ Use Komodo Stack Environment Variables for secrets
+   - ✅ Use placeholders in compose: `- PLEX_CLAIM=${PLEX_CLAIM}`
+
+3. **Volume Paths:**
+   - Use absolute paths starting with `/mnt/appdata/`
+   - Ensure paths exist on the target node
+   - Match UID/GID permissions (usually `1000:1000`)
+
+---
+
+### Step 4: Commit and Push to Git
+
+```bash
+# Stage the new files
+git add nodes/{node-name}/{stack-name}/
+
+# Commit with descriptive message
+git commit -m "feat(stacks): migrate {stack-name} to Git-based deployment
+
+- Created compose.yaml for {node-name}/{stack-name}
+- Extracted configuration from Komodo UI
+- Secrets managed via Komodo Environment Variables"
+
+# Push to Gitea
+git push origin main
+```
+
+---
+
+### Step 5: Reconfigure Stack in Komodo UI
+
+1. **Navigate to Komodo UI** → Stacks → Select your stack
+2. **Change stack type:**
+   - Click **Edit Stack** (gear icon)
+   - Change **Source Type** from `Manual` to `Git Repo`
+
+3. **Configure Git settings:**
+
+   | Field | Value |
+   |-------|-------|
+   | **Repo** | `homelab` |
+   | **Branch** | `main` |
+   | **Run Directory** | `nodes/{node-name}/{stack-name}` |
+   | **File Paths** | *(leave blank - uses compose.yaml by default)* |
+
+4. **Re-add Environment Variables (Secrets):**
+   - Click **Environment Variables** tab
+   - Add back any secrets from your pre-migration notes:
+     ```
+     PLEX_CLAIM=claim-xxxxxxxxx
+     SONARR_API_KEY=xxxxxxxxxxxxxxxx
+     ```
+
+5. **Save Configuration**
+
+---
+
+### Step 6: Deploy and Verify
+
+1. **Pull from Git:**
+   - Click **Pull Stack** button
+   - Wait for success notification
+   - Verify "Last pulled" timestamp updated
+
+2. **Deploy Stack:**
+   - Click **Deploy Stack** button
+   - Monitor deployment logs in Komodo UI
+   - Wait for "Running" status
+
+3. **Verify on Target Node:**
+
+   ```bash
+   # SSH to the node
+   ssh chester@10.0.0.{node-ip}
+
+   # Check container status
+   docker ps | grep {container-name}
+
+   # Check logs for errors
+   docker logs {container-name} --tail 50
+
+   # Verify volumes mounted correctly
+   docker inspect {container-name} | grep -A 10 "Mounts"
+
+   # If GPU passthrough required:
+   docker exec {container-name} nvidia-smi
+   ```
+
+4. **Functional Testing:**
+   - Access the service via browser/app
+   - Verify data persistence (configs, databases)
+   - Test core functionality (playback, API calls, etc.)
+
+---
+
+## Verification Checklist
+
+After migration, confirm:
+
+- [ ] Container is running (`docker ps`)
+- [ ] Service accessible via web UI/API
+- [ ] Data persists across container restart
+- [ ] Environment variables applied correctly
+- [ ] GPU accessible (if applicable): `docker exec {name} nvidia-smi`
+- [ ] Logs show no mount/permission errors
+- [ ] Traefik routing works (if applicable)
+- [ ] Auto-deployment triggers on Git push
+
+---
+
+## Rollback Procedure
+
+If migration fails, revert to UI-defined stack:
+
+1. **In Komodo UI:**
+   - Edit Stack → Change **Source Type** back to `Manual`
+   - Paste original compose.yaml from pre-migration backup
+   - Re-add environment variables
+   - Deploy stack
+
+2. **On Target Node (Emergency):**
+   ```bash
+   # Navigate to stack directory
+   cd /etc/komodo/stacks/{stack-name}
+
+   # Manually restore compose.yaml
+   nano compose.yaml # Paste backup content
+
+   # Restart manually
+   docker compose down
+   docker compose up -d
+   ```
+
+---
+
+## Post-Migration Tasks
+
+### Test Auto-Deployment
+
+1. Make a minor change to the compose file (e.g., update comment)
+2. Commit and push to Git
+3. Verify Komodo auto-pulls and redeploys (check timestamps)
+4. If auto-deploy doesn't work, manually click "Pull" → "Deploy"
+
+### Update Documentation
+
+- [ ] Add stack to node README (e.g., `nodes/waldorf/README.md`)
+- [ ] Document any special configuration requirements
+- [ ] Update infrastructure diagrams if necessary
+
+---
+
+## Common Issues
+
+### Issue: "Image not found" after deploy
+
+**Cause:** Docker tag has `v` prefix (e.g., `v1.2.11` instead of `1.2.11`)
+
+**Fix:**
+```yaml
+# In compose.yaml, remove 'v' prefix
+image: user/app:1.2.11 # Not v1.2.11
+```
+
+---
+
+### Issue: Environment variables not applied
+
+**Cause:** Secrets defined in Git-based compose.yaml instead of Komodo UI
+
+**Fix:**
+1. Remove secrets from `compose.yaml` in Git
+2. Use placeholders: `- API_KEY=${API_KEY}`
+3. Add actual values in Komodo UI → Stack → Environment Variables
+
+---
+
+### Issue: Pull Stack button doesn't update files
+
+**Cause:** Known issue (see [KBA-001](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md))
+
+**Workaround:**
+```bash
+# SSH to node
+docker exec komodo-periphery-{node} sh -c \
+  'cp /etc/komodo/repos/homelab/nodes/{node}/{stack}/compose.yaml \
+  /etc/komodo/stacks/{stack}/compose.yaml'
+```
+
+---
+
+### Issue: Permission denied on volume mounts
+
+**Cause:** UID/GID mismatch between container and host directory
+
+**Fix:**
+```bash
+# On target node, set correct ownership
+sudo chown -R 1000:1000 /mnt/appdata/{service}/
+```
+
+---
+
+## Security Considerations
+
+### Secrets Management
+
+| ❌ DO NOT | ✅ DO |
+|-----------|-------|
+| Commit passwords to Git | Use Komodo Environment Variables |
+| Store API keys in compose.yaml | Use `${VAR_NAME}` placeholders |
+| Hardcode claim tokens | Inject via Komodo UI |
+
+### Example: Plex Claim Token
+
+**Bad (in Git):**
+```yaml
+environment:
+  - PLEX_CLAIM=claim-sxFpsPTDzzF-9RZAxtUL # ❌ Exposed in Git
+```
+
+**Good (in Git):**
+```yaml
+environment:
+  - PLEX_CLAIM=${PLEX_CLAIM} # ✅ Placeholder
+```
+
+**In Komodo UI:**
+- Environment Variables → Add: `PLEX_CLAIM=claim-sxFpsPTDzzF-9RZAxtUL`
+
+---
+
+## Related Documentation
+
+- [KBA-001: Komodo GitOps Stack Deployment Failures](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md)
+- [TECHNICAL_RUNBOOK.md](../TECHNICAL_RUNBOOK.md) - Infrastructure overview
+- Repository Memory: `/memories/repo/active-tasks.md`
+
+---
+
+## Appendix: Example Migrations
+
+### Example 1: Simple Service (No GPU)
+
+**Stack:** Sonarr on Waldorf
+
+```yaml
+services:
+  sonarr:
+    image: lscr.io/linuxserver/sonarr:latest
+    container_name: sonarr
+    restart: unless-stopped
+    ports:
+      - 8989:8989
+    environment:
+      - PUID=1000
+      - PGID=1000
+      - TZ=America/New_York
+    volumes:
+      - /mnt/appdata/sonarr:/config
+      - /mnt/media/tvshows:/tv
+      - /mnt/media/downloads:/downloads
+```
+
+---
+
+### Example 2: GPU Transcoding Service
+
+**Stack:** Plex on Waldorf (NVIDIA GTX 1060)
+
+```yaml
+services:
+  plex:
+    image: lscr.io/linuxserver/plex:latest
+    container_name: plex
+    network_mode: host
+    restart: unless-stopped
+    environment:
+      - PUID=1000
+      - PGID=1000
+      - TZ=America/New_York
+      - VERSION=docker
+      - PLEX_CLAIM=${PLEX_CLAIM} # Set in Komodo UI
+      - NVIDIA_VISIBLE_DEVICES=all
+      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
+    volumes:
+      - /mnt/appdata/plex:/config
+      - /mnt/media/tvshows:/tv
+      - /mnt/media/movies:/movies
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+```
+
+---
+
+### Example 3: Traefik-Routed Service
+
+**Stack:** Custom App on Heimdall (Behind Traefik)
+
+```yaml
+services:
+  myapp:
+    image: myregistry/myapp:2.1.0
+    container_name: myapp
+    restart: unless-stopped
+    ports:
+      - 3000:3000
+    environment:
+      - NODE_ENV=production
+      - DATABASE_URL=${DATABASE_URL} # Secret in Komodo UI
+    volumes:
+      - /mnt/appdata/myapp/data:/app/data
+    labels:
+      - "traefik.enable=true"
+      - "traefik.http.routers.myapp.rule=Host(`myapp.castaldifamily.com`)"
+      - "traefik.http.routers.myapp.entrypoints=websecure"
+      - "traefik.http.routers.myapp.tls=true"
+      - "traefik.http.routers.myapp.tls.certresolver=cloudflare"
+      - "traefik.http.services.myapp.loadbalancer.server.port=3000"
+```
+
+---
+
+**Document Version:** 1.0
+**Tested On:** Komodo v2.1.2, Gitea v1.21
+**Validation Status:** Production-Ready ✅
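
Review note on the secrets rule in SOP-001: the "never commit passwords, API keys, or claim tokens" rule could be backed by a quick check before `git add`. The sketch below is not part of the patch. The function name `find_hardcoded_secrets` and the list of secret-like name fragments are assumptions chosen for illustration, and the match is a heuristic: it only inspects list-style `- NAME=value` entries and will miss secrets under other names (for example `DATABASE_URL`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the SOP): list environment entries in a
# compose file whose name looks secret-like and whose value is NOT a
# ${VAR} placeholder. Prints "line-number:entry" for each hit.
# Exit status is 0 when at least one suspect entry is found, 1 otherwise.
find_hardcoded_secrets() {
  local file="$1"
  # First grep: "- NAME=..." entries whose NAME contains a secret-like
  # fragment (assumed list; extend as needed).
  # Second grep: drop entries whose value is a ${VAR} placeholder.
  grep -nE '^[[:space:]]*-[[:space:]]*[A-Za-z0-9_]*(PASSWORD|TOKEN|SECRET|CLAIM|API_KEY)[A-Za-z0-9_]*=' "$file" \
    | grep -vE '=[[:space:]]*"?\$\{[A-Za-z0-9_]+\}'
}
```

One possible use, with the path taken from the SOP's directory convention: run `find_hardcoded_secrets nodes/waldorf/plex/compose.yaml` before committing and treat any printed line as a value to move into Komodo's Environment Variables.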