docs: reorganize documentation into KBAs/ and SOPs/ subdirectories

- documentation/KBAs/: Created subdirectory for Knowledge Base Articles
- documentation/SOPs/: Created subdirectory for Standard Operating Procedures
- documentation/README.md: Updated to reflect new structure with section descriptions
- Moved KBA-001 to KBAs/ folder
- Created SOP-001 (Migrate Stack from UI to Git) in SOPs/ folder
- Fixed all cross-reference links to use correct relative paths (../)

Improves documentation organization by separating troubleshooting guides (KBAs) from procedural guides (SOPs), making it easier to navigate and maintain the knowledge base as it grows.
nathan 2026-04-11 23:56:43 -04:00
parent 58cde988da
commit a23a8581ee
3 changed files with 489 additions and 12 deletions

@@ -222,8 +222,7 @@ docker exec komodo-periphery-{node-name} cp \
## Related Documentation
- [TECHNICAL_RUNBOOK.md](TECHNICAL_RUNBOOK.md) - Infrastructure overview and emergency procedures
- [LLM_HANDOVER.md](LLM_HANDOVER.md) - Quick-start context for AI sessions
- [TECHNICAL_RUNBOOK.md](../TECHNICAL_RUNBOOK.md) - Infrastructure overview and emergency procedures
- Repository Memory: `/memories/repo/critical-fixes.md`
---

@@ -15,20 +15,33 @@ This directory contains all technical documentation for the Castaldi Family Home
---
## Knowledge Base Articles (KBA)
## Knowledge Base Articles (KBAs/)
Structured troubleshooting articles following the incident → resolution format.
### GitOps & Deployment
- **[KBA-001: Komodo GitOps Stack Deployment Failures](KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md)**
- **[KBA-001: Komodo GitOps Stack Deployment Failures](KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md)**
Troubleshooting guide for Git-linked stack pull/deploy failures, canonicalize errors, and Docker image tag issues.
---
## Standard Operating Procedures (SOPs/)
Step-by-step guides for operational tasks and migrations.
### Stack Management
- **[SOP-001: Migrate Stack from UI to Git](SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md)**
Complete guide for converting Komodo stacks from UI-defined to Git-based deployment, including secrets management and verification steps.
---
## Document Conventions
- **KBA-XXX:** Structured troubleshooting articles following the incident → resolution format
- **Runbooks:** Procedural guides for operations and emergency response
- **Handover Documents:** Context summaries for knowledge transfer
- **KBA-XXX:** Troubleshooting articles with clear problem/solution format (stored in `KBAs/`)
- **SOP-XXX:** Procedural guides for operational tasks (stored in `SOPs/`)
- **Runbooks:** Infrastructure reference and emergency procedures (root level)
---
@@ -36,11 +49,12 @@ This directory contains all technical documentation for the Castaldi Family Home
When documenting new issues or procedures:
1. **KBA Format:** Use for troubleshooting scenarios with clear problem/solution
2. **Update Runbook:** Add new emergency procedures to TECHNICAL_RUNBOOK.md
3. **Update Repository Memory:** Store critical lessons in `/memories/repo/`
4. **Commit Messages:** Use conventional commits (e.g., `docs: Add KBA-002 for...`)
1. **KBAs:** Create in `KBAs/` folder for troubleshooting scenarios with clear diagnosis → resolution
2. **SOPs:** Create in `SOPs/` folder for repeatable operational procedures and migrations
3. **Update Runbook:** Add new emergency procedures to TECHNICAL_RUNBOOK.md
4. **Update Repository Memory:** Store critical lessons in `/memories/repo/`
5. **Commit Messages:** Use conventional commits (e.g., `docs(kba): add KBA-002 for...`)
---
**Last Updated:** April 12, 2026
**Last Updated:** April 11, 2026

@@ -0,0 +1,464 @@
# SOP-001: Migrate Komodo Stack from UI-Defined to Git-Based
**Status:** Active
**Created:** April 11, 2026
**Last Updated:** April 11, 2026
**Owner:** Nathan Castaldi
**Applies To:** All Komodo-managed Docker Compose stacks
---
## Purpose
Convert an existing Komodo stack from UI-defined (manual configuration) to Git-based (GitOps workflow) to enable:
- Version control and change tracking
- Automated deployments via webhooks
- Easier rollback and disaster recovery
- Multi-environment consistency
---
## Prerequisites
### Required Access
- [ ] SSH access to the target node (e.g., `waldorf`, `heimdall`, `watchtower`)
- [ ] Komodo UI access (`komodo.castaldifamily.com`)
- [ ] Gitea repository access (`git.castaldifamily.com/nathan/homelab`)
- [ ] Git configured on your local machine
### Required Infrastructure
- [ ] Komodo Periphery has `/etc/komodo/repos` volume mounted (see [KBA-001](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md))
- [ ] Gitea webhooks configured and verified
- [ ] NFS mounts operational (if using shared storage)
---
## Pre-Migration Data Collection
Before making any changes, **capture the current stack configuration** from the Komodo UI:
### Step 1: Export Stack Configuration
1. **Navigate to Komodo UI** → Stacks → Select your stack
2. **Copy the following information:**
| Field | Value | Notes |
|-------|-------|-------|
| **Stack Name** | _______________ | (e.g., `plex`, `tunarr`) |
| **Node Name** | _______________ | (e.g., `waldorf`, `heimdall`) |
| **Container Name(s)** | _______________ | From compose services |
| **Image(s)** | _______________ | **Critical:** Note exact tag |
| **Ports** | _______________ | Host:Container mappings |
| **Volumes** | _______________ | Full paths (host → container) |
| **Environment Variables** | _______________ | **Secrets go here, NOT in Git** |
| **Network Mode** | _______________ | (bridge/host/custom) |
| **Restart Policy** | _______________ | (unless-stopped/always/no) |
| **Labels** | _______________ | Traefik, Komodo, etc. |
| **Devices** | _______________ | GPU passthrough, /dev/dri, etc. |
3. **Copy the complete `compose.yaml`** from the stack editor
4. **Take a screenshot** of the environment variables section (it contains secrets, so keep it out of the repository and any shared folders)
---
## Migration Procedure
### Step 2: Create Repository Directory Structure
On your **local machine** (or via Working Copy on iPad):
```bash
# Navigate to homelab repo
cd ~/homelab # Or your local path
# Create the directory structure
mkdir -p nodes/{node-name}/{stack-name}
# Example:
mkdir -p nodes/waldorf/sonarr
```
**Directory naming convention:**
- Node name: `heimdall`, `waldorf`, `watchtower`
- Stack name: Lowercase, matches service (e.g., `plex`, `tunarr`, `sonarr`)
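The naming convention above can be checked before the directory is created. The following is a minimal sketch, not part of the original procedure: `valid_name`, `stack_dir`, and `KNOWN_NODES` are hypothetical names, and the rule that names contain only lowercase letters, digits, and hyphens is an assumption.

```bash
# Hypothetical helper: build the repo path for a stack and reject names
# that break the convention. KNOWN_NODES mirrors the node list above.
KNOWN_NODES="heimdall waldorf watchtower"

valid_name() {
  # Non-empty, and nothing is left after deleting the allowed characters
  # (assumed rule: lowercase letters, digits, hyphens).
  [ -n "$1" ] && [ -z "$(printf '%s' "$1" | LC_ALL=C tr -d 'a-z0-9-')" ]
}

stack_dir() {
  # $1 = node name, $2 = stack name
  if ! valid_name "$1" || ! valid_name "$2"; then
    echo "invalid node or stack name: $1/$2" >&2
    return 1
  fi
  case " $KNOWN_NODES " in
    *" $1 "*) ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
  printf 'nodes/%s/%s\n' "$1" "$2"
}

# Usage (run from the repo root):
#   mkdir -p "$(stack_dir waldorf sonarr)"
```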
---
### Step 3: Create compose.yaml
Create `nodes/{node-name}/{stack-name}/compose.yaml` with the **sanitized** configuration:
```yaml
services:
  {service-name}:
    image: {registry}/{image}:{tag}  # ⚠️ NO 'v' prefix on tag
    container_name: {container-name}
    restart: unless-stopped  # or 'always'
    ports:
      - {host-port}:{container-port}
    environment:
      - TZ=America/New_York
      # ⚠️ DO NOT store passwords/tokens here
      # Use Komodo UI Environment Variables instead
    volumes:
      - /mnt/appdata/{service}/config:/config
      - /mnt/media/data:/data  # If applicable
    # Optional: GPU passthrough
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # Optional: Device passthrough (Intel QuickSync, etc.)
    devices:
      - /dev/dri:/dev/dri
    # Optional: Traefik labels
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.{service}.rule=Host(`{service}.castaldifamily.com`)"
      - "komodo.managed=true"
```
**⚠️ Critical Configuration Rules:**
1. **Image Tags:**
- ✅ Correct: `image: chrisbenincasa/tunarr:1.2.11`
- ❌ Wrong: `image: chrisbenincasa/tunarr:v1.2.11` (the registry tag has no `v` prefix; copy the tag exactly as published)
2. **Secrets Management:**
- ❌ **NEVER** commit passwords, API keys, or claim tokens to Git
- ✅ Use Komodo Stack Environment Variables for secrets
- ✅ Use placeholders in compose: `- PLEX_CLAIM=${PLEX_CLAIM}`
3. **Volume Paths:**
- Use absolute host paths (application data lives under `/mnt/appdata/`)
- Ensure paths exist on the target node
- Match UID/GID permissions (usually `1000:1000`)
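The first two rules can be checked mechanically before committing. This is a rough sketch under stated assumptions: `find_v_tags` and `find_inline_secrets` are hypothetical helpers, and the list of secret-looking variable names is a heuristic that will miss some secrets and flag some harmless values. A `v`-prefixed tag is only flagged for review, since some images do publish tags that way.

```bash
# Print image lines whose tag starts with "v" followed by a digit
# (for example ":v1.2.11"). Exit status 0 means at least one hit.
find_v_tags() {
  grep -nE '^[[:space:]]*image:[[:space:]]*[^[:space:]#]+:v[0-9]' "$1"
}

# Print environment entries whose name looks like a secret and whose value
# is a literal rather than a ${VAR} placeholder. The name list is a guess.
find_inline_secrets() {
  grep -nEi '^[[:space:]]*-[[:space:]]*[A-Z0-9_]*(PASSWORD|TOKEN|SECRET|API_KEY|CLAIM)[A-Z0-9_]*=' "$1" \
    | grep -vE '=[[:space:]]*\$\{[A-Za-z_][A-Za-z0-9_]*\}'
}

# Usage:
#   find_v_tags nodes/waldorf/sonarr/compose.yaml
#   find_inline_secrets nodes/waldorf/sonarr/compose.yaml
```

Both functions print matching lines with their line numbers, so an empty result means nothing was flagged.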
---
### Step 4: Commit and Push to Git
```bash
# Stage the new files
git add nodes/{node-name}/{stack-name}/
# Commit with descriptive message
git commit -m "feat(stacks): migrate {stack-name} to Git-based deployment
- Created compose.yaml for {node-name}/{stack-name}
- Extracted configuration from Komodo UI
- Secrets managed via Komodo Environment Variables"
# Push to Gitea
git push origin main
```
---
### Step 5: Reconfigure Stack in Komodo UI
1. **Navigate to Komodo UI** → Stacks → Select your stack
2. **Change stack type:**
- Click **Edit Stack** (gear icon)
- Change **Source Type** from `Manual` to `Git Repo`
3. **Configure Git settings:**
| Field | Value |
|-------|-------|
| **Repo** | `homelab` |
| **Branch** | `main` |
| **Run Directory** | `nodes/{node-name}/{stack-name}` |
| **File Paths** | *(leave blank - uses compose.yaml by default)* |
4. **Re-add Environment Variables (Secrets):**
- Click **Environment Variables** tab
- Add back any secrets from your pre-migration notes:
```
PLEX_CLAIM=claim-xxxxxxxxx
SONARR_API_KEY=xxxxxxxxxxxxxxxx
```
5. **Save Configuration**
---
### Step 6: Deploy and Verify
1. **Pull from Git:**
- Click **Pull Stack** button
- Wait for success notification
- Verify "Last pulled" timestamp updated
2. **Deploy Stack:**
- Click **Deploy Stack** button
- Monitor deployment logs in Komodo UI
- Wait for "Running" status
3. **Verify on Target Node:**
```bash
# SSH to the node
ssh chester@10.0.0.{node-ip}
# Check container status
docker ps | grep {container-name}
# Check logs for errors
docker logs {container-name} --tail 50
# Verify volumes mounted correctly
docker inspect {container-name} | grep -A 10 "Mounts"
# If GPU passthrough required:
docker exec {container-name} nvidia-smi
```
4. **Functional Testing:**
- Access the service via browser/app
- Verify data persistence (configs, databases)
- Test core functionality (playback, API calls, etc.)
---
## Verification Checklist
After migration, confirm:
- [ ] Container is running (`docker ps`)
- [ ] Service accessible via web UI/API
- [ ] Data persists across container restart
- [ ] Environment variables applied correctly
- [ ] GPU accessible (if applicable): `docker exec {container-name} nvidia-smi`
- [ ] Logs show no mount/permission errors
- [ ] Traefik routing works (if applicable)
- [ ] Auto-deployment triggers on Git push
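The first and sixth checks can be scripted on the target node. The sketch below is illustrative only: `name_in_list`, `container_running`, and `logs_clean` are hypothetical helpers, and the error phrases searched for are an assumed, incomplete list. Matching the exact container name avoids the false positives that `docker ps | grep` gives for names such as `sonarr-old`.

```bash
# Succeeds if the exact name $1 appears in a list of names on stdin.
name_in_list() {
  grep -Fxq -- "$1"
}

# Succeeds if a container with exactly this name is currently running.
container_running() {
  docker ps --format '{{.Names}}' | name_in_list "$1"
}

# Succeeds if the log text on stdin shows none of the assumed
# mount/permission error phrases.
logs_clean() {
  ! grep -Eiq 'permission denied|no such file or directory|read-only file system'
}

# Usage:
#   container_running sonarr && echo "sonarr is running"
#   docker logs sonarr --tail 50 2>&1 | logs_clean || echo "check the logs"
```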
---
## Rollback Procedure
If migration fails, revert to UI-defined stack:
1. **In Komodo UI:**
- Edit Stack → Change **Source Type** back to `Manual`
- Paste original compose.yaml from pre-migration backup
- Re-add environment variables
- Deploy stack
2. **On Target Node (Emergency):**
```bash
# Navigate to stack directory
cd /etc/komodo/stacks/{stack-name}
# Manually restore compose.yaml
nano compose.yaml # Paste backup content
# Restart manually
docker compose down
docker compose up -d
```
---
## Post-Migration Tasks
### Test Auto-Deployment
1. Make a minor change to the compose file (e.g., update comment)
2. Commit and push to Git
3. Verify Komodo auto-pulls and redeploys (check timestamps)
4. If auto-deploy doesn't work, manually click "Pull" → "Deploy"
### Update Documentation
- [ ] Add stack to node README (e.g., `nodes/waldorf/README.md`)
- [ ] Document any special configuration requirements
- [ ] Update infrastructure diagrams if necessary
---
## Common Issues
### Issue: "Image not found" after deploy
**Cause:** The tag in `compose.yaml` has a `v` prefix that the published image tag lacks (e.g., `v1.2.11` instead of `1.2.11`)
**Fix:**
```yaml
# In compose.yaml, remove 'v' prefix
image: user/app:1.2.11 # Not v1.2.11
```
---
### Issue: Environment variables not applied
**Cause:** Secrets defined in Git-based compose.yaml instead of Komodo UI
**Fix:**
1. Remove secrets from `compose.yaml` in Git
2. Use placeholders: `- API_KEY=${API_KEY}`
3. Add actual values in Komodo UI → Stack → Environment Variables
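To see which values need to be added in the Komodo UI, the placeholders can be listed straight from the compose file. This is a small sketch, with `list_placeholders` as a hypothetical helper name; it only reports variable names and knows nothing about what Komodo currently has configured.

```bash
# Print the unique variable names referenced as ${NAME} in a compose file.
# ${NAME:-default} style references are reported as NAME as well.
list_placeholders() {
  grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*' "$1" | tr -d '${' | sort -u
}

# Usage:
#   list_placeholders nodes/waldorf/plex/compose.yaml
```

Each name printed should have a matching entry under the stack's Environment Variables.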
---
### Issue: Pull Stack button doesn't update files
**Cause:** Known issue (see [KBA-001](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md))
**Workaround:**
```bash
# SSH to node
docker exec komodo-periphery-{node} sh -c \
'cp /etc/komodo/repos/homelab/nodes/{node}/{stack}/compose.yaml \
/etc/komodo/stacks/{stack}/compose.yaml'
```
---
### Issue: Permission denied on volume mounts
**Cause:** UID/GID mismatch between container and host directory
**Fix:**
```bash
# On target node, set correct ownership
sudo chown -R 1000:1000 /mnt/appdata/{service}/
```
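Before changing ownership recursively, it can help to see which paths actually differ. The following is a hedged sketch: `find_wrong_owner` is a hypothetical helper, and the `1000:1000` default simply mirrors the value used in this SOP.

```bash
# Print every path under directory $1 whose owner or group differs from
# the expected uid:gid in $2 (defaults to 1000:1000).
find_wrong_owner() {
  set -- "$1" "${2:-1000:1000}"   # apply the default to $2
  find "$1" \( ! -user "${2%%:*}" -o ! -group "${2##*:}" \) -print
}

# Usage:
#   find_wrong_owner /mnt/appdata/sonarr
#   find_wrong_owner /mnt/appdata/plex 1000:1000
```

An empty result means ownership already matches and the `chown` is not the fix.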
---
## Security Considerations
### Secrets Management
| ❌ DO NOT | ✅ DO |
|-----------|-------|
| Commit passwords to Git | Use Komodo Environment Variables |
| Store API keys in compose.yaml | Use `${VAR_NAME}` placeholders |
| Hardcode claim tokens | Inject via Komodo UI |
### Example: Plex Claim Token
**Bad (in Git):**
```yaml
environment:
  - PLEX_CLAIM=claim-xxxxxxxxxxxxxxxxxxxx  # ❌ Exposed in Git
```
**Good (in Git):**
```yaml
environment:
  - PLEX_CLAIM=${PLEX_CLAIM}  # ✅ Placeholder
```
**In Komodo UI:**
- Environment Variables → Add: `PLEX_CLAIM=claim-xxxxxxxxxxxxxxxxxxxx`
---
## Related Documentation
- [KBA-001: Komodo GitOps Stack Deployment Failures](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md)
- [TECHNICAL_RUNBOOK.md](../TECHNICAL_RUNBOOK.md) - Infrastructure overview
- Repository Memory: `/memories/repo/active-tasks.md`
---
## Appendix: Example Migrations
### Example 1: Simple Service (No GPU)
**Stack:** Sonarr on Waldorf
```yaml
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:latest
    container_name: sonarr
    restart: unless-stopped
    ports:
      - 8989:8989
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York
    volumes:
      - /mnt/appdata/sonarr:/config
      - /mnt/media/tvshows:/tv
      - /mnt/media/downloads:/downloads
```
---
### Example 2: GPU Transcoding Service
**Stack:** Plex on Waldorf (NVIDIA GTX 1060)
```yaml
services:
  plex:
    image: lscr.io/linuxserver/plex:latest
    container_name: plex
    network_mode: host
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York
      - VERSION=docker
      - PLEX_CLAIM=${PLEX_CLAIM}  # Set in Komodo UI
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
    volumes:
      - /mnt/appdata/plex:/config
      - /mnt/media/tvshows:/tv
      - /mnt/media/movies:/movies
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
---
### Example 3: Traefik-Routed Service
**Stack:** Custom App on Heimdall (Behind Traefik)
```yaml
services:
  myapp:
    image: myregistry/myapp:2.1.0
    container_name: myapp
    restart: unless-stopped
    ports:
      - 3000:3000
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}  # Secret in Komodo UI
    volumes:
      - /mnt/appdata/myapp/data:/app/data
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myapp.rule=Host(`myapp.castaldifamily.com`)"
      - "traefik.http.routers.myapp.entrypoints=websecure"
      - "traefik.http.routers.myapp.tls=true"
      - "traefik.http.routers.myapp.tls.certresolver=cloudflare"
      - "traefik.http.services.myapp.loadbalancer.server.port=3000"
```
---
**Document Version:** 1.0
**Tested On:** Komodo v2.1.2, Gitea v1.21
**Validation Status:** Production-Ready ✅