From 7cfc01eea8076e0c8c1124808b60302f722d61ef Mon Sep 17 00:00:00 2001
From: nathan
Date: Sun, 12 Apr 2026 01:41:43 -0400
Subject: [PATCH] feat(documentation): add SOP for initial infrastructure deployment with GitOps integration

---
 documentation/README.md                       |   5 +
 ...P-002-Initial-Infrastructure-Deployment.md | 580 ++++++++++++++++++
 2 files changed, 585 insertions(+)
 create mode 100644 documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md

diff --git a/documentation/README.md b/documentation/README.md
index 9ae6bac..7ed55f6 100644
--- a/documentation/README.md
+++ b/documentation/README.md
@@ -30,6 +30,11 @@ Structured troubleshooting articles following the incident → resolution format
 
 Step-by-step guides for operational tasks and migrations.
 
+### Infrastructure Deployment
+
+- **[SOP-002: Initial Infrastructure Deployment](SOPs/SOP-002-Initial-Infrastructure-Deployment.md)**
+  Complete guide for deploying the homelab from scratch, including secure repository setup, Ansible control node configuration, core service deployment, and GitOps integration.
+
 ### Stack Management
 
 - **[SOP-001: Migrate Stack from UI to Git](SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md)**
diff --git a/documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md b/documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md
new file mode 100644
index 0000000..ca1b912
--- /dev/null
+++ b/documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md
@@ -0,0 +1,580 @@
+# SOP-002: Initial Infrastructure Deployment
+
+**Status:** Active  
+**Created:** April 12, 2026  
+**Last Updated:** April 12, 2026  
+**Owner:** Nathan Castaldi  
+**Applies To:** Fresh homelab deployments and disaster recovery scenarios
+
+---
+
+## Purpose
+
+Deploy the complete homelab infrastructure from a clean state using GitOps principles and automation. This SOP covers:
+- Secure repository setup with encrypted secrets
+- Ansible control node configuration
+- Core service deployment (Docker Socket Proxy, Traefik, Redis, Komodo Core)
+- Validation and health checks
+
+**Use Cases:**
+- New homelab initialization
+- Disaster recovery (full infrastructure rebuild)
+- Node replacement or migration
+
+---
+
+## Prerequisites
+
+### Required Access
+
+- [ ] Physical or console access to all nodes (Heimdall, Waldorf, Watchtower)
+- [ ] GitHub account with access to `homelab` repository
+- [ ] Gitea credentials (if repository already hosted locally)
+- [ ] Root/sudo privileges on all nodes
+
+### Required Infrastructure
+
+- [ ] Nodes have base OS installed (Debian/Ubuntu recommended)
+- [ ] Network connectivity between all nodes
+- [ ] NFS storage accessible at `10.0.0.250:/Volume1/appdata`
+- [ ] DNS/hosts file configured for node resolution
+- [ ] Internet access for package installation
+
+### Security Requirements
+
+- [ ] Git-crypt symmetric key (if repository already encrypted)
+- [ ] Password manager for storing credentials
+- [ ] Secure workstation for handling keys and secrets
+
+---
+
+## Security & Pre-Deployment Setup
+
+### Step 1: Prepare Your Workstation
+
+**Time:** 15-20 minutes
+
+1. **Install Required Tools:**
+
+   **Linux/macOS:**
+   ```bash
+   # Install git-crypt
+   brew install git-crypt  # macOS
+   # OR
+   sudo apt install git-crypt  # Debian/Ubuntu
+
+   # Verify installation
+   git-crypt --version
+   ```
+
+   **Windows (Git Bash/WSL):**
+   ```bash
+   # Download git-crypt binary
+   curl -L https://github.com/AGWA/git-crypt/releases/download/0.7.0/git-crypt-0.7.0-x86_64.exe -o /usr/local/bin/git-crypt
+   chmod +x /usr/local/bin/git-crypt
+   ```
+
+2. **Configure Git Identity:**
+   ```bash
+   git config --global user.name "Your Name"
+   git config --global user.email "your.email@domain.com"
+   git config --global core.autocrlf true  # Windows only
+   ```
+
+---
+
+### Step 2: Clone Repository & Initialize Secrets
+
+**Time:** 10-15 minutes
+
+1. **Clone from Source:**
+
+   **Option A: GitHub (Initial Clone):**
+   ```bash
+   cd ~/dev  # Or your preferred code directory
+   git clone https://github.com/your-username/homelab.git
+   cd homelab
+   ```
+
+   **Option B: Gitea (Production Environment):**
+   ```bash
+   cd ~/dev
+   git clone https://git.castaldifamily.com/nathan/homelab.git
+   cd homelab
+   ```
+
+2. **Unlock Encrypted Secrets (If Repository Already Uses Git-crypt):**
+   ```bash
+   # Import the symmetric key (retrieve from password manager)
+   git-crypt unlock /path/to/homelab-secrets.key
+
+   # Verify decryption
+   ls -lh nodes/heimdall/core/.env.secrets
+   # File should be readable plaintext, not binary
+   ```
+
+   **⚠️ Security Warning:** **NEVER** commit `homelab-secrets.key` to the repository. Store it in:
+   - Password manager (1Password, Bitwarden, etc.)
+   - Encrypted backup drive
+
+3. **Initialize Git-crypt (First-Time Setup Only):**
+   ```bash
+   # If repository is NOT yet encrypted
+   git-crypt init
+   git-crypt export-key ~/homelab-secrets.key
+
+   # Secure the key immediately
+   chmod 600 ~/homelab-secrets.key
+   ```
+
+---
+
+## Ansible Control Node Setup
+
+### Step 3: Configure Watchtower as Control Node
+
+**Time:** 25-35 minutes
+
+**Rationale:** Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.
+
+1. **SSH to Watchtower:**
+   ```bash
+   ssh chester@10.0.0.200
+   ```
+
+2. **Install Ansible Toolchain:**
+   ```bash
+   # Update package index
+   sudo apt update
+
+   # Install Ansible and dependencies
+   sudo apt install -y ansible ansible-lint sshpass python3-pip git
+
+   # Install Python libraries
+   pip3 install proxmoxer requests --break-system-packages
+
+   # Verify installation
+   ansible --version
+   # Expected: ansible [core 2.x.x]
+   ```
+
+3. **Generate SSH Keys for Automation:**
+   ```bash
+   # Generate ED25519 key (modern cryptography)
+   ssh-keygen -t ed25519 -C "ansible@watchtower" -f ~/.ssh/id_ed25519 -N ""
+
+   # Set proper permissions
+   chmod 600 ~/.ssh/id_ed25519
+   chmod 644 ~/.ssh/id_ed25519.pub
+   ```
+
+4. **Distribute Keys to All Nodes:**
+   ```bash
+   # Deploy to Heimdall
+   ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
+
+   # Deploy to Waldorf
+   ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
+
+   # Deploy to localhost (self-management)
+   ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost
+   ```
+
+5. **Validate Passwordless Authentication:**
+   ```bash
+   # Test each node
+   ssh -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname"
+   # Expected: heimdall
+
+   ssh -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname"
+   # Expected: waldorf
+
+   ssh -i ~/.ssh/id_ed25519 chester@localhost "hostname"
+   # Expected: watchtower
+   ```
+
+6. **Clone Repository to Control Node:**
+   ```bash
+   cd ~
+   git clone https://git.castaldifamily.com/nathan/homelab.git
+   cd homelab
+
+   # Unlock secrets (if using git-crypt; requires git-crypt installed on this node)
+   # Transfer key securely via scp from workstation
+   git-crypt unlock ~/homelab-secrets.key
+   ```
+
+---
+
+## Core Infrastructure Deployment
+
+### Step 4: Deploy Core Stack on Heimdall
+
+**Time:** 20-30 minutes
+
+**Core Stack Components:**
+- Docker Socket Proxy (security boundary)
+- Traefik (reverse proxy with automatic SSL)
+- Redis (caching layer)
+- Komodo Core (container orchestration)
+
+**Deployment Method:** Manual Docker Compose (Ansible automation planned for future state)
+
+1. **SSH to Heimdall:**
+   ```bash
+   ssh chester@10.0.0.151
+   ```
+
+2. **Install Docker & Docker Compose:**
+   ```bash
+   # Install Docker
+   curl -fsSL https://get.docker.com -o get-docker.sh
+   sudo sh get-docker.sh
+
+   # Add user to docker group
+   sudo usermod -aG docker $USER
+
+   # Log out and back in for group to take effect
+   exit
+   ssh chester@10.0.0.151
+
+   # Verify Docker installation
+   docker --version
+   docker compose version
+   ```
+
+3. **Create Komodo Directory Structure:**
+   ```bash
+   sudo mkdir -p /etc/komodo/{stacks,repos,volumes}
+   sudo chown -R $USER:$USER /etc/komodo
+   ```
+
+4. **Mount NFS Storage (If Required):**
+   ```bash
+   # Install NFS client
+   sudo apt install -y nfs-common
+
+   # Create mount point
+   sudo mkdir -p /mnt/nas
+
+   # Add to /etc/fstab (persistent mount)
+   echo "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0" | sudo tee -a /etc/fstab
+
+   # Mount immediately
+   sudo mount -a
+
+   # Verify mount
+   df -h | grep nas
+   ```
+
+5. **Clone Repository to Heimdall:**
+   ```bash
+   cd ~
+   git clone https://git.castaldifamily.com/nathan/homelab.git
+   cd homelab
+
+   # Unlock secrets if repository uses git-crypt (requires git-crypt and the key on this node)
+   git-crypt unlock ~/homelab-secrets.key
+   ```
+
+6. **Deploy Core Stack:**
+   ```bash
+   cd ~/homelab/nodes/heimdall/core
+
+   # Review configuration
+   cat compose.yaml
+   cat .env.secrets  # Verify secrets are decrypted
+
+   # Pull images
+   docker compose pull
+
+   # Start services in detached mode
+   docker compose up -d
+
+   # Monitor logs
+   docker compose logs -f
+   # Press Ctrl+C to exit log streaming
+   ```
+
+7. **Verify Core Services:**
+   ```bash
+   # Check running containers
+   docker ps
+
+   # Expected containers:
+   # - dockerproxy
+   # - traefik
+   # - redis
+   # - komodo-core
+
+   # Check health
+   docker compose ps
+   # All services should show "running" status
+   ```
+
+---
+
+## Validation & Health Checks
+
+### Step 5: Service Verification
+
+**Time:** 15-20 minutes
+
+1. **Test Internal Connectivity:**
+   ```bash
+   # From Heimdall
+
+   # Test Komodo Core
+   curl -I http://localhost:9000
+   # Expected: HTTP/1.1 200 OK
+
+   # Test Redis
+   docker exec -it redis redis-cli ping
+   # Expected: PONG
+
+   # Test Docker Socket Proxy
+   curl http://localhost:2375/version
+   # Expected: JSON response with Docker version
+   ```
+
+2. **Test External Access (From Workstation):**
+   ```bash
+   # Test Traefik dashboard (if exposed)
+   curl -I https://traefik.castaldifamily.com
+
+   # Test Komodo Core UI
+   curl -I https://komodo.castaldifamily.com
+   # Expected: HTTP/2 200
+   ```
+
+3. **Verify Traefik SSL Certificates:**
+   ```bash
+   # SSH to Heimdall
+   ssh chester@10.0.0.151
+
+   # Check Traefik logs for ACME certificate retrieval
+   docker logs traefik 2>&1 | grep -i "certificate"
+
+   # Verify cert storage
+   ls -lh /etc/komodo/volumes/traefik/acme.json
+   ```
+
+4. **Komodo Core Initial Configuration:**
+   - Navigate to `https://komodo.castaldifamily.com` in browser
+   - Complete first-time setup wizard
+   - Create admin account
+   - Add server nodes (Heimdall, Waldorf, Watchtower)
+
+---
+
+## Post-Deployment Configuration
+
+### Step 6: Configure GitOps Integration
+
+**Time:** 20-25 minutes
+
+1. **Install Komodo Periphery on Remote Nodes:**
+
+   **On Waldorf (10.0.0.251):**
+   ```bash
+   ssh chester@10.0.0.251
+
+   # Install Docker
+   curl -fsSL https://get.docker.com -o get-docker.sh
+   sudo sh get-docker.sh
+   sudo usermod -aG docker $USER
+
+   # Create Komodo directory
+   sudo mkdir -p /etc/komodo/{stacks,repos}
+   sudo chown -R $USER:$USER /etc/komodo
+
+   # Deploy Periphery (via Komodo UI or manually)
+   # See Komodo documentation for Periphery setup
+   ```
+
+   **On Watchtower (10.0.0.200):**
+   ```bash
+   # Repeat same process as Waldorf
+   ```
+
+2. **Configure Repository Cloning in Komodo:**
+
+   In Komodo UI:
+   - Navigate to **Settings** → **Git Providers**
+   - Add Gitea provider:
+     - **URL:** `https://git.castaldifamily.com`
+     - **Token:** Generate from Gitea Settings → Applications
+   - Test connection
+
+3. **Create Git-Linked Stacks:**
+
+   For each service (Plex, Tunarr, etc.):
+   - Navigate to **Stacks** → **New Stack**
+   - Select **Git Repository** as source
+   - Configure:
+     - **Repo:** `nathan/homelab`
+     - **Branch:** `main`
+     - **Path:** `nodes/{node-name}/{service-name}`
+     - **Compose File:** `compose.yaml`
+   - Enable **Auto-Deploy on Push**
+
+4. **Configure Gitea Webhooks:**
+
+   In Gitea repository settings:
+   - Navigate to **Settings** → **Webhooks**
+   - Add webhook:
+     - **URL:** `https://komodo.castaldifamily.com/api/webhook/pull-stack/{stack-id}`
+     - **Secret:** From Komodo stack configuration
+     - **Events:** Push events only
+     - **Active:** Enabled
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**Issue:** `git-crypt unlock` fails with "File is not encrypted"
+
+**Resolution:**
+- Verify you're in the correct repository directory
+- Check if repository is actually using git-crypt: `git-crypt status`
+- Ensure `.gitattributes` file exists and defines encryption rules
+
+---
+
+**Issue:** SSH key authentication fails to nodes
+
+**Resolution:**
+```bash
+# Verify key permissions
+ls -lh ~/.ssh/id_ed25519
+# Should be: -rw------- (600)
+
+# Test manual SSH with verbose logging
+ssh -vvv -i ~/.ssh/id_ed25519 chester@10.0.0.151
+
+# Check authorized_keys on target node
+ssh chester@10.0.0.151 "cat ~/.ssh/authorized_keys"
+```
+
+---
+
+**Issue:** Docker Compose fails with "network not found"
+
+**Resolution:**
+```bash
+# Remove unused networks, then recreate the stack (Compose recreates the networks it needs)
+docker network prune -f
+docker compose up -d --force-recreate
+```
+
+---
+
+**Issue:** NFS mount fails with "Operation not permitted"
+
+**Resolution:**
+```bash
+# Check NFS server exports
+showmount -e 10.0.0.250
+
+# Force NFSv3 (avoid ID mapping issues)
+sudo mount -t nfs -o nfsvers=3 10.0.0.250:/Volume1/appdata /mnt/nas
+
+# Update fstab with explicit version
+# 10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0
+```
+
+---
+
+## Emergency Rollback
+
+### Complete Stack Teardown
+
+If deployment fails and rollback is required:
+
+```bash
+# On Heimdall
+cd ~/homelab/nodes/heimdall/core
+
+# Run ONE of the following two commands, not both
+docker compose down     # Preserves data (no -v flag)
+docker compose down -v  # -v also removes volumes (DESTRUCTIVE)
+
+# Remove repository clone
+cd ~
+rm -rf homelab
+```
+
+### Restore Previous State
+
+```bash
+# Re-clone repository at specific commit
+git clone https://git.castaldifamily.com/nathan/homelab.git
+cd homelab
+git checkout {commit-hash}  # Hash before failed deployment
+
+# Unlock secrets and redeploy
+git-crypt unlock ~/homelab-secrets.key
+cd nodes/heimdall/core
+docker compose up -d
+```
+
+---
+
+## Success Criteria
+
+Deployment is **complete** when:
+
+- [ ] All core services running on Heimdall (Komodo, Traefik, Redis, Docker Proxy)
+- [ ] Komodo Periphery agents connected on Waldorf and Watchtower
+- [ ] Traefik SSL certificates issued and valid
+- [ ] Komodo UI accessible at `https://komodo.castaldifamily.com`
+- [ ] Git-linked stacks successfully pull from Gitea
+- [ ] Webhooks trigger automatic deployments on push
+- [ ] NFS mounts stable across all nodes
+- [ ] Ansible control node (Watchtower) can execute playbooks against all nodes
+
+---
+
+## Next Steps
+
+After successful deployment:
+
+1. **Deploy Application Stacks:**
+   - Use [SOP-001: Migrate Stack from UI to Git](SOP-001-Migrate-Stack-from-UI-to-Git.md) for each service
+   - Prioritize critical services: Plex, Gitea, Tunarr
+
+2. **Configure Backups:**
+   - Implement automated Gitea repository backups
+   - Schedule NFS snapshot retention policy
+   - Export Komodo configuration regularly
+
+3. **Security Hardening:**
+   - Enable Traefik authentication for internal services
+   - Configure fail2ban for SSH protection
+   - Implement network segmentation (VLANs)
+
+4. **Monitoring & Observability:**
+   - Deploy Prometheus/Grafana stack
+   - Configure health check endpoints
+   - Set up uptime monitoring (Uptime Kuma)
+
+---
+
+## Related Documentation
+
+- [SOP-001: Migrate Stack from UI to Git](SOP-001-Migrate-Stack-from-UI-to-Git.md) - Convert existing services to GitOps
+- [KBA-001: Komodo GitOps Deployment Failures](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md) - Troubleshooting guide
+- [plan-ansibleSetup.md](../plans/plan-ansibleSetup.md) - Detailed Ansible control node configuration
+- [plan-gitcryptMigration.md](../plans/plan-gitcryptMigration.md) - Comprehensive git-crypt setup guide
+- [TECHNICAL_RUNBOOK.md](../TECHNICAL_RUNBOOK.md) - Emergency procedures and reference
+
+---
+
+## Revision History
+
+| Date | Version | Change Description |
+|------|---------|-------------------|
+| 2026-04-12 | 1.0 | Initial SOP creation |