feat(documentation): add SOP for initial infrastructure deployment with GitOps integration

nathan 2026-04-12 01:41:43 -04:00
parent 325c4b98a5
commit 7cfc01eea8
2 changed files with 585 additions and 0 deletions

@@ -30,6 +30,11 @@ Structured troubleshooting articles following the incident → resolution format
Step-by-step guides for operational tasks and migrations.
### Infrastructure Deployment
- **[SOP-002: Initial Infrastructure Deployment](SOPs/SOP-002-Initial-Infrastructure-Deployment.md)**
Complete guide for deploying the homelab from scratch, including secure repository setup, Ansible control node configuration, core service deployment, and GitOps integration.
### Stack Management
- **[SOP-001: Migrate Stack from UI to Git](SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md)**

@@ -0,0 +1,580 @@
# SOP-002: Initial Infrastructure Deployment
**Status:** Active
**Created:** April 12, 2026
**Last Updated:** April 12, 2026
**Owner:** Nathan Castaldi
**Applies To:** Fresh homelab deployments and disaster recovery scenarios
---
## Purpose
Deploy the complete homelab infrastructure from a clean state using GitOps principles and automation. This SOP covers:
- Secure repository setup with encrypted secrets
- Ansible control node configuration
- Core service deployment (Docker Socket Proxy, Traefik, Redis, Komodo Core)
- Validation and health checks
- GitOps integration (Komodo Periphery, Git-linked stacks, Gitea webhooks)
**Use Cases:**
- New homelab initialization
- Disaster recovery (full infrastructure rebuild)
- Node replacement or migration
---
## Prerequisites
### Required Access
- [ ] Physical or console access to all nodes (Heimdall, Waldorf, Watchtower)
- [ ] GitHub account with access to `homelab` repository
- [ ] Gitea credentials (if repository already hosted locally)
- [ ] Root/sudo privileges on all nodes
### Required Infrastructure
- [ ] Nodes have base OS installed (Debian/Ubuntu recommended)
- [ ] Network connectivity between all nodes
- [ ] NFS storage accessible at `10.0.0.250:/Volume1/appdata`
- [ ] DNS/hosts file configured for node resolution
- [ ] Internet access for package installation
### Security Requirements
- [ ] Git-crypt symmetric key (if repository already encrypted)
- [ ] Password manager for storing credentials
- [ ] Secure workstation for handling keys and secrets
---
## Security & Pre-Deployment Setup
### Step 1: Prepare Your Workstation
**Time:** 15-20 minutes
1. **Install Required Tools:**
**Linux/macOS:**
```bash
# Install git-crypt
brew install git-crypt # macOS
# OR
sudo apt install git-crypt # Debian/Ubuntu
# Verify installation
git-crypt --version
```
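**Windows (Git Bash/WSL):**
```bash
# WSL: use the apt command above. Git Bash: download the Windows binary
# into ~/bin and keep the .exe extension
mkdir -p ~/bin
curl -L https://github.com/AGWA/git-crypt/releases/download/0.7.0/git-crypt-0.7.0-x86_64.exe -o ~/bin/git-crypt.exe
chmod +x ~/bin/git-crypt.exe
# If git-crypt is not found afterwards, add ~/bin to PATH and reopen Git Bash
```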
2. **Configure Git Identity:**
```bash
git config --global user.name "Your Name"
git config --global user.email "your.email@domain.com"
git config --global core.autocrlf true # Windows only
```
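A quick pre-flight check can confirm that the tools from this step are on `PATH` before continuing. The sketch below is a minimal example, not part of the repository; the function name `missing_tools` is hypothetical.

```bash
# Print the name of every required command that is not on PATH.
# Returns non-zero if anything is missing.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tools referenced in this step):
#   missing_tools git git-crypt || echo "Install the tools listed above first"
```

The function only reports what is missing; it does not install anything, so it is safe to rerun at any point.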
---
### Step 2: Clone Repository & Initialize Secrets
**Time:** 10-15 minutes
1. **Clone from Source:**
**Option A: GitHub (Initial Clone):**
```bash
cd ~/dev # Or your preferred code directory
git clone https://github.com/your-username/homelab.git
cd homelab
```
**Option B: Gitea (Production Environment):**
```bash
cd ~/dev
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
```
2. **Unlock Encrypted Secrets (If Repository Already Uses Git-crypt):**
```bash
# Import the symmetric key (retrieve from password manager)
git-crypt unlock /path/to/homelab-secrets.key
# Verify decryption
file nodes/heimdall/core/.env.secrets
# Should report text (for example "ASCII text"); "data" means the file is still encrypted
```
**⚠️ Security Warning:** **NEVER** commit `homelab-secrets.key` to the repository. Keep copies only in:
- Password manager (1Password, Bitwarden, etc.)
- Encrypted backup drive
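Whether a secrets file is still locked can also be checked without printing its contents. git-crypt writes each encrypted blob with a 10-byte header: a NUL byte, the ASCII text `GITCRYPT`, and another NUL. The sketch below tests for that header; the function name `is_gitcrypt_encrypted` is hypothetical.

```bash
# Succeeds if FILE still carries the git-crypt header, i.e. it is NOT decrypted.
is_gitcrypt_encrypted() {
    # Read the first 10 bytes and strip the NUL bytes so the rest
    # can be compared as an ordinary string.
    header=$(head -c 10 "$1" | tr -d '\000')
    [ "$header" = "GITCRYPT" ]
}

# Usage:
#   is_gitcrypt_encrypted nodes/heimdall/core/.env.secrets \
#       && echo "Still encrypted: run git-crypt unlock"
```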
3. **Initialize Git-crypt (First-Time Setup Only):**
```bash
# If repository is NOT yet encrypted
git-crypt init
git-crypt export-key ~/homelab-secrets.key
# Secure the key immediately
chmod 600 ~/homelab-secrets.key
```
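`git-crypt init` only generates the key. Files are encrypted once a `.gitattributes` rule assigns them the `git-crypt` filter. The sketch below adds such a rule idempotently. It assumes the secrets live in files named `.env.secrets`, as in the path used in this step; the helper name `add_gitcrypt_rule` is hypothetical.

```bash
# Append a git-crypt rule for PATTERN to ATTR_FILE unless it is already there.
add_gitcrypt_rule() {
    pattern=$1
    attr_file=${2:-.gitattributes}
    rule="$pattern filter=git-crypt diff=git-crypt"
    touch "$attr_file"
    # -x matches whole lines, -F treats the rule as a literal string
    grep -qxF -- "$rule" "$attr_file" || printf '%s\n' "$rule" >> "$attr_file"
}

# Usage (run from the repository root):
#   add_gitcrypt_rule '.env.secrets'
#   git add .gitattributes && git commit -m "Encrypt .env.secrets files"
```

Commit `.gitattributes` before adding any secret file, then run `git-crypt status` to confirm that the intended files are listed as encrypted.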
---
## Ansible Control Node Setup
### Step 3: Configure Watchtower as Control Node
**Time:** 25-35 minutes
**Rationale:** Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.
1. **SSH to Watchtower:**
```bash
ssh chester@10.0.0.200
```
2. **Install Ansible Toolchain:**
```bash
# Update package index
sudo apt update
# Install Ansible and dependencies
sudo apt install -y ansible ansible-lint sshpass python3-pip git
# Install Python libraries
pip3 install proxmoxer requests --break-system-packages
# Verify installation
ansible --version
# Expected: ansible [core 2.x.x]
```
3. **Generate SSH Keys for Automation:**
```bash
# Generate ED25519 key (modern cryptography)
ssh-keygen -t ed25519 -C "ansible@watchtower" -f ~/.ssh/id_ed25519 -N ""
# Set proper permissions
chmod 600 ~/.ssh/id_ed25519
chmod 644 ~/.ssh/id_ed25519.pub
```
4. **Distribute Keys to All Nodes:**
```bash
# Deploy to Heimdall
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
# Deploy to Waldorf
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
# Deploy to localhost (self-management)
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost
```
5. **Validate Passwordless Authentication:**
```bash
# Test each node
ssh -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname"
# Expected: heimdall
ssh -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname"
# Expected: waldorf
ssh -i ~/.ssh/id_ed25519 chester@localhost "hostname"
# Expected: watchtower
```
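The three manual tests above can be folded into one loop. The sketch below is one possible approach; the function name `check_ssh_nodes` is hypothetical. It relies on `BatchMode=yes`, which disables password prompts, so a node that would fall back to password login is reported as a failure.

```bash
# Try key-only SSH to each target and report the result per node.
# Returns non-zero if any target fails.
check_ssh_nodes() {
    key=$1
    shift
    failed=0
    for target in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 -i "$key" "$target" hostname </dev/null >/dev/null 2>&1; then
            printf 'OK    %s\n' "$target"
        else
            printf 'FAIL  %s\n' "$target"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   check_ssh_nodes ~/.ssh/id_ed25519 chester@10.0.0.151 chester@10.0.0.251 chester@localhost
```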
6. **Clone Repository to Control Node:**
```bash
cd ~
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
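# Unlock secrets (if using git-crypt)
# git-crypt is not part of the toolchain installed above
sudo apt install -y git-crypt
# Copy the key over first; run this on the workstation:
#   scp /path/to/homelab-secrets.key chester@10.0.0.200:~/
git-crypt unlock ~/homelab-secrets.key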
```
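Passwordless SSH does not by itself show that Ansible can reach every node. The sketch below writes a minimal INI inventory and shows an ad-hoc ping against it. The inventory file name, the group name `homelab`, and the function name `write_inventory` are assumptions made for illustration; the repository may already ship its own inventory.

```bash
# Write a minimal INI inventory for the three nodes named in this SOP.
write_inventory() {
    cat > "$1" <<'EOF'
[homelab]
heimdall   ansible_host=10.0.0.151
waldorf    ansible_host=10.0.0.251
watchtower ansible_host=localhost

[homelab:vars]
ansible_user=chester
ansible_ssh_private_key_file=~/.ssh/id_ed25519
EOF
}

# Usage:
#   write_inventory ~/inventory.ini
#   ansible -i ~/inventory.ini homelab -m ping
```

Watchtower is addressed as `localhost` to match the key distribution step, which registered the key for `chester@localhost`.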
---
## Core Infrastructure Deployment
### Step 4: Deploy Core Stack on Heimdall
**Time:** 20-30 minutes
**Core Stack Components:**
- Docker Socket Proxy (security boundary)
- Traefik (reverse proxy with automatic SSL)
- Redis (caching layer)
- Komodo Core (container orchestration)
**Deployment Method:** Manual Docker Compose (Ansible automation planned for future state)
1. **SSH to Heimdall:**
```bash
ssh chester@10.0.0.151
```
2. **Install Docker & Docker Compose:**
```bash
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add user to docker group
sudo usermod -aG docker $USER
# Log out and back in for group to take effect
exit
ssh chester@10.0.0.151
# Verify Docker installation
docker --version
docker compose version
```
3. **Create Komodo Directory Structure:**
```bash
sudo mkdir -p /etc/komodo/{stacks,repos,volumes}
sudo chown -R $USER:$USER /etc/komodo
```
4. **Mount NFS Storage (If Required):**
```bash
# Install NFS client
sudo apt install -y nfs-common
# Create mount point
sudo mkdir -p /mnt/nas
# Add to /etc/fstab (persistent mount)
echo "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0" | sudo tee -a /etc/fstab
# Mount immediately
sudo mount -a
# Verify mount
df -h | grep nas
```
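The `tee -a` command above appends a new line on every run, so rerunning this step (for example during disaster recovery) can leave duplicate `/etc/fstab` entries. A minimal guard is sketched below; the function name `fstab_has_mount` is hypothetical.

```bash
# Succeeds if FSTAB already has a non-comment entry for MOUNTPOINT.
fstab_has_mount() {
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit found ? 0 : 1 }' "$1"
}

# Usage: append the entry only when it is missing
#   fstab_has_mount /etc/fstab /mnt/nas || \
#       echo "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0" | sudo tee -a /etc/fstab
```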
5. **Clone Repository to Heimdall:**
```bash
cd ~
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
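# Unlock secrets if repository uses git-crypt
# (copy the key to ~/homelab-secrets.key on Heimdall first, as in Step 3)
sudo apt install -y git-crypt
git-crypt unlock ~/homelab-secrets.key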
```
6. **Deploy Core Stack:**
```bash
cd ~/homelab/nodes/heimdall/core
# Review configuration
cat compose.yaml
cat .env.secrets # Verify secrets are decrypted
# Pull images
docker compose pull
# Start services in detached mode
docker compose up -d
# Monitor logs
docker compose logs -f
# Press Ctrl+C to exit log streaming
```
7. **Verify Core Services:**
```bash
# Check running containers
docker ps
# Expected containers:
# - dockerproxy
# - traefik
# - redis
# - komodo-core
# Check health
docker compose ps
# All services should show "running" status
```
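The expected-container check above can be automated rather than read off by eye. The sketch below compares the running container names against the expected list. It assumes the compose file sets the container names exactly as listed above; the function name `missing_containers` is hypothetical.

```bash
# Read running container names on stdin; print each expected name that is absent.
# Returns non-zero if any expected container is missing.
missing_containers() {
    running=$(cat)
    status=0
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            printf '%s\n' "$name"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   docker ps --format '{{.Names}}' | missing_containers dockerproxy traefik redis komodo-core
```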
---
## Validation & Health Checks
### Step 5: Service Verification
**Time:** 15-20 minutes
1. **Test Internal Connectivity:**
```bash
# From Heimdall
# Test Komodo Core
curl -I http://localhost:9000
# Expected: HTTP/1.1 200 OK
# Test Redis
docker exec -it redis redis-cli ping
# Expected: PONG
# Test Docker Socket Proxy
curl http://localhost:2375/version
# Expected: JSON response with Docker version
```
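Services may need some time after `docker compose up -d` before they answer, so a single `curl` can fail on a healthy stack. The sketch below is a small retry wrapper for the checks above; the function name `retry` and the attempt and delay values are illustrative choices.

```bash
# Run a command up to ATTEMPTS times, sleeping DELAY seconds between failures.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage: wait for Komodo Core for up to 10 attempts, 3 seconds apart
#   retry 10 3 curl -fsS -o /dev/null http://localhost:9000
```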
2. **Test External Access (From Workstation):**
```bash
# Test Traefik dashboard (if exposed)
curl -I https://traefik.castaldifamily.com
# Test Komodo Core UI
curl -I https://komodo.castaldifamily.com
# Expected: HTTP/2 200
```
3. **Verify Traefik SSL Certificates:**
```bash
# SSH to Heimdall
ssh chester@10.0.0.151
# Check Traefik logs for ACME certificate retrieval
docker logs traefik 2>&1 | grep -i "certificate"
# Verify cert storage
ls -lh /etc/komodo/volumes/traefik/acme.json
```
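The log and file checks above show that Traefik attempted certificate retrieval; they do not show which certificate is actually being served. The sketch below inspects the served certificate with `openssl`. The function name `cert_summary` is hypothetical, and the command needs network access to the host.

```bash
# Print the issuer and expiry date of the certificate served by HOST.
cert_summary() {
    host=$1
    port=${2:-443}
    # -servername sends SNI so the reverse proxy can pick the right certificate
    openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null |
        openssl x509 -noout -issuer -enddate
}

# Usage:
#   cert_summary komodo.castaldifamily.com
```

A self-signed or default issuer in the output suggests that certificate retrieval has not completed yet.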
4. **Komodo Core Initial Configuration:**
- Navigate to `https://komodo.castaldifamily.com` in browser
- Complete first-time setup wizard
- Create admin account
- Add server nodes (Heimdall, Waldorf, Watchtower)
---
## Post-Deployment Configuration
### Step 6: Configure GitOps Integration
**Time:** 20-25 minutes
1. **Install Komodo Periphery on Remote Nodes:**
**On Waldorf (10.0.0.251):**
```bash
ssh chester@10.0.0.251
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Create Komodo directory
sudo mkdir -p /etc/komodo/{stacks,repos}
sudo chown -R $USER:$USER /etc/komodo
# Deploy Periphery (via Komodo UI or manually)
# See Komodo documentation for Periphery setup
```
**On Watchtower (10.0.0.200):**
```bash
# Repeat same process as Waldorf
```
2. **Configure Repository Cloning in Komodo:**
In Komodo UI:
- Navigate to **Settings** → **Git Providers**
- Add Gitea provider:
- **URL:** `https://git.castaldifamily.com`
- **Token:** Generate from Gitea Settings → Applications
- Test connection
3. **Create Git-Linked Stacks:**
For each service (Plex, Tunarr, etc.):
- Navigate to **Stacks** → **New Stack**
- Select **Git Repository** as source
- Configure:
- **Repo:** `nathan/homelab`
- **Branch:** `main`
- **Path:** `nodes/{node-name}/{service-name}`
- **Compose File:** `compose.yaml`
- Enable **Auto-Deploy on Push**
4. **Configure Gitea Webhooks:**
In Gitea repository settings:
- Navigate to **Settings** → **Webhooks**
- Add webhook:
- **URL:** `https://komodo.castaldifamily.com/api/webhook/pull-stack/{stack-id}` (illustrative; copy the exact webhook URL from the Komodo stack configuration)
- **Secret:** From Komodo stack configuration
- **Events:** Push events only
- **Active:** Enabled
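When creating the Git-linked stacks in item 3, it helps to list which `nodes/{node-name}/{service-name}` paths actually contain a `compose.yaml`. The sketch below enumerates them from a local clone; the function name `list_stack_paths` is hypothetical.

```bash
# Print "<node> <service> <path>" for every compose.yaml found under nodes/.
list_stack_paths() {
    repo_root=${1:-.}
    for compose in "$repo_root"/nodes/*/*/compose.yaml; do
        [ -f "$compose" ] || continue   # the glob matched nothing
        dir=${compose%/compose.yaml}    # drop the file name
        rel=${dir#"$repo_root"/}        # path relative to the repository root
        service=${rel##*/}
        node=${rel#nodes/}
        node=${node%%/*}
        printf '%s %s %s\n' "$node" "$service" "$rel"
    done
}

# Usage:
#   list_stack_paths ~/homelab
```

The third column is the value to enter in the **Path** field for each stack.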
---
## Troubleshooting
### Common Issues
**Issue:** `git-crypt unlock` fails with "File is not encrypted"
**Resolution:**
- Verify you're in the correct repository directory
- Check if repository is actually using git-crypt: `git-crypt status`
- Ensure `.gitattributes` file exists and defines encryption rules
---
**Issue:** SSH key authentication fails to nodes
**Resolution:**
```bash
# Verify key permissions
ls -lh ~/.ssh/id_ed25519
# Should be: -rw------- (600)
# Test manual SSH with verbose logging
ssh -vvv -i ~/.ssh/id_ed25519 chester@10.0.0.151
# Check authorized_keys on target node
ssh chester@10.0.0.151 "cat ~/.ssh/authorized_keys"
```
---
**Issue:** Docker Compose fails with "network not found"
**Resolution:**
```bash
# Remove unused networks, then recreate the stack (Compose recreates the networks it defines)
docker network prune -f
docker compose up -d --force-recreate
```
---
**Issue:** NFS mount fails with "Operation not permitted"
**Resolution:**
```bash
# Check NFS server exports
showmount -e 10.0.0.250
# Force NFSv3 (avoid ID mapping issues)
sudo mount -t nfs -o nfsvers=3 10.0.0.250:/Volume1/appdata /mnt/nas
# Update fstab with explicit version
# 10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0
```
---
## Emergency Rollback
### Complete Stack Teardown
If deployment fails and rollback is required:
```bash
# On Heimdall
cd ~/homelab/nodes/heimdall/core
# Option A: stop the stack and keep its data
docker compose down
# Option B (DESTRUCTIVE): also delete the stack's volumes; run only if the data is disposable
# docker compose down -v
# Remove repository clone
cd ~
rm -rf homelab
```
### Restore Previous State
```bash
# Re-clone repository at specific commit
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
git checkout {commit-hash} # Hash before failed deployment
# Unlock secrets and redeploy
git-crypt unlock ~/homelab-secrets.key
cd nodes/heimdall/core
docker compose up -d
```
---
## Success Criteria
Deployment is **complete** when:
- [ ] All core services running on Heimdall (Komodo, Traefik, Redis, Docker Proxy)
- [ ] Komodo Periphery agents connected on Waldorf and Watchtower
- [ ] Traefik SSL certificates issued and valid
- [ ] Komodo UI accessible at `https://komodo.castaldifamily.com`
- [ ] Git-linked stacks successfully pull from Gitea
- [ ] Webhooks trigger automatic deployments on push
- [ ] NFS mounts stable across all nodes
- [ ] Ansible control node (Watchtower) can execute playbooks against all nodes
---
## Next Steps
After successful deployment:
1. **Deploy Application Stacks:**
- Use [SOP-001: Migrate Stack from UI to Git](SOP-001-Migrate-Stack-from-UI-to-Git.md) for each service
- Prioritize critical services: Plex, Gitea, Tunarr
2. **Configure Backups:**
- Implement automated Gitea repository backups
- Schedule NFS snapshot retention policy
- Export Komodo configuration regularly
3. **Security Hardening:**
- Enable Traefik authentication for internal services
- Configure fail2ban for SSH protection
- Implement network segmentation (VLANs)
4. **Monitoring & Observability:**
- Deploy Prometheus/Grafana stack
- Configure health check endpoints
- Set up uptime monitoring (Uptime Kuma)
---
## Related Documentation
- [SOP-001: Migrate Stack from UI to Git](SOP-001-Migrate-Stack-from-UI-to-Git.md) - Convert existing services to GitOps
- [KBA-001: Komodo GitOps Deployment Failures](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md) - Troubleshooting guide
- [plan-ansibleSetup.md](../plans/plan-ansibleSetup.md) - Detailed Ansible control node configuration
- [plan-gitcryptMigration.md](../plans/plan-gitcryptMigration.md) - Comprehensive git-crypt setup guide
- [TECHNICAL_RUNBOOK.md](../TECHNICAL_RUNBOOK.md) - Emergency procedures and reference
---
## Revision History
| Date | Version | Change Description |
|------|---------|-------------------|
| 2026-04-12 | 1.0 | Initial SOP creation |