# SOP-002: Initial Infrastructure Deployment

**Status:** Active
**Created:** April 12, 2026
**Last Updated:** April 12, 2026
**Owner:** Nathan Castaldi
**Applies To:** Fresh homelab deployments and disaster recovery scenarios

---

## Purpose

Deploy the complete homelab infrastructure from a clean state using GitOps principles and automation. This SOP covers:

- Secure repository setup with encrypted secrets
- Ansible control node configuration
- Core service deployment (Docker Socket Proxy, Traefik, Redis, Komodo Core)
- Validation and health checks

**Use Cases:**

- New homelab initialization
- Disaster recovery (full infrastructure rebuild)
- Node replacement or migration

---

## Prerequisites

### Required Access

- [ ] Physical or console access to all nodes (Heimdall, Waldorf, Watchtower)
- [ ] GitHub account with access to `homelab` repository
- [ ] Gitea credentials (if repository already hosted locally)
- [ ] Root/sudo privileges on all nodes

### Required Infrastructure

- [ ] Nodes have base OS installed (Debian/Ubuntu recommended)
- [ ] Network connectivity between all nodes
- [ ] NFS storage accessible at `10.0.0.250:/Volume1/appdata`
- [ ] DNS/hosts file configured for node resolution
- [ ] Internet access for package installation

### Security Requirements

- [ ] Git-crypt symmetric key (if repository already encrypted)
- [ ] Password manager for storing credentials
- [ ] Secure workstation for handling keys and secrets

---

## Security & Pre-Deployment Setup

### Step 1: Prepare Your Workstation

**Time:** 15-20 minutes

1. **Install Required Tools:**

   **Linux/macOS:**
   ```bash
   # Install git-crypt
   brew install git-crypt  # macOS
   # OR
   sudo apt install git-crypt  # Debian/Ubuntu (including WSL)

   # Verify installation
   git-crypt --version
   ```

   **Windows (Git Bash):**
   ```bash
   # Download the git-crypt Windows binary
   curl -L https://github.com/AGWA/git-crypt/releases/download/0.7.0/git-crypt-0.7.0-x86_64.exe -o /usr/local/bin/git-crypt
   chmod +x /usr/local/bin/git-crypt
   ```

2. **Configure Git Identity:**
   ```bash
   git config --global user.name "Your Name"
   git config --global user.email "your.email@domain.com"

   # Windows only: convert CRLF to LF on commit and leave files unchanged on
   # checkout, so shell scripts copied to the Linux nodes keep LF endings
   git config --global core.autocrlf input
   ```
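
Before moving on, it can help to confirm that every tool this SOP calls from the workstation is on `PATH`. The helper below is a minimal sketch, not part of the repository's scripts; the function name `require_tools` and the example tool list are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the homelab repo): report missing CLI tools.
# Returns 0 when every named tool is on PATH, 1 otherwise.
require_tools() {
  local tool
  local -a missing=()
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "Missing tools: ${missing[*]}" >&2
    return 1
  fi
  echo "All required tools found: $*"
}

# Example (assumed list of workstation tools used in Steps 1-3):
#   require_tools git git-crypt ssh scp curl
```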

---

### Step 2: Clone Repository & Initialize Secrets

**Time:** 10-15 minutes

1. **Clone from Source:**

   **Option A: GitHub (Initial Clone):**
   ```bash
   cd ~/dev  # Or your preferred code directory
   git clone https://github.com/your-username/homelab.git
   cd homelab
   ```

   **Option B: Gitea (Production Environment):**
   ```bash
   cd ~/dev
   git clone https://git.castaldifamily.com/nathan/homelab.git
   cd homelab
   ```

2. **Unlock Encrypted Secrets (If Repository Already Uses Git-crypt):**
   ```bash
   # Unlock with the exported symmetric key (retrieve it from your password manager)
   git-crypt unlock /path/to/homelab-secrets.key

   # Verify decryption without printing secrets:
   # a decrypted file reports as text, a still-encrypted one as "data"
   file nodes/heimdall/core/.env.secrets
   ```

   **⚠️ Security Warning:** Store `homelab-secrets.key` in a password manager (1Password, Bitwarden, etc.) and on an encrypted backup drive. **NEVER** commit it to the repository.

3. **Initialize Git-crypt (First-Time Setup Only):**
   ```bash
   # Only if the repository is NOT yet encrypted
   git-crypt init
   git-crypt export-key ~/homelab-secrets.key

   # Secure the key immediately
   chmod 600 ~/homelab-secrets.key
   ```
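
For first-time setup, `git-crypt init` only creates the key; a file is encrypted once `.gitattributes` assigns it the `git-crypt` filter and diff driver. A minimal sketch, assuming secrets live in files named `.env.secrets` like `nodes/heimdall/core/.env.secrets` (the function name and the rule list are hypothetical; adapt them to the repository's actual layout):

```bash
#!/usr/bin/env bash
# Hypothetical helper: add git-crypt rules to a repository's .gitattributes.
# ASSUMPTION: secrets are stored in files named .env.secrets. A pattern
# without a slash matches that file name in any directory.
GITCRYPT_RULES=(
  '.env.secrets filter=git-crypt diff=git-crypt'
)

ensure_gitcrypt_attributes() {
  local repo_dir="${1:-.}"
  local attrs="$repo_dir/.gitattributes"
  local rule
  touch "$attrs"
  for rule in "${GITCRYPT_RULES[@]}"; do
    # Append each rule only once so the function is safe to re-run
    grep -qxF -- "$rule" "$attrs" || printf '%s\n' "$rule" >> "$attrs"
  done
}

# Usage, from the repository root after `git-crypt init`:
#   ensure_gitcrypt_attributes .
#   git add .gitattributes && git-crypt status
```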
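
To confirm that an unlock worked without printing secrets to the terminal, you can test for git-crypt's file header: every encrypted file begins with the 10 bytes `\0GITCRYPT\0`. The function below is a sketch with a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE still holds git-crypt ciphertext.
# git-crypt prefixes every encrypted file with the 10-byte header "\0GITCRYPT\0".
# Bash strings cannot hold NUL bytes, so NULs are translated to '@' first.
is_gitcrypt_encrypted() {
  local file="$1"
  [[ "$(head -c 10 "$file" | tr '\0' '@')" == "@GITCRYPT@" ]]
}

# Usage after `git-crypt unlock`:
#   if is_gitcrypt_encrypted nodes/heimdall/core/.env.secrets; then
#     echo "still encrypted: unlock failed or wrong key" >&2
#   fi
```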

---

## Ansible Control Node Setup

### Step 3: Bootstrap Watchtower as Control Node

**Time:** 15-20 minutes (reduced from 25-35 minutes through automation)

**Rationale:** Watchtower (Raspberry Pi 5) serves as the Ansible control node that manages all infrastructure, including itself.

**New Method:** Use the unified bootstrap script (`scripts/bootstrap.sh`) for automated, idempotent configuration.

1. **Transfer Bootstrap Script to Watchtower:**

   If Watchtower does not yet have its static address, use its current address below; the bootstrap assigns 10.0.0.200.

   **Option A: From local repository (if cloned on workstation):**
   ```bash
   # From your workstation (copies the scripts to ~/scripts on Watchtower)
   scp -r homelab/scripts chester@10.0.0.200:~/
   ```

   **Option B: Direct clone on Watchtower:**
   ```bash
   # SSH to Watchtower
   ssh chester@10.0.0.200

   # Shallow clone (full file tree, latest commit only)
   git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
   cd homelab/scripts
   ```

2. **Run Unified Bootstrap Script:**
   ```bash
   # Auto-detect and configure (Raspberry Pi hardware is detected automatically)
   ./bootstrap.sh

   # The script will:
   # - Detect Raspberry Pi hardware
   # - Configure static IP (10.0.0.200)
   # - Install Docker with Debian Trixie compatibility
   # - Install Ansible and proxmoxer
   # - Generate ED25519 SSH keys
   # - Run comprehensive validation
   # - Generate hardware fingerprint
   ```

   **⚠️ Important:** The SSH connection drops during network reconfiguration. Consider starting the script inside `tmux` or `screen` (if installed) so it is not interrupted when the session drops. Reconnect after about 10 seconds:
   ```bash
   ssh chester@10.0.0.200
   ```

3. **Verify Bootstrap Success:**
   ```bash
   # After reconnecting (paths assume the Option B checkout in ~/homelab;
   # with Option A the scripts are in ~/scripts)
   cd ~/homelab/scripts

   # Check validation report
   cat ../ansible/archive/outputs/bootstrap-validation-watchtower-*.log

   # Verify installations
   docker --version   # Should show Docker 24.x or newer
   ansible --version  # Should show ansible [core 2.x.x]

   # Check SSH key
   ls -lh ~/.ssh/id_ed25519.pub
   cat ~/.ssh/id_ed25519.pub  # Copy this for distribution
   ```

4. **Distribute SSH Keys to Managed Nodes:**
   ```bash
   # bootstrap.sh generated the key pair; copy the public key to each node

   # Deploy to Heimdall
   ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151

   # Deploy to Waldorf
   ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251

   # Deploy to localhost (self-management)
   ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost
   ```

5. **Validate Passwordless Authentication:**
   ```bash
   # Test each node; BatchMode=yes makes ssh fail instead of prompting for a password
   ssh -o BatchMode=yes -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname"
   # Expected: heimdall

   ssh -o BatchMode=yes -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname"
   # Expected: waldorf

   ssh -o BatchMode=yes -i ~/.ssh/id_ed25519 chester@localhost "hostname"
   # Expected: watchtower
   ```

6. **Clone Full Repository (If Not Already Present):**
   ```bash
   cd ~

   # Option B (shallow clone exists): fetch the full history in place instead of
   # deleting the clone, which also holds the bootstrap logs under ansible/archive/outputs
   git -C homelab fetch --unshallow

   # Option A (no clone yet): clone the full repository
   git clone https://git.castaldifamily.com/nathan/homelab.git

   cd homelab

   # Unlock secrets (if using git-crypt; install it with `sudo apt install -y git-crypt`
   # if needed). Copy the key from your workstation first, e.g.:
   #   scp ~/homelab-secrets.key chester@10.0.0.200:~/
   chmod 600 ~/homelab-secrets.key
   git-crypt unlock ~/homelab-secrets.key
   ```
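
The checks in item 5 can be run as a single pass over all three nodes. This sketch (function and variable names are hypothetical) uses `BatchMode=yes` so that a missing key makes `ssh` fail instead of falling back to a password prompt; the `SSH_CMD` override exists only so the function can be exercised without real hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm key-based SSH from Watchtower to each node.
# Requires bash 4+ (associative arrays). Targets and hostnames come from Step 3.
declare -A EXPECTED_HOSTS=(
  [chester@10.0.0.151]=heimdall
  [chester@10.0.0.251]=waldorf
  [chester@localhost]=watchtower
)
SSH_CMD="${SSH_CMD:-ssh}"  # override only for testing

check_passwordless_ssh() {
  local target actual failures=0
  for target in "${!EXPECTED_HOSTS[@]}"; do
    # BatchMode=yes: never prompt, so a password fallback counts as a failure
    actual="$("$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 \
      -i "$HOME/.ssh/id_ed25519" "$target" hostname 2> /dev/null)" || true
    if [[ "$actual" == "${EXPECTED_HOSTS[$target]}" ]]; then
      echo "OK   $target -> $actual"
    else
      echo "FAIL $target: expected ${EXPECTED_HOSTS[$target]}, got '${actual:-no answer}'"
      failures=$((failures + 1))
    fi
  done
  return "$failures"  # number of failing nodes (0 = all good)
}
```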
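
The "Docker 24.x or newer" expectation in item 3 can be checked without eyeballing the output. A sketch with hypothetical function names; it relies on `sort -V` (version sort) from GNU coreutils:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "Docker 24.x or newer" check in Step 3.

# Extract "24.0.7" from output such as "Docker version 24.0.7, build afdd53b".
docker_version_from() {
  sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p' <<< "$1"
}

# Succeed when version $1 >= version $2, using version-aware sorting.
version_ge() {
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

# Usage on a bootstrapped node:
#   v="$(docker_version_from "$(docker --version)")"
#   version_ge "$v" 24.0 || echo "Docker $v is older than 24.0" >&2
```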
**Troubleshooting:**

- **Bootstrap fails:** Preview the planned actions with `./bootstrap.sh --dry-run`, then rerun without the flag.
- **Network doesn't reconnect:** Wait 30 seconds and retry SSH.
- **Validation errors:** Review the validation log and resolve critical errors before proceeding.
- **Manual intervention needed:** After applying fixes, re-check the node with `./validate-node.sh`.
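
When waiting for the node to come back after network reconfiguration, polling the SSH port is more reliable than a fixed delay. A sketch using bash's `/dev/tcp` redirection and coreutils `timeout`; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Uses bash's /dev/tcp pseudo-device; returns 1 after MAX_WAIT seconds.
wait_for_port() {
  local host="$1" port="${2:-22}" max_wait="${3:-60}"
  local deadline=$(( SECONDS + max_wait ))
  while (( SECONDS < deadline )); do
    # Each attempt is capped at 2 seconds so a silent host cannot hang the loop
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2> /dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage from the workstation while bootstrap.sh reconfigures Watchtower:
#   wait_for_port 10.0.0.200 22 60 && ssh chester@10.0.0.200
```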

---

## Core Infrastructure Deployment

### Step 4: Bootstrap and Deploy Core Stack on Heimdall

**Time:** 15-25 minutes (reduced from 20-30 minutes through automation)

**Core Stack Components:**

- Docker Socket Proxy (security boundary)
- Traefik (reverse proxy with automatic SSL)
- Redis (caching layer)
- Komodo Core (container orchestration)

1. **Bootstrap Heimdall Node:**

   **Option A: Remote bootstrap from Watchtower (recommended):**
   ```bash
   # From Watchtower control node
   cd ~/homelab

   # Copy bootstrap scripts to Heimdall (lands in ~/scripts)
   scp -r scripts chester@10.0.0.151:~/

   # Run bootstrap over SSH (-t allocates a terminal in case the script prompts, e.g. for sudo)
   ssh -t chester@10.0.0.151 "cd scripts && ./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.151"
   ```

   As in Step 3, the session may drop if the script reconfigures networking; reconnect and review the validation output.

   **Option B: Run directly on Heimdall:**
   ```bash
   # Log in to Heimdall
   ssh chester@10.0.0.151

   # Clone the repo (or copy the scripts as in Option A)
   git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
   cd homelab/scripts

   # Run bootstrap
   ./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.151
   ```

2. **Verify Docker Installation:**
   ```bash
   # After bootstrap completes
   ssh chester@10.0.0.151

   docker --version
   docker compose version
   docker ps  # Should return an empty list (no containers yet)
   # A permission error on the Docker socket usually means you must log out
   # and back in for docker group membership to take effect
   ```

3. **Create Komodo Directory Structure:**
   ```bash
   sudo mkdir -p /etc/komodo/{stacks,repos,volumes}
   sudo chown -R "$USER:$USER" /etc/komodo
   ```

4. **Mount NFS Storage (If Required):**
   ```bash
   # Install NFS client
   sudo apt install -y nfs-common

   # Create mount point
   sudo mkdir -p /mnt/nas

   # Add to /etc/fstab for a persistent mount (skipped if /mnt/nas is already listed)
   grep -qE '[[:space:]]/mnt/nas[[:space:]]' /etc/fstab ||
     echo "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0" | sudo tee -a /etc/fstab

   # Mount immediately
   sudo mount -a

   # Verify mount
   findmnt /mnt/nas
   ```

5. **Clone Repository to Heimdall:**
   ```bash
   cd ~
   # If ~/homelab already exists from Option B, run `git -C homelab fetch --unshallow`
   # instead of cloning again
   git clone https://git.castaldifamily.com/nathan/homelab.git
   cd homelab

   # Unlock secrets if the repository uses git-crypt (install it with
   # `sudo apt install -y git-crypt` if needed). Copy the key to Heimdall first, e.g.:
   #   scp ~/homelab-secrets.key chester@10.0.0.151:~/
   chmod 600 ~/homelab-secrets.key
   git-crypt unlock ~/homelab-secrets.key
   ```

6. **Deploy Core Stack:**
   ```bash
   cd ~/homelab/nodes/heimdall/core

   # Review configuration
   cat compose.yaml
   file .env.secrets  # Should report text; "data" means it is still encrypted

   # Pull images
   docker compose pull

   # Start services in detached mode
   docker compose up -d

   # Monitor logs (Ctrl+C stops following; the services keep running)
   docker compose logs -f
   ```

7. **Verify Core Services:**
   ```bash
   # Check running containers
   docker ps

   # Expected containers:
   # - dockerproxy
   # - traefik
   # - redis
   # - komodo-core

   # Check status
   docker compose ps
   # All services should be "Up" (and "healthy" where a healthcheck is defined)
   ```
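
The expected-container list in item 7 can be compared mechanically. This sketch assumes the containers carry exactly these names (for example via `container_name:` in `compose.yaml`); Compose-generated names such as `core-redis-1` would need the list adjusted. The function and variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each expected core container that is not running.
# Reads running container names, one per line, on stdin.
EXPECTED_CONTAINERS=(dockerproxy traefik redis komodo-core)

missing_containers() {
  local running name status=0
  running="$(cat)"
  for name in "${EXPECTED_CONTAINERS[@]}"; do
    # Whole-line, fixed-string match so "redis" does not match "redis-exporter"
    if ! grep -qxF -- "$name" <<< "$running"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage on Heimdall:
#   docker ps --format '{{.Names}}' | missing_containers || echo "core stack incomplete" >&2
```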

---

## Validation & Health Checks

### Step 5: Service Verification

**Time:** 15-20 minutes

1. **Test Internal Connectivity:**
   ```bash
   # From Heimdall

   # Test Komodo Core (adjust 9000 to the host port published in compose.yaml)
   curl -I http://localhost:9000
   # Expected: HTTP/1.1 200 OK

   # Test Redis
   docker exec redis redis-cli ping
   # Expected: PONG

   # Test Docker Socket Proxy (only works if the proxy publishes port 2375 on the host)
   curl http://localhost:2375/version
   # Expected: JSON response with Docker version
   ```

2. **Test External Access (From Workstation):**
   ```bash
   # Test Traefik dashboard (if exposed; returns 401 if protected by authentication)
   curl -I https://traefik.castaldifamily.com

   # Test Komodo Core UI
   curl -I https://komodo.castaldifamily.com
   # Expected: HTTP/2 200
   ```

3. **Verify Traefik SSL Certificates:**
   ```bash
   # SSH to Heimdall
   ssh chester@10.0.0.151

   # Check Traefik logs for ACME certificate retrieval
   docker logs traefik 2>&1 | grep -i "certificate"

   # Verify certificate storage; Traefik requires acme.json to be mode 600 (-rw-------)
   ls -lh /etc/komodo/volumes/traefik/acme.json
   ```

4. **Komodo Core Initial Configuration:**
   - Navigate to `https://komodo.castaldifamily.com` in a browser
   - Complete the first-time setup wizard
   - Create an admin account
   - Add server nodes (Heimdall, Waldorf, Watchtower)
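
Beyond reading Traefik's logs, you can check the certificate that clients actually receive and how long it remains valid. A sketch using the `openssl` CLI and GNU `date -d`; the function names are hypothetical, and it can run from any host that reaches the Traefik entrypoint on port 443:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: report days until a served TLS certificate expires.

# Convert a "notAfter=Jun  1 12:00:00 2030 GMT" line (as printed by
# `openssl x509 -noout -enddate`) into whole days from now. Needs GNU date.
cert_days_left() {
  local not_after="${1#notAfter=}" expires now
  expires="$(date -d "$not_after" +%s)" || return 1
  now="$(date +%s)"
  echo $(( (expires - now) / 86400 ))
}

# Fetch the certificate served for HOST (using SNI) and print its remaining days.
check_tls_expiry() {
  local host="$1" enddate
  enddate="$(openssl s_client -connect "$host:443" -servername "$host" \
    < /dev/null 2> /dev/null | openssl x509 -noout -enddate)" || return 1
  echo "$host: $(cert_days_left "$enddate") days until expiry"
}

# Usage:
#   check_tls_expiry komodo.castaldifamily.com
```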

---

## Post-Deployment Configuration

### Step 6: Configure GitOps Integration

**Time:** 20-25 minutes

1. **Install Komodo Periphery on Remote Nodes:**

   **On Waldorf (10.0.0.251):**
   ```bash
   ssh chester@10.0.0.251

   # Install Docker
   curl -fsSL https://get.docker.com -o get-docker.sh
   sudo sh get-docker.sh
   sudo usermod -aG docker "$USER"
   # Log out and back in for the docker group membership to take effect

   # Create Komodo directories
   sudo mkdir -p /etc/komodo/{stacks,repos}
   sudo chown -R "$USER:$USER" /etc/komodo

   # Deploy Periphery (via the Komodo UI or manually)
   # See the Komodo documentation for Periphery setup
   ```

   **On Watchtower (10.0.0.200):**
   ```bash
   # Docker is already installed by bootstrap.sh (Step 3); create the Komodo
   # directories and deploy Periphery as on Waldorf
   ```

2. **Configure Repository Cloning in Komodo:**

   In the Komodo UI:
   - Navigate to **Settings** → **Git Providers**
   - Add a Gitea provider:
     - **URL:** `https://git.castaldifamily.com`
     - **Token:** Generate from Gitea **Settings** → **Applications**
   - Test the connection

3. **Create Git-Linked Stacks:**

   For each service (Plex, Tunarr, etc.):
   - Navigate to **Stacks** → **New Stack**
   - Select **Git Repository** as the source
   - Configure:
     - **Repo:** `nathan/homelab`
     - **Branch:** `main`
     - **Path:** `nodes/{node-name}/{service-name}`
     - **Compose File:** `compose.yaml`
   - Enable **Auto-Deploy on Push**

4. **Configure Gitea Webhooks:**

   In the Gitea repository settings:
   - Navigate to **Settings** → **Webhooks**
   - Add a webhook:
     - **URL:** `https://komodo.castaldifamily.com/api/webhook/pull-stack/{stack-id}` (confirm against the webhook URL Komodo shows for the stack)
     - **Secret:** The webhook secret configured in Komodo
     - **Events:** Push events only
     - **Active:** Enabled

---

## Troubleshooting

### Common Issues

**Issue:** `git-crypt unlock` fails with "File is not encrypted"

**Resolution:**

- Verify you're in the correct repository directory
- Check whether the repository actually uses git-crypt: `git-crypt status`
- Ensure the `.gitattributes` file exists and defines encryption rules

---

**Issue:** SSH key authentication fails to nodes

**Resolution:**

```bash
# Verify key permissions
ls -lh ~/.ssh/id_ed25519
# Should be: -rw------- (600)

# Test manual SSH with verbose logging
ssh -vvv -i ~/.ssh/id_ed25519 chester@10.0.0.151

# Check authorized_keys on the target node (needs password login or console access)
ssh chester@10.0.0.151 "cat ~/.ssh/authorized_keys"
```
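
Permission problems are a frequent cause of key rejection: the OpenSSH client refuses private keys that other users can read, and with the default `StrictModes` setting the server ignores `authorized_keys` when it or `~/.ssh` is writable by others. This sketch (hypothetical function name; uses GNU `stat -c`) compares against the conventional modes, 700 for `~/.ssh` and 600 for the key files, and can be run on both the control node and the target:

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag SSH paths whose modes differ from the usual
# 700 (~/.ssh) and 600 (private key, authorized_keys). Needs GNU stat.
check_ssh_perms() {
  local ssh_dir="${1:-$HOME/.ssh}" spec path want have bad=0
  for spec in "$ssh_dir:700" "$ssh_dir/id_ed25519:600" "$ssh_dir/authorized_keys:600"; do
    path="${spec%:*}"
    want="${spec##*:}"
    [[ -e "$path" ]] || continue  # skip files that do not exist on this host
    have="$(stat -c '%a' "$path")"
    if [[ "$have" != "$want" ]]; then
      echo "chmod $want $path  # currently $have"
      bad=1
    fi
  done
  return "$bad"
}
```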

---

**Issue:** Docker Compose fails with "network not found"

**Resolution:**

```bash
# Remove unused (stale) networks, then let Compose recreate the networks it defines
docker network prune -f
docker compose up -d --force-recreate

# Compose does not create networks declared with `external: true` in compose.yaml;
# create such a network first:
# docker network create <network-name>
```

---

**Issue:** NFS mount fails with "Operation not permitted"

**Resolution:**

```bash
# Check NFS server exports (the export must allow this node's IP)
showmount -e 10.0.0.250

# Force NFSv3 (avoids NFSv4 ID-mapping issues)
sudo mount -t nfs -o nfsvers=3 10.0.0.250:/Volume1/appdata /mnt/nas

# Update /etc/fstab with the explicit version:
# 10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0
```
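
`showmount -e` prints a header line followed by one export per line with its allowed clients. To see whether this node is permitted, this sketch (hypothetical function name) extracts the client list for a single export from that output:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the allowed-client list for EXPORT_PATH from
# `showmount -e` output on stdin; exits 1 if the export is not listed.
export_clients() {
  local export_path="$1"
  awk -v p="$export_path" '
    NR > 1 && $1 == p { $1 = ""; sub(/^ +/, ""); print; found = 1 }
    END { exit !found }
  '
}

# Usage:
#   showmount -e 10.0.0.250 | export_clients /Volume1/appdata
```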

---

## Emergency Rollback

### Complete Stack Teardown

If deployment fails and rollback is required:

```bash
# On Heimdall
cd ~/homelab/nodes/heimdall/core

# Stop and remove the stack's containers and networks, preserving data volumes
docker compose down

# OR, to also delete the stack's volumes (DESTRUCTIVE). Bind-mounted host
# directories (for example under /etc/komodo/volumes) are not removed.
# docker compose down -v

# Remove repository clone
cd ~
rm -rf homelab
```
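
Because `docker compose down -v` cannot be undone, a small wrapper that demands explicit confirmation before deleting volumes can prevent accidents. This is only a sketch; the function name and the `STACK_DIR`/`DOCKER` overrides are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: tear down the Heimdall core stack, keeping volumes
# unless "destroy" is requested and confirmed by typing DELETE.
teardown_core_stack() {
  local mode="${1:-preserve}" answer
  local stack_dir="${STACK_DIR:-$HOME/homelab/nodes/heimdall/core}"
  local docker="${DOCKER:-docker}"  # override only for testing
  case "$mode" in
    preserve)
      (cd "$stack_dir" && "$docker" compose down)
      ;;
    destroy)
      read -r -p "Type DELETE to remove this stack's volumes: " answer
      [[ "$answer" == "DELETE" ]] || { echo "Aborted." >&2; return 1; }
      (cd "$stack_dir" && "$docker" compose down -v)
      ;;
    *)
      echo "usage: teardown_core_stack [preserve|destroy]" >&2
      return 2
      ;;
  esac
}
```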

### Restore Previous State

```bash
# Re-clone repository at a specific commit
cd ~
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
git log --oneline -n 20 -- nodes/heimdall/core  # Find the last known-good commit
git checkout {commit-hash}  # Hash before failed deployment

# Unlock secrets and redeploy
git-crypt unlock ~/homelab-secrets.key
cd nodes/heimdall/core
docker compose up -d
```

---

## Success Criteria

Deployment is **complete** when:

- [ ] All core services running on Heimdall (Komodo, Traefik, Redis, Docker Proxy)
- [ ] Komodo Periphery agents connected on Waldorf and Watchtower
- [ ] Traefik SSL certificates issued and valid
- [ ] Komodo UI accessible at `https://komodo.castaldifamily.com`
- [ ] Git-linked stacks successfully pull from Gitea
- [ ] Webhooks trigger automatic deployments on push
- [ ] NFS mounts stable across all nodes
- [ ] Ansible control node (Watchtower) can execute playbooks against all nodes

---

## Next Steps

After successful deployment:

1. **Deploy Application Stacks:**
   - Use [SOP-001: Migrate Stack from UI to Git](SOP-001-Migrate-Stack-from-UI-to-Git.md) for each service
   - Prioritize critical services: Plex, Gitea, Tunarr
2. **Configure Backups:**
   - Implement automated Gitea repository backups
   - Schedule NFS snapshot retention policy
   - Export Komodo configuration regularly
3. **Security Hardening:**
   - Enable Traefik authentication for internal services
   - Configure fail2ban for SSH protection
   - Implement network segmentation (VLANs)
4. **Monitoring & Observability:**
   - Deploy Prometheus/Grafana stack
   - Configure health check endpoints
   - Set up uptime monitoring (Uptime Kuma)

---

## Related Documentation

- [SOP-001: Migrate Stack from UI to Git](SOP-001-Migrate-Stack-from-UI-to-Git.md) - Convert existing services to GitOps
- [KBA-001: Komodo GitOps Deployment Failures](../KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md) - Troubleshooting guide
- [plan-ansibleSetup.md](../plans/plan-ansibleSetup.md) - Detailed Ansible control node configuration
- [plan-gitcryptMigration.md](../plans/plan-gitcryptMigration.md) - Comprehensive git-crypt setup guide
- [TECHNICAL_RUNBOOK.md](../TECHNICAL_RUNBOOK.md) - Emergency procedures and reference

---

## Revision History

| Date | Version | Change Description |
|------|---------|-------------------|
| 2026-04-12 | 1.0 | Initial SOP creation |