SOP-002: Initial Infrastructure Deployment
Status: Active
Created: April 12, 2026
Last Updated: April 12, 2026
Owner: Nathan Castaldi
Applies To: Fresh homelab deployments and disaster recovery scenarios
Purpose
Deploy the complete homelab infrastructure from a clean state using GitOps principles and automation. This SOP covers:
- Secure repository setup with encrypted secrets
- Ansible control node configuration
- Core service deployment (Komodo, Traefik, Gitea, Redis)
- Validation and health checks
Use Cases:
- New homelab initialization
- Disaster recovery (full infrastructure rebuild)
- Node replacement or migration
Prerequisites
Required Access
- Physical or console access to all nodes (Heimdall, Waldorf, Watchtower)
- GitHub account with access to the homelab repository
- Gitea credentials (if repository already hosted locally)
- Root/sudo privileges on all nodes
Required Infrastructure
- Nodes have base OS installed (Debian/Ubuntu recommended)
- Network connectivity between all nodes
- NFS storage accessible at 10.0.0.250:/Volume1/appdata
- DNS/hosts file configured for node resolution
- Internet access for package installation
Security Requirements
- Git-crypt symmetric key (if repository already encrypted)
- Password manager for storing credentials
- Secure workstation for handling keys and secrets
Security & Pre-Deployment Setup
Step 1: Prepare Your Workstation
Time: 15-20 minutes
- Install Required Tools:
  Linux/macOS:
    # Install git-crypt
    brew install git-crypt # macOS
    # OR
    sudo apt install git-crypt # Debian/Ubuntu
    # Verify installation
    git-crypt --version
  Windows (Git Bash/WSL):
    # Download git-crypt binary
    curl -L https://github.com/AGWA/git-crypt/releases/download/0.7.0/git-crypt-0.7.0-x86_64.exe -o /usr/local/bin/git-crypt
    chmod +x /usr/local/bin/git-crypt
- Configure Git Identity:
    git config --global user.name "Your Name"
    git config --global user.email "your.email@domain.com"
    git config --global core.autocrlf true # Windows only
Step 2: Clone Repository & Initialize Secrets
Time: 10-15 minutes
- Clone from Source:
  Option A: GitHub (Initial Clone):
    cd ~/dev # Or your preferred code directory
    git clone https://github.com/your-username/homelab.git
    cd homelab
  Option B: Gitea (Production Environment):
    cd ~/dev
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
- Unlock Encrypted Secrets (If Repository Already Uses Git-crypt):
    # Import the symmetric key (retrieve from password manager)
    git-crypt unlock /path/to/homelab-secrets.key
    # Verify decryption
    ls -lh nodes/heimdall/core/.env.secrets
    file nodes/heimdall/core/.env.secrets # Should report a text type, not binary data
  ⚠️ Security Warning: Store homelab-secrets.key in both of the following:
  - Password manager (1Password, Bitwarden, etc.)
  - Encrypted backup drive
  NEVER commit the key to the repository.
- Initialize Git-crypt (First-Time Setup Only):
    # If repository is NOT yet encrypted
    git-crypt init
    git-crypt export-key ~/homelab-secrets.key
    # Secure the key immediately
    chmod 600 ~/homelab-secrets.key
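The decryption check in the unlock step can also be scripted. The sketch below relies on the fact that a file still encrypted by git-crypt begins with a fixed 10-byte header (a NUL byte, the text GITCRYPT, and another NUL byte). The function name is_gitcrypt_encrypted is a hypothetical helper, not part of git-crypt or of this repository.

```shell
#!/bin/sh
# is_gitcrypt_encrypted FILE  (hypothetical helper)
# Succeeds if FILE still starts with the git-crypt header
# (NUL, "GITCRYPT", NUL), meaning it has NOT been decrypted yet.
is_gitcrypt_encrypted() {
    # Dump the first 10 bytes as hex and strip all whitespace.
    header=$(head -c 10 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$header" = "00474954435259505400" ]
}

# Example use, with the path from the unlock step:
# if is_gitcrypt_encrypted nodes/heimdall/core/.env.secrets; then
#     echo "still encrypted: run git-crypt unlock first" >&2
# fi
```

Checking the header is stricter than eyeballing the file: it fails loudly in a script, and it never prints secret values to the terminal.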
Ansible Control Node Setup
Step 3: Bootstrap Watchtower as Control Node
Time: 15-20 minutes (reduced from 25-35 via automation)
Rationale: Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.
New Method: Use the unified bootstrap script for automated, idempotent configuration.
- Transfer Bootstrap Script to Watchtower:
  Option A: From local repository (if cloned on workstation):
    # From your workstation
    scp -r homelab/scripts chester@10.0.0.200:~/
    # Then log in and enter the copied directory
    ssh chester@10.0.0.200
    cd scripts
  Option B: Direct clone on Watchtower:
    # SSH to Watchtower
    ssh chester@10.0.0.200
    # Shallow clone (latest commit only; the scripts are in scripts/)
    git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
    cd homelab/scripts
- Run Unified Bootstrap Script:
    # Auto-detect and configure (Raspberry Pi will be detected)
    ./bootstrap.sh
    # The script will:
    # - Detect Raspberry Pi hardware
    # - Configure static IP (10.0.0.200)
    # - Install Docker with Debian Trixie compatibility
    # - Install Ansible and proxmoxer
    # - Generate ED25519 SSH keys
    # - Run comprehensive validation
    # - Generate hardware fingerprint
  ⚠️ Important: The SSH connection will drop during network reconfiguration. Reconnect after ~10 seconds:
    ssh chester@10.0.0.200
- Verify Bootstrap Success:
    # After reconnecting
    cd homelab/scripts # Or ~/scripts if you used Option A
    # Check validation report
    cat ../ansible/archive/outputs/bootstrap-validation-watchtower-*.log
    # Verify installations
    docker --version # Should show Docker 24.x or newer
    ansible --version # Should show ansible [core 2.x.x]
    # Check SSH key
    ls -lh ~/.ssh/id_ed25519.pub
    cat ~/.ssh/id_ed25519.pub # Copy this for distribution
- Distribute SSH Keys to Managed Nodes:
    # The bootstrap script generated keys, now distribute them
    # Deploy to Heimdall
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
    # Deploy to Waldorf
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
    # Deploy to localhost (self-management)
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost
- Validate Passwordless Authentication:
    # Test each node
    ssh -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname" # Expected: heimdall
    ssh -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname" # Expected: waldorf
    ssh -i ~/.ssh/id_ed25519 chester@localhost "hostname" # Expected: watchtower
- Clone Full Repository (If Not Already Present):
    cd ~
    # If you only made a shallow clone earlier, replace it with the full repo
    rm -rf homelab # Remove shallow clone
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    # Unlock secrets (if using git-crypt)
    # Transfer key securely via scp from workstation
    git-crypt unlock ~/homelab-secrets.key
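The passwordless-authentication check above can be wrapped so that a missing key fails immediately instead of falling back to a password prompt. This is a minimal sketch: check_node and check_all are hypothetical helper names, and the addresses, user, and expected hostnames are the ones used in this step. BatchMode=yes and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# check_node HOST EXPECTED_HOSTNAME  (hypothetical helper)
# Logs in with the ED25519 key only and compares the remote hostname.
check_node() {
    actual=$(ssh -o BatchMode=yes -o ConnectTimeout=5 \
        -i "$HOME/.ssh/id_ed25519" "chester@$1" hostname 2>/dev/null) || {
        echo "FAIL $1: key-based login did not succeed" >&2
        return 1
    }
    if [ "$actual" = "$2" ]; then
        echo "OK   $1 -> $actual"
    else
        echo "FAIL $1: expected $2, got $actual" >&2
        return 1
    fi
}

# check_all: run every check, return non-zero if any of them failed.
check_all() {
    failed=0
    check_node 10.0.0.151 heimdall   || failed=1
    check_node 10.0.0.251 waldorf    || failed=1
    check_node localhost  watchtower || failed=1
    return "$failed"
}

# Example: check_all && echo "all nodes reachable with the key"
```

With BatchMode enabled, ssh exits with an error rather than asking for a password, so a node that never received the public key shows up as a clear failure.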
Troubleshooting:
- Bootstrap fails: Run with --dry-run first to preview actions: ./bootstrap.sh --dry-run
- Network doesn't reconnect: Wait 30 seconds and retry SSH
- Validation errors: Review the validation log and address critical errors before proceeding
- Manual intervention needed: Use ./validate-node.sh to re-check after fixes
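The "wait and retry SSH" advice above can be automated with a small retry loop. The sketch below is generic: retry is a hypothetical helper name, and the attempt count and delay in the example are assumptions to tune for your network, not values taken from bootstrap.sh.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND until it succeeds or ATTEMPTS tries have been used.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example: six attempts, five seconds apart, against Watchtower's static IP
# retry 6 5 ssh -o BatchMode=yes -o ConnectTimeout=5 chester@10.0.0.200 true
```

Running this from the workstation right after starting the bootstrap shows when the node is reachable again, without guessing at a fixed sleep.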
Core Infrastructure Deployment
Step 4: Bootstrap and Deploy Core Stack on Heimdall
Time: 15-25 minutes (reduced from 20-30 via automation)
Core Stack Components:
- Docker Socket Proxy (security boundary)
- Traefik (reverse proxy with automatic SSL)
- Redis (caching layer)
- Komodo Core (container orchestration)
- Bootstrap Heimdall Node:
  Option A: Remote bootstrap from Watchtower (recommended):
    # From Watchtower control node
    cd ~/homelab
    # Copy bootstrap script to Heimdall
    scp -r scripts chester@10.0.0.151:~/
    # SSH and run bootstrap
    ssh chester@10.0.0.151 "cd scripts && ./bootstrap.sh --hardware-type docker-vm"
  Option B: Direct console access:
    # Login to Heimdall directly
    ssh chester@10.0.0.151
    # Clone repo or copy scripts
    git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
    cd homelab/scripts
    # Run bootstrap
    ./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.151
- Verify Docker Installation:
    # After bootstrap completes
    ssh chester@10.0.0.151
    docker --version
    docker compose version
    docker ps # Should return empty list (no containers yet)
- Create Komodo Directory Structure:
    sudo mkdir -p /etc/komodo/{stacks,repos,volumes}
    sudo chown -R "$USER:$USER" /etc/komodo
- Mount NFS Storage (If Required):
    # Install NFS client
    sudo apt install -y nfs-common
    # Create mount point
    sudo mkdir -p /mnt/nas
    # Add to /etc/fstab (persistent mount); run once only, re-running appends a duplicate line
    echo "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0" | sudo tee -a /etc/fstab
    # Mount immediately
    sudo mount -a
    # Verify mount
    df -h | grep nas
- Clone Repository to Heimdall:
    cd ~
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    # Unlock secrets if repository uses git-crypt
    # Transfer the key securely from your workstation first
    git-crypt unlock ~/homelab-secrets.key
- Deploy Core Stack:
    cd ~/homelab/nodes/heimdall/core
    # Review configuration
    cat compose.yaml
    cat .env.secrets # Verify secrets are decrypted
    # Pull images
    docker compose pull
    # Start services in detached mode
    docker compose up -d
    # Monitor logs
    docker compose logs -f # Press Ctrl+C to exit log streaming
- Verify Core Services:
    # Check running containers
    docker ps
    # Expected containers:
    # - dockerproxy
    # - traefik
    # - redis
    # - komodo-core
    # Check health
    docker compose ps # All services should show "running" status
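Comparing the running containers against the expected list by eye is easy to get wrong, so the final check above can be scripted. In this sketch missing_containers is a hypothetical helper name. The container names are the ones listed in the step, and docker ps --format '{{.Names}}' prints one container name per line.

```shell
#!/bin/sh
# missing_containers EXPECTED...  (hypothetical helper)
# Reads running container names on stdin (one per line) and prints every
# expected name that is absent. Returns non-zero if anything is missing.
missing_containers() {
    running=$(cat)
    status=0
    for name in "$@"; do
        # -x: match the whole line, -F: treat the name as a literal string
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Example, using the container names from the step above:
# docker ps --format '{{.Names}}' |
#     missing_containers dockerproxy traefik redis komodo-core
```

Because the helper reads names from stdin, the same function works against saved output or another node's container list piped over SSH.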
Validation & Health Checks
Step 5: Service Verification
Time: 15-20 minutes
- Test Internal Connectivity:
    # From Heimdall
    # Test Komodo Core
    curl -I http://localhost:9000 # Expected: HTTP/1.1 200 OK
    # Test Redis
    docker exec -it redis redis-cli ping # Expected: PONG
    # Test Docker Socket Proxy
    curl http://localhost:2375/version # Expected: JSON response with Docker version
- Test External Access (From Workstation):
    # Test Traefik dashboard (if exposed)
    curl -I https://traefik.castaldifamily.com
    # Test Komodo Core UI
    curl -I https://komodo.castaldifamily.com # Expected: HTTP/2 200
- Verify Traefik SSL Certificates:
    # SSH to Heimdall
    ssh chester@10.0.0.151
    # Check Traefik logs for ACME certificate retrieval
    docker logs traefik 2>&1 | grep -i "certificate"
    # Verify cert storage
    ls -lh /etc/komodo/volumes/traefik/acme.json
- Komodo Core Initial Configuration:
  - Navigate to https://komodo.castaldifamily.com in browser
  - Complete first-time setup wizard
  - Create admin account
  - Add server nodes (Heimdall, Waldorf, Watchtower)
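The curl checks in this step rely on reading response headers by eye. A minimal sketch of a scripted version follows, assuming the same URLs and an expected status of 200. http_status and expect_status are hypothetical helper names. The curl options used (-s, -o, -I, -m, -w '%{http_code}') are standard, and curl reports 000 when no HTTP response was received.

```shell
#!/bin/sh
# http_status URL  (hypothetical helper)
# Prints only the HTTP status code of a HEAD request, with a 10 s limit.
http_status() {
    curl -s -o /dev/null -I -m 10 -w '%{http_code}' "$1"
}

# expect_status URL CODE  (hypothetical helper)
expect_status() {
    got=$(http_status "$1")
    if [ "$got" = "$2" ]; then
        echo "OK   $1 ($got)"
    else
        echo "FAIL $1: expected $2, got ${got:-no response}" >&2
        return 1
    fi
}

# Example, using the endpoints from this step:
# expect_status http://localhost:9000 200
# expect_status https://komodo.castaldifamily.com 200
```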
Post-Deployment Configuration
Step 6: Configure GitOps Integration
Time: 20-25 minutes
- Install Komodo Periphery on Remote Nodes:
  On Waldorf (10.0.0.251):
    ssh chester@10.0.0.251
    # Install Docker
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh
    sudo usermod -aG docker "$USER"
    # Create Komodo directory
    sudo mkdir -p /etc/komodo/{stacks,repos}
    sudo chown -R "$USER:$USER" /etc/komodo
    # Deploy Periphery (via Komodo UI or manually)
    # See Komodo documentation for Periphery setup
  On Watchtower (10.0.0.200):
    # Repeat same process as Waldorf
- Configure Repository Cloning in Komodo:
  In Komodo UI:
  - Navigate to Settings → Git Providers
  - Add Gitea provider:
    - URL: https://git.castaldifamily.com
    - Token: Generate from Gitea Settings → Applications
  - Test connection
- Create Git-Linked Stacks:
  For each service (Plex, Tunarr, etc.):
  - Navigate to Stacks → New Stack
  - Select Git Repository as source
  - Configure:
    - Repo: nathan/homelab
    - Branch: main
    - Path: nodes/{node-name}/{service-name}
    - Compose File: compose.yaml
  - Enable Auto-Deploy on Push
- Configure Gitea Webhooks:
  In Gitea repository settings:
  - Navigate to Settings → Webhooks
  - Add webhook:
    - URL: https://komodo.castaldifamily.com/api/webhook/pull-stack/{stack-id}
    - Secret: From Komodo stack configuration
    - Events: Push events only
    - Active: Enabled
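When a webhook delivery is rejected, a mismatched secret is a common cause. Gitea signs each delivery with an HMAC-SHA256 of the request body, keyed by the webhook secret, and sends the hex digest in the X-Gitea-Signature header. The sketch below recomputes that digest locally so it can be compared with the header shown in Gitea's delivery log. gitea_signature is a hypothetical helper name and it assumes openssl is installed. How Komodo validates the secret on its side is not covered here.

```shell
#!/bin/sh
# gitea_signature SECRET < payload  (hypothetical helper)
# Prints the hex HMAC-SHA256 of stdin, keyed with SECRET.
# Note: the secret is passed on the command line, so it is briefly visible
# in the process list. Use this on a trusted machine only.
gitea_signature() {
    # openssl prints "...(stdin)= <hex>", so keep only the last field.
    openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Example: save the request body from Gitea's "Recent Deliveries" view
# to payload.json, then compare the output with X-Gitea-Signature:
# gitea_signature "$WEBHOOK_SECRET" < payload.json
```

The payload must be byte-for-byte identical to what Gitea sent. A trailing newline added by an editor changes the digest.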
Troubleshooting
Common Issues
Issue: git-crypt unlock fails with "File is not encrypted"
Resolution:
- Verify you're in the correct repository directory
- Check if repository is actually using git-crypt: git-crypt status
- Ensure the .gitattributes file exists and defines encryption rules
Issue: SSH key authentication fails to nodes
Resolution:
# Verify key permissions
ls -lh ~/.ssh/id_ed25519
# Should be: -rw------- (600)
# Test manual SSH with verbose logging
ssh -vvv -i ~/.ssh/id_ed25519 chester@10.0.0.151
# Check authorized_keys on target node
ssh chester@10.0.0.151 "cat ~/.ssh/authorized_keys"
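The permission check above can be turned into a test that a script can act on. This is a minimal sketch: key_perms_ok is a hypothetical helper name, and it accepts only the mode shown above (-rw-------, i.e. 600).

```shell
#!/bin/sh
# key_perms_ok FILE  (hypothetical helper)
# Succeeds only if FILE has mode 600 (-rw-------).
# -L follows a symlink so the key file itself is checked; cut keeps the
# ten type-and-permission characters and drops any ACL/SELinux suffix.
key_perms_ok() {
    [ "$(ls -ldL "$1" | cut -c1-10)" = "-rw-------" ]
}

# Example:
# key_perms_ok ~/.ssh/id_ed25519 || chmod 600 ~/.ssh/id_ed25519
```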
Issue: Docker Compose fails with "network not found"
Resolution:
# Remove unused Docker networks, then let Compose recreate the stack's networks and containers
docker network prune -f
docker compose up -d --force-recreate
Issue: NFS mount fails with "Operation not permitted"
Resolution:
# Check NFS server exports
showmount -e 10.0.0.250
# Force NFSv3 (avoid ID mapping issues)
sudo mount -t nfs -o nfsvers=3 10.0.0.250:/Volume1/appdata /mnt/nas
# Update fstab with explicit version
# 10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0
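Appending to /etc/fstab with tee -a adds a duplicate entry every time the step is repeated, for example during a rebuild. A minimal sketch of an idempotent alternative follows. ensure_line is a hypothetical helper name, and the fstab entry in the example is the one used in this SOP.

```shell
#!/bin/sh
# ensure_line FILE LINE  (hypothetical helper)
# Appends LINE to FILE only if an identical line is not already present.
ensure_line() {
    # -x: whole-line match, -F: literal string, -q: no output
    if grep -qxF -- "$2" "$1" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$2" >> "$1"
}

# Example (run from a root shell, since /etc/fstab is root-owned):
# ensure_line /etc/fstab \
#     '10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0'
```

The match is exact, so an existing entry with different options is treated as a different line. Review the file by hand if the mount options have changed.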
Emergency Rollback
Complete Stack Teardown
If deployment fails and rollback is required:
# On Heimdall
cd ~/homelab/nodes/heimdall/core
# Preferred: stop and remove containers, keep volumes (data preserved)
docker compose down
# Only if a full wipe is intended: also remove volumes (DESTRUCTIVE)
# docker compose down -v
# Remove repository clone
cd ~
rm -rf homelab
Restore Previous State
# Re-clone repository at specific commit
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
git checkout {commit-hash} # Hash before failed deployment
# Unlock secrets and redeploy
git-crypt unlock ~/homelab-secrets.key
cd nodes/heimdall/core
docker compose up -d
Success Criteria
Deployment is complete when:
- All core services running on Heimdall (Komodo, Traefik, Redis, Docker Proxy)
- Komodo Periphery agents connected on Waldorf and Watchtower
- Traefik SSL certificates issued and valid
- Komodo UI accessible at https://komodo.castaldifamily.com
- Git-linked stacks successfully pull from Gitea
- Webhooks trigger automatic deployments on push
- NFS mounts stable across all nodes
- Ansible control node (Watchtower) can execute playbooks against all nodes
Next Steps
After successful deployment:
- Deploy Application Stacks:
  - Use SOP-001: Migrate Stack from UI to Git for each service
  - Prioritize critical services: Plex, Gitea, Tunarr
- Configure Backups:
  - Implement automated Gitea repository backups
  - Schedule NFS snapshot retention policy
  - Export Komodo configuration regularly
- Security Hardening:
  - Enable Traefik authentication for internal services
  - Configure fail2ban for SSH protection
  - Implement network segmentation (VLANs)
- Monitoring & Observability:
  - Deploy Prometheus/Grafana stack
  - Configure health check endpoints
  - Set up uptime monitoring (Uptime Kuma)
Related Documentation
- SOP-001: Migrate Stack from UI to Git - Convert existing services to GitOps
- KBA-001: Komodo GitOps Deployment Failures - Troubleshooting guide
- plan-ansibleSetup.md - Detailed Ansible control node configuration
- plan-gitcryptMigration.md - Comprehensive git-crypt setup guide
- TECHNICAL_RUNBOOK.md - Emergency procedures and reference
Revision History
| Date | Version | Change Description |
|---|---|---|
| 2026-04-12 | 1.0 | Initial SOP creation |