SOP-002: Initial Infrastructure Deployment

Status: Active
Created: April 12, 2026
Last Updated: April 12, 2026
Owner: Nathan Castaldi
Applies To: Fresh homelab deployments and disaster recovery scenarios


Purpose

Deploy the complete homelab infrastructure from a clean state using GitOps principles and automation. This SOP covers:

  • Secure repository setup with encrypted secrets
  • Ansible control node configuration
  • Core service deployment (Docker Socket Proxy, Traefik, Redis, Komodo Core)
  • Validation and health checks

Use Cases:

  • New homelab initialization
  • Disaster recovery (full infrastructure rebuild)
  • Node replacement or migration

Prerequisites

Required Access

  • Physical or console access to all nodes (Heimdall, Waldorf, Watchtower)
  • GitHub account with access to homelab repository
  • Gitea credentials (if repository already hosted locally)
  • Root/sudo privileges on all nodes

Required Infrastructure

  • Nodes have base OS installed (Debian/Ubuntu recommended)
  • Network connectivity between all nodes
  • NFS storage accessible at 10.0.0.250:/Volume1/appdata
  • DNS/hosts file configured for node resolution
  • Internet access for package installation

Security Requirements

  • Git-crypt symmetric key (if repository already encrypted)
  • Password manager for storing credentials
  • Secure workstation for handling keys and secrets

Security & Pre-Deployment Setup

Step 1: Prepare Your Workstation

Time: 15-20 minutes

  1. Install Required Tools:

    Linux/macOS:

    # Install git-crypt
    brew install git-crypt  # macOS
    # OR
    sudo apt install git-crypt  # Debian/Ubuntu
    
    # Verify installation
    git-crypt --version
    

    Windows (Git Bash/WSL):

    # Download git-crypt binary
    curl -L https://github.com/AGWA/git-crypt/releases/download/0.7.0/git-crypt-0.7.0-x86_64.exe -o /usr/local/bin/git-crypt
    chmod +x /usr/local/bin/git-crypt
    
  2. Configure Git Identity:

    git config --global user.name "Your Name"
    git config --global user.email "your.email@domain.com"
    git config --global core.autocrlf true  # Windows only
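
Before moving on, it can help to confirm that every tool the later steps call is on the PATH. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and its default tool list is an assumption drawn from the commands used in this SOP.

```shell
# Print the name of every required tool that is not found on PATH.
# With no arguments, fall back to an assumed default list (git, git-crypt, ssh, scp).
missing_tools() {
    [ "$#" -gt 0 ] || set -- git git-crypt ssh scp
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools` with no arguments prints nothing when the workstation is ready; any name it prints still needs to be installed.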
    

Step 2: Clone Repository & Initialize Secrets

Time: 10-15 minutes

  1. Clone from Source:

    Option A: GitHub (Initial Clone):

    cd ~/dev  # Or your preferred code directory
    git clone https://github.com/your-username/homelab.git
    cd homelab
    

    Option B: Gitea (Production Environment):

    cd ~/dev
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    
  2. Unlock Encrypted Secrets (If Repository Already Uses Git-crypt):

    # Import the symmetric key (retrieve from password manager)
    git-crypt unlock /path/to/homelab-secrets.key
    
    # Verify decryption
    file nodes/heimdall/core/.env.secrets
    # Should report text (e.g. "ASCII text"), not "data"
    

    ⚠️ Security Warning: Store homelab-secrets.key only in:

    • Password manager (1Password, Bitwarden, etc.)
    • Encrypted backup drive

    NEVER commit the key to the repository.
  3. Initialize Git-crypt (First-Time Setup Only):

    # If repository is NOT yet encrypted
    git-crypt init
    git-crypt export-key ~/homelab-secrets.key
    
    # Secure the key immediately
    chmod 600 ~/homelab-secrets.key
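
Whether a secrets file is still locked can also be checked from a script. git-crypt marks encrypted blobs with a fixed 10-byte header (a NUL byte, the ASCII text GITCRYPT, and another NUL), so a file that still starts with those bytes has not been decrypted in the working tree. A minimal sketch, with `is_gitcrypt_encrypted` as a hypothetical helper name:

```shell
# Return 0 if the file still begins with git-crypt's 10-byte header
# ("\0GITCRYPT\0"), meaning it has NOT been decrypted in the working tree.
is_gitcrypt_encrypted() {
    header=$(head -c 10 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$header" = "00474954435259505400" ]
}
```

For example, `is_gitcrypt_encrypted nodes/heimdall/core/.env.secrets && echo "still locked"` flags a file that needs `git-crypt unlock` without printing any of its contents.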
    

Ansible Control Node Setup

Step 3: Bootstrap Watchtower as Control Node

Time: 15-20 minutes (reduced from 25-35 via automation)

Rationale: Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.

New Method: Use the unified bootstrap script for automated, idempotent configuration.

  1. Transfer Bootstrap Script to Watchtower:

    Option A: From local repository (if cloned on workstation):

    # From your workstation
    scp -r homelab/scripts chester@10.0.0.200:~/
    

    Option B: Direct clone on Watchtower:

    # SSH to Watchtower
    ssh chester@10.0.0.200
    
    # Shallow clone (latest commit only; the full tree is still checked out)
    git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
    cd homelab/scripts
    
  2. Run Unified Bootstrap Script:

    # Auto-detect and configure (Raspberry Pi will be detected)
    ./bootstrap.sh
    
    # The script will:
    # - Detect Raspberry Pi hardware
    # - Configure static IP (10.0.0.200)
    # - Install Docker with Debian Trixie compatibility
    # - Install Ansible and proxmoxer
    # - Generate ED25519 SSH keys
    # - Run comprehensive validation
    # - Generate hardware fingerprint
    

    ⚠️ Important: SSH connection will drop during network reconfiguration. Reconnect after ~10 seconds:

    ssh chester@10.0.0.200
    
  3. Verify Bootstrap Success:

    # After reconnecting
    cd ~/homelab/scripts  # Use ~/scripts instead if you copied the scripts via Option A
    
    # Check validation report
    cat ../ansible/archive/outputs/bootstrap-validation-watchtower-*.log
    
    # Verify installations
    docker --version       # Should show Docker 24.x or newer
    ansible --version      # Should show ansible [core 2.x.x]
    
    # Check SSH key
    ls -lh ~/.ssh/id_ed25519.pub
    cat ~/.ssh/id_ed25519.pub  # Copy this for distribution
    
  4. Distribute SSH Keys to Managed Nodes:

    # The bootstrap script generated keys, now distribute them
    
    # Deploy to Heimdall
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
    
    # Deploy to Waldorf  
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
    
    # Deploy to localhost (self-management)
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost
    
  5. Validate Passwordless Authentication:

    # Test each node
    ssh -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname"
    # Expected: heimdall
    
    ssh -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname"  
    # Expected: waldorf
    
    ssh -i ~/.ssh/id_ed25519 chester@localhost "hostname"
    # Expected: watchtower
    
  6. Clone Full Repository (If Not Already Present):

    cd ~
    
    # If you only made a shallow clone earlier, replace it with a full clone
    rm -rf homelab  # Removes the shallow clone
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    
    # Unlock secrets (if using git-crypt)
    # Transfer key securely via scp from workstation
    git-crypt unlock ~/homelab-secrets.key
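
The per-node hostname checks in item 5 can be folded into one loop. This is a sketch under assumptions: `verify_nodes`, `SSH_CMD`, and `SSH_USER` are hypothetical names and not part of bootstrap.sh; the addresses and expected hostnames are the ones listed in this step.

```shell
# Compare each node's reported hostname with the expected one.
# Input: "address expected-hostname" pairs on stdin, one per line.
# SSH_CMD is an override hook (defaults to ssh) so the loop can be exercised
# without real hosts; SSH_USER defaults to the account used in this SOP.
verify_nodes() {
    failures=0
    while read -r addr expected; do
        [ -n "$addr" ] || continue
        # -n keeps ssh from swallowing the rest of the list on stdin;
        # BatchMode makes ssh fail instead of prompting for a password.
        actual=$("${SSH_CMD:-ssh}" -n -o BatchMode=yes -o ConnectTimeout=5 \
            "${SSH_USER:-chester}@$addr" hostname 2>/dev/null) || actual=""
        if [ "$actual" = "$expected" ]; then
            echo "OK   $addr ($actual)"
        else
            echo "FAIL $addr (expected $expected, got '${actual:-no response}')"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}
```

Feed it `address hostname` pairs, for example `printf '10.0.0.151 heimdall\n10.0.0.251 waldorf\n' | verify_nodes`. A non-zero exit status means at least one node either prompted for a password or reported an unexpected hostname.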
    

Troubleshooting:

  • Bootstrap fails: Run with --dry-run first to preview actions: ./bootstrap.sh --dry-run
  • Network doesn't reconnect: Wait 30 seconds and retry SSH
  • Validation errors: Review the validation log, address critical errors before proceeding
  • Manual intervention needed: Use ./validate-node.sh to re-check after fixes
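
The "wait and retry SSH" advice can be expressed as a small retry loop. This is a generic sketch rather than anything bootstrap.sh provides; `retry` is a hypothetical helper, and the attempt count and delay are values you choose.

```shell
# Run a command until it succeeds or the attempt budget is used up.
# usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For instance, `retry 6 5 ssh -o BatchMode=yes -o ConnectTimeout=5 chester@10.0.0.200 true` makes up to six attempts with a five-second pause between them, which covers the 30-second window mentioned above.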

Core Infrastructure Deployment

Step 4: Bootstrap and Deploy Core Stack on Heimdall

Time: 15-25 minutes (reduced from 20-30 via automation)

Core Stack Components:

  • Docker Socket Proxy (security boundary)
  • Traefik (reverse proxy with automatic SSL)
  • Redis (caching layer)
  • Komodo Core (container orchestration)
  1. Bootstrap Heimdall Node:

    Option A: Remote bootstrap from Watchtower (recommended):

    # From Watchtower control node
    cd ~/homelab
    
    # Copy bootstrap script to Heimdall
    scp -r scripts chester@10.0.0.151:~/
    
    # SSH and run bootstrap
    ssh chester@10.0.0.151 "cd scripts && ./bootstrap.sh --hardware-type docker-vm"
    

    Option B: Direct console access:

    # Log in to Heimdall directly
    ssh chester@10.0.0.151
    
    # Clone repo or copy scripts
    git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
    cd homelab/scripts
    
    # Run bootstrap
    ./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.151
    
  2. Verify Docker Installation:

    # After bootstrap completes
    ssh chester@10.0.0.151
    
    docker --version
    docker compose version
    docker ps  # Should return empty list (no containers yet)
    
  3. Create Komodo Directory Structure:

    sudo mkdir -p /etc/komodo/{stacks,repos,volumes}
    sudo chown -R $USER:$USER /etc/komodo
    
  4. Mount NFS Storage (If Required):

    # Install NFS client
    sudo apt install -y nfs-common
    
    # Create mount point
    sudo mkdir -p /mnt/nas
    
    # Add to /etc/fstab (persistent mount)
    echo "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0" | sudo tee -a /etc/fstab
    
    # Mount immediately
    sudo mount -a
    
    # Verify mount
    df -h | grep nas
    
  5. Clone Repository to Heimdall:

    cd ~
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    
    # Unlock secrets if repository uses git-crypt
    # (copy the key to Heimdall securely first, e.g. via scp from your workstation)
    git-crypt unlock ~/homelab-secrets.key
    
  6. Deploy Core Stack:

    cd ~/homelab/nodes/heimdall/core
    
    # Review configuration
    cat compose.yaml
    file .env.secrets  # Should report text, not "data"; confirms decryption without printing secrets
    
    # Pull images
    docker compose pull
    
    # Start services in detached mode
    docker compose up -d
    
    # Monitor logs
    docker compose logs -f
    # Press Ctrl+C to exit log streaming
    
  7. Verify Core Services:

    # Check running containers
    docker ps
    
    # Expected containers:
    # - dockerproxy
    # - traefik
    # - redis
    # - komodo-core
    
    # Check health
    docker compose ps
    # All services should show "running" status
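
Comparing the `docker ps` output against the expected container list by eye is easy to get wrong. The sketch below automates that comparison; `missing_containers` is a hypothetical helper, and it assumes the container names match the expected list in item 7.

```shell
# Read running container names on stdin (one per line) and print every
# expected name that is absent. Returns non-zero if anything is missing.
missing_containers() {
    running=$(cat)
    status=0
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            printf '%s\n' "$name"
            status=1
        fi
    done
    return "$status"
}
```

A typical call is `docker ps --format '{{.Names}}' | missing_containers dockerproxy traefik redis komodo-core`. Reading names from stdin keeps the helper independent of Docker, so it can be checked against canned input.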
    

Validation & Health Checks

Step 5: Service Verification

Time: 15-20 minutes

  1. Test Internal Connectivity:

    # From Heimdall
    
    # Test Komodo Core
    curl -I http://localhost:9000
    # Expected: HTTP/1.1 200 OK
    
    # Test Redis
    docker exec -it redis redis-cli ping
    # Expected: PONG
    
    # Test Docker Socket Proxy
    curl http://localhost:2375/version
    # Expected: JSON response with Docker version
    
  2. Test External Access (From Workstation):

    # Test Traefik dashboard (if exposed)
    curl -I https://traefik.castaldifamily.com
    
    # Test Komodo Core UI
    curl -I https://komodo.castaldifamily.com
    # Expected: HTTP/2 200
    
  3. Verify Traefik SSL Certificates:

    # SSH to Heimdall
    ssh chester@10.0.0.151
    
    # Check Traefik logs for ACME certificate retrieval
    docker logs traefik 2>&1 | grep -i "certificate"
    
    # Verify cert storage
    ls -lh /etc/komodo/volumes/traefik/acme.json
    
  4. Komodo Core Initial Configuration:

    • Navigate to https://komodo.castaldifamily.com in browser
    • Complete first-time setup wizard
    • Create admin account
    • Add server nodes (Heimdall, Waldorf, Watchtower)
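
The `curl -I` checks in this step print headers that still have to be read by a person. A small sketch that turns them into pass/fail results follows; `http_status` and `expect_status` are hypothetical helper names, and the only assumption is that the first header line has the usual `HTTP/<version> <code>` shape.

```shell
# Extract the status code from the first line of an HTTP response header,
# e.g. "HTTP/1.1 200 OK" or "HTTP/2 200".
http_status() {
    awk 'NR == 1 { print $2; exit }' | tr -d '\r'
}

# Succeed only if the response on stdin carries the expected status (default 200).
expect_status() {
    [ "$(http_status)" = "${1:-200}" ]
}
```

For example, `curl -sI https://komodo.castaldifamily.com | expect_status 200` exits non-zero when the UI answers with anything other than 200.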

Post-Deployment Configuration

Step 6: Configure GitOps Integration

Time: 20-25 minutes

  1. Install Komodo Periphery on Remote Nodes:

    On Waldorf (10.0.0.251):

    ssh chester@10.0.0.251
    
    # Install Docker
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh
    sudo usermod -aG docker $USER
    
    # Create Komodo directory
    sudo mkdir -p /etc/komodo/{stacks,repos}
    sudo chown -R $USER:$USER /etc/komodo
    
    # Deploy Periphery (via Komodo UI or manually)
    # See Komodo documentation for Periphery setup
    

    On Watchtower (10.0.0.200):

    # Repeat the same process as Waldorf (skip the Docker install; bootstrap.sh already installed it in Step 3)
    
  2. Configure Repository Cloning in Komodo:

    In Komodo UI:

    • Navigate to Settings → Git Providers
    • Add Gitea provider:
      • URL: https://git.castaldifamily.com
      • Token: Generate from Gitea Settings → Applications
    • Test connection
  3. Create Git-Linked Stacks:

    For each service (Plex, Tunarr, etc.):

    • Navigate to Stacks → New Stack
    • Select Git Repository as source
    • Configure:
      • Repo: nathan/homelab
      • Branch: main
      • Path: nodes/{node-name}/{service-name}
      • Compose File: compose.yaml
    • Enable Auto-Deploy on Push
  4. Configure Gitea Webhooks:

    In Gitea repository settings:

    • Navigate to Settings → Webhooks
    • Add webhook:
      • URL: https://komodo.castaldifamily.com/api/webhook/pull-stack/{stack-id}
      • Secret: From Komodo stack configuration
      • Events: Push events only
      • Active: Enabled
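
Creating one Git-linked stack per service means walking the nodes/{node-name}/{service-name} layout by hand. The sketch below lists every candidate stack from a local clone, assuming each service directory holds a compose.yaml as the Path and Compose File fields imply; `list_stacks` is a hypothetical helper, not a Komodo feature.

```shell
# Print "node service" for every compose.yaml found two levels below nodes/.
# repo_root is wherever the repository is cloned (defaults to the current directory).
list_stacks() {
    repo_root=${1:-.}
    for file in "$repo_root"/nodes/*/*/compose.yaml; do
        [ -f "$file" ] || continue   # skip the unexpanded glob when nothing matches
        service_dir=${file%/compose.yaml}
        node_dir=${service_dir%/*}
        printf '%s %s\n' "${node_dir##*/}" "${service_dir##*/}"
    done
}
```

Running `list_stacks ~/homelab` gives one line per stack, which can serve as a checklist while filling in the Path field for each one.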

Troubleshooting

Common Issues

Issue: git-crypt unlock fails with "File is not encrypted"

Resolution:

  • Verify you're in the correct repository directory
  • Check if repository is actually using git-crypt: git-crypt status
  • Ensure .gitattributes file exists and defines encryption rules

Issue: SSH key authentication fails to nodes

Resolution:

# Verify key permissions
ls -lh ~/.ssh/id_ed25519
# Should be: -rw------- (600)

# Test manual SSH with verbose logging
ssh -vvv -i ~/.ssh/id_ed25519 chester@10.0.0.151

# Check authorized_keys on target node
ssh chester@10.0.0.151 "cat ~/.ssh/authorized_keys"
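
The permission check can be scripted so it does not depend on reading `ls` output by eye. A minimal sketch, with `has_mode_600` as a hypothetical helper; it parses the mode string from `ls -ld`, which is an implementation choice made here to avoid `stat` flags that differ between GNU and BSD systems.

```shell
# Succeed only if the path is a regular file whose mode is exactly 600 (-rw-------).
has_mode_600() {
    [ -f "$1" ] || return 1
    mode=$(ls -ld "$1" | cut -c1-10)
    [ "$mode" = "-rw-------" ]
}
```

For example, `has_mode_600 ~/.ssh/id_ed25519 || chmod 600 ~/.ssh/id_ed25519` tightens the private key only when needed.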

Issue: Docker Compose fails with "network not found"

Resolution:

# Remove unused Docker networks, then let Compose recreate the stack and its networks
docker network prune -f
docker compose up -d --force-recreate

Issue: NFS mount fails with "Operation not permitted"

Resolution:

# Check NFS server exports
showmount -e 10.0.0.250

# Force NFSv3 (avoid ID mapping issues)
sudo mount -t nfs -o nfsvers=3 10.0.0.250:/Volume1/appdata /mnt/nas

# Update fstab with explicit version
# 10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0
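
Appending the fstab line with `tee -a` adds a duplicate entry every time the step is repeated. A guard that appends only when the exact line is absent keeps the step idempotent. This is a sketch with `ensure_line` as a hypothetical helper; it writes as the calling user, so for /etc/fstab it has to run as root.

```shell
# Append a line to a file only if that exact line is not already present.
# usage: ensure_line <file> <line>
ensure_line() {
    target=$1
    line=$2
    if [ -f "$target" ] && grep -qxF -- "$line" "$target"; then
        return 0   # already present; nothing to do
    fi
    printf '%s\n' "$line" >> "$target"
}
```

From a root shell, the call would be `ensure_line /etc/fstab "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0"`. Matching the whole line (`grep -x`) rather than just the mount point means an entry with different options is not mistaken for the desired one.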

Emergency Rollback

Complete Stack Teardown

If deployment fails and rollback is required:

# On Heimdall
cd ~/homelab/nodes/heimdall/core

# Stop the stack and keep its data
docker compose down

# OR remove volumes as well (DESTRUCTIVE); run this instead of the command above
# docker compose down -v

# Remove repository clone
cd ~
rm -rf homelab

Restore Previous State

# Re-clone repository at specific commit
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
git checkout {commit-hash}  # Hash before failed deployment

# Unlock secrets and redeploy
git-crypt unlock ~/homelab-secrets.key
cd nodes/heimdall/core
docker compose up -d

Success Criteria

Deployment is complete when:

  • All core services running on Heimdall (Komodo, Traefik, Redis, Docker Proxy)
  • Komodo Periphery agents connected on Waldorf and Watchtower
  • Traefik SSL certificates issued and valid
  • Komodo UI accessible at https://komodo.castaldifamily.com
  • Git-linked stacks successfully pull from Gitea
  • Webhooks trigger automatic deployments on push
  • NFS mounts stable across all nodes
  • Ansible control node (Watchtower) can execute playbooks against all nodes
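
Several of these criteria can be checked from one script. The runner below is a sketch, not an existing repository tool: `run_checks` is a hypothetical helper, and the commands passed to it are whatever checks you choose from the earlier steps.

```shell
# Run each "label::command" pair and report PASS or FAIL per check.
# Returns non-zero if any check fails. Commands are passed to eval,
# so only supply strings you wrote yourself.
run_checks() {
    failed=0
    for check in "$@"; do
        label=${check%%::*}
        cmd=${check#*::}
        if eval "$cmd" >/dev/null 2>&1; then
            echo "PASS  $label"
        else
            echo "FAIL  $label"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

A possible invocation on Heimdall is `run_checks "Redis responds::docker exec redis redis-cli ping" "Komodo UI reachable::curl -fsS -o /dev/null https://komodo.castaldifamily.com"`. Criteria that need human judgment, such as confirming a webhook-triggered deployment, still have to be verified manually.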

Next Steps

After successful deployment:

  1. Deploy Application Stacks:

  2. Configure Backups:

    • Implement automated Gitea repository backups
    • Schedule NFS snapshot retention policy
    • Export Komodo configuration regularly
  3. Security Hardening:

    • Enable Traefik authentication for internal services
    • Configure fail2ban for SSH protection
    • Implement network segmentation (VLANs)
  4. Monitoring & Observability:

    • Deploy Prometheus/Grafana stack
    • Configure health check endpoints
    • Set up uptime monitoring (Uptime Kuma)


Revision History

Date        Version  Change Description
2026-04-12  1.0      Initial SOP creation