SOP-002: Initial Infrastructure Deployment

Status: Active
Created: April 12, 2026
Last Updated: April 12, 2026
Owner: Nathan Castaldi
Applies To: Fresh homelab deployments and disaster recovery scenarios


Purpose

Deploy the complete homelab infrastructure from a clean state using GitOps principles and automation. This SOP covers:

  • Secure repository setup with encrypted secrets
  • Ansible control node configuration
  • Core service deployment (Komodo, Traefik, Gitea, Redis)
  • Validation and health checks

Use Cases:

  • New homelab initialization
  • Disaster recovery (full infrastructure rebuild)
  • Node replacement or migration

Prerequisites

Required Access

  • Physical or console access to all nodes (Heimdall, Waldorf, Watchtower)
  • GitHub account with access to homelab repository
  • Gitea credentials (if repository already hosted locally)
  • Root/sudo privileges on all nodes

Required Infrastructure

  • Nodes have base OS installed (Debian/Ubuntu recommended)
  • Network connectivity between all nodes
  • NFS storage accessible at 10.0.0.250:/Volume1/appdata
  • DNS/hosts file configured for node resolution
  • Internet access for package installation

Security Requirements

  • Git-crypt symmetric key (if repository already encrypted)
  • Password manager for storing credentials
  • Secure workstation for handling keys and secrets

Security & Pre-Deployment Setup

Step 1: Prepare Your Workstation

Time: 15-20 minutes

  1. Install Required Tools:

    Linux/macOS:

    # Install git-crypt
    brew install git-crypt  # macOS
    # OR
    sudo apt install git-crypt  # Debian/Ubuntu
    
    # Verify installation
    git-crypt --version
    

    Windows (Git Bash/WSL):

    # Download git-crypt binary
    curl -L https://github.com/AGWA/git-crypt/releases/download/0.7.0/git-crypt-0.7.0-x86_64.exe -o /usr/local/bin/git-crypt
    chmod +x /usr/local/bin/git-crypt
    
  2. Configure Git Identity:

    git config --global user.name "Your Name"
    git config --global user.email "your.email@domain.com"
    git config --global core.autocrlf true  # Windows only
    

Step 2: Clone Repository & Initialize Secrets

Time: 10-15 minutes

  1. Clone from Source:

    Option A: GitHub (Initial Clone):

    cd ~/dev  # Or your preferred code directory
    git clone https://github.com/your-username/homelab.git
    cd homelab
    

    Option B: Gitea (Production Environment):

    cd ~/dev
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    
  2. Unlock Encrypted Secrets (If Repository Already Uses Git-crypt):

    # Import the symmetric key (retrieve from password manager)
    git-crypt unlock /path/to/homelab-secrets.key
    
    # Verify decryption
    file nodes/heimdall/core/.env.secrets
    # Should report a text type (e.g., "ASCII text"), not binary "data"
    

    ⚠️ Security Warning: Store homelab-secrets.key only in:

    • Password manager (1Password, Bitwarden, etc.)
    • Encrypted backup drive

    NEVER commit it to the repository.
  3. Initialize Git-crypt (First-Time Setup Only):

    # If repository is NOT yet encrypted
    git-crypt init
    git-crypt export-key ~/homelab-secrets.key
    
    # Secure the key immediately
    chmod 600 ~/homelab-secrets.key
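
git-crypt encrypts only the files matched by a rule in `.gitattributes`, so a freshly initialized repository encrypts nothing until a rule exists. The sketch below assumes the `.env.secrets` filename used elsewhere in this SOP. The `add_crypt_rule` helper is hypothetical and is not part of git-crypt. It appends a rule only when that rule is missing.

```shell
# Hypothetical helper: register a file pattern for git-crypt encryption.
# Run from the repository root; the rule is appended only if not already present.
add_crypt_rule() {
  pattern="$1"
  rule="$pattern filter=git-crypt diff=git-crypt"
  touch .gitattributes
  grep -qxF -- "$rule" .gitattributes || printf '%s\n' "$rule" >> .gitattributes
}
```

For example, `add_crypt_rule '.env.secrets'` followed by committing `.gitattributes` should be done before any secret file is added. `git-crypt status` then lists which files are marked for encryption.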
    

Ansible Control Node Setup

Step 3: Configure Watchtower as Control Node

Time: 25-35 minutes

Rationale: Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.

  1. SSH to Watchtower:

    ssh chester@10.0.0.200
    
  2. Install Ansible Toolchain:

    # Update package index
    sudo apt update
    
    # Install Ansible and dependencies
    sudo apt install -y ansible ansible-lint sshpass python3-pip git
    
    # Install Python libraries
    pip3 install proxmoxer requests --break-system-packages
    
    # Verify installation
    ansible --version
    # Expected: ansible [core 2.x.x]
    
  3. Generate SSH Keys for Automation:

    # Generate ED25519 key (modern cryptography)
    ssh-keygen -t ed25519 -C "ansible@watchtower" -f ~/.ssh/id_ed25519 -N ""
    
    # Set proper permissions
    chmod 600 ~/.ssh/id_ed25519
    chmod 644 ~/.ssh/id_ed25519.pub
    
  4. Distribute Keys to All Nodes:

    # Deploy to Heimdall
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
    
    # Deploy to Waldorf
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
    
    # Deploy to localhost (self-management)
    ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost
    
  5. Validate Passwordless Authentication:

    # Test each node
    ssh -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname"
    # Expected: heimdall
    
    ssh -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname"  
    # Expected: waldorf
    
    ssh -i ~/.ssh/id_ed25519 chester@localhost "hostname"
    # Expected: watchtower
    
  6. Clone Repository to Control Node:

    cd ~
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    
    # Unlock secrets (if using git-crypt)
    # Transfer key securely via scp from workstation
    git-crypt unlock ~/homelab-secrets.key
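
With keys distributed, the control node can be checked end to end with an Ansible ad-hoc ping. The following is a minimal sketch, assuming an INI inventory built from the addresses and the `chester` user shown above. The `write_inventory` helper, the `homelab` group name, and the inventory path are hypothetical and should be adapted to the repository's actual layout.

```shell
# Hypothetical helper: write a minimal INI inventory for the three nodes.
# Host addresses, user, and key path are taken from the steps in this SOP.
write_inventory() {
  cat > "$1" <<'EOF'
[homelab]
heimdall   ansible_host=10.0.0.151
waldorf    ansible_host=10.0.0.251
watchtower ansible_host=localhost

[homelab:vars]
ansible_user=chester
ansible_ssh_private_key_file=~/.ssh/id_ed25519
EOF
}
```

Usage: `write_inventory ~/inventory.ini`, then `ansible -i ~/inventory.ini homelab -m ping`. Each node should answer with `pong` if SSH key authentication is working.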
    

Core Infrastructure Deployment

Step 4: Deploy Core Stack on Heimdall

Time: 20-30 minutes

Core Stack Components:

  • Docker Socket Proxy (security boundary)
  • Traefik (reverse proxy with automatic SSL)
  • Redis (caching layer)
  • Komodo Core (container orchestration)

Deployment Method: Manual Docker Compose (Ansible automation planned for future state)

  1. SSH to Heimdall:

    ssh chester@10.0.0.151
    
  2. Install Docker & Docker Compose:

    # Install Docker
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh
    
    # Add user to docker group
    sudo usermod -aG docker $USER
    
    # Log out and back in for group to take effect
    exit
    ssh chester@10.0.0.151
    
    # Verify Docker installation
    docker --version
    docker compose version
    
  3. Create Komodo Directory Structure:

    sudo mkdir -p /etc/komodo/{stacks,repos,volumes}
    sudo chown -R $USER:$USER /etc/komodo
    
  4. Mount NFS Storage (If Required):

    # Install NFS client
    sudo apt install -y nfs-common
    
    # Create mount point
    sudo mkdir -p /mnt/nas
    
    # Add to /etc/fstab (persistent mount)
    echo "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0" | sudo tee -a /etc/fstab
    
    # Mount immediately
    sudo mount -a
    
    # Verify mount
    df -h | grep nas
    
  5. Clone Repository to Heimdall:

    cd ~
    git clone https://git.castaldifamily.com/nathan/homelab.git
    cd homelab
    
    # Unlock secrets if repository uses git-crypt
    # (requires git-crypt installed on Heimdall and the key copied to ~/homelab-secrets.key)
    git-crypt unlock ~/homelab-secrets.key
    
  6. Deploy Core Stack:

    cd ~/homelab/nodes/heimdall/core
    
    # Review configuration
    cat compose.yaml
    cat .env.secrets  # Verify secrets are decrypted
    
    # Pull images
    docker compose pull
    
    # Start services in detached mode
    docker compose up -d
    
    # Monitor logs
    docker compose logs -f
    # Press Ctrl+C to exit log streaming
    
  7. Verify Core Services:

    # Check running containers
    docker ps
    
    # Expected containers:
    # - dockerproxy
    # - traefik
    # - redis
    # - komodo-core
    
    # Check health
    docker compose ps
    # All services should show "running" status
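
The container check above can be scripted so that a missing service stands out. This is a sketch only: `missing_containers` is a hypothetical helper, and the four names are the expected containers listed in this step. The real names depend on what `compose.yaml` defines.

```shell
# Print each expected core container that is absent from the list of
# running container names read on stdin (one name per line).
missing_containers() {
  running="$(cat)"
  for name in dockerproxy traefik redis komodo-core; do
    printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}
```

Usage: `docker ps --format '{{.Names}}' | missing_containers`. Empty output means all four expected containers are running.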
    

Validation & Health Checks

Step 5: Service Verification

Time: 15-20 minutes

  1. Test Internal Connectivity:

    # From Heimdall
    
    # Test Komodo Core
    curl -I http://localhost:9000
    # Expected: HTTP/1.1 200 OK
    
    # Test Redis
    docker exec -it redis redis-cli ping
    # Expected: PONG
    
    # Test Docker Socket Proxy
    curl http://localhost:2375/version
    # Expected: JSON response with Docker version
    
  2. Test External Access (From Workstation):

    # Test Traefik dashboard (if exposed)
    curl -I https://traefik.castaldifamily.com
    
    # Test Komodo Core UI
    curl -I https://komodo.castaldifamily.com
    # Expected: HTTP/2 200
    
  3. Verify Traefik SSL Certificates:

    # SSH to Heimdall
    ssh chester@10.0.0.151
    
    # Check Traefik logs for ACME certificate retrieval
    docker logs traefik 2>&1 | grep -i "certificate"
    
    # Verify cert storage
    ls -lh /etc/komodo/volumes/traefik/acme.json
    
  4. Komodo Core Initial Configuration:

    • Navigate to https://komodo.castaldifamily.com in browser
    • Complete first-time setup wizard
    • Create admin account
    • Add server nodes (Heimdall, Waldorf, Watchtower)
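
Services may need some time after `docker compose up -d` before they answer, and a single `curl` can fail for that reason alone. The sketch below is a generic retry wrapper. The `retry` helper is hypothetical, and the attempt count and delay are arbitrary values to tune.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Usage: `retry 10 3 curl -fsSI https://komodo.castaldifamily.com` tries the Komodo UI check up to ten times, three seconds apart, and fails only if none of the attempts succeed.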

Post-Deployment Configuration

Step 6: Configure GitOps Integration

Time: 20-25 minutes

  1. Install Komodo Periphery on Remote Nodes:

    On Waldorf (10.0.0.251):

    ssh chester@10.0.0.251
    
    # Install Docker
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh
    sudo usermod -aG docker $USER
    
    # Create Komodo directory
    sudo mkdir -p /etc/komodo/{stacks,repos}
    sudo chown -R $USER:$USER /etc/komodo
    
    # Deploy Periphery (via Komodo UI or manually)
    # See Komodo documentation for Periphery setup
    

    On Watchtower (10.0.0.200):

    # Repeat same process as Waldorf
    
  2. Configure Repository Cloning in Komodo:

    In Komodo UI:

    • Navigate to Settings → Git Providers
    • Add Gitea provider:
      • URL: https://git.castaldifamily.com
      • Token: Generate from Gitea Settings → Applications
    • Test connection
  3. Create Git-Linked Stacks:

    For each service (Plex, Tunarr, etc.):

    • Navigate to Stacks → New Stack
    • Select Git Repository as source
    • Configure:
      • Repo: nathan/homelab
      • Branch: main
      • Path: nodes/{node-name}/{service-name}
      • Compose File: compose.yaml
    • Enable Auto-Deploy on Push
  4. Configure Gitea Webhooks:

    In Gitea repository settings:

    • Navigate to Settings → Webhooks
    • Add webhook:
      • URL: https://komodo.castaldifamily.com/api/webhook/pull-stack/{stack-id}
      • Secret: From Komodo stack configuration
      • Events: Push events only
      • Active: Enabled

Troubleshooting

Common Issues

Issue: git-crypt unlock fails with "File is not encrypted"

Resolution:

  • Verify you're in the correct repository directory
  • Check if repository is actually using git-crypt: git-crypt status
  • Ensure .gitattributes file exists and defines encryption rules
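
To tell whether a secrets file in the working tree is still encrypted (repository locked) or already plaintext (repository unlocked), its first bytes can be inspected. The sketch below assumes the header that git-crypt writes at the start of encrypted files. `is_git_crypt_blob` is a hypothetical helper, not a git-crypt command.

```shell
# Return 0 if the file still carries the git-crypt header (encrypted on disk),
# or 1 if it looks like decrypted plaintext.
is_git_crypt_blob() {
  # Encrypted blobs begin with a NUL byte, the text GITCRYPT, and another NUL.
  [ "$(head -c 10 "$1" | tr -d '\0')" = "GITCRYPT" ]
}
```

Usage: `is_git_crypt_blob nodes/heimdall/core/.env.secrets && echo "still encrypted"`. If the file is still encrypted after `git-crypt unlock`, the key probably does not match the repository.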

Issue: SSH key authentication fails to nodes

Resolution:

# Verify key permissions
ls -lh ~/.ssh/id_ed25519
# Should be: -rw------- (600)

# Test manual SSH with verbose logging
ssh -vvv -i ~/.ssh/id_ed25519 chester@10.0.0.151

# Check authorized_keys on target node
ssh chester@10.0.0.151 "cat ~/.ssh/authorized_keys"
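
If the key is present in `authorized_keys` but login still falls back to a password, overly permissive modes on the target's `~/.ssh` are a common cause: with its default `StrictModes` setting, sshd checks file modes and ownership before accepting a key. A minimal sketch of a fix, using a hypothetical `fix_ssh_perms` helper run on the target node:

```shell
# Hypothetical helper: tighten permissions on an SSH directory so that
# sshd (with StrictModes enabled) accepts the authorized_keys file.
fix_ssh_perms() {
  ssh_dir="${1:-$HOME/.ssh}"
  chmod 700 "$ssh_dir"
  [ -f "$ssh_dir/authorized_keys" ] && chmod 600 "$ssh_dir/authorized_keys"
  return 0
}
```

Calling `fix_ssh_perms` with no argument operates on the current user's `~/.ssh`.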

Issue: Docker Compose fails with "network not found"

Resolution:

# Remove unused networks, then recreate the stack (Compose recreates its own networks)
docker network prune -f
docker compose up -d --force-recreate

Issue: NFS mount fails with "Operation not permitted"

Resolution:

# Check NFS server exports
showmount -e 10.0.0.250

# Force NFSv3 (avoid ID mapping issues)
sudo mount -t nfs -o nfsvers=3 10.0.0.250:/Volume1/appdata /mnt/nas

# Update fstab with explicit version
# 10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0
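
Appending to `/etc/fstab` with `tee -a` adds a duplicate line every time the step is re-run, which can itself cause mount errors. The sketch below appends an entry only when no uncommented entry for the same mount point exists. `ensure_fstab_entry` is a hypothetical helper, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: append an fstab line only when no uncommented entry
# for the same mount point exists, so re-running the step adds no duplicates.
ensure_fstab_entry() {
  fstab="$1"; entry="$2"
  mount_point="$(printf '%s\n' "$entry" | awk '{print $2}')"
  if awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp {found=1} END {exit !found}' "$fstab"; then
    return 0
  fi
  printf '%s\n' "$entry" >> "$fstab"
}
```

Usage, from a root shell: `ensure_fstab_entry /etc/fstab "10.0.0.250:/Volume1/appdata /mnt/nas nfs defaults,nfsvers=3 0 0"`, followed by `mount -a`.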

Emergency Rollback

Complete Stack Teardown

If deployment fails and rollback is required:

# On Heimdall
cd ~/homelab/nodes/heimdall/core

# Option A: stop the stack and keep its data (no -v flag)
docker compose down

# Option B: stop the stack and delete its volumes (DESTRUCTIVE)
docker compose down -v

# Remove repository clone
cd ~
rm -rf homelab

Restore Previous State

# Re-clone repository at specific commit
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab
git checkout {commit-hash}  # Hash before failed deployment

# Unlock secrets and redeploy
git-crypt unlock ~/homelab-secrets.key
cd nodes/heimdall/core
docker compose up -d

Success Criteria

Deployment is complete when:

  • All core services running on Heimdall (Komodo, Traefik, Redis, Docker Proxy)
  • Komodo Periphery agents connected on Waldorf and Watchtower
  • Traefik SSL certificates issued and valid
  • Komodo UI accessible at https://komodo.castaldifamily.com
  • Git-linked stacks successfully pull from Gitea
  • Webhooks trigger automatic deployments on push
  • NFS mounts stable across all nodes
  • Ansible control node (Watchtower) can execute playbooks against all nodes

Next Steps

After successful deployment:

  1. Deploy Application Stacks:

  2. Configure Backups:

    • Implement automated Gitea repository backups
    • Schedule NFS snapshot retention policy
    • Export Komodo configuration regularly
  3. Security Hardening:

    • Enable Traefik authentication for internal services
    • Configure fail2ban for SSH protection
    • Implement network segmentation (VLANs)
  4. Monitoring & Observability:

    • Deploy Prometheus/Grafana stack
    • Configure health check endpoints
    • Set up uptime monitoring (Uptime Kuma)


Revision History

Date        Version  Change Description
2026-04-12  1.0      Initial SOP creation