Technical Runbook: Castaldi Family Lab

Status: ACTIVE & OPERATIONAL
Last Updated: April 11, 2026
Maintainer: Nathan Castaldi


Table of Contents

  1. Infrastructure Overview
  2. Critical Fixes
  3. Lessons Learned
  4. Network Map
  5. Active Tasks
  6. Emergency Procedures
  7. Credential Management
  8. Maintenance Schedule

Infrastructure Overview

Node Inventory

| Node | IP Address | Hardware | Services |
| --- | --- | --- | --- |
| Heimdall | 10.0.0.151 | Proxmox VM | Komodo Core, Gitea, Traefik |
| Waldorf | 10.0.0.XXX | NVIDIA GTX 1060 | Plex, Tunarr |
| Watchtower | 10.0.0.200 | Raspberry Pi | Komodo Periphery |
| TerraMaster | 10.0.0.250 | NAS | NFS Storage (/Volume1/appdata) |

Repository Structure

/nodes
  /heimdall
    /core         # Komodo Core
    /gitea        # Git Repository Server
  /waldorf
    /plex         # Media Server (NVIDIA Optimized)
    /tunarr       # Channel Management (GPU Passthrough)
  /watchtower
    # Komodo Periphery

Critical Fixes

⚠️ DO NOT REVERT THESE CONFIGURATIONS

1. NFS Mount: Watchtower (Raspberry Pi)

Problem: Permission Denied on /mnt/appdata despite matching UIDs.

Root Cause: NFSv4 ID-domain mismatch between Pi and TerraMaster NAS.

Solution:

# /etc/fstab entry (Force NFSv3)
10.0.0.250:/Volume1/appdata /mnt/appdata nfs rw,nfsvers=3,hard,intr,x-systemd.automount,nolock 0 0

Mount Point Ownership:

# Set ownership WHILE UNMOUNTED
sudo chown chester:chester /mnt/appdata
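
To confirm the share really came up as NFSv3 rather than silently negotiating v4, the live mount options can be inspected. The following is a minimal sketch, not part of the original fix; `nfs_version_from_opts` and `mount_opts` are hypothetical helper names, and it assumes a Linux host that exposes `/proc/mounts`.

```shell
# Extract the NFS protocol version from a comma-separated mount option string.
# Prints e.g. "3" or "4.2"; prints nothing if no vers=/nfsvers= option is present.
nfs_version_from_opts() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^\(nfs\)\{0,1\}vers=//p' | head -n 1
}

# Print the live options (4th field of /proc/mounts) for an NFS mount point.
# The fstype filter skips the autofs entry created by x-systemd.automount.
mount_opts() {
  awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ { print $4 }' /proc/mounts | tail -n 1
}

# Usage on Watchtower (access the share first so the automount triggers):
#   ls /mnt/appdata >/dev/null
#   nfs_version_from_opts "$(mount_opts /mnt/appdata)"
```

If this prints anything other than 3, the fstab entry is probably not the one in effect.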

2. Komodo Periphery Connectivity

Problem: Hairpin NAT prevents *.castaldifamily.com access from internal nodes.

Solution:

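  • Core URL (internal): ws://10.0.0.151:9120, used in place of the public *.castaldifamily.com hostname
  • Public key path: /config/keys/periphery.pub
  • Environment variable value: file:/config/keys/periphery.pub (a file: reference to the key path above)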

3. Gitea & GitOps

Problem: SSH Key Exchange (Kex) errors on Windows (diffie-hellman-group1-sha1).

Solution:

# Use HTTPS instead of SSH
git clone https://git.castaldifamily.com/nathan/homelab.git

# Windows Credential Storage
git config --global credential.helper wincred

# Cross-Platform Line Endings
git config --global core.autocrlf true

# Network Share Permissions (trusts every path; this disables Git's ownership check)
git config --global safe.directory "*"
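
Clones that were originally made over SSH still point at the SSH remote. The sketch below shows one way to rewrite such a remote to HTTPS. `ssh_to_https` is a hypothetical helper, and it assumes the scp-style form `git@host:owner/repo.git` and that Gitea serves the same path over HTTPS.

```shell
# Convert an scp-style SSH remote (user@host:owner/repo.git) to its HTTPS form.
# Anything that does not match that shape is printed unchanged.
ssh_to_https() {
  printf '%s\n' "$1" | sed -E 's#^[^@/]+@([^:/]+):(.+)$#https://\1/\2#'
}

# Usage inside an existing clone:
#   git remote set-url origin "$(ssh_to_https "$(git remote get-url origin)")"
```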

4. GPU Passthrough (Plex/Waldorf)

Problem: Plex sees GPU but doesn't use it for hardware transcoding.

Solution:

# compose.yaml
services:
  plex:
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Verification:

  • Monitor Plex Dashboard for (hw) status during transcoding.
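
The dashboard check can be cross-checked from the host's shell. This is a rough sketch, assuming the transcoder appears in the `nvidia-smi` process table under a name containing "Plex Transcoder"; that name and the helper `has_plex_transcoder` are assumptions. Run it on the host, since processes are often not listed from inside the container.

```shell
# Succeeds if the text on stdin mentions a Plex transcoder process.
# Intended input: the process table printed by `nvidia-smi` on the host.
has_plex_transcoder() {
  grep -qi 'plex transcoder'
}

# Usage on Waldorf while a stream is transcoding:
#   nvidia-smi | has_plex_transcoder && echo "GPU transcode active"
```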

Lessons Learned

"NFSv4 is too smart"

NFSv4 maps users by name within an ID-mapping "Domain" (user@domain) instead of by raw UID. If the Pi and the NAS don't agree on the domain name, every owner is mapped to nobody, even when the UIDs match.

Fix: Force NFSv3, which compares numeric UIDs only (1000 here).


"Naked Mount Point"

If the local folder (/mnt/appdata) is owned by root, you can't "pass through" to see NAS data once it mounts.

Fix: chown the mount point to the user while unmounted.


"Hairpin NAT"

Many routers won't let internal traffic go out to a public IP and then back in (Hairpinning).

Fix: Use Internal IPs (10.0.0.X) for node-to-node communication.
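
To tell whether a hostname will hit the hairpin path, check what it resolves to from an internal node. A minimal sketch, assuming glibc's `getent` is available on the node; `is_private_ipv4` is a hypothetical helper name.

```shell
# Succeeds if the IPv4 address in $1 falls in an RFC 1918 private range.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage from an internal node:
#   ip=$(getent ahostsv4 komodo.castaldifamily.com | awk 'NR == 1 { print $1 }')
#   is_private_ipv4 "$ip" || echo "resolves to a public address; use the 10.0.0.X address"
```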


"GPU Passthrough"

Docker isolation is strict. Simply having drivers on the host isn't enough.

Fix: Use the deploy.resources.reservations.devices block in Compose to "hand the keys" of the hardware to the container.


Network Map

| Service | Protocol | Internal Address | External URL |
| --- | --- | --- | --- |
| Komodo Core | HTTP | 10.0.0.151:9000 | komodo.castaldifamily.com |
| Gitea | HTTPS | 10.0.0.151:3000 | git.castaldifamily.com |
| Plex | Host Network | 10.0.0.XXX:32400 | plex.castaldifamily.com |
| Tunarr | HTTP | 10.0.0.XXX:8000 | tunarr.castaldifamily.com |

Active Tasks

Current Focus

  1. Git-ify Stacks

    • plex and tunarr pushed to Gitea
    • Convert remaining "Manual" stacks to "Git" sources in Komodo
  2. Webhooks

    • Ensure Gitea Webhooks fire to Komodo Stack URLs for auto-deployment
  3. Hardware Transcoding

    • Monitor Waldorf for (hw) status in Plex

Emergency Procedures

🔥 NFS Mount Failure (Watchtower)

# Check NFS Server
ping 10.0.0.250

# Remount NFS Share
sudo umount /mnt/appdata
sudo mount -a

# Verify Mount
df -h | grep appdata
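
A share can appear in `df` and still reject writes when ID mapping is broken, so a write probe is a useful extra check. This is a sketch only; `can_write` is a hypothetical helper, and it should be run as the user that owns the data (chester in this lab).

```shell
# Succeeds only if a file can be created and then removed in directory $1.
can_write() {
  local probe
  probe=$(mktemp "$1/.write-probe.XXXXXX" 2>/dev/null) || return 1
  rm -f "$probe"
}

# Usage:
#   mountpoint -q /mnt/appdata && can_write /mnt/appdata && echo "mount healthy"
```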

🔥 Komodo Periphery Offline

# Check Core Connectivity (most curl builds reject ws://, so probe the same port over HTTP)
curl -v http://10.0.0.151:9120

# Restart Periphery Container
docker restart komodo-periphery
docker logs -f komodo-periphery
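
If curl is not installed on the node, a plain TCP probe answers the same "is Core reachable?" question. A minimal sketch using bash's `/dev/tcp` pseudo-device and coreutils `timeout`; `tcp_open` is a hypothetical helper name.

```shell
# Succeeds if a TCP connection to host $1, port $2 opens within 3 seconds.
# Inside the bash -c script, $0 is the host and $1 is the port.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from Watchtower:
#   tcp_open 10.0.0.151 9120 && echo "Core port reachable" || echo "Core port unreachable"
```

This only shows that the port accepts connections. Key or authentication problems still need the Periphery logs.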

🔥 Plex Not Using GPU

# Verify NVIDIA Runtime
docker info | grep -i nvidia

# Check GPU Access in Container
docker exec -it plex nvidia-smi

🔥 Git Authentication Failure

# Regenerate Gitea App Token
# Settings > Applications > Generate New Token

# Update Credential Helper
git config --global credential.helper wincred

# Test Clone
git clone https://git.castaldifamily.com/nathan/homelab.git

Credential Management

  • DO NOT store passwords in compose.yaml files committed to the Git repo
  • DO use Komodo Stack "Environment Variables" to inject secrets
  • DO use Gitea App Tokens for Git authentication (iPad/Windows)
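
A quick scan before pushing can catch a literal secret that slipped into a Compose file. This is a heuristic sketch, not a complete secret scanner; `find_literal_secrets` is a hypothetical helper and the keyword list is an assumption.

```shell
# Print numbered lines of file $1 that look like a hardcoded secret:
# a key containing password/secret/token followed by a literal value.
# Values starting with "$" are treated as variable references and skipped.
find_literal_secrets() {
  grep -nEi '(password|secret|token)[a-z0-9_]*[[:space:]]*[:=][[:space:]]*[^$[:space:]]' "$1"
}

# Usage from the repository root:
#   find_literal_secrets nodes/waldorf/plex/compose.yaml && echo "move these into Komodo"
```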

Maintenance Schedule

| Task | Frequency | Notes |
| --- | --- | --- |
| Update Docker Images | Weekly | Via Komodo or Watchtower |
| Backup Gitea | Weekly | /data/gitea directory |
| Backup Plex Metadata | Monthly | /config/Library directory |
| Check NFS Mount Health | Monthly | df -h, verify permissions |
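
The two backup rows could be scripted along these lines. This is a sketch under assumptions: `backup_dir` is a hypothetical helper, the destination path is made up for illustration, and stopping the container first is the safer way to get a consistent copy of a live database.

```shell
# Archive directory $1 into destination directory $2 as <name>-<YYYY-MM-DD>.tar.gz
# and print the path of the archive that was written.
backup_dir() {
  local src=$1 dest=$2 name archive
  name=$(basename "$src")
  archive="$dest/$name-$(date +%F).tar.gz"
  tar -czf "$archive" -C "$(dirname "$src")" "$name" && printf '%s\n' "$archive"
}

# Usage (the destination path is hypothetical):
#   backup_dir /data/gitea /mnt/appdata/backups
```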

End of Runbook