
Castaldi Family Homelab

A GitOps-managed, self-hosted infrastructure running media services, container orchestration, and automation across distributed ARM and x86 nodes.

🚀 Why This Homelab?

  • Zero-Touch Deployments: Push to Git → Auto-deploy via webhooks → Containers update automatically
  • Infrastructure as Code: All services defined in version-controlled compose.yaml files
  • GPU Transcoding: Hardware-accelerated media streaming with NVIDIA GTX 1060
  • Distributed Architecture: Services placed by role across a Proxmox VM (core services), a physical server (GPU media), and a Raspberry Pi (Komodo Periphery agent)
  • Self-Hosted Git: No external Git host; Gitea runs on-premises with automated backups
  • Production-Grade Networking: Traefik reverse proxy with automatic TLS certificates (ACME DNS-01 challenge via Cloudflare)

🏗️ Architecture

flowchart TB
    subgraph Internet
        CF[Cloudflare DNS]
    end
    
    subgraph Heimdall ["Heimdall (Proxmox VM - 10.0.0.151)"]
        Traefik[Traefik Reverse Proxy<br/>:80, :443]
        Komodo[Komodo Core<br/>Container Orchestrator]
        Gitea[Gitea<br/>Self-Hosted Git]
        Redis[Redis Cache]
    end
    
    subgraph Waldorf ["Waldorf (Physical Server - 10.0.0.251)"]
        Plex[Plex Media Server<br/>GPU Transcoding]
        Tunarr[Tunarr<br/>IPTV Channels]
        GPU[NVIDIA GTX 1060]
    end
    
    subgraph Watchtower ["Watchtower (Raspberry Pi 5 - 10.0.0.200)"]
        Periphery[Komodo Periphery<br/>Remote Agent]
    end
    
    subgraph TerraMaster ["TerraMaster NAS (10.0.0.250)"]
        NFS[NFS Storage<br/>/Volume1/appdata]
    end
    
    CF -->|HTTPS| Traefik
    Traefik --> Gitea
    Traefik --> Komodo
    Traefik --> Plex
    Traefik --> Tunarr
    
    Komodo <-->|WebSocket| Periphery
    Gitea -->|Webhook| Komodo
    
    Plex --> GPU
    Tunarr --> GPU
    
    Heimdall -.->|NFSv3| NFS
    Waldorf -.->|NFSv3| NFS
    Watchtower -.->|NFSv3| NFS
    
    style Traefik fill:#326ce5,color:#fff
    style Komodo fill:#ff6b6b,color:#fff
    style GPU fill:#76b900,color:#fff
    style NFS fill:#f9a825,color:#000

📦 Infrastructure Inventory

| Node | IP | Hardware | Role | Services |
|------|----|----------|------|----------|
| Heimdall | 10.0.0.151 | Proxmox VM (Intel N100, 16GB RAM) | Core Services | Komodo, Gitea, Traefik, Redis |
| Waldorf | 10.0.0.251 | Physical Server (i7-7820HQ, GTX 1060, 16GB) | Media Processing | Plex, Tunarr (GPU transcoding) |
| Watchtower | 10.0.0.200 | Raspberry Pi 5 (ARM Cortex-A76, 16GB) | Periphery Node | Komodo Agent |
| TerraMaster | 10.0.0.250 | NAS | Shared Storage | NFSv3 (/Volume1/appdata) |
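Since the commands in this README hard-code these addresses, the table can also live in a shell profile as a lookup. A minimal sketch; `node_ip` is a hypothetical helper (not part of `scripts/`), with the mapping copied from the table above:

```shell
# Hypothetical helper: the inventory table as a lookup (names are case-insensitive).
node_ip() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    heimdall)    echo 10.0.0.151 ;;
    waldorf)     echo 10.0.0.251 ;;
    watchtower)  echo 10.0.0.200 ;;
    terramaster) echo 10.0.0.250 ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Usage: ssh chester@"$(node_ip waldorf)"
```

Keeping the mapping in one function makes an IP change a one-line edit instead of a search-and-replace.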

Quick Start

Prerequisites

  • SSH access to nodes
  • Git configured with credentials:
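    git config --global credential.helper wincred  # Windows: store credentials in Windows Credential Manager
    git config --global core.autocrlf true          # Windows: CRLF in the working tree, LF in commits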
    

Clone & Deploy

# Clone from self-hosted Gitea
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab

# Deploy a service (via Komodo UI or SSH)
ssh chester@10.0.0.251
cd /etc/komodo/stacks/tunarr
docker compose up -d

Automated GitOps Workflow

  1. Edit nodes/{node}/{service}/compose.yaml
  2. Commit and push to main branch
  3. Webhook triggers Komodo pull
  4. Auto-deploy updates running containers
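A push can touch several stacks at once. Assuming every stack lives at `nodes/{node}/{service}/`, the sketch below (a hypothetical `stacks_from_paths` helper, not a Komodo feature) turns a list of changed files into the stack directories that push affects, which is handy for checking what the webhook is about to redeploy:

```shell
# Hypothetical helper: read changed file paths on stdin and print the unique
# nodes/<node>/<service> directories they belong to.
stacks_from_paths() {
  # Keep only files inside a service directory, then trim to the first 3 path parts.
  grep -E '^nodes/[^/]+/[^/]+/' | cut -d/ -f1-3 | sort -u
}
```

For example, `git diff --name-only HEAD~1 HEAD | stacks_from_paths` lists the run directories changed by the last commit; files outside `nodes/{node}/{service}/` (top-level docs, node READMEs) are ignored.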

🎯 Active Missions

Traffic Light System: 🟢 Complete | 🟡 In Progress | 🔴 Blocked

| Status | Mission | Details |
|--------|---------|---------|
| 🟢 | GitOps Migration | All production stacks migrated to Git-based deployment |
| 🟢 | Webhook Automation | Gitea webhooks trigger auto-deploy on push |
| 🟢 | GPU Passthrough | NVIDIA GTX 1060 accessible in Plex/Tunarr containers |
| 🟢 | Documentation Structure | KBAs and SOPs organized in documentation/ |
| 🟡 | Hardware Transcoding Validation | Monitor Plex for (hw) indicator during active streams |
| 🟢 | NFS Mount Stability | NFSv3 forced on Raspberry Pi to prevent ID-domain errors |
| 🟢 | Credential Security | Secrets managed via Komodo Environment Variables (not Git) |

📂 Repository Structure

homelab/
├── nodes/                      # Service definitions per node
│   ├── heimdall/               # Core infrastructure (VM)
│   │   ├── core/               # Komodo, Traefik, Redis
│   │   └── gitea/              # Self-hosted Git
│   ├── waldorf/                # Media services (Physical)
│   │   ├── plex/               # Media server + GPU
│   │   └── tunarr/             # IPTV channels + GPU
│   └── watchtower/             # Periphery agent (Pi 5)
├── documentation/              # Technical knowledge base
│   ├── KBAs/                   # Troubleshooting guides
│   ├── SOPs/                   # Operational procedures
│   └── TECHNICAL_RUNBOOK.md    # Emergency reference
├── ansible/                    # (Future) Automated provisioning
└── scripts/                    # Utility scripts
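The layout above only works if each service directory actually contains a `compose.yaml` for Komodo to run. A hypothetical lint (`missing_compose` is an assumption, not an existing script in `scripts/`) that lists offending directories might look like:

```shell
# Hypothetical lint: print every nodes/<node>/<service>/ directory under ROOT
# (default: current directory) that has no compose.yaml.
missing_compose() {
  root=${1:-.}
  for dir in "$root"/nodes/*/*/; do
    [ -d "$dir" ] || continue                      # glob matched nothing
    [ -f "${dir}compose.yaml" ] || printf '%s\n' "${dir%/}"
  done
}
```

It only inspects directories two levels below `nodes/`, so a node folder that holds files directly (as `watchtower/` may) is not flagged.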

🔧 Common Operations

Deploy a New Stack

# 1. Create directory structure
mkdir -p nodes/waldorf/sonarr

# 2. Create compose.yaml (quoted 'EOF' stops the shell from expanding ${VARS} in the template)
cat > nodes/waldorf/sonarr/compose.yaml <<'EOF'
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:latest
    restart: unless-stopped
    ports:
      - "8989:8989"
    volumes:
      - /mnt/appdata/sonarr:/config
EOF

# 3. Commit and push
git add nodes/waldorf/sonarr/
git commit -m "feat(stacks): add Sonarr to Waldorf"
git push

# 4. Configure in Komodo UI
# - Source Type: Git Repo
# - Run Directory: nodes/waldorf/sonarr
# - Deploy!
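Steps 1 and 2 can be folded into one command. This is a sketch only: `new_stack` is a hypothetical helper that writes the same layout as the template above and refuses to overwrite an existing stack; the `/mnt/appdata/<service>:/config` volume convention is assumed to hold for every service:

```shell
# Hypothetical scaffolding helper: new_stack NODE SERVICE IMAGE PORT
# Writes nodes/<node>/<service>/compose.yaml relative to the current directory.
new_stack() {
  node=$1 service=$2 image=$3 port=$4
  dir="nodes/$node/$service"
  if [ -e "$dir/compose.yaml" ]; then
    echo "refusing to overwrite $dir/compose.yaml" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Unquoted EOF on purpose: the function's arguments are substituted here.
  cat > "$dir/compose.yaml" <<EOF
services:
  $service:
    image: $image
    restart: unless-stopped
    ports:
      - "$port:$port"
    volumes:
      - /mnt/appdata/$service:/config
EOF
}
```

Run it from the repository root, e.g. `new_stack waldorf sonarr lscr.io/linuxserver/sonarr:latest 8989`, then continue with step 3.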

Check Service Status

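# Via Komodo API: POST to /read with an API key/secret created in the Komodo UI
# (9120 is Komodo Core's default port; adjust if yours is mapped differently)
curl -s -X POST http://10.0.0.151:9120/read \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $KOMODO_API_KEY" -H "X-Api-Secret: $KOMODO_API_SECRET" \
  -d '{"type": "ListStacks", "params": {}}'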

# Direct SSH to node
ssh chester@10.0.0.251
docker ps | grep tunarr
docker logs tunarr --tail 50
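When a node runs several containers, filtering `docker ps` output is quicker than grepping for each one. A minimal sketch, assuming the tab-separated `--format` shown in the comment; `not_healthy` is a hypothetical name:

```shell
# Hypothetical filter: reads "name<TAB>status" lines, as produced by
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
# and prints containers that are not "Up" or whose healthcheck reports (unhealthy).
not_healthy() {
  awk -F '\t' '$2 !~ /^Up/ || $2 ~ /\(unhealthy\)/ { print $1 }'
}
```

On a node: `docker ps -a --format '{{.Names}}\t{{.Status}}' | not_healthy`. Paused containers still report `Up ... (Paused)` and are not flagged.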

Emergency Rollback

# In Komodo UI: Click "Rollback" on stack
# Or via Git:
git revert HEAD
git push  # Triggers auto-rollback
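`git revert HEAD` only helps when the bad change is the latest commit. If other commits landed afterwards, a sketch like the following (a hypothetical `revert_stack` helper) reverts the most recent commit that touched one stack directory instead:

```shell
# Hypothetical helper: revert the newest commit that touched STACK_DIR,
# e.g. revert_stack nodes/waldorf/tunarr
revert_stack() {
  stack_dir=$1
  commit=$(git log -n 1 --format=%H -- "$stack_dir") || return 1
  if [ -z "$commit" ]; then
    echo "no commits touch $stack_dir" >&2
    return 1
  fi
  git revert --no-edit "$commit"
}

# Usage: revert_stack nodes/waldorf/tunarr && git push
```

If that commit also changed other stacks, those changes are reverted too, and the revert can conflict if later commits edited the same files.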

📚 Documentation

| Document | Purpose |
|----------|---------|
| TECHNICAL_RUNBOOK.md | Infrastructure overview, emergency procedures, maintenance schedule |
| KBA-001 | Troubleshooting Git-linked stack failures |
| SOP-001 | Step-by-step guide to migrate stacks to GitOps |
| Node READMEs | Hardware specs and service details per node |

🛡️ Security & Best Practices

Secrets Management

  • NEVER commit passwords, API keys, or tokens to Git
  • DO use Komodo Environment Variables for secrets
  • DO use Gitea access tokens for HTTPS Git authentication (avoids SSH key exchange issues)

Example:

# In Git (compose.yaml)
environment:
  - PLEX_CLAIM=${PLEX_CLAIM}  # Placeholder

# In Komodo UI → Stack → Environment Variables
PLEX_CLAIM=claim-xxxxxxxxx
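To catch a real value slipping into Git before it is pushed, a rough grep-based check can flag secret-looking variables whose value is not a `${...}` placeholder. This is a heuristic sketch, not an existing tool: `find_plaintext_secrets` is hypothetical, and both the keyword list (`PASS`, `TOKEN`, `SECRET`, `API_KEY`, `CLAIM`) and the assumption of upper-case variable names are guesses to tune:

```shell
# Hypothetical heuristic: print file:line:text for secret-looking variables
# (list form KEY=value or map form KEY: value) whose value is not ${...}.
# Exits 0 when something suspicious was found, 1 when the files look clean.
find_plaintext_secrets() {
  grep -HnE '(PASS|TOKEN|SECRET|API_KEY|CLAIM)[A-Z0-9_]*[=:][[:space:]]*"?[^$"[:space:]]' "$@"
}
```

`find_plaintext_secrets nodes/*/*/compose.yaml` can gate a pre-commit hook with `if find_plaintext_secrets nodes/*/*/compose.yaml; then exit 1; fi`.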

NFS Mount Configuration

Critical: Watchtower (Raspberry Pi) is pinned to NFSv3 (not v4) to avoid NFSv4 ID-mapping domain mismatches with the NAS:

# /etc/fstab on Watchtower
# nolock: locks only exclude processes on this client; intr is ignored since Linux 2.6.25
10.0.0.250:/Volume1/appdata /mnt/appdata nfs rw,nfsvers=3,hard,intr,x-systemd.automount,nolock 0 0
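Because the v3 pin is easy to lose when an fstab line is rewritten, a quick check helps. A minimal sketch (a hypothetical `nfs_version_for` helper) that reports the `nfsvers=`/`vers=` option for a given mount point:

```shell
# Hypothetical check: nfs_version_for MOUNTPOINT [FSTAB]
# Prints the pinned NFS version, or "unpinned" if no nfsvers=/vers= option is set.
nfs_version_for() {
  mp=$1 fstab=${2:-/etc/fstab}
  awk -v mp="$mp" '
    $1 !~ /^#/ && $2 == mp && $3 ~ /^nfs/ {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] ~ /^(nfs)?vers=/) {
          sub(/^(nfs)?vers=/, "", opts[i]); print opts[i]; found = 1
        }
      if (!found) print "unpinned"
      exit
    }' "$fstab"
}
```

On Watchtower, `nfs_version_for /mnt/appdata` should print `3`; `unpinned` means the client negotiates the version with the NAS, which may land on v4.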

🔥 Emergency Procedures

NFS Mount Failure

# Check connectivity (-c 3 stops after three probes instead of running forever)
ping -c 3 10.0.0.250

# Remount
sudo umount /mnt/appdata
sudo mount -a
df -h | grep appdata
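With the `hard` option, I/O against an unreachable NAS blocks rather than failing, so a plain `ls /mnt/appdata` can hang the shell. A minimal sketch (a hypothetical `nfs_responsive` helper) that bounds the check with GNU `timeout`:

```shell
# Hypothetical probe: nfs_responsive [PATH] [SECONDS]
# Succeeds only if stat on PATH returns within SECONDS; -k 1 sends SIGKILL
# one second after the initial SIGTERM if stat is still stuck.
nfs_responsive() {
  target=${1:-/mnt/appdata} secs=${2:-5}
  timeout -k 1 "$secs" stat "$target" >/dev/null 2>&1
}

# Usage: nfs_responsive /mnt/appdata 5 || echo "appdata unreachable or hung"
```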

Komodo Periphery Offline

# Check WebSocket connectivity (needs a curl build with WebSocket support;
# nc -zv 10.0.0.151 9120 is a plain TCP fallback)
curl -v ws://10.0.0.151:9120

# Restart agent
docker restart komodo-periphery
docker logs -f komodo-periphery
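If `curl` lacks WebSocket support, a plain TCP probe still answers the basic question of whether the port is reachable. A minimal sketch using bash's `/dev/tcp` redirection (requires bash; `port_open` is a hypothetical helper):

```shell
# Hypothetical probe: port_open HOST PORT [SECONDS]
# Succeeds if a TCP connection to HOST:PORT opens within SECONDS.
port_open() {
  host=$1 port=$2 secs=${3:-3}
  # bash -c receives host as $0 and port as $1.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

`port_open 10.0.0.151 9120` mirrors the curl check above from Watchtower; from Heimdall, `port_open 10.0.0.200 8120` probes the Periphery side, assuming Periphery listens on its default port 8120.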

Traefik SSL Certificate Issues

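# Inspect the ACME/Cloudflare resolver config (the Cloudflare API token is usually
# supplied as an environment variable, not in this file)
docker exec traefik cat /etc/traefik/traefik.yml

# Retry certificate issuance: on restart Traefik requests any missing certificates,
# but it does not force early renewal of certificates that are not yet due
docker restart traefik
docker logs traefik 2>&1 | grep -i "cloudflare\|certificate"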
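Restarting Traefik only shows whether issuance was attempted; checking the served certificate's expiry confirms the result. A minimal sketch wrapping `openssl x509 -checkend` (`cert_valid_for_days` is hypothetical; the hostname in the usage line is any Traefik-routed host from this README):

```shell
# Hypothetical helper: read a PEM certificate on stdin and succeed only if it
# will still be valid DAYS days from now (-checkend takes seconds).
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $1 * 86400 ))"
}

# Usage: fetch the certificate Traefik serves, then test it.
#   echo | openssl s_client -connect git.castaldifamily.com:443 \
#     -servername git.castaldifamily.com 2>/dev/null | cert_valid_for_days 14
```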

🤝 Contributing

This is a personal homelab, but documentation improvements and issue reports are welcome!

  1. Fork via Gitea: https://git.castaldifamily.com/nathan/homelab
  2. Create feature branch: git checkout -b feat/my-improvement
  3. Commit using Conventional Commits
  4. Push and create Pull Request
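Conventional Commits subjects follow `type(scope)!: description`, as in `feat(stacks): add Sonarr to Waldorf`. A minimal checker that could back a `commit-msg` hook; `is_conventional` is hypothetical, and limiting types to the common commitlint set is an assumption (the spec itself only defines `feat` and `fix`):

```shell
# Hypothetical check: succeed if the first line of MESSAGE is a
# Conventional Commits subject: type(optional-scope)optional-!: description
is_conventional() {
  printf '%s\n' "$1" | head -n 1 |
    grep -qE '^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9._/-]+\))?!?: .+'
}
```

In `.git/hooks/commit-msg`, `is_conventional "$(head -n 1 "$1")" || exit 1` rejects non-conforming messages, since Git passes the message file path as `$1`.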

📊 Stats

  • Total Nodes: 4 (3 compute + 1 storage)
  • Container Orchestration: Komodo v2.1.2
  • Active Services: 8+ (Traefik, Plex, Tunarr, Gitea, etc.)
  • Total RAM: 48GB (across compute nodes)
  • GPU Acceleration: NVIDIA GTX 1060 Mobile (6GB)
  • Storage: TerraMaster NAS + local node storage

📜 License

Personal infrastructure configuration. Documentation licensed under CC BY-SA 4.0.


Maintained by: Nathan Castaldi
Last Updated: April 12, 2026
Status: 🟢 Operational
