From e087670ca576590bb5ef7dbede2cb4531b88888c Mon Sep 17 00:00:00 2001
From: Nathan
Date: Mon, 13 Apr 2026 21:01:57 -0400
Subject: [PATCH] feat(readme): update infrastructure description and enhance automation details

---
 README.md | 200 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 142 insertions(+), 58 deletions(-)

diff --git a/README.md b/README.md
index 86b45bc..e9c2843 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,9 @@
 # Castaldi Family Homelab
 
-> **A GitOps-managed, self-hosted infrastructure running media services, container orchestration, and automation across distributed ARM and x86 nodes.**
+> **A GitOps-managed, Ansible-automated infrastructure running media services, container orchestration, and hypervisor management across distributed ARM and x86 nodes.**
 
 [![GitOps](https://img.shields.io/badge/GitOps-Komodo-blue?style=flat-square)](https://komo.do)
+[![Automation](https://img.shields.io/badge/Automation-Ansible-red?style=flat-square)](#automation)
 [![Infrastructure](https://img.shields.io/badge/Infrastructure-Multi--Node-green?style=flat-square)](#architecture)
 [![Documentation](https://img.shields.io/badge/Docs-KBAs%20%2B%20SOPs-orange?style=flat-square)](documentation/)
 
@@ -11,11 +12,13 @@
 ## 🚀 Why This Homelab?
 
 - **Zero-Touch Deployments:** Push to Git → Auto-deploy via webhooks → Containers update automatically
-- **Infrastructure as Code:** All services defined in version-controlled `compose.yaml` files
-- **GPU Transcoding:** Hardware-accelerated media streaming with NVIDIA GTX 1060
-- **Distributed Architecture:** Services intelligently distributed across VM, physical server, and Raspberry Pi
+- **Ansible Automation:** All nodes managed by Ansible from watchtower control plane
+- **Infrastructure as Code:** Services defined in `compose.yaml` + infrastructure managed with Ansible playbooks
+- **GPU Transcoding:** Hardware-accelerated media streaming with NVIDIA GTX 1060 Mobile
+- **Distributed Architecture:** Services across Proxmox hypervisor, VMs, physical servers, and Raspberry Pi
 - **Self-Hosted Git:** No external dependencies—Gitea runs on-premise with automated backups
 - **Production-Grade Networking:** Traefik reverse proxy with automatic SSL (Cloudflare DNS challenge)
+- **Hypervisor Management:** Proxmox VE for VM orchestration with automated post-install configuration
 
 ---
 
@@ -27,25 +30,32 @@ graph TB
         CF[Cloudflare DNS]
     end
 
-    subgraph "Heimdall (Proxmox VM - 10.0.0.151)"
-        Traefik[Traefik Reverse Proxy<br/>:80, :443]
-        Komodo[Komodo Core<br/>Container Orchestrator]
-        Gitea[Gitea<br/>Self-Hosted Git]
-        Redis[Redis Cache]
+    subgraph "PVE01 - Proxmox VE Hypervisor (10.0.0.201)"
+        subgraph "Heimdall VM (10.0.0.151)"
+            Traefik[Traefik Reverse Proxy<br/>:80, :443]
+            Komodo[Komodo Core<br/>Container Orchestrator]
+            Gitea[Gitea<br/>Self-Hosted Git]
+            Redis[Redis Cache]
+            Trek[Trek]
+            Vault[Vaultwarden]
+        end
     end
 
-    subgraph "Waldorf (Physical Server - 10.0.0.251)"
+    subgraph "Waldorf - Physical Server (10.0.0.251)"
         Plex[Plex Media Server<br/>GPU Transcoding]
         Tunarr[Tunarr<br/>IPTV Channels]
-        GPU[NVIDIA GTX 1060]
+        GPU[NVIDIA GTX 1060 Mobile<br/>6GB VRAM]
+        KomodoW[Komodo Periphery]
     end
 
-    subgraph "Watchtower (Raspberry Pi 5 - 10.0.0.200)"
-        Periphery[Komodo Periphery<br/>Remote Agent]
+    subgraph "Watchtower - Raspberry Pi 5 (10.0.0.200)"
+        Ansible[Ansible Control Node<br/>Infrastructure Automation]
+        KomodoP[Komodo Periphery]
+        VSCode[VS Code Server]
     end
 
     subgraph "TerraMaster NAS (10.0.0.250)"
-        NFS[NFS Storage<br/>/Volume1/appdata]
+        NFS[NFS Storage<br/>Volume1: /appdata<br/>Volume2: /media]
     end
 
     CF -->|HTTPS| Traefik
@@ -54,32 +64,36 @@ graph TB
     Traefik --> Plex
     Traefik --> Tunarr
 
-    Komodo <-->|WebSocket| Periphery
+    Komodo <-->|WebSocket| KomodoW
+    Komodo <-->|WebSocket| KomodoP
     Gitea -->|Webhook| Komodo
 
+    Ansible -.->|SSH| PVE01
+    Ansible -.->|SSH| Heimdall
+    Ansible -.->|SSH| Waldorf
+
     Plex --> GPU
     Tunarr --> GPU
 
-    Heimdall -.->|NFSv3| NFS
-    Waldorf -.->|NFSv3| NFS
-    Watchtower -.->|NFSv3| NFS
+    Heimdall -.->|NFS v4| NFS
+    Waldorf -.->|NFS v4| NFS
+    Watchtower -.->|NFS v3| NFS
 
-    style Traefik fill:#326ce5,color:#fff
-    style Komodo fill:#ff6b6b,color:#fff
-    style GPU fill:#76b900,color:#fff
     style NFS fill:#f9a825,color:#000
+    style PVE01 fill:#e57000,color:#fff
 ```
 
 ---
 
 ## 📦 Infrastructure Inventory
 
-| Node | IP | Hardware | Role | Services |
-|------|------|----------|------|----------|
-| **Heimdall** | `10.0.0.151` | Proxmox VM<br/>Intel N100, 16GB RAM | Core Services | Komodo, Gitea, Traefik, Redis |
-| **Waldorf** | `10.0.0.251` | Physical Server<br/>i7-7820HQ, GTX 1060, 16GB | Media Processing | Plex, Tunarr (GPU transcoding) |
-| **Watchtower** | `10.0.0.200` | Raspberry Pi 5<br/>ARM Cortex-A76, 16GB | Periphery Node | Komodo Agent |
-| **TerraMaster** | `10.0.0.250` | NAS | Shared Storage | NFSv3 (`/Volume1/appdata`) |
+| Node | IP | Hardware | Platform/OS | Role | Services |
+|------|------|----------|----------|------|----------|
+| **PVE01** | `10.0.0.201` | Physical Server<br/>Intel i5-13500T (14c), 15GB RAM | Proxmox VE 9.1.7 | Hypervisor | Hosts Heimdall VM |
+| **Heimdall** | `10.0.0.151` | Proxmox VM on PVE01<br/>Intel N100 (4c), 15GB RAM | Ubuntu 24.04 | Core Services | Komodo Core, Gitea, Traefik, Redis, Trek, Vaultwarden |
+| **Waldorf** | `10.0.0.251` | Physical Server<br/>i7-7820HQ (8c), GTX 1060, 16GB | Ubuntu 24.04 | Media Processing | Plex, Tunarr (GPU transcoding), Komodo Periphery |
+| **Watchtower** | `10.0.0.200` | Raspberry Pi 5<br/>ARM Cortex-A76 (4c), 16GB | Debian Trixie | Control Plane | Ansible, Komodo Periphery, VS Code Server |
+| **TerraMaster** | `10.0.0.250` | NAS | TOS | Shared Storage | NFS (Volume1: `/appdata`, Volume2: `/media`) |
 
 ---
 
@@ -109,10 +123,49 @@ docker compose up -d
 
 ### Automated GitOps Workflow
 
-1. **Edit** `nodes/{node}/{service}/compose.yaml`
-2. **Commit** and push to `main` branch
-3. **Webhook** triggers Komodo pull
-4. **Auto-deploy** updates running containers
+1. **Edit** `nodes/{node}/{service}/compose.yaml` locally
+2. **Commit** and push to Gitea: `git add . && git commit -m "feat: update service" && git push`
+3. **Webhook** triggers Komodo Core (heimdall)
+4. **Auto-deploy** pulls latest code and restarts containers
+5. **Monitor** via Komodo UI at `http://10.0.0.151:9000`
+
+---
+
+## ⚙️ Automation
+
+### Ansible Control Plane
+
+**Watchtower** (10.0.0.200) manages all infrastructure via Ansible:
+
+```bash
+# SSH into control node
+ssh chester@10.0.0.200
+cd ~/homelab/ansible
+
+# Test connectivity to all nodes
+ansible all -m ping
+
+# Gather live system facts
+ansible-playbook playbooks/gather-node-facts.yml
+
+# Deploy Proxmox post-install config
+ansible-playbook playbooks/onboard-proxmox.yml --limit pve01
+
+# Run commands across node groups
+ansible docker_nodes -m command -a "docker ps"
+ansible proxmox_cluster -m command -a "pveversion"
+```
+
+### Managed Node Groups
+
+```yaml
+control_plane: watchtower
+docker_nodes: heimdall, waldorf
+proxmox_cluster: pve01
+nfs_clients: heimdall, waldorf
+core_services: heimdall
+media_services: waldorf
+```
 
 ---
 
@@ -122,13 +175,13 @@ docker compose up -d
 
 | Status | Mission | Details |
 |--------|---------|---------|
-| 🟢 | **GitOps Migration** | All production stacks migrated to Git-based deployment |
-| 🟢 | **Webhook Automation** | Gitea webhooks trigger auto-deploy on push |
-| 🟢 | **GPU Passthrough** | NVIDIA GTX 1060 accessible in Plex/Tunarr containers |
+| 🟢 | **Komodo GitOps** | All stacks migrated to Git sources with webhook automation |
+| 🟢 | **GPU Transcoding** | GTX 1060 Mobile accessible in Plex/Tunarr containers |
 | 🟢 | **Documentation Structure** | KBAs and SOPs organized in `documentation/` |
+| 🟢 | **Ansible Automation** | All 4 nodes onboarded and managed by Ansible from Watchtower |
+| 🟢 | **Proxmox Post-Install** | PVE01 configured: subscription nag removed, repos optimized |
 | 🟡 | **Hardware Transcoding Validation** | Monitor Plex for `(hw)` indicator during active streams |
-| 🟢 | **NFS Mount Stability** | NFSv3 forced on Raspberry Pi to prevent ID-domain errors |
-| 🟢 | **Credential Security** | Secrets managed via Komodo Environment Variables (not Git) |
+| 🟢 | **NFS Mount Stability** | NFSv3 on Pi, NFSv4 on x86 nodes |
 
 ---
 
@@ -136,20 +189,36 @@ docker compose up -d
 
 ```
 homelab/
+├── ansible/  # Ansible automation (active)
+│   ├── inventory/  # Managed hosts and groups
+│   │   ├── hosts.ini  # 4-node inventory
+│   │   └── host_vars/  # Per-node configuration
+│   ├── playbooks/  # Automation workflows
+│   │   ├── onboard-nodes.yml  # Node SSH key deployment
+│   │   ├── onboard-proxmox.yml  # Proxmox post-install
+│   │   └── gather-node-facts.yml  # System discovery
+│   ├── roles/  # Reusable automation
+│   │   └── proxmox_post_install/  # Nag removal, repo config
+│   └── group_vars/  # Global variables
 ├── nodes/  # Service definitions per node
-│   ├── heimdall/  # Core infrastructure (VM)
+│   ├── heimdall/  # Core infrastructure (VM on PVE01)
 │   │   ├── core/  # Komodo, Traefik, Redis
-│   │   └── gitea/  # Self-hosted Git
+│   │   ├── trek/  # Trek service
+│   │   ├── vaultwarden/  # Password manager
+│   │   └── (gitea via Komodo)  # Self-hosted Git
 │   ├── waldorf/  # Media services (Physical)
 │   │   ├── plex/  # Media server + GPU
 │   │   └── tunarr/  # IPTV channels + GPU
-│   └── watchtower/  # Periphery agent (Pi 5)
+│   └── watchtower/  # Control plane (Pi 5)
+│       └── vscode/  # Remote development
 ├── documentation/  # Technical knowledge base
 │   ├── KBAs/  # Troubleshooting guides
 │   ├── SOPs/  # Operational procedures
+│   ├── plans/  # Implementation roadmaps
 │   └── TECHNICAL_RUNBOOK.md  # Emergency reference
-├── ansible/  # (Future) Automated provisioning
 └── scripts/  # Utility scripts
+    ├── bootstrap.sh  # Day-0 node initialization
+    └── lib/  # Shared function libraries
 ```
 
 ---
@@ -231,10 +300,11 @@ Example:
 ```yaml
 # In Git (compose.yaml)
 environment:
-  - PLEX_CLAIM=${PLEX_CLAIM} # Placeholder
+  - PUID=1000
+  - PGID=1000
+  - API_KEY=${PLEX_API_KEY} # Injected by Komodo
 
-# In Komodo UI → Stack → Environment Variables
-PLEX_CLAIM=claim-xxxxxxxxx
+# In Komodo UI: Set PLEX_API_KEY in Environment Variables
 ```
 
 ### NFS Mount Configuration
@@ -242,10 +312,34 @@ PLEX_CLAIM=claim-xxxxxxxxx
 **Critical:** Raspberry Pi requires NFSv3 (not v4) due to ID-domain mismatches:
 
 ```bash
-# /etc/fstab on Watchtower
-10.0.0.250:/Volume1/appdata /mnt/appdata nfs rw,nfsvers=3,hard,intr,x-systemd.automount,nolock 0 0
+# /etc/fstab on Watchtower (Pi 5)
+10.0.0.250:/Volume1/appdata /mnt/appdata nfs nfsvers=3,rw,sync 0 0
+
+# /etc/fstab on Heimdall/Waldorf (x86 Ubuntu)
+10.0.0.250:/Volume1/appdata /mnt/appdata nfs4 rw,sync 0 0
 ```
 
+### Backup Strategy
+
+- **Git Repository:** Daily backups via Gitea's built-in backup feature
+- **Docker Volumes:** Weekly snapshots to `/mnt/appdata/backups/`
+- **Proxmox VMs:** Daily snapshots with 7-day retention
+- **Configuration Files:** Tracked in Git under `nodes/{hostname}/`
+
+---
+
+## 📊 Stats
+
+- **Total Nodes:** 5 (1 hypervisor + 3 compute + 1 storage)
+- **Automation:** Ansible managing 4 active nodes from Watchtower
+- **Container Orchestration:** Komodo v2.1.2
+- **Active Services:** 12+ (Traefik, Plex, Tunarr, Gitea, Trek, Vaultwarden, etc.)
+- **Total RAM:** 62GB (15GB PVE01 + 15GB Heimdall + 16GB Waldorf + 16GB Watchtower)
+- **Total CPU Cores:** 30 physical (14c i5-13500T + 8c i7-7820HQ + 4c N100 + 4c ARM)
+- **Virtualization:** Proxmox VE 9.1.7 hosting 1 VM (expandable)
+- **GPU Acceleration:** NVIDIA GTX 1060 Mobile (6GB VRAM)
+- **Storage:** TerraMaster NAS (NFSv3/v4)
+
 ---
 
 ## 🔥 Emergency Procedures
@@ -297,17 +391,6 @@ This is a personal homelab, but documentation improvements and issue reports are
 
 ---
 
-## 📊 Stats
-
-- **Total Nodes:** 4 (3 compute + 1 storage)
-- **Container Orchestration:** Komodo v2.1.2
-- **Active Services:** 8+ (Traefik, Plex, Tunarr, Gitea, etc.)
-- **Total RAM:** 48GB (across compute nodes)
-- **GPU Acceleration:** NVIDIA GTX 1060 Mobile (6GB)
-- **Storage:** TerraMaster NAS + local node storage
-
----
-
 ## 📜 License
 
 Personal infrastructure configuration. Documentation licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
@@ -315,5 +398,6 @@ Personal infrastructure configuration. Documentation licensed under [CC BY-SA 4.
 ---
 
 **Maintained by:** Nathan Castaldi
-**Last Updated:** April 12, 2026
-**Status:** 🟢 Operational
+**Last Updated:** April 13, 2026
+**Status:** 🟢 Operational
+**Automation Status:** 🟢 Ansible Fully Deployed