# Technical Runbook: Castaldi Family Lab

**Status:** ACTIVE & OPERATIONAL
**Last Updated:** April 11, 2026
**Maintainer:** Nathan Castaldi

---

## Table of Contents

1. [Infrastructure Overview](#infrastructure-overview)
2. [Critical Fixes](#critical-fixes)
3. [Lessons Learned](#lessons-learned)
4. [Network Map](#network-map)
5. [Active Tasks](#active-tasks)
6. [Emergency Procedures](#emergency-procedures)
7. [Credential Management](#credential-management)
8. [Maintenance Schedule](#maintenance-schedule)

---

## Infrastructure Overview

### Node Inventory

| Node | IP Address | Hardware | Services |
|------|------------|----------|----------|
| **Heimdall** | 10.0.0.151 | Proxmox VM | Komodo Core, Gitea, Traefik |
| **Waldorf** | 10.0.0.XXX | NVIDIA GTX 1060 | Plex, Tunarr |
| **Watchtower** | 10.0.0.200 | Raspberry Pi | Komodo Periphery |
| **TerraMaster** | 10.0.0.250 | NAS | NFS Storage (`/Volume1/appdata`) |

### Repository Structure

```text
/nodes
  /heimdall
    /core      # Komodo Core
    /gitea     # Git Repository Server
  /waldorf
    /plex      # Media Server (NVIDIA Optimized)
    /tunarr    # Channel Management (GPU Passthrough)
  /watchtower
    # Komodo Periphery
```

---

## Critical Fixes

> ⚠️ **DO NOT REVERT THESE CONFIGURATIONS**

### 1. NFS Mount: Watchtower (Raspberry Pi)

**Problem:** Permission Denied on `/mnt/appdata` despite matching UIDs.

**Root Cause:** NFSv4 ID-domain mismatch between Pi and TerraMaster NAS.

**Solution:**

```bash
# /etc/fstab entry (Force NFSv3)
10.0.0.250:/Volume1/appdata /mnt/appdata nfs rw,nfsvers=3,hard,intr,x-systemd.automount,nolock 0 0
```

**Mount Point Ownership:**

```bash
# Set ownership WHILE UNMOUNTED
sudo chown chester:chester /mnt/appdata
```
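
To confirm that an fstab entry like the one above really pins the share to NFSv3, a small text check along these lines may help. This is a sketch, not part of the original setup: the function name `fstab_nfs_version` is hypothetical, and it only parses the text of an fstab line without touching the live mount.

```bash
#!/bin/sh
# Hypothetical helper: print the NFS version pinned in one fstab line.
# Prints nothing if the line has no nfsvers= or vers= option.
fstab_nfs_version() {
    # Field 4 of an fstab entry holds the comma-separated mount options.
    printf '%s\n' "$1" | awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^(nfsvers|vers)=/) {
                sub(/^[a-z]+=/, "", opts[i])
                print opts[i]
                exit
            }
        }
    }'
}

# Example with the entry from this runbook (prints: 3).
entry='10.0.0.250:/Volume1/appdata /mnt/appdata nfs rw,nfsvers=3,hard,intr,x-systemd.automount,nolock 0 0'
fstab_nfs_version "$entry"
```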

---

### 2. Komodo Periphery Connectivity

**Problem:** The router lacks hairpin NAT support, so internal nodes cannot reach `*.castaldifamily.com`.

**Solution:**

- **Core URL (Internal):** `ws://10.0.0.151:9120`
- **Key Paths:** `/config/keys/periphery.pub`
- **Environment Variable:** `file:/config/keys/periphery.pub`

---

### 3. Gitea & GitOps

**Problem:** SSH key exchange (Kex) errors on Windows (`diffie-hellman-group1-sha1`).

**Solution:**

```bash
# Use HTTPS instead of SSH
git clone https://git.castaldifamily.com/nathan/homelab.git

# Windows credential storage
git config --global credential.helper wincred

# Cross-platform line endings: CRLF on checkout, LF on commit
git config --global core.autocrlf true

# Network share permissions: trust repositories regardless of directory ownership
git config --global safe.directory "*"
```

---

### 4. GPU Passthrough (Plex/Waldorf)

**Problem:** Plex sees GPU but doesn't use it for hardware transcoding.

**Solution:**

```yaml
# compose.yaml
services:
  plex:
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

**Verification:**

- Monitor Plex Dashboard for `(hw)` status during transcoding.

---

## Lessons Learned

### "NFSv4 is too smart"

NFSv4 identifies file owners as `user@domain` strings rather than raw UIDs, so both ends must agree on an ID-mapping "domain." If the Pi and NAS don't agree on the domain name, owners are mapped to `nobody`.

**Fix:** Force NFSv3, which compares numeric UIDs only (here, UID 1000).
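
Because NFSv3 relies on raw numbers, it can be useful to compare the numeric owner of a path with the numeric UID of the account that needs access. The sketch below assumes only POSIX `ls`, `awk`, and `id`; the function names `owner_uid` and `uid_matches` are hypothetical and not part of the lab's tooling.

```bash
#!/bin/sh
# Hypothetical helper: print the numeric UID that owns a path.
# `ls -n` lists numeric IDs instead of names; field 3 is the owner.
owner_uid() {
    ls -ldn -- "$1" | awk '{print $3}'
}

# Succeeds when the path is owned by the given numeric UID.
uid_matches() {
    [ "$(owner_uid "$1")" = "$2" ]
}

# On Watchtower this would be run as: uid_matches /mnt/appdata 1000
# Here it checks a scratch file that the current user just created.
scratch=$(mktemp)
if uid_matches "$scratch" "$(id -u)"; then
    echo "owner matches"
else
    echo "owner mismatch"
fi
rm -f "$scratch"
```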

---

### "Naked Mount Point"

If the local directory (`/mnt/appdata`) is owned by `root`, the user can't "pass through" it to see the NAS data once the share mounts.

**Fix:** `chown` the mount point to the user **while unmounted**.
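
One way to make "while unmounted" harder to get wrong is to check the mount table before running `chown`. The sketch below is an illustration under stated assumptions: `is_mounted` is a hypothetical helper, it reads a table in `/proc/mounts` format (Linux only), and the table path can be overridden so the logic can be tried on sample data.

```bash
#!/bin/sh
# Hypothetical helper: succeed if the given path is listed as a mount point.
# $1 = path to check, $2 = mounts table (defaults to /proc/mounts on Linux).
# Note: with x-systemd.automount, an autofs entry may be listed for the path
# even before the NFS share itself is mounted.
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Try the logic on sample data shaped like /proc/mounts.
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.250:/Volume1/appdata /mnt/appdata nfs rw,vers=3,hard 0 0
EOF

if is_mounted /mnt/appdata "$sample"; then
    echo "/mnt/appdata is mounted: unmount before changing ownership"
else
    echo "/mnt/appdata is not mounted: safe to run chown"
fi
rm -f "$sample"
```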

---

### "Hairpin NAT"

Many routers won't let internal traffic go out to a public IP and then back in (Hairpinning).

**Fix:** Use **Internal IPs** (`10.0.0.X`) for node-to-node communication.
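
A quick lint for this rule is to check that a node-to-node URL targets the `10.0.0.X` range rather than a public hostname. This is only a sketch: `is_internal_url` is a hypothetical name, and the pattern assumes every lab address falls under `10.0.0.`, as in the Node Inventory.

```bash
#!/bin/sh
# Hypothetical helper: succeed if a URL's host is a 10.0.0.X address.
is_internal_url() {
    # Strip the scheme (e.g. ws://), then any path, then any :port.
    host=${1#*://}
    host=${host%%/*}
    host=${host%%:*}
    case "$host" in
        10.0.0.*) return 0 ;;
        *)        return 1 ;;
    esac
}

for url in ws://10.0.0.151:9120 https://komodo.castaldifamily.com; do
    if is_internal_url "$url"; then
        echo "$url: internal, OK for node-to-node traffic"
    else
        echo "$url: public name, may fail without hairpin NAT"
    fi
done
```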

---

### "GPU Passthrough"

Docker isolation is strict. Simply having drivers on the host isn't enough.

**Fix:** Use the `deploy: resources: reservations` block in Compose to "hand the keys" of the hardware to the container.
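
To catch a stack that is missing this block before deploying it, a rough text check like the following may be enough. It is a heuristic sketch, not a YAML parser: `compose_requests_gpu` is a hypothetical name, and it only looks for the `driver: nvidia` and `capabilities: [gpu]` lines shown in the Critical Fixes section.

```bash
#!/bin/sh
# Hypothetical helper: succeed if a compose file appears to reserve an NVIDIA GPU.
# Heuristic only: it greps for two telltale lines and ignores YAML structure.
compose_requests_gpu() {
    grep -Eq '^[[:space:]]*-?[[:space:]]*driver:[[:space:]]*nvidia' "$1" &&
    grep -Eq 'capabilities:.*gpu' "$1"
}

# Try it on a throwaway file that mirrors the Plex stack.
sample=$(mktemp)
cat > "$sample" <<'EOF'
services:
  plex:
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF

if compose_requests_gpu "$sample"; then
    echo "GPU reservation found"
else
    echo "no GPU reservation found"
fi
rm -f "$sample"
```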

---

## Network Map

| Service | Protocol | Internal Address | External URL |
|---------|----------|------------------|--------------|
| **Komodo Core** | HTTP | `10.0.0.151:9000` | `komodo.castaldifamily.com` |
| **Gitea** | HTTPS | `10.0.0.151:3000` | `git.castaldifamily.com` |
| **Plex** | Host Network | `10.0.0.XXX:32400` | `plex.castaldifamily.com` |
| **Tunarr** | HTTP | `10.0.0.XXX:8000` | `tunarr.castaldifamily.com` |

---

## Active Tasks

### Current Focus

1. **Git-ify Stacks**
   - ✅ `plex` and `tunarr` pushed to Gitea
   - ⏳ Convert remaining "Manual" stacks to "Git" sources in Komodo

2. **Webhooks**
   - ⏳ Ensure Gitea Webhooks fire to Komodo Stack URLs for auto-deployment

3. **Hardware Transcoding**
   - ⏳ Monitor Waldorf for `(hw)` status in Plex

---

## Emergency Procedures

### 🔥 NFS Mount Failure (Watchtower)

```bash
# Check NFS Server (send 4 probes, then stop)
ping -c 4 10.0.0.250

# Remount NFS Share
sudo umount /mnt/appdata
sudo mount -a

# Verify Mount
df -h | grep appdata
```

---

### 🔥 Komodo Periphery Offline

```bash
# Check Core Connectivity
# (plain HTTP probe of the same port; many curl builds do not support ws://)
curl -v http://10.0.0.151:9120

# Restart Periphery Container
docker restart komodo-periphery
docker logs -f komodo-periphery
```

---

### 🔥 Plex Not Using GPU

```bash
# Verify NVIDIA Runtime
docker info | grep -i nvidia

# Check GPU Access in Container
docker exec -it plex nvidia-smi
```

---

### 🔥 Git Authentication Failure

```bash
# Regenerate Gitea App Token
# Settings > Applications > Generate New Token

# Update Credential Helper
git config --global credential.helper wincred

# Test Clone
git clone https://git.castaldifamily.com/nathan/homelab.git
```

---

## Credential Management

- ❌ **DO NOT** store passwords in a `compose.yaml` committed to the Git repo
- ✅ **DO** use Komodo Stack "Environment Variables" to inject secrets
- ✅ **DO** use Gitea **App Tokens** for Git authentication (iPad/Windows)
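
A simple pre-commit style check can help enforce the first rule. The sketch below is a heuristic with assumed conventions: `find_inline_secrets` is a hypothetical name, and it flags keys containing `PASSWORD`, `SECRET`, or `TOKEN` whose value is a literal rather than a `${VAR}` reference. The sample values are made up for illustration.

```bash
#!/bin/sh
# Hypothetical helper: print compose lines that look like hardcoded secrets.
# Heuristic: a key containing password, secret, or token (any case) followed
# by = or : and a value, unless the line uses a ${VAR} style reference.
find_inline_secrets() {
    grep -Ein '(password|secret|token)[a-z0-9_]*[[:space:]]*[:=][[:space:]]*[^[:space:]]' "$1" |
        grep -Fv '${'
}

# Throwaway sample: one literal secret, one variable reference.
sample=$(mktemp)
cat > "$sample" <<'EOF'
services:
  app:
    environment:
      - DB_PASSWORD=hunter2
      - API_TOKEN=${API_TOKEN}
EOF

if find_inline_secrets "$sample"; then
    echo "hardcoded secret found: move it to a Komodo Stack environment variable"
else
    echo "no hardcoded secrets detected"
fi
rm -f "$sample"
```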

---

## Maintenance Schedule

| Task | Frequency | Notes |
|------|-----------|-------|
| Update Docker Images | Weekly | Via Komodo or Watchtower |
| Backup Gitea | Weekly | `/data/gitea` directory |
| Backup Plex Metadata | Monthly | `/config/Library` directory |
| Check NFS Mount Health | Monthly | `df -h`, verify permissions |

---

**End of Runbook**