chore: remove outdated infrastructure inventory and emergency procedures from README.md
parent 7eff91e305
commit 3242383508
README.md: 295 changed lines
@@ -22,257 +22,6 @@

---

## 📦 Infrastructure Inventory

| Node | IP | Hardware | Platform/OS | Role | Services |
|------|------|----------|----------|------|----------|
| **PVE01** | `10.0.0.201` | Physical Server<br/>Intel i5-13500T (14c), 15GB RAM | Proxmox VE 9.1.7 | Hypervisor | VM orchestration platform |
| **Heimdall** | `10.0.0.151` | Physical Server<br/>Intel N100 (4c), 15GB RAM | Ubuntu 24.04 | Core Services | Komodo, Gitea, Traefik |
| **Waldorf** | `10.0.0.251` | Physical Server<br/>i7-7820HQ (8c), GTX 1060, 16GB | Ubuntu 24.04 | Media Processing | Plex and Related Media Services |
| **Watchtower** | `10.0.0.200` | Physical Server<br/>ARM Cortex-A76 (4c), 16GB | Debian Trixie | Control Plane | Ansible, VS Code, Monitoring Tools |
| **TerraMaster** | `10.0.0.250` | NAS | TOS | Shared Storage | NFS (Volume1: `/appdata`, Volume2: `/media`) |

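As a minimal sketch, the table above can be mirrored in a small lookup helper so scripts do not repeat raw addresses. The function name `node_ip` is hypothetical and not part of this repository; the node names and IPs are copied from the table.

```bash
# node_ip: hypothetical helper, not part of the repository.
# Prints the IP of a node from the inventory table, or fails for unknown names.
node_ip() {
    # Lower-case the argument so "Waldorf" and "waldorf" both match.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        pve01)       echo "10.0.0.201" ;;
        heimdall)    echo "10.0.0.151" ;;
        waldorf)     echo "10.0.0.251" ;;
        watchtower)  echo "10.0.0.200" ;;
        terramaster) echo "10.0.0.250" ;;
        *)           echo "unknown node: $1" >&2; return 1 ;;
    esac
}

# Example: look up the media node.
node_ip Waldorf   # prints 10.0.0.251
```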
---

## ⚡ Quick Start

### Prerequisites

- SSH access to nodes
- Git configured with credentials:
```bash
git config --global credential.helper wincred  # Windows
git config --global core.autocrlf true
```

### Clone & Deploy

```bash
# Clone from self-hosted Gitea
git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab

# Deploy a service (via Komodo UI or SSH)
ssh chester@10.0.0.251
cd /etc/komodo/stacks/tunarr
docker compose up -d
```

### Automated GitOps Workflow

1. **Edit** `nodes/{node}/{service}/compose.yaml` locally
2. **Commit** and push to Gitea: `git add . && git commit -m "feat: update service" && git push`
3. **Webhook** triggers Komodo Core (heimdall)
4. **Auto-deploy** pulls latest code and restarts containers
5. **Monitor** via Komodo UI at `http://10.0.0.151:9000`

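Step 1 relies on the `nodes/{node}/{service}/compose.yaml` layout. A small sketch of that convention follows; `compose_path` is a hypothetical helper name, and staging a single path instead of `git add .` is a suggestion, not something the workflow requires.

```bash
# compose_path: hypothetical helper, not part of the repository.
# Builds the repository-relative path of a stack's compose file.
compose_path() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "usage: compose_path NODE SERVICE" >&2
        return 1
    fi
    printf 'nodes/%s/%s/compose.yaml\n' "$1" "$2"
}

# Example: the Tunarr stack on Waldorf.
compose_path waldorf tunarr   # prints nodes/waldorf/tunarr/compose.yaml

# Possible use in step 2 (not run here):
#   git add "$(compose_path waldorf tunarr)"
```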
---

## ⚙️ Automation

### Ansible Control Plane

**Watchtower** (10.0.0.200) manages all infrastructure via Ansible:

**Status:** 🟢 **PRODUCTION READY** (4 nodes, all responding)

```bash
# SSH into control node
ssh chester@10.0.0.200
cd ~/homelab/ansible

# Quick health check
./validate-environment.sh

# Test connectivity to all nodes
ansible all -m ping

# Gather live system facts
ansible-playbook playbooks/gather-node-facts.yml

# Deploy Proxmox post-install config
ansible-playbook playbooks/onboard-proxmox.yml --limit pve01

# Run commands across node groups
ansible docker_nodes -m command -a "docker ps"
ansible proxmox_cluster -m command -a "pveversion"
```

**Quick Reference:** See [ansible/QUICK-REFERENCE.md](ansible/QUICK-REFERENCE.md) for comprehensive command guide.
**Setup Documentation:** [documentation/plans/plan-ansibleSetup.md](documentation/plans/plan-ansibleSetup.md)

### Managed Node Groups

```yaml
control_plane: watchtower
docker_nodes: heimdall, waldorf
proxmox_cluster: pve01
nfs_clients: heimdall, waldorf
core_services: heimdall
media_services: waldorf
```

---

## 🎯 Active Missions

> **Traffic Light System:** 🟢 Complete | 🟡 In Progress | 🔴 Blocked

| Status | Mission | Details |
|--------|---------|---------|
| 🟢 | **Komodo GitOps** | All stacks migrated to Git sources with webhook automation |
| 🟢 | **GPU Transcoding** | GTX 1060 Mobile accessible in Plex/Tunarr containers |
| 🟢 | **Documentation Structure** | KBAs and SOPs organized in `documentation/` |
| 🟢 | **Ansible Automation** | All 4 nodes onboarded and managed by Ansible from Watchtower |
| 🟢 | **Proxmox Post-Install** | PVE01 configured: subscription nag removed, repos optimized |
| 🟡 | **Hardware Transcoding Validation** | Monitor Plex for `(hw)` indicator during active streams |
| 🟢 | **NFS Mount Stability** | NFSv3 on Pi, NFSv4 on x86 nodes |

---

## 📂 Repository Structure

```
homelab/
├── ansible/                       # Ansible automation (active)
│   ├── inventory/                 # Managed hosts and groups
│   │   ├── hosts.ini              # 4-node inventory
│   │   └── host_vars/             # Per-node configuration
│   ├── playbooks/                 # Automation workflows
│   │   ├── onboard-nodes.yml      # Node SSH key deployment
│   │   ├── onboard-proxmox.yml    # Proxmox post-install
│   │   └── gather-node-facts.yml  # System discovery
│   ├── roles/                     # Reusable automation
│   │   └── proxmox_post_install/  # Nag removal, repo config
│   └── group_vars/                # Global variables
├── nodes/                         # Service definitions per node
│   ├── heimdall/                  # Core infrastructure (Physical)
│   │   ├── core/                  # Komodo, Traefik, Redis
│   │   ├── trek/                  # Trek service
│   │   ├── vaultwarden/           # Password manager
│   │   └── (gitea via Komodo)     # Self-hosted Git
│   ├── waldorf/                   # Media services (Physical)
│   │   ├── plex/                  # Media server + GPU
│   │   └── tunarr/                # IPTV channels + GPU
│   └── watchtower/                # Control plane (Pi 5)
│       └── vscode/                # Remote development
├── documentation/                 # Technical knowledge base
│   ├── KBAs/                      # Troubleshooting guides
│   ├── SOPs/                      # Operational procedures
│   ├── plans/                     # Implementation roadmaps
│   └── TECHNICAL_RUNBOOK.md       # Emergency reference
└── scripts/                       # Utility scripts
    ├── bootstrap.sh               # Day-0 node initialization
    └── lib/                       # Shared function libraries
```

---

## 🔧 Common Operations

### Deploy a New Stack

```bash
# 1. Create directory structure
mkdir -p nodes/waldorf/sonarr

# 2. Create compose.yaml
cat > nodes/waldorf/sonarr/compose.yaml <<EOF
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:latest
    restart: unless-stopped
    ports:
      - 8989:8989
    volumes:
      - /mnt/appdata/sonarr:/config
EOF

# 3. Commit and push
git add nodes/waldorf/sonarr/
git commit -m "feat(stacks): add Sonarr to Waldorf"
git push

# 4. Configure in Komodo UI
# - Source Type: Git Repo
# - Run Directory: nodes/waldorf/sonarr
# - Deploy!
```

### Check Service Status

```bash
# Via Komodo API
curl http://10.0.0.151:9000/api/stacks

# Direct SSH to node
ssh chester@10.0.0.251
docker ps | grep tunarr
docker logs tunarr --tail 50
```

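`docker ps | grep tunarr` also matches any row whose name, image, or command merely contains the string. A stricter check, sketched here under the assumption that the container is named exactly `tunarr`, compares whole names instead. `container_running` is a hypothetical helper name.

```bash
# container_running: hypothetical helper, not part of the repository.
# Reads container names from stdin, one per line, and succeeds only when
# one of them is exactly equal to the requested name.
container_running() {
    grep -qxF -- "$1"
}

# Intended use on a node (not run here):
#   docker ps --format '{{.Names}}' | container_running tunarr && echo "tunarr is up"
```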
### Emergency Rollback

```bash
# In Komodo UI: Click "Rollback" on stack
# Or via Git:
git revert HEAD
git push  # Triggers auto-rollback
```

---

## 📚 Documentation

| Document | Purpose |
|----------|---------|
| [TECHNICAL_RUNBOOK.md](documentation/TECHNICAL_RUNBOOK.md) | Infrastructure overview, emergency procedures, maintenance schedule |
| [KBA-001](documentation/KBAs/KBA-001-Komodo-GitOps-Stack-Deployment-Failures.md) | Troubleshooting Git-linked stack failures |
| [SOP-001](documentation/SOPs/SOP-001-Migrate-Stack-from-UI-to-Git.md) | Step-by-step guide to migrate stacks to GitOps |
| [Node READMEs](nodes/) | Hardware specs and service details per node |

---

## 🛡️ Security & Best Practices

### Secrets Management

- ❌ **NEVER** commit passwords, API keys, or tokens to Git
- ✅ **DO** use Komodo Environment Variables for secrets
- ✅ **DO** use Gitea App Tokens for authentication (avoids SSH key exchange issues)

Example:
```yaml
# In Git (compose.yaml)
environment:
  - PUID=1000
  - PGID=1000
  - API_KEY=${PLEX_API_KEY}  # Injected by Komodo

# In Komodo UI: Set PLEX_API_KEY in Environment Variables
```

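A rough pre-commit check for the rule above might look like the following. This is only a sketch: `find_hardcoded_secrets` is a hypothetical name, and the assumption that secret variables are upper-case and contain `KEY`, `TOKEN`, `PASSWORD`, or `SECRET` is a heuristic that will miss other naming schemes.

```bash
# find_hardcoded_secrets: hypothetical check, not part of the repository.
# Prints (with line numbers) assignments to upper-case variables whose name
# contains KEY, TOKEN, PASSWORD, or SECRET, unless the value is a ${VAR}
# reference. Exit status is 0 when something suspicious was printed and
# non-zero when nothing was found.
find_hardcoded_secrets() {
    grep -nE '(KEY|TOKEN|PASSWORD|SECRET)[A-Z_]*=' "$1" |
        grep -vE '=[[:space:]]*"?\$\{[A-Za-z_][A-Za-z0-9_]*\}'
}

# Possible use before committing (not run here):
#   find_hardcoded_secrets nodes/waldorf/plex/compose.yaml && echo "review before commit"
```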
### NFS Mount Configuration

**Critical:** Raspberry Pi requires NFSv3 (not v4) due to ID-domain mismatches:

```bash
# /etc/fstab on Watchtower (Pi 5)
10.0.0.250:/Volume1/appdata /mnt/appdata nfs nfsvers=3,rw,sync 0 0

# /etc/fstab on Heimdall/Waldorf (x86 Ubuntu)
10.0.0.250:/Volume1/appdata /mnt/appdata nfs4 rw,sync 0 0
```

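The two entries differ only in the NFS version, so a provisioning script could pick one automatically. The sketch below assumes that CPU architecture is a good enough proxy for "Pi versus x86", which holds for the nodes listed here but is not stated anywhere in the repository; `fstab_entry` is a hypothetical name.

```bash
# fstab_entry: hypothetical helper, not part of the repository.
# Prints the /etc/fstab line for the appdata share: NFSv3 on ARM (the Pi),
# NFSv4 on everything else. The architecture defaults to `uname -m`.
fstab_entry() {
    arch=${1:-$(uname -m)}
    case "$arch" in
        aarch64|arm*) echo "10.0.0.250:/Volume1/appdata /mnt/appdata nfs nfsvers=3,rw,sync 0 0" ;;
        *)            echo "10.0.0.250:/Volume1/appdata /mnt/appdata nfs4 rw,sync 0 0" ;;
    esac
}

# Example: the entry an x86 node would get.
fstab_entry x86_64
```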
### Backup Strategy

- **Git Repository:** Daily backups via Gitea's built-in backup feature
- **Docker Volumes:** Weekly snapshots to `/mnt/appdata/backups/`
- **Proxmox VMs:** Daily snapshots with 7-day retention (when VMs are deployed)
- **Configuration Files:** Tracked in Git under `nodes/{hostname}/`

---

## 📊 Stats

- **Total Nodes:** 5 (1 hypervisor + 3 compute + 1 storage)
@@ -287,44 +36,6 @@ environment:

---

## 🔥 Emergency Procedures

### NFS Mount Failure

```bash
# Check connectivity
ping 10.0.0.250

# Remount
sudo umount /mnt/appdata
sudo mount -a
df -h | grep appdata
```

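`sudo umount /mnt/appdata` reports an error when the share is already gone, so a script may want to check the mount table first. The following is a minimal sketch; `is_mounted` is a hypothetical helper, and reading `/proc/mounts` assumes a Linux node.

```bash
# is_mounted: hypothetical helper, not part of the repository.
# Succeeds if the given path appears as a mount point (second field) in a
# mount table in /proc/mounts format. The table path can be overridden.
is_mounted() {
    target=$1
    table=${2:-/proc/mounts}
    awk -v t="$target" '$2 == t { found = 1 } END { exit found ? 0 : 1 }' "$table"
}

# Possible use on a node (not run here):
#   is_mounted /mnt/appdata || sudo mount -a
```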
### Komodo Periphery Offline

```bash
# Check WebSocket connectivity
curl -v ws://10.0.0.151:9120

# Restart agent
docker restart komodo-periphery
docker logs -f komodo-periphery
```

### Traefik SSL Certificate Issues

```bash
# Check Cloudflare API token
docker exec traefik cat /etc/traefik/traefik.yml

# Force certificate renewal
docker restart traefik
docker logs traefik | grep -i "cloudflare\|certificate"
```

---

## 🤝 Contributing

This is a personal homelab, but documentation improvements and issue reports are welcome!
@@ -343,6 +54,6 @@ Personal infrastructure configuration. Documentation licensed under [CC BY-SA 4.

---

**Maintained by:** Nathan Castaldi
-**Last Updated:** April 13, 2026
+**Last Updated:** April 21, 2026
-**Status:** 🟢 Operational
+**Status:** 🟢
-**Automation Status:** 🟢 Ansible Fully Deployed
+**Automation Status:** 🟢