homelab/ComputePlane.md at 5672e113b21e22431a57178e80d55784181aa470

nathan/homelab

nathan bcd4688523 renamed folder to make contents clearer

2026-04-12 16:24:56 -04:00

✅ Point 2 – Compute Plane (OptiPlex Proxmox Cluster) – FINAL

Role

Cluster that runs all Docker Swarm workloads
Separate from out-of-band control (Watchtower)
Designed to tolerate the loss of one physical node without losing Proxmox (Corosync) or Swarm (Raft) quorum

Physical hosts

3× Dell OptiPlex Micro 7010: pve01-pve03
Local NVMe only; no shared storage dependency
Hosts sized with headroom; no aggressive CPU/RAM overcommit by default

Proxmox cluster

3-node Proxmox VE cluster with Corosync over LAN
Static IPs on all hosts
vmbr0 = primary LAN bridge; VLAN-capable but unused initially
Proxmox HA: off by default (may be added later via separate design)
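
The one-node fault tolerance of a 3-node cluster follows from majority voting. A minimal sketch of the arithmetic, assuming each Proxmox node carries one Corosync vote and no external QDevice is configured (neither detail is stated in this spec); the function names are illustrative:

```python
def quorum(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return voters - quorum(voters)


for n in (2, 3, 4, 5):
    print(f"{n} voters: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

With 3 voters the quorum is 2, so one node can fail. A fourth voter raises the quorum to 3 without raising the number of tolerated failures, which is one reason to prefer odd voter counts. The same majority rule applies to the Raft quorum among Swarm managers.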

VM layout per host

Each OptiPlex runs exactly 2× Ubuntu Server LTS VMs:
- 1× Swarm Manager VM
- 1× Swarm Worker VM
No additional "misc" VMs on these hosts without an explicit architecture update
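
The "exactly one manager and one worker per host" rule could be checked mechanically against an inventory. A rough sketch, assuming a hand-maintained inventory dict; the VM names and the `layout_violations` helper are hypothetical, only the host names pve01-pve03 come from this spec:

```python
from collections import Counter

# Hypothetical inventory: VM name -> (Proxmox host, Swarm role).
INVENTORY = {
    "swarm-mgr-01": ("pve01", "manager"),
    "swarm-wkr-01": ("pve01", "worker"),
    "swarm-mgr-02": ("pve02", "manager"),
    "swarm-wkr-02": ("pve02", "worker"),
    "swarm-mgr-03": ("pve03", "manager"),
    "swarm-wkr-03": ("pve03", "worker"),
}

# The layout this spec requires on every host.
EXPECTED = Counter({"manager": 1, "worker": 1})


def layout_violations(inventory, hosts=("pve01", "pve02", "pve03")):
    """Return {host: role counts} for hosts that deviate from EXPECTED."""
    per_host = {host: Counter() for host in hosts}
    for _vm, (host, role) in inventory.items():
        per_host.setdefault(host, Counter())[role] += 1
    return {h: dict(c) for h, c in per_host.items() if c != EXPECTED}


print(layout_violations(INVENTORY))
```

An empty result means every host matches the layout. A host with a missing VM, a duplicate role, or an extra "misc" VM shows up with its actual role counts.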

Swarm roles and placement

Total: 3 managers, 3 workers (one of each per host)
Managers hold the Swarm Raft state and make scheduling decisions
Workers run application workloads
Managers are schedulable only for light/infra tasks; no heavy or noisy apps
Node labels and placement constraints enforce "apps → workers" by default
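
To illustrate how labels and constraints could produce the "apps → workers" default, here is a small model of Swarm-style placement constraints (`node.role == worker`, `node.labels.<name> == <value>`). It is a simplified stand-in for the scheduler, not Docker's implementation; the node names and the `tier` label are assumptions, not names defined by this spec:

```python
# Hypothetical nodes; hostnames and the "tier" label are illustrative only.
NODES = [
    {"hostname": "swarm-mgr-01", "role": "manager", "labels": {"tier": "infra"}},
    {"hostname": "swarm-wkr-01", "role": "worker", "labels": {"tier": "app"}},
    {"hostname": "swarm-wkr-02", "role": "worker", "labels": {"tier": "app"}},
]


def attribute(node, key):
    """Resolve 'node.role' or 'node.labels.<name>' against a node dict."""
    if key == "node.role":
        return node["role"]
    if key.startswith("node.labels."):
        return node["labels"].get(key[len("node.labels."):])
    raise ValueError(f"unsupported attribute: {key}")


def matches(node, constraint):
    """Evaluate one 'key == value' or 'key != value' constraint."""
    for op in ("!=", "=="):
        if op in constraint:
            key, value = (part.strip() for part in constraint.split(op, 1))
            equal = attribute(node, key) == value
            return equal if op == "==" else not equal
    raise ValueError(f"unsupported constraint: {constraint}")


def eligible(nodes, constraints):
    """Hostnames of nodes that satisfy every constraint (constraints are ANDed)."""
    return [n["hostname"] for n in nodes if all(matches(n, c) for c in constraints)]


APP_CONSTRAINTS = ["node.role == worker", "node.labels.tier == app"]
print(eligible(NODES, APP_CONSTRAINTS))
```

In a real stack the same constraint strings would go under a service's `deploy.placement.constraints`. Light infra services would carry a different constraint set that admits managers.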

Resource allocation (initial)

Manager VM
- 2 vCPU
- 4–6 GB RAM
- ~40 GB disk
Worker VM
- 4–6 vCPU
- 16–24 GB RAM
- ≥100 GB disk
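
Adding up the two VMs per host gives the footprint each OptiPlex must carry. A quick sketch of that arithmetic, using the ranges above; the host capacity and hypervisor reserve passed to `fits` are placeholder assumptions, since this spec does not state the hosts' installed RAM or core count:

```python
# Initial per-VM allocation ranges from this spec, as (low, high).
MANAGER = {"vcpu": (2, 2), "ram_gb": (4, 6)}
WORKER = {"vcpu": (4, 6), "ram_gb": (16, 24)}


def per_host_total(*vms):
    """Sum the low and high end of each resource across the VMs on one host."""
    keys = vms[0].keys()
    return {k: (sum(vm[k][0] for vm in vms), sum(vm[k][1] for vm in vms)) for k in keys}


def fits(total, capacity, reserve):
    """True if the high end of every resource still leaves `reserve` free."""
    return all(total[k][1] + reserve[k] <= capacity[k] for k in total)


total = per_host_total(MANAGER, WORKER)
print(total)  # {'vcpu': (6, 8), 'ram_gb': (20, 30)}

# Placeholder host figures (assumptions, not from this spec).
print(fits(total, capacity={"vcpu": 12, "ram_gb": 64}, reserve={"vcpu": 2, "ram_gb": 4}))
```

Disk is left out because the worker figure is a lower bound; the per-host floor is roughly 140 GB (about 40 GB plus at least 100 GB).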

Storage model

VM disks: local Proxmox storage (ZFS or LVM-thin), no shared VM disks
Container data: bind mounts inside VMs
Swarm control plane and core workloads do not depend on shared storage
Production data path:
- Primary: TerraMaster
- Backup: TerraMaster → Synology via rsync
- Offsite: Synology → cloud
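
The TerraMaster → Synology leg could be driven by a small wrapper that builds the rsync invocation. This is only a sketch: the paths, the `backup@synology.lan` target, and the choice of flags are assumptions, since this spec says only that rsync is used. The helper builds the argument list and does not run it:

```python
import shlex


def rsync_command(source, dest, dry_run=True, mirror=False):
    """Build an rsync argv copying `source` into `dest`.

    dry_run adds --dry-run so nothing is written; mirror adds --delete,
    which removes files on the destination that no longer exist on the source.
    """
    argv = ["rsync", "--archive"]
    if mirror:
        argv.append("--delete")
    if dry_run:
        argv.append("--dry-run")
    argv += [source, dest]
    return argv


# Hypothetical source path and destination; adjust to the real shares.
cmd = rsync_command("/volume1/production/", "backup@synology.lan:/volume1/terramaster-copy/")
print(shlex.join(cmd))
```

Defaulting to a dry run keeps a first invocation harmless. Whether the backup should mirror deletions is a policy decision this spec leaves open, so `mirror` is off by default.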

Networking assumptions

All Proxmox hosts and VMs attach to primary LAN via vmbr0
Compute plane runs on a flat LAN at baseline
Detailed VLAN and IP design will live in a separate networking architecture document that this spec can reference

Operational constraints ("never do this")

Do not run Docker workloads or Swarm nodes directly on Proxmox hosts
Do not run heavy or stateful application stacks on manager VMs
Do not introduce shared storage as a hard dependency for Swarm or cluster boot
Do not use storage appliances (TerraMaster, Synology, etc.) as Swarm managers or workers

Expansion and change model

To add compute capacity:
- Add a new OptiPlex node to the Proxmox cluster
- Create at least one new Swarm Worker VM on that host
- Join the VM to Swarm with standard labels and constraints
- Gradually rebalance workloads; no redesign of existing nodes required
Any change that alters manager count, enables Proxmox HA, or significantly changes storage/networking models requires an explicit architecture review and doc update
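
The join-and-label step of the expansion procedure above can be sketched as an ordered command list. The docker subcommands are standard Swarm CLI; the node name, manager address, and `tier=app` label are hypothetical, and `<token>` stands for the value printed by the join-token command:

```python
def worker_onboarding_commands(node_name, manager_addr, labels):
    """Return ordered (where to run, command) pairs for adding one worker."""
    steps = [
        # Print the worker join token on any existing manager.
        ("existing manager", "docker swarm join-token worker"),
        # Join from the new VM; 2377 is Swarm's default management port.
        ("new worker VM", f"docker swarm join --token <token> {manager_addr}:2377"),
    ]
    # Apply the standard labels so placement constraints pick the node up.
    for key, value in sorted(labels.items()):
        steps.append(
            ("existing manager", f"docker node update --label-add {key}={value} {node_name}")
        )
    # Confirm the node is listed and Ready.
    steps.append(("existing manager", "docker node ls"))
    return steps


for where, command in worker_onboarding_commands("swarm-wkr-04", "192.0.2.11", {"tier": "app"}):
    print(f"[{where}] {command}")
```

Because the new VM joins as a worker, the manager count and Raft quorum are unchanged, so this path stays outside the changes that require an architecture review.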