homelab/ansible/archive/documentation/contracts/ComputePlane.md

## ✅ **Point 2 – Compute Plane (OptiPlex Proxmox Cluster) – FINAL**

### **Role**

* Cluster that runs all Docker Swarm workloads
* Separate from out-of-band control (Watchtower)
* Designed to tolerate loss of one physical node without losing quorum (2 of 3 Proxmox nodes and 2 of 3 Swarm managers remain)
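
The quorum claim follows from majority voting, which both Corosync and Swarm Raft rely on. The sketch below works through that arithmetic. The function names are illustrative and not part of any tool.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority among `voters` members."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Members that can be lost while a majority still remains."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    # Compare a few cluster sizes; 3 is the size used in this spec.
    for n in (1, 2, 3, 4, 5):
        print(f"voters={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

With three voters the majority is two, so one node can fail. A fourth voter would raise the majority to three without tolerating any additional failure, which is one reason odd counts are usually preferred.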

---

### **Physical hosts**

* 3× Dell OptiPlex Micro 7010: pve01-pve03
* Local NVMe only; no shared storage dependency
* Hosts sized with headroom; no aggressive CPU/RAM overcommit by default

---

### **Proxmox cluster**

* 3-node Proxmox VE cluster with Corosync over LAN
* Static IPs on all hosts
* vmbr0 is the primary LAN bridge; it is VLAN-capable, but VLANs are unused initially
* Proxmox HA: **off** by default (may be added later via separate design)

---

### **VM layout per host**

* Each OptiPlex runs exactly 2× Ubuntu Server LTS VMs:
  * 1× Swarm Manager VM
  * 1× Swarm Worker VM
* No additional "misc" VMs on these hosts without an explicit architecture update
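
As a rough illustration, the layout rule can be expressed as data plus a check. This is a minimal sketch: the host names come from this spec, but the VM naming scheme (`<host>-<role>`) and the helper names are assumptions made for the example.

```python
from dataclasses import dataclass

HOSTS = ["pve01", "pve02", "pve03"]   # hosts named in this spec
ROLES = ("manager", "worker")         # exactly one VM of each role per host


@dataclass(frozen=True)
class SwarmVM:
    name: str   # hypothetical naming scheme: "<host>-<role>"
    host: str
    role: str


def build_layout(hosts=HOSTS):
    """Return the expected VM list: one manager and one worker per host."""
    return [SwarmVM(f"{host}-{role}", host, role) for host in hosts for role in ROLES]


def check_layout(vms, hosts=HOSTS):
    """Return a list of violations of the 'exactly 2 VMs per host' rule."""
    problems = []
    for host in hosts:
        roles = sorted(vm.role for vm in vms if vm.host == host)
        if roles != ["manager", "worker"]:
            problems.append(f"{host}: expected one manager and one worker, found {roles}")
    return problems


if __name__ == "__main__":
    layout = build_layout()
    print(len(layout), "VMs;", "violations:", check_layout(layout))
```

A check like this could run against an inventory export to catch stray "misc" VMs before they become permanent.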

---

### **Swarm roles and placement**

* Total: 3 managers, 3 workers (one of each per host)
* Managers hold Swarm Raft state and scheduling decisions
* Workers run application workloads
* Managers are schedulable only for light/infra tasks; no heavy or noisy apps
* Node labels and placement constraints enforce "apps → workers" by default
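
One way to picture the "apps → workers" rule is as the Docker CLI arguments it would translate into. The sketch below only builds argument lists and does not run anything. The label key `tier` and its values `apps` and `infra` are assumptions for illustration; the spec does not define the actual label names.

```python
def label_node_cmd(node: str, tier: str) -> list[str]:
    """Build the command that attaches a hypothetical 'tier' label to a Swarm node."""
    return ["docker", "node", "update", "--label-add", f"tier={tier}", node]


def service_create_cmd(name: str, image: str, tier: str = "apps") -> list[str]:
    """Build a service-create command pinned to nodes carrying the given tier label."""
    return [
        "docker", "service", "create",
        "--name", name,
        "--constraint", f"node.labels.tier=={tier}",  # default keeps apps on workers
        image,
    ]


if __name__ == "__main__":
    # Node and service names here are placeholders.
    print(" ".join(label_node_cmd("pve01-worker", "apps")))
    print(" ".join(service_create_cmd("web", "nginx:stable")))
```

Defaulting `tier` to `apps` means a service lands on workers unless someone deliberately asks for `infra`. The built-in `node.role==worker` constraint would be a simpler alternative if custom labels are not needed.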

---

### **Resource allocation (initial)**

* **Manager VM**
  * 2 vCPU
  * 4–6 GB RAM
  * ~40 GB disk
* **Worker VM**
  * 4–6 vCPU
  * 16–24 GB RAM
  * ≥100 GB disk
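
To see what these figures imply per host, the ranges can be summed for one manager plus one worker. This is a sketch: the disk values are treated as fixed minimums, and the host capacity passed to `fits` is a placeholder, since this spec does not state the physical hosts' CPU, RAM, or disk sizes.

```python
# (low, high) ranges taken from the allocation list above.
MANAGER = {"vcpu": (2, 2), "ram_gb": (4, 6), "disk_gb": (40, 40)}
WORKER = {"vcpu": (4, 6), "ram_gb": (16, 24), "disk_gb": (100, 100)}  # disk is a minimum


def per_host_totals(*vms):
    """Sum the low and high ends of each resource across the given VMs."""
    return {
        key: (sum(vm[key][0] for vm in vms), sum(vm[key][1] for vm in vms))
        for key in ("vcpu", "ram_gb", "disk_gb")
    }


def fits(host_capacity, totals):
    """True if the host covers the high end of every resource without overcommit."""
    return all(host_capacity[key] >= totals[key][1] for key in totals)


if __name__ == "__main__":
    totals = per_host_totals(MANAGER, WORKER)
    print(totals)
    # Placeholder capacity, NOT the real OptiPlex specification.
    print(fits({"vcpu": 12, "ram_gb": 32, "disk_gb": 512}, totals))
```

Per host this works out to 6–8 vCPU, 20–30 GB RAM, and at least 140 GB of disk for the two VMs, before counting what Proxmox itself needs. The `fits` check is stricter than the "no aggressive overcommit" rule, which would still permit mild overcommit.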

---

### **Storage model**

* VM disks: local Proxmox storage (ZFS or LVM-thin), no shared VM disks
* Container data: bind mounts from paths inside the VMs
* Swarm control plane and core workloads do **not** depend on shared storage
* Production data path:
  * Primary: TerraMaster
  * Backup: TerraMaster → Synology via rsync
  * Offsite: Synology → cloud

---

### **Networking assumptions**

* All Proxmox hosts and VMs attach to primary LAN via vmbr0
* Compute plane runs on a flat LAN at baseline
* Detailed VLAN and IP design will live in a separate networking architecture document that this spec can reference

---

### **Operational constraints ("never do this")**

* Do **not** run Docker workloads or Swarm nodes directly on Proxmox hosts
* Do **not** run heavy or stateful application stacks on manager VMs
* Do **not** introduce shared storage as a hard dependency for Swarm or cluster boot
* Do **not** use storage appliances (TerraMaster, Synology, etc.) as Swarm managers or workers

---

### **Expansion and change model**

* To add compute capacity:
  * Add a new OptiPlex node to the Proxmox cluster
  * Create at least one new Swarm Worker VM on that host
  * Join the VM to Swarm with standard labels and constraints
  * Gradually rebalance workloads; no redesign of existing nodes required
* Any change that alters manager count, enables Proxmox HA, or significantly changes storage/networking models requires an explicit architecture review and doc update
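
The review rule can be sketched as a comparison between the current and proposed cluster state. The field names below are assumptions, and the single `shared_storage_required` flag is only a stand-in for "significant storage/networking changes", which in practice needs human judgment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterState:
    managers: int
    workers: int
    proxmox_ha: bool
    shared_storage_required: bool  # simplified proxy for the storage model


def review_reasons(current: ClusterState, proposed: ClusterState) -> list[str]:
    """Return the reasons a change needs an architecture review (empty if none)."""
    reasons = []
    if proposed.managers != current.managers:
        reasons.append("manager count changes")
    if proposed.proxmox_ha and not current.proxmox_ha:
        reasons.append("Proxmox HA enabled")
    if proposed.shared_storage_required != current.shared_storage_required:
        reasons.append("storage model changes")
    return reasons


if __name__ == "__main__":
    baseline = ClusterState(managers=3, workers=3, proxmox_ha=False,
                            shared_storage_required=False)
    print(review_reasons(baseline, replace(baseline, workers=4)))   # routine expansion
    print(review_reasons(baseline, replace(baseline, managers=4)))  # needs review
```

Adding a worker on a new host passes without review in this model, which matches the expansion path above, while touching the manager count is flagged.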