## ✅ **Point 2 – Compute Plane (OptiPlex Proxmox Cluster) – FINAL**

### **Role**

* Cluster that runs all Docker Swarm workloads
* Separate from out-of-band control (Watchtower)
* Designed to tolerate loss of one physical node without losing quorum
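
The one-node-loss claim follows from majority-quorum arithmetic. The sketch below assumes one vote per member and a strict-majority rule, which is how Swarm's Raft managers reach agreement and how Corosync behaves with default vote settings. The function names are illustrative and not part of any tool.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters` voting members."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return voters - quorum_size(voters)


# With 3 voters, quorum is 2, so exactly one member may be lost.
for n in (1, 2, 3, 4, 5):
    print(f"{n} voters: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```
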

---

### **Physical hosts**

* 3× Dell OptiPlex Micro 7010: pve01-pve03
* Local NVMe only; no shared storage dependency
* Hosts sized with headroom; no aggressive CPU/RAM overcommit by default

---

### **Proxmox cluster**

* 3-node Proxmox VE cluster with Corosync over LAN
* Static IPs on all hosts
* vmbr0 = primary LAN bridge; VLAN-capable but unused initially
* Proxmox HA: **off** by default (may be added later via separate design)

---

### **VM layout per host**

* Each OptiPlex runs exactly 2× Ubuntu Server LTS VMs:
    * 1× Swarm Manager VM
    * 1× Swarm Worker VM
* No additional "misc" VMs on these hosts without an explicit architecture update

---

### **Swarm roles and placement**

* Total: 3 managers, 3 workers (one of each per host)
* Managers hold Swarm Raft state and scheduling decisions
* Workers run application workloads
* Managers are schedulable only for light/infra tasks; no heavy or noisy apps
* Node labels and placement constraints enforce "apps → workers" by default
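
One way the "apps → workers" rule could be expressed is with a node label plus `--constraint` flags on each application service. The sketch below only builds the Docker CLI argument lists and does not run them. The label `tier=apps`, the node name `swarm-wk01`, and the service name and image are hypothetical placeholders, not values from this spec.

```python
import shlex
from typing import List

# Hypothetical label key/value; the spec does not name its labels.
APP_LABEL_KEY = "tier"
APP_LABEL_VALUE = "apps"


def label_node_cmd(node: str) -> List[str]:
    """Command (run on a manager) that tags a node as an application worker."""
    return ["docker", "node", "update",
            "--label-add", f"{APP_LABEL_KEY}={APP_LABEL_VALUE}", node]


def app_service_cmd(name: str, image: str) -> List[str]:
    """Command that creates a service pinned to labeled worker nodes."""
    return [
        "docker", "service", "create",
        "--name", name,
        "--constraint", "node.role==worker",
        "--constraint", f"node.labels.{APP_LABEL_KEY}=={APP_LABEL_VALUE}",
        image,
    ]


# Print the commands for review instead of executing them.
print(shlex.join(label_node_cmd("swarm-wk01")))
print(shlex.join(app_service_cmd("web", "nginx:stable")))
```
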

---

### **Resource allocation (initial)**

* **Manager VM**
    * 2 vCPU
    * 4–6 GB RAM
    * ~40 GB disk
* **Worker VM**
    * 4–6 vCPU
    * 16–24 GB RAM
    * ≥100 GB disk
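
To check that these allocations leave headroom on a given host, the upper ends of the ranges can be summed and compared with the host's capacity. The host figures used below (12 threads, 64 GB RAM, 1000 GB NVMe) are assumptions for illustration only; this spec does not state the OptiPlex configuration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VmSpec:
    name: str
    vcpu: int
    ram_gb: int
    disk_gb: int


# Upper ends of the initial ranges listed above.
MANAGER = VmSpec("manager", vcpu=2, ram_gb=6, disk_gb=40)
WORKER = VmSpec("worker", vcpu=6, ram_gb=24, disk_gb=100)


def headroom(host_threads: int, host_ram_gb: int, host_disk_gb: int,
             vms: List[VmSpec]) -> Dict[str, int]:
    """Capacity left on the host after all VMs; a negative value means overcommit."""
    return {
        "vcpu": host_threads - sum(v.vcpu for v in vms),
        "ram_gb": host_ram_gb - sum(v.ram_gb for v in vms),
        "disk_gb": host_disk_gb - sum(v.disk_gb for v in vms),
    }


# Hypothetical host capacity, not taken from the spec.
left = headroom(12, 64, 1000, [MANAGER, WORKER])
print(left)
```
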

---

### **Storage model**

* VM disks: local Proxmox storage (ZFS or LVM-thin), no shared VM disks
* Container data: bind-mounts inside VMs
* Swarm control plane and core workloads do **not** depend on shared storage
* Production data path:
    * Primary: TerraMaster
    * Backup: TerraMaster → Synology via rsync
    * Offsite: Synology → cloud

---

### **Networking assumptions**

* All Proxmox hosts and VMs attach to primary LAN via vmbr0
* Compute plane runs on a flat LAN at baseline
* Detailed VLAN and IP design will live in a separate networking architecture document that this spec can reference

---

### **Operational constraints ("never do this")**

* Do **not** run Docker workloads or Swarm nodes directly on Proxmox hosts
* Do **not** run heavy or stateful application stacks on manager VMs
* Do **not** introduce shared storage as a hard dependency for Swarm or cluster boot
* Do **not** use storage appliances (TerraMaster, Synology, etc.) as Swarm managers or workers
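
These rules are simple enough to check mechanically against an inventory. The sketch below is one possible shape for such a check. The `Node` fields, the `kind` values, and the example node names are all hypothetical and would need to match however the inventory is actually recorded.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    kind: str                           # "proxmox-host", "vm" or "storage-appliance"
    swarm_role: Optional[str] = None    # "manager", "worker" or None
    heavy_stacks: Tuple[str, ...] = ()  # heavy or stateful stacks placed here
    boot_needs_shared_storage: bool = False


def violations(nodes: List[Node]) -> List[str]:
    """Return one message per broken 'never do this' rule."""
    found = []
    for n in nodes:
        if n.swarm_role and n.kind == "proxmox-host":
            found.append(f"{n.name}: Swarm node directly on a Proxmox host")
        if n.swarm_role and n.kind == "storage-appliance":
            found.append(f"{n.name}: storage appliance used as a Swarm node")
        if n.swarm_role == "manager" and n.heavy_stacks:
            found.append(f"{n.name}: heavy/stateful stacks on a manager VM")
        if n.swarm_role and n.boot_needs_shared_storage:
            found.append(f"{n.name}: Swarm node depends on shared storage to boot")
    return found


# Hypothetical inventory: the first two entries are compliant, the last two are not.
inventory = [
    Node("swarm-mgr01", "vm", "manager"),
    Node("swarm-wk01", "vm", "worker", heavy_stacks=("db",)),
    Node("pve01", "proxmox-host", "worker"),
    Node("terramaster", "storage-appliance", "worker"),
]
for message in violations(inventory):
    print(message)
```
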

---

### **Expansion and change model**

* To add compute capacity:
    * Add a new OptiPlex node to the Proxmox cluster
    * Create at least one new Swarm Worker VM on that host
    * Join the VM to Swarm with standard labels and constraints
    * Gradually rebalance workloads; no redesign of existing nodes required
* Any change that alters manager count, enables Proxmox HA, or significantly changes storage/networking models requires an explicit architecture review and doc update
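
The capacity-expansion steps can be written down as an ordered command plan for review before anything is executed. This is a sketch under assumptions: the addresses, the node name `swarm-wk04`, the label `tier=apps`, and the service list are hypothetical, `<TOKEN>` stands for the output of the join-token command, and VM creation is left as a manual step.

```python
from typing import List, Tuple


def expansion_plan(cluster_ip: str, manager_addr: str, new_worker: str,
                   services: List[str],
                   label: str = "tier=apps") -> List[Tuple[str, str]]:
    """Ordered (where to run, command) pairs for adding one worker node."""
    steps = [
        # Join the new host to the existing Proxmox cluster.
        ("new Proxmox host", f"pvecm add {cluster_ip}"),
        # VM creation is site-specific, so it stays a manual step here.
        ("new Proxmox host", "# create the Swarm Worker VM (Ubuntu Server LTS)"),
        # Fetch the worker join token, then join the new VM to the swarm.
        ("existing manager VM", "docker swarm join-token -q worker"),
        ("new worker VM", f"docker swarm join --token <TOKEN> {manager_addr}:2377"),
        # Apply the standard label so placement constraints match the node.
        ("existing manager VM", f"docker node update --label-add {label} {new_worker}"),
    ]
    # Rebalance gradually: force a task redistribution one service at a time.
    steps += [("existing manager VM", f"docker service update --force {s}")
              for s in services]
    return steps


# Hypothetical addresses and names, for illustration only.
plan = expansion_plan("192.0.2.11", "192.0.2.21", "swarm-wk04", ["web"])
for where, command in plan:
    print(f"[{where}] {command}")
```
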