## ✅ **Point 2 - Compute Plane (OptiPlex Proxmox Cluster) - FINAL**
### **Role**
* Cluster that runs all Docker Swarm workloads
* Separate from out-of-band control (Watchtower)
* Designed to tolerate loss of one physical node without losing quorum
---
### **Physical hosts**
* 3× Dell OptiPlex Micro 7010: pve01-pve03
* Local NVMe only; no shared storage dependency
* Hosts sized with headroom; no aggressive CPU/RAM overcommit by default
---
### **Proxmox cluster**
* 3-node Proxmox VE cluster with Corosync over LAN
* Static IPs on all hosts
* vmbr0 = primary LAN bridge; VLAN-capable but unused initially
* Proxmox HA: **off** by default (may be added later via separate design)
---
### **VM layout per host**
* Each OptiPlex runs exactly 2× Ubuntu Server LTS VMs:
  * 1× Swarm Manager VM
  * 1× Swarm Worker VM
* No additional "misc" VMs on these hosts without an explicit architecture update
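
The per-host layout above can be written down as a small inventory model, which makes the "exactly one manager and one worker per host" rule checkable. This is a minimal sketch: the host names come from this spec, but the VM names (`swarm-mgr-01`, `swarm-wrk-01`, and so on) and the helper functions are hypothetical and only illustrate the idea. `check_layout` returns the hosts that break the rule, so an empty list means the layout matches the spec.

```python
from dataclasses import dataclass

# Host names come from the spec; the VM naming scheme is hypothetical.
HOSTS = ["pve01", "pve02", "pve03"]


@dataclass(frozen=True)
class VM:
    name: str
    host: str
    role: str  # "manager" or "worker"


def build_inventory(hosts):
    """Return exactly one manager VM and one worker VM for each host."""
    vms = []
    for index, host in enumerate(hosts, start=1):
        vms.append(VM(name=f"swarm-mgr-{index:02d}", host=host, role="manager"))
        vms.append(VM(name=f"swarm-wrk-{index:02d}", host=host, role="worker"))
    return vms


def check_layout(vms, hosts):
    """Return the hosts that do not carry exactly one manager and one worker."""
    violations = []
    for host in hosts:
        roles = sorted(vm.role for vm in vms if vm.host == host)
        if roles != ["manager", "worker"]:
            violations.append(host)
    return violations


inventory = build_inventory(HOSTS)
```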
---
### **Swarm roles and placement**
* Total: 3 managers, 3 workers (one of each per host)
* Managers hold Swarm Raft state and scheduling decisions
* Workers run application workloads
* Managers are schedulable only for light/infra tasks; no heavy or noisy apps
* Node labels and placement constraints enforce "apps → workers" by default
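
The choice of three managers follows from majority-based consensus: Raft needs more than half of the managers reachable to commit changes. The arithmetic below is a small sketch of that rule, not anything taken from Docker itself; the function names are illustrative. With one manager per host, losing a physical node removes exactly one manager, which a three-manager cluster can absorb.

```python
def quorum_size(managers: int) -> int:
    """Smallest majority of a manager set."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return managers - quorum_size(managers)


for count in (1, 3, 5):
    print(count, quorum_size(count), tolerated_failures(count))
# prints:
# 1 1 0
# 3 2 1
# 5 3 2
```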
---
### **Resource allocation (initial)**
* **Manager VM**
  * 2 vCPU
  * 4-6 GB RAM
  * ~40 GB disk
* **Worker VM**
  * 4-6 vCPU
  * 16-24 GB RAM
  * ≥100 GB disk
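
To confirm that the allocations leave headroom on each host, the per-host total can be compared against host capacity. The sketch below reads the figures above as ranges (4-6 GB and 16-24 GB RAM, 4-6 vCPU) and uses their upper ends, with the stated disk sizes as baselines. The host capacity values are placeholders, since this spec does not state the installed CPU, RAM, or NVMe size; replace `HOST_CAPACITY` with the real hardware figures before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    vcpu: int
    ram_gb: int
    disk_gb: int


# Upper ends of the ranges in the spec (assumed reading: 4-6 GB, 4-6 vCPU, 16-24 GB).
MANAGER_MAX = Size(vcpu=2, ram_gb=6, disk_gb=40)
WORKER_MAX = Size(vcpu=6, ram_gb=24, disk_gb=100)

# ASSUMPTION: placeholder host capacity; the spec does not give these numbers.
HOST_CAPACITY = Size(vcpu=12, ram_gb=64, disk_gb=1000)


def total(*sizes):
    """Sum the resources of all VMs placed on one host."""
    return Size(
        vcpu=sum(s.vcpu for s in sizes),
        ram_gb=sum(s.ram_gb for s in sizes),
        disk_gb=sum(s.disk_gb for s in sizes),
    )


def headroom(capacity, used):
    """Resources left on the host; a negative field means overcommit."""
    return Size(
        vcpu=capacity.vcpu - used.vcpu,
        ram_gb=capacity.ram_gb - used.ram_gb,
        disk_gb=capacity.disk_gb - used.disk_gb,
    )


used = total(MANAGER_MAX, WORKER_MAX)
spare = headroom(HOST_CAPACITY, used)
```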
---
### **Storage model**
* VM disks: local Proxmox storage (ZFS or LVM-thin), no shared VM disks
* Container data: bind-mounts inside VMs
* Swarm control plane and core workloads do **not** depend on shared storage
* Production data path:
  * Primary: TerraMaster
  * Backup: TerraMaster → Synology via rsync
  * Offsite: Synology → cloud
---
### **Networking assumptions**
* All Proxmox hosts and VMs attach to primary LAN via vmbr0
* Compute plane runs on a flat LAN at baseline
* Detailed VLAN and IP design will live in a separate networking architecture document that this spec can reference
---
### **Operational constraints ("never do this")**
* Do **not** run Docker workloads or Swarm nodes directly on Proxmox hosts
* Do **not** run heavy or stateful application stacks on manager VMs
* Do **not** introduce shared storage as a hard dependency for Swarm or cluster boot
* Do **not** use storage appliances (TerraMaster, Synology, etc.) as Swarm managers or workers
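
Some of these rules can be checked mechanically against a node inventory. The following is a rough sketch of such a check, covering the rules about where Swarm nodes may run and what may run on managers. The `Node` structure, the `kind` values, and the example node names are all hypothetical; nothing here queries Proxmox or Docker. An empty result from `find_violations` means none of the modelled rules is broken.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    kind: str  # hypothetical values: "proxmox-host", "vm", "storage-appliance"
    swarm_role: Optional[str] = None  # "manager", "worker", or None
    heavy_or_stateful_stacks: List[str] = field(default_factory=list)


def find_violations(nodes):
    """Return human-readable messages for each rule a node breaks."""
    found = []
    for node in nodes:
        # Swarm nodes may only live inside VMs, never on hosts or appliances.
        if node.swarm_role and node.kind != "vm":
            found.append(f"{node.name}: Swarm {node.swarm_role} on {node.kind}")
        # Managers must not carry heavy or stateful application stacks.
        if node.swarm_role == "manager" and node.heavy_or_stateful_stacks:
            found.append(f"{node.name}: heavy/stateful stacks on a manager")
    return found


compliant = [
    Node("pve01", "proxmox-host"),
    Node("swarm-mgr-01", "vm", "manager"),
    Node("swarm-wrk-01", "vm", "worker", ["example-db"]),
    Node("terramaster", "storage-appliance"),
]
```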
---
### **Expansion and change model**
* To add compute capacity:
  * Add a new OptiPlex node to the Proxmox cluster
  * Create at least one new Swarm Worker VM on that host
  * Join the VM to Swarm with standard labels and constraints
  * Gradually rebalance workloads; no redesign of existing nodes required
* Any change that alters manager count, enables Proxmox HA, or significantly changes storage/networking models requires an explicit architecture review and doc update
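
The join-and-label step for a new worker can be sketched as a set of Docker CLI invocations. The helpers below only build argument lists and do not run anything. The label key `tier`, the node name, the token, and the manager address are placeholders, because this spec does not define the "standard labels". The Docker subcommands and flags used (`swarm join --token`, `node update --label-add`, `service create --constraint`) are standard CLI options, and 2377 is the default Swarm management port.

```python
def join_command(token, manager_addr):
    """Command to run on the new worker VM to join the Swarm."""
    return ["docker", "swarm", "join", "--token", token, f"{manager_addr}:2377"]


def label_command(node, labels):
    """Command to run on a manager to apply labels to the new node."""
    cmd = ["docker", "node", "update"]
    for key, value in sorted(labels.items()):
        cmd += ["--label-add", f"{key}={value}"]
    cmd.append(node)
    return cmd


def app_constraints(labels):
    """Constraint flags that keep an application service on labelled workers."""
    flags = ["--constraint", "node.role==worker"]
    for key, value in sorted(labels.items()):
        flags += ["--constraint", f"node.labels.{key}=={value}"]
    return flags


# Hypothetical values for illustration only.
STANDARD_LABELS = {"tier": "app"}
print(" ".join(join_command("<worker-join-token>", "<manager-ip>")))
print(" ".join(label_command("swarm-wrk-04", STANDARD_LABELS)))
print(" ".join(["docker", "service", "create"] + app_constraints(STANDARD_LABELS)
               + ["--name", "example-app", "example/image:tag"]))
```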