EXECUTIVE SUMMARY
Finding: pve03 and pve04 are NOT identical, with meaningful differences:
- pve03: 10 cores, 23.6 GB RAM, unknown storage capacity (already clustered, running 3 VMs)
- pve04: 14 cores, 15 GB RAM, 238.5 GB NVMe SSD (fresh, not yet clustered)
Recommendation for "3 identically-spec'd devices":
- Option A (Recommended): Use pve04 as the template model. Procurement should source 3× Intel Core i5-13500T machines with 16 GB RAM (about 15 GB usable) and 240+ GB NVMe storage. pve04 is the better baseline (higher max clock, dedicated NVMe, fresh OS).
- Option B: Keep pve03 as template. Run a deeper audit on pve03's actual storage (it has 21 loop/dm devices, and it is unclear whether additional storage is attached). Backfill pve04 and a 3rd host to match pve03's full config.
Verdict: pve04 > pve03 for Swarm baseline. The i5-13500T reports a much higher max clock (4600 MHz boost vs 2885 MHz), has dedicated fast storage, and is freshly provisioned. Use pve04 as the reference architecture for the 3rd node.
DETAILED HARDWARE COMPARISON
CPU Specifications
| Dimension | pve03 | pve04 | Status |
|---|---|---|---|
| Model | Unknown / unrecognized | Intel Core i5-13500T | ⚠️ pve03 model unidentified |
| Architecture | x86_64 | x86_64 | ✅ Match |
| Socket Count | 1 | 1 | ✅ Match |
| Cores per Socket | 10 | 14 | ⚠️ MISMATCH |
| Logical CPUs (with HT) | 10 | 20 | ⚠️ MISMATCH |
| Max Frequency | 2,885 MHz | 4,600 MHz | ⚠️ pve04 ~59% higher |
| Min Frequency | Unknown | 800 MHz | — |
| Microcode Level | 0x437 | 0x3a | — |
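The Max Frequency comparison is plain arithmetic on the two reported clocks, so it is easy to recheck. This sketch hard-codes the table values; integer division truncates, so it prints a whole-percent floor. It compares max clocks only, not measured single-thread performance.

```bash
#!/usr/bin/env bash
# Relative max-clock difference, pve04 vs pve03 (values from the table above).
# Integer arithmetic truncates, so the result is a whole-percent floor.
pve03_mhz=2885
pve04_mhz=4600
pct=$(( (pve04_mhz - pve03_mhz) * 100 / pve03_mhz ))
echo "pve04 max clock is ~${pct}% higher than pve03"
```

This prints `pve04 max clock is ~59% higher than pve03`.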
Interpretation:
- pve04's i5-13500T is a 13th-gen Intel desktop CPU (2023, low-power 35 W T-series) with a much higher reported max clock than pve03; the age of pve03's CPU is unknown
- pve03's CPU model was not reported: it could be a power-limited part or a different i5/i7 SKU, so it needs to be identified
- For Docker Swarm workloads: pve04's higher clock speed (4600 MHz) should help latency-sensitive tasks; pve03's 10 cores are still adequate for the planned 2 VMs (manager + worker, 4 vCPU total) per node
Recommendation: If strict "identical" is the mandate, pve04 is the better model to replicate. Purchasing 3× i5-13500T machines ensures:
- Consistent single-threaded performance
- Known thermal/power envelope
- Availability (i5-13500T systems are widely sold)
Memory (RAM) Specifications
| Dimension | pve03 | pve04 | Status |
|---|---|---|---|
| Total RAM | 23.6 GB | 15.0 GB | ⚠️ MISMATCH |
| Free RAM | 12.4 GB | 13.0 GB | ⚠️ pve03 figure is after its 3 running guests |
| Used (OS + Proxmox + guests) | ~11.2 GB | ~1.7 GB | ⚠️ pve03 heavier (running guests) |
Interpretation:
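- pve03: 23.6 GB usable (likely 24 GB installed, e.g., 8 GB + 16 GB or 2× 12 GB SODIMM/UDIMM sticks)
- pve04: 15 GB usable (likely 16 GB installed, as 1× 16 GB or 2× 8 GB; the missing ~1 GB is reserved by firmware, integrated graphics, and the kernel)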
- pve03 is using ~11 GB for the OS and Proxmox daemon + 3 running VMs
- pve04 is minimal (fresh install, no VMs)
Validation Against Swarm Requirements:
- Each node will host 2 VMs: 1 manager (2 vCPU, 4 GB RAM) + 1 worker (2 vCPU, 4 GB RAM), per the target design
- Proxmox overhead: ~2–4 GB per node
- Minimum needed: ~10–12 GB RAM per node (8 GB for VMs + overhead) ✅ Both qualify
- Optimal: 16 GB ✅ pve04 meets this (16 GB installed, 15 GB usable); pve03 exceeds it
Recommendation: Use 16 GB as the standard for the 3-node cluster (matches pve04). This is cost-effective and leaves headroom beyond the planned VMs.
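A quick per-node RAM check against the minimum above, as a sketch. It assumes the target-design VM size (4 GB each) and the worst case of the ~2–4 GB Proxmox overhead estimate; values are in tenths of a GB so bash integer arithmetic can represent 23.6 GB exactly. The pve03 result ignores the RAM its 3 existing guests already use.

```bash
#!/usr/bin/env bash
# Per-node RAM check for the planned Swarm VMs, in tenths of a GB.
# Assumptions: 2 VMs x 4 GB (target design) and 4 GB host overhead
# (the top of the ~2-4 GB estimate).
vm_count=2
vm_ram=40          # 4.0 GB per VM
host_overhead=40   # 4.0 GB, worst case
required=$(( vm_count * vm_ram + host_overhead ))   # 120 = 12.0 GB

check_node() {
  local name=$1 usable=$2
  local spare=$(( usable - required ))
  if (( spare >= 0 )); then
    printf '%s: OK, %d.%d GB spare\n' "$name" $(( spare / 10 )) $(( spare % 10 ))
  else
    printf '%s: SHORT by %d.%d GB\n' "$name" $(( -spare / 10 )) $(( -spare % 10 ))
  fi
}

check_node pve03 236   # 23.6 GB usable
check_node pve04 150   # 15.0 GB usable
```

With these inputs, pve03 reports 11.6 GB spare and pve04 reports 3.0 GB spare, so both clear the ~12 GB worst-case minimum.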
Storage Specifications
| Dimension | pve03 | pve04 | Status |
|---|---|---|---|
| Primary Disk(s) | Unknown (21 loop/dm devices detected) | 1× 238.5 GB NVMe SSD | ⚠️ pve03 layout unclear |
| Reported Capacity | 68 GB (root LV) | 238.5 GB (whole disk) | ⚠️ MISMATCH (not like-for-like) |
| Available Space | 59 GB free (root LV) | ~230 GB (whole disk, after OS) | ⚠️ pve04 has more room |
| Storage Type | Unknown (likely SATA SSD or array) | NVMe SSD | — |
Interpretation:
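- pve03's storage is opaque: 21 loop and device-mapper devices suggest:
  - Possible RAID or encryption layer (dm-* devices can come from dm-raid or LUKS, not only LVM)
  - LVM (Logical Volume Manager) setup; a default Proxmox install includes an LVM-thin pool (local-lvm) in which each VM disk appears as its own dm-* device
  - Possibly shared storage mounted, or loop-mounted disk images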
- Current state: ~68 GB LVM volume, 9 GB used
- pve04's storage is straightforward: Single 238.5 GB NVMe SSD, clean LVM setup, minimal OS footprint
VM Storage Requirements (per node):
- 1 Manager VM: 32 GB disk (from provisionspec in your playbook)
- 1 Worker VM: 32 GB disk
- Total per node: 64 GB guest storage (+ Proxmox root FS)
- Total available after OS: pve03 ≈ 59 GB, pve04 ≈ 230 GB
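⚠️ CRITICAL FINDING: Judged by its root FS alone, pve03 has insufficient disk capacity for the planned topology (it needs 64 GB for VMs + OS buffer = ~80 GB, but only ~59 GB is free). However, a default Proxmox install places VM disks in a separate LVM-thin pool (local-lvm) rather than on the root FS, and the dm-* devices may be exactly that pool. Unless that pool or other storage not visible in the scan provides the space, pve03 cannot host 2 full 32 GB VMs.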
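The fit check behind this finding can be sketched as below. The 16 GB buffer is an assumption chosen to reproduce the ~80 GB figure, and the free-space inputs are the root FS / whole-disk numbers from this audit; if pve03 turns out to have a local-lvm thin pool, pass that pool's free space instead.

```bash
#!/usr/bin/env bash
# Does a node's free space cover two 32 GB guest disks plus a buffer?
# Assumption: 16 GB OS/snapshot buffer (reproduces the ~80 GB estimate).
# Only one pool is modeled; VM disks normally live in local-lvm, not root FS.
guest_disks=$(( 2 * 32 ))   # manager + worker, GB
buffer=16                   # GB, assumed
needed=$(( guest_disks + buffer ))

fits() {
  local name=$1 free_gb=$2
  if (( free_gb >= needed )); then
    echo "$name: fits (${free_gb} GB free, ${needed} GB needed)"
  else
    echo "$name: short by $(( needed - free_gb )) GB (${free_gb} GB free, ${needed} GB needed)"
  fi
}

fits pve03 59    # root FS free space only
fits pve04 230
```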
Recommendation:
- Immediate: Verify pve03's storage architecture. Why 21 dm/loop devices? Is there additional NAS/SAN attached?
- For 3rd node procurement: Use pve04 as baseline:
  - 240+ GB NVMe SSD (minimum)
  - Clean, single-drive configuration (KISS principle)
  - Sufficient headroom for VMs + snapshots + log growth
Network Specifications
| Dimension | pve03 | pve04 | Status |
|---|---|---|---|
| Interface Count | 6 interfaces | 4 interfaces | — |
| Bridge | vmbr0 + tap devices | vmbr0 visible | ✅ Both standard |
| Primary Network | wlp0s20f3 + nic0 | wlp0s20f3 + nic0 | ✅ Match (suggest renaming nic0) |
Interpretation:
- Both nodes expose the same interface names (wlp0s20f3 = wireless, nic0 = Ethernet); matching names suggest, but do not prove, identical NIC models
- pve03 has 2 tap devices (tap301i0, tap302i0) = VM network interfaces from running VMs
- pve04 has no tap devices = freshly imaged, no VMs yet
- Corosync / Proxmox Cluster: Both will use vmbr0 for inter-node communication
Recommendation: Both nodes are network-compatible. No issues for Docker Swarm overlay networking.
Proxmox & Cluster Status
| Dimension | pve03 | pve04 | Status |
|---|---|---|---|
| Proxmox Version | 9.1.6 | 9.1.1 | ⚠️ Patch levels differ |
| Kernel | 6.17.2-1-pve | 6.17.2-1-pve | ✅ Match |
| OS Distro | Debian trixie | Debian trixie | ✅ Match |
| Cluster Status | ✅ Clustered (homelab) | ❌ Not clustered | — |
| Cluster Members | pve01, pve02, pve03 | None yet | — |
| VMs Running | 3 VMs/containers | 0 VMs | — |
| Uptime | 4 days | ~0 days (fresh) | — |
Interpretation:
- pve03 is an active, production node in the homelab cluster
- pve04 is a fresh candidate ready for integration
- The patch-level difference (9.1.6 vs 9.1.1) is not a blocker; routine updates will align them
Recommendation: Update both to the latest Proxmox 9.x patch level before final cluster formation.
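After upgrading each node (for example with `apt update && apt full-upgrade`), the versions can be compared before forming the cluster. This is a minimal sketch that assumes the usual `pveversion` output shape (`pve-manager/<version>/<build-hash> (running kernel: ...)`); the two sample strings below are illustrative, with placeholder build hashes.

```bash
#!/usr/bin/env bash
# Compare pve-manager versions of two nodes.
# Assumed input shape: "pve-manager/<version>/<build-hash> (running kernel: <kernel>)".
# On real nodes, feed it e.g. "$(ssh root@10.0.0.203 pveversion)".
pve_version() {
  local out=$1
  out=${out#pve-manager/}   # drop the leading "pve-manager/"
  echo "${out%%/*}"         # keep everything before the next "/"
}

same_version() {
  [ "$(pve_version "$1")" = "$(pve_version "$2")" ]
}

# Illustrative strings; "abcdef" is a placeholder build hash.
a='pve-manager/9.1.6/abcdef (running kernel: 6.17.2-1-pve)'
b='pve-manager/9.1.1/abcdef (running kernel: 6.17.2-1-pve)'
if same_version "$a" "$b"; then
  echo "versions match"
else
  echo "versions differ: $(pve_version "$a") vs $(pve_version "$b")"
fi
```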
DOCKER SWARM TOPOLOGY ANALYSIS
Target Design (from documentation/architecture/compute-plane.md)
- 3× identically-spec'd physical Proxmox nodes
- 3× Swarm Managers (1 per node, IPs: 10.0.0.211–213)
- 3× Swarm Workers (1 per node, IPs: 10.0.0.221–223)
- Each VM: 2 vCPU, 4 GB RAM, 32 GB disk
- Proxmox cluster with Corosync for HA
- No overcommit
Capacity Analysis: pve04 as Reference Model
CPU
- pve04 Spec: 14 cores (20 threads), 1 socket, 4600 MHz peak
- Planned Usage: 4 vCPU (2 for manager, 2 for worker) = 28.6% of the 14 physical cores
- Proxmox/Corosync Overhead: ~1 core
- Available Headroom: 14 - 4 - 1 = 9 cores spare
- Verdict: ✅ EXCELLENT. Can sustain the workload, load spikes, and 2 extra VMs migrated from a failed peer
Memory (15 GB)
- Planned Usage: 4 GB (manager) + 4 GB (worker) = 8 GB
- Proxmox OS + daemons: ~2–3 GB
- Available Headroom: 15 - 8 - 2.5 = 4.5 GB spare
- Verdict: ✅ ADEQUATE. No swapping expected. The 4.5 GB spare fits one more 4 GB VM (e.g., after a peer fails), but not a peer's full 8 GB.
Storage (240 GB)
- Planned Usage: 32 GB (manager) + 32 GB (worker) = 64 GB
- Proxmox OS: ~8 GB
- Snapshots/Logs Buffer: ~20 GB
- Total Planned: ~92 GB
- Available Headroom: 240 - 92 = 148 GB spare
- Verdict: ✅ EXCELLENT. Ample room for workload scaling, backups, experiments.
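The three headroom calculations above can be reproduced in one small script. This sketch hard-codes this section's figures (240 GB disk, 2.5 GB host RAM, 1 core for Proxmox/Corosync); substitute measured values for other nodes. Memory is kept in tenths of a GB and the CPU percentage is rounded to one decimal place.

```bash
#!/usr/bin/env bash
# Per-node headroom on the pve04 reference model (figures from this section).
cores=14
vm_vcpu=$(( 2 + 2 ))          # manager + worker
host_cores=1                  # Proxmox/Corosync
ram=150                       # tenths of a GB (15.0 GB)
vm_ram=$(( 40 + 40 ))         # 4 GB + 4 GB
host_ram=25                   # 2.5 GB
disk=240                      # GB
vm_disk=$(( 32 + 32 ))
os_disk=8
buffer_disk=20                # snapshots/logs

cpu_spare=$(( cores - vm_vcpu - host_cores ))
ram_spare=$(( ram - vm_ram - host_ram ))
disk_spare=$(( disk - vm_disk - os_disk - buffer_disk ))
cpu_util_pct=$(( (vm_vcpu * 1000 + cores / 2) / cores ))   # tenths of a percent, rounded

printf 'CPU:     %d cores spare (VMs use %d.%d%% of cores)\n' "$cpu_spare" $(( cpu_util_pct / 10 )) $(( cpu_util_pct % 10 ))
printf 'Memory:  %d.%d GB spare\n' $(( ram_spare / 10 )) $(( ram_spare % 10 ))
printf 'Storage: %d GB spare\n' "$disk_spare"
```

This prints 9 cores spare (28.6% of cores used), 4.5 GB RAM spare, and 148 GB storage spare, matching the figures above.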
Network
- Swarm Overlay: vmbr0 at 1 Gbps
- Expected inter-node throughput: <100 Mbps for modest swarm (10–20 containers)
- Verdict: ✅ ADEQUATE for Docker Swarm in homelab. Upgrade to 10 Gbps if production-scale or data-intensive AI workloads planned.
High-Availability & Resilience
Quorum Analysis
- 3 Proxmox Nodes: Corosync quorum = 2 of 3 votes required
  - Can tolerate 1 node failure ✅ Good
  - If node1 fails: quorum = nodes 2+3 (still ≥2) → cluster remains operational
- 3 Swarm Managers: Raft consensus quorum = 2 of 3 managers required
  - Can tolerate 1 manager failure ✅ Good
  - If manager1 fails: quorum = managers 2+3 (still ≥2) → swarm remains operational
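Both quorum rules are the same strict-majority formula, floor(n/2) + 1, applied to Corosync votes (one per node by default) and to Swarm managers. The sketch below tabulates it for small clusters. It shows why 3 is the smallest size that survives a failure, and why a 4th voter adds no extra tolerance.

```bash
#!/usr/bin/env bash
# Strict-majority quorum for n voters and the number of failures it survives.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerates() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "$n voters: quorum $(quorum "$n"), tolerates $(tolerates "$n") failure(s)"
done
```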
Failure Scenarios
| Scenario | Outcome | Swarm Impact |
|---|---|---|
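| 1 node power fails | Proxmox HA (if configured) restarts its VMs on surviving nodes, but only if their disks are on shared or ZFS-replicated storage | Swarm restarts containers on nodes 2 & 3 |
| 1 node storage corrupt | Proxmox HA can restart VMs on a peer only with shared or replicated storage; with local-only disks the VMs stay down | Brief service interruption (~30s) |
| 1 node network partition | Corosync detects; quorum = 2 survivors | Cluster continues; isolated node loses quorum (and self-fences via watchdog reboot if it runs HA resources) |
| 2 nodes fail simultaneously | Proxmox quorum lost: /etc/pve goes read-only, guests cannot be started or changed | Swarm loses manager quorum: tasks already running on the survivor keep running, but nothing can be rescheduled or changed |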
Verdict: Design supports N-1 failure tolerance. Very good for homelab.
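RAM is the tightest resource for N-1 recovery, so it is worth checking that the two survivors can absorb a failed host's VMs without overcommit. This sketch uses first-fit placement with the pve04 capacity-analysis numbers (4.5 GB spare per host, 4 GB per VM, in tenths of a GB). It models RAM only: restarting those VMs on another host also requires their disks to be reachable there.

```bash
#!/usr/bin/env bash
# N-1 RAM check: place the failed host's VMs on the survivors, first-fit.
# Values in tenths of a GB; RAM only (disk reachability not modeled).
spare=(45 45)   # spare RAM on the two surviving hosts
vms=(40 40)     # failed host's VMs: manager + worker

placed=0
for vm in "${vms[@]}"; do
  for i in "${!spare[@]}"; do
    if (( spare[i] >= vm )); then
      spare[i]=$(( spare[i] - vm ))
      placed=$(( placed + 1 ))
      break
    fi
  done
done

echo "placed $placed of ${#vms[@]} VMs; spare left: ${spare[*]} (tenths of a GB)"
```

Both VMs fit, one per survivor, leaving 0.5 GB spare on each. A single survivor could not host both.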
SPECIAL CONSIDERATIONS FOR pve03
Storage Mystery: 21 Loop/Device-Mapper Devices
Questions to Investigate:
- Is pve03 mounted to external NAS/SAN (e.g., Synology 10.0.0.249)?
- Is there a RAID or LVM snapshot setup?
- Were multiple physical drives present originally, now failed?
Action Items:
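```bash
# Run on pve03 (directly, or via ssh from watchtower):
pvesh get /storage --output-format json   # List all Proxmox storage definitions
pvesm status                              # Usage of each storage (incl. local-lvm, if present)
zfs list                                  # If ZFS in use
lvs                                       # LVM logical volumes (incl. thin pools and VM disks)
pvdisplay                                 # LVM physical volumes
lsblk                                     # Block-device tree: disks, partitions, dm-* and loop devices
losetup --list                            # Backing file of each loop device
```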
Implication: Until pve03's storage is clarified, it cannot be used as a template for the 3rd identical host.
FINAL RECOMMENDATIONS
1. Short-Term (Immediate)
Action: Clarify pve03's storage architecture.
```bash
# SSH into pve03 via watchtower relay, or directly if your SSH key is installed
ssh root@10.0.0.203 "pvesh get /storage --output-format json"
ssh root@10.0.0.203 "lvs && pvs"
ssh root@10.0.0.203 "zfs list 2>/dev/null || echo 'ZFS not in use'"
```
If pve03 has external storage:
- Note the configuration (NAS IP, mount method, capacity)
- Plan to replicate in 3rd node
If pve03 is just a single drive:
- Proceed with pve04 as template
2. Medium-Term (Before Final 3-Node Deployment)
Option A: Adopt pve04 as Template (RECOMMENDED)
- Procurement: 3× machines with Intel i5-13500T, 16 GB RAM, 256 GB NVMe
- Cost: ~$200–300 per node (retail Core i5 desktop equivalent)
- Timeline: 1–2 weeks (sourcing)
- Next step: Install Proxmox 9.x on 3rd node; cluster join
Option B: Backfill pve03 Config to pve04 & 3rd Node
- Upgrade pve04 RAM from 16 GB installed (15 GB usable) → 24 GB (add 1× 8 GB SODIMM, if a slot is free)
- Verify pve03's external storage is documented
- Replicate in pve04 and 3rd node
- Cost: ~$30–50 per node (additional RAM)
- Timeline: 1 week
- Risk: Depends on clarifying pve03 fully; the CPUs still won't match (10 vs 14 cores)
Recommendation Pick: Option A is cleaner. pve04 is fresher, faster, and has clear config.
3. Long-Term (Post-3-Node Commissioning)
Cluster Formation:
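```bash
# On pve04 (the node where the cluster is first created; Proxmox clusters have no master node):
pvecm create homelab
# Note: pve03 already belongs to a cluster named "homelab" (pve01-pve03). To join pve04 to that
# cluster instead, skip "pvecm create" and run "pvecm add <existing_member_ip>" on pve04.
# On 3rd new node:
pvecm add <pve04_ip_or_hostname>
# Verify:
pvesh get /cluster/status
```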
VM Provisioning:
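```bash
# Use your existing playbook, once per host (a repeated -e target_host=... overrides the earlier one):
for host in pve04 pve0N; do   # pve0N = the 3rd node
  ansible-playbook -i inventory/hosts.ini \
    playbooks/proxmox/provision_swarm_vms.yml \
    -e target_host="$host"
done
```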
Docker Swarm Init:
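```bash
# On swarm-manager-1 (e.g., 10.0.0.211):
docker swarm init --advertise-addr 10.0.0.211
docker swarm join-token manager   # prints the manager join command and token
docker swarm join-token worker    # prints the worker join command and token
# On manager-2 & manager-3:
docker swarm join --token <manager-token> 10.0.0.211:2377
# On workers (10.0.0.221–223):
docker swarm join --token <worker-token> 10.0.0.211:2377
```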
APPENDIX: Hardware Specs Collected
pve03 (10.0.0.203) – Full Details
CPU: 10 cores, 1 socket, max 2885 MHz
Memory: 23.6 GB total, 12.4 GB free
Storage: 68 GB root LVM (59 GB free) + 21 dm/loop devices (TBD)
OS: Debian trixie, kernel 6.17.2-1-pve
Proxmox: 9.1.6
Network: 6 interfaces (vmbr0, nic0, wlp0s20f3, tap301i0, tap302i0, lo)
Cluster Status: Clustered (homelab), 3 VMs running
Uptime: 4 days
pve04 (10.0.0.204) – Full Details
CPU: Intel Core i5-13500T, 14 cores, 1 socket, 20 vCPUs (HT), max 4600 MHz
Memory: 15.0 GB total, ~13.0 GB available, 8.0 GB swap
Storage: 238.5 GB NVMe SSD (nvme0n1), single drive
OS: Debian trixie, kernel 6.17.2-1-pve
Proxmox: 9.1.1
Network: 4 interfaces (vmbr0, nic0, wlp0s20f3, lo)
Cluster Status: Not clustered yet, 0 VMs
Uptime: Fresh (just rebooted)
CONCLUSION
pve04 is the superior choice for replication to a 3-node cluster because of:
- CPU performance: 4600 MHz vs 2885 MHz max clock (~59% higher; a clock comparison, not a benchmark)
- Storage clarity: Single 240 GB NVMe (vs pve03's still-unexplained setup)
- Right-sized specifications: 15 GB RAM + 240 GB SSD = excellent value for Swarm workloads
- Freshness: No legacy config debt
Immediate action: Clarify pve03's storage. Then either adopt pve04 as template or provide additional pve03 context to backfill.
Expected outcome: 3-node Proxmox cluster running 6 Docker Swarm nodes (3 managers, 3 workers) with excellent resilience, performance, and headroom for future growth.