homelab/.github/prompts/plan-homelabMcpGatewayMvp.prompt.md

# Plan: Homelab MCP Gateway MVP with Traefik Shard
## TL;DR
Build a modular MCP (Model Context Protocol) Gateway on Waldorf that routes tool requests to specialized shards. The MVP includes the Traefik shard (for dynamic route management) plus a template for creating additional shards. Each shard can fetch its service's documentation from the internet on demand.

**Approach:** Python-based using `mcp.server.fastmcp`, deployed via a single Compose file on Waldorf, no authentication (trust the internal network), and web fetching for live documentation.

---
## Steps
### Phase 1: Infrastructure Setup
1. Create unified directory structure on Waldorf
- `/nodes/waldorf/mcp-system/` with single compose.yaml
- `/nodes/waldorf/mcp-system/gateway/` for Gateway code
- `/nodes/waldorf/mcp-system/traefik-shard/` for Traefik Shard code
2. Create shared template directory (*parallel with step 1*)
- `/mcp_root/template/` for shard template files
- Documentation: `/mcp_root/template/README.md`
### Phase 2: Gateway Implementation
3. Build Gateway core functionality (*depends on step 1*)
- Shard registry (discover and register shards)
- Tool routing (forward requests to appropriate shard)
- Health check aggregation
- Startup logic to discover available shards
4. Create Gateway Dockerfile and requirements.txt (*parallel with step 3*)
- Python 3.11 base image
- Install mcp, httpx, pyyaml
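
The registry, routing, and health aggregation in step 3 could be sketched roughly as follows. This is a minimal in-memory sketch, not the planned implementation: the `Shard` and `ShardRegistry` names, the `base_url` field, and the rule that each tool name belongs to exactly one shard are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A registered shard: where it lives and which tools it advertises."""
    name: str
    base_url: str  # hypothetical address the Gateway would forward to
    tools: set[str] = field(default_factory=set)
    healthy: bool = True


class ShardRegistry:
    """Maps tool names to the shard that owns them."""

    def __init__(self) -> None:
        self._shards: dict[str, Shard] = {}
        self._tool_owner: dict[str, str] = {}

    def register(self, shard: Shard) -> None:
        # Reject duplicate tool names so routing stays unambiguous.
        for tool in shard.tools:
            owner = self._tool_owner.get(tool)
            if owner is not None and owner != shard.name:
                raise ValueError(f"tool {tool!r} already owned by {owner!r}")
        # Drop stale entries if this shard re-registers with a new tool list.
        self._tool_owner = {
            t: o for t, o in self._tool_owner.items() if o != shard.name
        }
        self._shards[shard.name] = shard
        for tool in shard.tools:
            self._tool_owner[tool] = shard.name

    def resolve(self, tool: str) -> Shard:
        """Return the shard that should handle `tool`."""
        try:
            shard = self._shards[self._tool_owner[tool]]
        except KeyError:
            raise LookupError(f"no shard provides tool {tool!r}") from None
        if not shard.healthy:
            raise LookupError(f"shard {shard.name!r} is marked unhealthy")
        return shard

    def health(self) -> dict[str, bool]:
        """Aggregate per-shard health into one mapping."""
        return {name: s.healthy for name, s in self._shards.items()}
```

A Gateway built this way would call `resolve("list_routes")` and then forward the request to the returned shard's `base_url`, for example with httpx.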
### Phase 3: Traefik Shard Implementation
5. Implement Traefik Shard with 7 tools (*depends on step 1*)
- `list_routes` - Query Traefik API for all routes
- `create_route` - Write new YAML file to `/dynamic/mcp-managed/`
- `delete_route` - Remove route YAML file
- `validate_config` - YAML syntax check + Traefik API validation
- `get_backend_status` - Health check backend services
- `check_ssl_status` - Query Traefik API for cert info
- `reload_config` - Trigger Traefik config reload (if needed)
6. Add documentation fetcher to Traefik Shard (*parallel with step 5*)
- Tool: `get_traefik_docs(topic)` - Fetch from the official Traefik docs (currently served at doc.traefik.io)
- Use httpx to fetch and cache temporarily
- Parse HTML/Markdown for relevant sections
7. Implement shard registration with Gateway (*depends on step 5*)
- Health endpoint for Gateway discovery
- Tool manifest endpoint (list available tools)
8. Create Traefik Shard Dockerfile and requirements.txt (*depends on step 5*)
- Python 3.11 base image
- Install mcp, httpx, pyyaml, beautifulsoup4
9. Create unified `compose.yaml` (*depends on steps 4, 8*)
- Gateway service with appdata mount
- Traefik Shard service with NFS mount to `/mnt/appdata/traefik/dynamic:rw`
- Shared Docker network for inter-shard communication
- Environment: `TRAEFIK_API_URL=http://10.0.0.151:8080/api` (reach Heimdall)
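
As an illustration of how `create_route` and `delete_route` from step 5 might write and remove files under the `mcp-managed` directory, here is a hedged sketch using only the standard library. The function signatures, the one-file-per-route layout, the `.yml` suffix, and the validation patterns are assumptions. The YAML follows the usual Traefik file-provider shape (`http.routers` / `http.services`), and should be compared against the existing `static-backends.yml` before use.

```python
import re
from pathlib import Path

# Conservative patterns (assumed); the name check also blocks path traversal.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")
_HOST_RE = re.compile(r"[A-Za-z0-9.-]+")

ROUTE_TEMPLATE = """\
http:
  routers:
    {name}:
      rule: "Host(`{host}`)"
      service: {name}
  services:
    {name}:
      loadBalancer:
        servers:
          - url: "{backend_url}"
"""


def render_route(name: str, host: str, backend_url: str) -> str:
    """Validate the inputs and return the YAML text for one route."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid route name: {name!r}")
    if not _HOST_RE.fullmatch(host):
        raise ValueError(f"invalid host: {host!r}")
    if not backend_url.startswith(("http://", "https://")) or '"' in backend_url:
        raise ValueError(f"invalid backend URL: {backend_url!r}")
    return ROUTE_TEMPLATE.format(name=name, host=host, backend_url=backend_url)


def create_route(managed_dir: Path, name: str, host: str, backend_url: str) -> Path:
    """Write <name>.yml into managed_dir and return its path."""
    content = render_route(name, host, backend_url)
    managed_dir.mkdir(parents=True, exist_ok=True)
    target = managed_dir / f"{name}.yml"
    # Write to a hidden temp file first, then rename, so a file watcher
    # is less likely to read a half-written route.
    tmp = managed_dir / f".{name}.tmp"
    tmp.write_text(content, encoding="utf-8")
    tmp.replace(target)
    return target


def delete_route(managed_dir: Path, name: str) -> bool:
    """Remove <name>.yml; return False if it did not exist."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid route name: {name!r}")
    target = managed_dir / f"{name}.yml"
    if not target.exists():
        return False
    target.unlink()
    return True
```

In the shard these functions would be exposed as MCP tools, with `managed_dir` pointing at the mounted `/dynamic/mcp-managed/` path.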
### Phase 4: Prepare Traefik Integration
10. Create `/mnt/appdata/traefik/dynamic/mcp-managed/` directory (*depends on step 9*)
- Isolated folder for MCP-managed routes (safer, easier cleanup)
- Traefik file watcher will auto-detect changes here
11. Verify Traefik allows write access (*parallel with step 10*)
- Confirm NFS mount on Waldorf allows writes to `/mnt/appdata/traefik/dynamic/`
- If needed, update Traefik mount from `:ro` to `:rw` in `nodes/heimdall/core/compose.yaml`
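
The write-access check in step 11 can be scripted instead of tested by hand. The sketch below creates and removes a throwaway file in the target directory. The probe file name is an arbitrary assumption, and the default path is simply the one named in this plan.

```python
import sys
import uuid
from pathlib import Path


def can_write(directory: Path) -> bool:
    """Return True if a file can be created and removed inside `directory`."""
    probe = directory / f".mcp-write-probe-{uuid.uuid4().hex}"
    try:
        probe.write_text("probe", encoding="utf-8")
        probe.unlink()
        return True
    except OSError:
        # Covers a missing directory, a read-only mount, and permission errors.
        return False


if __name__ == "__main__":
    target = Path(sys.argv[1] if len(sys.argv) > 1 else "/mnt/appdata/traefik/dynamic")
    ok = can_write(target)
    print(f"{target}: {'writable' if ok else 'NOT writable'}")
    sys.exit(0 if ok else 1)
```

Running it on Waldorf against the NFS mount should show whether the `:rw` change is needed before any shard code is written.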
### Phase 5: Shard Template Creation
12. Create comprehensive shard template (*depends on steps 5-7*)
- `template/shard_template.py` - Skeleton MCP server
- `template/Dockerfile.template` - Standard container build
- `template/compose.yaml.template` - Docker compose service boilerplate
- `template/requirements.txt` - Common dependencies
13. Write template documentation (*parallel with step 12*)
- `/mcp_root/template/README.md` - How to create a new shard
- `/mcp_root/template/INTEGRATION.md` - How shards register with Gateway
- `/mcp_root/ARCHITECTURE.md` - Overall system design
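
One possible shape for `template/shard_template.py` is sketched below. The shard name, the `ping` and `echo` placeholder tools, and the `TOOLS` list are hypothetical. Only `FastMCP`, its `tool()` decorator, and `run()` come from the `mcp` package. The import is deferred so the tool functions can be unit-tested without the package installed.

```python
"""Hypothetical skeleton for a new shard; replace the placeholders."""

SHARD_NAME = "example-shard"  # placeholder; rename per shard


def ping() -> str:
    """Liveness tool that every shard could expose."""
    return f"{SHARD_NAME}: ok"


def echo(message: str) -> str:
    """Placeholder tool; replace with service-specific logic."""
    return message


# Plain functions listed here are registered as MCP tools in build_server().
TOOLS = [ping, echo]


def build_server():
    # Imported lazily so this module stays importable without `mcp` installed.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP(SHARD_NAME)
    for fn in TOOLS:
        server.tool()(fn)  # same effect as decorating fn with @server.tool()
    return server


if __name__ == "__main__":
    # run() uses the stdio transport by default; container-to-container
    # calls would need an HTTP-based transport instead.
    build_server().run()
```

Keeping tools as plain functions means a new shard author can test them directly and only touch `SHARD_NAME` and `TOOLS` to get a registered server.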
### Phase 6: Deployment & Validation
14. Deploy unified MCP system on Waldorf (*depends on steps 9, 10*)
- `docker compose up` in `/nodes/waldorf/mcp-system/`
- Verify Gateway logs show successful startup and shard discovery
- Verify Traefik Shard registers successfully
15. Test tool execution (*depends on step 14*)
- Gateway → list_routes → Traefik Shard → Traefik API (Heimdall)
- Create test route for validation
- Verify documentation fetcher works
16. Integration with Open WebUI (*depends on step 15*)
- Update `/nodes/waldorf/openwebui/compose.yaml` to connect to MCP Gateway
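- Configure MCP Gateway connection in Open WebUI (same host; if Open WebUI runs in a container, use the shared Docker network name or Waldorf's IP, since `localhost` inside a container resolves to the container itself unless host networking is used)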
- Test end-to-end LLM → Gateway → Shard flow
---
## Relevant Files
- `ansible/archive/scripts/ansible_mcp_server.py` - Reference implementation showing MCP server patterns, job tracking, configuration
- `nodes/heimdall/core/compose.yaml` - Contains Traefik service definition (lines 10-50), needs mount permission update
- `nodes/waldorf/openwebui/compose.yaml` - Open WebUI config with commented MCP Gateway integration (lines 15-17)
- `ansible/archive/outputs/heimdall-baseline-20260312T214117/traefik_configs/traefik.yml` - Static Traefik config showing API endpoint, providers, file watch
- `ansible/archive/outputs/heimdall-baseline-20260312T214117/traefik_configs/static-backends.yml` - Example dynamic route structure to replicate
- `ansible/archive/outputs/heimdall-baseline-20260312T214117/traefik_configs/middleware.yml` - Existing middleware definitions to reference
---
## Verification
1. **Gateway Health Check**: `curl http://10.0.0.251:9100/health` returns shard registry
2. **Shard Registration**: Gateway logs show Traefik shard discovered and registered
3. **Tool Execution**: Call `list_routes` through Gateway, receive Traefik API response
4. **Route Creation**: Create test route `test.castaldifamily.com` → Appears in Traefik dashboard
5. **Documentation Fetcher**: Call `get_traefik_docs("middlewares")` → Returns relevant Traefik docs
6. **File Validation**: Check `/mnt/appdata/traefik/dynamic/mcp-managed/` contains created routes
7. **Traefik Reload**: Verify Traefik auto-detects new YAML files (file watch enabled)
8. **Open WebUI Integration**: Send message in Open WebUI that triggers MCP tool → See logs in Gateway
9. **Template Usability**: Follow template README to create a stub "Dozzle Shard" → Registers successfully
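
The first two checks could be automated with a small script like the sketch below. The response shape is an assumption (a JSON object with a `shards` mapping whose entries carry a `healthy` flag), since this plan does not define the `/health` payload; adjust the parsing to whatever the Gateway actually returns.

```python
import json
import sys
import urllib.request


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """GET the Gateway health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_shards(payload: dict) -> list[str]:
    """Return the sorted names of shards not reported as healthy."""
    shards = payload.get("shards", {})  # assumed payload key
    return sorted(
        name for name, info in shards.items() if not info.get("healthy", False)
    )


if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else "http://10.0.0.251:9100/health"
    bad = unhealthy_shards(fetch_health(url))
    print("all shards healthy" if not bad else f"unhealthy: {', '.join(bad)}")
    sys.exit(1 if bad else 0)
```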
---
## Decisions
- **Language**: Python (mcp.server.fastmcp) - matches existing Ansible MCP server pattern
- **Deployment Location**: All components on Waldorf (10.0.0.251) - stable 24/7 node with 16GB RAM, runs Open WebUI
- **Single Compose File**: Gateway + all shards in one `compose.yaml` - simpler MVP, easier debugging
- **Traefik Access**: Shard reaches Traefik API on Heimdall via `http://10.0.0.151:8080/api`, writes to shared NFS mount `/mnt/appdata/traefik/dynamic/`
- **Authentication**: None for MVP - trust internal network isolation (add in future if needed)
- **Documentation Fetching**: On-demand web fetching using httpx - fetch from official service docs when tool is called
- **Route Management**: Create isolated `/mcp-managed/` subdirectory in Traefik dynamic config - safer than mixing with existing routes
- **All 7 Traefik tools included**: list_routes, create_route, delete_route, validate_config, get_backend_status, check_ssl_status, reload_config
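
The on-demand documentation fetching decision, combined with temporary caching, might look like the following sketch. The `TTLCache` class, its five-minute default, and the injectable `fetch` and `clock` parameters are assumptions for illustration. In the shard, `fetch` could be a small wrapper around `httpx.get(url).text`, which keeps the cache itself free of network code and easy to test.

```python
import time
from typing import Callable


class TTLCache:
    """In-memory cache that refetches a key once its entry is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time stored, cached value)
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, skip the fetch
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```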
---
## Scope Boundaries
**Included:**
- MCP Gateway with shard discovery and routing
- Complete Traefik shard with 7 tools + documentation fetcher
- Comprehensive template for creating new shards
- Integration with Open WebUI
- Single docker-compose deployment on Waldorf
**Excluded:**
- Additional shards (Dozzle, Authentik) - future work, use template to create
- Authentication/authorization - trust network for MVP
- Monitoring/metrics collection - add later if needed
- Web UI for Gateway management - CLI/API only for MVP
- Advanced caching for documentation - simple in-memory cache only
- Cross-node service mesh networking - direct HTTP between containers
- Ansible playbook for automated deployment - manual docker compose for MVP
---
## Further Considerations
None - all clarifications obtained. Ready for implementation.