nathan 479df6bd8a chore: archive legacy Identity, Workday, and Intune folders
- Move Identity/, Workday/, Intune/ to archive/ (superseded by nexus-mcp shards)
- Move 'Local Setup.md' to archive/ (superseded by nexus-mcp/Local-Setup.md)
- Add archive/README.md explaining migration and preserved content
- Clean repository structure: only nexus-mcp, documentation, and .github remain active

All legacy functionality migrated to nexus-mcp sharded architecture.
Archived folders preserved for reference and historical context.

Refs: SESSION_SNAPSHOT_2026-04-13.md
2026-04-13 09:38:42 -04:00


Perfect — this gives us the **real-world shape** of the Workday record we should design around, *before* you ever get API access. Based on what you pasted, your **Employee ID is numeric (8 digits)** and you have a stable set of “Job Details” + “Public Contact” fields we can treat as your **Phase 1 canonical Worker contract**. [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
Below I'm going to do two things:
1. Translate your Workday profile into a **clean canonical schema** (fields, types, constraints) that you can enforce with Pydantic (**WIS-010**) now.
2. Show how that schema feeds your **dry-run diff tool** (**WIS-019**) and your later Workday→AD sync logic—without needing Workday access yet.
***
## 1) What your profile tells us about “employee\_id” (and constraints)
From your profile:
* `Employee ID` = `10016638`: a **numeric string**, **8 digits** (at least in your tenant).
* This is a **better primary key** than email or name for matching to AD.
### Recommended constraint (for now)
* Treat `employee_id` as a **string** (not int) to preserve leading zeros if they exist in other records.
* Enforce pattern: `^[0-9]{8}$` (8 digits)
> If you later discover IDs vary (e.g., 6 to 10 digits), you can loosen the regex to `^[0-9]{6,10}$` without breaking downstream contracts.
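A minimal, dependency-free sketch of that constraint follows. The function name `validate_employee_id` is hypothetical, and the 8-digit rule is the working assumption from your profile, not a confirmed Workday guarantee. It uses `re.fullmatch` so that surrounding whitespace and a trailing newline are rejected.

```python
import re

# Assumption: exactly 8 ASCII digits, based on the single profile pasted above.
# Widen to r"[0-9]{6,10}" if other ID lengths turn up in your tenant.
EMPLOYEE_ID_PATTERN = re.compile(r"[0-9]{8}")


def validate_employee_id(raw):
    """Return the ID unchanged if it is a string of exactly 8 digits, else raise."""
    # fullmatch must consume the whole string, so " 10016638 " and
    # "10016638\n" are both rejected.
    if not isinstance(raw, str) or EMPLOYEE_ID_PATTERN.fullmatch(raw) is None:
        raise ValueError("employee_id must be a string of exactly 8 digits")
    return raw
```

Keeping the value a string means an ID such as `00016638` survives intact, which an `int` conversion would not allow.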
***
## 2) Canonical Worker Schema (Phase 1 / no Workday API required)
Here's a **normalized Worker model** derived from your field list. I'm using names that are stable + automation-friendly, while keeping the original Workday labels clear.
### Field mapping (Workday → canonical)
**Identity**
* `employee_id` ← Employee ID (string, 8-digit numeric)
* `work_email` ← Email (string, email format)
**Org / Reporting**
* `supervisory_organization` ← Supervisory Organization (string)
* `manager_name` ← manager name that trails the Supervisory Organization value (string, optional)
* `manager_employee_id` ← (not present in your paste; optional until Workday API provides it)
**Job**
* `position` ← Position
* `business_title` ← Business Title
* `job_profile` ← Job Profile
* `employee_type` ← Employee Type (e.g., Regular)
* `management_level` ← Management Level (e.g., Individual Contributor)
* `time_type` ← Time Type (e.g., Full time)
* `fte_percent` ← FTE (decimal percent)
**Employment dates**
* `hire_date` ← Hire Date (date)
* `original_hire_date` ← Original Hire Date (date)
* `continuous_service_date` ← Continuous Service Date (date)
**Location**
* `location` ← Location
* `work_address` ← Work Address (string)
### Why this specific cut matters
This set gives you enough to drive:
* onboarding timing (hire date)
* basic directory attributes (title, department-like org fields, location)
* reporting chain resolution (supervisory org / manager)
* future group mapping (job profile, employee type, management level)
…and it's stable enough to validate *before* Workday access.
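To make the mapping concrete, here is a rough stdlib sketch of the canonical record as a frozen dataclass. The class name `Worker`, the choice of which fields are optional, and the example email are all assumptions for illustration; the eventual Pydantic model would add the validation rules on top of this shape.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Worker:
    # Identity (treated as required in this sketch)
    employee_id: str
    work_email: str
    # Org / reporting (manager_employee_id stays empty until an API supplies it)
    supervisory_organization: Optional[str] = None
    manager_name: Optional[str] = None
    manager_employee_id: Optional[str] = None
    # Job
    position: Optional[str] = None
    business_title: Optional[str] = None
    job_profile: Optional[str] = None
    employee_type: Optional[str] = None
    management_level: Optional[str] = None
    time_type: Optional[str] = None
    fte_percent: Optional[float] = None
    # Employment dates
    hire_date: Optional[date] = None
    original_hire_date: Optional[date] = None
    continuous_service_date: Optional[date] = None
    # Location
    location: Optional[str] = None
    work_address: Optional[str] = None


# Example record; the email address is a placeholder, not real data.
example = Worker(
    employee_id="10016638",
    work_email="worker@example.com",
    business_title="Team Lead, Deskside Support",
    employee_type="Regular",
    time_type="Full time",
)
```

Freezing the dataclass keeps a validated record from being mutated by accident further down the sync pipeline.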
***
## 3) How I'd update your current MCP server **right now** (targeted improvements)
Your current `server.py` is a solid scaffold, but it's using:
* mock IDs like `EMP001`, and
* a flat dict with fields (`name`, `status`, `dept`, `email`, `manager_id`) [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
To align with *real Workday data* **before you ever connect**:
### A) Switch your mock dataset to real ID format
Change mock keys from `EMP001` style to `10016638` style.
* This forces you to solve input validation now, and it removes casing issues entirely (numeric IDs have no casing).
* It prevents later refactors when Workday is live.
### B) Standardize your tool outputs (strongly recommended)
Right now, tools return either:
* a worker dict OR
* `{"error": ...}` [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
Before Workday access, lock in a stable envelope like:
```json
{
"ok": true,
"data": { ... },
"error": null,
"meta": {
"correlation_id": "uuid",
"source": "mock",
"contract_version": "wis.v1"
}
}
```
Why now? Because Power Automate + connectors love consistent shapes. If you wait, you'll end up rewriting flows and schemas later.
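One way to guarantee that shape is to route every tool return through a single helper. The sketch below is an assumption about how that could look; `make_envelope` and `CONTRACT_VERSION` are hypothetical names, and the error dict layout is illustrative.

```python
import uuid

CONTRACT_VERSION = "wis.v1"


def make_envelope(data=None, error=None, source="mock"):
    """Wrap a tool result in the {ok, data, error, meta} envelope."""
    return {
        # ok is derived from the error so the two can never disagree.
        "ok": error is None,
        "data": data if error is None else None,
        "error": error,
        "meta": {
            "correlation_id": str(uuid.uuid4()),
            "source": source,
            "contract_version": CONTRACT_VERSION,
        },
    }


success = make_envelope(data={"employee_id": "10016638"})
failure = make_envelope(error={"code": "VALIDATION_ERROR", "message": "bad employee_id"})
```

Deriving `ok` from `error` rather than passing it in separately removes a whole class of inconsistent responses.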
### C) Build Pydantic models for:
* Worker input (`WorkerModel`)
* Tool response (`WorkerStatusResponse`)
* Error (`ErrorModel`)
* Envelope (`ToolResponse[T]`)
This is the core of **WIS-010**.
***
## 4) Concrete Pydantic Contract Proposal (WIS-010)
Heres the **contract design** I recommend based on your profile fields and your automation goals.
### WorkerModel constraints (recommended)
* `employee_id`: `^[0-9]{8}$`
* `work_email`: must validate as email
* `fte_percent`: 0 to 100 (float)
* `time_type`: enum (`Full time`, `Part time`, etc.)
* Dates: parsed into real `date` objects
### “Strict mode” recommendation
Start with:
* `extra="forbid"` (reject unknown fields)
* later relax to `extra="ignore"` if Workday payloads contain extra fields you don't care about
This gives you confidence early.
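To show what those rules do in practice, here is a stdlib-only approximation of the range, enum, date, and strict-mode checks. It is not Pydantic itself; `validate_worker`, `KNOWN_FIELDS`, and the two-value `ALLOWED_TIME_TYPES` set are assumptions for illustration, and email validation is left out because the standard library has no email validator.

```python
from datetime import date

# Assumption: only the two time types named above; extend once real data is seen.
ALLOWED_TIME_TYPES = {"Full time", "Part time"}
KNOWN_FIELDS = {"employee_id", "work_email", "fte_percent", "time_type", "hire_date"}


def validate_worker(payload, extra="forbid"):
    """Check a raw dict and return a cleaned copy; raise ValueError on bad input."""
    unknown = set(payload) - KNOWN_FIELDS
    if unknown and extra == "forbid":
        raise ValueError("unknown fields: %s" % sorted(unknown))
    # With extra="ignore", unknown keys are silently dropped instead.
    cleaned = {k: v for k, v in payload.items() if k in KNOWN_FIELDS}

    if "fte_percent" in cleaned:
        fte = float(cleaned["fte_percent"])
        if not 0.0 <= fte <= 100.0:
            raise ValueError("fte_percent must be between 0 and 100")
        cleaned["fte_percent"] = fte

    if "time_type" in cleaned and cleaned["time_type"] not in ALLOWED_TIME_TYPES:
        raise ValueError("time_type must be one of %s" % sorted(ALLOWED_TIME_TYPES))

    if "hire_date" in cleaned:
        # fromisoformat raises ValueError for impossible dates such as 2021-13-40.
        cleaned["hire_date"] = date.fromisoformat(cleaned["hire_date"])

    return cleaned
```

Starting with `extra="forbid"` surfaces unexpected Workday fields as errors, which is what you want while the contract is still being discovered.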
***
## 5) Your Dry-Run Diff Tool (WIS-019) — now shaped by real Workday fields
Once you validate Workday-shaped inputs, your dry-run diff becomes meaningful:
### What it should compare (Phase 1 scope)
From the fields you pasted, your first drift checks should include:
* Title / Business Title / Job Profile
* Location
* Manager / supervisory org reference
* Employee status → AD enabled/disabled state
* Email → UPN/mail alignment
### Diff item example (your intended format + enhancements)
You suggested:
```json
{"field":"department","old":"Sales","new":"Marketing","action":"update"}
```
I'd add two fields **now** that become critical in Phase 3:
* `severity` (low/medium/high)
* `reversible` (true/false)
Example:
```json
{
"field": "business_title",
"old": "Team Lead, Deskside Support",
"new": "Team Lead, Deskside Support",
"action": "none",
"severity": "low",
"reversible": true
}
```
Then your approval payload can be generated directly from the diff result (no extra logic later).
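A minimal sketch of a comparison that produces items in that format is below. It assumes `old` is the current directory value and `new` is the Workday-derived value; `FIELD_POLICY`, its severities, and the default policy are hypothetical placeholders that your own change policy would replace.

```python
# Hypothetical per-field policy; real severities would come from your change policy.
FIELD_POLICY = {
    "business_title": {"severity": "low", "reversible": True},
    "location": {"severity": "low", "reversible": True},
    "manager_employee_id": {"severity": "medium", "reversible": True},
}
DEFAULT_POLICY = {"severity": "medium", "reversible": True}


def dry_run_compare(current, desired):
    """Return one diff item per field seen on either side; nothing is written."""
    diff = []
    for field in sorted(set(current) | set(desired)):
        old, new = current.get(field), desired.get(field)
        policy = FIELD_POLICY.get(field, DEFAULT_POLICY)
        diff.append({
            "field": field,
            "old": old,
            "new": new,
            # A field missing on one side compares as None and counts as an update.
            "action": "none" if old == new else "update",
            "severity": policy["severity"],
            "reversible": policy["reversible"],
        })
    return diff


items = dry_run_compare(
    {"business_title": "Team Lead, Deskside Support", "department": "Sales"},
    {"business_title": "Team Lead, Deskside Support", "department": "Marketing"},
)
```

Emitting `"action": "none"` items, as in the example above, keeps the output a complete record of what was checked rather than only what changed.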
***
## 6) Break Tests you can run **today** (with your real constraints)
Since we now know your ID shape, your “try to break it” script should include:
### Employee ID failure cases
* empty string `""`
* non-numeric `"EMP001"`
* wrong length `"1001663"` (7 digits), `"100166380"` (9 digits)
* whitespace `" 10016638 "`
### Data failures (WorkerModel)
* invalid email `"not-an-email"`
* FTE out of range `150.0`
* Hire date invalid `"2021-13-40"`
* Missing required fields (`employee_id`, `work_email`)
### Expected behavior
* Tools return `ok:false`
* Error has stable code (e.g., `VALIDATION_ERROR`)
* No sensitive fields are echoed in logs (see below)
This is how you prove WIS-010 is real, not theoretical.
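The employee ID cases can be scripted in a few lines. This sketch uses a stand-in `get_worker` tool over a one-record mock dataset; the function names, the `NOT_FOUND` code, and the trimmed envelope (no `meta`) are assumptions made to keep the example short.

```python
import re

MOCK_WORKERS = {"10016638": {"business_title": "Team Lead, Deskside Support"}}


def get_worker(employee_id):
    """Stand-in tool: validate the ID, then look it up in the mock dataset."""
    if not isinstance(employee_id, str) or re.fullmatch(r"[0-9]{8}", employee_id) is None:
        # The message deliberately does not echo the rejected input.
        return {"ok": False, "data": None,
                "error": {"code": "VALIDATION_ERROR", "message": "employee_id must be 8 digits"}}
    worker = MOCK_WORKERS.get(employee_id)
    if worker is None:
        return {"ok": False, "data": None,
                "error": {"code": "NOT_FOUND", "message": "no such worker"}}
    return {"ok": True, "data": worker, "error": None}


# The failure cases listed above.
BAD_IDS = ["", "EMP001", "1001663", "100166380", " 10016638 "]


def run_break_tests():
    """Return the inputs that were NOT rejected with VALIDATION_ERROR."""
    failures = []
    for bad in BAD_IDS:
        result = get_worker(bad)
        if result["ok"] or result["error"]["code"] != "VALIDATION_ERROR":
            failures.append(bad)
    return failures
```

An empty list from `run_break_tests()` means every malformed ID was rejected with the stable code; the same loop ports directly to parametrized pytest cases.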
***
## 7) Security/Hardening decisions informed by your profile fields
Because Workday records contain PII-like data (email, address), your hardening step (WIS-027) should **explicitly redact**:
* `work_email`
* `work_address`
* `employee_id` (optional partial mask like `******38`)
### Redaction policy (recommended)
* In logs: redact always
* In tool responses:
  * keep email if the tool is intended for automation and access is controlled
  * or provide a config switch: `RETURN_PII=false` (defaults safe)
This is much easier to implement now than after Workday is live.
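A small redaction helper along these lines could cover both the log policy and the config switch. Everything here is a sketch under assumptions: the helper names, the `[REDACTED]` placeholder, and reading `RETURN_PII` from the environment are illustrative choices, not existing code in your `server.py`.

```python
import os

SENSITIVE_FIELDS = ("work_email", "work_address")


def mask_employee_id(employee_id):
    """Keep only the last two characters, e.g. 10016638 becomes ******38."""
    return "*" * max(len(employee_id) - 2, 0) + employee_id[-2:]


def redact_for_log(record):
    """Return a copy of the record that is safe to write to logs."""
    redacted = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in redacted:
            redacted[field] = "[REDACTED]"
    if "employee_id" in redacted:
        redacted["employee_id"] = mask_employee_id(str(redacted["employee_id"]))
    return redacted


def return_pii_enabled(env=None):
    """Read the RETURN_PII switch; anything other than 'true' means off."""
    env = os.environ if env is None else env
    return env.get("RETURN_PII", "false").strip().lower() == "true"
```

Because `redact_for_log` copies the record, the original stays intact for the tool response while only the logged copy is masked.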
***
## 8) What I would change in your current `server.py` (actionable list)
In order, **before Workday API access**:
1. **Replace EMP### mock IDs with numeric employee IDs** (like your `10016638`) [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
2. **Add Pydantic WorkerModel + Response envelope** (WIS-010) [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
3. **Normalize tool outputs** into `{ok,data,error,meta}` for every tool [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
4. **Implement dry\_run\_compare() tool** returning a typed DiffResult (WIS-019)
5. **Add rate-limit simulation decorator** and make errors explicitly retryable (WIS-027)
6. **Add log redaction helper** for email/address/employee\_id (WIS-027)
7. **Externalize mock data** into JSON + support multiple scenario datasets
8. **Pytest suite**: manager resolution scenarios (Found/Not Found/No Manager) using your mock structure [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
9. **Pre-commit hooks** to block commits on failing tests/lint
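For item 5, a deterministic decorator is one possible starting point, since a predictable failure pattern is easier to assert on in tests than a random one. The name `simulate_rate_limit`, the `every_n` parameter, and the `RATE_LIMITED` code are all hypothetical.

```python
import functools


def simulate_rate_limit(every_n):
    """Make every Nth call return a retryable error instead of running the tool."""
    def decorator(func):
        calls = {"count": 0}  # per-decorated-function call counter

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls["count"] += 1
            if calls["count"] % every_n == 0:
                return {
                    "ok": False,
                    "data": None,
                    "error": {
                        "code": "RATE_LIMITED",
                        "message": "simulated rate limit",
                        "retryable": True,
                    },
                }
            return func(*args, **kwargs)
        return wrapper
    return decorator


@simulate_rate_limit(every_n=3)
def ping():
    return {"ok": True, "data": "pong", "error": None}


results = [ping() for _ in range(3)]
```

Here the first two calls succeed and the third comes back with `retryable: true`, which gives a caller's retry logic something concrete to exercise.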
***
## 9) Two quick questions so I lock the schema correctly
1. In your tenant, is **Employee ID always 8 digits**, or have you seen contractors/vendors with a different format?
2. Do you want **Work Address** treated as:
   * a field you *store* in AD/Entra, or
   * *reference only* (used for location logic but not written anywhere)?
***
## If you want, I can take the next step automatically
If you say “go,” I'll draft:
* A **Pydantic schema pack** (`models.py`) for WorkerModel, DiffResult, Envelope
* A revised `server.py` that:
  * uses numeric employee IDs
  * returns consistent envelopes
  * includes `dry_run_compare`
  * includes redaction + rate-limit simulation toggles
* A `/tests` folder with the manager tests + “break tests” script
Just tell me whether you prefer **Pydantic v1 or v2** (most orgs are v2 now, but I'll match your repo dependency reality).