BREADCRUMBS

This commit is contained in:
nathan 2026-04-03 15:03:26 -04:00
parent 4ad78b57b1
commit 475888ad01
3 changed files with 327 additions and 0 deletions


@ -0,0 +1,45 @@
Here is a prioritized list of high-value tasks you can complete right now in your local workday-mcp environment:
1. Expand the "Mismatch" Logic (WIS-014 to WIS-018)
You've built the Manager scanner, but a true Identity Sync needs to detect several other types of drift.
Job Title Mismatch: Build a tool to compare "Workday Title" vs "AD Title".
Department Drift: Identify workers whose cost center in Workday doesn't match their AD Department string.
Legal Name vs. Preferred Name: Build logic to handle cases where AD uses a "Display Name" that differs from the Workday "Legal Name".
Status Reconciliation: Create a tool that specifically flags "Terminated" in Workday but "Enabled" in AD.
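As a starting point for the Status Reconciliation check, here is a minimal sketch. The record shapes (`status` on the Workday side, `enabled` on the AD side), the function name, and the sample IDs are all assumptions for illustration, not taken from your server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusMismatch:
    employee_id: str
    workday_status: str
    ad_enabled: bool


def find_terminated_but_enabled(workday_workers, ad_accounts):
    """Return one mismatch per worker terminated in Workday but enabled in AD.

    workday_workers: dict of employee_id -> {"status": str}   (hypothetical mock shape)
    ad_accounts:     dict of employee_id -> {"enabled": bool} (hypothetical mock shape)
    """
    mismatches = []
    for employee_id, worker in workday_workers.items():
        account = ad_accounts.get(employee_id)
        if account is None:
            continue  # no AD account to reconcile; a separate check could report this
        if worker["status"].strip().lower() == "terminated" and account["enabled"]:
            mismatches.append(StatusMismatch(employee_id, worker["status"], True))
    return mismatches


# Made-up sample data
workday = {
    "10000001": {"status": "Terminated"},
    "10000002": {"status": "Active"},
    "10000003": {"status": "Terminated"},
}
ad = {
    "10000001": {"enabled": True},
    "10000002": {"enabled": True},
    "10000003": {"enabled": False},
}
flagged = find_terminated_but_enabled(workday, ad)
print([m.employee_id for m in flagged])
```

The other drift checks (title, department, display name) could follow the same pattern: one small function per rule, each returning a list of typed mismatch objects.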
2. Implement Schema Validation (WIS-010)
Instead of returning an arbitrary dictionary, use a library like pydantic to enforce a strict contract.
The Build: Create a WorkerModel that defines exactly what fields are required (e.g., employee_id must be a string of a certain length).
The Test: Write a script that tries to "break" your tools by feeding them bad data to see if your error handling catches it gracefully.
3. Build a "Dry Run" Comparison Tool (WIS-019)
Before you ever automate a "Write" to Active Directory, you need a tool that simulates the change.
The Logic: Create a tool that takes a Workday record and an AD record (both mocked for now) and returns a "Diff" object.
Output Example: {"field": "department", "old": "Sales", "new": "Marketing", "action": "update"}.
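A minimal sketch of that comparison, assuming both records are plain dictionaries that share field names. The function name `dry_run_diff` and the rule that AD holds the "old" value while Workday supplies the "new" one are assumptions.

```python
def dry_run_diff(workday_record, ad_record, fields):
    """Compare two records field by field and return diff items; nothing is written."""
    diffs = []
    for field in fields:
        old = ad_record.get(field)       # current value in AD
        new = workday_record.get(field)  # desired value from Workday
        if old != new:
            diffs.append({"field": field, "old": old, "new": new, "action": "update"})
    return diffs


# Both records are mocked, as suggested above
workday_record = {"department": "Marketing", "title": "Analyst"}
ad_record = {"department": "Sales", "title": "Analyst"}

changes = dry_run_diff(workday_record, ad_record, fields=["department", "title"])
print(changes)
```

Passing the field list explicitly keeps the comparison scoped: a field that exists in only one system is never diffed by accident.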
4. Hardening & Security (WIS-027 & Priority 6)
Prepare for the "Production" environment requirements.
Log Redaction: Update your server to ensure that if an error occurs, it doesn't print sensitive data (like emails or IDs) to the console/logs.
Environment Configuration: Move your "Constants" (like port numbers or mock file paths) into a .env file and use the python-dotenv library to load them.
Rate Limiting Simulation: Workday APIs have limits. Build a "decorator" for your tools that simulates a delay or a "429 Too Many Requests" error to test how your server handles it.
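One way to sketch the rate-limit decorator is shown below. `RateLimitError`, `simulate_rate_limit`, and the "fail every Nth call" rule are hypothetical choices made so the behaviour is deterministic in tests; a real Workday client would surface a 429 in its own way.

```python
import functools
import itertools


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 Too Many Requests response (hypothetical)."""
    status_code = 429


def simulate_rate_limit(every_nth):
    """Make every Nth call to the wrapped tool raise RateLimitError."""
    def decorator(func):
        counter = itertools.count(1)  # one call counter per decorated function

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if next(counter) % every_nth == 0:
                raise RateLimitError("429 Too Many Requests (simulated)")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@simulate_rate_limit(every_nth=3)
def get_worker_status(employee_id):
    # Hypothetical tool body returning mock data
    return {"employee_id": employee_id, "status": "Active"}


results = []
for _ in range(4):
    try:
        results.append(get_worker_status("EMP001")["status"])
    except RateLimitError as exc:
        results.append(exc.status_code)
print(results)
```

A counter-based trigger is easier to assert against than a random one; a random failure rate or a `time.sleep` delay could be added later behind a config flag.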
5. Documentation & "Self-Service" (WIS-027)
Build an MCP Resource: MCP supports "Resources" (read-only files). Create a resource that serves a "Data Dictionary" explaining what every Workday field means.
Installation Script: Write a simple setup.sh or setup.bat that automates the creation of the .venv and installation of requirements for the next person who joins the project.
6. Unit Testing (The "Quality" Pillar)
Pytest Integration: Create a /tests folder and write tests that verify your get_worker_manager logic for all three scenarios (Found, Not Found, No Manager).
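CI/CD Simulation: Set up a local Git Hook that prevents you from committing code if it has type-check errors or failing tests. Pylance only runs inside the editor, so the hook would need a command-line checker such as Pyright, the engine Pylance is built on.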
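The three manager scenarios could be covered with tests along these lines. The `get_worker_manager` body and the mock data here are hypothetical stand-ins, since the real implementation lives in your server; in a real `/tests` folder you would import the function instead of redefining it.

```python
# Hypothetical mock data; the real structure in server.py may differ
MOCK_WORKERS = {
    "EMP001": {"name": "Worker One", "manager_id": "EMP002"},
    "EMP002": {"name": "Worker Two", "manager_id": None},
}


def get_worker_manager(employee_id, workers=MOCK_WORKERS):
    """Stand-in for the real tool; replace with an import from server.py."""
    worker = workers.get(employee_id)
    if worker is None:
        return {"error": f"Worker {employee_id} not found"}
    manager_id = worker.get("manager_id")
    if manager_id is None:
        return {"manager": None}
    return {"manager": workers[manager_id]["name"]}


def test_manager_found():
    assert get_worker_manager("EMP001") == {"manager": "Worker Two"}


def test_worker_not_found():
    assert "error" in get_worker_manager("EMP999")


def test_no_manager():
    assert get_worker_manager("EMP002") == {"manager": None}
```

Pytest collects any function whose name starts with `test_`, so saving this as `tests/test_manager.py` and running `pytest` should be enough to execute all three.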


@ -0,0 +1,279 @@
Perfect — this gives us the **real-world shape** of the Workday record we should design around, *before* you ever get API access. Based on what you pasted, your **Employee ID is numeric (8 digits)** and you have a stable set of “Job Details” + “Public Contact” fields we can treat as your **Phase 1 canonical Worker contract**. [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
Below I'm going to do two things:
1. Translate your Workday profile into a **clean canonical schema** (fields, types, constraints) that you can enforce with Pydantic (**WIS-010**) now.
2. Show how that schema feeds your **dry-run diff tool** (**WIS-019**) and your later Workday→AD sync logic—without needing Workday access yet.
***
## 1) What your profile tells us about “employee\_id” (and constraints)
From your profile:
* `Employee ID` = `10016638` → **numeric string**, **8 digits** (at least in your tenant).
* This is a **better primary key** than email or name for matching to AD.
### Recommended constraint (for now)
* Treat `employee_id` as a **string** (not int) to preserve leading zeros if they exist in other records.
* Enforce pattern: `^[0-9]{8}$` (8 digits)
> If you later discover IDs vary (e.g., 6-10 digits), you can loosen the regex to `^[0-9]{6,10}$` without breaking downstream contracts.
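
A minimal sketch of that constraint using only the standard library. The helper name `is_valid_employee_id` is an assumption. It uses `fullmatch` rather than `match` with `^...$`, because in Python `$` also matches just before a trailing newline.

```python
import re

# Same rule as ^[0-9]{8}$, applied with fullmatch so "10016638\n" is rejected
EMPLOYEE_ID_PATTERN = re.compile(r"[0-9]{8}")


def is_valid_employee_id(value):
    """True only for a str made of exactly eight ASCII digits."""
    return isinstance(value, str) and EMPLOYEE_ID_PATTERN.fullmatch(value) is not None


print(is_valid_employee_id("10016638"))    # valid shape
print(is_valid_employee_id("00123456"))    # leading zeros survive because it stays a str
print(is_valid_employee_id(10016638))      # int is rejected
print(is_valid_employee_id("10016638\n"))  # trailing newline is rejected
```

Keeping the ID as a string also avoids the silent loss you would get from `int("00123456")`, which yields `123456`.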
***
## 2) Canonical Worker Schema (Phase 1 / no Workday API required)
Here's a **normalized Worker model** derived from your field list. I'm using names that are stable and automation-friendly, while keeping the original Workday labels clear.
### Field mapping (Workday → canonical)
**Identity**
* `employee_id` ← Employee ID (string, 8-digit numeric)
* `work_email` ← Email (string, email format)
**Org / Reporting**
* `supervisory_organization` ← Supervisory Organization (string)
* `manager_name` ← Supervisory Organization trailing manager name (string, optional)
* `manager_employee_id` ← (not present in your paste; optional until Workday API provides it)
**Job**
* `position` ← Position
* `business_title` ← Business Title
* `job_profile` ← Job Profile
* `employee_type` ← Employee Type (e.g., Regular)
* `management_level` ← Management Level (e.g., Individual Contributor)
* `time_type` ← Time Type (e.g., Full time)
* `fte_percent` ← FTE (decimal percent)
**Employment dates**
* `hire_date` ← Hire Date (date)
* `original_hire_date` ← Original Hire Date (date)
* `continuous_service_date` ← Continuous Service Date (date)
**Location**
* `location` ← Location
* `work_address` ← Work Address (string)
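The mapping above can be expressed as a lookup table plus a small normalizer. This is a sketch: the names `WORKDAY_TO_CANONICAL` and `to_canonical` are assumptions, and the derived fields (`manager_name`, `manager_employee_id`) are left out because they need parsing rules that aren't settled yet.

```python
# Workday label -> canonical field name, taken from the mapping list above
WORKDAY_TO_CANONICAL = {
    "Employee ID": "employee_id",
    "Email": "work_email",
    "Supervisory Organization": "supervisory_organization",
    "Position": "position",
    "Business Title": "business_title",
    "Job Profile": "job_profile",
    "Employee Type": "employee_type",
    "Management Level": "management_level",
    "Time Type": "time_type",
    "FTE": "fte_percent",
    "Hire Date": "hire_date",
    "Original Hire Date": "original_hire_date",
    "Continuous Service Date": "continuous_service_date",
    "Location": "location",
    "Work Address": "work_address",
}


def to_canonical(profile):
    """Rename Workday labels to canonical keys; collect labels with no mapping."""
    canonical, unmapped = {}, []
    for label, value in profile.items():
        key = WORKDAY_TO_CANONICAL.get(label)
        if key is None:
            unmapped.append(label)
        else:
            canonical[key] = value.strip() if isinstance(value, str) else value
    return canonical, unmapped


profile = {
    "Employee ID": "10016638",
    "Business Title": " Team Lead, Deskside Support ",
    "FTE": 100.0,
    "Cost Center": "1234",  # not in the Phase 1 mapping
}
canonical, unmapped = to_canonical(profile)
print(canonical, unmapped)
```

Returning the unmapped labels instead of dropping them silently makes it easy to spot new Workday fields when the real payloads arrive.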
### Why this specific cut matters
This set gives you enough to drive:
* onboarding timing (hire date)
* basic directory attributes (title, department-like org fields, location)
* reporting chain resolution (supervisory org / manager)
* future group mapping (job profile, employee type, management level)
…and it's stable enough to validate *before* Workday access.
***
## 3) How I'd update your current MCP server **right now** (targeted improvements)
Your current `server.py` is a solid scaffold, but it's using:
* mock IDs like `EMP001`, and
* a flat dict with fields (`name`, `status`, `dept`, `email`, `manager_id`) [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
To align with *real Workday data* **before you ever connect**:
### A) Switch your mock dataset to real ID format
Change mock keys from `EMP001` → `10016638` style.
* This forces you to solve input validation and casing issues now (numeric IDs have no casing).
* It prevents later refactors when Workday is live.
### B) Standardize your tool outputs (strongly recommended)
Right now, tools return either:
* a worker dict OR
* `{"error": ...}` [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
Before Workday access, lock in a stable envelope like:
```json
{
"ok": true,
"data": { ... },
"error": null,
"meta": {
"correlation_id": "uuid",
"source": "mock",
"contract_version": "wis.v1"
}
}
```
Why now? Because Power Automate + connectors love consistent shapes. If you wait, you'll end up rewriting flows and schemas later.
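Two small helpers are enough to produce that envelope from every tool. This is a sketch: the helper names and the `{"code", "message"}` shape of the error object are assumptions, since the envelope above only shows `"error": null`.

```python
import uuid

CONTRACT_VERSION = "wis.v1"


def _meta(source):
    """Build the meta block with a fresh correlation ID per response."""
    return {
        "correlation_id": str(uuid.uuid4()),
        "source": source,
        "contract_version": CONTRACT_VERSION,
    }


def ok_response(data, source="mock"):
    return {"ok": True, "data": data, "error": None, "meta": _meta(source)}


def error_response(code, message, source="mock"):
    # Assumed error shape: a stable machine-readable code plus a human message
    return {
        "ok": False,
        "data": None,
        "error": {"code": code, "message": message},
        "meta": _meta(source),
    }


success = ok_response({"employee_id": "10016638", "status": "Active"})
failure = error_response("NOT_FOUND", "No worker matches that ID")
print(success["ok"], failure["error"]["code"])
```

Because both helpers always emit the same four top-level keys, a flow can branch on `ok` without first checking which keys exist.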
### C) Build Pydantic models for:
* Worker input (`WorkerModel`)
* Tool response (`WorkerStatusResponse`)
* Error (`ErrorModel`)
* Envelope (`ToolResponse[T]`)
This is the core of **WIS-010**.
***
## 4) Concrete Pydantic Contract Proposal (WIS-010)
Here's the **contract design** I recommend based on your profile fields and your automation goals.
### WorkerModel constraints (recommended)
* `employee_id`: `^[0-9]{8}$`
* `work_email`: must validate as email
* `fte_percent`: 0-100 (float)
* `time_type`: enum (`Full time`, `Part time`, etc.)
* Dates: parsed into real `date` objects
### “Strict mode” recommendation
Start with:
* `extra="forbid"` (reject unknown fields)
* later relax to `extra="ignore"` if Workday payloads contain extra fields you don't care about
This gives you confidence early.
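A sketch of these constraints, assuming Pydantic v2 (`ConfigDict`, `Field(pattern=...)`). Only a subset of fields is shown, `TimeType` lists just the two values named above, and the email check is a deliberately loose regex because Pydantic's `EmailStr` needs the extra `email-validator` package.

```python
from datetime import date
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class TimeType(str, Enum):
    FULL_TIME = "Full time"
    PART_TIME = "Part time"


class WorkerModel(BaseModel):
    # Strict mode: unknown fields are rejected instead of silently dropped
    model_config = ConfigDict(extra="forbid")

    employee_id: str = Field(pattern=r"^[0-9]{8}$")
    work_email: str = Field(pattern=r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose shape check only
    business_title: Optional[str] = None
    time_type: TimeType
    fte_percent: float = Field(ge=0, le=100)
    hire_date: date  # ISO strings such as "2021-03-15" are parsed into date objects


worker = WorkerModel(
    employee_id="10016638",
    work_email="worker@example.com",
    time_type="Full time",
    fte_percent=100,
    hire_date="2021-03-15",
)

try:
    WorkerModel(
        employee_id="EMP001",
        work_email="not-an-email",
        time_type="Full time",
        fte_percent=150.0,
        hire_date="2021-13-40",
        nickname="unexpected",  # rejected by extra="forbid"
    )
except ValidationError as exc:
    failed_fields = sorted(str(err["loc"][0]) for err in exc.errors())
    print(failed_fields)
```

Pydantic reports every failing field in a single `ValidationError`, which is convenient for turning one bad payload into one structured error response.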
***
## 5) Your Dry-Run Diff Tool (WIS-019) — now shaped by real Workday fields
Once you validate Workday-shaped inputs, your dry-run diff becomes meaningful:
### What it should compare (Phase 1 scope)
From the fields you pasted, your first drift checks should include:
* Title / Business Title / Job Profile
* Location
* Manager / supervisory org reference
* Employee status → AD enabled/disabled state
* Email → UPN/mail alignment
### Diff item example (your intended format + enhancements)
You suggested:
```json
{"field":"department","old":"Sales","new":"Marketing","action":"update"}
```
I'd add two fields **now** that become critical in Phase 3:
* `severity` (low/medium/high)
* `reversible` (true/false)
Example:
```json
{
"field": "business_title",
"old": "Team Lead, Deskside Support",
"new": "Team Lead, Deskside Support",
"action": "none",
"severity": "low",
"reversible": true
}
```
Then your approval payload can be generated directly from the diff result (no extra logic later).
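One way to attach `severity` and `reversible` is a per-field policy table, sketched below. The table contents, the default policy, and the name `make_diff_item` are hypothetical; the real values are a decision for whoever owns the approval process.

```python
# Hypothetical policy table: field -> (severity, reversible). Values are illustrative only.
DIFF_POLICY = {
    "account_enabled": ("high", True),
    "manager_employee_id": ("medium", True),
    "business_title": ("low", True),
}
DEFAULT_POLICY = ("medium", True)  # assumed fallback for fields not listed


def make_diff_item(field, old, new):
    """Build one diff item; unchanged fields are reported with action "none"."""
    severity, reversible = DIFF_POLICY.get(field, DEFAULT_POLICY)
    return {
        "field": field,
        "old": old,
        "new": new,
        "action": "none" if old == new else "update",
        "severity": severity,
        "reversible": reversible,
    }


unchanged = make_diff_item(
    "business_title", "Team Lead, Deskside Support", "Team Lead, Deskside Support"
)
changed = make_diff_item("account_enabled", True, False)
print(unchanged)
print(changed)
```

With this shape, an approval step could filter for items where `action` is not `"none"` and `severity` is `"high"`.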
***
## 6) Break Tests you can run **today** (with your real constraints)
Since we now know your ID shape, your “try to break it” script should include:
### Employee ID failure cases
* empty string `""`
* non-numeric `"EMP001"`
* wrong length `"1001663"` (7 digits), `"100166380"` (9 digits)
* whitespace `" 10016638 "`
### Data failures (WorkerModel)
* invalid email `"not-an-email"`
* FTE out of range `150.0`
* Hire date invalid `"2021-13-40"`
* Missing required fields (`employee_id`, `work_email`)
### Expected behavior
* Tools return `ok:false`
* Error has stable code (e.g., `VALIDATION_ERROR`)
* No sensitive fields are echoed in logs (see below)
This is how you prove WIS-010 is real, not theoretical.
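A compact version of the employee ID break tests might look like this. The `get_worker_status` body is a hypothetical stand-in for your real tool, and the error shape is assumed; the bad inputs are the ones listed above.

```python
import re

_EMPLOYEE_ID = re.compile(r"[0-9]{8}")


def get_worker_status(employee_id):
    """Hypothetical tool: validates input and returns an {ok, data, error} envelope."""
    if not isinstance(employee_id, str) or _EMPLOYEE_ID.fullmatch(employee_id) is None:
        # The message describes the rule without echoing the rejected value
        return {
            "ok": False,
            "data": None,
            "error": {"code": "VALIDATION_ERROR", "message": "employee_id must be 8 digits"},
        }
    return {"ok": True, "data": {"employee_id": employee_id, "status": "Active"}, "error": None}


BAD_IDS = ["", "EMP001", "1001663", "100166380", " 10016638 "]


def run_break_tests():
    """Return the inputs that were NOT rejected with VALIDATION_ERROR."""
    escaped = []
    for bad in BAD_IDS:
        result = get_worker_status(bad)
        if result["ok"] or result["error"]["code"] != "VALIDATION_ERROR":
            escaped.append(bad)
    return escaped


print(run_break_tests())
```

An empty list means every bad input was caught. Note that the whitespace case only fails validation because the tool does not strip input; if you decide to trim instead, move that case to the "accepted" tests.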
***
## 7) Security/Hardening decisions informed by your profile fields
Because Workday records contain PII-like data (email, address), your hardening step (WIS-027) should **explicitly redact**:
* `work_email`
* `work_address`
* `employee_id` (optional partial mask like `******38`)
### Redaction policy (recommended)
* In logs: redact always
* In tool responses:
  * keep email if the tool is intended for automation and access is controlled
  * or provide a config switch: `RETURN_PII=false` (safe by default)
This is much easier to implement now than after Workday is live.
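A minimal sketch of the redaction helpers, using only the standard library. The function names, the `[REDACTED]` placeholder, and the email regex are assumptions; the `return_pii` parameter mirrors the `RETURN_PII` switch described above.

```python
import re

# Loose email matcher for scrubbing free-text log messages (an assumption, not a full RFC parser)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_employee_id(employee_id, visible=2):
    """Keep only the last `visible` characters, e.g. 10016638 -> ******38."""
    if len(employee_id) <= visible:
        return "*" * len(employee_id)  # too short to show anything safely
    return "*" * (len(employee_id) - visible) + employee_id[-visible:]


def redact_record(record, return_pii=False):
    """Return a copy of the record that is safe to log or return."""
    if return_pii:
        return dict(record)
    redacted = dict(record)
    for key in ("work_email", "work_address"):
        if key in redacted:
            redacted[key] = "[REDACTED]"
    if "employee_id" in redacted:
        redacted["employee_id"] = mask_employee_id(redacted["employee_id"])
    return redacted


def redact_text(message):
    """Scrub email addresses out of a free-text log message."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", message)


record = {
    "employee_id": "10016638",
    "work_email": "worker@example.com",
    "business_title": "Team Lead, Deskside Support",
}
print(redact_record(record))
print(redact_text("lookup failed for worker@example.com"))
```

Both functions return new values rather than mutating their input, so the original record stays intact for the tool response while the redacted copy goes to the logger.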
***
## 8) What I would change in your current `server.py` (actionable list)
In order, **before Workday API access**:
1. **Replace EMP### mock IDs with numeric employee IDs** (like your `10016638`) [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
2. **Add Pydantic WorkerModel + Response envelope** (WIS-010) [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
3. **Normalize tool outputs** into `{ok,data,error,meta}` for every tool [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
4. **Implement dry\_run\_compare() tool** returning a typed DiffResult (WIS-019)
5. **Add rate-limit simulation decorator** and make errors explicitly retryable (WIS-027)
6. **Add log redaction helper** for email/address/employee\_id (WIS-027)
7. **Externalize mock data** into JSON + support multiple scenario datasets
8. **Pytest suite**: manager resolution scenarios (Found/Not Found/No Manager) using your mock structure [\[wheelsinc-...epoint.com\]](https://wheelsinc-my.sharepoint.com/personal/castn1_wheels_com/Documents/Microsoft%20Copilot%20Chat%20Files/server.py)
9. **Pre-commit hooks** to block commits on failing tests/lint
***
## 9) Two quick questions so I lock the schema correctly
1. In your tenant, is **Employee ID always 8 digits**, or have you seen contractors/vendors with a different format?
2. Do you want **Work Address** treated as:
   * a field you *store* in AD/Entra, or
   * *reference only* (used for location logic but not written anywhere)?
***
## If you want, I can take the next step automatically
If you say “go,” I'll draft:
* A **Pydantic schema pack** (`models.py`) for WorkerModel, DiffResult, Envelope
* A revised `server.py` that:
  * uses numeric employee IDs
  * returns consistent envelopes
  * includes `dry_run_compare`
  * includes redaction + rate-limit simulation toggles
* A `/tests` folder with the manager tests + “break tests” script
Just tell me whether you prefer **Pydantic v1 or v2** (most orgs are on v2 now, but I'll match whatever your repo already depends on).


@ -0,0 +1,3 @@
# Breadcrumbs
- Folder contains info and ideas to get back on the path after vacation.