docs(jobs): Phase 8/9/10 cutover runbook

Documents:
- Phase 8: 5-day E2E test plan on entech-clone (snapshot, migration,
  audits, smoke tests, rollback test, sign-off criteria)
- Phase 9: Cutover weekend runbook (Friday 6pm stop → Sunday buffer
  → Monday 7am operators back). 4 hours active work.
- Phase 10: 2-week burn-in monitoring + rollback safety net + Day
  14 snapshot drop. Bridge_mrp deprecation options.
- Phase-end polish task list (deferred Minor items from Phase 1-7
  reviews + the Phase 6 operator UI rewrite).
- Communication templates (operator email, manager briefing).
- Open decisions for user before Phase 9 starts.
- File checklist confirming all Phase 1-7 deliverables present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gsinghpal
2026-04-25 00:17:57 -04:00
parent f9fab699d4
commit f2f98aa9f6


@@ -0,0 +1,385 @@
# Native Job Model — Cutover Runbook (Phases 8, 9, 10)
**Date:** 2026-04-25
**Owner:** Nexa Systems
**Status:** Draft. Verify each step on entech-clone before live cutover.
**Predecessor:** Phases 1-7 complete (commits up to current HEAD on
`feat/fp-native-job-model`). Spec:
`docs/superpowers/specs/2026-04-25-fp-native-job-model-design.md`. Plan:
`docs/superpowers/plans/2026-04-25-fp-native-job-model.md`.
This runbook covers the operational phases of the migration:
- **Phase 8** — End-to-end testing on a clone of entech (~5 days)
- **Phase 9** — Live cutover weekend (4-hour window)
- **Phase 10** — 2-week burn-in with rollback safety net
---
## Phase 8 — E2E testing on entech-clone (5 days)
### 8.1 Prepare the clone
1. **Snapshot live entech:** `pct snapshot 111 pre_fp_jobs_clone` on pve-worker5.
2. **Spin up a sibling LXC** (e.g. `entech-clone` at LXC 511 / pve-worker5).
- Restore from the snapshot
- Configure new IP: 10.200.1.27 (so it doesn't compete with live entech 10.200.1.26)
- Update `odoo.conf` to a separate database name e.g. `admin_clone`
3. **Update Tailscale:** add `entech-clone` to your Tailscale ACL so SSH works.
4. **Verify clone independence:** any DB writes on entech-clone must NOT bleed
to live entech. Different DB name, different IP.
### 8.2 Pre-migration audit
Run on entech-clone:
```bash
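# The input redirect sits inside the inner quotes so the script path resolves in the container, not on the local machine.
ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/audit_pre_migration.py\"'"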
```
Expected output: counts of MOs, WOs, dependent records, data quality flags.
**Capture the baseline numbers** in `phase8_baseline.txt` for diffing later.
### 8.3 Run migration
```bash
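# The input redirect sits inside the inner quotes so the script path resolves in the container, not on the local machine.
ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/migrate_to_fp_jobs.py\"'"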
```
Watch for errors in the output. Audit log at `/tmp/fp_jobs_migration.log`.
### 8.4 Post-migration audit
```bash
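# The input redirect sits inside the inner quotes so the script path resolves in the container, not on the local machine.
ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/audit_post_migration.py\"'"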
```
Verify:
- `fp.job` count == `mrp.production` count (every MO has a mirror)
- `fp.job.step` count == `mrp.workorder` count
- Dependent x_fc_*_id counts match production_id / workorder_id counts
If any mismatch, dig into the audit log for errors.
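The count checks above can be scripted once both audits' numbers are saved. This is a minimal sketch, assuming the counts are captured as `name=count` lines; that file format, the `PAIRS` mapping, and the sample numbers are illustrative assumptions, not the actual output of the audit scripts.

```python
def parse_counts(text):
    """Parse 'name=count' lines into a dict, skipping blanks and comments."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        counts[name.strip()] = int(value)
    return counts


# Hypothetical pairing of legacy models to their native mirrors.
PAIRS = [("mrp.production", "fp.job"), ("mrp.workorder", "fp.job.step")]


def find_mismatches(pre, post, pairs=PAIRS):
    """Return (legacy, native, legacy_count, native_count) for each unequal pair."""
    mismatches = []
    for legacy, native in pairs:
        legacy_count = pre.get(legacy)
        native_count = post.get(native)
        if legacy_count != native_count:
            mismatches.append((legacy, native, legacy_count, native_count))
    return mismatches


# Made-up numbers for illustration only.
pre = parse_counts("mrp.production=120\nmrp.workorder=845\n")
post = parse_counts("fp.job=120\nfp.job.step=843\n")
for legacy, native, a, b in find_mismatches(pre, post):
    print(f"MISMATCH {legacy}={a} vs {native}={b}")
```

An empty result from `find_mismatches` corresponds to the "every MO has a mirror" condition; any returned tuple points at the pair to look up in the audit log.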
### 8.5 Smoke test the new flow
Manual on the clone via browser:
1. Toggle `x_fc_use_native_jobs=True` in Settings → Fusion Plating Jobs.
2. Create a new SO with a plating line.
3. Confirm the SO. Verify a `WH/JOB/...` record appears in **Plating Jobs (new)** menu.
4. Verify the recipe steps generated correctly.
5. Open a step, click Start, then Finish. Verify timelog row, duration_actual,
cost_total all populate.
6. Print the new Job Sticker (6×4"). Verify QR scans to `/fp/job/<id>` and
redirects to the form.
7. Print the Job Traveller. Verify all steps listed.
8. Click **Mark Done** on the job. Verify state=done, draft delivery created,
draft cert created (best-effort).
### 8.6 Replay 30 days of activity
Identify the last 30 days of MO activity on entech (pre-clone) and replay
those operator actions through the new flow on the clone. Look for:
- Operations that succeeded on the legacy flow but error on native
- Reports that render differently
- Cost / margin numbers that differ between legacy and native
Diff certificates byte-for-byte: render 100 random CoC PDFs on legacy and on
the migrated native jobs. Each pair should be visually identical. Any
differences are audit-grade red flags (Nadcap / aerospace).
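One way to run the byte-level comparison is to hash each rendered file. This is a sketch only, assuming the legacy and native renders are saved under the same file name in two directories; the directory layout and the `coc_00N.pdf` names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def diff_pdf_dirs(legacy_dir, native_dir):
    """Compare same-named PDFs in two directories.

    Returns (differing, missing): names whose bytes differ, and names
    present in legacy_dir but absent from native_dir.
    """
    differing, missing = [], []
    for legacy_pdf in sorted(Path(legacy_dir).glob("*.pdf")):
        native_pdf = Path(native_dir) / legacy_pdf.name
        if not native_pdf.exists():
            missing.append(legacy_pdf.name)
        elif sha256_of(legacy_pdf) != sha256_of(native_pdf):
            differing.append(legacy_pdf.name)
    return differing, missing


# Demo with throwaway files standing in for rendered certificates.
with tempfile.TemporaryDirectory() as legacy, tempfile.TemporaryDirectory() as native:
    (Path(legacy) / "coc_001.pdf").write_bytes(b"same")
    (Path(native) / "coc_001.pdf").write_bytes(b"same")
    (Path(legacy) / "coc_002.pdf").write_bytes(b"legacy bytes")
    (Path(native) / "coc_002.pdf").write_bytes(b"native bytes")
    (Path(legacy) / "coc_003.pdf").write_bytes(b"only on legacy")
    differing, missing = diff_pdf_dirs(legacy, native)

print("differing:", differing)
print("missing:", missing)
```

PDF renderers may embed creation timestamps or document IDs, so a hash mismatch does not by itself prove a visible difference. Files listed in `differing` are the ones to inspect side by side.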
### 8.7 Performance baseline
Measure on the clone:
- Plant Overview load time with N active steps (grouped by work_centre)
- Job form open time with 50-step recipe
- Job traveller PDF render time
- Job sticker PDF render time
- Migration script runtime (target: < 30 min on entech-scale data)
If anything is significantly slower than the legacy MO/WO flow, investigate
indexes (M2M tables, related stores) before cutover.
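To make "significantly slower" concrete, the measurements can be checked against the 20% threshold used in the sign-off criteria. This is a minimal sketch; the metric names and timings are placeholders, and `time_call` is a hypothetical helper for timing any callable that triggers the operation under test.

```python
import time


def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def regressions(legacy, native, tolerance=0.20):
    """Return metrics where the native time exceeds legacy by more than `tolerance`."""
    slow = {}
    for name, legacy_s in legacy.items():
        native_s = native[name]
        if native_s > legacy_s * (1 + tolerance):
            slow[name] = (legacy_s, native_s)
    return slow


# Illustrative timings in seconds, not real measurements.
legacy = {"job_form_open": 1.0, "traveller_pdf": 2.0}
native = {"job_form_open": 1.1, "traveller_pdf": 2.6}
print(regressions(legacy, native))
```

Taking the best of several runs reduces noise from cold caches; any metric that still appears in the result is a candidate for the index investigation described above.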
### 8.8 Rollback test
On the clone, simulate a rollback:
1. Restore the pre-cutover snapshot.
2. Verify legacy MO/WO data is intact.
3. Verify the `fusion_plating_jobs` module is still installed but inert
(flag is False).
4. Verify nothing in bridge_mrp / fusion_plating_reports / shopfloor /
notifications regressed.
Rollback safety is the most important thing to prove before live cutover.
### 8.9 Sign-off criteria
Before scheduling Phase 9:
- [ ] All Phase 1+2 tests pass (50+ tests)
- [ ] Migration script runs cleanly on clone with 0 errors in audit log
- [ ] Pre/post audit counts match
- [ ] 100 sample CoCs byte-identical
- [ ] All performance baselines within 20% of legacy
- [ ] Rollback test successful
If any item fails, identify the gap, fix in `feat/fp-native-job-model`, and
re-run §§ 8.2-8.8.
---
## Phase 9 — Cutover weekend (1 calendar day, ~4 hours active work)
### 9.1 Pre-cutover communication (T-7 days)
- Email entech operators: "Friday MM/DD evening: ~4 hours offline for
system upgrade. Monday morning normal."
- Brief 2-3 plating managers on the new menu and the demo path.
- Confirm Saturday on-site presence: 1 manager + 1 tech (you).
### 9.2 Friday 6pm — stop new work
- Operators wrap up active jobs. No new SO confirms. No new WOs started.
- Verify no in_progress WOs left running. Pause any timers.
### 9.3 Friday 8pm — backup
```bash
# Full DB dump
ssh pve-worker5 "pct exec 111 -- bash -c 'su - postgres -c \"pg_dump admin\" > /var/backups/admin_pre_fp_jobs_$(date +%Y%m%d).sql'"
# Filesystem snapshot
ssh pve-worker5 "pct snapshot 111 pre_fp_jobs_cutover"
```
Tag the current commit:
```bash
cd /Users/gurpreet/Github/Odoo-Modules
git tag -a pre-cutover-$(date +%Y%m%d) -m "Pre-cutover backup point"
git push origin pre-cutover-$(date +%Y%m%d)
```
### 9.4 Friday 9pm — deploy + migrate
1. Deploy the latest `fusion_plating_jobs` to entech (it should already be
installed from Phase 7 development; just refresh).
```bash
# Sync feat/fp-native-job-model branch state to entech if not already
# (skip if entech is already on this branch)
```
2. Update the module:
```bash
ssh pve-worker5 "pct exec 111 -- bash -c 'systemctl stop odoo && su - odoo -s /bin/bash -c \"/usr/bin/odoo -c /etc/odoo/odoo.conf -d admin -u fusion_plating_jobs --stop-after-init\" && systemctl start odoo'"
```
3. Run the migration:
```bash
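# The input redirect sits inside the inner quotes so the script path resolves in the container, not on the local machine.
ssh pve-worker5 "pct exec 111 -- bash -c 'systemctl stop odoo && su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/migrate_to_fp_jobs.py\"'"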
```
4. Verify with the post-audit script.
5. Toggle the cutover flag:
```python
# Via odoo shell:
env['ir.config_parameter'].sudo().set_param('fusion_plating_jobs.use_native_jobs', 'True')
env.cr.commit()
```
6. Restart Odoo.
### 9.5 Friday 10pm — smoke test
Same as §8.5 but on live entech. If anything fails, restore backup
(§9.7) and abort.
### 9.6 Saturday/Sunday — buffer
Shop is offline weekends. Use the time to:
- Fix anything that surfaced during smoke test
- Run additional spot checks on historical jobs
- Verify that print menus default to the new reports for new jobs
- Test sticker scans on a phone
### 9.7 Rollback procedure (if needed by Sunday evening)
If unrecoverable issues:
```bash
# Stop Odoo
ssh pve-worker5 "pct exec 111 -- systemctl stop odoo"
# Option A: restore the DB from the SQL dump (replace <date> with the backup's date stamp)
ssh pve-worker5 "pct exec 111 -- bash -c 'su - postgres -c \"dropdb admin && createdb admin && psql admin < /var/backups/admin_pre_fp_jobs_<date>.sql\"'"
# Option B (instead of A): roll back the container snapshot (faster, but loses any post-snapshot DB writes)
ssh pve-worker5 "pct rollback 111 pre_fp_jobs_cutover"
# Start Odoo
ssh pve-worker5 "pct exec 111 -- systemctl start odoo"
# Communicate to operators that we're back on the legacy flow
```
After day 7, rollback becomes "forward fix only" — too much new shop activity
to restore.
### 9.8 Monday 7am — operators back on
- 1 manager + 1 tech on site for the first 2 hours
- Walk operators through the new menu (Plating Jobs (new) → Jobs)
- Watch for confusion or errors
- Field tickets as they come in
---
## Phase 10 — Burn-in (2 weeks calendar, ~1 day active work)
### 10.1 Daily monitoring (Days 1-14)
Check daily:
- Odoo error log: `tail -f /var/log/odoo/odoo-server.log | grep -i error`
- Job creation rate: `SELECT COUNT(*) FROM fp_job WHERE create_date > now() - interval '1 day'`
- Step creation rate: `SELECT COUNT(*) FROM fp_job_step WHERE create_date > now() - interval '1 day'`
- Failed lifecycle hooks: `grep -c "failed to" /var/log/odoo/odoo-server.log`
- Operator support tickets
Run audit_post_migration.py weekly to catch any drift.
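The two log checks above can be combined into one daily summary. This is a sketch that assumes Odoo's default log layout (`date time pid LEVEL db logger: message`); the sample lines are invented for illustration and are not real entech log output.

```python
import re
from collections import Counter

# Assumed default Odoo log layout: "date time pid LEVEL db logger: message".
LOG_LINE = re.compile(r"^\S+ \S+ \d+ (?P<level>[A-Z]+) \S+ [^:]+: .*$")


def summarize_log(lines):
    """Count lines per log level and lines containing 'failed to'."""
    levels = Counter()
    failed_hooks = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            levels[match.group("level")] += 1
        if "failed to" in line:
            failed_hooks += 1
    return levels, failed_hooks


# Invented sample lines in the assumed format.
sample = [
    "2026-05-04 08:00:01,120 412 INFO admin odoo.addons.fusion_plating_jobs: job created",
    "2026-05-04 08:03:17,554 412 WARNING admin odoo.addons.fusion_plating_jobs: failed to create draft cert",
    "2026-05-04 08:05:42,901 412 ERROR admin odoo.sql_db: bad query",
]
levels, failed_hooks = summarize_log(sample)
print(dict(levels), failed_hooks)
```

Like `grep -c "failed to"`, the substring match is case-sensitive. Counting by level separates genuine `ERROR` lines from messages that merely contain the word "error".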
### 10.2 Forward-fix
Anything that surfaces during burn-in goes through the standard PR/review
workflow on `feat/fp-native-job-model` (or a new follow-up branch). The
underlying data layer is locked — fixes are mostly UI/report polish.
### 10.3 Day 14 — drop legacy snapshots
After 14 days of stable operation:
```bash
# Drop the pre-cutover snapshot
ssh pve-worker5 "pct delsnapshot 111 pre_fp_jobs_cutover"
# Optional: archive the SQL backup off-site
mv /var/backups/admin_pre_fp_jobs_*.sql /off-site/long-term-archive/
```
### 10.4 Bridge_mrp deprecation
`fusion_plating_bridge_mrp` is still installed and inert (the SO confirm
hook only fires when `x_fc_use_native_jobs=False`, which it never is post-
cutover). Options for full deprecation:
A) Leave it installed forever. Zero impact.
B) Archive (set `installable=False` in its manifest, so a future re-install
wouldn't activate it).
C) Uninstall (write an uninstall hook that drops the bridge tables but
preserves the data already migrated to fp.job).
Recommend (A) for the first 6 months, then revisit.
### 10.5 Phase-end polish
The list of deferred Minor items from Phase 1-7 reviews:
- `currency_id required=True` on fp.work.centre and fp.job (and ondelete
policies on M2Os uniformly across both core and jobs)
- `tracking=True` on fp.job.manager_id, facility_id
- `digits='Product Unit of Measure'` on qty
- `_('New')` translation safety in create
- Field labels: "Reference Product" → cleaner string
- Recipe boolean tests on fp.job.step
- `index=True` on M2Os queried frequently (recipe_id, partner_id)
- Author/website/maintainer block in fusion_plating_jobs manifest
- i18n wrapping (`_()`) on user-visible strings
- `_compute_state_ready` for fp.job.step pending → ready transition (Task 1.5
TODO)
- `button_pause` / `button_skip` / `button_cancel` real implementations
- Operator UI rewrite (Plant Overview, Tablet Station, Manager Dashboard,
Process Tree OWL component) — Phase 6 deferral
These can be batched into one polish PR after burn-in completes (Day 14+).
---
## Appendix A — Communication templates
### Email to operators (T-7)
> Subject: System maintenance Friday evening (~4 hours)
>
> Team, we're upgrading the Fusion Plating Jobs system overnight on Friday
> MM/DD, from 9pm Friday through Saturday morning. The system will be offline
> during that window. By Monday 7am everything will be normal except you'll see a
> new "Plating Jobs (new)" menu in addition to the existing menus. Same data,
> better workflow. Manager + tech will be on site Monday morning to help.
>
> No action needed from you. Just don't start any new jobs after 6pm Friday.
>
> Questions? Reply or ping the manager.
### Manager briefing (T-3)
Walk through:
1. The new menu structure
2. The settings flag and how to toggle it
3. The migration script and rollback procedure
4. What to do if an operator reports a bug Monday morning
---
## Appendix B — Open decisions for the user before Phase 9
Schedule the cutover weekend with at least 4 weeks notice. Confirm:
1. Date of cutover weekend
2. Which manager will be on-site Monday morning
3. Whether to keep the legacy menus visible after cutover (recommend: yes,
for the first 14 days, then hide via group permission)
4. Whether to send the operator email template above as-is or customize
5. Acceptance criteria for "burn-in complete" (recommend: 14 days zero
critical errors, zero operator support tickets that map to migration
issues)
---
## Appendix C — File checklist before Phase 8 starts
Verify these are present (committed to feat/fp-native-job-model):
- [x] `fusion_plating_jobs/__manifest__.py` — version >= 19.0.2.0.0, depends on 9 modules
- [x] `fusion_plating_jobs/models/fp_job.py` — _inherit with all extension fields, hooks, helpers, legacy_id
- [x] `fusion_plating_jobs/models/fp_job_node_override.py` — override model
- [x] `fusion_plating_jobs/models/sale_order.py` — SO confirm hook
- [x] `fusion_plating_jobs/models/res_config_settings.py` — flag
- [x] `fusion_plating_jobs/models/fp_portal_job.py` — x_fc_job_id link
- [x] `fusion_plating_jobs/models/fp_batch.py` — x_fc_step_id / x_fc_job_id
- [x] `fusion_plating_jobs/models/fp_quality_hold.py` — x_fc_job_id / x_fc_step_id
- [x] `fusion_plating_jobs/models/fp_certificate.py` — x_fc_job_id
- [x] `fusion_plating_jobs/models/fp_thickness_reading.py` — x_fc_job_id / x_fc_step_id
- [x] `fusion_plating_jobs/models/fp_delivery.py` — x_fc_job_id
- [x] `fusion_plating_jobs/models/fp_racking_inspection.py` — x_fc_job_id
- [x] `fusion_plating_jobs/models/account_move.py` — invoice → job hook
- [x] `fusion_plating_jobs/models/fp_notification_trigger.py` — job_confirmed/job_complete events
- [x] `fusion_plating_jobs/models/fusion_plating_kpi_value.py` — x_fc_source tag
- [x] `fusion_plating_jobs/views/res_config_settings_views.xml` — settings UI
- [x] `fusion_plating_jobs/report/report_fp_job_sticker.xml` — sticker
- [x] `fusion_plating_jobs/report/report_fp_job_traveller.xml` — traveller
- [x] `fusion_plating_jobs/controllers/job_scan.py` — /fp/job/<id>
- [x] `fusion_plating_jobs/controllers/process_tree.py` — /fp/jobs/process_tree
- [x] `fusion_plating_jobs/scripts/audit_pre_migration.py`
- [x] `fusion_plating_jobs/scripts/migrate_to_fp_jobs.py`
- [x] `fusion_plating_jobs/scripts/audit_post_migration.py`
- [x] `fusion_plating_jobs/scripts/README.md`
- [x] `fusion_plating_jobs/README.md` — Phase 6 deferrals doc
- [x] `fusion_plating_jobs/security/ir.model.access.csv` — ACL rows
- [x] `fusion_plating_jobs/tests/test_fp_job_extensions.py` — comprehensive test suite
If anything in this list is missing, fix before Phase 8.