Odoo-Modules/fusion_plating/docs/superpowers/specs/2026-04-25-fp-native-job-cutover-runbook.md
gsinghpal f2f98aa9f6 docs(jobs): Phase 8/9/10 cutover runbook
Documents:
- Phase 8: 5-day E2E test plan on entech-clone (snapshot, migration,
  audits, smoke tests, rollback test, sign-off criteria)
- Phase 9: Cutover weekend runbook (Friday 6pm stop → Sunday buffer
  → Monday 7am operators back). 4 hours active work.
- Phase 10: 2-week burn-in monitoring + rollback safety net + Day
  14 snapshot drop. Bridge_mrp deprecation options.
- Phase-end polish task list (deferred Minor items from Phase 1-7
  reviews + the Phase 6 operator UI rewrite).
- Communication templates (operator email, manager briefing).
- Open decisions for user before Phase 9 starts.
- File checklist confirming all Phase 1-7 deliverables present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 00:17:57 -04:00

# Native Job Model — Cutover Runbook (Phases 8, 9, 10)
**Date:** 2026-04-25
**Owner:** Nexa Systems
**Status:** Draft. Verify each step on entech-clone before live cutover.
**Predecessor:** Phases 1-7 complete (commits up to current HEAD on
`feat/fp-native-job-model`). Spec:
`docs/superpowers/specs/2026-04-25-fp-native-job-model-design.md`. Plan:
`docs/superpowers/plans/2026-04-25-fp-native-job-model.md`.
This runbook covers the operational phases of the migration:
- **Phase 8** — End-to-end testing on a clone of entech (~5 days)
- **Phase 9** — Live cutover weekend (~4-hour active window)
- **Phase 10** — 2-week burn-in with rollback safety net
---
## Phase 8 — E2E testing on entech-clone (5 days)
### 8.1 Prepare the clone
1. **Snapshot live entech:** `pct snapshot 111 pre_fp_jobs_clone` on pve-worker5.
2. **Spin up a sibling LXC** (e.g. `entech-clone` at LXC 511 / pve-worker5).
   - Restore from the snapshot
   - Configure a new IP: 10.200.1.27 (so it doesn't conflict with live entech at 10.200.1.26)
   - Update `odoo.conf` to use a separate database name, e.g. `admin_clone`
3. **Update Tailscale:** add `entech-clone` to your Tailscale ACL so SSH works.
4. **Verify clone independence:** any DB writes on entech-clone must NOT bleed
   to live entech. Different DB name, different IP.
### 8.2 Pre-migration audit
Run on entech-clone:
```bash
ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/audit_pre_migration.py"
```
Expected output: counts of MOs, WOs, dependent records, data quality flags.
**Capture the baseline numbers** in `phase8_baseline.txt` for diffing later.
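Diffing the baseline later is easier if the captured counts can be compared mechanically. The sketch below assumes the capture uses one `label: count` pair per line; the real output format of `audit_pre_migration.py` may differ. The helper names `parse_counts` and `diff_counts` are hypothetical, and the numbers are placeholders, not entech data.

```python
def parse_counts(text):
    """Parse 'label: number' lines into a dict; lines in any other shape are ignored."""
    counts = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counts[label.strip()] = int(value)
    return counts


def diff_counts(before, after):
    """Return {label: (before, after)} for every label whose count changed."""
    labels = sorted(set(before) | set(after))
    return {
        label: (before.get(label), after.get(label))
        for label in labels
        if before.get(label) != after.get(label)
    }


# Placeholder captures (assumed format, made-up numbers).
baseline = parse_counts("mrp.production: 1200\nmrp.workorder: 8400\n")
later = parse_counts("mrp.production: 1200\nmrp.workorder: 8399\n")
print(diff_counts(baseline, later))
```

In practice the two inputs would be the contents of `phase8_baseline.txt` and a capture taken after a later step. An empty result means nothing drifted.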
### 8.3 Run migration
```bash
ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/migrate_to_fp_jobs.py"
```
Watch for errors in the output. Audit log at `/tmp/fp_jobs_migration.log`.
### 8.4 Post-migration audit
```bash
ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/audit_post_migration.py"
```
Verify:
- `fp.job` count == `mrp.production` count (every MO has a mirror)
- `fp.job.step` count == `mrp.workorder` count
- Dependent x_fc_*_id counts match production_id / workorder_id counts
If any mismatch, dig into the audit log for errors.
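The first two equalities above can be expressed as a small check. This is a minimal sketch rather than what `audit_post_migration.py` actually does: `find_mismatches` and `EXPECTED_PAIRS` are hypothetical names, and the sample counts are made up.

```python
# Pairs that should be equal after migration: (native model, legacy model).
EXPECTED_PAIRS = [
    ("fp.job", "mrp.production"),
    ("fp.job.step", "mrp.workorder"),
]


def find_mismatches(counts, pairs=EXPECTED_PAIRS):
    """Return a readable message for each pair whose counts are missing or differ."""
    problems = []
    for native, legacy in pairs:
        native_count = counts.get(native)
        legacy_count = counts.get(legacy)
        if native_count is None or legacy_count is None or native_count != legacy_count:
            problems.append(f"{native}={native_count} vs {legacy}={legacy_count}")
    return problems


# Inside an odoo shell the counts could be gathered with, for example:
#   counts = {m: env[m].search_count([]) for pair in EXPECTED_PAIRS for m in pair}
sample = {"fp.job": 10, "mrp.production": 10, "fp.job.step": 42, "mrp.workorder": 43}
print(find_mismatches(sample))
```

An empty list means every pair matched. Any message returned is a starting point for searching the audit log.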
### 8.5 Smoke test the new flow
Manual on the clone via browser:
1. Toggle `x_fc_use_native_jobs=True` in Settings → Fusion Plating Jobs.
2. Create a new SO with a plating line.
3. Confirm the SO. Verify a `WH/JOB/...` record appears in **Plating Jobs (new)** menu.
4. Verify the recipe steps generated correctly.
5. Open a step, click Start, then Finish. Verify timelog row, duration_actual,
cost_total all populate.
6. Print the new Job Sticker (6×4"). Verify QR scans to `/fp/job/<id>` and
redirects to the form.
7. Print the Job Traveller. Verify all steps listed.
8. Click **Mark Done** on the job. Verify state=done, draft delivery created,
draft cert created (best-effort).
### 8.6 Replay 30 days of activity
Identify the last 30 days of MO activity on entech (pre-clone) and replay
those operator actions through the new flow on the clone. Look for:
- Operations that succeeded on the legacy flow but error on native
- Reports that render differently
- Cost / margin numbers that differ between legacy and native
Diff certificates byte-for-byte: render 100 random CoC PDFs from the legacy
flow and from the corresponding migrated native jobs. They should also be
visually identical. Any difference is an audit-grade red flag (Nadcap / aerospace).
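The byte-for-byte comparison can be automated by hashing both sets of rendered files. The sketch below assumes the legacy and native renders were exported to two directories under matching file names. That layout and the helper names are assumptions, not part of the existing scripts.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_pdfs(legacy_dir, native_dir):
    """Return sorted names of PDFs that are missing on one side or differ in bytes."""
    legacy = {p.name: sha256_of(p) for p in Path(legacy_dir).glob("*.pdf")}
    native = {p.name: sha256_of(p) for p in Path(native_dir).glob("*.pdf")}
    return sorted(n for n in set(legacy) | set(native) if legacy.get(n) != native.get(n))


# Tiny self-contained demo using fake file contents instead of real PDFs.
with tempfile.TemporaryDirectory() as tmp:
    legacy_dir, native_dir = Path(tmp, "legacy"), Path(tmp, "native")
    legacy_dir.mkdir()
    native_dir.mkdir()
    (legacy_dir / "coc_001.pdf").write_bytes(b"same bytes")
    (native_dir / "coc_001.pdf").write_bytes(b"same bytes")
    (legacy_dir / "coc_002.pdf").write_bytes(b"legacy render")
    (native_dir / "coc_002.pdf").write_bytes(b"native render")
    demo_result = differing_pdfs(legacy_dir, native_dir)

print(demo_result)
```

PDF renderers often embed creation timestamps or similar metadata, so a byte mismatch may need a visual check before it is treated as a real content difference.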
### 8.7 Performance baseline
Measure on the clone:
- Plant Overview load time with N active steps (grouped by work_centre)
- Job form open time with 50-step recipe
- Job traveller PDF render time
- Job sticker PDF render time
- Migration script runtime (target: < 30 min on entech-scale data)
If anything is significantly slower than the legacy MO/WO flow, investigate
indexes (M2M tables, related stores) before cutover.
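To make "significantly slower" concrete, the measurements can be checked against the 20% tolerance used in the sign-off criteria (§8.9). The sketch assumes "within 20%" means "no more than 20% slower than legacy". The measurement names and timings are placeholders.

```python
def regressions(legacy_seconds, native_seconds, tolerance=0.20):
    """Return {name: (legacy, native)} for measurements more than `tolerance` slower."""
    slow = {}
    for name, legacy in legacy_seconds.items():
        native = native_seconds[name]
        if native > legacy * (1 + tolerance):
            slow[name] = (legacy, native)
    return slow


# Placeholder timings in seconds (not measured values).
legacy = {"plant_overview_load": 2.0, "job_form_open": 1.0}
native = {"plant_overview_load": 2.3, "job_form_open": 1.5}
print(regressions(legacy, native))
```

Here `plant_overview_load` passes (2.3 s is under the 2.4 s limit) while `job_form_open` is flagged for investigation.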
### 8.8 Rollback test
On the clone, simulate a rollback:
1. Restore the pre-cutover snapshot.
2. Verify legacy MO/WO data is intact.
3. Verify the `fusion_plating_jobs` module is still installed but inert
(flag is False).
4. Verify nothing in bridge_mrp / fusion_plating_reports / shopfloor /
notifications regressed.
Rollback safety is the most important thing to prove before live cutover.
### 8.9 Sign-off criteria
Before scheduling Phase 9:
- [ ] All Phase 1+2 tests pass (50+ tests)
- [ ] Migration script runs cleanly on clone with 0 errors in audit log
- [ ] Pre/post audit counts match
- [ ] 100 sample CoCs byte-identical
- [ ] All performance baselines within 20% of legacy
- [ ] Rollback test successful
If any item fails, identify the gap, fix in `feat/fp-native-job-model`, and
re-run §§ 8.2-8.8.
---
## Phase 9 — Cutover weekend (1 calendar day, ~4 hours active work)
### 9.1 Pre-cutover communication (T-7 days)
- Email entech operators: "Friday MM/DD evening: ~4 hours offline for
system upgrade. Monday morning normal."
- Brief 2-3 plating managers on the new menu and the demo path.
- Confirm Saturday on-site presence: 1 manager + 1 tech (you).
### 9.2 Friday 6pm — stop new work
- Operators wrap up active jobs. No new SO confirms. No new WOs started.
- Verify no in_progress WOs left running. Pause any timers.
### 9.3 Friday 8pm — backup
```bash
# Full DB dump
ssh pve-worker5 "pct exec 111 -- bash -c 'su - postgres -c \"pg_dump admin\" > /var/backups/admin_pre_fp_jobs_$(date +%Y%m%d).sql'"
# Filesystem snapshot
ssh pve-worker5 "pct snapshot 111 pre_fp_jobs_cutover"
```
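Before relying on the dump for rollback, it is worth confirming that it was not truncated. Plain-format `pg_dump` output normally ends with a `PostgreSQL database dump complete` comment, so a minimal sketch can look for that trailer near the end of the file. The helper name `dump_looks_complete` is hypothetical, and the demo writes small stand-in files instead of reading a real backup.

```python
import os
import tempfile

TRAILER = b"PostgreSQL database dump complete"


def dump_looks_complete(path, tail_bytes=4096):
    """True if the file is non-empty and the pg_dump trailer appears near its end."""
    size = os.path.getsize(path)
    if size == 0:
        return False
    with open(path, "rb") as handle:
        handle.seek(max(0, size - tail_bytes))
        return TRAILER in handle.read()


# Stand-in files: one with the trailer, one cut off mid-COPY.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.sql")
    truncated = os.path.join(tmp, "truncated.sql")
    with open(good, "wb") as fh:
        fh.write(b"CREATE TABLE t (id int);\n--\n-- PostgreSQL database dump complete\n--\n")
    with open(truncated, "wb") as fh:
        fh.write(b"CREATE TABLE t (id int);\nCOPY t (id) FROM stdin;\n1\n")
    good_ok = dump_looks_complete(good)
    truncated_ok = dump_looks_complete(truncated)

print(good_ok, truncated_ok)
```

This only detects an incomplete file. It does not prove the dump restores cleanly, which the Phase 8 rollback test covers.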
Tag the current commit:
```bash
cd /Users/gurpreet/Github/Odoo-Modules
git tag -a pre-cutover-$(date +%Y%m%d) -m "Pre-cutover backup point"
git push origin pre-cutover-$(date +%Y%m%d)
```
### 9.4 Friday 9pm — deploy + migrate
1. Deploy the latest `fusion_plating_jobs` to entech (it should already be
installed from Phase 7 development; just refresh).
```bash
# Sync feat/fp-native-job-model branch state to entech if not already
# (skip if entech is already on this branch)
```
2. Update the module:
```bash
ssh pve-worker5 "pct exec 111 -- bash -c 'systemctl stop odoo && su - odoo -s /bin/bash -c \"/usr/bin/odoo -c /etc/odoo/odoo.conf -d admin -u fusion_plating_jobs --stop-after-init\" && systemctl start odoo'"
```
3. Run the migration:
```bash
ssh pve-worker5 "pct exec 111 -- bash -c 'systemctl stop odoo && su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/migrate_to_fp_jobs.py"
```
4. Verify with the post-audit script.
5. Toggle the cutover flag:
```bash
# Via odoo shell:
env['ir.config_parameter'].sudo().set_param('fusion_plating_jobs.use_native_jobs', 'True')
env.cr.commit()
```
6. Restart Odoo.
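After step 5 it can help to read the flag back before restarting. The sketch below assumes the parameter is stored as the string `'True'` (as written in step 5) and that anything else means disabled. How the module itself interprets the value is not confirmed here. `native_jobs_enabled` and `FakeParams` are hypothetical; `FakeParams` only exists so the example runs outside Odoo.

```python
FLAG_KEY = "fusion_plating_jobs.use_native_jobs"  # key used in step 5


def native_jobs_enabled(params):
    """`params` is any object with get_param(key, default),
    e.g. env['ir.config_parameter'].sudo() inside an odoo shell."""
    raw = params.get_param(FLAG_KEY, "False")
    return str(raw).strip().lower() in ("true", "1")


class FakeParams:
    """Stand-in for ir.config_parameter so the sketch runs without Odoo."""

    def __init__(self, values):
        self._values = values

    def get_param(self, key, default=False):
        return self._values.get(key, default)


print(native_jobs_enabled(FakeParams({FLAG_KEY: "True"})))
print(native_jobs_enabled(FakeParams({})))
```

Inside an odoo shell the call would be `native_jobs_enabled(env['ir.config_parameter'].sudo())`.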
### 9.5 Friday 10pm — smoke test
Same as §8.5 but on live entech. If anything fails, restore backup
(§9.7) and abort.
### 9.6 Saturday/Sunday — buffer
Shop is offline weekends. Use the time to:
- Fix anything that surfaced during smoke test
- Run additional spot checks on historical jobs
- Verify that print menus default to the new reports for new jobs
- Test sticker scans on a phone
### 9.7 Rollback procedure (if needed by Sunday evening)
If unrecoverable issues:
```bash
# Stop Odoo
ssh pve-worker5 "pct exec 111 -- systemctl stop odoo"
# Option 1: restore the DB dump.
# Set BACKUP_DATE to the YYYYMMDD stamp of the dump taken in §9.3.
BACKUP_DATE=YYYYMMDD
ssh pve-worker5 "pct exec 111 -- bash -c 'su - postgres -c \"dropdb admin && createdb admin && psql admin < /var/backups/admin_pre_fp_jobs_${BACKUP_DATE}.sql\"'"
# Option 2 (instead of option 1): restore the container snapshot
# (faster, but loses any post-snapshot DB writes)
ssh pve-worker5 "pct rollback 111 pre_fp_jobs_cutover"
# Start Odoo (make sure the container itself is running first)
ssh pve-worker5 "pct exec 111 -- systemctl start odoo"
# Communicate to operators that we're back on the legacy flow
```
After day 7, rollback becomes "forward fix only" — too much new shop activity
to restore.
### 9.8 Monday 7am — operators back on
- 1 manager + 1 tech on site for the first 2 hours
- Walk operators through the new menu (Plating Jobs (new) → Jobs)
- Watch for confusion or errors
- Field tickets as they come in
---
## Phase 10 — Burn-in (2 weeks calendar, ~1 day active work)
### 10.1 Daily monitoring (Days 1-14)
Check daily:
- Odoo error log: `tail -f /var/log/odoo/odoo-server.log | grep -i error`
- Job creation rate: `SELECT COUNT(*) FROM fp_job WHERE create_date > now() - interval '1 day'`
- Step creation rate: `SELECT COUNT(*) FROM fp_job_step WHERE create_date > now() - interval '1 day'`
- Failed lifecycle hooks: `grep -c "failed to" /var/log/odoo/odoo-server.log`
- Operator support tickets
Run `audit_post_migration.py` weekly to catch any drift.
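The two log checks above can be combined into one daily summary. This is a minimal sketch: `summarize_log` is a hypothetical helper, and the sample lines are invented for illustration, not real Odoo log output.

```python
def summarize_log(lines):
    """Count lines matching the two daily log checks."""
    summary = {"error_lines": 0, "failed_hook_lines": 0}
    for line in lines:
        if "error" in line.lower():  # mirrors `grep -i error`
            summary["error_lines"] += 1
        if "failed to" in line:  # mirrors `grep -c "failed to"` (case-sensitive)
            summary["failed_hook_lines"] += 1
    return summary


# Invented sample lines (placeholders only).
sample = [
    "2026-05-01 08:00:01 INFO admin odoo.modules: loaded",
    "2026-05-01 08:02:13 ERROR admin odoo.http: something broke",
    "2026-05-01 08:05:40 WARNING admin fusion_plating_jobs: failed to create draft cert",
]
print(summarize_log(sample))
```

Against the live log the function could be fed an open file handle, for example `summarize_log(open('/var/log/odoo/odoo-server.log'))`, and the counts recorded each day to spot trends.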
### 10.2 Forward-fix
Anything that surfaces during burn-in goes through the standard PR/review
workflow on `feat/fp-native-job-model` (or a new follow-up branch). The
underlying data layer is locked — fixes are mostly UI/report polish.
### 10.3 Day 14 — drop legacy snapshots
After 14 days of stable operation:
```bash
# Drop the pre-cutover snapshot
ssh pve-worker5 "pct delsnapshot 111 pre_fp_jobs_cutover"
# Optional: archive the SQL backup off-site
mv /var/backups/admin_pre_fp_jobs_*.sql /off-site/long-term-archive/
```
### 10.4 Bridge_mrp deprecation
`fusion_plating_bridge_mrp` is still installed but inert (the SO confirm
hook only fires when `x_fc_use_native_jobs=False`, which is never the case
post-cutover). Options for full deprecation:
A) Leave it installed forever. Zero impact.
B) Archive (set `installable=False` in its manifest, so a future re-install
wouldn't activate it).
C) Uninstall (write an uninstall hook that drops the bridge tables but
preserves the data already migrated to fp.job).
Recommend (A) for the first 6 months, then revisit.
### 10.5 Phase-end polish
The list of deferred Minor items from Phase 1-7 reviews:
- `currency_id required=True` on fp.work.centre and fp.job (and ondelete
policies on M2Os uniformly across both core and jobs)
- `tracking=True` on fp.job.manager_id, facility_id
- `digits='Product Unit of Measure'` on qty
- `_('New')` translation safety in create
- Field labels: "Reference Product" → cleaner string
- Recipe boolean tests on fp.job.step
- `index=True` on M2Os queried frequently (recipe_id, partner_id)
- Author/website/maintainer block in fusion_plating_jobs manifest
- i18n wrapping (`_()`) on user-visible strings
- `_compute_state_ready` for fp.job.step pending → ready transition (Task 1.5
TODO)
- `button_pause` / `button_skip` / `button_cancel` real implementations
- Operator UI rewrite (Plant Overview, Tablet Station, Manager Dashboard,
Process Tree OWL component) — Phase 6 deferral
These can be batched into one polish PR after burn-in completes (Day 14+).
---
## Appendix A — Communication templates
### Email to operators (T-7)
> Subject: System maintenance Friday night (~4 hours)
>
> Team, we're upgrading the Fusion Plating Jobs system on Friday MM/DD,
> from 9pm Friday through Saturday morning. The shop will be offline during
> that window. By Monday 7am everything will be normal except you'll see a
> new "Plating Jobs (new)" menu in addition to the existing menus. Same data,
> better workflow. Manager + tech will be on site Monday morning to help.
>
> No action needed from you. Just don't start any new jobs after 6pm Friday.
>
> Questions? Reply or ping the manager.
### Manager briefing (T-3)
Walk through:
1. The new menu structure
2. The settings flag and how to toggle it
3. The migration script and rollback procedure
4. What to do if an operator reports a bug Monday morning
---
## Appendix B — Open decisions for the user before Phase 9
Schedule the cutover weekend with at least 4 weeks' notice. Confirm:
1. Date of cutover weekend
2. Which manager will be on-site Monday morning
3. Whether to keep the legacy menus visible after cutover (recommend: yes,
for the first 14 days, then hide via group permission)
4. Whether to send the operator email template above as-is or customize
5. Acceptance criteria for "burn-in complete" (recommend: 14 days zero
critical errors, zero operator support tickets that map to migration
issues)
---
## Appendix C — File checklist before Phase 8 starts
Verify these are present (committed to feat/fp-native-job-model):
- [x] `fusion_plating_jobs/__manifest__.py` — version >= 19.0.2.0.0, depends on 9 modules
- [x] `fusion_plating_jobs/models/fp_job.py` — _inherit with all extension fields, hooks, helpers, legacy_id
- [x] `fusion_plating_jobs/models/fp_job_node_override.py` — override model
- [x] `fusion_plating_jobs/models/sale_order.py` — SO confirm hook
- [x] `fusion_plating_jobs/models/res_config_settings.py` — flag
- [x] `fusion_plating_jobs/models/fp_portal_job.py` — x_fc_job_id link
- [x] `fusion_plating_jobs/models/fp_batch.py` — x_fc_step_id / x_fc_job_id
- [x] `fusion_plating_jobs/models/fp_quality_hold.py` — x_fc_job_id / x_fc_step_id
- [x] `fusion_plating_jobs/models/fp_certificate.py` — x_fc_job_id
- [x] `fusion_plating_jobs/models/fp_thickness_reading.py` — x_fc_job_id / x_fc_step_id
- [x] `fusion_plating_jobs/models/fp_delivery.py` — x_fc_job_id
- [x] `fusion_plating_jobs/models/fp_racking_inspection.py` — x_fc_job_id
- [x] `fusion_plating_jobs/models/account_move.py` — invoice → job hook
- [x] `fusion_plating_jobs/models/fp_notification_trigger.py` — job_confirmed/job_complete events
- [x] `fusion_plating_jobs/models/fusion_plating_kpi_value.py` — x_fc_source tag
- [x] `fusion_plating_jobs/views/res_config_settings_views.xml` — settings UI
- [x] `fusion_plating_jobs/report/report_fp_job_sticker.xml` — sticker
- [x] `fusion_plating_jobs/report/report_fp_job_traveller.xml` — traveller
- [x] `fusion_plating_jobs/controllers/job_scan.py` — /fp/job/<id>
- [x] `fusion_plating_jobs/controllers/process_tree.py` — /fp/jobs/process_tree
- [x] `fusion_plating_jobs/scripts/audit_pre_migration.py`
- [x] `fusion_plating_jobs/scripts/migrate_to_fp_jobs.py`
- [x] `fusion_plating_jobs/scripts/audit_post_migration.py`
- [x] `fusion_plating_jobs/scripts/README.md`
- [x] `fusion_plating_jobs/README.md` — Phase 6 deferrals doc
- [x] `fusion_plating_jobs/security/ir.model.access.csv` — ACL rows
- [x] `fusion_plating_jobs/tests/test_fp_job_extensions.py` — comprehensive test suite
If anything in this list is missing, fix before Phase 8.
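The checklist can also be verified mechanically against a checkout of the branch. The sketch below lists only a few of the paths above for brevity and assumes they are relative to whatever directory contains `fusion_plating_jobs/`. `missing_files` is a hypothetical helper, and the demo uses a temporary directory rather than the real repository.

```python
import tempfile
from pathlib import Path

# A subset of the checklist; extend with the remaining paths as needed.
REQUIRED = [
    "fusion_plating_jobs/__manifest__.py",
    "fusion_plating_jobs/scripts/audit_pre_migration.py",
    "fusion_plating_jobs/scripts/migrate_to_fp_jobs.py",
    "fusion_plating_jobs/scripts/audit_post_migration.py",
]


def missing_files(root, relative_paths):
    """Return the relative paths that do not exist as files under `root`."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# Demo: only the manifest exists, so the three scripts are reported missing.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp, REQUIRED[0])
    present.parent.mkdir(parents=True)
    present.write_text("{}")
    demo_missing = missing_files(tmp, REQUIRED)

print(demo_missing)
```

An empty result for the full list is the condition for starting Phase 8. Presence alone does not confirm the version or content noted next to each entry.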