Odoo-Modules/fusion_plating/docs/superpowers/specs/2026-04-25-fp-native-job-cutover-runbook.md
gsinghpal f2f98aa9f6 docs(jobs): Phase 8/9/10 cutover runbook
Documents:
- Phase 8: 5-day E2E test plan on entech-clone (snapshot, migration,
  audits, smoke tests, rollback test, sign-off criteria)
- Phase 9: Cutover weekend runbook (Friday 6pm stop → Sunday buffer
  → Monday 7am operators back). 4 hours active work.
- Phase 10: 2-week burn-in monitoring + rollback safety net + Day
  14 snapshot drop. Bridge_mrp deprecation options.
- Phase-end polish task list (deferred Minor items from Phase 1-7
  reviews + the Phase 6 operator UI rewrite).
- Communication templates (operator email, manager briefing).
- Open decisions for user before Phase 9 starts.
- File checklist confirming all Phase 1-7 deliverables present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 00:17:57 -04:00


Native Job Model — Cutover Runbook (Phases 8, 9, 10)

Date: 2026-04-25
Owner: Nexa Systems
Status: Draft. Verify each step on entech-clone before live cutover.
Predecessor: Phases 1-7 complete (commits up to current HEAD on feat/fp-native-job-model).
Spec: docs/superpowers/specs/2026-04-25-fp-native-job-model-design.md.
Plan: docs/superpowers/plans/2026-04-25-fp-native-job-model.md.

This runbook covers the operational phases of the migration:

  • Phase 8 — End-to-end testing on a clone of entech (~5 days)
  • Phase 9 — Live cutover weekend (4 hour window)
  • Phase 10 — 2-week burn-in with rollback safety net

Phase 8 — E2E testing on entech-clone (5 days)

8.1 Prepare the clone

  1. Snapshot live entech: pct snapshot 111 pre_fp_jobs_clone on pve-worker5.
  2. Spin up a sibling LXC (e.g. entech-clone at LXC 511 / pve-worker5).
    • Restore from the snapshot
    • Configure new IP: 10.200.1.27 (so it doesn't conflict with live entech at 10.200.1.26)
    • Update odoo.conf to a separate database name e.g. admin_clone
  3. Update Tailscale: add entech-clone to your Tailscale ACL so SSH works.
  4. Verify clone independence: any DB writes on entech-clone must NOT bleed to live entech. Different DB name, different IP.

8.2 Pre-migration audit

Run on entech-clone:

ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/audit_pre_migration.py"

Expected output: counts of MOs, WOs, dependent records, data quality flags.

Capture the baseline numbers in phase8_baseline.txt for diffing later.

8.3 Run migration

ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/migrate_to_fp_jobs.py"

Watch for errors in the output. Audit log at /tmp/fp_jobs_migration.log.

8.4 Post-migration audit

ssh pve-worker5 "pct exec 511 -- bash -c 'su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin_clone\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/audit_post_migration.py"

Verify:

  • fp.job count == mrp.production count (every MO has a mirror)
  • fp.job.step count == mrp.workorder count
  • Dependent x_fc_*_id counts match production_id / workorder_id counts

If any count does not match, dig into the audit log for errors.
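The count checks above can be automated once the audit numbers are in hand. The following is a minimal sketch, assuming the counts have been collected into a plain dictionary keyed by model name; that format, the `EXPECTED_PAIRS` list, and the `find_count_mismatches` helper are hypothetical and not part of the audit scripts.

```python
# Hypothetical helper: the dictionary keys are illustrative assumptions,
# not the actual output format of audit_post_migration.py.
EXPECTED_PAIRS = [
    ("mrp.production", "fp.job"),      # every MO should have a mirror job
    ("mrp.workorder", "fp.job.step"),  # every WO should have a mirror step
]


def find_count_mismatches(counts, pairs=EXPECTED_PAIRS):
    """Return (legacy, native, legacy_count, native_count) for each pair that differs."""
    mismatches = []
    for legacy, native in pairs:
        legacy_count = counts.get(legacy, 0)
        native_count = counts.get(native, 0)
        if legacy_count != native_count:
            mismatches.append((legacy, native, legacy_count, native_count))
    return mismatches
```

An empty list means the pairs line up. The dependent x_fc_*_id checks could be added as further pairs if the audit output exposes them.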

8.5 Smoke test the new flow

Manual on the clone via browser:

  1. Toggle x_fc_use_native_jobs=True in Settings → Fusion Plating Jobs.
  2. Create a new SO with a plating line.
  3. Confirm the SO. Verify a WH/JOB/... record appears in Plating Jobs (new) menu.
  4. Verify the recipe steps generated correctly.
  5. Open a step, click Start, then Finish. Verify timelog row, duration_actual, cost_total all populate.
  6. Print the new Job Sticker (6×4"). Verify QR scans to /fp/job/<id> and redirects to the form.
  7. Print the Job Traveller. Verify all steps listed.
  8. Click Mark Done on the job. Verify state=done, draft delivery created, draft cert created (best-effort).

8.6 Replay 30 days of activity

Identify the last 30 days of MO activity on entech (pre-clone) and replay those operator actions through the new flow on the clone. Look for:

  • Operations that succeeded on the legacy flow but error on native
  • Reports that render differently
  • Cost / margin numbers that differ between legacy and native

Diff certificates byte-for-byte: render 100 random CoC PDFs from the legacy flow and from the corresponding migrated native jobs. Each pair should be visually identical. Any difference is an audit-grade red flag (Nadcap / aerospace).
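The byte-for-byte comparison can be scripted once both sets of PDFs are on disk. This is a sketch only, assuming each certificate is rendered to a file with the same name in two directories; the directory layout and the `diff_pdf_dirs` helper are assumptions, not something the runbook's scripts provide.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_pdf_dirs(legacy_dir, native_dir):
    """Compare same-named PDFs in two directories.

    Returns (differing, missing): names whose bytes differ, and names
    present in legacy_dir but absent from native_dir.
    """
    differing, missing = [], []
    for legacy_pdf in sorted(Path(legacy_dir).glob("*.pdf")):
        native_pdf = Path(native_dir) / legacy_pdf.name
        if not native_pdf.exists():
            missing.append(legacy_pdf.name)
        elif sha256_of(legacy_pdf) != sha256_of(native_pdf):
            differing.append(legacy_pdf.name)
    return differing, missing
```

PDF generators commonly embed a creation timestamp, so a hash mismatch may be metadata only. Treat any file in `differing` as a prompt for a visual comparison rather than as proof of a rendering change.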

8.7 Performance baseline

Measure on the clone:

  • Plant Overview load time with N active steps (grouped by work_centre)
  • Job form open time with 50-step recipe
  • Job traveller PDF render time
  • Job sticker PDF render time
  • Migration script runtime (target: < 30 min on entech-scale data)

If anything is significantly slower than the legacy MO/WO flow, investigate indexes (M2M relation tables, stored related fields) before cutover.
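To make "significantly slower" concrete, the timings can be checked against the 20% tolerance that the sign-off criteria in §8.9 use. A minimal sketch, assuming timings are recorded by hand as seconds per metric; the metric names and the `slower_than_legacy` helper are hypothetical.

```python
def slower_than_legacy(legacy_seconds, native_seconds, tolerance=0.20):
    """Return {metric: ratio} where native exceeds legacy by more than tolerance.

    Both arguments map a metric name to a measured duration in seconds.
    Metrics missing from native_seconds, or with a non-positive legacy
    timing, are skipped because there is nothing to compare against.
    """
    flagged = {}
    for metric, legacy in legacy_seconds.items():
        native = native_seconds.get(metric)
        if native is None or legacy <= 0:
            continue
        ratio = native / legacy
        if ratio > 1 + tolerance:
            flagged[metric] = round(ratio, 2)
    return flagged
```

Anything the function returns is a candidate for the index investigation described above.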

8.8 Rollback test

On the clone, simulate a rollback:

  1. Restore the pre-cutover snapshot.
  2. Verify legacy MO/WO data is intact.
  3. Verify the fusion_plating_jobs module is still installed but inert (flag is False).
  4. Verify nothing in bridge_mrp / fusion_plating_reports / shopfloor / notifications regressed.

Rollback safety is the most important thing to prove before live cutover.

8.9 Sign-off criteria

Before scheduling Phase 9:

  • All Phase 1+2 tests pass (50+ tests)
  • Migration script runs cleanly on clone with 0 errors in audit log
  • Pre/post audit counts match
  • 100 sample CoCs byte-identical
  • All performance baselines within 20% of legacy
  • Rollback test successful

If any item fails, identify the gap, fix in feat/fp-native-job-model, and re-run §§ 8.2-8.8.


Phase 9 — Cutover weekend (1 calendar day, ~4 hours active work)

9.1 Pre-cutover communication (T-7 days)

  • Email entech operators: "Friday MM/DD evening: ~4 hours offline for system upgrade. Monday morning normal."
  • Brief 2-3 plating managers on the new menu and the demo path.
  • Confirm Saturday on-site presence: 1 manager + 1 tech (you).

9.2 Friday 6pm — stop new work

  • Operators wrap up active jobs. No new SO confirms. No new WOs started.
  • Verify no in_progress WOs left running. Pause any timers.

9.3 Friday 8pm — backup

# Full DB dump
ssh pve-worker5 "pct exec 111 -- bash -c 'su - postgres -c \"pg_dump admin\" > /var/backups/admin_pre_fp_jobs_$(date +%Y%m%d).sql'"

# Filesystem snapshot
ssh pve-worker5 "pct snapshot 111 pre_fp_jobs_cutover"

Tag the current commit:

cd /Users/gurpreet/Github/Odoo-Modules
git tag -a pre-cutover-$(date +%Y%m%d) -m "Pre-cutover backup point"
git push origin pre-cutover-$(date +%Y%m%d)

9.4 Friday 9pm — deploy + migrate

  1. Deploy the latest fusion_plating_jobs to entech (it should already be installed from Phase 7 development; just refresh).
# Sync feat/fp-native-job-model branch state to entech if not already
# (skip if entech is already on this branch)
  2. Update the module:
ssh pve-worker5 "pct exec 111 -- bash -c 'systemctl stop odoo && su - odoo -s /bin/bash -c \"/usr/bin/odoo -c /etc/odoo/odoo.conf -d admin -u fusion_plating_jobs --stop-after-init\" && systemctl start odoo'"
  3. Run the migration:
ssh pve-worker5 "pct exec 111 -- bash -c 'systemctl stop odoo && su - odoo -s /bin/bash -c \"/usr/bin/odoo shell -c /etc/odoo/odoo.conf -d admin\"' < /mnt/extra-addons/custom/fusion_plating_jobs/scripts/migrate_to_fp_jobs.py"
  4. Verify with the post-audit script.
  5. Toggle the cutover flag:

# Via odoo shell:
env['ir.config_parameter'].sudo().set_param('fusion_plating_jobs.use_native_jobs', 'True')
env.cr.commit()
  6. Restart Odoo.

9.5 Friday 10pm — smoke test

Same as §8.5 but on live entech. If anything fails, restore backup (§9.7) and abort.

9.6 Saturday/Sunday — buffer

Shop is offline weekends. Use the time to:

  • Fix anything that surfaced during smoke test
  • Run additional spot checks on historical jobs
  • Verify that print menus default to the new reports for new jobs
  • Test sticker scans on a phone

9.7 Rollback procedure (if needed by Sunday evening)

If unrecoverable issues:

# Stop Odoo
ssh pve-worker5 "pct exec 111 -- systemctl stop odoo"

# Restore DB
ssh pve-worker5 "pct exec 111 -- bash -c 'su - postgres -c \"dropdb admin && createdb admin && psql admin < /var/backups/admin_pre_fp_jobs_<date>.sql\"'"

# Or restore container snapshot (faster, but loses any post-snapshot DB writes)
ssh pve-worker5 "pct rollback 111 pre_fp_jobs_cutover"

# Start Odoo
ssh pve-worker5 "pct exec 111 -- systemctl start odoo"

# Communicate to operators that we're back on the legacy flow

After day 7, rollback becomes "forward fix only" — too much new shop activity to restore.

9.8 Monday 7am — operators back on

  • 1 manager + 1 tech on site for the first 2 hours
  • Walk operators through the new menu (Plating Jobs (new) → Jobs)
  • Watch for confusion or errors
  • Field tickets as they come in

Phase 10 — Burn-in (2 weeks calendar, ~1 day active work)

10.1 Daily monitoring (Days 1-14)

Check daily:

  • Odoo error log: tail -f /var/log/odoo/odoo-server.log | grep -i error
  • Job creation rate: SELECT COUNT(*) FROM fp_job WHERE create_date > now() - interval '1 day'
  • Step creation rate: SELECT COUNT(*) FROM fp_job_step WHERE create_date > now() - interval '1 day'
  • Failed lifecycle hooks: grep -c "failed to" /var/log/odoo/odoo-server.log
  • Operator support tickets

Run audit_post_migration.py weekly to catch any drift.
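The two log greps in the daily checklist can be folded into one pass so the numbers are easy to record each day. A sketch under the assumption that a plain substring match is good enough, as it is for the grep commands above; the `summarize_log` helper is hypothetical.

```python
import re

ERROR_RE = re.compile(r"error", re.IGNORECASE)  # mirrors `grep -i error`
FAILED_HOOK = "failed to"                       # mirrors `grep -c "failed to"`


def summarize_log(lines):
    """Count the lines that each of the two daily grep checks would match."""
    summary = {"error_lines": 0, "failed_hook_lines": 0}
    for line in lines:
        if ERROR_RE.search(line):
            summary["error_lines"] += 1
        if FAILED_HOOK in line:
            summary["failed_hook_lines"] += 1
    return summary
```

To use it, pass an open file handle for /var/log/odoo/odoo-server.log, for example `summarize_log(open(path, errors="replace"))`. The counts cover the whole file, so compare them day over day or rotate the log between checks.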

10.2 Forward-fix

Anything that surfaces during burn-in goes through the standard PR/review workflow on feat/fp-native-job-model (or a new follow-up branch). The underlying data layer is locked — fixes are mostly UI/report polish.

10.3 Day 14 — drop legacy snapshots

After 14 days of stable operation:

# Drop the pre-cutover snapshot
ssh pve-worker5 "pct delsnapshot 111 pre_fp_jobs_cutover"

# Optional: archive the SQL backup off-site
mv /var/backups/admin_pre_fp_jobs_*.sql /off-site/long-term-archive/

10.4 Bridge_mrp deprecation

fusion_plating_bridge_mrp is still installed and inert (the SO confirm hook only fires when x_fc_use_native_jobs=False, which it never is post-cutover). Options for full deprecation:

  A) Leave it installed forever. Zero impact.
  B) Archive (set installable=False in its manifest, so a future re-install wouldn't activate it).
  C) Uninstall (write an uninstall hook that drops the bridge tables but preserves the data already migrated to fp.job).

Recommend (A) for the first 6 months, then revisit.

10.5 Phase-end polish

The list of deferred Minor items from Phase 1-7 reviews:

  • currency_id required=True on fp.work.centre and fp.job (and ondelete policies on M2Os uniformly across both core and jobs)
  • tracking=True on fp.job.manager_id, facility_id
  • digits='Product Unit of Measure' on qty
  • _('New') translation safety in create
  • Field labels: "Reference Product" → cleaner string
  • Recipe boolean tests on fp.job.step
  • index=True on M2Os queried frequently (recipe_id, partner_id)
  • Author/website/maintainer block in fusion_plating_jobs manifest
  • i18n wrapping (_()) on user-visible strings
  • _compute_state_ready for fp.job.step pending → ready transition (Task 1.5 TODO)
  • button_pause / button_skip / button_cancel real implementations
  • Operator UI rewrite (Plant Overview, Tablet Station, Manager Dashboard, Process Tree OWL component) — Phase 6 deferral

These can be batched into one polish PR after burn-in completes (Day 14+).


Appendix A — Communication templates

Email to operators (T-7)

Subject: System maintenance Saturday — ~4 hours

Team, we're upgrading the Fusion Plating Jobs system from 9pm Friday MM/DD through Saturday morning. The system will be offline during that window. By Monday 7am everything will be normal, except that you'll see a new "Plating Jobs (new)" menu in addition to the existing menus. Same data, better workflow. A manager and a tech will be on site Monday morning to help.

No action needed from you. Just don't start any new jobs after 6pm Friday.

Questions? Reply or ping the manager.

Manager briefing (T-3)

Walk through:

  1. The new menu structure
  2. The settings flag and how to toggle it
  3. The migration script and rollback procedure
  4. What to do if an operator reports a bug Monday morning

Appendix B — Open decisions for the user before Phase 9

Schedule the cutover weekend with at least 4 weeks notice. Confirm:

  1. Date of cutover weekend
  2. Which manager will be on-site Monday morning
  3. Whether to keep the legacy menus visible after cutover (recommend: yes, for the first 14 days, then hide via group permission)
  4. Whether to send the operator email template above as-is or customize
  5. Acceptance criteria for "burn-in complete" (recommend: 14 days zero critical errors, zero operator support tickets that map to migration issues)

Appendix C — File checklist before Phase 8 starts

Verify these are present (committed to feat/fp-native-job-model):

  • fusion_plating_jobs/__manifest__.py — version >= 19.0.2.0.0, depends on 9 modules
  • fusion_plating_jobs/models/fp_job.py — _inherit with all extension fields, hooks, helpers, legacy_id
  • fusion_plating_jobs/models/fp_job_node_override.py — override model
  • fusion_plating_jobs/models/sale_order.py — SO confirm hook
  • fusion_plating_jobs/models/res_config_settings.py — flag
  • fusion_plating_jobs/models/fp_portal_job.py — x_fc_job_id link
  • fusion_plating_jobs/models/fp_batch.py — x_fc_step_id / x_fc_job_id
  • fusion_plating_jobs/models/fp_quality_hold.py — x_fc_job_id / x_fc_step_id
  • fusion_plating_jobs/models/fp_certificate.py — x_fc_job_id
  • fusion_plating_jobs/models/fp_thickness_reading.py — x_fc_job_id / x_fc_step_id
  • fusion_plating_jobs/models/fp_delivery.py — x_fc_job_id
  • fusion_plating_jobs/models/fp_racking_inspection.py — x_fc_job_id
  • fusion_plating_jobs/models/account_move.py — invoice → job hook
  • fusion_plating_jobs/models/fp_notification_trigger.py — job_confirmed/job_complete events
  • fusion_plating_jobs/models/fusion_plating_kpi_value.py — x_fc_source tag
  • fusion_plating_jobs/views/res_config_settings_views.xml — settings UI
  • fusion_plating_jobs/report/report_fp_job_sticker.xml — sticker
  • fusion_plating_jobs/report/report_fp_job_traveller.xml — traveller
  • fusion_plating_jobs/controllers/job_scan.py — /fp/job/
  • fusion_plating_jobs/controllers/process_tree.py — /fp/jobs/process_tree
  • fusion_plating_jobs/scripts/audit_pre_migration.py
  • fusion_plating_jobs/scripts/migrate_to_fp_jobs.py
  • fusion_plating_jobs/scripts/audit_post_migration.py
  • fusion_plating_jobs/scripts/README.md
  • fusion_plating_jobs/README.md — Phase 6 deferrals doc
  • fusion_plating_jobs/security/ir.model.access.csv — ACL rows
  • fusion_plating_jobs/tests/test_fp_job_extensions.py — comprehensive test suite

If anything in this list is missing, fix before Phase 8.
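The presence check can be scripted rather than eyeballed. This sketch lists only a few of the checklist paths for brevity and assumes `addons_root` is the directory that contains fusion_plating_jobs/; both the `missing_files` helper and that directory argument are hypothetical. It checks existence only, not the manifest version or file contents.

```python
from pathlib import Path

# A few entries from the checklist above; extend with the remaining paths.
REQUIRED_FILES = [
    "fusion_plating_jobs/__manifest__.py",
    "fusion_plating_jobs/models/fp_job.py",
    "fusion_plating_jobs/scripts/audit_pre_migration.py",
    "fusion_plating_jobs/scripts/migrate_to_fp_jobs.py",
    "fusion_plating_jobs/scripts/audit_post_migration.py",
    "fusion_plating_jobs/security/ir.model.access.csv",
]


def missing_files(addons_root, required=REQUIRED_FILES):
    """Return the checklist paths that do not exist as files under addons_root."""
    root = Path(addons_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

An empty result means every listed file is present in the checkout being inspected.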