mysql2postgres/config.py
alex 1430ef206f fix: Ensure complete node consolidation by ordering MySQL query by consolidation key
Root cause: Nodes 1-11 had IDs in the 132M+ range while nodes 12-22 had IDs in the
298-308 range, so keyset pagination by ID fetched them in batches thousands apart.
The two halves arrived as separate groups and were never unified into a single
consolidated row.

Solution: Order MySQL query by (UnitName, ToolNameID, EventDate, EventTime) instead
of by ID. This guarantees all rows for the same consolidation key arrive together,
ensuring they are grouped and consolidated into a single row with JSONB measurements
keyed by node number.
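
As an illustration of why ordered input makes grouping simple, here is a minimal sketch, not the code from this commit. It assumes rows arrive as dicts already sorted by the consolidation key; the `NodeNum` and `Value` column names and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# The four key columns come from the commit message.
KEY_COLUMNS = ("UnitName", "ToolNameID", "EventDate", "EventTime")
consolidation_key = itemgetter(*KEY_COLUMNS)


def consolidate(rows):
    """Yield one consolidated dict per consolidation key.

    Assumes `rows` is already sorted by KEY_COLUMNS (what the ORDER BY provides),
    because groupby only merges rows that are adjacent.
    """
    for key, group in groupby(rows, key=consolidation_key):
        # "NodeNum" and "Value" are assumed column names, not taken from the source.
        # Node numbers become strings because JSON object keys are strings.
        measurements = {str(row["NodeNum"]): row["Value"] for row in group}
        yield {**dict(zip(KEY_COLUMNS, key)), "measurements": measurements}


def example_rows():
    """Hypothetical rows: same key, IDs far apart, already in key order."""
    base = {"UnitName": "U1", "ToolNameID": "T1", "EventDate": "2025-01-01"}
    return [
        {**base, "id": 132_000_001, "EventTime": "00:00:00", "NodeNum": 1, "Value": 0.5},
        {**base, "id": 298, "EventTime": "00:00:00", "NodeNum": 12, "Value": 0.7},
        {**base, "id": 132_000_002, "EventTime": "00:10:00", "NodeNum": 1, "Value": 0.6},
    ]
```

Calling `list(consolidate(example_rows()))` produces two consolidated rows; the first carries both node 1 and node 12 even though their source IDs are far apart.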

Changes:
- fetch_consolidation_groups_from_partition(): Change from keyset pagination by ID
  to ORDER BY consolidation key. Simplify grouping logic, since ORDER BY already
  ensures that rows with the same key are consecutive.
- full_migration.py: Add cleanup of partial partitions on resume. When resuming and a
  partition was started but not completed, delete its incomplete data before
  re-processing to avoid duplicates. Also recalculate total_rows_migrated from the
  actual database count.
- config.py: Add postgres_pk field to TABLE_CONFIGS to specify correct primary key
  column names in PostgreSQL (id vs id_elab_data).
- Cleanup: Remove temporary test scripts used during debugging
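
The resume cleanup described in the list above could look roughly like the following sketch. It is not the code in full_migration.py: the `source_partition` column, the table name, and the use of an in-memory SQLite database as a stand-in for PostgreSQL are all assumptions made to keep the example self-contained.

```python
import sqlite3


def cleanup_partial_partition(conn, table, partition):
    """Delete rows left by an interrupted partition, then return the real row count.

    `source_partition` is a hypothetical column recording which MySQL partition
    a row came from. SQLite uses `?` placeholders; a PostgreSQL driver such as
    psycopg would use `%s` instead.
    """
    conn.execute(f"DELETE FROM {table} WHERE source_partition = ?", (partition,))
    conn.commit()
    # Recompute the migrated total from the database instead of trusting saved state.
    (total,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return total


def demo():
    """Two rows from a finished partition, one from an interrupted one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE elab (id INTEGER PRIMARY KEY, source_partition TEXT)")
    conn.executemany(
        "INSERT INTO elab (source_partition) VALUES (?)",
        [("p1",), ("p1",), ("p2",)],
    )
    return cleanup_partial_partition(conn, "elab", "p2")
```

Deleting before re-processing keeps the operation idempotent: running the resume path twice leaves the same rows behind as running it once.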

Performance note: ORDER BY consolidation key requires an index to be fast. The index
on (UnitName, ToolNameID, EventDate, EventTime) was created with ALGORITHM=INPLACE
LOCK=NONE to avoid blocking concurrent reads and writes.
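
For reference, a small sketch of how such an index statement might be built. The table name and the index name `idx_consolidation_key` are hypothetical; only the column list and the ALGORITHM/LOCK options come from this commit message.

```python
KEY_COLUMNS = ("UnitName", "ToolNameID", "EventDate", "EventTime")


def build_index_ddl(table, index_name="idx_consolidation_key"):
    """Return a MySQL ALTER TABLE statement that adds the index online.

    The table and index names are placeholders supplied by the caller.
    """
    cols = ", ".join(f"`{c}`" for c in KEY_COLUMNS)
    return (
        f"ALTER TABLE `{table}` ADD INDEX `{index_name}` ({cols}), "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )
```

If the server cannot honor the requested ALGORITHM or LOCK level for a given operation, MySQL rejects the statement with an error instead of silently falling back to a blocking rebuild, which makes the options a useful safety check.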

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-26 18:22:23 +01:00

5.1 KiB