Root cause: Nodes 1-11 had IDs in the 132M+ range while nodes 12-22 had IDs in the
298-308 range. With keyset pagination by ID, the two sets of nodes were fetched in
batches thousands apart, so they arrived as separate groups and were never
consolidated into a single row.
Solution: Order the MySQL query by (UnitName, ToolNameID, EventDate, EventTime)
instead of by ID. This guarantees that all rows for the same consolidation key arrive
together, so they are grouped and consolidated into a single row with JSONB
measurements keyed by node number.
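A minimal sketch of the grouping this ordering makes possible, not the actual
implementation: once rows arrive sorted by the consolidation key, one pass over
consecutive rows is enough. The function name and the "NodeNum" and "Value" column
names are assumptions for illustration; only the four key columns come from this
commit.

```python
from itertools import groupby
from operator import itemgetter

# The four columns named in this commit as the consolidation key.
KEY_COLUMNS = ("UnitName", "ToolNameID", "EventDate", "EventTime")
consolidation_key = itemgetter(*KEY_COLUMNS)


def consolidate_sorted_rows(rows):
    """Yield one consolidated dict per run of consecutive rows sharing a key.

    Assumes `rows` is already ordered by KEY_COLUMNS, as the new ORDER BY
    guarantees. "NodeNum" and "Value" are hypothetical column names.
    """
    for key, group in groupby(rows, key=consolidation_key):
        consolidated = dict(zip(KEY_COLUMNS, key))
        # Measurements keyed by node number, mirroring the JSONB column.
        consolidated["measurements"] = {
            str(row["NodeNum"]): row["Value"] for row in group
        }
        yield consolidated
```

groupby only merges adjacent rows, which is why the ordering matters: with rows ordered
by ID, nodes 1-11 and 12-22 of the same event would land in separate groups.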
Changes:
- fetch_consolidation_groups_from_partition(): Replace keyset pagination by ID with
ORDER BY on the consolidation key. Simplify grouping logic, since ORDER BY already
ensures that rows with the same key are consecutive.
- full_migration.py: Add cleanup of partial partitions on resume. If a partition was
started but not completed before the interruption, delete its incomplete data before
re-processing it to avoid duplicates. Also recalculate total_rows_migrated from the
actual database count.
- config.py: Add postgres_pk field to TABLE_CONFIGS to specify correct primary key
column names in PostgreSQL (id vs id_elab_data).
- Cleanup: Remove temporary test scripts used during debugging.
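The resume cleanup described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code in full_migration.py: it uses an
in-memory sqlite3 database as a stand-in for PostgreSQL, and the function name, the
table name and the partition-tracking column are all hypothetical.

```python
import sqlite3


def cleanup_partial_partition(conn, table, partition_column, partition):
    """Delete rows left by an interrupted partition and recount the table.

    `table` and `partition_column` are hypothetical names; they are
    interpolated into the SQL, so they must come from trusted config.
    sqlite3 uses "?" placeholders; a PostgreSQL driver would use its own style.
    """
    # Remove the incomplete data so re-processing cannot create duplicates.
    conn.execute(
        f"DELETE FROM {table} WHERE {partition_column} = ?", (partition,)
    )
    conn.commit()
    # Recalculate the migrated total from what is actually stored.
    (total_rows_migrated,) = conn.execute(
        f"SELECT COUNT(*) FROM {table}"
    ).fetchone()
    return total_rows_migrated
```

Recounting from the database rather than trusting the saved counter keeps
total_rows_migrated correct even when the interrupted run had already written rows.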
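Performance note: ORDER BY on the consolidation key needs a supporting index to run
efficiently. An index on (UnitName, ToolNameID, EventDate, EventTime) was created with
ALGORITHM=INPLACE LOCK=NONE so that the table stays available for concurrent reads
and writes while the index is built.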
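For reference, a small helper that builds an index statement of this shape. The
function name, table name and index name are placeholders, not the ones used in the
migration; only the column list and the ALGORITHM/LOCK clauses come from this commit.

```python
# Columns of the consolidation key, in the order used by ORDER BY.
KEY_COLUMNS = ("UnitName", "ToolNameID", "EventDate", "EventTime")


def build_online_index_ddl(table, index_name, columns=KEY_COLUMNS):
    """Return a MySQL ALTER TABLE statement that adds the index online.

    `table` and `index_name` are hypothetical; pass trusted values only,
    since identifiers are interpolated into the statement.
    """
    column_list = ", ".join(f"`{column}`" for column in columns)
    return (
        f"ALTER TABLE `{table}` "
        f"ADD INDEX `{index_name}` ({column_list}), "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )
```

If the server cannot honor ALGORITHM=INPLACE or LOCK=NONE for the requested change,
MySQL rejects the statement with an error instead of silently falling back to a
blocking rebuild.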
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>