- Fix get_partitions_from_year(): RAWDATACOR end_year now 2030 (was 2024)
- RAWDATACOR has partitions d12-d17 for years 2025-2030, not just part0-part10
- Update year_to_partition_name() for RAWDATACOR: handle both the "part" and "d" prefixes
- Years 2014-2024: use the "part" prefix with index (year - 2014)
- Years 2025-2030: use the "d" prefix with index (year - 2013), i.e. d12-d17
- Clamp year to range [2014, 2030] for RAWDATACOR
- Update docstring examples to reflect new mapping behavior
- Now correctly generates partition names such as part8, part9, part10, d12, d13, ..., d17 (see the sketch below)
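A minimal sketch of the resulting mapping; the exact clamping and error handling beyond what the bullets describe are assumptions:

```python
def year_to_partition_name(table: str, year: int) -> str:
    """Map a year to a MySQL partition name (sketch of the RAWDATACOR branch only)."""
    if table == "RAWDATACOR":
        # Clamp to the supported range [2014, 2030]
        year = max(2014, min(year, 2030))
        if year <= 2024:
            # 2014-2024 -> part0..part10
            return f"part{year - 2014}"
        # 2025-2030 -> d12..d17
        return f"d{year - 2013}"
    raise ValueError(f"Unknown table: {table}")  # other tables omitted in this sketch
```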
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Fix year_to_partition_name() RAWDATACOR logic: properly clamp year to the range 2014-2024
before calculating the partition index with the formula (year - 2014)
- Previously: incorrectly tried to return "d" partition type with wrong formula
- Now: correctly returns "part{year-2014}" for RAWDATACOR table
- Update docstring: clarify d17 = 2030 (not 2031) as maximum ELABDATADISP partition
- Ensure partition mapping is consistent between year_to_partition_name() and
get_partitions_from_year() functions
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Root cause: Nodes 1-11 had IDs in the 132M+ range while nodes 12-22 had IDs in the 298-308
range, so keyset pagination by ID fetched them in batches thousands of IDs apart. As a
result they arrived as separate groups and were never unified into a single
consolidated row.
Solution: Order MySQL query by (UnitName, ToolNameID, EventDate, EventTime) instead
of by ID. This guarantees all rows for the same consolidation key arrive together,
ensuring they are grouped and consolidated into a single row with JSONB measurements
keyed by node number.
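A minimal sketch of the single-pass grouping this ordering enables; the column list (e.g. Measurement), table name, and return shape are illustrative assumptions, and pagination is omitted:

```python
from itertools import groupby

def fetch_consolidation_groups_from_partition(cursor, partition: str):
    """Yield one consolidated row per consolidation key (sketch).

    Because the query is ordered by the consolidation key, all rows of a
    group arrive consecutively and can be merged in a single pass.
    """
    cursor.execute(
        f"SELECT UnitName, ToolNameID, EventDate, EventTime, NodeNum, Measurement "
        f"FROM RAWDATACOR PARTITION ({partition}) "
        f"ORDER BY UnitName, ToolNameID, EventDate, EventTime"
    )
    for key, rows in groupby(cursor, key=lambda r: r[:4]):
        # Merge every node of the group into one JSONB-style dict keyed by node number
        measurements = {str(row[4]): row[5] for row in rows}
        yield key, measurements
```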
Changes:
- fetch_consolidation_groups_from_partition(): Changed from keyset pagination by ID
to ORDER BY the consolidation key. Simplified the grouping logic, since ORDER BY already
guarantees that rows with the same key arrive consecutively.
- full_migration.py: Add cleanup of partial partitions on resume. When resuming and a
partition was started but not completed, delete its incomplete data before
re-processing to avoid duplicates. Also recalculate total_rows_migrated from actual
database count.
- config.py: Add postgres_pk field to TABLE_CONFIGS to specify correct primary key
column names in PostgreSQL (id vs id_elab_data).
- Cleanup: Remove temporary test scripts used during debugging
Performance note: ordering by the consolidation key requires an index to be fast. The index
on (UnitName, ToolNameID, EventDate, EventTime) was created with ALGORITHM=INPLACE,
LOCK=NONE to avoid blocking reads (see the DDL sketch below).
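The online DDL described above, roughly as it would be issued; the index name is illustrative:

```python
# MySQL online DDL: build the index in place without blocking concurrent statements.
ADD_CONSOLIDATION_INDEX = """
ALTER TABLE RAWDATACOR
    ADD INDEX idx_consolidation_key (UnitName, ToolNameID, EventDate, EventTime),
    ALGORITHM=INPLACE, LOCK=NONE
"""

def create_consolidation_index(cursor):
    cursor.execute(ADD_CONSOLIDATION_INDEX)
```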
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changed consolidation_group_limit from 100k to 10k for faster queries.
Reverted to the GROUP BY approach for getting consolidation keys:
- Uses the MySQL index (UnitName, ToolNameID, NodeNum, EventDate, EventTime) efficiently
- Including NodeNum in the GROUP BY ensures no combinations are lost
- GROUP BY queries are faster than large ORDER BY queries
- A smaller LIMIT makes each page query faster
This matches the original optimization suggestion and should be faster.
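A minimal sketch of the paged key-listing query; the exact GROUP BY column set and pagination scheme are assumptions:

```python
CONSOLIDATION_GROUP_LIMIT = 10_000  # was 100_000

def fetch_consolidation_keys(cursor, partition: str, offset: int = 0):
    """Return one page of distinct consolidation keys from a partition (sketch)."""
    cursor.execute(
        f"SELECT UnitName, ToolNameID, EventDate, EventTime "
        f"FROM RAWDATACOR PARTITION ({partition}) "
        f"GROUP BY UnitName, ToolNameID, EventDate, EventTime "
        f"LIMIT %s OFFSET %s",
        (CONSOLIDATION_GROUP_LIMIT, offset),
    )
    return cursor.fetchall()
```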
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changed consolidation strategy to leverage MySQL partitioning:
- Added get_table_partitions() to list all partitions
- Added fetch_consolidation_groups_from_partition() to read groups by consolidation key
- Each group (UnitName, ToolNameID, EventDate, EventTime) is fetched completely
- All nodes of the same group are consolidated into a single row with JSONB measurements
- Process partitions sequentially for predictable memory usage
Key benefits:
- Guaranteed complete consolidation (no fragmentation across batches)
- Deterministic behavior - same group always consolidated together
- Better memory efficiency with partition limits (100k groups per query)
- Clear audit trail of which partition each row came from
Tested with partition d3: 6960 input rows → 100 consolidated rows (69.6:1 ratio)
with groups containing 24-72 nodes each.
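A minimal sketch of the partition discovery step (get_table_partitions) via information_schema; the cursor handling and return shape are assumptions:

```python
def get_table_partitions(cursor, schema: str, table: str) -> list[str]:
    """List the MySQL partitions of a table, in definition order (sketch)."""
    cursor.execute(
        "SELECT PARTITION_NAME "
        "FROM information_schema.PARTITIONS "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s "
        "  AND PARTITION_NAME IS NOT NULL "
        "ORDER BY PARTITION_ORDINAL_POSITION",
        (schema, table),
    )
    return [row[0] for row in cursor.fetchall()]
```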
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Configuration improvements:
- Set read_timeout=300 (5 minutes) to handle long queries
- Set write_timeout=300 (5 minutes) for writes
- Set max_allowed_packet=64MB to handle larger data transfers
Retry logic:
- Added retry mechanism with max 3 retries on fetch failure
- Auto-reconnect on connection loss before retry
- Better error messages showing retry attempts
This fixes the 'connection is lost' error that occurs during
long-running migrations by:
1. Giving MySQL queries more time to complete
2. Allowing larger packet sizes for bulk data
3. Automatically recovering from connection drops
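A minimal sketch of the connection settings and retry loop, assuming PyMySQL; the config field names and the run_query callback are illustrative:

```python
import pymysql

def connect_mysql(cfg):
    """Open a MySQL connection with the extended timeouts and packet size (sketch)."""
    return pymysql.connect(
        host=cfg.host, user=cfg.user, password=cfg.password, database=cfg.database,
        read_timeout=300,                     # 5 minutes for long queries
        write_timeout=300,                    # 5 minutes for writes
        max_allowed_packet=64 * 1024 * 1024,  # 64 MB for larger data transfers
    )

def fetch_with_retry(conn, run_query, max_retries: int = 3):
    """Retry a fetch up to 3 times, reconnecting if the connection dropped (sketch)."""
    for attempt in range(1, max_retries + 1):
        try:
            conn.ping(reconnect=True)  # re-open the connection if it was lost
            return run_query(conn)
        except pymysql.err.OperationalError as exc:
            if attempt == max_retries:
                raise
            print(f"Fetch failed (attempt {attempt}/{max_retries}): {exc}, retrying...")
```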
Fixes: 'Connection is lost' error during full migration
- TABLE_CONFIGS now accepts both 'RAWDATACOR' and 'rawdatacor' as keys
- TABLE_CONFIGS now accepts both 'ELABDATADISP' and 'elabdatadisp' as keys
- Reuse the same config dict for both keys to avoid duplication
This allows FullMigrator to work correctly when initialized with
uppercase table names from the CLI while DataTransformer works
with lowercase names.
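A minimal sketch of the dual-key registration; the config contents other than postgres_pk are placeholders:

```python
# Placeholder configs; postgres_pk values come from the earlier config.py change.
_RAWDATACOR_CONFIG = {"postgres_table": "rawdatacor", "postgres_pk": "id"}
_ELABDATADISP_CONFIG = {"postgres_table": "elabdatadisp", "postgres_pk": "id_elab_data"}

# Same dict object registered under both spellings, so the CLI (uppercase) and
# DataTransformer (lowercase) resolve to identical configuration.
TABLE_CONFIGS = {
    "RAWDATACOR": _RAWDATACOR_CONFIG,
    "rawdatacor": _RAWDATACOR_CONFIG,
    "ELABDATADISP": _ELABDATADISP_CONFIG,
    "elabdatadisp": _ELABDATADISP_CONFIG,
}
```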
Fixes: 'Unknown table: RAWDATACOR' error during migration
- Fix ConfigDict model_config for Pydantic v2.12+ compatibility
- Add env_file and env_file_encoding to all config classes
- Each config class now properly loads from .env with correct prefix
Fixes: ValidationError when loading settings from .env file
CLI now works correctly with 'uv run python main.py'
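A minimal sketch of the config classes after the fix, assuming pydantic-settings v2; the prefixes and field names are illustrative:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class MySQLSettings(BaseSettings):
    # Each config class loads from .env with its own prefix, e.g. MYSQL_HOST
    model_config = SettingsConfigDict(
        env_file=".env", env_file_encoding="utf-8", env_prefix="MYSQL_", extra="ignore",
    )
    host: str = "localhost"
    user: str = "root"
    password: str = ""
    database: str = ""

class PostgresSettings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env", env_file_encoding="utf-8", env_prefix="POSTGRES_", extra="ignore",
    )
    host: str = "localhost"
    user: str = "postgres"
    password: str = ""
    database: str = ""
```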