- Fix get_partitions_from_year(): RAWDATACOR end_year now 2030 (was 2024)
- RAWDATACOR has partitions d12-d17 for years 2025-2030, not just part0-part10
- Update year_to_partition_name() for RAWDATACOR: handle both the "part" and "d" prefixes
- Years 2014-2024: use the "part" prefix with index (year - 2014)
- Years 2025-2030: use the "d" prefix with index (year - 2013), i.e. d12-d17
- Clamp year to range [2014, 2030] for RAWDATACOR
- Update docstring examples to reflect new mapping behavior
- Now correctly generates partition names such as part8, part9, part10, d12, d13, ..., d17 (see the sketch below)
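A minimal sketch of the resulting mapping; the exact clamping and error handling beyond what the bullets describe are assumptions:

```python
def year_to_partition_name(table: str, year: int) -> str:
    """Map a year to a MySQL partition name (sketch of the RAWDATACOR branch only)."""
    if table == "RAWDATACOR":
        # Clamp to the supported range [2014, 2030]
        year = max(2014, min(year, 2030))
        if year <= 2024:
            # 2014-2024 -> part0..part10
            return f"part{year - 2014}"
        # 2025-2030 -> d12..d17
        return f"d{year - 2013}"
    raise ValueError(f"Unknown table: {table}")  # other tables omitted in this sketch
```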
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Fix year_to_partition_name() RAWDATACOR logic: properly clamp year to the range 2014-2024
before calculating the partition index with the formula (year - 2014)
- Previously: incorrectly tried to return "d" partition type with wrong formula
- Now: correctly returns "part{year-2014}" for RAWDATACOR table
- Update docstring: clarify d17 = 2030 (not 2031) as maximum ELABDATADISP partition
- Ensure partition mapping is consistent between year_to_partition_name() and
get_partitions_from_year() functions
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Root cause: Nodes 1-11 had IDs in the 132M+ range while nodes 12-22 had IDs in the 298-308
range, so keyset pagination by ID fetched them in batches thousands of IDs apart. As a
result they arrived as separate groups and were never unified into a single
consolidated row.
Solution: Order MySQL query by (UnitName, ToolNameID, EventDate, EventTime) instead
of by ID. This guarantees all rows for the same consolidation key arrive together,
ensuring they are grouped and consolidated into a single row with JSONB measurements
keyed by node number.
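A minimal sketch of the single-pass grouping this ordering enables; the column list (e.g. Measurement), table name, and return shape are illustrative assumptions, and pagination is omitted:

```python
from itertools import groupby

def fetch_consolidation_groups_from_partition(cursor, partition: str):
    """Yield one consolidated row per consolidation key (sketch).

    Because the query is ordered by the consolidation key, all rows of a
    group arrive consecutively and can be merged in a single pass.
    """
    cursor.execute(
        f"SELECT UnitName, ToolNameID, EventDate, EventTime, NodeNum, Measurement "
        f"FROM RAWDATACOR PARTITION ({partition}) "
        f"ORDER BY UnitName, ToolNameID, EventDate, EventTime"
    )
    for key, rows in groupby(cursor, key=lambda r: r[:4]):
        # Merge every node of the group into one JSONB-style dict keyed by node number
        measurements = {str(row[4]): row[5] for row in rows}
        yield key, measurements
```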
Changes:
- fetch_consolidation_groups_from_partition(): Changed from keyset pagination by ID
to ORDER BY the consolidation key. Simplified the grouping logic, since ORDER BY already
guarantees that rows with the same key arrive consecutively.
- full_migration.py: Add cleanup of partial partitions on resume. When resuming and a
partition was started but not completed, delete its incomplete data before
re-processing to avoid duplicates. Also recalculate total_rows_migrated from actual
database count.
- config.py: Add postgres_pk field to TABLE_CONFIGS to specify correct primary key
column names in PostgreSQL (id vs id_elab_data).
- Cleanup: Remove temporary test scripts used during debugging
Performance note: ordering by the consolidation key requires an index to be fast. The index
on (UnitName, ToolNameID, EventDate, EventTime) was created with ALGORITHM=INPLACE,
LOCK=NONE to avoid blocking reads (see the DDL sketch below).
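The online DDL described above, roughly as it would be issued; the index name is illustrative:

```python
# MySQL online DDL: build the index in place without blocking concurrent statements.
ADD_CONSOLIDATION_INDEX = """
ALTER TABLE RAWDATACOR
    ADD INDEX idx_consolidation_key (UnitName, ToolNameID, EventDate, EventTime),
    ALGORITHM=INPLACE, LOCK=NONE
"""

def create_consolidation_index(cursor):
    cursor.execute(ADD_CONSOLIDATION_INDEX)
```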
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changed consolidation_group_limit from 100k to 10k for faster queries.
Reverted to the GROUP BY approach for getting consolidation keys:
- Uses the MySQL index (UnitName, ToolNameID, NodeNum, EventDate, EventTime) efficiently
- Including NodeNum in the GROUP BY ensures no combinations are lost
- GROUP BY queries are faster than large ORDER BY queries
- A smaller LIMIT makes each page query faster
This matches the original optimization suggestion and should be faster.
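A minimal sketch of the paged key-listing query; the exact GROUP BY column set and pagination scheme are assumptions:

```python
CONSOLIDATION_GROUP_LIMIT = 10_000  # was 100_000

def fetch_consolidation_keys(cursor, partition: str, offset: int = 0):
    """Return one page of distinct consolidation keys from a partition (sketch)."""
    cursor.execute(
        f"SELECT UnitName, ToolNameID, EventDate, EventTime "
        f"FROM RAWDATACOR PARTITION ({partition}) "
        f"GROUP BY UnitName, ToolNameID, EventDate, EventTime "
        f"LIMIT %s OFFSET %s",
        (CONSOLIDATION_GROUP_LIMIT, offset),
    )
    return cursor.fetchall()
```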
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changed consolidation strategy to leverage MySQL partitioning:
- Added get_table_partitions() to list all partitions
- Added fetch_consolidation_groups_from_partition() to read groups by consolidation key
- Each group (UnitName, ToolNameID, EventDate, EventTime) is fetched completely
- All nodes of the same group are consolidated into a single row with JSONB measurements
- Process partitions sequentially for predictable memory usage
Key benefits:
- Guaranteed complete consolidation (no fragmentation across batches)
- Deterministic behavior - same group always consolidated together
- Better memory efficiency with partition limits (100k groups per query)
- Clear audit trail of which partition each row came from
Tested with partition d3: 6960 input rows → 100 consolidated rows (69.6:1 ratio)
with groups containing 24-72 nodes each.
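A minimal sketch of the partition discovery step (get_table_partitions) via information_schema; the cursor handling and return shape are assumptions:

```python
def get_table_partitions(cursor, schema: str, table: str) -> list[str]:
    """List the MySQL partitions of a table, in definition order (sketch)."""
    cursor.execute(
        "SELECT PARTITION_NAME "
        "FROM information_schema.PARTITIONS "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s "
        "  AND PARTITION_NAME IS NOT NULL "
        "ORDER BY PARTITION_ORDINAL_POSITION",
        (schema, table),
    )
    return [row[0] for row in cursor.fetchall()]
```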
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Configuration improvements:
- Set read_timeout=300 (5 minutes) to handle long queries
- Set write_timeout=300 (5 minutes) for writes
- Set max_allowed_packet=64MB to handle larger data transfers
Retry logic:
- Added retry mechanism with max 3 retries on fetch failure
- Auto-reconnect on connection loss before retry
- Better error messages showing retry attempts
This fixes the 'connection is lost' error that occurs during
long-running migrations by:
1. Giving MySQL queries more time to complete
2. Allowing larger packet sizes for bulk data
3. Automatically recovering from connection drops
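A minimal sketch of the connection settings and retry loop, assuming PyMySQL; the config field names and the run_query callback are illustrative:

```python
import pymysql

def connect_mysql(cfg):
    """Open a MySQL connection with the extended timeouts and packet size (sketch)."""
    return pymysql.connect(
        host=cfg.host, user=cfg.user, password=cfg.password, database=cfg.database,
        read_timeout=300,                     # 5 minutes for long queries
        write_timeout=300,                    # 5 minutes for writes
        max_allowed_packet=64 * 1024 * 1024,  # 64 MB for larger data transfers
    )

def fetch_with_retry(conn, run_query, max_retries: int = 3):
    """Retry a fetch up to 3 times, reconnecting if the connection dropped (sketch)."""
    for attempt in range(1, max_retries + 1):
        try:
            conn.ping(reconnect=True)  # re-open the connection if it was lost
            return run_query(conn)
        except pymysql.err.OperationalError as exc:
            if attempt == max_retries:
                raise
            print(f"Fetch failed (attempt {attempt}/{max_retries}): {exc}, retrying...")
```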
Fixes: 'Connection is lost' error during full migration
- TABLE_CONFIGS now accepts both 'RAWDATACOR' and 'rawdatacor' as keys
- TABLE_CONFIGS now accepts both 'ELABDATADISP' and 'elabdatadisp' as keys
- Reuse the same config dict for both keys to avoid duplication
This allows FullMigrator to work correctly when initialized with
uppercase table names from the CLI while DataTransformer works
with lowercase names.
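A minimal sketch of the dual-key registration; the config contents other than postgres_pk are placeholders:

```python
# Placeholder configs; postgres_pk values come from the earlier config.py change.
_RAWDATACOR_CONFIG = {"postgres_table": "rawdatacor", "postgres_pk": "id"}
_ELABDATADISP_CONFIG = {"postgres_table": "elabdatadisp", "postgres_pk": "id_elab_data"}

# Same dict object registered under both spellings, so the CLI (uppercase) and
# DataTransformer (lowercase) resolve to identical configuration.
TABLE_CONFIGS = {
    "RAWDATACOR": _RAWDATACOR_CONFIG,
    "rawdatacor": _RAWDATACOR_CONFIG,
    "ELABDATADISP": _ELABDATADISP_CONFIG,
    "elabdatadisp": _ELABDATADISP_CONFIG,
}
```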
Fixes: 'Unknown table: RAWDATACOR' error during migration
- Fix ConfigDict model_config for Pydantic v2.12+ compatibility
- Add env_file and env_file_encoding to all config classes
- Each config class now properly loads from .env with correct prefix
Fixes: ValidationError when loading settings from .env file
CLI now works correctly with 'uv run python main.py'
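A minimal sketch of the config classes after the fix, assuming pydantic-settings v2; the prefixes and field names are illustrative:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class MySQLSettings(BaseSettings):
    # Each config class loads from .env with its own prefix, e.g. MYSQL_HOST
    model_config = SettingsConfigDict(
        env_file=".env", env_file_encoding="utf-8", env_prefix="MYSQL_", extra="ignore",
    )
    host: str = "localhost"
    user: str = "root"
    password: str = ""
    database: str = ""

class PostgresSettings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env", env_file_encoding="utf-8", env_prefix="POSTGRES_", extra="ignore",
    )
    host: str = "localhost"
    user: str = "postgres"
    password: str = ""
    database: str = ""
```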