mysql2postgres

Author	SHA1	Message	Date
alex	0cb4a0f71e	fix: Update progress tracking to use MySQL row count instead of PostgreSQL count The progress bar was appearing frozen because: - Total was set to MySQL rows to process (111M) - Progress was updated by PostgreSQL rows inserted (11M after consolidation) - This created a 10:1 mismatch, making progress appear to crawl Solution: - Track progress based on MySQL rows processed (matches total) - Use batch_size (MySQL rows) instead of inserted count (PostgreSQL rows) - Change batch_max_id calculation to use original batch instead of transformed This ensures the progress bar advances at a visible rate while still maintaining accurate row count tracking from the database. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-23 15:40:50 +01:00
alex	0f217379ea	fix: Use actual PostgreSQL row count for total_rows_migrated tracking Replace session-level counting with direct table COUNT queries to ensure total_rows_migrated always reflects actual reality in PostgreSQL. This fixes the discrepancy where the counter was only tracking rows from the current session and didn't account for earlier insertions or duplicates from failed resume attempts. Key improvements: - Use get_row_count() after each batch to get authoritative total - Preserve previous count on resume and accumulate across sessions - Remove dependency on error-prone session-level counters - Ensures migration_state.total_rows_migrated matches actual table row count 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-23 15:33:27 +01:00
alex	b09cfcf9df	fix: Add timeout settings and retry logic to MySQL connector Configuration improvements: - Set read_timeout=300 (5 minutes) to handle long queries - Set write_timeout=300 (5 minutes) for writes - Set max_allowed_packet=64MB to handle larger data transfers Retry logic: - Added retry mechanism with max 3 retries on fetch failure - Auto-reconnect on connection loss before retry - Better error messages showing retry attempts This fixes the 'connection is lost' error that occurs during long-running migrations by: 1. Giving MySQL queries more time to complete 2. Allowing larger packet sizes for bulk data 3. Automatically recovering from connection drops Fixes: 'Connection is lost' error during full migration	2025-12-21 09:53:34 +01:00
alex	821cda850e	fix: Change from COPY to parameterized INSERT for batch inserts Replace cursor.copy() with cursor.executemany() for more reliable batch inserts in PostgreSQL. The COPY method has issues with format and data encoding in psycopg3. Changes: - Use executemany() with parameterized INSERT statements - Let psycopg handle parameter escaping and encoding - Convert JSONB dicts to JSON strings automatically - More compatible with various data types This ensures that data is actually being inserted into PostgreSQL during migration, fixing the issue where data wasn't appearing in the database after migration completed. Fixes: Data not being persisted in PostgreSQL during migration	2025-12-10 20:48:20 +01:00
alex	e2377d4191	fix: Add explicit commit/rollback in PostgreSQL context manager exit - On successful execution (no exception): explicitly commit before closing - On exception: explicitly rollback before closing - Add try-except to handle commit/rollback failures gracefully This ensures that all inserted data is committed to the database when the context manager exits. Previously, commits were only done per-batch in insert_batch(), but the final context exit wasn't ensuring a final commit. Fixes: Data not appearing in PostgreSQL after migration completes	2025-12-10 20:39:04 +01:00
alex	e381618255	fix: Support both uppercase and lowercase table names in TABLE_CONFIGS - TABLE_CONFIGS now accepts both 'RAWDATACOR' and 'rawdatacor' as keys - TABLE_CONFIGS now accepts both 'ELABDATADISP' and 'elabdatadisp' as keys - Reuse same config dict for both cases to avoid duplication This allows FullMigrator to work correctly when initialized with uppercase table names from the CLI while DataTransformer works with lowercase names. Fixes: 'Unknown table: RAWDATACOR' error during migration	2025-12-10 20:28:19 +01:00
alex	de6bde17c9	feat: Add sequences for auto-incrementing IDs - Create rawdatacor_id_seq for auto-increment of id column - Create elabdatadisp_id_seq for auto-increment of id_elab_data column - Both sequences use DEFAULT nextval() to auto-generate IDs on insert This replaces PRIMARY KEY functionality since PostgreSQL doesn't support PRIMARY KEY on partitioned tables with expression-based ranges. IDs are now auto-incremented without primary key constraint. Tested: schema creation works correctly with sequences	2025-12-10 20:20:52 +01:00
alex	2834f8b578	fix: Remove unsupported constraints from partitioned tables PostgreSQL doesn't support PRIMARY KEY or UNIQUE constraints on partitioned tables when using RANGE partitioning on expressions (like EXTRACT(YEAR FROM event_date)). Changed: - RAWDATACOR: removed PRIMARY KEY (id, event_date) and UNIQUE constraint - ELABDATADISP: removed PRIMARY KEY (id_elab_data, event_date) and UNIQUE constraint - Tables now have no constraints except NOT NULL on required columns This is a PostgreSQL limitation with partitioned tables. Constraints can be added per-partition if needed, but for simplicity we rely on application-level validation. Fixes: 'vincolo PRIMARY KEY non supportato con una definizione di chiave di partizione' (Italian-locale PostgreSQL error; in English: 'unsupported PRIMARY KEY constraint with partition key definition')	2025-12-10 20:18:20 +01:00
alex	410b253808	fix: Update Pydantic v2 configuration for .env loading - Fix ConfigDict model_config for Pydantic v2.12+ compatibility - Add env_file and env_file_encoding to all config classes - Each config class now properly loads from .env with correct prefix Fixes: ValidationError when loading settings from .env file CLI now works correctly with 'uv run python main.py'	2025-12-10 20:11:12 +01:00
alex	9b18db029b	docs: Add quick navigation guide (START_HERE.md)	2025-12-10 20:00:50 +01:00
alex	8e705e33da	docs: Add detailed example workflow	2025-12-10 19:59:22 +01:00
alex	38c6b4c6d8	docs: Add implementation summary	2025-12-10 19:58:49 +01:00
alex	fccc83eb74	docs: Add comprehensive documentation and helper scripts Add: - QUICKSTART.md: 5-minute quick start guide with examples - scripts/incus_setup.sh: Automated PostgreSQL container setup - scripts/validate_migration.sql: SQL validation queries - scripts/setup_cron.sh: Cron job setup for incremental migrations - tests/test_setup.py: Unit tests for configuration and transformation - install.sh: Quick installation script Documentation includes: - Step-by-step setup instructions - Example queries for RAWDATACOR and ELABDATADISP - Troubleshooting guide - Performance optimization tips 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 19:58:20 +01:00
alex	62577d3200	feat: Add MySQL to PostgreSQL migration tool with JSONB transformation Implement comprehensive migration solution with: - Full and incremental migration modes - JSONB schema transformation for RAWDATACOR and ELABDATADISP tables - Native PostgreSQL partitioning (2014-2031) - Optimized GIN indexes for JSONB queries - Rich logging with progress tracking - Complete benchmark system for MySQL vs PostgreSQL comparison - CLI interface with multiple commands (setup, migrate, benchmark) - Configuration management via .env file - Error handling and retry logic - Batch processing for performance (configurable batch size) Database transformations: - RAWDATACOR: 16 Val columns + units → single JSONB measurements - ELABDATADISP: 25+ measurement fields → structured JSONB with categories 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 19:57:11 +01:00

14 Commits
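
The progress-tracking fix in 0cb4a0f71e can be illustrated with a small sketch. It is not the repository's code: `Progress`, `consolidate`, and `migrate` are hypothetical names, and the consolidation rule (merging MySQL rows that share a key) is an assumption made only so the two row counts differ. The point it shows is that the progress counter advances by MySQL rows consumed, the same unit as the total, while PostgreSQL rows inserted are counted separately and the batch maximum ID is taken from the original batch.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    """Minimal stand-in for a progress bar: a fixed total and a running count."""
    total: int
    completed: int = 0

    def advance(self, n: int) -> None:
        self.completed += n

    @property
    def fraction(self) -> float:
        return self.completed / self.total if self.total else 0.0


def consolidate(batch):
    """Hypothetical transform: merge MySQL rows sharing a key into one row."""
    merged = {}
    for row in batch:
        merged.setdefault(row["key"], []).append(row["value"])
    return [{"key": k, "values": v} for k, v in merged.items()]


def migrate(batches, total_mysql_rows):
    """Walk the batches, keeping MySQL and PostgreSQL counts separate."""
    progress = Progress(total=total_mysql_rows)
    inserted = 0      # PostgreSQL rows written (after consolidation)
    last_id = None    # highest MySQL id seen, usable as a resume point
    for batch in batches:
        if not batch:
            continue
        rows = consolidate(batch)
        inserted += len(rows)
        # Advance by MySQL rows consumed, not by len(rows): the total is
        # expressed in MySQL rows, so the units must match.
        progress.advance(len(batch))
        # Take the maximum id from the original batch, not the transformed one.
        last_id = max(r["id"] for r in batch)
    return progress, inserted, last_id
```

With a consolidation ratio of roughly 10:1, as described in the commit message, advancing by `len(rows)` would leave the bar at about 10% when the migration finishes, whereas advancing by `len(batch)` reaches the total.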