mysql2postgres/CHANGELOG.md
alex 23e9fc9d82 feat: Add error logging and fix incremental migration state tracking
Implement comprehensive error handling and fix state management bug in incremental migration:

Error Logging System:
- Add validation for consolidation keys (NULL dates, empty IDs, corrupted Java strings)
- Log invalid keys to dedicated error files with detailed reasons
- Full migration: migration_errors_<table>_<partition>.log
- Incremental migration: migration_errors_<table>_incremental_<timestamp>.log (timestamped to preserve history)
- Report total count of skipped invalid keys at migration completion
- Auto-delete empty error log files

State Tracking Fix:
- Fix critical bug where last_key wasn't updated after final buffer flush
- Track last_processed_key throughout migration loop
- Update state both during periodic flushes and after final flush
- Ensures incremental migration correctly resumes from last migrated key

Validation Checks:
- EventDate IS NULL or EventDate = '0000-00-00'
- EventTime IS NULL
- ToolNameID IS NULL or empty string
- UnitName IS NULL or empty string
- UnitName starting with '[L' (corrupted Java strings)

Documentation:
- Update README.md with error logging behavior
- Update MIGRATION_WORKFLOW.md with validation details
- Update CHANGELOG.md with new features and fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 19:49:44 +01:00


Changelog

[Current] - 2025-12-30

Added

  • Consolidation-based incremental migration: Uses consolidation keys (UnitName, ToolNameID, EventDate, EventTime) instead of timestamps
  • MySQL ID optimization: Uses MAX(mysql_max_id) from PostgreSQL to filter MySQL queries, avoiding full table scans
  • State management in PostgreSQL: Replaced JSON file with migration_state table for more reliable tracking
  • Sync utility: Added scripts/sync_migration_state.py to sync state with actual data
  • Performance optimization: MySQL key queries now complete in under 0.1 seconds by filtering on the PRIMARY KEY
  • Data quality validation: Automatically validates and logs invalid consolidation keys to dedicated error files
  • Error logging: Invalid keys (null dates, empty tool IDs, corrupted Java strings) are logged and skipped during migration
  • Better documentation: Consolidated and updated all documentation files
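The MySQL ID optimization can be pictured as a key query that combines a primary-key lower bound with a row comparison on the consolidation key. The sketch below is illustrative only: `build_keys_query` is a hypothetical helper, not the project's `fetch_consolidation_keys_after()`, and the `id` column name is an assumption.

```python
from typing import List, Optional, Sequence, Tuple

# Consolidation key columns as named in this changelog.
KEY_COLUMNS = ("UnitName", "ToolNameID", "EventDate", "EventTime")


def build_keys_query(
    table: str,
    last_key: Optional[Sequence] = None,
    min_mysql_id: Optional[int] = None,
    limit: int = 1000,
) -> Tuple[str, List]:
    """Build a parameterized MySQL query for the next batch of consolidation keys.

    `table` is interpolated directly, so it must come from trusted configuration.
    """
    cols = ", ".join(KEY_COLUMNS)
    where: List[str] = []
    params: List = []
    if min_mysql_id is not None:
        # Assumed primary key column `id`: lets MySQL skip already-migrated rows.
        where.append("id > %s")
        params.append(min_mysql_id)
    if last_key is not None:
        # Row-constructor comparison resumes strictly after the last migrated key.
        placeholders = ", ".join(["%s"] * len(KEY_COLUMNS))
        where.append(f"({cols}) > ({placeholders})")
        params.extend(last_key)
    where_sql = f" WHERE {' AND '.join(where)}" if where else ""
    sql = (
        f"SELECT {cols} FROM {table}{where_sql} "
        f"GROUP BY {cols} ORDER BY {cols} LIMIT %s"
    )
    params.append(limit)
    return sql, params
```

Passing the result to a DB-API cursor as `cursor.execute(sql, params)` keeps all values parameterized; only the table name is formatted into the string.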

Changed

  • Incremental migration: Now uses consolidation keys instead of timestamp-based approach
  • Full migration: Now saves the global last_key after all partitions complete
  • State tracking: Moved from migration_state.json to PostgreSQL table migration_state
  • Query performance: Added min_mysql_id parameter to fetch_consolidation_keys_after() for optimization
  • Configuration: Renamed BATCH_SIZE to CONSOLIDATION_GROUP_LIMIT to better reflect what it controls
  • Configuration: Added PROGRESS_LOG_INTERVAL to control logging frequency
  • Configuration: Added BENCHMARK_OUTPUT_DIR to specify benchmark results directory
  • Documentation: Updated README.md, MIGRATION_WORKFLOW.md, QUICKSTART.md, EXAMPLE_WORKFLOW.md with current implementation
  • Documentation: Corrected index and partitioning documentation to reflect actual PostgreSQL schema:
    • Uses event_timestamp (not separate event_date/event_time)
    • Primary key includes event_year for partitioning
    • Consolidation key is UNIQUE (unit_name, tool_name_id, event_timestamp, event_year)
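As an illustration of how the separate MySQL EventDate and EventTime values could map onto event_timestamp and event_year, here is a minimal sketch. It assumes the MySQL driver returns DATE as `datetime.date` and TIME as `datetime.timedelta` (common for Python MySQL drivers); `to_event_timestamp` is a hypothetical name.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def to_event_timestamp(event_date: date, event_time: timedelta) -> Tuple[datetime, int]:
    """Combine a DATE and a TIME value into (event_timestamp, event_year)."""
    # Start at midnight of the event date, then add the time-of-day offset.
    ts = datetime.combine(event_date, time.min) + event_time
    # The year is derived from the combined value because it drives partitioning.
    return ts, ts.year
```

Note that a TIME value of 24 hours or more would roll over into the following day in this sketch; whether such values occur in the source data is not stated here.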

Removed

  • migration_state.json: Replaced by PostgreSQL table
  • Timestamp-based migration: Replaced by consolidation key-based approach
  • ID-based resumable migration: Consolidated into single consolidation-based approach
  • Temporary debug scripts: Cleaned up all /tmp/ debug files

Fixed

  • Incremental migration performance: MySQL queries now at least 600x faster with ID filter (from 60+ seconds to <0.1 seconds)
  • State synchronization: Can now sync migration_state with actual data using utility script
  • Duplicate handling: Uses ON CONFLICT DO NOTHING to prevent duplicates
  • Last key tracking: Properly updates global state after full migration
  • Corrupted data handling: Both full and incremental migrations now validate keys and log errors instead of crashing
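The last key tracking fix comes down to recording the last processed key after every flush, including the final partial one. A minimal sketch of that pattern, assuming hypothetical `flush` and `save_state` callables rather than the project's actual functions:

```python
from typing import Callable, Iterable, List, Optional, Tuple

Key = Tuple[str, str, str, str]  # (UnitName, ToolNameID, EventDate, EventTime)


def migrate_keys(
    keys: Iterable[Key],
    flush: Callable[[List[Key]], None],
    save_state: Callable[[Key], None],
    buffer_size: int = 1000,
) -> Optional[Key]:
    """Buffer keys, flush in batches, and persist the last key after each flush."""
    buffer: List[Key] = []
    last_processed_key: Optional[Key] = None
    for key in keys:
        buffer.append(key)
        last_processed_key = key
        if len(buffer) >= buffer_size:
            flush(buffer)
            save_state(last_processed_key)  # periodic state update
            buffer = []
    if buffer:
        flush(buffer)
        # Without this update the stored key would lag behind the migrated data,
        # and the next incremental run would resume from the wrong position.
        save_state(last_processed_key)
    return last_processed_key
```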

Error Logging

Both full and incremental migrations now handle corrupted consolidation keys gracefully:

Error files:

  • Full migration: migration_errors_<table>_<partition>.log (e.g., migration_errors_rawdatacor_p2024.log)
  • Incremental migration: migration_errors_<table>_incremental_<timestamp>.log (e.g., migration_errors_rawdatacor_incremental_20260101_194500.log)

Each incremental migration creates a new timestamped file to preserve error history across runs.
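The naming scheme can be reproduced with a small helper. This is a sketch with a hypothetical function name; the `%Y%m%d_%H%M%S` timestamp format is inferred from the example file name above.

```python
from datetime import datetime
from typing import Optional


def error_log_name(
    table: str,
    partition: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Return the error log file name for a full or incremental migration."""
    if partition is not None:
        # Full migration: one file per table partition.
        return f"migration_errors_{table}_{partition}.log"
    # Incremental migration: a fresh timestamped file per run.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"migration_errors_{table}_incremental_{stamp}.log"
```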

File format:

# Migration errors for <table> partition <partition>
# Format: UnitName|ToolNameID|EventDate|EventTime|Reason

ID0350||0000-00-00|0:00:00|EventDate is invalid: 0000-00-00
[Ljava.lang.String;@abc123|TOOL1|2024-01-01|10:00:00|UnitName is corrupted Java string: [Ljava.lang.String;@abc123
UNIT1||2024-01-01|10:00:00|ToolNameID is NULL or empty
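Because the format is pipe-delimited with `#` comment lines, the files are easy to post-process. The following reader is a sketch, not part of the project; `SkippedKey` and `parse_error_log` are hypothetical names, and a UnitName that itself contains `|` would be split incorrectly.

```python
from typing import Iterable, Iterator, NamedTuple


class SkippedKey(NamedTuple):
    unit_name: str
    tool_name_id: str
    event_date: str
    event_time: str
    reason: str


def parse_error_log(lines: Iterable[str]) -> Iterator[SkippedKey]:
    """Yield one SkippedKey per data line, ignoring comments and blank lines."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # Split into at most five fields so the reason may contain '|'.
        parts = line.split("|", 4)
        if len(parts) != 5:
            continue  # malformed line, skipped in this sketch
        yield SkippedKey(*parts)
```

Typical use would be `with open(path, encoding="utf-8") as f: entries = list(parse_error_log(f))`.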

Behavior:

  • Invalid keys are automatically skipped to prevent migration failure
  • Each skipped key is logged with the reason for rejection
  • Total count of skipped keys is reported at the end of migration
  • Empty error files (no errors) are automatically deleted
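The behavior above can be sketched as a validate-then-skip filter. This is an illustration under assumptions, not the project's code: `invalid_reason` and `filter_keys` are hypothetical names, the check order and some reason strings are guesses modeled on the sample file, and the `#` header lines are omitted for brevity.

```python
import os
from typing import List, Optional, Sequence, Tuple


def invalid_reason(unit_name, tool_name_id, event_date, event_time) -> Optional[str]:
    """Return the rejection reason for a consolidation key, or None if it is valid."""
    if event_date is None or str(event_date) == "0000-00-00":
        return f"EventDate is invalid: {event_date}"
    if event_time is None:
        return "EventTime is NULL"
    if tool_name_id is None or tool_name_id == "":
        return "ToolNameID is NULL or empty"
    if unit_name is None or unit_name == "":
        return "UnitName is NULL or empty"
    if unit_name.startswith("[L"):
        # Looks like Java's default toString() of an array, e.g. "[Ljava.lang.String;@abc123".
        return f"UnitName is corrupted Java string: {unit_name}"
    return None


def filter_keys(keys: Sequence[Tuple], log_path: str) -> Tuple[List[Tuple], int]:
    """Split keys into valid ones and logged skips; delete the log if nothing was skipped."""
    valid: List[Tuple] = []
    skipped = 0
    with open(log_path, "w", encoding="utf-8") as log:
        for key in keys:
            reason = invalid_reason(*key)
            if reason is None:
                valid.append(key)
                continue
            fields = ["" if value is None else str(value) for value in key]
            log.write("|".join(fields + [reason]) + "\n")
            skipped += 1
    if skipped == 0:
        os.remove(log_path)  # no errors, so no file is left behind
    return valid, skipped
```

The returned count is what a caller would report at the end of the migration.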

Migration Guide (from old to new)

If you have an existing installation with migration_state.json:

  1. Back up your data (optional but recommended):

    cp migration_state.json migration_state.json.backup
    
  2. Run full migration to populate migration_state table:

    python main.py migrate full
    
  3. Sync state (if you have existing data):

    python scripts/sync_migration_state.py
    
  4. Remove old state file:

    rm migration_state.json
    
  5. Run incremental migration:

    python main.py migrate incremental --dry-run
    python main.py migrate incremental
    

Performance Improvements

  • MySQL query time: From 60+ seconds to <0.1 seconds (at least 600x faster)
  • Consolidation efficiency: Multiple MySQL rows → single PostgreSQL record
  • State reliability: PostgreSQL table instead of JSON file
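To make "multiple MySQL rows → single PostgreSQL record" concrete, here is a rough sketch of grouping rows by consolidation key. The `id` column, the `measurements` field, and the payload columns are assumptions made for illustration; the actual record layout is not described in this changelog.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List

KEY_FIELDS = ("UnitName", "ToolNameID", "EventDate", "EventTime")


def consolidate(rows: List[Dict]) -> List[Dict]:
    """Group MySQL rows sharing a consolidation key into one record per key."""
    key_of = itemgetter(*KEY_FIELDS)
    records = []
    # groupby only merges adjacent items, so sort by the same key first.
    for key, group in groupby(sorted(rows, key=key_of), key=key_of):
        members = list(group)
        record = dict(zip(KEY_FIELDS, key))
        # Hypothetical payload: every non-key column of each source row.
        record["measurements"] = [
            {k: v for k, v in row.items() if k not in KEY_FIELDS and k != "id"}
            for row in members
        ]
        # Highest source id, in the spirit of the mysql_max_id value used for filtering.
        record["mysql_max_id"] = max(row["id"] for row in members)
        records.append(record)
    return records
```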

Breaking Changes

  • --state-file parameter removed from incremental migration (no longer uses JSON)
  • --use-id flag removed (consolidation-based approach is now default)
  • Incremental migration requires full migration to be run first
  • BATCH_SIZE environment variable renamed to CONSOLIDATION_GROUP_LIMIT (update your .env file)
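A quick way to check an existing .env file for the renamed variable is sketched below. The helper is hypothetical and only handles plain `NAME=value` lines.

```python
from typing import List, Tuple

# Old name -> new name, per the breaking changes listed above.
RENAMED = {"BATCH_SIZE": "CONSOLIDATION_GROUP_LIMIT"}


def stale_env_keys(env_text: str) -> List[Tuple[str, str]]:
    """Return (old, new) pairs for renamed variables still present in .env text."""
    found = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name in RENAMED:
            found.append((name, RENAMED[name]))
    return found
```

For example, `stale_env_keys(open(".env").read())` returns an empty list once the file has been updated.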

[Previous] - Before 2025-12-30

Features

  • Full migration support
  • Incremental migration with timestamp tracking
  • JSONB transformation
  • Partitioning by year
  • GIN indexes for JSONB queries
  • Benchmark system
  • Progress tracking
  • Rich logging