Implement comprehensive error handling and fix state management bug in incremental migration

Error Logging System:
- Add validation for consolidation keys (NULL dates, empty IDs, corrupted Java strings)
- Log invalid keys to dedicated error files with detailed reasons
  - Full migration: `migration_errors_<table>_<partition>.log`
  - Incremental migration: `migration_errors_<table>_incremental_<timestamp>.log` (timestamped to preserve history)
- Report total count of skipped invalid keys at migration completion
- Auto-delete empty error log files

State Tracking Fix:
- Fix critical bug where `last_key` wasn't updated after the final buffer flush
- Track `last_processed_key` throughout the migration loop
- Update state both during periodic flushes and after the final flush
- Ensures incremental migration correctly resumes from the last migrated key

Validation Checks:
- `EventDate IS NULL` or `EventDate = '0000-00-00'`
- `EventTime IS NULL`
- `ToolNameID IS NULL` or empty string
- `UnitName IS NULL` or empty string
- `UnitName` starting with `[L` (corrupted Java strings)

Documentation:
- Update README.md with error logging behavior
- Update MIGRATION_WORKFLOW.md with validation details
- Update CHANGELOG.md with new features and fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changelog
[Current] - 2025-12-30
Added
- Consolidation-based incremental migration: Uses consolidation keys `(UnitName, ToolNameID, EventDate, EventTime)` instead of timestamps
- MySQL ID optimization: Uses `MAX(mysql_max_id)` from PostgreSQL to filter MySQL queries, avoiding full table scans
- State management in PostgreSQL: Replaced the JSON file with the `migration_state` table for more reliable tracking
- Sync utility: Added `scripts/sync_migration_state.py` to sync state with actual data
- Performance optimization: MySQL queries now return in under 0.1 seconds thanks to the PRIMARY KEY filter
- Data quality validation: Automatically validates and logs invalid consolidation keys to dedicated error files
- Error logging: Invalid keys (null dates, empty tool IDs, corrupted Java strings) are logged and skipped during migration
- Better documentation: Consolidated and updated all documentation files
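The consolidation-key approach combined with the ID filter works like keyset pagination. The migration resumes strictly after the last migrated key, and a cheap PRIMARY KEY bound prunes old rows. The in-memory sketch below illustrates that idea. It assumes rows arrive as `(id, UnitName, ToolNameID, EventDate, EventTime)`. The `Row` type and the `keys_after` helper are hypothetical and are not the project's `fetch_consolidation_keys_after()`.

```python
from datetime import date, time
from typing import Iterable, NamedTuple, Optional


class Row(NamedTuple):
    """One MySQL row, reduced to the id plus the consolidation key (hypothetical layout)."""
    id: int
    unit_name: str
    tool_name_id: str
    event_date: date
    event_time: time


ConsolidationKey = tuple[str, str, date, time]


def key_of(row: Row) -> ConsolidationKey:
    return (row.unit_name, row.tool_name_id, row.event_date, row.event_time)


def keys_after(rows: Iterable[Row], last_key: Optional[ConsolidationKey],
               min_mysql_id: int, limit: int) -> list[ConsolidationKey]:
    """Return up to `limit` distinct consolidation keys strictly after `last_key`."""
    # PRIMARY KEY bound: skip rows at or below the highest MySQL id already migrated
    # (assumed here to be MAX(mysql_max_id) read from PostgreSQL).
    candidates = {key_of(r) for r in rows if r.id > min_mysql_id}
    # Keyset condition: Python tuples compare lexicographically, like a SQL
    # row-value comparison (UnitName, ToolNameID, EventDate, EventTime) > (...).
    if last_key is not None:
        candidates = {k for k in candidates if k > last_key}
    # Distinct keys in key order, capped like a LIMIT clause.
    return sorted(candidates)[:limit]
```

Several MySQL rows that share a key collapse into one entry. Each key then becomes a single PostgreSQL record.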
Changed
- Incremental migration: Now uses consolidation keys instead of the timestamp-based approach
- Full migration: Now saves the global `last_key` after completing all partitions
- State tracking: Moved from `migration_state.json` to the PostgreSQL table `migration_state`
- Query performance: Added a `min_mysql_id` parameter to `fetch_consolidation_keys_after()` so MySQL queries can skip already-migrated rows via the PRIMARY KEY
- Configuration: Renamed `BATCH_SIZE` to `CONSOLIDATION_GROUP_LIMIT` to better reflect what it controls
- Configuration: Added `PROGRESS_LOG_INTERVAL` to control logging frequency
- Configuration: Added `BENCHMARK_OUTPUT_DIR` to specify the benchmark results directory
- Documentation: Updated README.md, MIGRATION_WORKFLOW.md, QUICKSTART.md, and EXAMPLE_WORKFLOW.md to match the current implementation
- Documentation: Corrected index and partitioning documentation to reflect the actual PostgreSQL schema:
  - Uses `event_timestamp` (not separate `event_date`/`event_time`)
  - Primary key includes `event_year` for partitioning
  - Consolidation key is UNIQUE `(unit_name, tool_name_id, event_timestamp, event_year)`
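The PostgreSQL schema stores one `event_timestamp` and partitions by `event_year`. Each MySQL `EventDate`/`EventTime` pair therefore has to be folded into a single value before insertion. The sketch below shows one way to do that. It assumes the MySQL driver returns `TIME` columns as `datetime.timedelta` (PyMySQL typically does). The function name `to_event_columns` is hypothetical, and so is the policy of rejecting out-of-range times.

```python
from datetime import date, datetime, timedelta


def to_event_columns(event_date: date, event_time: timedelta) -> tuple[datetime, int]:
    """Combine MySQL EventDate/EventTime into (event_timestamp, event_year)."""
    # MySQL TIME can be negative or exceed 24h; reject such values instead of
    # silently rolling over into another day (assumed policy).
    if not timedelta(0) <= event_time < timedelta(days=1):
        raise ValueError(f"EventTime out of range for a time of day: {event_time}")
    # Midnight of the event date plus the time-of-day offset.
    event_timestamp = datetime.combine(event_date, datetime.min.time()) + event_time
    # event_year is the partitioning column and is part of the primary key.
    return event_timestamp, event_timestamp.year
```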
Removed
- `migration_state.json`: Replaced by the PostgreSQL `migration_state` table
- Timestamp-based migration: Replaced by the consolidation key-based approach
- ID-based resumable migration: Consolidated into a single consolidation-based approach
- Temporary debug scripts: Cleaned up all `/tmp/` debug files
Fixed
- Incremental migration performance: MySQL queries are now more than 600x faster with the ID filter (60+ seconds down to <0.1 seconds)
- State synchronization: `migration_state` can now be synced with the actual data using the utility script
- Duplicate handling: Uses `ON CONFLICT DO NOTHING` to prevent duplicates
- Last key tracking: Properly updates the global state after full migration
- Corrupted data handling: Both full and incremental migrations now validate keys and log errors instead of crashing
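The last-key fix has two parts. The loop remembers the last key that was actually buffered for writing, and it persists that key after every flush, including the flush that drains the buffer once the loop ends. The minimal sketch below shows the pattern. `migrate_keys`, `flush`, and `save_state` are hypothetical stand-ins for the real PostgreSQL insert and the `migration_state` update.

```python
from typing import Any, Callable, Iterable, Optional


def migrate_keys(keys: Iterable[Any], flush: Callable[[list], None],
                 save_state: Callable[[Any], None], buffer_size: int) -> Optional[Any]:
    """Buffer keys, write them in batches, and persist the last key after each write."""
    buffer: list = []
    last_processed_key = None
    for key in keys:
        buffer.append(key)
        last_processed_key = key  # tracked on every iteration
        if len(buffer) >= buffer_size:
            flush(buffer)
            save_state(last_processed_key)  # periodic flush: state matches what was written
            buffer = []
    if buffer:
        flush(buffer)
        # Without this call, keys in the final partial buffer would be written
        # while the saved state still pointed at the previous flush.
        save_state(last_processed_key)
    return last_processed_key
```

With five keys and a buffer of two, the state is saved after keys 2, 4, and 5. Without the final `save_state`, the next incremental run would start again from key 4.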
Error Logging
Both full and incremental migrations now handle corrupted consolidation keys gracefully:
Error files:
- Full migration: `migration_errors_<table>_<partition>.log` (e.g., `migration_errors_rawdatacor_p2024.log`)
- Incremental migration: `migration_errors_<table>_incremental_<timestamp>.log` (e.g., `migration_errors_rawdatacor_incremental_20260101_194500.log`)
Each incremental migration creates a new timestamped file to preserve error history across runs.
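The naming scheme above can be expressed as a small helper. This is a sketch only. `error_log_path` and its `log_dir` parameter are hypothetical, and the timestamp format `%Y%m%d_%H%M%S` is inferred from the example file name rather than taken from the project's code.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def error_log_path(table: str, partition: Optional[str] = None,
                   started_at: Optional[datetime] = None,
                   log_dir: Path = Path(".")) -> Path:
    """Build the error-log path for a full (per-partition) or incremental run."""
    if partition is not None:
        # Full migration: one file per table partition.
        name = f"migration_errors_{table}_{partition}.log"
    else:
        # Incremental migration: timestamped so earlier runs are never overwritten.
        stamp = (started_at or datetime.now()).strftime("%Y%m%d_%H%M%S")
        name = f"migration_errors_{table}_incremental_{stamp}.log"
    return log_dir / name
```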
File format:
# Migration errors for <table> partition <partition>
# Format: UnitName|ToolNameID|EventDate|EventTime|Reason
ID0350||0000-00-00|0:00:00|EventDate is invalid: 0000-00-00
[Ljava.lang.String;@abc123|TOOL1|2024-01-01|10:00:00|UnitName is corrupted Java string: [Ljava.lang.String;@abc123
UNIT1||2024-01-01|10:00:00|ToolNameID is NULL or empty
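The validation rules are a NULL or zero `EventDate`, a NULL `EventTime`, a NULL or empty `ToolNameID` or `UnitName`, and a `UnitName` that starts with `[L`. They can be sketched as a validator that returns the first failing reason. This is an illustration, not the project's code. The function names are hypothetical. Only the three reason strings in the sample above come from the source, and the other wordings are assumptions. The checks are ordered so the first sample line reports its zero date rather than its empty `ToolNameID`.

```python
from datetime import date, time
from typing import Any, Optional, Union


def validate_key(unit_name: Optional[str], tool_name_id: Optional[str],
                 event_date: Union[date, str, None],
                 event_time: Union[time, str, None]) -> Optional[str]:
    """Return the rejection reason for a consolidation key, or None if it is valid."""
    if event_date is None:
        return "EventDate is NULL"  # assumed wording
    if str(event_date) == "0000-00-00":
        return f"EventDate is invalid: {event_date}"
    if event_time is None:
        return "EventTime is NULL"  # assumed wording
    if tool_name_id is None or tool_name_id == "":
        return "ToolNameID is NULL or empty"
    if unit_name is None or unit_name == "":
        return "UnitName is NULL or empty"  # assumed wording
    if unit_name.startswith("[L"):
        # e.g. "[Ljava.lang.String;@abc123": a Java array's toString() stored as text
        return f"UnitName is corrupted Java string: {unit_name}"
    return None


def format_error_line(unit_name: Any, tool_name_id: Any, event_date: Any,
                      event_time: Any, reason: str) -> str:
    """Render one line as UnitName|ToolNameID|EventDate|EventTime|Reason."""
    fields = (unit_name, tool_name_id, event_date, event_time)
    # NULL values become empty fields, as in the sample file.
    return "|".join("" if f is None else str(f) for f in fields) + f"|{reason}"
```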
Behavior:
- Invalid keys are automatically skipped to prevent migration failure
- Each skipped key is logged with the reason for rejection
- Total count of skipped keys is reported at the end of migration
- Empty error files (no errors) are automatically deleted
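The run-level behavior can be sketched as a small writer class: skip and log each bad key, count the skips, and delete the file if nothing was logged. Several details are assumptions. `MigrationErrorLog` is a hypothetical name. The class opens the file eagerly and writes the two header lines shown in the file format. It treats a file that holds only those headers as "empty".

```python
from pathlib import Path


class MigrationErrorLog:
    """Collects skipped keys for one run; deletes its file if nothing was logged."""

    def __init__(self, path: Path, table: str, partition: str) -> None:
        self.path = path
        self.count = 0
        self._fh = path.open("w", encoding="utf-8")
        self._fh.write(f"# Migration errors for {table} partition {partition}\n")
        self._fh.write("# Format: UnitName|ToolNameID|EventDate|EventTime|Reason\n")

    def log(self, line: str) -> None:
        """Record one rejected key, already formatted as a pipe-separated line."""
        self._fh.write(line + "\n")
        self.count += 1

    def close(self) -> int:
        """Close the file, remove it if no errors were logged, and return the count."""
        self._fh.close()
        if self.count == 0:
            # Only the header was written, so the file carries no information.
            self.path.unlink()
        return self.count
```

The value returned by `close()` is the total that would be reported at the end of the migration.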
Migration Guide (from old to new)
If you have an existing installation with `migration_state.json`:
- Back up your data (optional but recommended): `cp migration_state.json migration_state.json.backup`
- Run a full migration to populate the `migration_state` table: `python main.py migrate full`
- Sync state (if you have existing data): `python scripts/sync_migration_state.py`
- Remove the old state file: `rm migration_state.json`
- Run an incremental migration, previewing it with `--dry-run` first: `python main.py migrate incremental --dry-run`, then `python main.py migrate incremental`
Performance Improvements
- MySQL query time: From 60+ seconds to <0.1 seconds (more than 600x faster)
- Consolidation efficiency: Multiple MySQL rows → single PostgreSQL record
- State reliability: PostgreSQL table instead of JSON file
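Consolidation means that all MySQL rows sharing one consolidation key end up in a single PostgreSQL record with a JSONB payload. The sketch below illustrates only that grouping. The `measurements` JSON layout is hypothetical, and so are the `node`/`value` columns used when trying it. Neither is the project's actual JSONB schema.

```python
import json
from collections import defaultdict
from typing import Any

KEY_FIELDS = ("UnitName", "ToolNameID", "EventDate", "EventTime")


def consolidate(rows: list[dict[str, Any]]) -> dict[tuple, str]:
    """Group MySQL rows by consolidation key; return one JSON document per key."""
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in KEY_FIELDS)
        # Every non-key column of the row becomes one measurement entry.
        groups[key].append({k: v for k, v in row.items() if k not in KEY_FIELDS})
    # Hypothetical layout {"measurements": [...]}, serialized for a JSONB column.
    return {key: json.dumps({"measurements": items}) for key, items in groups.items()}
```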
Breaking Changes
- `--state-file` parameter removed from incremental migration (no longer uses JSON)
- `--use-id` flag removed (the consolidation-based approach is now the default)
- Incremental migration requires a full migration to be run first
- `BATCH_SIZE` environment variable renamed to `CONSOLIDATION_GROUP_LIMIT` (update your `.env` file)
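A `.env` file that still sets `BATCH_SIZE` may simply have that value ignored after the rename. The sketch below shows a settings loader that surfaces the rename instead. It assumes settings are read from environment variables. `load_settings` is hypothetical, and all default values are placeholders rather than the project's real defaults.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, object]:
    """Read the renamed/added settings, refusing the removed BATCH_SIZE variable."""
    env = os.environ if env is None else env
    if "BATCH_SIZE" in env and "CONSOLIDATION_GROUP_LIMIT" not in env:
        # Fail loudly so an un-updated .env file is not silently ignored.
        raise RuntimeError("BATCH_SIZE was renamed to CONSOLIDATION_GROUP_LIMIT; update your .env file")
    return {
        # Placeholder defaults, not taken from the project.
        "consolidation_group_limit": int(env.get("CONSOLIDATION_GROUP_LIMIT", "10000")),
        "progress_log_interval": int(env.get("PROGRESS_LOG_INTERVAL", "5000")),
        "benchmark_output_dir": env.get("BENCHMARK_OUTPUT_DIR", "benchmark_results"),
    }
```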
[Previous] - Before 2025-12-30
Features
- Full migration support
- Incremental migration with timestamp tracking
- JSONB transformation
- Partitioning by year
- GIN indexes for JSONB queries
- Benchmark system
- Progress tracking
- Rich logging