feat: Add error logging and fix incremental migration state tracking

Add validation and error logging for consolidation keys, and fix a state-tracking bug in incremental migration:

Error Logging System:
- Add validation for consolidation keys (NULL dates, empty IDs, corrupted Java strings)
- Log invalid keys to dedicated error files with detailed reasons
- Full migration: migration_errors_<table>_<partition>.log
- Incremental migration: migration_errors_<table>_incremental_<timestamp>.log (timestamped to preserve history)
- Report total count of skipped invalid keys at migration completion
- Auto-delete empty error log files

State Tracking Fix:
- Fix critical bug where last_key wasn't updated after final buffer flush
- Track last_processed_key throughout migration loop
- Update state both during periodic flushes and after final flush
- Ensures incremental migration correctly resumes from last migrated key
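
A minimal sketch of the flush pattern described above, not the actual migration code. The names `migrate_keys`, `flush`, `save_last_key` and `buffer_size` are hypothetical; only `last_processed_key` comes from this commit message.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical key shape: (UnitName, ToolNameID, EventDate, EventTime)
Key = Tuple[str, str, str, str]


def migrate_keys(
    keys: Iterable[Key],
    flush: Callable[[List[Key]], None],
    save_last_key: Callable[[Key], None],
    buffer_size: int = 1000,
) -> Optional[Key]:
    """Buffer keys, flush in batches, and persist the last key after every flush."""
    buffer: List[Key] = []
    last_processed_key: Optional[Key] = None

    for key in keys:
        buffer.append(key)
        last_processed_key = key  # tracked on every iteration of the loop
        if len(buffer) >= buffer_size:
            flush(buffer)
            save_last_key(last_processed_key)  # periodic state update
            buffer = []

    if buffer:
        flush(buffer)
        # The update that was missing before this fix: without it, the stored
        # state points at the last periodic flush, not the last migrated key.
        save_last_key(last_processed_key)

    return last_processed_key
```

With five keys and `buffer_size=2`, the state is saved three times, and the last saved value is the fifth key.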

Validation Checks:
- EventDate IS NULL or EventDate = '0000-00-00'
- EventTime IS NULL
- ToolNameID IS NULL or empty string
- UnitName IS NULL or empty string
- UnitName starting with '[L' (corrupted Java strings)
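
The checks above could be expressed roughly as follows. This is a sketch under assumptions: the function name `validate_consolidation_key` is hypothetical, values are assumed to arrive as strings or None, and the reason wording is illustrative.

```python
from typing import Optional


def validate_consolidation_key(
    unit_name: Optional[str],
    tool_name_id: Optional[str],
    event_date: Optional[str],
    event_time: Optional[str],
) -> Optional[str]:
    """Return a rejection reason, or None when the key passes every check."""
    # Checks run in the order listed above; the first failure wins.
    if event_date is None:
        return "EventDate is NULL"
    if event_date == "0000-00-00":
        return f"EventDate is invalid: {event_date}"
    if event_time is None:
        return "EventTime is NULL"
    if not tool_name_id:  # covers both NULL and empty string
        return "ToolNameID is NULL or empty"
    if not unit_name:
        return "UnitName is NULL or empty"
    if unit_name.startswith("[L"):  # e.g. a stringified Java array reference
        return f"UnitName is corrupted Java string: {unit_name}"
    return None
```

A key that fails several checks is reported with the first matching reason only, so each skipped key yields exactly one log line.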

Documentation:
- Update README.md with error logging behavior
- Update MIGRATION_WORKFLOW.md with validation details
- Update CHANGELOG.md with new features and fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 19:49:44 +01:00
parent 03e39eb925
commit 23e9fc9d82
5 changed files with 142 additions and 18 deletions

@@ -8,6 +8,8 @@
- **State management in PostgreSQL**: Replaced JSON file with `migration_state` table for more reliable tracking
- **Sync utility**: Added `scripts/sync_migration_state.py` to sync state with actual data
- **Performance optimization**: MySQL queries now filter on the PRIMARY KEY and return almost instantly
- **Data quality validation**: Automatically validates and logs invalid consolidation keys to dedicated error files
- **Error logging**: Invalid keys (null dates, empty tool IDs, corrupted Java strings) are logged and skipped during migration
- **Better documentation**: Consolidated and updated all documentation files
### Changed
@@ -35,6 +37,33 @@
- **State synchronization**: Can now sync `migration_state` with actual data using utility script
- **Duplicate handling**: Uses `ON CONFLICT DO NOTHING` to prevent duplicates
- **Last key tracking**: Properly updates global state after full migration
- **Corrupted data handling**: Both full and incremental migrations now validate keys and log errors instead of crashing
### Error Logging
Both full and incremental migrations now handle corrupted consolidation keys gracefully:
**Error files:**
- Full migration: `migration_errors_<table>_<partition>.log` (e.g., `migration_errors_rawdatacor_p2024.log`)
- Incremental migration: `migration_errors_<table>_incremental_<timestamp>.log` (e.g., `migration_errors_rawdatacor_incremental_20260101_194500.log`)
Each incremental migration creates a new timestamped file to preserve error history across runs.
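
As an illustration of the naming scheme, a helper along these lines would produce both file names. The function `error_log_name` and its parameters are hypothetical, and the timestamp format is inferred from the example above rather than taken from the code.

```python
from datetime import datetime
from typing import Optional


def error_log_name(
    table: str,
    partition: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Build the error log file name for a full or an incremental migration."""
    if partition is not None:
        # Full migration: one file per table partition.
        return f"migration_errors_{table}_{partition}.log"
    # Incremental migration: a fresh timestamped file per run.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"migration_errors_{table}_incremental_{stamp}.log"
```

Because the timestamp has one-second resolution, two incremental runs started within the same second would share a file name in this sketch.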
**File format:**
```
# Migration errors for <table> partition <partition>
# Format: UnitName|ToolNameID|EventDate|EventTime|Reason
ID0350||0000-00-00|0:00:00|EventDate is invalid: 0000-00-00
[Ljava.lang.String;@abc123|TOOL1|2024-01-01|10:00:00|UnitName is corrupted Java string: [Ljava.lang.String;@abc123
UNIT1||2024-01-01|10:00:00|ToolNameID is NULL or empty
```
**Behavior:**
- Invalid keys are automatically skipped to prevent migration failure
- Each skipped key is logged with the reason for rejection
- Total count of skipped keys is reported at the end of migration
- Empty error files (no errors) are automatically deleted
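
The behavior above can be sketched as a single pass over the keys. This is an assumption-laden illustration, not the project's implementation: `filter_valid_keys`, its parameters, and the `validate` callback (which returns a reason string, or None for a valid key) are all hypothetical names.

```python
import os
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical key shape: (UnitName, ToolNameID, EventDate, EventTime)
Key = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def filter_valid_keys(
    keys: Iterable[Key],
    validate: Callable[[Key], Optional[str]],
    error_path: str,
    table: str,
    partition: str,
) -> Tuple[List[Key], int]:
    """Return the valid keys and the number skipped, logging each rejection."""
    valid: List[Key] = []
    skipped = 0
    with open(error_path, "w", encoding="utf-8") as log:
        log.write(f"# Migration errors for {table} partition {partition}\n")
        log.write("# Format: UnitName|ToolNameID|EventDate|EventTime|Reason\n")
        for key in keys:
            reason = validate(key)
            if reason is None:
                valid.append(key)
                continue
            # NULL fields are written as empty strings between the pipes.
            fields = ["" if value is None else str(value) for value in key]
            log.write("|".join(fields + [reason]) + "\n")
            skipped += 1
    if skipped == 0:
        os.remove(error_path)  # nothing was rejected, so drop the header-only file
    return valid, skipped
```

The caller can report the returned count once the migration finishes, for example `print(f"Skipped {skipped} invalid keys")`.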
### Migration Guide (from old to new)