4 Commits

Author SHA1 Message Date
d513920788 fix: Buffer incomplete groups at batch boundaries for complete consolidation
The consolidation grouping logic now correctly handles rows that share a
consolidation key (UnitName, ToolNameID, EventDate, EventTime) but span
multiple fetch batches.

Key improvements:
- Added buffering of incomplete groups at batch boundaries
- When a batch is full (has exactly limit rows), the final group is buffered
  to be prepended to the next batch, ensuring complete group consolidation
- When the final batch is reached (fewer than limit rows), all buffered and
  current groups are yielded

This ensures that all nodes with the same consolidation key are grouped
together in a single consolidated row, eliminating node fragmentation.
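
The buffering rule described above can be sketched as a small generator. This is an illustrative sketch, not the code from this commit: the name iter_complete_groups, the dict-shaped rows, and the assumption that rows arrive ordered by the consolidation key are all hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Dict[str, object]  # assumption: each fetched row is a dict keyed by column name


def consolidation_key(row: Row) -> Tuple[object, ...]:
    # The four key columns named in the commit message.
    return (row["UnitName"], row["ToolNameID"], row["EventDate"], row["EventTime"])


def iter_complete_groups(
    batches: Iterable[Sequence[Row]], limit: int
) -> Iterator[List[Row]]:
    """Yield complete groups of rows sharing a consolidation key.

    Assumes rows are fetched in key order, so equal keys are adjacent.
    """
    buffered: List[Row] = []
    for batch in batches:
        # Prepend the group held back from the previous batch.
        rows = buffered + list(batch)
        buffered = []
        groups = [list(g) for _, g in groupby(rows, key=consolidation_key)]
        if len(batch) == limit and groups:
            # A full batch may have cut its last group short: hold it back.
            buffered = groups.pop()
        yield from groups
    if buffered:
        # Source exhausted right after a full batch: flush what is left.
        yield buffered
```

With limit=3 and batches keyed A,A,B / B,B,C / C, the generator yields groups of 2, 3, and 2 rows, so the B and C nodes that straddle a boundary end up in one group each.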

Added comprehensive unit tests verifying:
- Multi-node consolidation with batch boundaries
- RAWDATACOR consolidation with multiple nodes
- Groups that span batch boundaries are kept complete

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 22:36:15 +01:00
0f217379ea fix: Use actual PostgreSQL row count for total_rows_migrated tracking
Replace session-level counting with direct table COUNT queries so that
total_rows_migrated always reflects the actual row count in PostgreSQL. This
fixes the discrepancy where the counter tracked only rows from the current
session and did not account for earlier insertions or for duplicates from
failed resume attempts.

Key improvements:
- Use get_row_count() after each batch to get authoritative total
- Preserve previous count on resume and accumulate across sessions
- Remove dependency on error-prone session-level counters
- Ensures migration_state.total_rows_migrated matches actual table row count
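
A minimal sketch of the idea, with loud assumptions: sqlite3 stands in for PostgreSQL so the example runs anywhere, count_rows is a hypothetical stand-in for the get_row_count() mentioned above (whose real signature is not shown in this log), and the (id, value) table layout is invented for illustration.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Return the authoritative row count straight from the table.

    The table name must come from trusted configuration, because SQL
    identifiers cannot be passed as bound parameters.
    """
    cur = conn.cursor()
    cur.execute(f'SELECT COUNT(*) FROM "{table}"')
    return cur.fetchone()[0]


def migrate_batches(
    conn: sqlite3.Connection,
    table: str,
    batches: Iterable[List[Tuple[int, str]]],
    state: Dict[str, int],
) -> Dict[str, int]:
    """Insert each batch, then refresh the counter from the table itself."""
    for batch in batches:
        # INSERT OR IGNORE is SQLite syntax; it skips rows whose primary key
        # already exists, mimicking duplicates left by a failed resume.
        conn.executemany(
            f'INSERT OR IGNORE INTO "{table}" (id, value) VALUES (?, ?)', batch
        )
        conn.commit()
        # No session-level arithmetic: ask the database what it holds.
        state["total_rows_migrated"] = count_rows(conn, table)
    return state
```

If the table already holds two rows from an earlier session and a resumed run re-sends one of them plus two new rows, a session counter would report 3 while the table-derived total is 4.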

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-23 15:33:27 +01:00
b09cfcf9df fix: Add timeout settings and retry logic to MySQL connector
Configuration improvements:
- Set read_timeout=300 (5 minutes) to handle long queries
- Set write_timeout=300 (5 minutes) for writes
- Set max_allowed_packet=64MB to handle larger data transfers

Retry logic:
- Added a retry mechanism with up to 3 retries on fetch failure
- Auto-reconnect on connection loss before retry
- Better error messages showing retry attempts

This fixes the 'connection is lost' error that occurs during
long-running migrations by:
1. Giving MySQL queries more time to complete
2. Allowing larger packet sizes for bulk data
3. Automatically recovering from connection drops
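
The retry-and-reconnect behaviour described above might look roughly like the following. This is a library-agnostic sketch: fetch_with_retry, its parameters, and the choice of exceptions to retry on are assumptions, not the connector's actual code.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def fetch_with_retry(
    fetch: Callable[[], T],
    reconnect: Callable[[], None],
    max_retries: int = 3,
    delay: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fetch(); on a connection-type failure, reconnect and try again.

    One initial attempt plus at most max_retries retries are made.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except retry_on as exc:
            attempt += 1
            if attempt > max_retries:
                raise RuntimeError(
                    f"fetch failed after {max_retries} retries"
                ) from exc
            log.warning("fetch failed (%s); retry %d/%d", exc, attempt, max_retries)
            time.sleep(delay)
            # Re-establish the connection before the next attempt.
            reconnect()
```

In real use, retry_on would be set to the driver's own "connection lost" exception class, and fetch would wrap the batch query.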

Fixes: 'Connection is lost' error during full migration
2025-12-21 09:53:34 +01:00
fccc83eb74 docs: Add comprehensive documentation and helper scripts
Add:
- QUICKSTART.md: 5-minute quick start guide with examples
- scripts/incus_setup.sh: Automated PostgreSQL container setup
- scripts/validate_migration.sql: SQL validation queries
- scripts/setup_cron.sh: Cron job setup for incremental migrations
- tests/test_setup.py: Unit tests for configuration and transformation
- install.sh: Quick installation script

Documentation includes:
- Step-by-step setup instructions
- Example queries for RAWDATACOR and ELABDATADISP
- Troubleshooting guide
- Performance optimization tips

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 19:58:20 +01:00