Fix N+1 query problem - use single ordered query with Python grouping

CRITICAL FIX: The previous implementation ran a GROUP BY to get the unique
keys, then a separate WHERE query for EACH group. With millions of groups,
this meant millions of separate MySQL queries and a throughput of about
12 bytes/sec, which is unusable.

New approach (single ordered query, fetched in batches):
- Fetch all rows from partition ordered by consolidation key
- Group them in Python as we iterate
- One query per LIMIT batch, not one per group
- ~100,000x faster than N+1 approach
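
A minimal sketch of the grouping step described above, not the code from this commit. It assumes rows arrive as dicts already sorted by the consolidation key, and that the key is the first four ORDER BY columns (UnitName, ToolNameID, EventDate, EventTime) with NodeNum varying inside a group; `iter_groups` and `KEY_COLUMNS` are hypothetical names. Because a LIMIT batch can end in the middle of a group, the sketch carries the unfinished group over to the next batch.

```python
from itertools import groupby
from operator import itemgetter

# Assumption: the consolidation key is these four columns; NodeNum varies within a group.
KEY_COLUMNS = ("UnitName", "ToolNameID", "EventDate", "EventTime")
consolidation_key = itemgetter(*KEY_COLUMNS)


def iter_groups(batches):
    """Yield (key, rows) pairs from batches of rows already sorted by key.

    A group that is split across a batch boundary is stitched back together
    before it is yielded.
    """
    pending_key, pending_rows = None, []
    for batch in batches:
        # groupby only merges adjacent rows, which is why the ORDER BY matters.
        for key, rows in groupby(batch, key=consolidation_key):
            rows = list(rows)
            if key == pending_key:
                # Same group continues from the previous batch.
                pending_rows.extend(rows)
            else:
                if pending_rows:
                    yield pending_key, pending_rows
                pending_key, pending_rows = key, rows
    if pending_rows:
        yield pending_key, pending_rows
```

Each group is yielded only once a row with a different key is seen, so memory use is bounded by the largest group plus one batch.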

Query uses index efficiently: ORDER BY (UnitName, ToolNameID, EventDate, EventTime, NodeNum)
matches index prefix and keeps groups together for consolidation.
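
As an illustration of "one query per LIMIT batch", the sketch below pages through a table in ORDER BY order. It is an assumption-heavy stand-in, not the migrator's code: it uses `sqlite3` instead of MySQL so it runs anywhere, it uses keyset pagination (a row-value comparison against the last row seen) where the real code may use a different batching scheme, and `fetch_batches` and the table name passed to it are hypothetical. It also assumes the five ORDER BY columns together identify a row uniquely.

```python
import sqlite3

# Column order taken from the commit message's ORDER BY clause.
ORDER_COLUMNS = ("UnitName", "ToolNameID", "EventDate", "EventTime", "NodeNum")


def fetch_batches(conn, table, batch_size):
    """Yield lists of rows in ORDER_COLUMNS order, issuing one query per batch.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    """
    cols = ", ".join(ORDER_COLUMNS)
    placeholders = ", ".join("?" for _ in ORDER_COLUMNS)
    last = None
    while True:
        if last is None:
            sql = f"SELECT {cols} FROM {table} ORDER BY {cols} LIMIT ?"
            params = (batch_size,)
        else:
            # Keyset pagination: resume strictly after the last row of the previous batch.
            sql = (
                f"SELECT {cols} FROM {table} "
                f"WHERE ({cols}) > ({placeholders}) "
                f"ORDER BY {cols} LIMIT ?"
            )
            params = (*last, batch_size)
        batch = conn.execute(sql, params).fetchall()
        if not batch:
            return
        yield batch
        last = batch[-1]
```

The number of queries is roughly the row count divided by `batch_size`, independent of how many groups the partition contains.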

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 22:32:41 +01:00
parent fe2d173b0f
commit c30d77e24b
2 changed files with 35 additions and 29 deletions


@@ -82,7 +82,9 @@ class FullMigrator:
 f"Use --resume to continue from last checkpoint, or delete data to restart."
 )
 logger.info(f"Resuming migration - found {pg_row_count} existing rows")
-rows_to_migrate = total_rows - previous_migrated_count
+# Progress bar tracks MySQL rows processed (before consolidation)
+# Consolidation reduces count but not the rows we need to fetch
+rows_to_migrate = total_rows
 else:
 previous_migrated_count = 0
 rows_to_migrate = total_rows