feat: Add granular resume within partitions using last inserted ID
Problem: If migration was interrupted in the middle of processing a partition (e.g., at row 100k of 500k), resume would re-process all 100k rows, causing duplicate insertions and wasted time.

Solution:
1. Modified fetch_consolidation_groups_from_partition() to accept a start_id parameter
2. When resuming within the same partition, query the last inserted ID from migration_state.last_migrated_id
3. Use keyset pagination starting from (id > last_id) to skip already-processed rows
4. Added logic to detect when we're resuming within the same partition vs. resuming from a new partition

Flow:
- If last_completed_partition < current_partition: start from the beginning of the partition
- If last_completed_partition == current_partition: start from last_migrated_id
- If last_completed_partition > current_partition: skip to the next uncompleted partition

This ensures resume is granular:
- Won't re-insert already inserted rows within a partition
- Continues exactly from where it stopped
- Combines with existing partition tracking for complete accuracy

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
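The keyset pagination in step 3 seeks past the last-seen primary key instead of re-reading from the start of the partition. A minimal sketch of how fetch_consolidation_groups_from_partition() might honor the new start_id parameter, assuming an auto-increment id column, a self.batch_size attribute, and a _query() helper (all hypothetical except the function name and start_id, which appear in the diff below); the real generator yields consolidation groups sharing (unit, tool, date, time), and that grouping is omitted here to focus on the seek predicate:

    def fetch_consolidation_groups_from_partition(self, table, partition, start_id=None):
        # Keyset pagination: seek past the last-seen id rather than using
        # OFFSET, so resuming at row 100k does not rescan the first 100k rows.
        last_id = start_id or 0
        while True:
            rows = self._query(  # hypothetical helper returning a list of dicts
                f"SELECT * FROM {table} PARTITION ({partition}) "
                "WHERE id > %s ORDER BY id LIMIT %s",
                (last_id, self.batch_size),
            )
            if not rows:
                break
            last_id = rows[-1]["id"]
            yield rows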
@@ -120,6 +120,16 @@ class FullMigrator:
             logger.info(f"[{partition_idx}/{len(partitions)}] Processing partition {partition}...")
             partition_group_count = 0
 
+            # Determine resume point within this partition
+            # If resuming and this is the last completed partition, start from last_id
+            start_id = None
+            if last_completed_partition == partition and previous_migrated_count > 0:
+                # For resume within same partition, we need to query the last ID inserted
+                # This is a simplified approach: just continue from ID tracking
+                start_id = self._get_last_migrated_id(pg_conn, pg_table)
+                if start_id:
+                    logger.info(f"Resuming partition {partition} from ID > {start_id}")
+
             # Accumulate rows for batch insertion to reduce database round-trips
             insert_buffer = []
             # Use smaller batch size for more frequent updates: batch_size * 5 = 50k rows
@@ -130,7 +140,8 @@ class FullMigrator:
             # Each group is a list of rows with the same (unit, tool, date, time)
             for group_rows in mysql_conn.fetch_consolidation_groups_from_partition(
                 mysql_table,
-                partition
+                partition,
+                start_id=start_id
             ):
                 if not group_rows:
                     break
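For reference, a minimal sketch of what _get_last_migrated_id() could look like, assuming a psycopg2-style connection and a migration_state table keyed by table name with a last_migrated_id column (the table and column names come from the commit message; the schema and query are assumptions):

    def _get_last_migrated_id(self, pg_conn, pg_table):
        # Read the high-water mark recorded during migration so the caller
        # can resume with a keyset predicate of id > last_migrated_id.
        with pg_conn.cursor() as cur:
            cur.execute(
                "SELECT last_migrated_id FROM migration_state WHERE table_name = %s",
                (pg_table,),
            )
            row = cur.fetchone()
        return row[0] if row else None  # None -> start from the top of the partition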