fix: Yield all groups in final batch, not just last group

Critical bug fix for missing nodes in consolidated groups.

Problem: When a partition batch contained multiple consolidation groups,
only the LAST group was being buffered/yielded, causing earlier groups to
be lost. This happened when:

1. Batch < limit rows (final batch)
2. Multiple different consolidation keys present
3. First groups were yielded correctly
4. But FINAL group was only yielded if batch == limit
5. If batch < limit, final group was discarded

Example from partition d10:
- Fetch returns 22 rows with 2 groups: (nodes 1-11) and (nodes 12-22)
- Old code: yield nodes 1-11 on key change, then didn't yield nodes 12-22
- Result: inserted row had only nodes 12-22

Fix: Detect final batch with len(rows) < limit, then yield ALL groups
including the final one instead of buffering it.

Changes:
- Detect final batch early: is_final_batch = len(rows) < limit
- If final batch: yield current_group even if no key change follows
- If NOT final batch: buffer last group for continuity (original logic)

Now all nodes from all groups are properly consolidated.

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
2025-12-26 00:03:45 +01:00
parent 3687f77911
commit 4277dd8d2c

View File

@@ -422,6 +422,7 @@ class MySQLConnector:
# Group rows by consolidation key (UnitName, ToolNameID, EventDate, EventTime) # Group rows by consolidation key (UnitName, ToolNameID, EventDate, EventTime)
current_group = [] current_group = []
last_key = None last_key = None
is_final_batch = len(rows) < limit # True if this is the last batch
for row in sorted_rows: for row in sorted_rows:
key = ( key = (
@@ -440,24 +441,23 @@ class MySQLConnector:
current_group.append(row) current_group.append(row)
last_key = key last_key = key
# At end of batch: check if final group should be buffered # At end of batch: handle the final group
# If next rows might exist (got full limit rows), buffer the last group if is_final_batch:
if len(rows) == limit and last_key is not None: # This is the last batch - yield all remaining groups
# Buffer incomplete group at boundary for next batch
buffered_group = current_group
last_buffered_key = last_key
logger.debug(
f"Buffering group at boundary: key={last_key}, "
f"rows_in_group={len(current_group)}, total_rows_fetched={len(rows)}"
)
else:
# This is the last batch, yield final group
if current_group: if current_group:
logger.debug( logger.debug(
f"Final batch: yielding group key={last_key}, " f"Final batch: yielding group key={last_key}, "
f"rows_in_group={len(current_group)}, total_rows_fetched={len(rows)}" f"rows_in_group={len(current_group)}, total_rows_fetched={len(rows)}"
) )
yield current_group yield current_group
else:
# More rows might exist - buffer the last group for next batch
buffered_group = current_group
last_buffered_key = last_key
logger.debug(
f"Buffering group at boundary: key={last_key}, "
f"rows_in_group={len(current_group)}, total_rows_fetched={len(rows)}"
)
last_id = rows[-1][id_column] last_id = rows[-1][id_column]
break # Success, exit retry loop break # Success, exit retry loop