fix: CRITICAL - Don't prematurely yield incomplete groups at batch boundaries

Bug: When the batch limit was reached (len(rows) >= limit), the code yielded the
current_group immediately, even if it was incomplete. As a result, groups that
spanned multiple batches were split.

Example:
- The first batch contains UnitA nodes 1-11 with the same consolidation key
- The code yields them as a complete group before seeing nodes 12-22 in the next batch
- The next batch starts with a different key, so the incomplete group is never merged
- Result: 11 separate rows instead of 1 consolidated row

Root cause: The code did not check whether the group might continue in the next batch.

Fix: Before yielding at the batch boundary, check whether the LAST row in the
current batch has the SAME consolidation key as the current_group:
- If YES (last_row_key == current_key): DON'T yield yet; keep buffering
- If NO (last_row_key != current_key): yield; the group is definitely complete

This ensures groups that span batch boundaries are kept together and fully
consolidated.
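
Minimal sketch of the buffering rule (illustrative only: the consolidate() name,
the row shape, and the batching loop are assumptions, not the connector's actual
API). A group is flushed only when a row with a different consolidation key is
seen or the input ends, never merely because a batch ended:

# Sketch, not the connector's code: rows arrive in batches and a group is
# flushed only when a different consolidation key is seen or the input is
# exhausted - never just because a batch boundary was reached.
def consolidate(batches):
    def key_of(row):
        return (
            row.get("UnitName"),
            row.get("ToolNameID"),
            row.get("EventDate"),
            row.get("EventTime"),
        )

    current_group, current_key = [], None
    for rows in batches:                  # each batch is e.g. one LIMIT-ed SELECT
        for row in rows:
            row_key = key_of(row)
            if current_group and row_key != current_key:
                yield current_group       # key changed: previous group is complete
                current_group = []
            current_group.append(row)
            current_key = row_key
        # Deliberately no yield here: the buffered group may continue in the
        # next batch, which is exactly the case the old code got wrong.
    if current_group:
        yield current_group               # end of data: flush the final group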

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-26 20:44:40 +01:00
parent 0e52f72dbe
commit f2b6049608

@@ -308,10 +308,25 @@ class MySQLConnector:
                 yield current_group
                 return
             else:
-                # More rows might exist - yield the last group only if key changed
-                # If not, it will be continued/merged in next iteration
-                if current_group:
-                    yield current_group
+                # More rows might exist after this batch
+                # Check if the last row in this batch has same key as current_group
+                # If yes, DON'T yield yet - the group might continue in next batch
+                # If no, yield because we know the group is complete
+                if rows:
+                    last_row = rows[-1]
+                    last_row_key = (
+                        last_row.get("UnitName"),
+                        last_row.get("ToolNameID"),
+                        last_row.get("EventDate"),
+                        last_row.get("EventTime")
+                    )
+                    # If last row has different key than current group, current group is complete
+                    if last_row_key != current_key and current_group:
+                        yield current_group
+                        current_group = []
+                        current_key = None
+                    # else: same key as current_group, so continue in next iteration
                 # Update last_key for next iteration
                 if current_key:
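
As a quick check of the intended behaviour, the boundary scenario from the commit
message can be replayed against the consolidate() sketch above (the column values
below are made up for illustration):

# Hypothetical data: 22 rows sharing one consolidation key, split across two batches.
base = {"UnitName": "UnitA", "ToolNameID": 7,
        "EventDate": "2025-12-26", "EventTime": "20:44:40"}
batch_1 = [dict(base, node=i) for i in range(1, 12)]     # nodes 1-11
batch_2 = [dict(base, node=i) for i in range(12, 23)]    # nodes 12-22

groups = list(consolidate([batch_1, batch_2]))
assert len(groups) == 1 and len(groups[0]) == 22         # one group, not two halves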