fix: Properly handle incomplete consolidation groups at batch boundaries

Problem: When a batch ended with an incomplete group (same consolidation key
as last row), the code was not yielding it (correct), but was also updating
last_key to that incomplete key. Next iteration would query:
  WHERE (key) > (incomplete_key)
This would SKIP all remaining rows of that key that were on next page!

Result: Groups that span batch boundaries got split - e.g., nodes 1-11 in
first batch yield as incomplete (not yielded), nodes 12-22 never fetched
because next query starts AFTER the incomplete key.

Fix: Track whether current_group is incomplete (pending) at batch boundary.
If incomplete (last_row_key == current_key), keep it in memory and DON'T
update last_key. This ensures next batch continues from where the incomplete
group started, fetching remaining rows of that key.

Logic:
- If last_row_key != current_key: group is complete, yield it
- If last_row_key == current_key: group is incomplete, keep buffering
- If current_key is None: all groups complete, update last_key normally
- If current_key is not None: group pending, DON'T update last_key

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
2025-12-27 09:54:42 +01:00
parent 3576a8f354
commit 418da67857

View File

@@ -301,18 +301,21 @@ class MySQLConnector:
current_group.append(row) current_group.append(row)
current_key = key current_key = key
# At end of batch: handle final group # At end of batch: ALWAYS yield complete groups immediately
# Incomplete groups at batch boundary will be continued in next batch
# because we track last_key correctly for resume
if not rows or len(rows) < limit: if not rows or len(rows) < limit:
# Last batch - yield remaining group and finish # Last batch - yield remaining group and finish
if current_group: if current_group:
yield current_group yield current_group
return return
else: else:
# More rows might exist after this batch # More rows exist - yield all COMPLETE groups seen so far
# Check if the last row in this batch has same key as current_group # If current_group is incomplete (same key as last row), keep buffering it
# If yes, DON'T yield yet - the group might continue in next batch if current_group:
# If no, yield because we know the group is complete # Check if this is a complete group or partial
if rows: # Complete group: all rows with this key have been seen in this batch
# Partial: might continue in next batch
last_row = rows[-1] last_row = rows[-1]
last_row_key = ( last_row_key = (
last_row.get("UnitName"), last_row.get("UnitName"),
@@ -321,16 +324,29 @@ class MySQLConnector:
last_row.get("EventTime") last_row.get("EventTime")
) )
# If last row has different key than current group, current group is complete if last_row_key != current_key:
if last_row_key != current_key and current_group: # Current group is complete (ended before batch boundary)
yield current_group yield current_group
current_group = [] current_group = []
current_key = None current_key = None
# else: same key as current_group, so continue in next iteration # else: current group continues past batch boundary
# Keep it in current_group for merging with next batch
# BUT we need to update last_key to be the LAST yielded key,
# not the current incomplete key, so resume works correctly
# Actually NO - we should NOT update last_key if we have a pending group!
# Update last_key for next iteration # Update last_key only if we don't have a pending incomplete group
if current_key: if current_key is None and rows:
last_key = current_key # All groups in this batch were yielded and complete
last_row = rows[-1]
last_key = (
last_row.get("UnitName"),
last_row.get("ToolNameID"),
last_row.get("EventDate"),
last_row.get("EventTime")
)
# If current_key is not None, we have incomplete group - DON'T update last_key
# so next iteration will continue from where this group started
break # Success, exit retry loop break # Success, exit retry loop