fix: Properly handle incomplete consolidation groups at batch boundaries
Problem: When a batch ended with an incomplete group (same consolidation key as last row), the code was not yielding it (correct), but was also updating last_key to that incomplete key. Next iteration would query: WHERE (key) > (incomplete_key) This would SKIP all remaining rows of that key that were on next page! Result: Groups that span batch boundaries got split - e.g., nodes 1-11 in first batch yield as incomplete (not yielded), nodes 12-22 never fetched because next query starts AFTER the incomplete key. Fix: Track whether current_group is incomplete (pending) at batch boundary. If incomplete (last_row_key == current_key), keep it in memory and DON'T update last_key. This ensures next batch continues from where the incomplete group started, fetching remaining rows of that key. Logic: - If last_row_key != current_key: group is complete, yield it - If last_row_key == current_key: group is incomplete, keep buffering - If current_key is None: all groups complete, update last_key normally - If current_key is not None: group pending, DON'T update last_key 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -301,18 +301,21 @@ class MySQLConnector:
|
|||||||
current_group.append(row)
|
current_group.append(row)
|
||||||
current_key = key
|
current_key = key
|
||||||
|
|
||||||
# At end of batch: handle final group
|
# At end of batch: ALWAYS yield complete groups immediately
|
||||||
|
# Incomplete groups at batch boundary will be continued in next batch
|
||||||
|
# because we track last_key correctly for resume
|
||||||
if not rows or len(rows) < limit:
|
if not rows or len(rows) < limit:
|
||||||
# Last batch - yield remaining group and finish
|
# Last batch - yield remaining group and finish
|
||||||
if current_group:
|
if current_group:
|
||||||
yield current_group
|
yield current_group
|
||||||
return
|
return
|
||||||
else:
|
else:
|
||||||
# More rows might exist after this batch
|
# More rows exist - yield all COMPLETE groups seen so far
|
||||||
# Check if the last row in this batch has same key as current_group
|
# If current_group is incomplete (same key as last row), keep buffering it
|
||||||
# If yes, DON'T yield yet - the group might continue in next batch
|
if current_group:
|
||||||
# If no, yield because we know the group is complete
|
# Check if this is a complete group or partial
|
||||||
if rows:
|
# Complete group: all rows with this key have been seen in this batch
|
||||||
|
# Partial: might continue in next batch
|
||||||
last_row = rows[-1]
|
last_row = rows[-1]
|
||||||
last_row_key = (
|
last_row_key = (
|
||||||
last_row.get("UnitName"),
|
last_row.get("UnitName"),
|
||||||
@@ -321,16 +324,29 @@ class MySQLConnector:
|
|||||||
last_row.get("EventTime")
|
last_row.get("EventTime")
|
||||||
)
|
)
|
||||||
|
|
||||||
# If last row has different key than current group, current group is complete
|
if last_row_key != current_key:
|
||||||
if last_row_key != current_key and current_group:
|
# Current group is complete (ended before batch boundary)
|
||||||
yield current_group
|
yield current_group
|
||||||
current_group = []
|
current_group = []
|
||||||
current_key = None
|
current_key = None
|
||||||
# else: same key as current_group, so continue in next iteration
|
# else: current group continues past batch boundary
|
||||||
|
# Keep it in current_group for merging with next batch
|
||||||
|
# BUT we need to update last_key to be the LAST yielded key,
|
||||||
|
# not the current incomplete key, so resume works correctly
|
||||||
|
# Actually NO - we should NOT update last_key if we have a pending group!
|
||||||
|
|
||||||
# Update last_key for next iteration
|
# Update last_key only if we don't have a pending incomplete group
|
||||||
if current_key:
|
if current_key is None and rows:
|
||||||
last_key = current_key
|
# All groups in this batch were yielded and complete
|
||||||
|
last_row = rows[-1]
|
||||||
|
last_key = (
|
||||||
|
last_row.get("UnitName"),
|
||||||
|
last_row.get("ToolNameID"),
|
||||||
|
last_row.get("EventDate"),
|
||||||
|
last_row.get("EventTime")
|
||||||
|
)
|
||||||
|
# If current_key is not None, we have incomplete group - DON'T update last_key
|
||||||
|
# so next iteration will continue from where this group started
|
||||||
|
|
||||||
break # Success, exit retry loop
|
break # Success, exit retry loop
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user