fix: Order rows by consolidation key to keep related nodes together in batches

When fetching rows for consolidation, the original keyset pagination only ordered by id, which caused nodes from the same (unit, tool, timestamp) to be split across multiple batches. This resulted in incomplete consolidation, with some nodes being missed.

Solution: Order by consolidation columns in addition to id:
- Primary: id (for keyset pagination)
- Secondary: UnitName, ToolNameID, EventDate, EventTime, NodeNum

This ensures all nodes with the same (unit, tool, timestamp) are grouped together in the same batch, allowing proper consolidation within the batch.

Fixes: Nodes being lost during ELABDATADISP consolidation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 19:32:52 +01:00
parent 648bd98a09
commit 9cc12abe11
1 changed files with 6 additions and 2 deletions
--- a/src/connectors/mysql_connector.py
+++ b/src/connectors/mysql_connector.py
@@ -257,11 +257,15 @@ class MySQLConnector:
                    with self.connection.cursor() as cursor:
                        # Use keyset pagination: fetch by id > last_id
                        # This is much more efficient than OFFSET for large tables
+                        # Order by id first for pagination, then by consolidation key to keep
+                        # related nodes together in the same batch
+                        order_clause = f"`{id_column}` ASC, `UnitName` ASC, `ToolNameID` ASC, `EventDate` ASC, `EventTime` ASC, `NodeNum` ASC"
+
                        if last_id is None:
-                            query = f"SELECT * FROM `{table}` ORDER BY `{id_column}` ASC LIMIT %s"
+                            query = f"SELECT * FROM `{table}` ORDER BY {order_clause} LIMIT %s"
                            cursor.execute(query, (batch_size,))
                        else:
-                            query = f"SELECT * FROM `{table}` WHERE `{id_column}` > %s ORDER BY `{id_column}` ASC LIMIT %s"
+                            query = f"SELECT * FROM `{table}` WHERE `{id_column}` > %s ORDER BY {order_clause} LIMIT %s"
                            cursor.execute(query, (last_id, batch_size))

                        rows = cursor.fetchall()
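
One way to check whether a given ORDER BY clause keeps a consolidation group inside a single batch is to replay the pagination loop on a small sample and list the keys that show up in more than one batch. The sketch below is only an illustration under stated assumptions: it uses an in-memory `sqlite3` database instead of MySQL, and the table name `raw`, the sample rows, and the helpers `make_db`, `fetch_batches`, and `split_keys` are hypothetical, not part of `MySQLConnector`. It also assumes `id` is unique, in which case the columns listed after `id` only break ties on `id`.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (id, UnitName, ToolNameID, EventDate, EventTime, NodeNum).
# Nodes of the same (unit, tool, date, time) are deliberately interleaved by id.
ROWS = [
    (1, "U1", "T1", "2025-01-01", "00:00:00", 1),
    (2, "U2", "T1", "2025-01-01", "00:00:00", 1),
    (3, "U1", "T1", "2025-01-01", "00:00:00", 2),
    (4, "U2", "T1", "2025-01-01", "00:00:00", 2),
    (5, "U1", "T1", "2025-01-01", "00:00:00", 3),
    (6, "U2", "T1", "2025-01-01", "00:00:00", 3),
]

ID_ONLY = "id ASC"
ID_THEN_KEY = (
    "id ASC, UnitName ASC, ToolNameID ASC, "
    "EventDate ASC, EventTime ASC, NodeNum ASC"
)


def make_db():
    """Create an in-memory table (assumed schema) and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw (id INTEGER PRIMARY KEY, UnitName TEXT, "
        "ToolNameID TEXT, EventDate TEXT, EventTime TEXT, NodeNum INTEGER)"
    )
    conn.executemany("INSERT INTO raw VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def fetch_batches(conn, order_clause, batch_size):
    """Keyset pagination on id (id > last_id), returning a list of batches."""
    batches, last_id = [], None
    while True:
        if last_id is None:
            cur = conn.execute(
                f"SELECT * FROM raw ORDER BY {order_clause} LIMIT ?",
                (batch_size,),
            )
        else:
            cur = conn.execute(
                f"SELECT * FROM raw WHERE id > ? ORDER BY {order_clause} LIMIT ?",
                (last_id, batch_size),
            )
        rows = cur.fetchall()
        if not rows:
            return batches
        batches.append(rows)
        last_id = rows[-1][0]  # id is the leading sort key, so this is the max id


def split_keys(batches):
    """Return the (unit, tool, date, time) keys that appear in more than one batch."""
    seen = defaultdict(set)
    for index, batch in enumerate(batches):
        for row in batch:
            seen[row[1:5]].add(index)
    return {key for key, indexes in seen.items() if len(indexes) > 1}


if __name__ == "__main__":
    conn = make_db()
    for label, clause in (("id only", ID_ONLY), ("id then key", ID_THEN_KEY)):
        batches = fetch_batches(conn, clause, batch_size=4)
        print(label, [len(b) for b in batches], sorted(split_keys(batches)))
```

Running the same check against a copy of the real table, with the real batch size, would show whether any consolidation key still spans a batch boundary after this change.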