clean docs

2025-12-30 15:24:19 +01:00
parent 5c9df3d06f
commit 5f6e3215a5
6 changed files with 492 additions and 159 deletions

CHANGELOG.md Normal file

@@ -0,0 +1,84 @@
# Changelog
## [Current] - 2025-12-30
### Added
- **Consolidation-based incremental migration**: Uses consolidation keys `(UnitName, ToolNameID, EventDate, EventTime)` instead of timestamps
- **MySQL ID optimization**: Uses `MAX(mysql_max_id)` from PostgreSQL to filter MySQL queries, avoiding full table scans
- **State management in PostgreSQL**: Replaced JSON file with `migration_state` table for more reliable tracking
- **Sync utility**: Added `scripts/sync_migration_state.py` to sync state with actual data
- **Performance optimization**: MySQL queries now instant using PRIMARY KEY filter
- **Better documentation**: Consolidated and updated all documentation files
### Changed
- **Incremental migration**: Now uses consolidation keys instead of timestamp-based approach
- **Full migration**: Improved to save global `last_key` after completing all partitions
- **State tracking**: Moved from `migration_state.json` to PostgreSQL table `migration_state`
- **Query performance**: Added `min_mysql_id` parameter to `fetch_consolidation_keys_after()` for optimization
- **Documentation**: Updated README.md, MIGRATION_WORKFLOW.md, QUICKSTART.md with current implementation
### Removed
- **migration_state.json**: Replaced by PostgreSQL table
- **Timestamp-based migration**: Replaced by consolidation key-based approach
- **ID-based resumable migration**: Consolidated into single consolidation-based approach
- **Temporary debug scripts**: Cleaned up all `/tmp/` debug files
### Fixed
- **Incremental migration performance**: MySQL queries now ~600x faster with ID filter
- **State synchronization**: Can now sync `migration_state` with actual data using utility script
- **Duplicate handling**: Uses `ON CONFLICT DO NOTHING` to prevent duplicates
- **Last key tracking**: Properly updates global state after full migration
### Migration Guide (from old to new)
If you have an existing installation with `migration_state.json`:
1. **Backup your data** (optional but recommended):
```bash
cp migration_state.json migration_state.json.backup
```
2. **Run full migration** to populate `migration_state` table:
```bash
python main.py migrate full
```
3. **Sync state** (if you have existing data):
```bash
python scripts/sync_migration_state.py
```
4. **Remove old state file**:
```bash
rm migration_state.json
```
5. **Run incremental migration**:
```bash
python main.py migrate incremental --dry-run
python main.py migrate incremental
```
### Performance Improvements
- **MySQL query time**: From 60+ seconds to <0.1 seconds (600x faster)
- **Consolidation efficiency**: Multiple MySQL rows → single PostgreSQL record
- **State reliability**: PostgreSQL table instead of JSON file
### Breaking Changes
- `--state-file` parameter removed from incremental migration (no longer uses JSON)
- `--use-id` flag removed (consolidation-based approach is now default)
- Incremental migration requires full migration to be run first
## [Previous] - Before 2025-12-30
### Features
- Full migration support
- Incremental migration with timestamp tracking
- JSONB transformation
- Partitioning by year
- GIN indexes for JSONB queries
- Benchmark system
- Progress tracking
- Rich logging

MIGRATION_WORKFLOW.md

@@ -2,244 +2,349 @@
## Overview
This tool implements **consolidation-based incremental migration** for two tables:
- **RAWDATACOR**: Raw sensor measurements
- **ELABDATADISP**: Elaborated/calculated data
Both tables use **consolidation keys** to group and migrate data efficiently.
---
## Migration Modes
### 1. Full Migration
Initial migration of all historical data, migrating one partition at a time.
```bash
# Migrate all partitions for all tables
python main.py migrate full
# Migrate specific table
python main.py migrate full --table RAWDATACOR
# Migrate specific partition (year-based)
python main.py migrate full --table ELABDATADISP --partition 2024
# Dry-run to see what would be migrated
python main.py migrate full --dry-run
```
**When to use:** First time migrating data or need complete fresh migration.
**Characteristics:**
- Migrates data partition by partition (year-based)
- Uses consolidation groups for efficiency
- Tracks progress in `migration_state` table (PostgreSQL)
- Can resume from last completed partition if interrupted (see the sketch below)
- Uses `mysql_max_id` optimization for performance
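A minimal, self-contained sketch of that resume behavior. The state store here is an in-memory dict standing in for the PostgreSQL `migration_state` table, and the function names are illustrative, not the project's actual API:
```python
# Sketch only: the real tool persists status in migration_state, not a dict.
_state: dict[tuple[str, str], str] = {}

def migrate_partition(table: str, partition: str) -> None:
    print(f"migrating {table} partition {partition}")  # real work happens elsewhere

def full_migration(table: str, partitions: list[str]) -> None:
    for partition in partitions:
        if _state.get((table, partition)) == "completed":
            continue  # resume: skip partitions that already finished
        migrate_partition(table, partition)
        _state[(table, partition)] = "completed"  # record progress per partition

full_migration("rawdatacor", [str(year) for year in range(2014, 2032)] + ["default"])
```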
---
### 2. Incremental Migration
Sync only new data since the last migration.
```bash
# Migrate new data for all tables
python main.py migrate incremental
# Migrate specific table
python main.py migrate incremental --table ELABDATADISP
# Dry-run to check what would be migrated
python main.py migrate incremental --dry-run
```
**When to use:** Continuous sync of new/updated records.
**Characteristics:**
- Uses **consolidation keys** to identify new records:
  - `(UnitName, ToolNameID, EventDate, EventTime)`
- Tracks last migrated key in `migration_state` table
- Optimized with `min_mysql_id` filter for performance
- Handles duplicates with `ON CONFLICT DO NOTHING` (example below)
- Perfect for scheduled jobs (cron, systemd timers)
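A hedged sketch of what the duplicate-safe insert looks like with psycopg2. The column list follows the consolidated-record example later in this document; the project's real statement may differ:
```python
import json
import psycopg2

conn = psycopg2.connect("dbname=migrated_db user=postgres host=localhost")
with conn, conn.cursor() as cur:
    # ON CONFLICT DO NOTHING makes re-running the same batch harmless:
    # rows whose unique key already exists are silently skipped.
    cur.execute(
        """
        INSERT INTO rawdatacor
            (unit_name, tool_name_id, event_timestamp, measurements, mysql_max_id)
        VALUES (%s, %s, %s, %s, %s)
        ON CONFLICT DO NOTHING
        """,
        ("Unit1", "Tool1", "2024-01-01 10:00:00",
         json.dumps({"0": {"value": 23.5, "unit": "°C"}}), 10),
    )
```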
**How it works:**
1. Retrieves `last_key` from `migration_state` table
2. Gets `MAX(mysql_max_id)` from PostgreSQL table for optimization
3. Queries MySQL: `WHERE id > max_mysql_id AND (key_tuple) > last_key`
4. Migrates new consolidation groups
5. Updates `migration_state` with new `last_key` (the whole loop is sketched below)
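A condensed sketch of that loop with database access injected as callables. `fetch_keys`, `migrate_group`, and `save_state` are illustrative stand-ins for the project's helpers; only `fetch_consolidation_keys_after()` and its `min_mysql_id` parameter are named in the changelog:
```python
from typing import Callable, Optional

Key = tuple[str, str, str, str]  # (UnitName, ToolNameID, EventDate, EventTime)

def incremental_pass(last_key: Optional[Key], min_mysql_id: int,
                     fetch_keys: Callable, migrate_group: Callable,
                     save_state: Callable, batch_size: int = 10_000) -> None:
    """One incremental run implementing steps 1-5 above."""
    while True:
        # Step 3: keyset pagination - only keys after last_key, and only
        # MySQL rows with id > min_mysql_id (the PRIMARY KEY optimization)
        keys: list[Key] = fetch_keys(last_key, min_mysql_id, batch_size)
        if not keys:
            break
        for key in keys:            # Step 4: migrate each consolidation group
            migrate_group(key)
        last_key = keys[-1]
        save_state(last_key)        # Step 5: persist progress after each batch
```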
---
## Consolidation Keys
Both tables use consolidation to group multiple measurements into a single JSONB record.
### Consolidation Key Structure
```sql
(UnitName, ToolNameID, EventDate, EventTime)
```
### Why Consolidation?
Instead of migrating individual sensor readings, we:
1. **Group** all measurements for the same (unit, tool, date, time)
2. **Transform** 16-25 columns into structured JSONB
3. **Migrate** as a single consolidated record
**Example:**
MySQL has 10 rows for `(Unit1, Tool1, 2024-01-01, 10:00:00)`:
```
id | UnitName | ToolNameID | EventDate  | EventTime | Val0 | Val1 | ...
1  | Unit1    | Tool1      | 2024-01-01 | 10:00:00  | 23.5 | 45.2 | ...
2  | Unit1    | Tool1      | 2024-01-01 | 10:00:00  | 23.6 | 45.3 | ...
...
```
PostgreSQL gets 1 consolidated record:
```json
{
  "unit_name": "Unit1",
  "tool_name_id": "Tool1",
  "event_timestamp": "2024-01-01 10:00:00",
  "measurements": {
    "0": {"value": 23.5, "unit": "°C"},
    "1": {"value": 45.2, "unit": "bar"},
    ...
  },
  "mysql_max_id": 10
}
```
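A rough sketch of that grouping/transform step. The channel mapping here (each `ValN` column becomes measurement key `N`) is an assumption for illustration; the project's real transformer defines the actual mapping and units:
```python
def consolidate_group(rows: list[dict]) -> dict:
    """Collapse MySQL rows sharing one consolidation key into one record."""
    first = rows[0]
    measurements: dict[str, dict] = {}
    for row in rows:
        for column, value in row.items():
            if column.startswith("Val") and value is not None:
                channel = column[len("Val"):]  # "Val0" -> "0"
                measurements.setdefault(channel, {})["value"] = value
    return {
        "unit_name": first["UnitName"],
        "tool_name_id": first["ToolNameID"],
        "event_timestamp": f"{first['EventDate']} {first['EventTime']}",
        "measurements": measurements,
        "mysql_max_id": max(row["id"] for row in rows),  # enables the ID filter
    }
```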
---
## State Management
### Migration State Table
The `migration_state` table in PostgreSQL tracks migration progress:
```sql
CREATE TABLE migration_state (
    table_name VARCHAR(50),
    partition_name VARCHAR(50),
    last_key JSONB,  -- Last migrated consolidation key
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    total_rows INTEGER,
    status VARCHAR(20)
);
```
### State Records
- **Per-partition state**: Tracks each partition's progress
  - Example: `('rawdatacor', '2024', {...}, '2024-01-15 10:30:00', 1000000, 'completed')`
- **Global state**: Tracks overall incremental migration position
  - Example: `('rawdatacor', '_global', {...}, NULL, NULL, 0, 'in_progress')`
### Checking State
```sql
-- View all migration state
SELECT * FROM migration_state ORDER BY table_name, partition_name;
-- View global state (for incremental migration)
SELECT * FROM migration_state WHERE partition_name = '_global';
```
---
## Performance Optimization
### MySQL ID Filter
The incremental migration uses `MAX(mysql_max_id)` from PostgreSQL to filter MySQL queries:
```sql
SELECT UnitName, ToolNameID, EventDate, EventTime
FROM RAWDATACOR
WHERE id > 267399536 -- max_mysql_id from PostgreSQL
AND (UnitName, ToolNameID, EventDate, EventTime) > (?, ?, ?, ?)
GROUP BY UnitName, ToolNameID, EventDate, EventTime
ORDER BY UnitName, ToolNameID, EventDate, EventTime
LIMIT 10000
```
**Why this is fast:**
- Uses PRIMARY KEY index on `id` to skip millions of already-migrated rows
- Tuple comparison only applied to the filtered subset
- Avoids full table scans
### Consolidation Group Batching
Instead of fetching individual rows, we:
1. Fetch 10,000 consolidation keys at a time
2. For each key, fetch all matching rows from MySQL
3. Transform and insert into PostgreSQL
4. Update state every batch
---
## Recommended Workflow
### Initial Setup (One-time)
```bash
# 1. Configure .env file
cp .env.example .env
nano .env
# 2. Create PostgreSQL schema
python main.py setup --create-schema
# 3. Run full migration
python main.py migrate full
```
### Daily Incremental Sync
```bash
# Run incremental migration (via cron or manual)
python main.py migrate incremental
```
**Cron example** (daily at 2 AM):
```cron
0 2 * * * cd /path/to/mysql2postgres && python main.py migrate incremental >> /var/log/migration.log 2>&1
```
---
## Resuming Interrupted Migrations
### Full Migration
If interrupted, full migration resumes from the last completed partition:
```bash
# First run: migrates partitions 2014, 2015, 2016... (interrupted after 2020)
python main.py migrate full --table RAWDATACOR
# Resume: continues from partition 2021
python main.py migrate full --table RAWDATACOR
```
### Incremental Migration
Incremental migration uses the `last_key` from `migration_state`:
```bash
# Always safe to re-run - uses ON CONFLICT DO NOTHING
python main.py migrate incremental
```
---
## Syncing Migration State
If `migration_state` becomes out of sync with actual data, use the sync utility:
```bash
# Sync migration_state with actual PostgreSQL data
python scripts/sync_migration_state.py
```
This finds the most recent row (by `created_at`) and updates `migration_state._global`.
---
## Monitoring
### Check Progress
```bash
# View migration state
psql -h localhost -U postgres -d migrated_db -c \
"SELECT table_name, partition_name, status, total_rows, completed_at
FROM migration_state
ORDER BY table_name, partition_name"
```
### Verify Row Counts
```sql
-- PostgreSQL
SELECT COUNT(*) FROM rawdatacor;
SELECT COUNT(*) FROM elabdatadisp;
**"Duplicate key value violates unique constraint"**
- Cause: Running full migration twice
- Solution: Use timestamp-based incremental sync instead
-- Compare with MySQL
-- mysql> SELECT COUNT(DISTINCT UnitName, ToolNameID, EventDate, EventTime) FROM RAWDATACOR;
```
**"Timeout during migration"** (Large datasets)
- Solution: Switch to ID-based resumable migration with `--use-id`
---
## Common Issues
### "No previous migration found"
**Cause**: Running incremental migration before full migration
**Solution**: Run full migration first
```bash
python main.py migrate full
```
### "Duplicate key value violates unique constraint"
**Cause**: Data already exists (shouldn't happen with ON CONFLICT DO NOTHING)
**Solution**: Migration handles this automatically - check logs for details
### Slow Incremental Migration
**Cause**: `MAX(mysql_max_id)` query is slow (~60 seconds on large tables)
**Solution**: This is expected and only happens once at start. The MySQL queries are instant thanks to the optimization.
**Alternative**: Create an index on `mysql_max_id` in PostgreSQL (uses disk space):
```sql
CREATE INDEX idx_rawdatacor_mysql_max_id ON rawdatacor (mysql_max_id DESC);
CREATE INDEX idx_elabdatadisp_mysql_max_id ON elabdatadisp (mysql_max_id DESC);
```
---
## Key Technical Details
### Tuple Comparison in MySQL
MySQL supports lexicographic tuple comparison:
```sql
WHERE (UnitName, ToolNameID, EventDate, EventTime) > ('Unit1', 'Tool1', '2024-01-01', '10:00:00')
```
This is equivalent to:
```sql
WHERE UnitName > 'Unit1'
OR (UnitName = 'Unit1' AND ToolNameID > 'Tool1')
OR (UnitName = 'Unit1' AND ToolNameID = 'Tool1' AND EventDate > '2024-01-01')
OR (UnitName = 'Unit1' AND ToolNameID = 'Tool1' AND EventDate = '2024-01-01' AND EventTime > '10:00:00')
```
But much more efficient!
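Python tuples compare with the same lexicographic rule, which makes them a handy way to sanity-check this logic:
```python
# Lexicographic comparison, mirroring the SQL row constructor above
last_key = ("Unit1", "Tool1", "2024-01-01", "10:00:00")
assert ("Unit1", "Tool1", "2024-01-01", "10:00:01") > last_key  # later time
assert ("Unit1", "Tool2", "2023-01-01", "00:00:00") > last_key  # later tool wins, despite earlier date
assert not ("Unit0", "Tool9", "2025-01-01", "09:00:00") > last_key  # earlier unit loses
```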
### Partitioning in PostgreSQL
Tables are partitioned by year (2014-2031):
```sql
CREATE TABLE rawdatacor_2024 PARTITION OF rawdatacor
FOR VALUES FROM (2024) TO (2025);
```
PostgreSQL automatically routes INSERTs to the correct partition based on `event_year`.
---
## Summary
1. **Full migration**: One-time initial load, partition by partition
2. **Incremental migration**: Daily sync of new data using consolidation keys
3. **State tracking**: PostgreSQL `migration_state` table
4. **Performance**: `mysql_max_id` filter + consolidation batching
5. **Resumable**: Both modes can resume from interruptions
6. **Safe**: ON CONFLICT DO NOTHING prevents duplicates

QUICKSTART.md

@@ -111,8 +111,11 @@ python main.py migrate full --table RAWDATACOR
# Migrate only changes since the last sync
python main.py migrate incremental
# Dry-run to see what would be migrated
python main.py migrate incremental --dry-run
# Only a specific table
python main.py migrate incremental --table RAWDATACOR
```
### Benchmark

README.md
Binary file not shown.

scripts/README.md Normal file

@@ -0,0 +1,78 @@
# Migration Scripts
Utility scripts for managing the migration.
## sync_migration_state.py
Synchronizes the `migration_state` table with the data actually present in PostgreSQL.
### When to Use
Use this script when `migration_state` is out of sync with the actual data, for example:
- After manual inserts into PostgreSQL
- After state corruption
- Before running an incremental migration on data that already exists
### How It Works
For each table (rawdatacor, elabdatadisp):
1. Finds the row with MAX(created_at) - the most recently inserted row
2. Extracts the consolidation key from that row
3. Updates `migration_state._global` with that key
### Usage
```bash
# Run from the project root
python scripts/sync_migration_state.py
```
### Output
```
Syncing migration_state with actual PostgreSQL data...
================================================================================
ELABDATADISP:
Most recently inserted row (by created_at):
created_at: 2025-12-30 11:58:24
event_timestamp: 2025-12-30 14:58:24
Consolidation key: (ID0290, DT0007, 2025-12-30, 14:58:24)
✓ Updated migration_state with this key
RAWDATACOR:
Most recently inserted row (by created_at):
created_at: 2025-12-30 11:13:29
event_timestamp: 2025-12-30 11:11:39
Consolidation key: (ID0304, DT0024, 2025-12-30, 11:11:39)
✓ Updated migration_state with this key
================================================================================
✓ Done! Incremental migration will now start from the correct position.
```
### Effects
After running this script:
- `migration_state._global` will be updated with the last migrated key
- `python main.py migrate incremental` will start from the correct position
- No duplicates will be created (uses ON CONFLICT DO NOTHING)
### Caveats
- Automatically excludes corrupted data (unit_name like `[Ljava.lang.String;@...`)
- Uses `created_at` to find the most recently inserted row (not `event_timestamp`)
- Overwrites the existing global state
### Verification
After running the script, verify the state:
```sql
SELECT table_name, partition_name, last_key
FROM migration_state
WHERE partition_name = '_global'
ORDER BY table_name;
```
It should show the most recent keys for both tables.

scripts/sync_migration_state.py Executable file

@@ -0,0 +1,63 @@
#!/usr/bin/env python3
"""Sync migration_state with actual data in PostgreSQL tables."""
import sys
from pathlib import Path

# Make the project root importable regardless of the working directory
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.connectors.postgres_connector import PostgreSQLConnector
from src.migrator.state_manager import StateManager


def sync_table_state(table_name: str):
    """Sync migration_state for a table with its actual data."""
    with PostgreSQLConnector() as pg_conn:
        cursor = pg_conn.connection.cursor()
        # Find the row with MAX(created_at) - most recently inserted.
        # Exclude corrupted data (Java strings).
        cursor.execute(f"""
            SELECT unit_name, tool_name_id,
                   DATE(event_timestamp)::text AS event_date,
                   event_timestamp::time::text AS event_time,
                   created_at,
                   event_timestamp
            FROM {table_name}
            WHERE unit_name NOT LIKE '[L%'  -- Exclude corrupted Java strings
            ORDER BY created_at DESC
            LIMIT 1
        """)
        result = cursor.fetchone()
        if not result:
            print(f"No data found in {table_name}")
            return
        unit_name, tool_name_id, event_date, event_time, created_at, event_timestamp = result
        print(f"\n{table_name.upper()}:")
        print("  Most recently inserted row (by created_at):")
        print(f"    created_at: {created_at}")
        print(f"    event_timestamp: {event_timestamp}")
        print(f"  Consolidation key: ({unit_name}, {tool_name_id}, {event_date}, {event_time})")
        # Update the global migration_state with this key
        state_mgr = StateManager(pg_conn, table_name, partition_name="_global")
        last_key = {
            "unit_name": unit_name,
            "tool_name_id": tool_name_id,
            "event_date": event_date,
            "event_time": event_time,
        }
        state_mgr.update_state(last_key=last_key)
        print("  ✓ Updated migration_state with this key")


if __name__ == "__main__":
    print("Syncing migration_state with actual PostgreSQL data...")
    print("=" * 80)
    sync_table_state("elabdatadisp")
    sync_table_state("rawdatacor")
    print("\n" + "=" * 80)
    print("✓ Done! Incremental migration will now start from the correct position.")