clean docs

2025-12-30 15:24:19 +01:00
parent 5c9df3d06f
commit 5f6e3215a5
6 changed files with 492 additions and 159 deletions

scripts/README.md Normal file

@@ -0,0 +1,78 @@
# Migration Scripts
Utility scripts for managing the migration.
## sync_migration_state.py
Synchronizes the `migration_state` table with the data actually present in PostgreSQL.
### When to use
Use this script when `migration_state` is out of sync with the real data, for example:
- After manual inserts into PostgreSQL
- After the state has been corrupted
- Before running an incremental migration on data that already exists
### How it works
For each table (rawdatacor, elabdatadisp):
1. Finds the row with MAX(created_at), i.e. the most recently inserted row
2. Extracts the consolidation key (`unit_name`, `tool_name_id`, `event_date`, `event_time`) from that row
3. Updates `migration_state._global` with that key
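
The three steps above can be sketched against an in-memory SQLite database standing in for PostgreSQL. This is only an illustration: the table layout, the sample rows, and the helper name `latest_consolidation_key` are assumptions, and the real script goes through the project's `PostgreSQLConnector` and `StateManager` instead.

```python
import sqlite3

ALLOWED_TABLES = {"rawdatacor", "elabdatadisp"}


def latest_consolidation_key(conn, table_name):
    """Return the consolidation key of the most recently inserted row, or None."""
    # Table names cannot be bound as query parameters, so restrict them to a known set.
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table_name}")
    row = conn.execute(
        f"""
        SELECT unit_name, tool_name_id,
               date(event_timestamp) AS event_date,
               time(event_timestamp) AS event_time
        FROM {table_name}
        ORDER BY created_at DESC
        LIMIT 1
        """
    ).fetchone()
    if row is None:
        return None
    keys = ("unit_name", "tool_name_id", "event_date", "event_time")
    return dict(zip(keys, row))


# Hypothetical sample data: the first row has the later event_timestamp,
# the second row was inserted later (larger created_at).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rawdatacor (unit_name TEXT, tool_name_id TEXT, "
    "event_timestamp TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO rawdatacor VALUES (?, ?, ?, ?)",
    [
        ("ID0290", "DT0007", "2025-12-30 14:58:24", "2025-12-30 10:00:00"),
        ("ID0304", "DT0024", "2025-12-30 11:11:39", "2025-12-30 11:13:29"),
    ],
)

key = latest_consolidation_key(conn, "rawdatacor")
print(key)
```

The row that wins is the one inserted last, even though the other row carries a later `event_timestamp`; the resulting dictionary has the same shape as the `last_key` the script stores.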
### Usage
```bash
# Run from the project root
python scripts/sync_migration_state.py
```
### Output
```
Syncing migration_state with actual PostgreSQL data...
================================================================================
ELABDATADISP:
Most recently inserted row (by created_at):
created_at: 2025-12-30 11:58:24
event_timestamp: 2025-12-30 14:58:24
Consolidation key: (ID0290, DT0007, 2025-12-30, 14:58:24)
✓ Updated migration_state with this key
RAWDATACOR:
Most recently inserted row (by created_at):
created_at: 2025-12-30 11:13:29
event_timestamp: 2025-12-30 11:11:39
Consolidation key: (ID0304, DT0024, 2025-12-30, 11:11:39)
✓ Updated migration_state with this key
================================================================================
✓ Done! Incremental migration will now start from the correct position.
```
### Effects
After running this script:
- `migration_state._global` will hold the last migrated key
- `python main.py migrate incremental` will start from the correct position
- No duplicate rows will be created (inserts use `ON CONFLICT DO NOTHING`)
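
The reason `ON CONFLICT DO NOTHING` prevents duplicates can be shown with a small sketch. It uses an in-memory SQLite database (version 3.24 or later) as a stand-in for PostgreSQL, and the unique constraint on `(unit_name, tool_name_id, event_timestamp)` is an assumption made for the example, not the project's confirmed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rawdatacor (
        unit_name TEXT,
        tool_name_id TEXT,
        event_timestamp TEXT,
        UNIQUE (unit_name, tool_name_id, event_timestamp)  -- assumed key
    )
    """
)

batch = [
    ("ID0304", "DT0024", "2025-12-30 11:11:39"),
    ("ID0304", "DT0024", "2025-12-30 11:12:39"),
]


def insert_batch(rows):
    """Insert rows, silently skipping any that already exist."""
    conn.executemany(
        "INSERT INTO rawdatacor VALUES (?, ?, ?) "
        "ON CONFLICT (unit_name, tool_name_id, event_timestamp) DO NOTHING",
        rows,
    )


insert_batch(batch)
insert_batch(batch)  # replaying the same batch is a no-op

row_count = conn.execute("SELECT COUNT(*) FROM rawdatacor").fetchone()[0]
print(row_count)  # 2
```

Even if the synced position is slightly behind the real data, re-migrating rows that are already present leaves the table unchanged.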
### Warnings
- Automatically excludes corrupted data (a `unit_name` such as `[Ljava.lang.String;@...`)
- Uses `created_at` to find the most recently inserted row (not `event_timestamp`)
- Overwrites the existing global state
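
As a rough illustration of the first two points, the sketch below mirrors the script's `unit_name NOT LIKE '[L%'` filter in plain Python and then picks the latest row by `created_at`. Values of the form `[Ljava.lang.String;@...` are what Java prints for a `String[]` array by default, which is why they all start with `[L`. The helper name and the sample rows are hypothetical.

```python
def is_corrupted_unit_name(unit_name: str) -> bool:
    """Python equivalent of the SQL filter: unit_name LIKE '[L%'."""
    return unit_name.startswith("[L")


# Hypothetical rows; the corrupted one has the most recent created_at.
rows = [
    {"unit_name": "ID0304", "created_at": "2025-12-30 11:13:29"},
    {"unit_name": "[Ljava.lang.String;@1a2b3c", "created_at": "2025-12-30 12:00:00"},
    {"unit_name": "ID0290", "created_at": "2025-12-30 11:58:24"},
]

valid_rows = [row for row in rows if not is_corrupted_unit_name(row["unit_name"])]

# ISO-formatted timestamps sort correctly as plain strings.
latest = max(valid_rows, key=lambda row: row["created_at"])
print(latest["unit_name"])  # ID0290
```

Without the filter, the corrupted row would be selected and its garbage `unit_name` would end up in the stored key.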
### Verification
After running the script, check the state:
```sql
SELECT table_name, partition_name, last_key
FROM migration_state
WHERE partition_name = '_global'
ORDER BY table_name;
```
It should show the most recent keys for both tables.

scripts/sync_migration_state.py Executable file

@@ -0,0 +1,63 @@
#!/usr/bin/env python3
"""Sync migration_state with actual data in PostgreSQL tables."""
import sys
from pathlib import Path

# Make the project root importable regardless of where the repository is checked out
# (replaces a hard-coded absolute path to one developer's home directory).
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.connectors.postgres_connector import PostgreSQLConnector
from src.migrator.state_manager import StateManager


def sync_table_state(table_name: str):
    """Sync migration_state for a table with its actual data."""
    with PostgreSQLConnector() as pg_conn:
        cursor = pg_conn.connection.cursor()

        # Find the row with MAX(created_at) - most recently inserted
        # Exclude corrupted data (Java strings)
        # table_name is interpolated directly, so only pass trusted, hard-coded names.
        cursor.execute(f"""
            SELECT unit_name, tool_name_id,
                   DATE(event_timestamp)::text as event_date,
                   event_timestamp::time::text as event_time,
                   created_at,
                   event_timestamp
            FROM {table_name}
            WHERE unit_name NOT LIKE '[L%'  -- Exclude corrupted Java strings
            ORDER BY created_at DESC
            LIMIT 1
        """)
        result = cursor.fetchone()
        if not result:
            print(f"No data found in {table_name}")
            return

        unit_name, tool_name_id, event_date, event_time, created_at, event_timestamp = result
        print(f"\n{table_name.upper()}:")
        print(" Most recently inserted row (by created_at):")
        print(f" created_at: {created_at}")
        print(f" event_timestamp: {event_timestamp}")
        print(f" Consolidation key: ({unit_name}, {tool_name_id}, {event_date}, {event_time})")

        # Update global migration_state with this key
        state_mgr = StateManager(pg_conn, table_name, partition_name="_global")
        last_key = {
            "unit_name": unit_name,
            "tool_name_id": tool_name_id,
            "event_date": event_date,
            "event_time": event_time,
        }
        state_mgr.update_state(last_key=last_key)
        print(" ✓ Updated migration_state with this key")


if __name__ == "__main__":
    print("Syncing migration_state with actual PostgreSQL data...")
    print("=" * 80)
    sync_table_state("elabdatadisp")
    sync_table_state("rawdatacor")
    print("\n" + "=" * 80)
    print("✓ Done! Incremental migration will now start from the correct position.")