docs: Add detailed example workflow

2025-12-10 19:59:22 +01:00
parent 38c6b4c6d8
commit 8e705e33da
1 changed files with 400 additions and 0 deletions
--- a/EXAMPLE_WORKFLOW.md
+++ b/EXAMPLE_WORKFLOW.md
@@ -0,0 +1,400 @@
+# Example Complete Workflow
+
+A step-by-step example of how to use the tool for a complete migration.
+
+## Scenario
+
+You need to migrate the MySQL database `production_db` to PostgreSQL, including setting up the Incus container and running performance tests.
+
+### Data
+- RAWDATACOR: ~50 million rows (5 years of data)
+- ELABDATADISP: ~25 million rows
+
+### Timeline
+- Available time: 2 hours for the migration
+- Testing: 30 minutes
+- Validation: 30 minutes
+
+## Step-by-Step Workflow
+
+### 1. Preparation (10 min)
+
+```bash
+# Clone the project
+cd /home/user/projects
+git clone <repo> mysql2postgres
+cd mysql2postgres
+
+# Setup Python
+./install.sh
+source venv/bin/activate
+
+# Check the Python version
+python --version
+# Output: Python 3.10.x
+```
+
+### 2. PostgreSQL Container Setup (10 min)
+
+```bash
+# Create and configure the Incus container
+bash scripts/incus_setup.sh pg-prod password123
+
+# Output:
+# ✓ PostgreSQL is running!
+#
+# Connection details:
+#   Host: 10.100.50.123
+#   Port: 5432
+#   User: postgres
+#   Password: password123
+#
+# Update .env file with:
+#   POSTGRES_HOST=10.100.50.123
+#   POSTGRES_PASSWORD=password123
+```
+
+### 3. Configuration (5 min)
+
+```bash
+# Copy the template and edit it
+cp .env.example .env
+nano .env
+
+# Final configuration:
+# MYSQL_HOST=db.production.com
+# MYSQL_PORT=3306
+# MYSQL_USER=migration_user
+# MYSQL_PASSWORD=secure_password
+# MYSQL_DATABASE=production_db
+#
+# POSTGRES_HOST=10.100.50.123
+# POSTGRES_PORT=5432
+# POSTGRES_USER=postgres
+# POSTGRES_PASSWORD=password123
+# POSTGRES_DATABASE=production_migrated
+#
+# BATCH_SIZE=50000        # Large batches for speed
+# LOG_LEVEL=INFO
+```
+
+### 4. Verify Configuration (5 min)
+
+```bash
+python main.py info
+
+# Output:
+# [MySQL Configuration]
+#   Host: db.production.com:3306
+#   Database: production_db
+#   User: migration_user
+#
+# [PostgreSQL Configuration]
+#   Host: 10.100.50.123:5432
+#   Database: production_migrated
+#   User: postgres
+#
+# [Migration Settings]
+#   Batch Size: 50000
+#   Log Level: INFO
+#   Dry Run: False
+```
+
+### 5. PostgreSQL Schema Setup (5 min)
+
+```bash
+# Create the schema with partitions and indexes
+python main.py setup --create-schema
+
+# Output:
+# Connected to PostgreSQL: 10.100.50.123:5432/production_migrated
+# Creating PostgreSQL schema...
+# ✓ Schema creation complete
+# ✓ PostgreSQL schema created successfully
+```
+
+### 6. Verify MySQL Connection (2 min)
+
+```bash
+# Run a test query against MySQL
+python -c "
+from src.connectors.mysql_connector import MySQLConnector
+with MySQLConnector() as conn:
+    count = conn.get_row_count('RAWDATACOR')
+    print(f'RAWDATACOR: {count:,} rows')
+    count = conn.get_row_count('ELABDATADISP')
+    print(f'ELABDATADISP: {count:,} rows')
+"
+
+# Output:
+# RAWDATACOR: 50,234,567 rows
+# ELABDATADISP: 25,789,123 rows
+```
+
+### 7. Dry-Run Migration (Optional, 10 min)
+
+```bash
+# Test without modifying any data
+python main.py migrate full --dry-run
+
+# Output:
+# [DRY RUN] Would migrate all rows
+# ✓ Migration complete: 50234567 rows migrated to rawdatacor
+# ✓ RAWDATACOR: 50234567 rows migrated
+# [DRY RUN] Would migrate all rows
+# ✓ Migration complete: 25789123 rows migrated to elabdatadisp
+# ✓ ELABDATADISP: 25789123 rows migrated
+# ✓ Full migration complete: 76023690 total rows migrated
+```
+
+### 8. Full Migration (60 min for large datasets)
+
+```bash
+# Start the real migration
+python main.py migrate full
+
+# Output:
+# Migrating RAWDATACOR...
+# Migrating RAWDATACOR ██████████████░░░░░░░░░░░░░░░░░░░░░░░░ 35% 00:42:15
+# ✓ RAWDATACOR: 50234567 rows migrated
+#
+# Migrating ELABDATADISP...
+# Migrating ELABDATADISP ████████████████████████████████░░░░░░░░░░ 75% 00:15:30
+# ✓ ELABDATADISP: 25789123 rows migrated
+#
+# ✓ Full migration complete: 76023690 total rows migrated
+
+# ⏱ Timing for RAWDATACOR:
+#   - 50M rows
+#   - 50k batch size = ~1,000 transactions
+#   - ~3 sec per batch
+#   - Total: ~50 minutes
+```
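+
+The estimate above is plain arithmetic, so it is easy to redo for other batch sizes. The sketch below uses a hypothetical helper, `estimate_minutes`, which is not part of the tool; the ~3 s per batch figure is taken from the comment above and is an assumption, not a measurement.
+
+```python
+import math
+
+
+def estimate_minutes(total_rows: int, batch_size: int, sec_per_batch: float) -> float:
+    """Rough migration time: number of batches times seconds per batch."""
+    batches = math.ceil(total_rows / batch_size)
+    return batches * sec_per_batch / 60
+
+
+# RAWDATACOR figures from this workflow (3 s per batch is an assumption).
+print(round(estimate_minutes(50_234_567, 50_000, 3.0)))  # 50
+```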
+
+### 9. Data Validation (15 min)
+
+```bash
+# Connect to PostgreSQL and validate
+psql -h 10.100.50.123 -U postgres -d production_migrated
+
+# SQL validation queries
+production_migrated=# SELECT COUNT(*) FROM rawdatacor;
+# count: 50234567
+
+production_migrated=# SELECT COUNT(*) FROM elabdatadisp;
+# count: 25789123
+
+production_migrated=# SELECT event_date, COUNT(*) FROM rawdatacor GROUP BY event_date ORDER BY event_date LIMIT 10;
+# Check the date distribution
+
+production_migrated=# SELECT measurements FROM rawdatacor WHERE measurements IS NOT NULL LIMIT 1 \gx
+# Check the JSONB structure
+
+# Run the validation script (from the psql prompt)
+production_migrated=# \i scripts/validate_migration.sql
+
+# Check:
+# - Row counts match
+# - No NULL measurements
+# - Date ranges correct
+# - Indexes created
+# - Partitions exist
+```
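+
+The row-count check can also be scripted once the counts from both databases are in hand. This is a minimal sketch: `find_mismatches` is a hypothetical helper, not part of the tool, and it assumes the PostgreSQL tables are the lowercased MySQL names, as in the queries above.
+
+```python
+def find_mismatches(mysql_counts, pg_counts):
+    """Return {table: (mysql_rows, pg_rows)} for tables whose counts differ."""
+    pg = {name.lower(): rows for name, rows in pg_counts.items()}
+    mismatches = {}
+    for name, rows in mysql_counts.items():
+        pg_rows = pg.get(name.lower())  # None if the table is missing
+        if pg_rows != rows:
+            mismatches[name] = (rows, pg_rows)
+    return mismatches
+
+
+mysql_counts = {"RAWDATACOR": 50_234_567, "ELABDATADISP": 25_789_123}
+pg_counts = {"rawdatacor": 50_234_567, "elabdatadisp": 25_789_123}
+print(find_mismatches(mysql_counts, pg_counts))  # {}
+```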
+
+### 10. Performance Benchmark (30 min)
+
+```bash
+# Run the benchmark with 10 iterations
+python main.py benchmark --iterations 10 --output production_benchmark.json
+
+# Output example:
+# Running performance benchmarks...
+#
+# [RAWDATACOR]
+#   select_by_pk:
+#     MySQL: 0.45ms (min: 0.35ms, max: 0.65ms)
+#     PostgreSQL: 0.32ms (min: 0.28ms, max: 0.38ms)
+#     ✓ PostgreSQL is 1.4x faster
+#
+#   select_by_date_range:
+#     MySQL: 125.50ms (min: 120.00ms, max: 135.00ms)
+#     PostgreSQL: 45.20ms (min: 42.00ms, max: 50.00ms)
+#     ✓ PostgreSQL is 2.8x faster
+#
+#   jsonb_filter_value:
+#     PostgreSQL: 32.10ms (min: 28.00ms, max: 38.00ms)
+#     (MySQL does not support JSONB)
+#
+# [ELABDATADISP]
+#   ...
+#
+# ✓ Benchmark complete: results saved to production_benchmark.json
+```
+
+### 11. Analyze Benchmark Results
+
+```bash
+# View the JSON results
+cat production_benchmark.json | python -m json.tool
+
+# Expected results:
+# - Simple SELECTs: PostgreSQL 2-3x faster
+# - Range queries: PostgreSQL 2-4x faster
+# - JSONB queries: PostgreSQL only (not available in MySQL)
+# - Aggregations: PostgreSQL similar or better
+
+# Interpretation:
+# - RAWDATACOR with JSONB: 30-50% benefit (fewer columns in storage)
+# - ELABDATADISP with JSONB: 20-30% benefit (NULL compression)
+```
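+
+The "Nx faster" figures in the benchmark output are the ratio of the two mean timings. The sketch below recomputes them from the sample means shown in step 10; the `speedup` helper is hypothetical, and the layout of the JSON file is not assumed here, so the numbers are entered by hand.
+
+```python
+def speedup(mysql_ms: float, postgres_ms: float) -> float:
+    """How many times faster PostgreSQL is, rounded to one decimal."""
+    return round(mysql_ms / postgres_ms, 1)
+
+
+# Mean timings (MySQL, PostgreSQL) copied from the sample output in step 10.
+samples = {
+    "select_by_pk": (0.45, 0.32),
+    "select_by_date_range": (125.50, 45.20),
+}
+for name, (mysql_ms, postgres_ms) in samples.items():
+    print(f"{name}: PostgreSQL is {speedup(mysql_ms, postgres_ms)}x faster")
+# select_by_pk: PostgreSQL is 1.4x faster
+# select_by_date_range: PostgreSQL is 2.8x faster
+```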
+
+### 12. Incremental Migration Setup (5 min)
+
+```bash
+# Configure cron for periodic synchronization
+bash scripts/setup_cron.sh
+
+# If you answer yes when asked whether to add the cron job,
+# the script adds:
+# 0 */6 * * * cd /path/to/mysql2postgres && python main.py migrate incremental >> migration_*.log 2>&1
+#
+# Result: the job runs every 6 hours
+
+# Verify
+crontab -l | grep migrate
+# Output: 0 */6 * * * cd /home/user/projects/mysql2postgres && ...
+```
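+
+To see which times of day `0 */6 * * *` actually selects, the step expression can be expanded by hand. This is an illustrative sketch only: `step_hours` is a hypothetical helper that handles just the `*/N` form, not the full cron syntax.
+
+```python
+def step_hours(hour_field: str) -> list[int]:
+    """Expand a cron hour field of the form '*/N' into the matching hours."""
+    if not hour_field.startswith("*/"):
+        raise ValueError("only '*/N' hour fields are handled in this sketch")
+    step = int(hour_field[2:])
+    return list(range(0, 24, step))
+
+
+schedule = "0 */6 * * *"
+minute, hour = schedule.split()[:2]
+print([f"{h:02d}:{int(minute):02d}" for h in step_hours(hour)])
+# ['00:00', '06:00', '12:00', '18:00']
+```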
+
+### 13. First Incremental Migration Test (5 min)
+
+```bash
+# Simulate changes in MySQL
+# (add a few new records)
+
+# Then run the incremental migration
+python main.py migrate incremental
+
+# Output:
+# Incremental migration for RAWDATACOR...
+# ℹ RAWDATACOR: 0 rows migrated
+#
+# Incremental migration for ELABDATADISP...
+# ℹ ELABDATADISP: 0 rows migrated
+#
+# ℹ No rows to migrate
+
+# Or, if there is new data:
+# Incremental migration for RAWDATACOR...
+# ✓ RAWDATACOR: 1234 rows migrated
+#
+# ✓ Incremental migration complete: 1234 total rows migrated
+```
+
+## Expected Results
+
+### Total Time
+- Setup: 30 minutes
+- Migration: 60-90 minutes (depending on data size)
+- Testing: 30 minutes
+- **Total: 2-3 hours**
+
+### Performance Gains
+- Storage: 10-20% reduction (JSONB compression)
+- Query time: 2-4x faster on PostgreSQL
+- Indexes: more efficient with JSONB GIN
+
+### Validation
+- Row counts: 100% match
+- Date ranges: complete
+- JSONB structure: valid
+- Indexes: all created
+- Partitions: working
+
+## Troubleshooting During the Workflow
+
+### Error: "Cannot connect to MySQL"
+```bash
+# Check the credentials
+mysql -h db.production.com -u migration_user -p -e "SELECT 1" < /dev/null
+
+# Check the firewall
+nc -zv db.production.com 3306
+
+# Check .env
+grep MYSQL .env
+```
+
+### Error: "Schema creation failed"
+```bash
+# Check that PostgreSQL is online
+psql -h 10.100.50.123 -U postgres -c "SELECT version()"
+
+# Recreate schema
+python main.py setup --create-schema
+```
+
+### Very slow migration
+```bash
+# Temporarily increase the batch size
+# Edit .env: BATCH_SIZE=100000
+
+# Or check:
+# - Network latency between MySQL and PostgreSQL
+# - CPU/memory on both servers
+# - Available disk I/O
+```
+
+## Monitoring During Migration
+
+```bash
+# In another terminal, monitor progress
+watch "ps aux | grep -i 'python main.py'"
+
+# Or monitor the databases
+# PostgreSQL
+psql -h 10.100.50.123 -U postgres -d production_migrated \
+  -c "SELECT COUNT(*) FROM rawdatacor"
+
+# MySQL
+mysql -h db.production.com -u migration_user -p production_db \
+  -e "SELECT COUNT(*) FROM RAWDATACOR"
+```
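+
+The two `COUNT(*)` results above can be turned into a rough progress figure for the table being migrated. A minimal sketch, assuming the MySQL count stays roughly constant during the run; `progress_percent` is a hypothetical helper, not part of the tool.
+
+```python
+def progress_percent(migrated_rows: int, source_rows: int) -> float:
+    """Share of source rows already present in PostgreSQL, capped at 100."""
+    if source_rows <= 0:
+        raise ValueError("source_rows must be positive")
+    return min(100.0, 100.0 * migrated_rows / source_rows)
+
+
+# Example with round numbers: 12.5M of 50M rows copied so far.
+print(f"{progress_percent(12_500_000, 50_000_000):.1f}%")  # 25.0%
+```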
+
+## Post-Migration Checklist
+
+- [ ] Row counts match between MySQL and PostgreSQL
+- [ ] Date ranges are complete
+- [ ] JSONB structure is valid
+- [ ] Indexes have been created
+- [ ] Critical queries work on PostgreSQL
+- [ ] Benchmark shows an improvement
+- [ ] Cron job for incremental migration is configured
+- [ ] Backups of the migration data are saved
+- [ ] Migration logs are archived
+
+## Documents to Archive
+
+```bash
+# Create the backup directory (a variable avoids a glob that could match older backups)
+BACKUP_DIR="backups/migration_$(date +%Y%m%d)"
+mkdir -p "$BACKUP_DIR"
+
+# Save:
+# - benchmark results
+# - migration logs
+# - .env file (with passwords removed)
+# - validation output
+# - timing reports
+
+cp production_benchmark.json "$BACKUP_DIR"/
+cp *.log "$BACKUP_DIR"/ 2>/dev/null || true
+```
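+
+Archiving `.env` with the passwords removed can be done with a small filter rather than by hand. The sketch below is one possible approach, not part of the tool: `redact_env` is a hypothetical helper, and it assumes secrets live in variables whose names contain `PASSWORD`, as in the configuration shown in step 3.
+
+```python
+import re
+
+# Match "NAME=value" lines whose variable name contains PASSWORD.
+SECRET_LINE = re.compile(r"^([ \t]*[A-Z0-9_]*PASSWORD[A-Z0-9_]*[ \t]*=).*$", re.MULTILINE)
+
+
+def redact_env(text: str) -> str:
+    """Replace the value of every *PASSWORD* variable with a placeholder."""
+    return SECRET_LINE.sub(r"\1<redacted>", text)
+
+
+sample = "MYSQL_USER=migration_user\nMYSQL_PASSWORD=secure_password\n"
+print(redact_env(sample), end="")
+# MYSQL_USER=migration_user
+# MYSQL_PASSWORD=<redacted>
+```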
+
+## Success!
+
+When everything is complete:
+
+✓ Data fully migrated
+✓ Performance validated (PostgreSQL faster)
+✓ JSONB schema working
+✓ Incremental migrations configured
+✓ Ready for production switchover