
# MySQL to PostgreSQL Migration Tool

A robust tool for migrating MySQL databases to PostgreSQL. It consolidates multiple columns into JSONB fields, supports native PostgreSQL partitioning, and includes a complete benchmark suite for comparing performance between the two databases.

## Features

- **Full Migration**: Transfers all data from MySQL to PostgreSQL
- **Incremental Migration**: Periodic synchronization based on timestamps
- **JSONB Transformation**: Automatically consolidates multiple columns into JSONB fields
- **Partitioning**: Yearly partitions (2014-2031)
- **Optimized Indexes**: GIN indexes for efficient JSONB queries
- **Progress Tracking**: Real-time progress bar with ETA
- **Benchmark**: Complete suite for comparing MySQL and PostgreSQL performance
- **Logging**: Structured logging with Rich for colored output
- **Dry-Run Mode**: Test mode that does not modify data

## Setup

### 1. Requirements
- Python 3.10+
- MySQL 5.7+
- PostgreSQL 13+
- pip

### 2. Installation

```bash
# Clone the repository, then enter it
cd mysql2postgres

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

# Install dependencies
pip install -e .
```

### 3. Configuration

Copy `.env.example` to `.env`:

```bash
cp .env.example .env
```

Then edit `.env` with your connection details:

```env
# MySQL Source Database
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=your_database

# PostgreSQL Target Database (Incus container)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=migrated_db

# Migration Settings
BATCH_SIZE=10000
LOG_LEVEL=INFO
DRY_RUN=false

# Benchmark Settings
BENCHMARK_ITERATIONS=5
```
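
The project's `config.py` reads these variables through Pydantic. As a rough illustration of the parsing involved, here is a stdlib-only sketch. The `MigrationSettings` class and `load_settings` function are hypothetical names, not the tool's actual code, and the fallback values are taken from the example `.env` above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MigrationSettings:
    """Hypothetical container for the non-connection variables in .env."""
    batch_size: int
    log_level: str
    dry_run: bool
    benchmark_iterations: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> MigrationSettings:
    """Parse settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return MigrationSettings(
        batch_size=int(env.get("BATCH_SIZE", "10000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Accept common truthy spellings; anything else counts as False.
        dry_run=env.get("DRY_RUN", "false").strip().lower() in {"1", "true", "yes"},
        benchmark_iterations=int(env.get("BENCHMARK_ITERATIONS", "5")),
    )
```

Passing an explicit mapping, for example `load_settings({"BATCH_SIZE": "500"})`, makes the parsing easy to check without touching the real environment.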

## Usage

### Available Commands

#### Info Configuration
```bash
python main.py info
```
Shows the current MySQL and PostgreSQL configuration.

#### Setup Database
```bash
python main.py setup --create-schema
```
Creates the PostgreSQL schema with:
- Tables `rawdatacor` and `elabdatadisp`, partitioned by year
- Indexes optimized for JSONB
- The `migration_state` tracking table

#### Full Migration
```bash
# Migrate all tables
python main.py migrate full

# Migrate a specific table
python main.py migrate full --table RAWDATACOR

# Dry-run mode (does not modify data)
python main.py migrate full --dry-run
```

#### Incremental Migration
```bash
# Migrate only the changes since the last sync
python main.py migrate incremental

# For a specific table
python main.py migrate incremental --table ELABDATADISP

# Use a custom state file
python main.py migrate incremental --state-file custom_state.json
```
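
Incremental migration is timestamp-based and can read its position from a state file. The sketch below shows one way such a file could drive the next query. It is an assumption-laden illustration: the JSON layout, the function names, and the `updated_at` column are all hypothetical and may differ from what `state.py` actually does.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def load_last_sync(state_file: Path, table: str) -> Optional[datetime]:
    """Return the last sync time recorded for `table`, or None if never synced."""
    if not state_file.exists():
        return None
    stamp = json.loads(state_file.read_text()).get(table)
    return datetime.fromisoformat(stamp) if stamp else None


def save_last_sync(state_file: Path, table: str, when: datetime) -> None:
    """Record `when` for `table`, keeping entries for other tables."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state[table] = when.isoformat()
    state_file.write_text(json.dumps(state, indent=2))


def incremental_query(table: str, last_sync: Optional[datetime]) -> tuple[str, tuple]:
    """Build a parameterized SELECT for rows changed since `last_sync`.

    `table` must come from a fixed list of known tables, since it is
    interpolated into the SQL. `updated_at` is an assumed column name.
    """
    if last_sync is None:
        return f"SELECT * FROM {table}", ()
    return f"SELECT * FROM {table} WHERE updated_at > %s", (last_sync,)
```

With no recorded sync time the query falls back to selecting every row, which matches the behavior of a first full pass.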

#### Benchmark Performance
```bash
# Run the benchmark with the iteration count from the config (default: 5)
python main.py benchmark

# Run the benchmark with a specific number of iterations
python main.py benchmark --iterations 10

# Save the results to a specific file
python main.py benchmark --output my_results.json
```

## Data Transformation

### RAWDATACOR

**From MySQL:**
```
Val0, Val1, ..., ValF (16 columns)
Val0_unitmisure, Val1_unitmisure, ..., ValF_unitmisure (16 columns)
```

**To PostgreSQL (JSONB measurements):**
```json
{
  "0": {"value": "123.45", "unit": "°C"},
  "1": {"value": "67.89", "unit": "bar"},
  ...
  "F": {"value": "11.22", "unit": "m/s"}
}
```
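
The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `data_transformer.py`: the function name `build_measurements` is hypothetical, and omitting the `unit` key when the unit column is NULL is an assumption.

```python
from typing import Any

# Channel suffixes Val0..ValF correspond to the 16 hexadecimal digits.
CHANNELS = "0123456789ABCDEF"


def build_measurements(row: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Fold ValX / ValX_unitmisure columns into one JSONB-ready dict."""
    measurements: dict[str, dict[str, Any]] = {}
    for ch in CHANNELS:
        value = row.get(f"Val{ch}")
        if value is None:
            continue  # NULL values are left out of the JSONB document
        entry: dict[str, Any] = {"value": str(value)}
        unit = row.get(f"Val{ch}_unitmisure")
        if unit is not None:  # assumption: a NULL unit is simply omitted
            entry["unit"] = unit
        measurements[ch] = entry
    return measurements
```

Values are stored as strings to match the example document, so numeric comparisons on the PostgreSQL side need an explicit cast.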

### ELABDATADISP

**From MySQL:** 25+ measurement and calculation columns

**To PostgreSQL (JSONB measurements):**
```json
{
  "shifts": {
    "x": 1.234567, "y": 2.345678, "z": 3.456789,
    "h": 4.567890, "h_dir": 5.678901, "h_local": 6.789012
  },
  "coordinates": {
    "x": 10.123456, "y": 20.234567, "z": 30.345678,
    "x_star": 40.456789, "z_star": 50.567890
  },
  "kinematics": {
    "speed": 1.111111, "speed_local": 2.222222,
    "acceleration": 3.333333, "acceleration_local": 4.444444
  },
  "sensors": {
    "t_node": 25.5, "load_value": 100.5, "water_level": 50.5, "pressure": 1.013
  },
  "calculated": {
    "alfa_x": 0.123456, "alfa_y": 0.234567, "area": 100.5
  }
}
```
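
A grouping like this can be driven by a lookup table from source column to `(group, key)`. The sketch below shows the idea under stated assumptions: the MySQL column names in `COLUMN_MAP` are placeholders (the real names are not listed in this README), only a few fields are included, and the conversion to `float` is a guess at how numeric values are stored.

```python
from typing import Any

# Hypothetical MySQL column -> (JSONB group, key) mapping. Placeholder names only.
COLUMN_MAP: dict[str, tuple[str, str]] = {
    "XShift": ("shifts", "x"),
    "YShift": ("shifts", "y"),
    "ZShift": ("shifts", "z"),
    "speed": ("kinematics", "speed"),
    "acceleration": ("kinematics", "acceleration"),
    "T_node": ("sensors", "t_node"),
    "pressure": ("sensors", "pressure"),
}


def build_grouped_measurements(row: dict[str, Any]) -> dict[str, dict[str, float]]:
    """Group non-NULL columns into nested dicts ready for a JSONB column."""
    grouped: dict[str, dict[str, float]] = {}
    for column, (group, key) in COLUMN_MAP.items():
        value = row.get(column)
        if value is None:
            continue  # NULL values are skipped, so empty groups never appear
        grouped.setdefault(group, {})[key] = float(value)
    return grouped
```

Because groups are created lazily, a row whose sensor columns are all NULL produces a document with no `sensors` key at all.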

## Querying JSONB

### Example PostgreSQL queries

```sql
-- Filter on a specific value in RAWDATACOR (channel 0 has a value)
SELECT * FROM rawdatacor
WHERE measurements->'0'->>'value' IS NOT NULL;

-- Range query on ELABDATADISP
SELECT * FROM elabdatadisp
WHERE (measurements->'kinematics'->>'speed')::NUMERIC > 10.0;

-- Aggregation on JSONB
SELECT unit_name, AVG((measurements->'kinematics'->>'speed')::NUMERIC) as avg_speed
FROM elabdatadisp
GROUP BY unit_name;

-- Containment check
SELECT * FROM elabdatadisp
WHERE measurements @> '{"kinematics":{}}';

-- Key-existence check (can use the GIN index, fast)
SELECT * FROM rawdatacor
WHERE measurements ? '0'
LIMIT 1000;
```
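
When queries like these are issued from Python, the threshold is better passed as a parameter than formatted into the SQL. The helper below is a hypothetical sketch (the name `jsonb_range_query` is not part of the tool) that builds the range query shown above with a `%s` placeholder, the style used by drivers such as psycopg.

```python
import re

# Only plain lowercase identifiers are allowed, since they are interpolated.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def jsonb_range_query(table: str, group: str, field: str) -> str:
    """Return a parameterized 'greater than' query on measurements-><group>->><field>."""
    for name in (table, group, field):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT * FROM {table} "
        f"WHERE (measurements->'{group}'->>'{field}')::NUMERIC > %s"
    )
```

A call such as `cursor.execute(jsonb_range_query("elabdatadisp", "kinematics", "speed"), (10.0,))` would then run the range query with the threshold bound by the driver.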

## Partitioning

Both tables are partitioned by year (RANGE partitioning on `EXTRACT(YEAR FROM event_date)`):

```sql
-- Partitions are created automatically for:
-- rawdatacor_2014, rawdatacor_2015, ..., rawdatacor_2031
-- elabdatadisp_2014, elabdatadisp_2015, ..., elabdatadisp_2031

-- Query restricted to one year (partition pruning is automatic)
SELECT * FROM rawdatacor
WHERE event_date >= '2024-01-01' AND event_date < '2025-01-01';
-- PostgreSQL should scan only rawdatacor_2024; confirm with EXPLAIN
```
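
For reference, the yearly partitions listed above could be generated with a short loop. This is a sketch rather than the code in `schema_transformer.py`: the function name is hypothetical, and the integer bounds assume the partition key evaluates to the year number, as the `EXTRACT(YEAR FROM event_date)` key suggests.

```python
def yearly_partition_ddl(table: str, first_year: int = 2014, last_year: int = 2031) -> list[str]:
    """Return one CREATE TABLE ... PARTITION OF statement per year, inclusive."""
    return [
        # RANGE bounds are half-open: FROM is inclusive, TO is exclusive.
        f"CREATE TABLE IF NOT EXISTS {table}_{year} PARTITION OF {table} "
        f"FOR VALUES FROM ({year}) TO ({year + 1});"
        for year in range(first_year, last_year + 1)
    ]
```

Calling it once per parent table (`rawdatacor`, `elabdatadisp`) yields 18 statements each for the 2014-2031 range.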

## Indexes

### RAWDATACOR
```sql
idx_unit_tool_node_datetime  -- (unit_name, tool_name_id, node_num, event_date, event_time)
idx_unit_tool                -- (unit_name, tool_name_id)
idx_measurements_gin         -- GIN index on measurements JSONB
idx_event_date               -- (event_date)
```

### ELABDATADISP
```sql
idx_unit_tool_node_datetime  -- (unit_name, tool_name_id, node_num, event_date, event_time)
idx_unit_tool                -- (unit_name, tool_name_id)
idx_measurements_gin         -- GIN index on measurements JSONB
idx_event_date               -- (event_date)
```

## Benchmark

The benchmark compares MySQL and PostgreSQL performance on:

- **Simple SELECTs**: by PK, date range, unit+tool
- **JSONB queries**: field filters, range queries, containment checks
- **Aggregations**: GROUP BY, AVG, COUNT
- **JOINs**: between the two tables

**Results are saved to:** `benchmark_results/benchmark_TIMESTAMP.json`

Results format:
```json
{
  "timestamp": "2024-01-15T10:30:45.123456",
  "iterations": 5,
  "tables": {
    "RAWDATACOR": {
      "select_by_pk": {
        "mysql": {
          "min": 0.5,
          "max": 0.8,
          "mean": 0.65,
          "median": 0.65,
          "p95": 0.8
        },
        "postgres": {
          "min": 0.3,
          "max": 0.6,
          "mean": 0.45,
          "median": 0.45,
          "p95": 0.6
        }
      }
    }
  }
}
```
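
The per-query statistics in this format can be derived from a list of raw timings. The sketch below uses the nearest-rank method for `p95`. That choice, and the `summarize` name, are assumptions, since the README does not say how `performance_test.py` computes percentiles.

```python
import math
import statistics


def summarize(timings: list[float]) -> dict[str, float]:
    """Summary statistics for one query's timings (same unit as the input)."""
    if not timings:
        raise ValueError("timings must not be empty")
    ordered = sorted(timings)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }
```

With only a handful of iterations the nearest-rank `p95` coincides with the maximum, so a larger `--iterations` value is needed before the two figures diverge.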

## Project Structure

```
mysql2postgres/
├── main.py                           # CLI entry point
├── config.py                         # Pydantic configuration
├── .env.example                      # Configuration template
├── pyproject.toml                    # Dependencies
├── README.md                         # This file
└── src/
    ├── connectors/
    │   ├── mysql_connector.py        # MySQL connector
    │   └── postgres_connector.py     # PostgreSQL connector
    ├── transformers/
    │   ├── schema_transformer.py     # PostgreSQL schema creation
    │   └── data_transformer.py       # JSONB transformation
    ├── migrator/
    │   ├── full_migration.py         # Full migration
    │   ├── incremental_migration.py  # Delta migration
    │   └── state.py                  # State tracking
    ├── benchmark/
    │   ├── query_generator.py        # Test query generator
    │   └── performance_test.py       # Benchmark runner
    └── utils/
        ├── logger.py                 # Logging with Rich
        └── progress.py               # Progress bar
```

## Recommended Workflow

1. **Initial setup**
   ```bash
   python main.py setup --create-schema
   ```

2. **First migration (full)**
   ```bash
   python main.py migrate full
   ```

3. **Periodic migrations (incremental)**
   ```bash
   python main.py migrate incremental
   ```

4. **Performance benchmark**
   ```bash
   python main.py benchmark --iterations 10
   ```

## Troubleshooting

### MySQL connection error
- Check the credentials in `.env`
- Check that MySQL is reachable: `mysql -h localhost -u root -p`

### PostgreSQL connection error
- Check that the Incus container is running
- Check the credentials: `psql -h localhost -U postgres`

### Timeout during migration
- Reduce `BATCH_SIZE` in `.env` (default: 10000) so that each batch completes sooner
- Check network performance between MySQL and PostgreSQL

### JSONB with NULL values
- The tool automatically excludes NULL values from JSONB; only non-NULL values are added

## Performance Tips

1. **Migration**
   - Increase `BATCH_SIZE` to reduce the number of transactions (e.g. 50000)
   - Disable indexes during migration where possible (not implemented)

2. **Queries on JSONB**
   - Use `->>` for text and `->` for JSON
   - GIN indexes speed up `?` and `@>` queries
   - Cast to NUMERIC/INT when needed for numeric operations

3. **Partitioning**
   - PostgreSQL uses partition pruning to skip partitions that cannot match
   - Date-range queries are optimized automatically
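
To make the `BATCH_SIZE` trade-off from the migration tip concrete, here is a minimal sketch of how rows might be grouped before each insert. The `batched` helper is hypothetical and not taken from the tool. Larger batches mean fewer round trips and transactions, at the cost of more memory held per batch.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    # islice pulls the next chunk lazily; an empty chunk ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Because the input is consumed lazily, this works with a streaming database cursor as well as with an in-memory list.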

## Support

For bugs or suggestions, open an issue in the repository.

## License

MIT