alex fe2d173b0f Optimize consolidation fetching with GROUP BY and reduced limit

Changed consolidation_group_limit from 100k to 10k for faster queries.

Reverted to GROUP BY approach for getting consolidation keys:
- Uses MySQL index efficiently: (UnitName, ToolNameID, NodeNum, EventDate, EventTime)
- GROUP BY with NodeNum ensures we don't lose any combinations
- Faster GROUP BY queries than large ORDER BY queries
- Smaller LIMIT = faster pagination

This matches the original optimization suggestion and should be faster.

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2025-12-25 22:22:30 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| scripts | chore: Add validation queries for default timestamp records | 2025-12-23 20:47:01 +01:00 |
| src | Optimize consolidation fetching with GROUP BY and reduced limit | 2025-12-25 22:22:30 +01:00 |
| tests | fix: Use actual PostgreSQL row count for total_rows_migrated tracking | 2025-12-23 15:33:27 +01:00 |
| .env.example | Add detailed partition progress logging | 2025-12-25 22:10:43 +01:00 |
| .gitignore | feat: Add MySQL to PostgreSQL migration tool with JSONB transformation | 2025-12-10 19:57:11 +01:00 |
| .python-version | feat: Add MySQL to PostgreSQL migration tool with JSONB transformation | 2025-12-10 19:57:11 +01:00 |
| config.py | Optimize consolidation fetching with GROUP BY and reduced limit | 2025-12-25 22:22:30 +01:00 |
| EXAMPLE_WORKFLOW.md | docs: Add detailed example workflow | 2025-12-10 19:59:22 +01:00 |
| IMPLEMENTATION_SUMMARY.md | docs: Add implementation summary | 2025-12-10 19:58:49 +01:00 |
| install.sh | docs: Add comprehensive documentation and helper scripts | 2025-12-10 19:58:20 +01:00 |
| main.py | fix: Use actual PostgreSQL row count for total_rows_migrated tracking | 2025-12-23 15:33:27 +01:00 |
| MIGRATION_WORKFLOW.md | fix: Add timeout settings and retry logic to MySQL connector | 2025-12-21 09:53:34 +01:00 |
| pyproject.toml | feat: Add MySQL to PostgreSQL migration tool with JSONB transformation | 2025-12-10 19:57:11 +01:00 |
| QUICKSTART.md | docs: Add comprehensive documentation and helper scripts | 2025-12-10 19:58:20 +01:00 |
| README.md | feat: Add MySQL to PostgreSQL migration tool with JSONB transformation | 2025-12-10 19:57:11 +01:00 |
| START_HERE.md | docs: Add quick navigation guide (START_HERE.md) | 2025-12-10 20:00:50 +01:00 |
| uv.lock | fix: Remove unsupported constraints from partitioned tables | 2025-12-10 20:18:20 +01:00 |

README.md

MySQL to PostgreSQL Migration Tool

A robust tool for migrating MySQL databases to PostgreSQL. It consolidates multiple columns into JSONB, supports native PostgreSQL partitioning, and includes a benchmark suite for comparing performance between the two databases.

Features

Full migration: transfers all data from MySQL to PostgreSQL
Incremental migration: periodic, timestamp-based synchronization
JSONB transformation: automatically consolidates multiple columns into JSONB fields
Partitioning: yearly partitions (2014-2031)
Optimized indexes: GIN indexes for efficient JSONB queries
Progress tracking: real-time progress bar with ETA
Benchmark: a suite for comparing MySQL and PostgreSQL performance
Logging: structured logging with Rich for colored output
Dry-run mode: test mode that does not modify data

Setup

1. Requirements

Python 3.10+
MySQL 5.7+
PostgreSQL 13+
pip

2. Installation

# Clone the repository, then change into it
cd mysql2postgres

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

3. Configuration

Copy .env.example to .env and fill in the connection details:

cp .env.example .env

Edit .env with your own values:

# MySQL Source Database
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=your_database

# PostgreSQL Target Database (Incus container)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=migrated_db

# Migration Settings
BATCH_SIZE=10000
LOG_LEVEL=INFO
DRY_RUN=false

# Benchmark Settings
BENCHMARK_ITERATIONS=5
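
The settings above are read from the environment at startup. The project does this with Pydantic in config.py; the sketch below is only a dependency-free stand-in that shows how the migration and benchmark values could be parsed. `MigrationSettings` and `load_settings` are hypothetical names, and the accepted spellings of "true" are an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationSettings:
    """Hypothetical container for the migration and benchmark settings."""
    batch_size: int
    log_level: str
    dry_run: bool
    benchmark_iterations: int


def load_settings(env=None) -> MigrationSettings:
    """Parse settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return MigrationSettings(
        batch_size=int(env.get("BATCH_SIZE", "10000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Assumption: only these spellings count as "true".
        dry_run=env.get("DRY_RUN", "false").strip().lower() in {"1", "true", "yes"},
        benchmark_iterations=int(env.get("BENCHMARK_ITERATIONS", "5")),
    )
```

Passing a plain dict instead of os.environ makes the parsing easy to check without touching the real environment.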

Usage

Available Commands

Configuration Info

python main.py info

Shows the current MySQL and PostgreSQL configuration.

Setup Database

python main.py setup --create-schema

Creates the PostgreSQL schema with:

The rawdatacor and elabdatadisp tables, partitioned by year
Optimized indexes for JSONB
The migration_state tracking table

Full Migration

# Migrate all tables
python main.py migrate full

# Migrate a specific table
python main.py migrate full --table RAWDATACOR

# Dry-run mode (does not modify data)
python main.py migrate full --dry-run

Incremental Migration

# Migrate only the changes since the last sync
python main.py migrate incremental

# For a specific table
python main.py migrate incremental --table ELABDATADISP

# Use a custom state file
python main.py migrate incremental --state-file custom_state.json

Benchmark Performance

# Run the benchmark with the iteration count from the config (default: 5)
python main.py benchmark

# Run the benchmark with a specific number of iterations
python main.py benchmark --iterations 10

# Save the results to a specific file
python main.py benchmark --output my_results.json

Data Transformation

RAWDATACOR

From MySQL:

Val0, Val1, ..., ValF (16 columns)
Val0_unitmisure, Val1_unitmisure, ..., ValF_unitmisure (16 columns)

To PostgreSQL (JSONB measurements):

{
  "0": {"value": "123.45", "unit": "°C"},
  "1": {"value": "67.89", "unit": "bar"},
  ...
  "F": {"value": "11.22", "unit": "m/s"}
}
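
As a rough illustration of the RAWDATACOR mapping above, the sketch below folds the 32 source columns of one row into a single dictionary and skips slots whose value is NULL. `build_measurements` is a hypothetical helper, not the tool's actual transformer; storing values as strings follows the example above, and omitting the unit when it is NULL is an assumption.

```python
import json

# Slot suffixes Val0 .. ValF (hexadecimal digits), as listed above.
SLOTS = "0123456789ABCDEF"


def build_measurements(row: dict) -> dict:
    """Fold ValX / ValX_unitmisure columns into one dict, skipping NULL values."""
    measurements = {}
    for slot in SLOTS:
        value = row.get(f"Val{slot}")
        if value is None:  # NULL in MySQL: leave the slot out entirely
            continue
        entry = {"value": str(value)}
        unit = row.get(f"Val{slot}_unitmisure")
        if unit is not None:  # assumption: a NULL unit is simply omitted
            entry["unit"] = unit
        measurements[slot] = entry
    return measurements


if __name__ == "__main__":
    row = {"Val0": "123.45", "Val0_unitmisure": "°C", "Val1": None}
    # The serialized string is what would be written to the JSONB column.
    print(json.dumps(build_measurements(row), ensure_ascii=False))
```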

ELABDATADISP

From MySQL: 25+ measurement and calculation columns

To PostgreSQL (JSONB measurements):

{
  "shifts": {
    "x": 1.234567, "y": 2.345678, "z": 3.456789,
    "h": 4.567890, "h_dir": 5.678901, "h_local": 6.789012
  },
  "coordinates": {
    "x": 10.123456, "y": 20.234567, "z": 30.345678,
    "x_star": 40.456789, "z_star": 50.567890
  },
  "kinematics": {
    "speed": 1.111111, "speed_local": 2.222222,
    "acceleration": 3.333333, "acceleration_local": 4.444444
  },
  "sensors": {
    "t_node": 25.5, "load_value": 100.5, "water_level": 50.5, "pressure": 1.013
  },
  "calculated": {
    "alfa_x": 0.123456, "alfa_y": 0.234567, "area": 100.5
  }
}

Querying JSONB

Example PostgreSQL queries

-- Filter on a specific value in RAWDATACOR
SELECT * FROM rawdatacor
WHERE measurements->'0'->>'value' IS NOT NULL;

-- Range query on ELABDATADISP
SELECT * FROM elabdatadisp
WHERE (measurements->'kinematics'->>'speed')::NUMERIC > 10.0;

-- Aggregation on JSONB
SELECT unit_name, AVG((measurements->'kinematics'->>'speed')::NUMERIC) as avg_speed
FROM elabdatadisp
GROUP BY unit_name;

-- Containment check
SELECT * FROM elabdatadisp
WHERE measurements @> '{"kinematics":{}}';

-- GIN index scan (fast)
SELECT * FROM rawdatacor
WHERE measurements ? '0'
LIMIT 1000;
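
To make the `@>` and `?` operators used above more concrete, here is a simplified Python model of what they match. It is a sketch for intuition only: it covers objects and scalars, ignores arrays and JSON type subtleties, and `jsonb_contains` and `jsonb_has_key` are hypothetical names rather than anything PostgreSQL or the tool provides.

```python
def jsonb_contains(doc, pattern) -> bool:
    """Simplified model of `doc @> pattern` for objects and scalars only."""
    if isinstance(pattern, dict):
        if not isinstance(doc, dict):
            return False
        # Every key of the pattern must exist in doc with a contained value.
        return all(
            key in doc and jsonb_contains(doc[key], sub)
            for key, sub in pattern.items()
        )
    return doc == pattern


def jsonb_has_key(doc, key: str) -> bool:
    """Simplified model of `doc ? key`: a top-level key test on an object."""
    return isinstance(doc, dict) and key in doc
```

Under this model, the pattern `{"kinematics": {}}` matches any row whose measurements hold a `kinematics` object, whatever that object contains.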

Partitioning

Both tables are partitioned by year (RANGE partitioning on EXTRACT(YEAR FROM event_date)):

-- Partitions created automatically for:
-- rawdatacor_2014, rawdatacor_2015, ..., rawdatacor_2031
-- elabdatadisp_2014, elabdatadisp_2015, ..., elabdatadisp_2031

-- Partitioned query (partitions pruned automatically)
SELECT * FROM rawdatacor
WHERE event_date >= '2024-01-01' AND event_date < '2025-01-01';
-- PostgreSQL scans only rawdatacor_2024
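
The yearly partitions listed above could be generated with a small loop. The sketch below assumes the parent table is already declared with a RANGE partition key that evaluates to the year, so each partition covers one year with an exclusive upper bound. `partition_ddl` is a hypothetical helper and the exact DDL the tool emits may differ.

```python
def partition_ddl(table: str, first_year: int = 2014, last_year: int = 2031) -> list[str]:
    """Return one CREATE TABLE ... PARTITION OF statement per year."""
    return [
        # The upper bound of a RANGE partition is exclusive, hence year + 1.
        f"CREATE TABLE IF NOT EXISTS {table}_{year} PARTITION OF {table} "
        f"FOR VALUES FROM ({year}) TO ({year + 1});"
        for year in range(first_year, last_year + 1)
    ]


if __name__ == "__main__":
    for statement in partition_ddl("rawdatacor"):
        print(statement)
```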

Indexes

RAWDATACOR

idx_unit_tool_node_datetime  -- (unit_name, tool_name_id, node_num, event_date, event_time)
idx_unit_tool                -- (unit_name, tool_name_id)
idx_measurements_gin         -- GIN index on the measurements JSONB column
idx_event_date               -- (event_date)

ELABDATADISP

idx_unit_tool_node_datetime  -- (unit_name, tool_name_id, node_num, event_date, event_time)
idx_unit_tool                -- (unit_name, tool_name_id)
idx_measurements_gin         -- GIN index on the measurements JSONB column
idx_event_date               -- (event_date)

Benchmark

The benchmark compares MySQL and PostgreSQL performance on:

Simple SELECTs: by PK, date range, unit+tool
JSONB queries: field filters, range queries, containment checks
Aggregations: GROUP BY, AVG, COUNT
JOINs: between the two tables

Results are saved to: benchmark_results/benchmark_TIMESTAMP.json

Result format:

{
  "timestamp": "2024-01-15T10:30:45.123456",
  "iterations": 5,
  "tables": {
    "RAWDATACOR": {
      "select_by_pk": {
        "mysql": {
          "min": 0.5,
          "max": 0.8,
          "mean": 0.65,
          "median": 0.65,
          "p95": 0.8
        },
        "postgres": {
          "min": 0.3,
          "max": 0.6,
          "mean": 0.45,
          "p95": 0.6
        }
      }
    }
  }
}
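
The per-query statistics in the result format above can be derived from the list of timings collected over the iterations. The following is a minimal sketch, not the tool's performance_test.py: `summarize` is a hypothetical name, and computing p95 with the nearest-rank method is an assumption, since the source does not say how the percentile is defined.

```python
import math
import statistics


def summarize(timings: list[float]) -> dict[str, float]:
    """Summarize per-iteration timings into the fields shown above."""
    if not timings:
        raise ValueError("at least one timing is required")
    ordered = sorted(timings)
    # Assumption: p95 via the nearest-rank method.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }
```

With only five iterations, the nearest-rank p95 coincides with the maximum, which matches the shape of the sample values above.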

Project Structure

mysql2postgres/
  main.py                           # CLI entry point
  config.py                         # Pydantic configuration
  .env.example                      # Configuration template
  pyproject.toml                    # Dependencies
  README.md                         # This file
  src/
    connectors/
      mysql_connector.py            # MySQL connector
      postgres_connector.py         # PostgreSQL connector
    transformers/
      schema_transformer.py         # PostgreSQL schema creation
      data_transformer.py           # JSONB transformation
    migrator/
      full_migration.py             # Full migration
      incremental_migration.py      # Delta migration
      state.py                      # State tracking
    benchmark/
      query_generator.py            # Test query generator
      performance_test.py           # Benchmark runner
    utils/
      logger.py                     # Logging with Rich
      progress.py                   # Progress bar

Recommended Workflow

Initial setup
```
python main.py setup --create-schema
```
First migration (full)
```
python main.py migrate full
```
Periodic migrations (incremental)
```
python main.py migrate incremental
```
Performance benchmark
```
python main.py benchmark --iterations 10
```

Troubleshooting

MySQL connection error

Check the credentials in .env
Check that MySQL is reachable: mysql -h localhost -u root -p

PostgreSQL connection error

Check that the Incus container is running
Check the credentials: psql -h localhost -U postgres

Timeout during migration

Increase BATCH_SIZE in .env (default: 10000)
Check network performance between MySQL and PostgreSQL

JSONB with NULL values

The tool automatically excludes NULL values from JSONB (only non-NULL values are added)

Performance Tips

Migration
- Increase BATCH_SIZE to reduce the number of transactions (e.g. 50000)
- Disable indexes during migration where possible (not implemented)
Queries on JSONB
- Use ->> to extract text and -> to extract JSON
- GIN indexes speed up ? and @> queries
- Cast to NUMERIC/INT when an operation needs it
Partitioning
- PostgreSQL uses partition pruning to skip partitions
- Date-range queries are optimized automatically
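
To see why a larger BATCH_SIZE means fewer transactions, consider how a row stream is cut into batches. The sketch below assumes one insert (and one transaction) per batch, which is an assumption about the tool rather than something the source states; `batched` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    # islice pulls the next batch_size rows; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Under that assumption, migrating 1,000,000 rows takes 100 batches at BATCH_SIZE=10000 and 20 batches at BATCH_SIZE=50000, at the cost of more memory held per batch.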

Support

For bugs or suggestions, open an issue in the repository.

License

MIT
