Commit Graph

14 Commits

49dbd98bff fix: Add last_completed_partition column to migration_state table schema
The migration_state table was missing the last_completed_partition column
that was referenced in the migration update queries. This column tracks
which partition was last completed to enable accurate resume capability.

To apply this change to existing databases:
  ALTER TABLE migration_state ADD COLUMN last_completed_partition VARCHAR(255);

For new databases, the table will be created with the column automatically.
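
The guarded variant of this migration can be made idempotent. A minimal sketch, using sqlite3 as a stand-in for the real PostgreSQL target (the `ensure_column` helper name is hypothetical, not from the codebase):

```python
import sqlite3

def ensure_column(conn, table, column, ddl_type):
    """Add `column` to `table` only if it is not already present (idempotent)."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE migration_state (id INTEGER, total_rows_migrated INTEGER)")
ensure_column(conn, "migration_state", "last_completed_partition", "VARCHAR(255)")
ensure_column(conn, "migration_state", "last_completed_partition", "VARCHAR(255)")  # no-op on repeat
```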
2025-12-26 11:39:30 +01:00
648bd98a09 chore: Add debug logging to ELABDATADISP consolidation
Added logging to track which nodes are being consolidated and how many
measurement categories each node has. This helps debug cases where data
appears to be lost during consolidation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 19:27:34 +01:00
72035bb1b5 fix: Convert MySQL Decimal values to float for JSON serialization in ELABDATADISP
The MySQL driver returns DECIMAL columns as Python Decimal objects, which the json module cannot serialize.
PostgreSQL JSONB requires proper JSON types.

Added convert_value() helper in _build_measurement_for_elabdatadisp_node() to:
- Convert Decimal → float
- Convert str → float
- Pass through other types unchanged

This ensures all numeric values are JSON-serializable before insertion into
the measurements JSONB column.

Fixes: "Object of type Decimal is not JSON serializable" error
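
The helper described above might look like the following sketch (the try/except fallback for non-numeric strings is an assumption beyond the commit's description):

```python
import json
from decimal import Decimal

def convert_value(value):
    """Coerce MySQL numeric values to JSON-serializable floats (sketch of the
    convert_value() helper described in this commit; exact signature assumed)."""
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value  # non-numeric strings pass through unchanged (assumption)
    return value  # other types pass through unchanged

row = {"node": 3, "temp": Decimal("21.50"), "raw": "1.25"}
payload = {k: convert_value(v) for k, v in row.items()}
json.dumps(payload)  # no longer raises "Object of type Decimal is not JSON serializable"
```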

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 19:06:50 +01:00
5045c8bd86 fix: Add updated_at column back to ELABDATADISP table
The updated_at column was removed from the schema but should be kept for
consistency with the original table structure and to track when rows are
modified.

Changes:
- Added updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP to table schema
- Added updated_at to get_column_order() for elabdatadisp
- Added updated_at to transform_elabdatadisp_row() output

This maintains backward compatibility while still consolidating node_num,
state, and calc_err into the measurements JSONB.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 18:46:03 +01:00
42c0d9cdaf chore: Update column order for ELABDATADISP to exclude node/state/calc_err
Updated get_column_order() for elabdatadisp table to return only the
columns that are now stored separately:
- id_elab_data
- unit_name
- tool_name_id
- event_timestamp
- measurements (includes node_num, state, calc_err keyed by node)
- created_at

Removed: node_num, state, calc_err, updated_at (not used after consolidation)

This matches the schema defined in schema_transformer.py where these fields
are noted as being stored in the JSONB measurements column.
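
A minimal sketch of what get_column_order() could return for this table after the change; the actual implementation in the migration tool may differ:

```python
# Hypothetical sketch; column names are taken from the commit text above.
ELABDATADISP_COLUMNS = [
    "id_elab_data",
    "unit_name",
    "tool_name_id",
    "event_timestamp",
    "measurements",  # JSONB: node_num, state, calc_err keyed by node
    "created_at",
]

def get_column_order(table):
    """Return the insert column order for a consolidated table."""
    if table == "elabdatadisp":
        return ELABDATADISP_COLUMNS
    raise KeyError(table)
```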

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 18:42:19 +01:00
693228c0da feat: Implement node consolidation for ELABDATADISP table
Add consolidation logic to ELABDATADISP similar to RAWDATACOR:
- Group rows by (unit_name, tool_name_id, event_timestamp)
- Consolidate multiple nodes with same timestamp into single row
- Store node_num, state, calc_err in JSONB measurements keyed by node

Changes:
1. Add _build_measurement_for_elabdatadisp_node() helper
   - Builds measurement object with state, calc_err, and measurement categories
   - Filters out empty categories to save space

2. Update transform_elabdatadisp_row() signature
   - Accept optional measurements parameter for consolidated rows
   - Build from single row if measurements not provided
   - Remove node_num, state, calc_err from returned columns (now in JSONB)
   - Keep only: id_elab_data, unit_name, tool_name_id, event_timestamp, measurements, created_at

3. Add consolidate_elabdatadisp_batch() method
   - Group rows by consolidation key
   - Build consolidated measurements with node numbers as keys
   - Use MAX(idElabData) for checkpoint tracking (resume capability)
   - Use MIN(idElabData) as template for other fields

4. Update transform_batch() to support ELABDATADISP consolidation
   - Check consolidate flag for both tables
   - Call consolidate_elabdatadisp_batch() when needed

Result: ELABDATADISP now consolidates ~5-10:1 like RAWDATACOR,
with all node data (node_num, state, calc_err, measurements) keyed
by node number in JSONB.
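
The grouping and key-by-node logic above can be sketched as follows (the helper name and row fields come from the commit text; everything else is assumed, including using MAX(idElabData) as the consolidated row id):

```python
from collections import defaultdict

def consolidate_elabdatadisp_batch(rows):
    """Group rows sharing (unit_name, tool_name_id, event_timestamp) into one
    consolidated row, keying each node's data by node number (sketch)."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["unit_name"], row["tool_name_id"], row["event_timestamp"])
        groups[key].append(row)

    consolidated = []
    for members in groups.values():
        template = min(members, key=lambda r: r["idElabData"])  # MIN id as template
        measurements = {
            str(r["node_num"]): {"state": r["state"], "calc_err": r["calc_err"]}
            for r in members
        }
        consolidated.append({
            # MAX(idElabData) is the checkpoint for resume; the assumption here
            # is that it also serves as the consolidated row's id.
            "id_elab_data": max(r["idElabData"] for r in members),
            "unit_name": template["unit_name"],
            "tool_name_id": template["tool_name_id"],
            "event_timestamp": template["event_timestamp"],
            "measurements": measurements,
        })
    return consolidated

rows = [
    {"idElabData": 1, "unit_name": "U1", "tool_name_id": 7,
     "event_timestamp": "2025-01-01 00:00:00", "node_num": 1, "state": "OK", "calc_err": 0},
    {"idElabData": 2, "unit_name": "U1", "tool_name_id": 7,
     "event_timestamp": "2025-01-01 00:00:00", "node_num": 2, "state": "OK", "calc_err": 0},
]
consolidated = consolidate_elabdatadisp_batch(rows)  # two nodes -> one row
```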

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-25 18:41:54 +01:00
0461bb3b44 fix: Handle invalid MySQL dates (0000-00-00) gracefully
MySQL databases can contain invalid "zero" dates such as '0000-00-00', which
strptime cannot parse. These are now treated as NULL and converted to the
default timestamp (1970-01-01 00:00:00).

Changes to _convert_date():
- Check for '0000-00-00' and invalid date strings
- Wrap strptime in try/except to catch ValueError
- Return None for invalid dates instead of crashing
- Updated callers to check for None and use default timestamp

This allows the migration to continue even when encountering invalid
historical dates in the MySQL database.

Fixes: "time data '0000-00-00' does not match format '%Y-%m-%d'"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-23 19:06:38 +01:00
4f4ba6af51 fix: Import date type explicitly to fix isinstance checks
With `from datetime import datetime`, the name datetime refers to the datetime
class, not the module. `datetime.date` then resolves to the class's date()
method rather than the date type, so isinstance() checks against it raised a
TypeError.

Solution: Import date explicitly from the datetime module and use it in isinstance
checks. Order matters - check datetime before date, since datetime is a subclass
of date.

Fixes: "isinstance() arg 2 must be a type, a tuple of types, or a union"
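
Both the bug and the fix can be demonstrated in a few lines (describe() is a hypothetical stand-in for the real isinstance checks):

```python
from datetime import datetime, date

def describe(value):
    """Order matters: datetime is a subclass of date, so test it first."""
    if isinstance(value, datetime):
        return "datetime"
    if isinstance(value, date):
        return "date"
    return "other"

# Reproducing the original bug: after `from datetime import datetime`,
# datetime.date is the instance method, not the date class.
try:
    isinstance(date.today(), datetime.date)
    bug_reproduced = False
except TypeError:  # "isinstance() arg 2 must be a type, a tuple of types, or a union"
    bug_reproduced = True
```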

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-23 18:56:12 +01:00
eb315c90ff fix: Handle date conversion for string dates in data transformer
When resuming migration, EventDate may be a string (from PostgreSQL queries)
instead of a datetime.date object (from MySQL). The combine() function expects
a datetime.date object, so we now convert strings to dates before combining
with time.

Added _convert_date() helper similar to _convert_time() that handles:
- str: Parse from "YYYY-MM-DD" format
- datetime.date: Return as-is
- datetime.datetime: Extract date component

Fixes error: "combine() argument 1 must be datetime.date, not str"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-23 18:52:42 +01:00
0f217379ea fix: Use actual PostgreSQL row count for total_rows_migrated tracking
Replace session-level counting with direct table COUNT queries so that
total_rows_migrated always reflects the actual row count in PostgreSQL. This fixes
a discrepancy where the counter tracked only rows from the current session
and did not account for earlier insertions or duplicates from failed resume attempts.

Key improvements:
- Use get_row_count() after each batch to get authoritative total
- Preserve previous count on resume and accumulate across sessions
- Remove dependency on error-prone session-level counters
- Ensures migration_state.total_rows_migrated matches actual table row count
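
The count-based approach can be sketched with sqlite3 standing in for PostgreSQL (get_row_count matches the name mentioned above; the rest is illustrative):

```python
import sqlite3

def get_row_count(conn, table):
    """Authoritative count read straight from the table (sketch; the real
    helper targets PostgreSQL)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawdatacor (id INTEGER)")
conn.executemany("INSERT INTO rawdatacor VALUES (?)", [(i,) for i in range(5)])

# After each batch, record the real table count rather than a session counter:
total_rows_migrated = get_row_count(conn, "rawdatacor")
```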

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-23 15:33:27 +01:00
b09cfcf9df fix: Add timeout settings and retry logic to MySQL connector
Configuration improvements:
- Set read_timeout=300 (5 minutes) to handle long queries
- Set write_timeout=300 (5 minutes) for writes
- Set max_allowed_packet=64MB to handle larger data transfers

Retry logic:
- Added retry mechanism with max 3 retries on fetch failure
- Auto-reconnect on connection loss before retry
- Better error messages showing retry attempts

This fixes the 'connection is lost' error that occurs during
long-running migrations by:
1. Giving MySQL queries more time to complete
2. Allowing larger packet sizes for bulk data
3. Automatically recovering from connection drops

Fixes: 'Connection is lost' error during full migration
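
The retry mechanism can be sketched generically; here ConnectionError stands in for the MySQL driver's connection-loss exception, and fetch/reconnect are assumed callables wrapping the real cursor:

```python
import time

def fetch_with_retry(fetch, reconnect, max_retries=3, delay=0.0):
    """Retry a failing fetch up to max_retries times, reconnecting before each
    retry (sketch of the mechanism described in this commit)."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except ConnectionError as exc:  # stands in for the driver's error class
            last_error = exc
            reconnect()  # auto-reconnect before the next attempt
            time.sleep(delay)
    raise RuntimeError(f"fetch failed after {max_retries} retries") from last_error

# Usage: a fetch that drops the connection twice, then succeeds on attempt 3.
attempts = {"n": 0}
def flaky_fetch():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("connection is lost")
    return ["row"]

result = fetch_with_retry(flaky_fetch, reconnect=lambda: None)
```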
2025-12-21 09:53:34 +01:00
de6bde17c9 feat: Add sequences for auto-incrementing IDs
- Create rawdatacor_id_seq for auto-increment of id column
- Create elabdatadisp_id_seq for auto-increment of id_elab_data column
- Both id columns use DEFAULT nextval(<sequence>) to auto-generate IDs on insert

This replaces the auto-increment role of the PRIMARY KEY, since PostgreSQL
doesn't support a PRIMARY KEY on partitioned tables with expression-based
range keys. IDs are now auto-incremented without a primary key constraint.

Tested: schema creation works correctly with sequences
2025-12-10 20:20:52 +01:00
2834f8b578 fix: Remove unsupported constraints from partitioned tables
PostgreSQL doesn't support PRIMARY KEY or UNIQUE constraints on
partitioned tables when using RANGE partitioning on expressions
(like EXTRACT(YEAR FROM event_date)).

Changed:
- RAWDATACOR: removed PRIMARY KEY (id, event_date) and UNIQUE constraint
- ELABDATADISP: removed PRIMARY KEY (id_elab_data, event_date) and UNIQUE constraint
- Tables now have no constraints except NOT NULL on required columns

This is a PostgreSQL limitation with partitioned tables.
Constraints can be added per-partition if needed, but for simplicity
we rely on application-level validation.

Fixes: 'vincolo PRIMARY KEY non supportato con una definizione di chiave di partizione'
(Italian-locale PostgreSQL error: "PRIMARY KEY constraint not supported with a partition key definition")
2025-12-10 20:18:20 +01:00
62577d3200 feat: Add MySQL to PostgreSQL migration tool with JSONB transformation
Implement comprehensive migration solution with:
- Full and incremental migration modes
- JSONB schema transformation for RAWDATACOR and ELABDATADISP tables
- Native PostgreSQL partitioning (2014-2031)
- Optimized GIN indexes for JSONB queries
- Rich logging with progress tracking
- Complete benchmark system for MySQL vs PostgreSQL comparison
- CLI interface with multiple commands (setup, migrate, benchmark)
- Configuration management via .env file
- Error handling and retry logic
- Batch processing for performance (configurable batch size)

Database transformations:
- RAWDATACOR: 16 Val columns + units → single JSONB measurements
- ELABDATADISP: 25+ measurement fields → structured JSONB with categories

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 19:57:11 +01:00