diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3972926..c89c3d9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,10 @@
 - **Configuration**: Added `PROGRESS_LOG_INTERVAL` to control logging frequency
 - **Configuration**: Added `BENCHMARK_OUTPUT_DIR` to specify benchmark results directory
 - **Documentation**: Updated README.md, MIGRATION_WORKFLOW.md, QUICKSTART.md, EXAMPLE_WORKFLOW.md with current implementation
+- **Documentation**: Corrected index and partitioning documentation to reflect actual PostgreSQL schema:
+  - Uses `event_timestamp` (not separate event_date/event_time)
+  - Primary key includes `event_year` for partitioning
+  - Consolidation key is UNIQUE (unit_name, tool_name_id, event_timestamp, event_year)
 
 ### Removed
 - **migration_state.json**: Replaced by PostgreSQL table
diff --git a/MIGRATION_WORKFLOW.md b/MIGRATION_WORKFLOW.md
index 742fdb1..c721ca9 100644
--- a/MIGRATION_WORKFLOW.md
+++ b/MIGRATION_WORKFLOW.md
@@ -338,6 +338,33 @@ CREATE TABLE rawdatacor_2024 PARTITION OF rawdatacor
 
 PostgreSQL automatically routes INSERTs to the correct partition based on `event_year`.
 
+### Indexes in PostgreSQL
+
+Both tables get the following indexes, created automatically:
+
+**Primary Key** (on a partitioned table it must include the partition key):
+```sql
+-- event_year is the partition key
+UNIQUE (id, event_year)
+```
+
+**Consolidation Key** (prevents duplicates):
+```sql
+-- Ensures one record per consolidation group
+UNIQUE (unit_name, tool_name_id, event_timestamp, event_year)
+```
+
+**Query Optimization**:
+```sql
+-- Fast filtering by unit/tool
+(unit_name, tool_name_id)
+
+-- JSONB queries with GIN index
+GIN (measurements)
+```
+
+**Note**: All indexes are automatically created on all partitions when you run `setup --create-schema`.
+
 ---
 
 ## Summary
diff --git a/README.md b/README.md
index 3af958d..ad5980d 100644
--- a/README.md
+++ b/README.md
@@ -254,16 +254,25 @@ LIMIT 1000;
 
 ## Partitioning
 
-Both tables are partitioned by year (RANGE partitioning on `EXTRACT(YEAR FROM event_date)`):
+Both tables are partitioned by year using the `event_year` column:
 
 ```sql
 -- Partitions created automatically for:
 -- rawdatacor_2014, rawdatacor_2015, ..., rawdatacor_2031
 -- elabdatadisp_2014, elabdatadisp_2015, ..., elabdatadisp_2031
 
+-- Partitioning based on event_year (computed from event_timestamp at insert time)
+CREATE TABLE rawdatacor_2024 PARTITION OF rawdatacor
+    FOR VALUES FROM (2024) TO (2025);
+
 -- Partitioned query (automatic constraint exclusion)
 SELECT * FROM rawdatacor
-WHERE event_date >= '2024-01-01' AND event_date < '2024-12-31';
+WHERE event_year = 2024;
+-- PostgreSQL scans only rawdatacor_2024
+
+-- Or with an event_timestamp range (keep event_year in the filter: partitions are pruned on the partition key)
+SELECT * FROM rawdatacor
+WHERE event_year = 2024 AND event_timestamp >= '2024-01-01' AND event_timestamp < '2025-01-01';
 -- PostgreSQL scans only rawdatacor_2024
 ```
 
@@ -271,18 +280,28 @@ WHERE event_date >= '2024-01-01' AND event_date < '2024-12-31';
 
 ### RAWDATACOR
 ```sql
-idx_unit_tool_node_datetime -- (unit_name, tool_name_id, node_num, event_date, event_time)
-idx_unit_tool -- (unit_name, tool_name_id)
-idx_measurements_gin -- GIN index on measurements JSONB
-idx_event_date -- (event_date)
+-- Primary key (must include the partition key on partitioned tables)
+rawdatacor_pkey -- UNIQUE (id, event_year)
+
+-- Consolidation key (prevents duplicates)
+rawdatacor_consolidation_key_unique -- UNIQUE (unit_name, tool_name_id, event_timestamp, event_year)
+
+-- Query optimization
+idx_rawdatacor_unit_tool -- (unit_name, tool_name_id)
+idx_rawdatacor_measurements_gin -- GIN (measurements) for JSONB queries
 ```
 
 ### ELABDATADISP
 ```sql
-idx_unit_tool_node_datetime -- (unit_name, tool_name_id, node_num, event_date, event_time)
-idx_unit_tool -- (unit_name, tool_name_id)
-idx_measurements_gin -- GIN index on measurements JSONB
-idx_event_date -- (event_date)
+-- Primary key (must include the partition key on partitioned tables)
+elabdatadisp_pkey -- UNIQUE (id, event_year)
+
+-- Consolidation key (prevents duplicates)
+elabdatadisp_consolidation_key_unique -- UNIQUE (unit_name, tool_name_id, event_timestamp, event_year)
+
+-- Query optimization
+idx_elabdatadisp_unit_tool -- (unit_name, tool_name_id)
+idx_elabdatadisp_measurements_gin -- GIN (measurements) for JSONB queries
 ```
 
 ## Benchmark
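
Reviewer note: the documentation in this diff says `event_year` is derived from `event_timestamp` at insert time and that both the partition and the consolidation key depend on it. The sketch below illustrates that relationship in Python. It is only an illustration under stated assumptions: the helper names (`event_year`, `partition_name`, `consolidation_key`) are hypothetical and not taken from the project, and the 2014 to 2031 range is the one listed in the README partition comment.

```python
from datetime import datetime

# Partition range as listed in the README comment (rawdatacor_2014 ... rawdatacor_2031).
FIRST_YEAR = 2014
LAST_YEAR = 2031


def event_year(event_timestamp: datetime) -> int:
    """Hypothetical helper: derive the partition key from the timestamp."""
    return event_timestamp.year


def partition_name(table: str, event_timestamp: datetime) -> str:
    """Hypothetical helper: name of the yearly partition a row would land in."""
    year = event_year(event_timestamp)
    if not FIRST_YEAR <= year <= LAST_YEAR:
        # No partition is documented for this year, so the insert would have no target.
        raise ValueError(f"no partition documented for year {year}")
    return f"{table}_{year}"


def consolidation_key(unit_name, tool_name_id, event_timestamp: datetime) -> tuple:
    """Hypothetical helper: the tuple covered by the documented UNIQUE constraint."""
    return (unit_name, tool_name_id, event_timestamp, event_year(event_timestamp))
```

Because `event_year` is a function of `event_timestamp`, including it in the consolidation key does not change which rows count as duplicates; it is there so the unique constraint contains the partition key.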