
Refactored Scripts - Modern Async Implementation

This directory contains refactored versions of the legacy scripts from old_scripts/, reimplemented with modern Python best practices, async/await support, and proper error handling.

Overview

The refactored scripts provide the same functionality as their legacy counterparts but with significant improvements:

Key Improvements

Full Async/Await Support

  • Uses aiomysql for non-blocking database operations
  • Compatible with asyncio event loops
  • Can be integrated into existing async orchestrators

Proper Logging

  • Uses Python's logging module instead of print() statements
  • Configurable log levels (DEBUG, INFO, WARNING, ERROR)
  • Structured log messages with context

Type Hints & Documentation

  • Full type hints for all functions
  • Comprehensive docstrings following Google style
  • Self-documenting code

Error Handling

  • Proper exception handling with logging
  • Retry logic available via utility functions
  • Graceful degradation

Configuration Management

  • Centralized configuration via DatabaseConfig class
  • No hardcoded values
  • Environment-aware settings

Code Quality

  • Follows PEP 8 style guide
  • Passes ruff linting
  • Clean, maintainable code structure

Directory Structure

refactory_scripts/
├── __init__.py           # Package initialization
├── README.md            # This file
├── config/              # Configuration management
│   └── __init__.py      # DatabaseConfig class
├── utils/               # Utility functions
│   └── __init__.py      # Database helpers, retry logic, etc.
└── loaders/             # Data loader modules
    ├── __init__.py      # Loader exports
    ├── hirpinia_loader.py
    ├── vulink_loader.py
    └── sisgeo_loader.py

Refactored Scripts

1. Hirpinia Loader (hirpinia_loader.py)

Replaces: old_scripts/hirpiniaLoadScript.py

Purpose: Processes Hirpinia ODS files and loads sensor data into the database.

Features:

  • Parses ODS (OpenDocument Spreadsheet) files
  • Extracts data from multiple sheets (one per node)
  • Handles datetime parsing and validation
  • Batch inserts with INSERT IGNORE
  • Supports MATLAB elaboration triggering

Usage:

from refactory_scripts.loaders import HirpiniaLoader
from refactory_scripts.config import DatabaseConfig

async def process_hirpinia_file(file_path: str):
    db_config = DatabaseConfig()

    async with HirpiniaLoader(db_config) as loader:
        success = await loader.process_file(file_path)

    return success

Command Line:

python -m refactory_scripts.loaders.hirpinia_loader /path/to/file.ods

2. Vulink Loader (vulink_loader.py)

Replaces: old_scripts/vulinkScript.py

Purpose: Processes Vulink CSV files with battery monitoring and pH alarm management.

Features:

  • Serial number to unit/tool name mapping
  • Node configuration loading (depth, thresholds)
  • Battery level monitoring with alarm creation
  • pH threshold checking with multi-level alarms
  • Time-based alarm suppression (24h interval for battery)

Alarm Types:

  • Type 2: Low battery alarms (<25%)
  • Type 3: pH threshold alarms (3 levels)
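
The battery and pH checks above can be sketched as plain functions. This is a hypothetical illustration, not the actual VulinkLoader API: the function names, the threshold direction (alarming on high pH), and the NodeConfig shape are assumptions; only the 25% battery cutoff and the three pH levels come from this README.

```python
from dataclasses import dataclass

@dataclass
class NodeConfig:
    # Three ascending pH thresholds, one per alarm level (assumed layout)
    ph_thresholds: tuple[float, float, float]

def is_low_battery(level_percent: float) -> bool:
    """Type 2 alarm condition: battery below 25%."""
    return level_percent < 25.0

def ph_alarm_level(value: float, cfg: NodeConfig) -> int:
    """Return 0 (no alarm) or the highest breached level (1-3)."""
    level = 0
    for i, threshold in enumerate(cfg.ph_thresholds, start=1):
        if value >= threshold:
            level = i
    return level
```

The 24h suppression window mentioned above would sit on top of this, skipping alarm creation when a matching alarm was raised recently.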

Usage:

from refactory_scripts.loaders import VulinkLoader
from refactory_scripts.config import DatabaseConfig

async def process_vulink_file(file_path: str):
    db_config = DatabaseConfig()

    async with VulinkLoader(db_config) as loader:
        success = await loader.process_file(file_path)

    return success

Command Line:

python -m refactory_scripts.loaders.vulink_loader /path/to/file.csv

3. Sisgeo Loader (sisgeo_loader.py)

Replaces: old_scripts/sisgeoLoadScript.py

Purpose: Processes Sisgeo sensor data with smart duplicate handling.

Features:

  • Handles two sensor types:
    • Pressure sensors (1 value): Piezometers
    • Vibrating wire sensors (3 values): Strain gauges, tiltmeters, etc.
  • Smart duplicate detection based on time thresholds
  • Conditional INSERT vs UPDATE logic
  • Preserves data integrity

Data Processing Logic:

Scenario             BatLevelModule   Time Diff   Action
No previous record   N/A              N/A         INSERT
Previous exists      NULL             >= 5h       INSERT
Previous exists      NULL             < 5h        UPDATE
Previous exists      NOT NULL         N/A         INSERT
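
The decision table above can be expressed as a small pure function. This is a sketch of the documented rules only; the `decide_action` name and the dict-based record shape are assumptions, not the actual SisgeoLoader internals.

```python
from datetime import datetime, timedelta
from typing import Optional

# 5-hour window from the decision table above
DUPLICATE_WINDOW = timedelta(hours=5)

def decide_action(previous: Optional[dict], new_time: datetime) -> str:
    """Return "INSERT" or "UPDATE" per the duplicate-handling rules."""
    if previous is None:
        return "INSERT"                      # no previous record
    if previous["BatLevelModule"] is not None:
        return "INSERT"                      # previous row is complete
    if new_time - previous["timestamp"] >= DUPLICATE_WINDOW:
        return "INSERT"                      # partial row, but too old
    return "UPDATE"                          # recent partial row: fill it in
```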

Usage:

from refactory_scripts.loaders import SisgeoLoader
from refactory_scripts.config import DatabaseConfig

async def process_sisgeo_data(raw_data, elab_data):
    db_config = DatabaseConfig()

    async with SisgeoLoader(db_config) as loader:
        raw_count, elab_count = await loader.process_data(raw_data, elab_data)

    return raw_count, elab_count

Configuration

Database Configuration

Configuration is loaded from env/config.ini:

[mysql]
host = 10.211.114.173
port = 3306
database = ase_lar
user = root
password = ****

Loading Configuration:

from refactory_scripts.config import DatabaseConfig

# Default: loads from env/config.ini, section [mysql]
db_config = DatabaseConfig()

# Custom file and section
db_config = DatabaseConfig(
    config_file="/path/to/config.ini",
    section="production_db"
)

# Access configuration
print(db_config.host)
print(db_config.database)

# Get as dict for aiomysql
conn_params = db_config.as_dict()

Utility Functions

Database Helpers

from refactory_scripts.utils import get_db_connection, execute_query, execute_many

# Get async database connection
conn = await get_db_connection(db_config.as_dict())

# Execute query with single result
result = await execute_query(
    conn,
    "SELECT * FROM table WHERE id = %s",
    (123,),
    fetch_one=True
)

# Execute query with multiple results
results = await execute_query(
    conn,
    "SELECT * FROM table WHERE status = %s",
    ("active",),
    fetch_all=True
)

# Batch insert
rows = [(1, "a"), (2, "b"), (3, "c")]
count = await execute_many(
    conn,
    "INSERT INTO table (id, name) VALUES (%s, %s)",
    rows
)

Retry Logic

from refactory_scripts.utils import retry_on_failure

# Retry with exponential backoff
result = await retry_on_failure(
    some_async_function,
    max_retries=3,
    delay=1.0,
    backoff=2.0,
    arg1="value1",
    arg2="value2"
)
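
For reference, a minimal sketch of what an exponential-backoff retry helper like this could look like internally. The actual implementation in refactory_scripts.utils may differ; this only illustrates the max_retries/delay/backoff semantics shown in the usage above.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def retry_on_failure(func, max_retries=3, delay=1.0, backoff=2.0, **kwargs):
    """Call func(**kwargs), retrying on exception with exponential backoff."""
    wait = delay
    for attempt in range(1, max_retries + 1):
        try:
            return await func(**kwargs)
        except Exception as exc:
            if attempt == max_retries:
                raise  # out of retries: propagate the last error
            logger.warning(
                "Attempt %d failed (%s); retrying in %.1fs", attempt, exc, wait
            )
            await asyncio.sleep(wait)
            wait *= backoff  # exponential backoff
```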

DateTime Parsing

from refactory_scripts.utils import parse_datetime

# Parse ISO format
dt = parse_datetime("2024-10-11T14:30:00")

# Parse separate date and time
dt = parse_datetime("2024-10-11", "14:30:00")

# Parse date only
dt = parse_datetime("2024-10-11")

Logging

All loaders use Python's standard logging module:

import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

# Use in scripts
logger = logging.getLogger(__name__)
logger.info("Processing started")
logger.debug("Debug information")
logger.warning("Warning message")
logger.error("Error occurred", exc_info=True)

Log Levels:

  • DEBUG: Detailed diagnostic information
  • INFO: General informational messages
  • WARNING: Warning messages (non-critical issues)
  • ERROR: Error messages with stack traces

Integration with Orchestrators

The refactored loaders can be easily integrated into the existing orchestrator system:

# In your orchestrator worker
from refactory_scripts.loaders import HirpiniaLoader
from refactory_scripts.config import DatabaseConfig

async def worker(worker_id: int, cfg: dict, pool: object) -> None:
    db_config = DatabaseConfig()

    async with HirpiniaLoader(db_config) as loader:
        # Process files from queue
        file_path = await get_next_file_from_queue()
        success = await loader.process_file(file_path)

        if success:
            await mark_file_processed(file_path)

Migration from Legacy Scripts

Mapping Table

Legacy Script           Refactored Module    Class Name
hirpiniaLoadScript.py   hirpinia_loader.py   HirpiniaLoader
vulinkScript.py         vulink_loader.py     VulinkLoader
sisgeoLoadScript.py     sisgeo_loader.py     SisgeoLoader
sorotecPini.py          TODO                 SorotecLoader
TS_PiniScript.py        TODO                 TSPiniLoader

Key Differences

  1. Async/Await:

    • Legacy: conn = MySQLConnection(**db_config)
    • Refactored: conn = await get_db_connection(db_config.as_dict())
  2. Error Handling:

    • Legacy: print('Error:', e)
    • Refactored: logger.error(f"Error: {e}", exc_info=True)
  3. Configuration:

    • Legacy: read_db_config() returns dict
    • Refactored: DatabaseConfig() returns object with validation
  4. Context Managers:

    • Legacy: Manual connection management
    • Refactored: async with Loader(config) as loader:

Testing

Unit Tests (TODO)

# Run tests
pytest tests/test_refactory_scripts/

# Run with coverage
pytest --cov=refactory_scripts tests/

Manual Testing

# Set log level
export LOG_LEVEL=DEBUG

# Test Hirpinia loader
python -m refactory_scripts.loaders.hirpinia_loader /path/to/test.ods

# Test with Python directly
python3 << 'EOF'
import asyncio
from refactory_scripts.loaders import HirpiniaLoader
from refactory_scripts.config import DatabaseConfig

async def test():
    db_config = DatabaseConfig()
    async with HirpiniaLoader(db_config) as loader:
        result = await loader.process_file("/path/to/file.ods")
        print(f"Result: {result}")

asyncio.run(test())
EOF

Performance Considerations

Async Benefits

  • Non-blocking I/O: Database operations don't block the event loop
  • Concurrent Processing: Multiple files can be processed simultaneously
  • Better Resource Utilization: other coroutines keep running while database I/O is in flight

Batch Operations

  • Use execute_many() for bulk inserts (faster than individual INSERT statements)
  • Example: Hirpinia loader processes all rows in one batch operation

Connection Pooling

When integrating with orchestrators, reuse connection pools:

# Don't create new connections in loops
# ❌ Bad
for file in files:
    async with HirpiniaLoader(db_config) as loader:
        await loader.process_file(file)

# ✅ Good - reuse loader instance
async with HirpiniaLoader(db_config) as loader:
    for file in files:
        await loader.process_file(file)

Future Enhancements

Planned Improvements

  • Complete refactoring of sorotecPini.py
  • Complete refactoring of TS_PiniScript.py
  • Add unit tests with pytest
  • Add integration tests
  • Implement CSV parsing for Vulink loader
  • Add metrics and monitoring (Prometheus?)
  • Add data validation schemas (Pydantic?)
  • Implement retry policies for transient failures
  • Add dry-run mode for testing
  • Create CLI tool with argparse

Potential Features

  • Data Validation: Use Pydantic models for input validation
  • Metrics: Track processing times, error rates, etc.
  • Dead Letter Queue: Handle permanently failed records
  • Idempotency: Ensure repeated processing is safe
  • Streaming: Process large files in chunks
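
The "Streaming" idea above could look like the following sketch: read a large input file in fixed-size chunks instead of loading it whole. The function name and chunk size are illustrative assumptions, not part of the current codebase.

```python
from typing import Iterator

def iter_chunks(path: str, chunk_size: int = 1000) -> Iterator[list[str]]:
    """Yield lists of up to chunk_size lines from path, in file order."""
    chunk: list[str] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            chunk.append(line.rstrip("\n"))
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk  # final partial chunk
```

Each chunk could then be passed to execute_many(), keeping memory bounded regardless of file size.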

Contributing

When adding new loaders:

  1. Follow the existing pattern (async context manager)
  2. Add comprehensive docstrings
  3. Include type hints
  4. Use the logging module
  5. Add error handling with context
  6. Update this README
  7. Add unit tests

Support

For issues or questions:

  • Check logs with LOG_LEVEL=DEBUG
  • Review the legacy script comparison
  • Consult the main project documentation

Version History

v1.0.0 (2024-10-11)

  • Initial refactored implementation
  • HirpiniaLoader complete
  • VulinkLoader complete (pending CSV parsing)
  • SisgeoLoader complete
  • Base utilities and configuration management
  • Comprehensive documentation

License

Same as the main ASE project.