Refactored Scripts - Modern Async Implementation
This directory contains refactored versions of the legacy scripts from old_scripts/, reimplemented with modern Python best practices, async/await support, and proper error handling.
Overview
The refactored scripts provide the same functionality as their legacy counterparts but with significant improvements:
Key Improvements
✅ Full Async/Await Support
- Uses
aiomysqlfor non-blocking database operations - Compatible with asyncio event loops
- Can be integrated into existing async orchestrators
✅ Proper Logging
- Uses Python's
loggingmodule instead ofprint()statements - Configurable log levels (DEBUG, INFO, WARNING, ERROR)
- Structured log messages with context
✅ Type Hints & Documentation
- Full type hints for all functions
- Comprehensive docstrings following Google style
- Self-documenting code
✅ Error Handling
- Proper exception handling with logging
- Retry logic available via utility functions
- Graceful degradation
✅ Configuration Management
- Centralized configuration via
DatabaseConfigclass - No hardcoded values
- Environment-aware settings
✅ Code Quality
- Follows PEP 8 style guide
- Passes ruff linting
- Clean, maintainable code structure
Directory Structure
refactory_scripts/
├── __init__.py # Package initialization
├── README.md # This file
├── config/ # Configuration management
│ └── __init__.py # DatabaseConfig class
├── utils/ # Utility functions
│ └── __init__.py # Database helpers, retry logic, etc.
└── loaders/ # Data loader modules
├── __init__.py # Loader exports
├── hirpinia_loader.py
├── vulink_loader.py
└── sisgeo_loader.py
Refactored Scripts
1. Hirpinia Loader (hirpinia_loader.py)
Replaces: old_scripts/hirpiniaLoadScript.py
Purpose: Processes Hirpinia ODS files and loads sensor data into the database.
Features:
- Parses ODS (OpenDocument Spreadsheet) files
- Extracts data from multiple sheets (one per node)
- Handles datetime parsing and validation
- Batch inserts with
INSERT IGNORE - Supports MATLAB elaboration triggering
Usage:
from refactory_scripts.loaders import HirpiniaLoader
from refactory_scripts.config import DatabaseConfig
async def process_hirpinia_file(file_path: str):
db_config = DatabaseConfig()
async with HirpiniaLoader(db_config) as loader:
success = await loader.process_file(file_path)
return success
Command Line:
python -m refactory_scripts.loaders.hirpinia_loader /path/to/file.ods
2. Vulink Loader (vulink_loader.py)
Replaces: old_scripts/vulinkScript.py
Purpose: Processes Vulink CSV files with battery monitoring and pH alarm management.
Features:
- Serial number to unit/tool name mapping
- Node configuration loading (depth, thresholds)
- Battery level monitoring with alarm creation
- pH threshold checking with multi-level alarms
- Time-based alarm suppression (24h interval for battery)
Alarm Types:
- Type 2: Low battery alarms (<25%)
- Type 3: pH threshold alarms (3 levels)
Usage:
from refactory_scripts.loaders import VulinkLoader
from refactory_scripts.config import DatabaseConfig
async def process_vulink_file(file_path: str):
db_config = DatabaseConfig()
async with VulinkLoader(db_config) as loader:
success = await loader.process_file(file_path)
return success
Command Line:
python -m refactory_scripts.loaders.vulink_loader /path/to/file.csv
3. Sisgeo Loader (sisgeo_loader.py)
Replaces: old_scripts/sisgeoLoadScript.py
Purpose: Processes Sisgeo sensor data with smart duplicate handling.
Features:
- Handles two sensor types:
- Pressure sensors (1 value): Piezometers
- Vibrating wire sensors (3 values): Strain gauges, tiltmeters, etc.
- Smart duplicate detection based on time thresholds
- Conditional INSERT vs UPDATE logic
- Preserves data integrity
Data Processing Logic:
| Scenario | BatLevelModule | Time Diff | Action |
|---|---|---|---|
| No previous record | N/A | N/A | INSERT |
| Previous exists | NULL | >= 5h | INSERT |
| Previous exists | NULL | < 5h | UPDATE |
| Previous exists | NOT NULL | N/A | INSERT |
Usage:
from refactory_scripts.loaders import SisgeoLoader
from refactory_scripts.config import DatabaseConfig
async def process_sisgeo_data(raw_data, elab_data):
db_config = DatabaseConfig()
async with SisgeoLoader(db_config) as loader:
raw_count, elab_count = await loader.process_data(raw_data, elab_data)
return raw_count, elab_count
Configuration
Database Configuration
Configuration is loaded from env/config.ini:
[mysql]
host = 10.211.114.173
port = 3306
database = ase_lar
user = root
password = ****
Loading Configuration:
from refactory_scripts.config import DatabaseConfig
# Default: loads from env/config.ini, section [mysql]
db_config = DatabaseConfig()
# Custom file and section
db_config = DatabaseConfig(
config_file="/path/to/config.ini",
section="production_db"
)
# Access configuration
print(db_config.host)
print(db_config.database)
# Get as dict for aiomysql
conn_params = db_config.as_dict()
Utility Functions
Database Helpers
from refactory_scripts.utils import get_db_connection, execute_query, execute_many
# Get async database connection
conn = await get_db_connection(db_config.as_dict())
# Execute query with single result
result = await execute_query(
conn,
"SELECT * FROM table WHERE id = %s",
(123,),
fetch_one=True
)
# Execute query with multiple results
results = await execute_query(
conn,
"SELECT * FROM table WHERE status = %s",
("active",),
fetch_all=True
)
# Batch insert
rows = [(1, "a"), (2, "b"), (3, "c")]
count = await execute_many(
conn,
"INSERT INTO table (id, name) VALUES (%s, %s)",
rows
)
Retry Logic
from refactory_scripts.utils import retry_on_failure
# Retry with exponential backoff
result = await retry_on_failure(
some_async_function,
max_retries=3,
delay=1.0,
backoff=2.0,
arg1="value1",
arg2="value2"
)
DateTime Parsing
from refactory_scripts.utils import parse_datetime
# Parse ISO format
dt = parse_datetime("2024-10-11T14:30:00")
# Parse separate date and time
dt = parse_datetime("2024-10-11", "14:30:00")
# Parse date only
dt = parse_datetime("2024-10-11")
Logging
All loaders use Python's standard logging module:
import logging
# Configure logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
# Use in scripts
logger = logging.getLogger(__name__)
logger.info("Processing started")
logger.debug("Debug information")
logger.warning("Warning message")
logger.error("Error occurred", exc_info=True)
Log Levels:
DEBUG: Detailed diagnostic informationINFO: General informational messagesWARNING: Warning messages (non-critical issues)ERROR: Error messages with stack traces
Integration with Orchestrators
The refactored loaders can be easily integrated into the existing orchestrator system:
# In your orchestrator worker
from refactory_scripts.loaders import HirpiniaLoader
from refactory_scripts.config import DatabaseConfig
async def worker(worker_id: int, cfg: dict, pool: object) -> None:
db_config = DatabaseConfig()
async with HirpiniaLoader(db_config) as loader:
# Process files from queue
file_path = await get_next_file_from_queue()
success = await loader.process_file(file_path)
if success:
await mark_file_processed(file_path)
Migration from Legacy Scripts
Mapping Table
| Legacy Script | Refactored Module | Class Name |
|---|---|---|
hirpiniaLoadScript.py |
hirpinia_loader.py |
HirpiniaLoader |
vulinkScript.py |
vulink_loader.py |
VulinkLoader |
sisgeoLoadScript.py |
sisgeo_loader.py |
SisgeoLoader |
sorotecPini.py |
⏳ TODO | SorotecLoader |
TS_PiniScript.py |
⏳ TODO | TSPiniLoader |
Key Differences
-
Async/Await:
- Legacy:
conn = MySQLConnection(**db_config) - Refactored:
conn = await get_db_connection(db_config.as_dict())
- Legacy:
-
Error Handling:
- Legacy:
print('Error:', e) - Refactored:
logger.error(f"Error: {e}", exc_info=True)
- Legacy:
-
Configuration:
- Legacy:
read_db_config()returns dict - Refactored:
DatabaseConfig()returns object with validation
- Legacy:
-
Context Managers:
- Legacy: Manual connection management
- Refactored:
async with Loader(config) as loader:
Testing
Unit Tests (TODO)
# Run tests
pytest tests/test_refactory_scripts/
# Run with coverage
pytest --cov=refactory_scripts tests/
Manual Testing
# Set log level
export LOG_LEVEL=DEBUG
# Test Hirpinia loader
python -m refactory_scripts.loaders.hirpinia_loader /path/to/test.ods
# Test with Python directly
python3 << 'EOF'
import asyncio
from refactory_scripts.loaders import HirpiniaLoader
from refactory_scripts.config import DatabaseConfig
async def test():
db_config = DatabaseConfig()
async with HirpiniaLoader(db_config) as loader:
result = await loader.process_file("/path/to/file.ods")
print(f"Result: {result}")
asyncio.run(test())
EOF
Performance Considerations
Async Benefits
- Non-blocking I/O: Database operations don't block the event loop
- Concurrent Processing: Multiple files can be processed simultaneously
- Better Resource Utilization: CPU-bound operations can run during I/O waits
Batch Operations
- Use
execute_many()for bulk inserts (faster than individual INSERT statements) - Example: Hirpinia loader processes all rows in one batch operation
Connection Pooling
When integrating with orchestrators, reuse connection pools:
# Don't create new connections in loops
# ❌ Bad
for file in files:
async with HirpiniaLoader(db_config) as loader:
await loader.process_file(file)
# ✅ Good - reuse loader instance
async with HirpiniaLoader(db_config) as loader:
for file in files:
await loader.process_file(file)
Future Enhancements
Planned Improvements
- Complete refactoring of
sorotecPini.py - Complete refactoring of
TS_PiniScript.py - Add unit tests with pytest
- Add integration tests
- Implement CSV parsing for Vulink loader
- Add metrics and monitoring (Prometheus?)
- Add data validation schemas (Pydantic?)
- Implement retry policies for transient failures
- Add dry-run mode for testing
- Create CLI tool with argparse
Potential Features
- Data Validation: Use Pydantic models for input validation
- Metrics: Track processing times, error rates, etc.
- Dead Letter Queue: Handle permanently failed records
- Idempotency: Ensure repeated processing is safe
- Streaming: Process large files in chunks
Contributing
When adding new loaders:
- Follow the existing pattern (async context manager)
- Add comprehensive docstrings
- Include type hints
- Use the logging module
- Add error handling with context
- Update this README
- Add unit tests
Support
For issues or questions:
- Check logs with
LOG_LEVEL=DEBUG - Review the legacy script comparison
- Consult the main project documentation
Version History
v1.0.0 (2024-10-11)
- Initial refactored implementation
- HirpiniaLoader complete
- VulinkLoader complete (pending CSV parsing)
- SisgeoLoader complete
- Base utilities and configuration management
- Comprehensive documentation
License
Same as the main ASE project.