# Sensor Data Processing System - Python Migration

Complete Python implementation of MATLAB sensor data processing modules for geotechnical monitoring systems.

## Overview

This system processes data from various sensor types used in geotechnical monitoring:
- **RSN**: Rockfall Safety Network sensors
- **Tilt**: Inclinometers and tiltmeters
- **ATD**: Extensometers, crackmeters, and displacement sensors

Data is loaded from a MySQL database, processed through a multi-stage pipeline (conversion, averaging, elaboration), and written back to the database.

## Architecture

```
src/
├── main.py                 # Main orchestration script
├── common/                 # Shared utilities
│   ├── database.py         # Database connection management
│   ├── config.py           # Configuration and calibration loading
│   ├── logging_utils.py    # Logging setup
│   └── validators.py       # Data validation functions
├── rsn/                    # RSN module (COMPLETE)
│   ├── main.py             # RSN orchestration
│   ├── data_processing.py  # Load and structure data
│   ├── conversion.py       # Raw to physical units
│   ├── averaging.py        # Gaussian smoothing
│   ├── elaboration.py      # Calculate angles and differentials
│   └── db_write.py         # Write to database
├── tilt/                   # Tilt module (COMPLETE)
│   ├── main.py             # Tilt orchestration
│   ├── data_processing.py  # Load TLHR, BL, PL, KLHR data
│   ├── conversion.py       # Calibration application
│   ├── averaging.py        # Gaussian smoothing
│   ├── elaboration.py      # 3D displacement calculations
│   ├── db_write.py         # Write to database
│   └── geometry.py         # Geometric transformations
└── atd/                    # ATD module (COMPLETE - RL, LL)
    ├── main.py             # ATD orchestration
    ├── data_processing.py  # Load RL, LL data
    ├── conversion.py       # Calibration and unit conversion
    ├── averaging.py        # Gaussian smoothing
    ├── elaboration.py      # Position calculations (star algorithm)
    └── db_write.py         # Write to database
```

## Completion Status

### ✅ RSN Module (100% Complete)
- ✅ Data loading from RawDataView table
- ✅ Conversion with calibration (gain/offset)
- ✅ Gaussian smoothing (scipy)
- ✅ Angle calculations and
  validations
- ✅ Differential from reference files
- ✅ Database write with ON DUPLICATE KEY UPDATE
- **Sensor types**: RSN Link, RSN HR, Load Link, Trigger Link, Shock Sensor

### ✅ Tilt Module (100% Complete)
- ✅ Data loading for all tilt types
- ✅ Conversion with XY common/separate gains
- ✅ Gaussian smoothing
- ✅ 3D displacement calculations
- ✅ Global and local coordinates
- ✅ Differential from reference files
- ✅ Geometric functions (arot, asse_a/b, quaternions)
- ✅ Database write for all types
- **Sensor types**: TLHR, BL, PL, KLHR

### ✅ ATD Module (100% Complete) 🎉
- ✅ RL (Radial Link) - 3D acceleration + magnetometer
  - ✅ Data loading
  - ✅ Conversion with temperature compensation
  - ✅ Gaussian smoothing
  - ✅ Position calculation (star algorithm)
  - ✅ Database write
- ✅ LL (Load Link) - Force sensors
  - ✅ Data loading
  - ✅ Conversion
  - ✅ Gaussian smoothing
  - ✅ Differential calculation
  - ✅ Database write
- ✅ PL (Pressure Link)
  - ✅ Full pipeline implementation
  - ✅ Pressure measurement and differentials
- ✅ 3DEL (3D Extensometer)
  - ✅ Full pipeline implementation
  - ✅ 3D displacement measurement (X, Y, Z)
  - ✅ Differentials from reference files
- ✅ CrL/2DCrL/3DCrL (Crackmeters)
  - ✅ Full pipeline for 1D, 2D, and 3D crackmeters
  - ✅ Displacement measurement and differentials
- ✅ PCL/PCLHR (Perimeter Cable Link)
  - ✅ Biaxial calculations (Y, Z axes)
  - ✅ Fixed bottom or fixed top configurations
  - ✅ Cumulative and local displacements
  - ✅ Roll and inclination angles
  - ✅ Reference-based differentials
- ✅ TuL (Tube Link)
  - ✅ 3D biaxial calculations with correlation
  - ✅ Clockwise and counterclockwise computation
  - ✅ Y-axis correlation using Z angles
  - ✅ Node correction for incorrectly mounted sensors
  - ✅ Dual-direction differential averaging

### ✅ Common Modules (100% Complete)
- ✅ Database connection with context managers
- ✅ Configuration and calibration loading
- ✅ MATLAB-compatible logging
- ✅ Temperature validation
- ✅ Despiking (median filter)
- ✅ Acceleration checks

### ✅ Orchestration (100% Complete)
- ✅ Main entry point (`src/main.py`)
- ✅ Single chain processing
- ✅ Multiple chain processing (sequential/parallel)
- ✅ Auto sensor type detection
- ✅ Multiprocessing support

## Installation

### Requirements

```bash
pip install numpy scipy mysql-connector-python pandas openpyxl python-dotenv
```

Or use uv (recommended):

```bash
uv sync
```

### Python Version

Requires Python 3.9 or higher.

### Database Configuration

1. Copy the `.env.example` file to `.env`:
   ```bash
   cp .env.example .env
   ```
2. Edit `.env` with your database credentials (or set the same variables in the environment):
   ```bash
   DB_HOST=your_database_host
   DB_PORT=3306
   DB_NAME=your_database_name
   DB_USER=your_username
   DB_PASSWORD=your_password
   ```
3. **IMPORTANT**: Never commit the `.env` file to version control. It is already listed in `.gitignore`.

**Note**: The old `DB.txt` configuration format (with the Java JDBC driver) is deprecated. The Python implementation uses a native MySQL connector and does not require Java drivers.

## Usage

### Single Chain Processing

Process a single chain with auto-detection:

```bash
python -m src.main CU001 A
```

Process with a specific sensor type:

```bash
python -m src.main CU001 A --type rsn
python -m src.main CU002 B --type tilt
python -m src.main CU003 C --type atd
```

### Multiple Chains

Sequential processing:

```bash
python -m src.main CU001 A CU001 B CU002 A
```

Parallel processing (faster for multiple chains):

```bash
python -m src.main CU001 A CU001 B CU002 A --parallel
```

With a custom worker count:

```bash
python -m src.main CU001 A CU001 B CU002 A --parallel --workers 4
```

Mixed sensor types:

```bash
python -m src.main CU001 A rsn CU001 B tilt CU002 A atd --parallel
```

### Module-Specific Processing

Run individual modules:

```bash
# RSN module
python -m src.rsn.main CU001 A

# Tilt module
python -m src.tilt.main CU002 B

# ATD module
python -m src.atd.main CU003 C
```
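The positional syntax used above (control unit and chain pairs, each optionally followed by a sensor type) can be parsed and dispatched roughly as follows. This is a minimal sketch, not the actual `src/main.py`: `group_jobs`, `process_chain`, `run` and `main` are hypothetical names, `process_chain` only stands in for the real pipeline, and it assumes a token is a sensor type exactly when it is `rsn`, `tilt` or `atd`, and that `--type` applies only to chains without an inline type.

```python
# Hypothetical sketch of chain-argument handling; names are illustrative, not the project's API.
import argparse
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Tuple

SENSOR_TYPES = ("atd", "rsn", "tilt")
Job = Tuple[str, str, Optional[str]]  # (control_unit, chain, sensor_type or None = auto-detect)


def group_jobs(tokens: List[str]) -> List[Job]:
    """Group 'UNIT CHAIN [TYPE] UNIT CHAIN [TYPE] ...' into job tuples."""
    jobs: List[Job] = []
    i = 0
    while i < len(tokens):
        if i + 1 >= len(tokens):
            raise ValueError(f"control unit {tokens[i]!r} has no chain")
        unit, chain = tokens[i], tokens[i + 1]
        i += 2
        sensor_type = None
        # Assumption: a token is a sensor type only if it is one of SENSOR_TYPES.
        if i < len(tokens) and tokens[i].lower() in SENSOR_TYPES:
            sensor_type = tokens[i].lower()
            i += 1
        jobs.append((unit, chain, sensor_type))
    return jobs


def process_chain(job: Job) -> str:
    """Placeholder for the real load/convert/average/elaborate/write pipeline."""
    unit, chain, sensor_type = job
    return f"{unit}/{chain}: {sensor_type or 'auto'}"


def run(jobs: List[Job], parallel: bool = False, workers: Optional[int] = None) -> List[str]:
    """Process chains one after another, or one chain per worker process."""
    if not parallel:
        return [process_chain(job) for job in jobs]
    # max_workers=None lets the executor choose a default based on the CPU count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chain, jobs))  # results keep the input order


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Process sensor chains")
    parser.add_argument("tokens", nargs="+", help="UNIT CHAIN [TYPE] ...")
    parser.add_argument("--type", choices=SENSOR_TYPES, help="type for chains without one")
    parser.add_argument("--parallel", action="store_true")
    parser.add_argument("--workers", type=int)
    args = parser.parse_args(argv)

    jobs = group_jobs(args.tokens)
    if args.type:
        jobs = [(unit, chain, t or args.type) for unit, chain, t in jobs]
    for line in run(jobs, parallel=args.parallel, workers=args.workers):
        print(line)
```

For example, `main(["CU001", "A", "rsn", "CU002", "B", "--parallel"])` prints one line per chain in input order. Because `ProcessPoolExecutor` pickles the work function, `process_chain` must live at module level, and on platforms that start workers with `spawn` (Windows, macOS) the call to `main()` has to sit behind an `if __name__ == "__main__":` guard.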
## Data Pipeline

Each module follows the same 6-stage pipeline:

1. **Load**: Query the RawDataView table in MySQL
2. **Define**: Structure data, handle NaN values, despike, validate
3. **Convert**: Apply calibration (`gain * raw + offset`)
4. **Average**: Gaussian smoothing for noise reduction
5. **Elaborate**: Calculate physical quantities (angles, displacements, forces)
6. **Write**: Insert/update the database with ON DUPLICATE KEY UPDATE

## Key Technical Features

### Data Processing
- **NumPy arrays**: Efficient array operations
- **Gaussian smoothing**: `scipy.ndimage.gaussian_filter1d` (sigma = n_points / 6)
- **Despiking**: `scipy.signal.medfilt` for outlier removal
- **Forward fill**: Out-of-range temperature readings are replaced with the last valid value
- **Scale wrapping**: Handle ±32768 overflow in tilt sensors

### Database
- **Connection management**: Context managers for safe connections
- **Batch writes**: Efficient INSERT with ON DUPLICATE KEY UPDATE
- **Transactions**: Automatic commit/rollback

### Calibration
- **Linear transformations**: `physical = raw * gain + offset`
- **Temperature compensation**: `acc = raw * gain + (temp * coeff + offset)`
- **Common/separate gains**: Flexible XY gain handling for tilt sensors

### Geometry (Tilt)
- **3D transformations**: Rotation matrices, quaternions
- **Biaxial calculations**: asse_a, asse_b for sensor geometry
- **Local/global coordinates**: Coordinate system transformations
- **Differentials**: Relative measurements from reference files

### Star Algorithm (ATD)
- **Chain networks**: Position calculation for connected sensors
- **Clockwise/counterclockwise**: Bidirectional calculation with weighting
- **Known points**: Fixed reference points for closed chains

## Performance

- **Single chain**: ~2-10 seconds depending on data volume
- **Parallel processing**: Near-linear speedup with the number of workers, up to the number of CPU cores and the load the database can sustain
- **Memory efficient**: Streaming database queries, NumPy arrays
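The Define, Convert and Average stages from the Data Pipeline and Key Technical Features sections can be sketched for one temperature-compensated channel as follows. This is a hedged illustration, not the code in the `conversion.py` or `averaging.py` modules: all function names are hypothetical, the temperature limits are passed in because the real ones come from configuration, and flagging forward-filled samples as 0.5 and unfillable ones as 1 is an assumption based on the error-flag convention under Error Handling. For despiking it swaps in `scipy.ndimage.median_filter` with `mode="nearest"`, because `scipy.signal.medfilt` zero-pads the signal, which biases the samples near both ends.

```python
# Hypothetical sketch of the Define -> Convert -> Average stages; not the project's actual code.
import numpy as np
from scipy.ndimage import gaussian_filter1d, median_filter


def validate_temperature(temp, t_min, t_max):
    """Forward-fill out-of-range temperatures; return (values, error_flags).

    Assumed flags: 0 = valid, 0.5 = corrected by forward fill,
    1 = invalid (no earlier valid reading to copy, value left as-is).
    """
    values = np.asarray(temp, dtype=float).copy()
    flags = np.zeros_like(values)
    last_valid = np.nan
    for i, t in enumerate(values):
        if t_min <= t <= t_max:  # NaN fails this test, so it counts as out of range
            last_valid = t
        elif np.isnan(last_valid):
            flags[i] = 1.0
        else:
            values[i] = last_valid
            flags[i] = 0.5
    return values, flags


def despike(raw, size=5):
    """Median filter; edges are padded with the nearest sample instead of zeros."""
    return median_filter(np.asarray(raw, dtype=float), size=size, mode="nearest")


def convert(raw, gain, offset):
    """Linear calibration: physical = raw * gain + offset."""
    return raw * gain + offset


def convert_compensated(raw, temp, gain, coeff, offset):
    """Temperature-compensated calibration: raw * gain + (temp * coeff + offset)."""
    return raw * gain + (temp * coeff + offset)


def smooth(values, n_points):
    """Gaussian smoothing with sigma = n_points / 6, as listed above."""
    return gaussian_filter1d(values, sigma=n_points / 6)


def process_channel(raw, temp, gain, coeff, offset, t_range, n_points):
    """Define (validate, despike) -> Convert -> Average for one channel."""
    temp, flags = validate_temperature(temp, *t_range)
    # Temperatures flagged 1 stay NaN and propagate into the smoothed output.
    values = convert_compensated(despike(raw), temp, gain, coeff, offset)
    return smooth(values, n_points), flags
```

Keeping each stage a pure NumPy function makes it easy to compare intermediate arrays against the corresponding MATLAB step when a validation run reports a difference.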
## Error Handling

- **Error flags**: 0 = valid, 0.5 = corrected, 1 = invalid
- **Temperature validation**: Forward fill for out-of-range values
- **Missing data**: NaN handling with interpolation
- **Database errors**: Automatic rollback and logging

## Logging

Logs are written to:
- Console: INFO level
- File: `logs/{control_unit_id}_{chain}_{module}_{timestamp}.log`

Log format:

```
2025-10-13 14:30:15 - RSN - INFO - Processing RSN Link sensors
2025-10-13 14:30:17 - RSN - INFO - Loading raw data: 1500 records
2025-10-13 14:30:18 - RSN - INFO - Conversion completed
2025-10-13 14:30:19 - RSN - INFO - Elaboration completed
2025-10-13 14:30:20 - RSN - INFO - Database write: 1500 records
```

## Validation

### Python vs MATLAB Output Comparison

The system includes validation tools to verify that the Python implementation produces results equivalent to those of the original MATLAB code.

#### Quick Start

Validate all sensors for a chain:

```bash
python -m src.validation.cli CU001 A
```

Validate a specific sensor type:

```bash
python -m src.validation.cli CU001 A --type rsn
python -m src.validation.cli CU001 A --type tilt --tilt-subtype TLHR
python -m src.validation.cli CU001 A --type atd-rl
```

#### Validation Workflow

1. **Run MATLAB processing** on your data first (if not already done).
2. **Run Python processing** on the same raw data:
   ```bash
   python -m src.main CU001 A
   ```
3. **Run validation** to compare the outputs:
   ```bash
   python -m src.validation.cli CU001 A --output validation_report.txt
   ```

#### Advanced Usage

Compare specific dates (useful if MATLAB and Python ran at different times):

```bash
python -m src.validation.cli CU001 A \
  --matlab-date 2025-10-12 \
  --python-date 2025-10-13
```

Custom tolerance thresholds:

```bash
python -m src.validation.cli CU001 A \
  --abs-tol 1e-8 \
  --rel-tol 1e-6 \
  --max-rel-tol 0.001
```

Include passing comparisons in the report:

```bash
python -m src.validation.cli CU001 A --include-equivalent
```

#### Validation Metrics

The validator compares:
- **Max absolute difference**: Largest absolute error between values
- **Max relative difference**: Largest relative error (as a percentage)
- **RMSE**: Root mean square error across all values
- **Correlation**: Pearson correlation coefficient
- **Data ranges**: Min/max values from both implementations

#### Tolerance Levels

Default tolerances:
- **Absolute tolerance**: 1e-6 (0.000001)
- **Relative tolerance**: 1e-4 (0.01%)
- **Max acceptable relative difference**: 0.01 (1%)

Results are classified as:
- ✓ **IDENTICAL**: Exact match (bit-for-bit)
- ✓ **EQUIVALENT**: Within tolerance (acceptable)
- ✗ **DIFFERENT**: Exceeds tolerance (needs investigation)
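The metrics and classification above can be sketched for a single output column as follows. This is an illustration under stated assumptions, not the code in `src.validation`: `Comparison` and `compare` are hypothetical names, both arrays are assumed to be aligned on the same timestamps and nodes, non-finite (NaN/inf) positions must match, and EQUIVALENT is assumed to mean either `np.allclose` with the absolute and relative tolerances or a maximum relative difference no larger than `max_rel_tol`. The data-range metric is left out for brevity.

```python
# Hypothetical sketch of one Python-vs-MATLAB column comparison; the real validator may differ.
from dataclasses import dataclass

import numpy as np


@dataclass
class Comparison:
    status: str              # IDENTICAL, EQUIVALENT or DIFFERENT
    max_abs_diff: float
    max_rel_diff_pct: float  # relative to the MATLAB value, in percent
    rmse: float
    correlation: float       # NaN when undefined (fewer than 2 points or constant data)


def compare(python, matlab, abs_tol=1e-6, rel_tol=1e-4, max_rel_tol=0.01):
    """Compare one Python output column with the matching MATLAB column."""
    py = np.asarray(python, dtype=float)
    ml = np.asarray(matlab, dtype=float)
    if py.shape != ml.shape:
        raise ValueError(f"shape mismatch: {py.shape} vs {ml.shape}")

    same_gaps = np.array_equal(np.isfinite(py), np.isfinite(ml))
    finite = np.isfinite(py) & np.isfinite(ml)  # statistics use finite pairs only
    a, b = py[finite], ml[finite]
    diff = np.abs(a - b)

    max_abs = float(diff.max()) if diff.size else 0.0
    rmse = float(np.sqrt(np.mean(diff ** 2))) if diff.size else 0.0
    nonzero = b != 0  # relative error is undefined where the MATLAB value is exactly 0
    max_rel = float((diff[nonzero] / np.abs(b[nonzero])).max()) if nonzero.any() else 0.0
    if a.size > 1 and a.std() > 0 and b.std() > 0:
        corr = float(np.corrcoef(a, b)[0, 1])
    else:
        corr = float("nan")

    if np.array_equal(py, ml, equal_nan=True):
        status = "IDENTICAL"
    elif same_gaps and (np.allclose(a, b, rtol=rel_tol, atol=abs_tol) or max_rel <= max_rel_tol):
        status = "EQUIVALENT"
    else:
        status = "DIFFERENT"
    return Comparison(status, max_abs, max_rel * 100.0, rmse, corr)
```

Note that `np.allclose(a, b, ...)` is asymmetric: it tests `|a - b| <= atol + rtol * |b|`, so passing the MATLAB values as `b` makes them the reference.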
#### Example Report

```
================================================================================
VALIDATION REPORT: Python vs MATLAB Output Comparison
================================================================================

SUMMARY:
  ✓ Identical:        2
  ✓ Equivalent:       8
  ✗ Different:        0
  ? Missing (MATLAB): 0
  ? Missing (Python): 0
  ! Errors:           0

✓✓✓ VALIDATION PASSED ✓✓✓

--------------------------------------------------------------------------------
DETAILED RESULTS:
--------------------------------------------------------------------------------

✓ X: EQUIVALENT (within tolerance)
    Max abs diff: 3.45e-07
    Max rel diff: 0.0023%
    RMSE: 1.12e-07
    Correlation: 0.999998

✓ Y: EQUIVALENT (within tolerance)
    Max abs diff: 2.89e-07
    Max rel diff: 0.0019%
    RMSE: 9.34e-08
    Correlation: 0.999999
```

#### Supported Sensor Types

Validation is available for all implemented sensor types:
- RSN (Rockfall Safety Network)
- Tilt (TLHR, BL, PL, KLHR)
- ATD Radial Link (RL)
- ATD Load Link (LL)
- ATD Pressure Link (PL)
- ATD 3D Extensometer (3DEL)
- ATD Crackmeters (CrL, 2DCrL, 3DCrL)
- ATD Perimeter Cable Link (PCL, PCLHR)
- ATD Tube Link (TuL)

## Testing

Run basic tests:

```bash
# Test database connection
python -c "from src.common.database import DatabaseConfig, DatabaseConnection; \
conn = DatabaseConnection(DatabaseConfig()); print('DB OK')"

# Test single chain
python -m src.main TEST001 A --type rsn
```

## Migration from MATLAB

Key differences from the MATLAB code:

| MATLAB | Python |
|--------|--------|
| `smoothdata(data, 'gaussian', N)` | `gaussian_filter1d(data, sigma=N/6)` |
| `filloutliers(data, 'linear')` | `medfilt(data, kernel_size=5)` |
| `xlsread(file, sheet)` | `pd.read_excel(file, sheet_name=sheet)` |
| `datestr(date, 'yyyy-mm-dd')` | `date.strftime('%Y-%m-%d')` |
| `fastinsert(conn, ...)` | `INSERT ... ON DUPLICATE KEY UPDATE` |
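The `fastinsert` row in the table above maps to a batched upsert. The sketch below shows one way to write it against a DB-API connection such as the one returned by `mysql.connector.connect()`; `build_upsert`, `write_rows` and the table and column names are hypothetical, and it assumes the target table has a PRIMARY or UNIQUE key over `key_columns`, which is what makes MySQL update existing rows instead of inserting duplicates. Identifiers are interpolated into the SQL text, so they must come from trusted code; row values always go through `%s` placeholders.

```python
# Hypothetical sketch of a batched upsert replacing MATLAB fastinsert(); not the project's db_write code.
from typing import Iterable, List, Sequence, Tuple


def build_upsert(table: str, columns: Sequence[str], key_columns: Sequence[str]) -> str:
    """Return an INSERT ... ON DUPLICATE KEY UPDATE statement with %s placeholders."""
    non_key = [c for c in columns if c not in key_columns]
    if not non_key:
        raise ValueError("at least one non-key column is needed to update")
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in non_key)
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


def write_rows(conn, table: str, columns: Sequence[str], key_columns: Sequence[str],
               rows: Iterable[Sequence], batch_size: int = 1000) -> int:
    """Upsert rows in batches inside one transaction; return the number of rows sent."""
    sql = build_upsert(table, columns, key_columns)
    cursor = conn.cursor()
    sent = 0
    batch: List[Tuple] = []
    try:
        for row in rows:
            batch.append(tuple(row))
            if len(batch) == batch_size:
                cursor.executemany(sql, batch)
                sent += len(batch)
                batch = []
        if batch:
            cursor.executemany(sql, batch)
            sent += len(batch)
        conn.commit()  # all batches succeed together ...
    except Exception:
        conn.rollback()  # ... or none are kept
        raise
    finally:
        cursor.close()
    return sent
```

`VALUES(col)` inside `ON DUPLICATE KEY UPDATE` is deprecated as of MySQL 8.0.20 in favour of row aliases, but it is still accepted; switch to the alias form if the server emits deprecation warnings.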
## Future Work

Remaining ATD sensor types to implement:
- [ ] WEL (Wire Extensometer)
- [ ] SM (Settlement Marker)

Additional features:
- [ ] Report generation (PDF/HTML)
- [ ] Threshold checking and alerts
- [ ] Web dashboard
- [ ] REST API

## Compatibility

This Python implementation is designed to be a **complete replacement** for the MATLAB modules in:
- `ATD/` (extensometers)
- `RSN/` (rockfall network)
- `Tilt/` (inclinometers)

It is designed to reproduce the MATLAB results within the tolerances checked by the validation tools (see Validation), while offering:
- ✅ Better performance (NumPy/SciPy)
- ✅ No MATLAB license required
- ✅ Easier deployment (pip install)
- ✅ Better error handling
- ✅ Parallel processing support
- ✅ Modern Python type hints

## License

[Your License Here]

## Contact

[Your Contact Info Here]