# Flux Data Processing – Workflow Summary

**Utah Geological Survey · Flux Monitoring Network**

Station reference implementation: **US-UTD (Dugout Ranch)**

This page provides a high-level overview of the end-to-end eddy covariance data processing pipeline. For full technical detail on each step, see the [complete workflow reference](flux_processing_workflow.md).

---

## Workflow Flowchart

```{image} _static/flux_workflow_flowchart.png
:alt: Flux data processing workflow flowchart
:align: center
:width: 100%
```

---

## Pipeline at a Glance

The pipeline converts raw Campbell Scientific `.dat` files into quality-controlled, AmeriFlux-formatted output through five processing steps and three review checkpoints.

| Step | Notebook | Input | Key Operations | Output |
|------|----------|-------|----------------|--------|
| [**1**](flux_processing_workflow.md#step-1--compile-and-preprocess) | `dugout1_compile_and_preprocess` | Raw `.dat` files | Compile files · run `Reformatter.preprocess()` · validate timestamps · subset interval | 4 × `*_preprocessed.parquet` |
| [**2**](flux_processing_workflow.md#step-2--create-raw-dataset) | `dugout2_create_raw_data` | Preprocessed parquets | Merge eddy & met · fix SoilVue time-shift · align met–eddy · standardise columns | `*_raw.parquet` |
| [**3**](flux_processing_workflow.md#step-3--quality-control) | `dugout3_qc_data` | `*_raw.parquet` | Calibration corrections · `Reformatter.finalize()` · manual QC · signal flags | `*_qc.parquet` |
| [**4**](flux_processing_workflow.md#step-4--ameriflux-export) | `dugout4_ameriflux` | `*_qc.parquet` | Signal-strength filter · drop non-AmeriFlux cols · format timestamps · fill −9999 | `*_HH_*.csv` |
| [**5**](flux_processing_workflow.md#step-5--flux-qaqc-with-energy-balance) | `dugout5_fluxqaqc` | `*_HH_*.csv` | Gap-fill NETRAD/G · EBR correction · ET gap-fill (gridMET) · sensitivity tests | Daily ET + HTML reports |

**Review notebooks** (read-only; findings feed corrections back into Step 3):

| Notebook | When to run | Purpose |
|----------|-------------|---------|
| [**3a** · Variable Review](flux_processing_workflow.md#step-3a--variable-review) | After Step 3 | Regression, wind roses, energy-balance closure, sensor intercomparison |
| [**3b** · Plot Review](flux_processing_workflow.md#step-3b--plot-review) | After Step 3 | Quick time-series sweep of every variable |
| [**4b** · AmeriFlux Plot Review](flux_processing_workflow.md#step-4b--ameriflux-plot-review) | After Step 4 | Final visual check before AmeriFlux submission |

---

## Step-by-Step Summary

### Step 1 · Compile & Preprocess

[→ Full details](flux_processing_workflow.md#step-1--compile-and-preprocess)

Four data streams are assembled from the shared drive and run through `micromet.Reformatter.preprocess()`:

- **Met Statistics** (TOA5 format): strips `_Avg`/`_Tot` suffixes and standardises timestamps
- **Met AmeriFlux Statistics**: renames leaf-wetness columns and drops all-NA artefact columns
- **Eddy AmeriFlux Format**: validates against the AmeriFlux master variable list
- **Eddy CS Format**: renames Campbell-specific columns and adds diagnostic variables absent from the AmeriFlux format

All streams are filtered to the target interval (30 or 60 min) via `interval_updates.subset_interval()`.

**Outputs:** `{stationid}_{interval}_{source}_preprocessed.parquet` × 4

---

### Step 2 · Create Raw Dataset

[→ Full details](flux_processing_workflow.md#step-2--create-raw-dataset)

The four preprocessed streams are merged into one coherent dataset:

1. **Eddy merge**: the AmeriFlux format is the primary stream; the CS Format fills gaps and supplies unique columns (`G_PLATE`, `FC_MASS`, `TKE`, `TSTAR`, wind components).
2. **SoilVue time-shift**: SoilVue profile columns (`EC_3_*`, `K_3_*`, `SWC_3_*`, `TS_3_*`) in the AmeriFlux Statistics table are often offset by 30 min. Cross-correlation via `validate.review_lags()` detects the lag, and `shift(freq='30min')` corrects it.
3. **Met–eddy alignment**: `validate.detect_sectional_offsets_indexed()` checks `NETRAD` and `WS` for systematic time offsets between the met and eddy systems; any detected shift is corrected.
4. **Column cleanup**: duplicates are renamed (`FILE_NAME_EDDY`, `FILE_NAME_MET`), derived columns are dropped, `_1_1_1` suffixes are applied, and data before the install date are removed.

**Output:** `{stationid}_{start}_{end}_raw.parquet`

---

### Step 3 · Quality Control

[→ Full details](flux_processing_workflow.md#step-3--quality-control)

This is the largest and most site-specific step. Key operations, in sequence:

1. **Calibration corrections** (date-gated using program-update records)
   - Soil heat flux storage: incorrect layer thickness (0.16 m → 0.05 m) corrected by a factor of 0.3125 (= 0.05 / 0.16)
   - `G_PLATE_2` sign inversion corrected for the affected period
   - Tipping bucket precipitation: calibration factor 0.1 → 0.254 (×2.54)
2. **SoilVue G calculation**: the `soil_heat` library computes `SG_3_1_1` and the conductive flux; `G_3_1_1 = SG_3_1_1 + G_SOILVUE`
3. **MicroMet finalize**: converts SWC fraction → percent, applies physical limits, standardises SSITC encoding, and produces a limit report
4. **Manual corrections**: field-day precipitation cleanup, G_PLATE zeros, SoilVue spikes, wind-direction offsets, pressure/temperature spikes, footprint outliers, and nulling of failed sensors
5. **Signal-strength flags**: `H2O_SIG_FLAG_1_1_1` and `CO2_SIG_FLAG_1_1_1` (0 = good / 1 = marginal / 2 = known bad period); `WD_1_1_1_FLAG` marks the tower obstruction sector

**Output:** `{stationid}_{daterange}_qc.parquet`

---

### Review: 3a & 3b

[→ 3a details](flux_processing_workflow.md#step-3a--variable-review) · [→ 3b details](flux_processing_workflow.md#step-3b--plot-review)

Run these after Step 3 to evaluate data quality. Any issue found here is resolved by adding a new correction block in Notebook 3 and re-running Steps 3–5.

**3a covers:** radiation intercomparison, albedo, wind speed/direction regression, soil heat flux plate vs. SoilVue, temperature sensor agreement, RH/VPD signal-strength stratification, and energy balance closure.

**3b** iterates over all columns and produces an interactive Plotly time series for each.

---

### Step 4 · AmeriFlux Export

[→ Full details](flux_processing_workflow.md#step-4--ameriflux-export)

Converts the QC parquet into an AmeriFlux-ready half-hourly CSV:

- IRGA-derived variables (LE, H2O, CO2, RH, ET) are set to NaN where signal strength < 0.8
- Non-AmeriFlux columns are dropped (internal flags, diagnostic fields, temporal helpers)
- NaN → −9999; timestamps are regenerated in `YYYYMMDDHHmm` format
- The final file retains ~80 variables across the flux, radiation, temperature, humidity, soil, and wind categories

**Output:** `{stationid}_HH_{start}_{end}.csv`

---

### Step 5 · Flux QAQC

[→ Full details](flux_processing_workflow.md#step-5--flux-qaqc-with-energy-balance)

Runs `flux-data-qaqc` to perform Energy Balance Ratio (EBR) correction and ET gap-filling:

- Redundant sensors (`NETRAD_1_1_1` / `NETRAD_1_1_2`; `G_1_1_A` / `G_3_1_1`) are cross-regressed to fill gaps before the data are passed to QAQC
- EBR correction is applied to LE; the corrected ET is gap-filled using ETrF × gridMET ETr
- Sensitivity runs with different Rn and G input combinations produce separate HTML reports for comparison

**Outputs:** EBR-corrected daily ET, HTML diagnostic reports, optional daily CSV

---

## Key Libraries

| Library | Role |
|---------|------|
| [**micromet**](micromet.rst) | Core pipeline: `Reformatter`, `validate`, `merge`, `data_cleaning`, `fix_g_values`, `timestamps`, `columns`, `interval_updates`, `eddy_plots` |
| **soil_heat** | SoilVue-derived ground heat flux (`storage_calculations`, `soil_heat`) |
| **fluxdataqaqc** | EBR correction, ET gap-fill (`Data`, `QaQc`, `Plot`) |
| **pandas / numpy** | Data wrangling and array operations |
| **scipy** | Cross-correlation and linear regression |
| **plotly / bokeh** | Interactive diagnostics and HTML reports |

---

## Directory Structure

```
M:/Shared drives/UGS_Flux/
├── Data_Downloads/compiled/{stationid}/      ← raw .dat source files
│   ├── Statistics/
│   ├── Statistics_Ameriflux/
│   ├── AmeriFluxFormat/
│   └── Flux_CSFormat/
└── Data_Processing/final_database_tables/    ← processed outputs
    ├── raw/          *_raw.parquet
    ├── qc/           *_qc.parquet
    └── ameriflux/    *_HH_*.csv
```

---

## Adapting to Other Sites

Copy the dugout notebooks and update:

1. `stationid`, `interval`: station code and measurement interval
2. Calibration correction dates and factors in Notebook 3
3. Sensor failure date ranges and affected variable lists
4. Wind direction offset between the sonic and Young anemometers
5. Signal-strength bad-period date ranges
6. `csflux_join_cols` in Notebook 2 (site-dependent sensor array)
7. `.ini` config for flux-data-qaqc (Notebook 5)

See the [full workflow document](flux_processing_workflow.md#adapting-the-workflow-to-other-sites) for detailed guidance.
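The Step 2 SoilVue time-shift check is worth repeating at any new site, because the offset seen at Dugout Ranch may not carry over. The sketch below illustrates the idea. `estimate_lag_steps` is a hypothetical helper, not the implementation of `validate.review_lags()`. It scans a small window of integer shifts and keeps the one with the highest Pearson correlation against a reference series. The detected lag is then applied with `shift(freq=...)`, as in the notebook. The synthetic sine series stands in for real data.

```python
import numpy as np
import pandas as pd


def estimate_lag_steps(reference: pd.Series, candidate: pd.Series, max_lag: int = 4) -> int:
    """Return the shift (in samples) that best aligns `candidate` with `reference`.

    Hypothetical helper (not micromet's validate.review_lags()): tries every
    integer lag in [-max_lag, max_lag] and keeps the one with the highest
    Pearson correlation over overlapping, non-NaN samples.
    """
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        shifted = candidate.shift(lag)  # positional shift by `lag` samples
        pair = pd.concat([reference, shifted], axis=1).dropna()
        if len(pair) < 3:
            continue  # not enough overlap to compute a meaningful correlation
        r = pair.iloc[:, 0].corr(pair.iloc[:, 1])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# Synthetic 30-min record with a diurnal cycle (48 samples per day).
idx = pd.date_range("2024-06-01", periods=48 * 5, freq="30min")
ref = pd.Series(np.sin(2 * np.pi * np.arange(len(idx)) / 48), index=idx)

# Simulated SoilVue column whose timestamps run one interval (30 min) early.
soilvue = ref.shift(-1)

lag = estimate_lag_steps(ref, soilvue)
# Move the timestamps forward by lag * 30 min, as in the notebook's shift(freq=...).
corrected = soilvue.shift(freq=f"{30 * lag}min")
```

The search window is kept to a few intervals (±4 here). This keeps the scan cheap and stays well clear of false matches one full diurnal cycle (48 half-hours) away.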
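Item 2 above, the calibration dates and factors in Notebook 3, is easiest to adapt when each date-gated correction is written down as data instead of being scattered through the notebook. The sketch below shows one way to do that. `CalibrationFix` and `apply_fixes` are hypothetical names, not micromet APIs. The column name `P` and the program-update date are placeholders. The ×2.54 factor is the tipping-bucket correction (0.254 / 0.1) described in Step 3.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class CalibrationFix:
    """One date-gated multiplicative correction (hypothetical structure)."""
    column: str
    factor: float
    start: Optional[str] = None  # inclusive; None means from the start of the record
    end: Optional[str] = None    # exclusive; None means through the end of the record


def apply_fixes(df: pd.DataFrame, fixes: list[CalibrationFix]) -> pd.DataFrame:
    """Return a copy of `df` (DatetimeIndex) with each fix applied inside its date window."""
    out = df.copy()
    for fix in fixes:
        mask = np.ones(len(out), dtype=bool)
        if fix.start is not None:
            mask &= out.index >= pd.Timestamp(fix.start)
        if fix.end is not None:
            mask &= out.index < pd.Timestamp(fix.end)
        out.loc[mask, fix.column] = out.loc[mask, fix.column] * fix.factor
    return out


# Placeholder record and date: tips logged before the program update used 0.1 instead of 0.254.
df = pd.DataFrame({"P": [1.0, 1.0, 1.0, 1.0]},
                  index=pd.date_range("2024-01-01", periods=4, freq="D"))
fixes = [CalibrationFix(column="P", factor=0.254 / 0.1, end="2024-01-03")]
fixed = apply_fixes(df, fixes)
```

Each factor multiplies whatever values are already stored. For that reason, a list like this should be applied exactly once, to the `*_raw.parquet` input, and never to an already-corrected `*_qc.parquet`.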
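Items 4 and 5 above involve angular data, and the main pitfall there is wraparound at north. The sketch below assumes a fixed additive offset between the sonic and Young anemometers and a single obstruction sector. It rotates wind direction modulo 360° and flags a sector that may cross 0°, in the spirit of `WD_1_1_1_FLAG`. The 15° offset and the 340°–10° sector are made-up values, and both helpers are hypothetical.

```python
import numpy as np
import pandas as pd


def apply_wd_offset(wd: pd.Series, offset_deg: float) -> pd.Series:
    """Rotate wind direction by a fixed offset and wrap the result into [0, 360)."""
    return (wd + offset_deg) % 360.0  # NaN stays NaN; negatives wrap to e.g. 350


def flag_sector(wd: pd.Series, start_deg: float, end_deg: float) -> pd.Series:
    """Return 1 where wd lies in [start_deg, end_deg], else 0 (NaN gives 0).

    Handles sectors that cross north, e.g. start=340, end=10.
    """
    if start_deg <= end_deg:
        inside = (wd >= start_deg) & (wd <= end_deg)
    else:
        inside = (wd >= start_deg) | (wd <= end_deg)
    return inside.astype(int)


# Hypothetical values: 15 deg sonic-vs-Young offset, tower obstruction from 340 to 10 deg.
wd = pd.Series([350.0, 5.0, 180.0])
rotated = apply_wd_offset(wd, 15.0)
flags = flag_sector(rotated, 340.0, 10.0)
```

Here 350° + 15° wraps to 5°, which falls inside the obstruction sector. A plain `wd + offset` would have produced 365° and been missed by the flag.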
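The Step 4 formatting rules can be sketched in pandas as follows. `TIMESTAMP_START`/`TIMESTAMP_END` columns in `YYYYMMDDHHmm` form are the standard AmeriFlux half-hourly layout. `to_ameriflux_hh` itself is a hypothetical helper, not the Notebook 4 code. It assumes the QC frame is indexed by interval-start timestamps and that a single signal-strength series is passed in separately.

```python
import numpy as np
import pandas as pd


def to_ameriflux_hh(df: pd.DataFrame, sig: pd.Series, irga_cols: list[str],
                    threshold: float = 0.8, freq: str = "30min") -> pd.DataFrame:
    """Mask weak-signal IRGA variables, add AmeriFlux timestamps, fill NaN with -9999.

    Assumes `df` has a DatetimeIndex marking the START of each averaging interval.
    """
    out = df.copy()
    # Null IRGA-derived variables where signal strength is below the threshold
    # (rows with a missing signal value are left untouched by this comparison).
    weak = sig < threshold
    cols = [c for c in irga_cols if c in out.columns]
    out.loc[weak, cols] = np.nan
    # Regenerate timestamps as YYYYMMDDHHmm strings for interval start and end.
    start = out.index
    end = start + pd.Timedelta(freq)
    out.insert(0, "TIMESTAMP_START", start.strftime("%Y%m%d%H%M").to_numpy())
    out.insert(1, "TIMESTAMP_END", end.strftime("%Y%m%d%H%M").to_numpy())
    # Replace all remaining missing values with the AmeriFlux fill value.
    return out.reset_index(drop=True).fillna(-9999)


idx = pd.date_range("2024-06-01 00:00", periods=2, freq="30min")
qc = pd.DataFrame({"LE": [100.0, 200.0], "H": [50.0, np.nan]}, index=idx)
sig = pd.Series([0.9, 0.5], index=idx)  # second interval has a weak signal
hh = to_ameriflux_hh(qc, sig, irga_cols=["LE", "H2O", "CO2", "RH", "ET"])
```

Float columns filled this way hold `-9999.0`, and `to_csv` will write them that way. It is worth checking how the missing-value code renders in the final CSV.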
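The redundant-sensor gap-fill in Step 5 (for example `NETRAD_1_1_1` from `NETRAD_1_1_2`) can be illustrated with an ordinary least-squares line. The line is fitted only on intervals where both sensors report, and it then fills gaps in the primary sensor. This is a minimal sketch under that assumption. The notebook may use a different regression form or fit per period, and `fill_from_redundant` and `min_overlap` are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_from_redundant(primary: pd.Series, backup: pd.Series,
                        min_overlap: int = 10) -> pd.Series:
    """Fill NaNs in `primary` from a linear fit against a co-located `backup` sensor."""
    both = pd.concat([primary, backup], axis=1, keys=["primary", "backup"]).dropna()
    if len(both) < min_overlap:
        return primary.copy()  # too little overlap to trust a regression
    slope, intercept = np.polyfit(both["backup"], both["primary"], deg=1)
    predicted = slope * backup + intercept
    # Only gaps in `primary` are replaced; existing measurements are kept as-is.
    return primary.fillna(predicted)


idx = pd.date_range("2024-06-01", periods=12, freq="30min")
backup = pd.Series(np.arange(12, dtype=float), index=idx)  # stands in for NETRAD_1_1_2
primary = 2.0 * backup + 1.0                               # stands in for NETRAD_1_1_1
primary.iloc[[3, 7]] = np.nan                              # simulated gaps
filled = fill_from_redundant(primary, backup)
```

Fitting only on overlapping intervals keeps the backup sensor's gaps from biasing the line. Using `fillna` guarantees that measured primary values are never overwritten.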