Flux Data Processing – Workflow Summary

Utah Geological Survey · Flux Monitoring Network
Station reference implementation: US-UTD (Dugout Ranch)

This page provides a high-level overview of the end-to-end eddy covariance data processing pipeline. For full technical detail on each step, see the complete workflow reference.


Workflow Flowchart

[Figure: flux data processing workflow flowchart]

Pipeline at a Glance

The pipeline converts raw Campbell Scientific .dat files into quality-controlled, AmeriFlux-formatted output through five processing steps and three review checkpoints.

| Step | Notebook | Input | Key Operations | Output |
|------|----------|-------|----------------|--------|
| 1 | dugout1_compile_and_preprocess | Raw .dat files | Compile files · run Reformatter.preprocess() · validate timestamps · subset interval | 4 × *_preprocessed.parquet |
| 2 | dugout2_create_raw_data | Preprocessed parquets | Merge eddy & met · fix SoilVue time-shift · align met–eddy · standardise columns | *_raw.parquet |
| 3 | dugout3_qc_data | *_raw.parquet | Calibration corrections · Reformatter.finalize() · manual QC · signal flags | *_qc.parquet |
| 4 | dugout4_ameriflux | *_qc.parquet | Signal-strength filter · drop non-AmeriFlux cols · format timestamps · fill −9999 | *_HH_*.csv |
| 5 | dugout5_fluxqaqc | *_HH_*.csv | Gap-fill NETRAD/G · EBR correction · ET gap-fill (gridMET) · sensitivity tests | Daily ET + HTML reports |

Review notebooks (read-only — findings feed corrections back into Step 3):

| Notebook | When to run | Purpose |
|----------|-------------|---------|
| 3a · Variable Review | After Step 3 | Regression, wind roses, energy-balance closure, sensor intercomparison |
| 3b · Plot Review | After Step 3 | Quick time-series sweep of every variable |
| 4b · AmeriFlux Plot Review | After Step 4 | Final visual check before AmeriFlux submission |


Step-by-Step Summary

Step 1 · Compile & Preprocess

Four data streams are assembled from the shared drive and run through micromet.Reformatter.preprocess():

  • Met Statistics (TOA5 format) — strips _Avg/_Tot suffixes, standardises timestamps

  • Met AmeriFlux Statistics — renames leaf-wetness columns, drops all-NA artefact columns

  • Eddy AmeriFlux Format — validates against AmeriFlux master variable list

  • Eddy CS Format — renames Campbell-specific columns, adds diagnostic variables absent from AmeriFlux format

All streams are filtered to the target interval (30 or 60 min) via interval_updates.subset_interval().

Outputs: {stationid}_{interval}_{source}_preprocessed.parquet × 4
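Two of the operations above, suffix stripping and interval subsetting, can be sketched in plain pandas. This is an illustration only and does not call micromet: strip_stat_suffixes and keep_interval are hypothetical helpers, the column names are made up, and the real Reformatter.preprocess() and interval_updates.subset_interval() may behave differently.

```python
import re
import pandas as pd

def strip_stat_suffixes(df: pd.DataFrame) -> pd.DataFrame:
    """Remove a trailing _Avg or _Tot from every column name (hypothetical helper)."""
    return df.rename(columns=lambda c: re.sub(r"_(Avg|Tot)$", "", c))

def keep_interval(df: pd.DataFrame, minutes: int = 30) -> pd.DataFrame:
    """Keep only rows whose timestamp falls exactly on the target interval."""
    idx = df.index
    on_grid = idx == idx.floor(f"{minutes}min")
    return df[on_grid]

# Toy logger table with one off-interval record at 00:15.
raw = pd.DataFrame(
    {"TA_1_1_1_Avg": [20.1, 20.3, 20.2, 20.0], "P_Tot": [0.0, 0.0, 0.254, 0.0]},
    index=pd.to_datetime(
        ["2024-06-01 00:00", "2024-06-01 00:15", "2024-06-01 00:30", "2024-06-01 01:00"]
    ),
)
clean = keep_interval(strip_stat_suffixes(raw), minutes=30)
print(clean)
```

The 00:15 record is dropped and the remaining columns are named TA_1_1_1 and P.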


Step 2 · Create Raw Dataset

The four preprocessed streams are merged into one coherent dataset:

  1. Eddy merge: AmeriFlux format is the primary stream; CS Format fills gaps and supplies unique columns (G_PLATE, FC_MASS, TKE, TSTAR, wind components).

  2. SoilVue time-shift: SoilVue profile columns (EC_3_*, K_3_*, SWC_3_*, TS_3_*) in the AmeriFlux Statistics table are often offset by 30 min. Cross-correlation via validate.review_lags() detects the lag; shift(freq='30min') corrects it.

  3. Met–eddy alignment: validate.detect_sectional_offsets_indexed() checks NETRAD and WS for systematic time offsets between the two systems; any detected shift is corrected.

  4. Column cleanup: duplicates renamed (FILE_NAME_EDDY, FILE_NAME_MET); derived columns dropped; _1_1_1 suffixes applied; data before install date removed.
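The lag detection and correction in item 2 can be illustrated with a small cross-correlation search. This is a sketch under stated assumptions, not the implementation of validate.review_lags(): best_lag is a hypothetical helper, and the synthetic sine series stands in for a trusted reference and a SoilVue column recorded one record late.

```python
import numpy as np
import pandas as pd

def best_lag(reference: pd.Series, candidate: pd.Series, max_lag: int = 4) -> int:
    """Return the record shift of `candidate` that maximises its correlation with `reference`."""
    scores = {
        lag: reference.corr(candidate.shift(lag))
        for lag in range(-max_lag, max_lag + 1)
    }
    return max(scores, key=scores.get)

# Two days of half-hourly data with a diurnal cycle (48 records per day).
idx = pd.date_range("2024-06-01", periods=96, freq="30min")
reference = pd.Series(np.sin(2 * np.pi * np.arange(96) / 48), index=idx)
candidate = reference.shift(1)  # same signal, stamped 30 min late

lag = best_lag(reference, candidate)            # records to shift the candidate by
corrected = candidate.shift(freq=pd.Timedelta(minutes=30) * lag)
print(lag)
```

Shifting with freq moves the timestamps rather than the values, so no record is lost at the ends of the series; the sign of the detected lag decides the direction of the shift.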

Output: {stationid}_{start}_{end}_raw.parquet
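The primary/secondary behaviour described for the eddy merge matches what pandas' combine_first does: gaps in the primary table are filled from the secondary one, and columns that exist only in the secondary table are added. The sketch below uses toy values and is an assumption about how such a merge could be written, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-06-01", periods=3, freq="30min")

# Primary stream (AmeriFlux format) with gaps.
ameriflux = pd.DataFrame(
    {"H": [50.0, np.nan, 42.0], "LE": [80.0, 75.0, np.nan]}, index=idx
)
# Secondary stream (CS Format) with an overlapping column and a unique one.
cs_format = pd.DataFrame(
    {"H": [50.0, 47.0, 42.0], "TKE": [0.4, 0.5, 0.3]}, index=idx
)

merged = ameriflux.combine_first(cs_format)
print(merged)
```

The missing H value is taken from the CS Format table, TKE is carried over as a new column, and the LE gap stays NaN because neither stream has a value for it.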


Step 3 · Quality Control

The largest and most site-specific step. Key operations in sequence:

  1. Calibration corrections (date-gated using program-update records)

    • Soil heat flux storage: incorrect layer thickness (0.16 m → 0.05 m) corrected by factor 0.3125

    • G_PLATE_2 sign inversion corrected for affected period

    • Tipping bucket precipitation: calibration factor 0.1 → 0.254 (×2.54)

  2. SoilVue G calculation: the soil_heat library computes SG_3_1_1 and the conductive flux; G_3_1_1 = SG_3_1_1 + G_SOILVUE

  3. MicroMet finalize: converts SWC fraction → percent; applies physical limits; standardises SSITC encoding; produces limit report

  4. Manual corrections: field-day precip cleanup, G_PLATE zeros, SoilVue spikes, wind-direction offsets, pressure/temperature spikes, footprint outliers, failed sensor nulling

  5. Signal-strength flags: H2O_SIG_FLAG_1_1_1 and CO2_SIG_FLAG_1_1_1 (0 = good / 1 = marginal / 2 = known bad period); WD_1_1_1_FLAG for tower obstruction sector
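A date-gated calibration correction of the kind listed in item 1 could look like the following sketch. apply_gated_factor is a hypothetical helper, the column names P_1_1_1 and SG_1_1_1 and the program-update date are placeholders, and only the two factors (0.254 / 0.1 = 2.54 and 0.05 / 0.16 = 0.3125) come from the text above.

```python
import pandas as pd

def apply_gated_factor(df, column, factor, start=None, end=None):
    """Multiply `column` by `factor` for rows with start <= timestamp < end."""
    out = df.copy()
    mask = pd.Series(True, index=out.index)
    if start is not None:
        mask &= out.index >= pd.Timestamp(start)
    if end is not None:
        mask &= out.index < pd.Timestamp(end)
    out.loc[mask, column] = out.loc[mask, column] * factor
    return out

idx = pd.date_range("2024-06-01", periods=4, freq="D")
df = pd.DataFrame(
    {"P_1_1_1": [0.1, 0.2, 0.254, 0.508], "SG_1_1_1": [16.0, 8.0, 5.0, 2.5]},
    index=idx,
)

fix_date = "2024-06-03"  # hypothetical date the logger program was corrected
df = apply_gated_factor(df, "P_1_1_1", 0.254 / 0.1, end=fix_date)    # tipping bucket
df = apply_gated_factor(df, "SG_1_1_1", 0.05 / 0.16, end=fix_date)   # layer thickness
print(df)
```

Only records before the update date are rescaled; records logged after the program fix are left untouched, which is why the gate dates need to come from the program-update records.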

Output: {stationid}_{daterange}_qc.parquet
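The unit conversion, physical-limit screening, and signal-strength flagging described above can be sketched as follows. Nothing here is taken from Reformatter.finalize(): the limit values, the 0.8 threshold used for the "marginal" class, the signal-strength series, and both helper functions are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical physical limits; real values belong to the pipeline's configuration.
LIMITS = {"TA_1_1_1": (-50.0, 60.0), "SWC_3_1_1": (0.0, 100.0)}

def apply_limits(df, limits):
    """Set out-of-range values to NaN and count the removals per column."""
    out = df.copy()
    report = {}
    for col, (low, high) in limits.items():
        bad = (out[col] < low) | (out[col] > high)
        report[col] = int(bad.sum())
        out.loc[bad, col] = np.nan
    return out, report

def signal_flag(signal, marginal_below, bad_periods=()):
    """0 = good, 1 = marginal signal, 2 = inside a known bad period."""
    flag = pd.Series(0, index=signal.index, dtype=int)
    flag[signal < marginal_below] = 1
    for start, end in bad_periods:
        flag.loc[pd.Timestamp(start):pd.Timestamp(end)] = 2
    return flag

idx = pd.date_range("2024-06-01 00:30", periods=4, freq="30min")
df = pd.DataFrame(
    {"TA_1_1_1": [21.0, 999.0, 22.0, 21.5], "SWC_3_1_1": [0.12, 0.13, 0.13, 0.14]},
    index=idx,
)
df["SWC_3_1_1"] = df["SWC_3_1_1"] * 100.0   # fraction -> percent
df, limit_report = apply_limits(df, LIMITS)

h2o_signal = pd.Series([0.95, 0.70, 0.92, 0.90], index=idx)  # hypothetical signal strength
df["H2O_SIG_FLAG_1_1_1"] = signal_flag(
    h2o_signal,
    marginal_below=0.8,
    bad_periods=[("2024-06-01 02:00", "2024-06-01 02:00")],
)
print(limit_report)
```

Returning the per-column removal counts alongside the cleaned table mirrors the idea of a limit report: it makes it easy to spot a sensor whose values are being clipped far more often than expected.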


Review: 3a & 3b

Run after Step 3 to evaluate data quality. Any issues found here are resolved by adding new correction blocks in Notebook 3 and re-running Steps 3–5.

3a covers: radiation intercomparison, albedo, wind speed/direction regression, soil heat flux plate vs. SoilVue, temperature sensor agreement, RH/VPD signal-strength stratification, energy balance closure.

3b iterates over all columns and produces an interactive Plotly time-series plot for each.
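As an example of the checks in 3a, energy balance closure is commonly summarised by the ratio of turbulent fluxes (H + LE) to available energy (NETRAD − G) and by the slope of a regression between the two. The sketch below assumes unsuffixed column names and toy values; the review notebook's own variables and plotting are not reproduced here.

```python
import numpy as np
import pandas as pd

def energy_balance_closure(df):
    """Return (energy balance ratio, regression slope, regression intercept)."""
    valid = df[["H", "LE", "NETRAD", "G"]].dropna()
    turbulent = valid["H"] + valid["LE"]
    available = valid["NETRAD"] - valid["G"]
    ebr = turbulent.sum() / available.sum()
    slope, intercept = np.polyfit(available, turbulent, 1)
    return ebr, slope, intercept

df = pd.DataFrame(
    {
        "NETRAD": [110.0, 220.0, 330.0, 440.0, 500.0],
        "G": [10.0, 20.0, 30.0, 40.0, 45.0],
        "H": [30.0, 60.0, 90.0, 120.0, np.nan],   # last record is incomplete
        "LE": [50.0, 100.0, 150.0, 200.0, 210.0],
    }
)
ebr, slope, intercept = energy_balance_closure(df)
print(round(ebr, 3), round(slope, 3))
```

Records with any missing component are dropped before the ratio is computed, so a gap in one flux does not bias the closure estimate.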


Step 4 · AmeriFlux Export

Converts the QC parquet into an AmeriFlux-ready half-hourly CSV:

  • IRGA-derived variables (LE, H2O, CO2, RH, ET) set to NaN where signal strength < 0.8

  • Non-AmeriFlux columns dropped (internal flags, diagnostic fields, temporal helpers)

  • NaN → −9999; timestamps regenerated in YYYYMMDDHHmm format

  • Final file retains ~80 variables across flux, radiation, temperature, humidity, soil, and wind categories

Output: {stationid}_HH_{start}_{end}.csv
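The export operations above can be sketched in a few lines of pandas. TIMESTAMP_START and TIMESTAMP_END are the standard AmeriFlux timestamp columns; everything else is an assumption made for the example: to_ameriflux is a hypothetical helper, H2O_SIG is a placeholder signal-strength column, and the index is assumed to mark the end of each averaging period.

```python
import numpy as np
import pandas as pd

def to_ameriflux(df, signal_col, irga_cols, keep_cols, interval_min=30, threshold=0.8):
    """Mask weak-signal IRGA values, keep selected columns, add timestamps, fill -9999."""
    out = df.copy()
    weak = out[signal_col] < threshold
    out.loc[weak, irga_cols] = np.nan
    out = out[keep_cols].copy()
    # Assumption: the index holds the END of each averaging period.
    start = out.index - pd.Timedelta(minutes=interval_min)
    out.insert(0, "TIMESTAMP_START", list(start.strftime("%Y%m%d%H%M")))
    out.insert(1, "TIMESTAMP_END", list(out.index.strftime("%Y%m%d%H%M")))
    return out.fillna(-9999).reset_index(drop=True)

idx = pd.to_datetime(["2024-06-01 00:30", "2024-06-01 01:00"])
qc = pd.DataFrame(
    {
        "LE": [80.0, 75.0],
        "H": [50.0, 47.0],
        "H2O_SIG": [0.9, 0.5],             # hypothetical signal-strength column
        "H2O_SIG_FLAG_1_1_1": [0, 1],      # internal flag, dropped on export
    },
    index=idx,
)
hh = to_ameriflux(qc, signal_col="H2O_SIG", irga_cols=["LE"], keep_cols=["LE", "H"])
hh.to_csv("example_HH.csv", index=False)   # hypothetical output file name
```

The mask is applied before the column selection so that the signal-strength column can drive the filter and still be excluded from the exported file.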


Step 5 · Flux QAQC

Runs flux-data-qaqc to perform Energy Balance Ratio (EBR) correction and ET gap-filling:

  • Redundant sensors (NETRAD_1_1_1 / NETRAD_1_1_2; G_1_1_A / G_3_1_1) are cross-regressed to fill gaps before passing to QAQC

  • EBR correction applied to LE; corrected ET gap-filled using ETrF × gridMET ETr

  • Sensitivity runs with different Rn and G input combinations produce separate HTML reports for comparison
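The cross-regression gap fill of redundant sensors in the first bullet can be illustrated with a simple linear fit. This is a sketch, not the notebook's code: fill_from_redundant is a hypothetical helper and the values are synthetic stand-ins for a pair such as NETRAD_1_1_1 and NETRAD_1_1_2.

```python
import numpy as np
import pandas as pd

def fill_from_redundant(target: pd.Series, helper: pd.Series) -> pd.Series:
    """Fill gaps in `target` with a linear regression on `helper`."""
    both = pd.concat([target, helper], axis=1, keys=["y", "x"]).dropna()
    slope, intercept = np.polyfit(both["x"], both["y"], 1)
    predicted = slope * helper + intercept
    return target.fillna(predicted)

helper = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0])  # e.g. the second radiometer
target = 1.02 * helper + 3.0                              # e.g. the primary radiometer
target.iloc[2] = np.nan                                   # simulated gap

filled = fill_from_redundant(target, helper)
print(filled.iloc[2])
```

Measured values are never overwritten; only records that are missing in the target and present in the helper receive a regression estimate.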

Outputs: EBR-corrected daily ET, HTML diagnostic reports, optional daily CSV
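The idea behind ET gap-filling with ETrF × ETr can be shown with a simplified sketch: compute the fraction of reference ET on days with data, interpolate that fraction across gaps, and multiply it by reference ET. This is not the flux-data-qaqc implementation, whose procedure may differ; gap_fill_et is a hypothetical helper and the daily values are invented.

```python
import numpy as np
import pandas as pd

def gap_fill_et(et: pd.Series, etr: pd.Series) -> pd.Series:
    """Fill missing daily ET using an interpolated ETrF (= ET / ETr) times ETr."""
    etrf = (et / etr).interpolate(limit_direction="both")
    return et.fillna(etrf * etr)

days = pd.date_range("2024-06-01", periods=4, freq="D")
etr = pd.Series([8.0, 7.5, 8.2, 7.9], index=days)        # reference ET, e.g. from gridMET
et = pd.Series([4.0, np.nan, np.nan, 3.95], index=days)  # corrected ET with a two-day gap

et_filled = gap_fill_et(et, etr)
print(et_filled)
```

Interpolating the ratio rather than ET itself lets the filled values follow day-to-day changes in atmospheric demand.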


Key Libraries

| Library | Role |
|---------|------|
| micromet | Core pipeline: Reformatter, validate, merge, data_cleaning, fix_g_values, timestamps, columns, interval_updates, eddy_plots |
| soil_heat | SoilVue-derived ground heat flux (storage_calculations, soil_heat) |
| fluxdataqaqc | EBR correction, ET gap-fill (Data, QaQc, Plot) |
| pandas / numpy | Data wrangling and array operations |
| scipy | Cross-correlation and linear regression |
| plotly / bokeh | Interactive diagnostics and HTML reports |


Directory Structure

M:/Shared drives/UGS_Flux/
├── Data_Downloads/compiled/{stationid}/     ← raw .dat source files
│   ├── Statistics/
│   ├── Statistics_Ameriflux/
│   ├── AmeriFluxFormat/
│   └── Flux_CSFormat/
└── Data_Processing/final_database_tables/   ← processed outputs
    ├── raw/          *_raw.parquet
    ├── qc/           *_qc.parquet
    └── ameriflux/    *_HH_*.csv
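The output locations in the tree above, combined with the file-name patterns from Steps 2 to 4, could be assembled with a helper like the one below. It is a sketch only: output_paths is hypothetical, the YYYYMMDD date strings and the use of "US-UTD" as the station id in file names are assumptions, and the {daterange} part of the QC file name is assumed to be start_end.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("M:/Shared drives/UGS_Flux")

def output_paths(stationid: str, start: str, end: str) -> dict:
    """Build the raw, QC, and AmeriFlux output paths for one station and period."""
    base = ROOT / "Data_Processing" / "final_database_tables"
    return {
        "raw": base / "raw" / f"{stationid}_{start}_{end}_raw.parquet",
        "qc": base / "qc" / f"{stationid}_{start}_{end}_qc.parquet",
        "ameriflux": base / "ameriflux" / f"{stationid}_HH_{start}_{end}.csv",
    }

paths = output_paths("US-UTD", "20230101", "20231231")
for name, path in paths.items():
    print(name, path)
```

Keeping the path logic in one place means a notebook copied for another site only needs a different station id and date range.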

Adapting to Other Sites

Copy the dugout notebooks and update:

  1. stationid, interval — station code and measurement interval

  2. Calibration correction dates and factors in Notebook 3

  3. Sensor failure date ranges and affected variable lists

  4. Wind direction offset between sonic and Young anemometer

  5. Signal-strength bad-period date ranges

  6. csflux_join_cols in Notebook 2 (site-dependent sensor array)

  7. .ini config for flux-data-qaqc (Notebook 5)

See the full workflow document for detailed guidance.