Eddy Covariance Flux Data Processing Workflow
Utah Geological Survey – Flux Monitoring Network
Overview
This document describes the end-to-end data processing workflow used by the Utah Geological Survey (UGS) Flux Monitoring Network to convert raw eddy covariance and meteorological data into quality-controlled, AmeriFlux-formatted output. The workflow is implemented as a series of numbered Jupyter Notebooks in the docs/notebooks/ directory, each handling a distinct processing stage.
The pipeline progresses linearly through five major stages with review notebooks interspersed for visual quality assessment. Each notebook reads from the output of the previous stage and writes intermediate or final products as Parquet or CSV files.
Pipeline at a Glance
| Step | Notebook | Purpose | Output |
|---|---|---|---|
| 1 | 1_compile_and_preprocess.ipynb | Create compiled and clean versions of data from each data stream (e.g., CSFlux web/datalogger, AmeriFlux eddy web/datalogger, MetStats, MetAF) | `*_preprocessed.parquet` |
| 2 | 2_create_raw_data.ipynb | Assemble multiple preprocessed files into final datasets and manage any datetime shifts | `*_raw.parquet` |
| 3 | 3_qc_data.ipynb | Apply calibrations, physical limits, and manual QC | `*_qc.parquet` |
| *3a* | *3a_variable_review.ipynb* | *Summary statistics and distribution analysis of QC data* | *Diagnostic output* |
| *3b* | *3b_plot_review.ipynb* | *Interactive time-series plots of every variable* | *Visual review only* |
| 4 | 4_ameriflux.ipynb | Signal-strength filter, drop non-AmeriFlux columns, format and export | `*_HH_*.csv` |
| *4b* | *4b_ameriflux_plots.ipynb* | *Plot every variable in the final AmeriFlux file* | *Visual review only* |
| 5 | 5_fluxqaqc.ipynb | Energy balance closure analysis and ET gap-filling | Daily ET + HTML reports |
Italicized rows are review-only notebooks that do not modify data. Issues found during review should be corrected in the appropriate upstream notebook (primarily Notebook 3).
Prerequisites
Software and Libraries
Python 3.x with pandas, numpy, scipy, plotly, bokeh
micromet – core UGS processing library (Reformatter, validate, merge, data_cleaning, columns, timestamps, fix_g_values, eddy_plots, interval_updates)
soil_heat – SoilVue-derived ground heat flux calculations
fluxdataqaqc – energy balance ratio correction and ET gap-filling
Supporting: prettytable, requests
Data Sources
Preprocessed Parquet files from the compilation stage (CSFlux, AmeriFlux eddy, MetStats, MetAF – both web and datalogger variants)
AmeriFlux variable naming reference CSV (flux-met_processing_variables_*.csv)
UGS database API providing station metadata, visit notes, and program update history
Directory Structure
M:/Shared drives/UGS_Flux/
├── Data_Downloads/compiled/
│ ├── preprocessed_site_data/ ← preprocessed parquets per source
│ └── {stationid}/ ← raw .dat files by station
└── Data_Processing/final_database_tables/
├── raw/ *_raw.parquet
├── qc/ *_qc.parquet
└── ameriflux/ *_HH_*.csv
Step 1 – Compile and Preprocess
Notebook: 1_compile_and_preprocess.ipynb
Create compiled and clean versions of data from each data stream (e.g., CSFlux web/datalogger, AmeriFlux eddy web/datalogger, MetStats, MetAF)
Key Operations
Organize datalogger files into a single directory by table name (e.g., Statistics_Ameriflux, Statistics, Flux_AmerifluxFormat, Flux_CSFormat)
Compile data from a single data stream into a dataframe
Clean data by applying the renaming dictionary, setting data types, and fixing timestamp issues
Subset the data to include only 30- or 60-minute records, depending on user input
Present data for review to identify misnamed columns and missing data
Export data into separate parquet files for data from each data stream
Configuration
station – AmeriFlux station ID
interval – measurement interval in minutes (30 or 60)
Paths to datalogger files and files from EasyFlux Web
Output
{station}_{interval}_{datatype}_preprocessed.parquet in preprocessed_site_data/
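The interval subsetting and output naming described above can be sketched with pandas as follows. This is a minimal illustration and not the micromet implementation: the function name `subset_to_interval`, the sample column, and the station ID `US-XXX` are hypothetical, and the sketch assumes the dataframe is indexed by timestamp.

```python
import pandas as pd

def subset_to_interval(df: pd.DataFrame, interval: int) -> pd.DataFrame:
    """Keep only rows whose timestamp falls on an `interval`-minute boundary."""
    idx = df.index
    on_boundary = (idx.minute % interval == 0) & (idx.second == 0)
    out = df[on_boundary]
    # Drop repeated timestamps (keeping the first record) and sort by time
    return out[~out.index.duplicated(keep="first")].sort_index()

# Hypothetical 15-minute records used only to exercise the function
times = pd.date_range("2024-06-01 00:00", periods=8, freq="15min")
raw = pd.DataFrame({"TA_1_1_1": range(8)}, index=times)

half_hourly = subset_to_interval(raw, 30)
hourly = subset_to_interval(raw, 60)

# Output name following the pattern in this section (placeholder values)
station, interval, datatype = "US-XXX", 30, "metstats"
out_name = f"{station}_{interval}_{datatype}_preprocessed.parquet"
```

In the notebook, the subset dataframe would then be written with `DataFrame.to_parquet` into preprocessed_site_data/.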
Step 2 – Create Raw Dataset
Notebook: 2_create_raw_data.ipynb
Assemble multiple preprocessed files into final datasets and manage any datetime shifts
Load preprocessed parquets for each data stream (CSFlux web/datalogger, AmeriFlux eddy web/datalogger, MetStats, MetAF)
Compare and merge eddy data – CSFlux and AmeriFlux eddy streams are compared for differences; the AmeriFlux stream is primary, with CSFlux filling gaps and providing unique columns (e.g., G_PLATE, diagnostic fields)
Compare and merge met data – MetStats and MetAF streams are compared and combined
Detect and correct temporal shifts – SoilVue sensor data (EC_3_*, K_3_*, SWC_3_*, TS_3_*) may be offset by one time step; cross-correlation detects the lag and a frequency shift corrects it. Historical timestamp misalignments are also identified and corrected.
Combine eddy and met – merge the two streams, resolve duplicate columns, validate 30-minute interval integrity
Standardize column naming – apply AmeriFlux positional suffixes (_1_1_1, _1_1_2, etc.)
Trim to station record – drop data before the station install date (retrieved from the database API)
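The lag detection and frequency shift can be illustrated with a small pandas sketch. It is not the micromet routine: `detect_lag` is a hypothetical helper that tries a handful of integer shifts and keeps the one with the highest Pearson correlation, and the series here is synthetic.

```python
import numpy as np
import pandas as pd

def detect_lag(reference: pd.Series, candidate: pd.Series, max_lag: int = 4) -> int:
    """Return the shift (in time steps) of `candidate` that best matches `reference`."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        corr = reference.corr(candidate.shift(lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# Synthetic half-hourly series; `delayed` lags the reference by one time step
idx = pd.date_range("2024-06-01", periods=200, freq="30min")
rng = np.random.default_rng(0)
reference = pd.Series(rng.normal(size=200).cumsum(), index=idx)
delayed = reference.shift(1)

lag = detect_lag(reference, delayed)

# Frequency shift: move the timestamps themselves, then realign to the grid
corrected = delayed.shift(lag, freq="30min").reindex(idx)
```

Shifting with `freq` moves the index rather than the values, so the corrected series keeps every measurement and only the final time step is left empty after realignment.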
Step 3 – Quality Control
Notebook: 3_qc_data.ipynb
The largest and most site-specific step. Applies corrections, physical limits, and quality flags to produce a QC-level dataset.
Key Operations
Retrieve station metadata – query the database API for station visit notes and program update history to inform date-gated corrections
Apply calibration corrections – site-specific fixes applied only to data recorded before the program update date, to avoid double-correction:
Soil heat flux storage (SG) thickness correction
Precipitation tipping bucket calibration factor
NR01 calibration factors
Rename G variables to calculate surface G and create G_1 for AmeriFlux
Calculate SoilVue-derived ground heat flux – use the soil_heat library (Johansen thermal properties model) to compute G_SURFACE_3_1_1 from SoilVue temperature and moisture profiles
Apply physical limits via Reformatter.finalize():
Converts SWC from fraction to percent
Applies range limits by variable type (out-of-range values set to NaN)
Standardizes SSITC encoding
Produces a limit report CSV for review
Manual corrections – address site-specific data issues:
Spurious precipitation on station visit days
G_PLATE zeros (sensor disconnection)
Despike data
Sensor failure
Address any wind direction offsets
Signal-strength flagging:
H2O_SIG_FLAG and CO2_SIG_FLAG: 0 = good (signal >= 0.8), 1 = marginal (< 0.8), 2 = known bad period
WD_1_1_1_FLAG: flags wind from behind the tower or obstruction sectors
Gap-fill ground heat flux – calculate G_1 as the average of the heat flux plates and gap-fill with linear regression
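Two of the operations above, range limits and signal-strength flagging, can be sketched in plain pandas. This does not reproduce `Reformatter.finalize()`: the limit values, the function names, and the signal column name `H2O_SIG_STRGTH_MIN` are assumptions made for illustration. Only the 0/1/2 flag meanings and the 0.8 threshold come from this document.

```python
import numpy as np
import pandas as pd

# Hypothetical limits for illustration only
LIMITS = {"TA_1_1_1": (-50.0, 50.0), "RH_1_1_1": (0.0, 100.0)}

def apply_limits(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Set values outside [low, high] to NaN for each listed column."""
    out = df.copy()
    for col, (low, high) in limits.items():
        if col in out.columns:
            out[col] = out[col].where(out[col].between(low, high))
    return out

def signal_flag(signal: pd.Series, bad_periods=(), threshold: float = 0.8) -> pd.Series:
    """0 = good (>= threshold), 1 = marginal, 2 = known bad period."""
    # A missing signal value fails the comparison and is flagged 1 in this sketch
    flag = pd.Series(np.where(signal >= threshold, 0, 1), index=signal.index)
    for start, end in bad_periods:
        flag.loc[start:end] = 2
    return flag

idx = pd.date_range("2024-06-01", periods=4, freq="30min")
df = pd.DataFrame(
    {"TA_1_1_1": [20.0, 75.0, -10.0, 15.0],
     "H2O_SIG_STRGTH_MIN": [0.9, 0.7, 0.85, 0.95]},
    index=idx,
)
clean = apply_limits(df, LIMITS)
bad = [(pd.Timestamp("2024-06-01 01:30"), pd.Timestamp("2024-06-01 01:30"))]
clean["H2O_SIG_FLAG"] = signal_flag(clean["H2O_SIG_STRGTH_MIN"], bad_periods=bad)
```

Keeping the flag as a separate column leaves the underlying values intact at the QC level; the actual removal of weak-signal data happens later, in Notebook 4.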
Output
{station}_{daterange}_qc.parquet in final_database_tables/qc/
{station}_{daterange}_report.csv – finalization report showing flagged percentages per variable
Step 3a – Variable Review
Notebook: 3a_variable_review.ipynb
Read-only review of the QC dataset. Generates time-series plots and scatter plots between redundant sensors for visual evaluation of data quality. Focuses primarily on net radiation and its components, wind speed and direction, soil heat flux and its component variables, temperature, and relative humidity. It also examines energy balance closure and the relationship between closure and signal strength. Any issues found should be corrected by adding blocks to Notebook 3.
Step 3b – Plot Review
Notebook: 3b_plot_review.ipynb
Iterates over all columns in the QC (or raw) dataset and generates an interactive Plotly time-series plot for each. Provides a rapid visual sweep for remaining spikes, gaps, step changes, or artifacts. Can be run at either the raw or qc level by changing the level parameter.
Step 4 – AmeriFlux Export
Notebook: 4_ameriflux.ipynb
Converts the QC dataset into an AmeriFlux-compliant half-hourly CSV file.
Key Operations
Signal-strength filtering – IRGA-derived variables set to NaN where signal strength < 0.8:
H2O signal: H2O, H2O_SIGMA, LE, RH, VPD, ET
CO2 signal: CO2, CO2_SIGMA, FC
Column cleanup – drop all-NaN columns, remove non-AmeriFlux variables (validated against the master variable list), drop internal flags and diagnostic fields
Format for submission – replace NaN with -9999, recalculate TIMESTAMP_START and TIMESTAMP_END in YYYYMMDDHHmm format
Output
{station}_HH_{timestamp_start}_{timestamp_end}.csv – ready for AmeriFlux upload
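The export steps can be sketched as a single pandas function. This is an illustration under stated assumptions, not the notebook's code: the signal column names `H2O_SIG` and `CO2_SIG`, the function name `to_ameriflux`, and the station ID `US-XXX` are hypothetical, variables are matched by exact name without positional suffixes, the index is assumed to mark the start of each averaging period, and validation against the master variable list is omitted.

```python
import numpy as np
import pandas as pd

H2O_VARS = ["H2O", "H2O_SIGMA", "LE", "RH", "VPD", "ET"]
CO2_VARS = ["CO2", "CO2_SIGMA", "FC"]

def to_ameriflux(qc: pd.DataFrame, interval: int = 30, threshold: float = 0.8) -> pd.DataFrame:
    out = qc.copy()
    # Null IRGA-derived variables where the matching signal strength is weak
    for sig_col, variables in (("H2O_SIG", H2O_VARS), ("CO2_SIG", CO2_VARS)):
        if sig_col not in out.columns:
            continue
        weak = out[sig_col] < threshold
        cols = [c for c in out.columns if c in variables]
        out.loc[weak, cols] = np.nan
        out = out.drop(columns=sig_col)  # internal diagnostic, not exported
    # Drop columns with no data, then apply the missing-value code
    out = out.dropna(axis=1, how="all").fillna(-9999)
    start = out.index
    end = start + pd.Timedelta(minutes=interval)
    out.insert(0, "TIMESTAMP_START", start.strftime("%Y%m%d%H%M"))
    out.insert(1, "TIMESTAMP_END", end.strftime("%Y%m%d%H%M"))
    return out.reset_index(drop=True)

idx = pd.date_range("2024-06-01 23:00", periods=3, freq="30min")
qc = pd.DataFrame(
    {"LE": [100.0, 120.0, 90.0], "FC": [-2.0, -3.0, -1.0],
     "TA": [20.0, 19.0, np.nan],
     "H2O_SIG": [0.9, 0.5, 0.95], "CO2_SIG": [0.9, 0.9, 0.9]},
    index=idx,
)
af = to_ameriflux(qc)
station = "US-XXX"  # placeholder station ID
file_name = f"{station}_HH_{af['TIMESTAMP_START'].iloc[0]}_{af['TIMESTAMP_END'].iloc[-1]}.csv"
```

The result can be written with `af.to_csv(file_name, index=False)`. Filling NaN before inserting the timestamp columns keeps the timestamps as strings.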
Step 4b – AmeriFlux Plot Review
Notebook: 4b_ameriflux_plots.ipynb
Final visual check before AmeriFlux submission. Reads the exported CSV (converting -9999 back to NaN), then plots every variable as an interactive time series. This is the last review checkpoint before upload.
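Reading the exported file back with -9999 restored to NaN might look like the sketch below. The CSV content is a made-up two-row sample held in memory; in the notebook the path to the exported file would be passed instead.

```python
import io
import pandas as pd

# Made-up sample standing in for an exported AmeriFlux file
csv_text = """TIMESTAMP_START,TIMESTAMP_END,LE,TA
202406010000,202406010030,100.5,-9999
202406010030,202406010100,-9999,19.2
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values=[-9999],  # convert the missing-value code back to NaN
    dtype={"TIMESTAMP_START": str, "TIMESTAMP_END": str},
)
# Use the period start as a datetime index for plotting
df.index = pd.to_datetime(df["TIMESTAMP_START"], format="%Y%m%d%H%M")
```

Each remaining column can then be passed to the plotting routine as a time series.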
Step 5 – Flux QAQC
Notebook: 5_fluxqaqc.ipynb
Runs the fluxdataqaqc package to perform energy balance ratio (EBR) correction, gap-fill ET, and produce diagnostic reports.
Key Operations
Gap-fill redundant sensors – use linear regression to cross-fill between redundant NETRAD sources (e.g., NETRAD_1_1_1 / NETRAD_1_1_2), creating *_FINAL columns
Run FluxDataQAQC with .ini configuration files mapping columns to Rn, G, LE, H, etc.:
daily_frac = 1 (require complete days)
max_interp_hours = 2 (daytime) / max_interp_hours_night = 4 (nighttime)
EBR correction method applied to LE
ET gap-filling using ETrF x gridMET reference ET
Seasonal and annual subsetting – analyze energy balance closure and ET by year and season (growing season: Apr 1 – Oct 31; winter: Nov 1 – Mar 31)
Sensitivity testing – run multiple configurations with different NETRAD and G inputs to compare results
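The regression cross-fill between redundant sensors can be sketched with NumPy and pandas. This is a minimal illustration, assuming an ordinary least-squares line fitted on the periods where both sensors report; `cross_fill` and the output column name `NETRAD_FINAL` are hypothetical, and the values are synthetic.

```python
import numpy as np
import pandas as pd

def cross_fill(primary: pd.Series, secondary: pd.Series) -> pd.Series:
    """Fill gaps in `primary` with a linear prediction from `secondary`."""
    both = primary.notna() & secondary.notna()
    slope, intercept = np.polyfit(secondary[both], primary[both], deg=1)
    predicted = slope * secondary + intercept
    # Gaps remain where the secondary sensor is also missing
    return primary.fillna(predicted)

# Synthetic data: primary = 1.1 * secondary + 5, with gaps
df = pd.DataFrame({
    "NETRAD_1_1_1": [115.0, 225.0, np.nan, 445.0, 555.0, np.nan],
    "NETRAD_1_1_2": [100.0, 200.0, 300.0, 400.0, 500.0, np.nan],
})
df["NETRAD_FINAL"] = cross_fill(df["NETRAD_1_1_1"], df["NETRAD_1_1_2"])
```

The same function can be applied in the other direction (swapping the arguments) to build a second filled series for the sensitivity tests.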
Outputs
HTML diagnostic reports with interactive bokeh plots
Monthly summaries of ET data availability (good, gap-filled, missing)
Optional daily corrected CSV export
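The seasonal subsetting of energy balance closure can be illustrated independently of fluxdataqaqc. The sketch below computes the energy balance ratio, (H + LE) / (Rn - G), from summed daily values for each year and season, using the season boundaries given above. The function name and the synthetic daily values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

GROWING_MONTHS = [4, 5, 6, 7, 8, 9, 10]  # Apr 1 - Oct 31

def closure_by_season(daily: pd.DataFrame) -> pd.Series:
    """Energy balance ratio per (year, season) from daily Rn, G, H, LE."""
    daily = daily.dropna(subset=["Rn", "G", "H", "LE"])
    terms = pd.DataFrame({
        "year": daily.index.year,
        "season": np.where(daily.index.month.isin(GROWING_MONTHS), "growing", "winter"),
        "turbulent": (daily["H"] + daily["LE"]).to_numpy(),
        "available": (daily["Rn"] - daily["G"]).to_numpy(),
    })
    sums = terms.groupby(["year", "season"])[["turbulent", "available"]].sum()
    return sums["turbulent"] / sums["available"]

# Synthetic daily values, constant within each season
idx = pd.date_range("2024-01-01", "2024-12-31", freq="D")
daily = pd.DataFrame({
    "Rn": 100.0, "G": 10.0, "H": 30.0,
    "LE": np.where(idx.month.isin(GROWING_MONTHS), 50.0, 15.0),
}, index=idx)
ebr = closure_by_season(daily)
```

Summing the fluxes before dividing avoids unstable ratios on days when available energy is near zero.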
Data Flow Diagram
Raw datalogger and EasyFlux Web files
|
v [Notebook 1: Compile and Preprocess CSFlux + AmeriFlux Eddy + MetStats + MetAF]
*_preprocessed.parquet
|
v [Notebook 2: Merge by Data Stream and Data Types]
*_raw.parquet
|
v [Notebook 3: QC -- calibrations, physical limits, flags, manual corrections]
*_qc.parquet
|
+---> [Notebook 3a: Variable Review -- statistics, distributions]
+---> [Notebook 3b: Plot Review -- time-series sweep]
| (feedback loop: corrections go back to Notebook 3)
|
v [Notebook 4: AmeriFlux Export -- signal filter, format, export]
*_HH_*.csv (AmeriFlux submission file)
|
+---> [Notebook 4b: AmeriFlux Plot Review -- final visual check]
|
v [Notebook 5: Flux QAQC -- EBR correction, ET gap-fill]
EBR-corrected daily ET + HTML diagnostic reports
Adapting the Workflow to Other Sites
Each station needs its own copy of the notebooks with the following site-specific elements updated:
Parameters – station ID, interval, and date_range
Calibration corrections (Notebook 3) – dates, factors, and affected variables determined from station visit logs and program updates
Sensor failure periods – date ranges and variables to null
Wind direction offsets – instrument-specific azimuth corrections
Signal-strength bad periods – date ranges for known IRGA contamination
Column selection (Notebooks 1/2 and 4) – varies by station sensor array
FluxDataQAQC config (Notebook 5) – .ini file with column mappings for the site’s sensor configuration