Eddy Covariance Flux Data Processing Workflow

Utah Geological Survey – Flux Monitoring Network

Based on the MicroMet/dugout processing notebooks Author: Diane Menuz | Compiled: March 2026 Station Reference Implementation: US-UTD (Dugout Ranch)


Table of Contents

  1. Overview and Pipeline Summary

  2. Prerequisites and Directory Structure

  3. Step 1 – Compile and Preprocess (Notebook 1)

  4. Step 2 – Create Raw Dataset (Notebook 2)

  5. Step 3 – Quality Control (Notebook 3)

  6. Step 3a – Variable Review (Notebook 3a)

  7. Step 3b – Plot Review (Notebook 3b)

  8. Step 4 – AmeriFlux Export (Notebook 4)

  9. Step 4b – AmeriFlux Plot Review (Notebook 4b)

  10. Step 5 – Flux QAQC with Energy Balance (Notebook 5)

  11. Key Libraries and Dependencies

  12. Data Flow Diagram

  13. Adapting the Workflow to Other Sites


Overview and Pipeline Summary

This document describes the end-to-end data processing workflow used by the Utah Geological Survey (UGS) Flux Monitoring Network to take raw eddy covariance and meteorological data from Campbell Scientific dataloggers and produce quality-controlled, AmeriFlux-formatted output files. The workflow is implemented as a series of numbered Jupyter Notebooks in the MicroMet/dugout directory, each handling a distinct stage of processing.

The pipeline follows a linear progression through six major stages, with review notebooks interspersed for interactive data exploration and visual quality assessment. Each notebook reads from the output of the previous stage and writes intermediate or final products as Parquet or CSV files to a shared Google Drive.

Pipeline Stages at a Glance

Step

Notebook

Purpose

Output

1

dugout1_compile_and_preprocess

Compile raw .dat files; preprocess met and eddy data

*_preprocessed.parquet (per data source)

2

dugout2_create_raw_data

Merge sources into single raw dataset; fix time shifts

*_raw.parquet

3

dugout3_qc_data

Apply corrections, physical limits, flags, and manual QC

*_qc.parquet

3a

dugout3a_variable_review

Interactive exploration of QC data (not modifying)

Diagnostic plots (PNG files)

3b

dugout3b_plot_review

Quick time-series plot of every variable

Visual review only

4

dugout4_ameriflux

Drop non-AmeriFlux columns; format and export

*_HH_*.csv (AmeriFlux upload)

4b

dugout4b_ameriflux_plots

Plot every variable in the final AmeriFlux file

Visual review only

5

dugout5_fluxqaqc

Energy balance closure analysis with flux-data-qaqc

EBR-corrected daily ET; HTML reports

Italicized rows indicate review-only notebooks that do not modify or export data. Any issues found during review should be addressed by adding corrections back into the appropriate upstream notebook (primarily Notebook 3).


Prerequisites and Directory Structure

Required Software and Libraries

  • Python 3.x with pandas, numpy, scipy, matplotlib, plotly, geopandas

  • MicroMet library (custom UGS package at micromet_path) providing: Reformatter, validate, interval_updates, file_compile, eddy_plots, data_cleaning, merge, columns, timestamps, fix_g_values, recalculate_albedo, variable_limits

  • soil_heat library (custom UGS package) providing: storage_calculations, soil_heat modules

  • flux-data-qaqc (fluxdataqaqc) – third-party package for energy balance correction

  • Supporting tools: prettytable, windrose, bokeh

Data Source Directory Structure

Raw data resides on a shared Google Drive under the path:

M:/Shared drives/UGS_Flux/Data_Downloads/compiled/{stationid}/

Each station folder must contain the following subdirectories and files:

  • {stationname}_Flux_AmeriFluxFormat.dat – AmeriFlux-format eddy file from EasyFlux

  • {stationname}_Flux_CSFormat.dat – Campbell Scientific format eddy file from EasyFlux

  • AmeriFluxFormat/ – folder of eddy data downloaded directly from the datalogger

  • Statistics_Ameriflux/ – folder of met data in AmeriFlux naming from the datalogger

  • Statistics/ – folder of met data from the datalogger (TOA5 format, may need Card Convert)

  • Flux_CSFormat/ – folder of CS-format eddy data from the datalogger

Output Directory Structure

M:/Shared drives/UGS_Flux/Data_Processing/final_database_tables/
  • raw/ – merged raw parquet files from Notebook 2

  • qc/ – quality-controlled parquet files from Notebook 3

  • ameriflux/ – final AmeriFlux CSV files from Notebook 4

  • micromet_reports/ – physical limit reports from the Reformatter.finalize() step

Supporting Data

  • flux-met_processing_variables_*.csv – master list of AmeriFlux variable names for validation

  • Database API at ugs-koop providing eddy_events (station visit notes, program updates) and eddy_station_metadata (install dates)

Site Dictionary

The notebooks use a station ID to folder name mapping:

Station ID

Folder Name

Site Name

US-UTD

Dugout_Ranch

Dugout Ranch

US-UTB

BSF

Bonneville Salt Flats

US-UTJ

Bluff

Bluff

US-UTW

Wellington

Wellington

US-UTE

Escalante

Escalante

US-UTM

Matheson

Matheson

US-UTP

Phrag

Phrag

US-CdM

Cedar_mesa

Cedar Mesa

US-UTV

Desert_View_Myton

Desert View Myton

US-UTN

Juab

Juab

US-UTG

Green_River

Green River

US-UTL

Pelican_Lake

Pelican Lake


Step 1 – Compile and Preprocess

Notebook: dugout1_compile_and_preprocess.ipynb Goal: Compile data files from multiple sources for a single station and run through the MicroMet preprocessing pipeline. Gap-fill where possible between overlapping data sources.

Parameters

  • interval: Measurement interval in minutes (30 or 60). Controls which records are retained via interval_updates.

  • stationid: AmeriFlux station identifier (e.g., US-UTD).

  • micromet_path: Path to the MicroMet library source code.

File Compilation

The first phase copies raw .dat files from individual download folders into an organized structure under the compiled/ directory. The file_compile.compile_files() function searches source folders by regex pattern and copies matching files into target subfolders. Data types compiled include:

  • Statistics_AmerifluxStatistics_AmeriFlux/

  • Statistics_\d+ (raw TOA5 format) → Statistics_Raw/ (requires Card Convert before use)

  • Flux_AmeriFluxFormatAmeriFluxFormat/

  • Flux_CSFormatFlux_CSFormat/

  • Operatn_Notes, Config_Setting_Notes, Flux_Notes → respective folders

After compilation of raw Statistics files, they must be manually converted using Campbell Scientific Card Convert, then compiled again from Statistics_Converted/ to Statistics/.

Met Data Preprocessing

Two parallel met data streams are processed through the preprocess_data() function:

Statistics Tables (TOA5 format)

  • Source folder: {stationid}/Statistics/

  • Glob pattern: TOA5*Statistics*.dat

  • Rows 0, 2, 3 are skipped (header metadata); -9999 and NAN treated as missing.

  • Column suffixes _Avg and _Tot are stripped from variable names.

  • Passed through micromet.Reformatter.preprocess() which standardizes naming and timestamps.

Statistics AmeriFlux Tables

  • Source folder: {stationid}/Statistics_Ameriflux/

  • Glob pattern: *Statistics_AmeriFlux*.dat

  • Already in AmeriFlux naming; no row skipping needed.

  • Leaf wetness columns are checked and renamed for consistency (e.g., LWMWET_1_1_2LEAF_WET_1_2_1).

  • Columns that are entirely NA are dropped (artifact columns from corrupted files).

Eddy Data Preprocessing

Two parallel eddy data streams are similarly processed:

AmeriFlux Format (from datalogger)

  • Source folder: {stationid}/AmeriFluxFormat/

  • Glob pattern: *Flux_AmeriFluxFormat*.dat

  • All-NA columns dropped. Variable names validated against AmeriFlux master list using validate.compare_names_to_ameriflux().

CS Format (from datalogger)

  • Source folder: {stationid}/Flux_CSFormat/

  • Glob pattern: *_Flux_CSFormat*.dat

  • Contains additional diagnostic variables not in AmeriFlux format (e.g., BOWEN_RATIO, ENERGY_CLOSURE, FC_MASS, various density and QC fields).

  • Specific columns are dropped (e.g., WS_RSLT, SONIC_AZIMUTH, SUN_ELEVATION) and others renamed (e.g., CS65X_EC_1_1_1EC_1_1_1, LI7700_AMB_TMPRTA_1_1_5).

Validation Checks

  • validate.compare_names_to_ameriflux() – flags any variable not in the AmeriFlux master list.

  • validate.validate_timestamp_consistency() – confirms DATETIME_END and TIMESTAMP_END agree.

  • Visual review via plotly interactive time series (ed_plot.plotlystuff).

  • Comparison plots overlay AmeriFlux and CS Format eddy data to identify coverage gaps.

Interval Subsetting

All datasets are passed through interval_updates.subset_interval() which filters to the target interval (30 or 60 minutes) based on a centralized dictionary (interval_update_dict). This handles sites that changed their reporting interval mid-record.

Outputs

Four Parquet files are exported per station to preprocessed_site_data/:

  • {stationid}_{interval}_metstats_preprocessed.parquet

  • {stationid}_{interval}_metstatsaf_preprocessed.parquet

  • {stationid}_{interval}_eddyaf_dl_preprocessed.parquet

  • {stationid}_{interval}_eddycsflux_dl_preprocessed.parquet


Step 2 – Create Raw Dataset

Notebook: dugout2_create_raw_data.ipynb Goal: Combine the four preprocessed data sources into a single raw dataset per station, resolving overlaps, fixing time shifts, and validating alignment.

Database Lookups

Station metadata and event logs are fetched from the UGS database API (ugs-koop) to retrieve:

  • Install date – used to drop any data preceding the station installation.

  • Station visit notes – provide context for data anomalies (e.g., bird nesting on sensors, IRGA zeroing events).

  • Program update notes – track changes to datalogger programs (sampling intervals, calibration factors, sonic azimuth updates).

Eddy Data Merging

Eddy data from AmeriFlux and CS Format sources are compared and merged:

  1. Read preprocessed Parquet files and pass through data_cleaning.prep_parquet() for index standardization.

  2. Run validate.data_diff_check() to compare values rounded to 3 decimal places. Small differences (<0.5%) are expected due to rounding. Larger differences (e.g., G_1_1_A, FETCH_90) are noted.

  3. The CS Format data contributes unique columns not found in the AmeriFlux format (G_PLATE values, additional diagnostic fields like FC_MASS, TKE, TSTAR, UX/UY/UZ components).

  4. merge.fillna_with_second_df() is used to fill gaps in the AmeriFlux data using CS Format values, prioritizing the AmeriFlux stream where both have data.

Met Data Merging

Met data from the two Statistics table sources are compared and merged with special attention to the SoilVue time-shift issue.

SoilVue Time-Shift Detection and Correction

A known issue exists where SoilVue sensor data in the AmeriFlux Statistics tables may be offset by one time step (30 minutes). This is detected using cross-correlation analysis (validate.review_lags()) and corrected by shifting the affected columns:

  • Columns affected: all EC_3_*, K_3_*, SWC_3_*, and TS_3_* variables (SoilVue profile data).

  • The shift is applied using pandas shift(freq='30min') on the identified columns only.

  • After shifting, lags are re-verified to confirm alignment (optimal lag should be 0).

  • Non-SoilVue met variables (NETRAD, WD, WS, radiation) are not shifted.

Time-Shift Between Met and Eddy

After merging met and eddy independently, the combined streams are checked for alignment using cross-correlation of NETRAD and WS between the two systems. If a systematic offset is detected (e.g., the met data was shifted by 1 hour for a portion of the record), the older portion of the met data is shifted to align.

Final Combination and Export

  • Duplicate columns between met and eddy (FILE_NAME, TIMESTAMP, T_NR) are renamed with _EDDY or _MET suffixes.

  • Derived columns (G_1_1_A, SG_1_1_A) are dropped since they will be recalculated later.

  • Timestamp integrity is verified: all records must fall on 30-minute boundaries.

  • Column suffixes are standardized using columns.create_suffix_map() to append _1_1_1 suffixes to variables that lack positional indices.

  • Data before the station install date is dropped.

Output: {stationid}_{timestart}_{timeend}_raw.parquet in final_database_tables/raw/


Step 3 – Quality Control

Notebook: dugout3_qc_data.ipynb Goal: Apply calibration corrections, physical limits, signal-strength flags, and manual data cleaning to produce a QC-level dataset.

Calibration Corrections

Site-specific calibration corrections are applied before running data through the MicroMet finalize step. Corrections are applied chronologically using date masks so that already-corrected data (post-program-update) is not double-corrected.

Soil Heat Flux Plate Calibration

  • The SG (storage) values had an incorrect soil thickness parameter (0.16 m instead of 0.05 m). Values before the program fix date are multiplied by the correction factor (0.05/0.16 = 0.3125) using fix_g_values.correct_vars_by_factor().

  • New G values are recalculated as G = SG + G_PLATE for each plate using fix_g_values.calculate_new_g_value().

  • G_PLATE_2 values were inverted for a period and must be multiplied by −1 before the correction date.

Precipitation Calibration

  • The tipping bucket calibration factor was incorrect (0.1 instead of 0.254). Values before the program fix date are multiplied by 2.54.

SoilVue G Calculation

A third soil heat flux estimate (G_3_1_1) is calculated from SoilVue temperature and moisture profiles using the soil_heat library:

  • soil_heat.storage_calculations.compute_soil_storage_integrated() – computes heat storage (SG_3_1_1) in the top 5 cm.

  • soil_heat.soil_heat.compute_heat_flux_conduction() – computes conductive heat flux at 5 cm depth using temperatures and moisture at 5 and 10 cm.

  • G_3_1_1 = SG_3_1_1 + G_SOILVUE

Important: SWC values must be in proportions (0–1), not percent, at this stage. They are converted to percent during the finalize step.

MicroMet Finalize

The micromet.Reformatter.finalize() function applies the following:

  • Converts SWC from fraction to percent (multiply by 100).

  • Scales SSITC test values to standard 0/1/2 encoding.

  • Applies physical limits based on variable type (e.g., temperature ranges, radiation bounds). Values outside limits are set to NaN/−9999.

  • Reorders columns to match AmeriFlux conventions.

  • Produces a report CSV listing each variable, its limits, and the count/percentage of values flagged.

The report should be reviewed for variables with high flag percentages (e.g., >5%) which may indicate sensor issues.

Common Data Issue Corrections

After the finalize step, several categories of manual corrections are applied:

Precipitation on Field Days

Precipitation events coinciding with station visits are reviewed. Spurious values caused by sensor maintenance (e.g., tipping the bucket during a sensor swap) are set to zero. Genuine rain events on visit days are preserved.

Ground Heat Flux Plate Zeros

G_PLATE values of exactly 0 are set to NaN along with their corresponding G values, as zeros typically indicate sensor disconnection rather than zero flux.

Soil Data Spikes

SoilVue data spikes are identified where K_3_7_1 drops below a threshold (e.g., 3.5). All EC_3, K_3, SWC_3, and TS_3 columns are set to NaN for the affected timestamps.

Wind Direction Corrections

  • Sonic azimuth was recorded as 217° but measured as 227°. The 10° difference is added to WD_1_1_1 for all data before the program update date.

  • The Young anemometer (WD_1_1_2) was offset from the IRGASON by approximately 80°. This offset is subtracted.

Miscellaneous Corrections

  • Barometric pressure spikes on specific dates set to NaN.

  • Albedo set to NaN where SW_IN or SW_OUT are missing (prevents misleading calculated albedo).

  • Early analog data issues (first few days after install) – radiation and soil variables set to NaN.

  • Leaf wetness sensor #2 failure after a known date – all LWM/LEAF_WET_1_2_1 values set to NaN.

  • Footprint distance outliers capped (FP_DIST_INTRST > 1000 m and UPWND_DIST_INTRST < 180 m set to NaN).

  • Temperature spikes on specific dates set to NaN.

Signal Strength Flagging

Custom quality flags are created for H2O and CO2 based on IRGA signal strength:

H2O Signal Flag (H2O_SIG_FLAG_1_1_1)

  • Flag = 0: signal strength ≥ 0.8 (good)

  • Flag = 1: signal strength < 0.8 (marginal)

  • Flag = 2: within a known continuous low-signal stretch (bad). These periods are identified from site visit logs and applied using data_cleaning.apply_internal_flags().

CO2 Signal Flag (CO2_SIG_FLAG_1_1_1)

Same structure as the H2O flag, using the same known bad periods (typically both gases degrade simultaneously from window contamination).

Wind Direction Flag (WD_1_1_1_FLAG)

  • Flag = 0: wind from the expected direction (32°–212°)

  • Flag = 1: wind from behind the tower or from known obstruction directions

Outputs

  • Exported file: {stationid}_{daterange}_qc.parquet in final_database_tables/qc/

  • Includes derived columns: day_of_year, time_of_day, days_since_20240101 (for coloring regression plots).


Step 3a – Variable Review

Notebook: dugout3a_variable_review.ipynb Goal: Interactive exploration of the QC dataset to evaluate sensor agreement, data quality, and identify any remaining issues. This notebook does not modify data; corrections should be added back to Notebook 3.

Review Categories

Net Radiation and Radiation Components

  • Comparison of NETRAD, SW_IN, SW_OUT, LW_IN, LW_OUT between instruments 1 and 2 (CNR4 on eddy system vs. NR01 on met mast).

  • Daily-mean regression to detect drift over time.

  • Studentized residual plots to flag outlier days.

  • PPFD_IN vs. SW_IN regression to verify PAR sensor consistency.

  • Check for NETRAD records where component values are missing.

Albedo

  • ALB_1_1_1 vs. ALB_1_1_2 regression colored by SW_IN, day of year, and time of day.

  • Check for albedo values where SW_IN or SW_OUT is missing.

Wind Speed and Direction

  • Time-lag detection between IRGASON and Young anemometer using cross-correlation.

  • Linear regression of WS_1_1_1 vs. WS_1_1_2 colored by time.

  • Monthly wind rose plots for both instruments to verify directional consistency.

Soil Heat Flux

  • Comparison of G_1_1_1, G_2_1_1, G_3_1_1 (two heat flux plates plus SoilVue-derived).

  • SWC, TS, G_PLATE, SG intercomparisons.

  • Daily mean regression for G values to assess plate-to-plate and plate-to-SoilVue agreement.

Temperature

  • Comparison of five temperature sources: T_SONIC, TA_1_1_1 (EC100), TA_1_2_1 (sonic-derived), TA_1_3_1 (EE08 aspirated), TA_1_4_1 (secondary aspirated).

  • Regression colored by H2O signal strength to detect IRGA-related temperature bias.

Relative Humidity and VPD

  • Three RH sources compared; regression colored by H2O signal strength.

  • Separate analysis for low-signal vs. high-signal periods and flag=2 vs. flag<2 periods.

Energy Balance Closure

  • Calculated as (H + LE) vs. (NETRAD G) using G_3_1_1.

  • Daily closure analysis with filtering by record completeness (48/48 half-hours), signal strength, and flag status.

CO2 and H2O Concentrations

  • Time series of CO2_1_1_1, FC_1_1_1 colored by signal strength.

  • Review of flagged periods to confirm they capture low-quality data.


Step 3b – Plot Review

Notebook: dugout3b_plot_review.ipynb Goal: Generate a quick interactive time-series plot of every variable in the QC dataset for a visual sweep. This serves as a rapid check for remaining spikes, gaps, or artifacts before exporting to AmeriFlux format.

The notebook iterates over all columns (excluding FILE_NAME, stationid) and calls ed_plot.plotlystuff() for each, producing interactive plotly figures with range sliders. It can be run on either the raw or qc data level by changing the level parameter.


Step 4 – AmeriFlux Export

Notebook: dugout4_ameriflux.ipynb Goal: Prepare the QC dataset for submission to the AmeriFlux network by dropping non-standard variables, applying final signal-strength filters, formatting timestamps, and exporting to CSV.

Signal Strength Filtering

Variables derived from IRGA measurements are set to NaN where signal strength is below 0.8:

  • H2O signal < 0.8: H2O_1_1_1, H2O_SIGMA, LE_1_1_1, RH_1_1_1, RH_1_2_1, VPD_1_1_1, ET_1_1_1 set to NaN.

  • CO2 signal < 0.8: CO2_1_1_1, CO2_SIGMA, FC_1_1_1 set to NaN.

Column Cleanup

  1. Drop columns that are entirely NaN.

  2. Compare all remaining column names against the AmeriFlux variable list. Columns not in the list are flagged for removal.

  3. Manually review the drop list to confirm no needed variables are lost. PBLH_F is kept despite not matching the standard list.

  4. Additional columns dropped: internal flags (WD_1_1_1_FLAG), file names, diagnostic fields (FETCH_MAX, FETCH_90, ZL), stationid, temporal helper columns.

Remaining Variables

The final dataset retains approximately 80 variables including: flux variables (FC, LE, H, TAU with SSITC tests), radiation (SW/LW IN/OUT for two instruments, NETRAD, ALB, PPFD_IN), temperature (4 air temp sources, T_SONIC, T_CANOPY), humidity (3 RH sources, VPD), soil (G from 3 sources, SG from 3 sources, SWC at 9 SoilVue depths plus 2 CS probes, TS at 9 depths plus 2 CS probes), wind (WS and WD from 2 instruments, WS_MAX, U/V/W sigmas, USTAR, WD_SIGMA), and other (PA, P, MO_LENGTH, CO2/H2O concentrations and sigmas, LEAF_WET, PBLH_F).

Final Formatting

  • NaN values are replaced with −9999 (AmeriFlux missing value convention).

  • TIMESTAMP_START and TIMESTAMP_END are recalculated from the datetime index using timestamps.add_ameriflux_timestamps() in YYYYMMDDHHmm format.

  • The start timestamp is verified against the initial AmeriFlux submission to ensure continuity.

Output

File: {stationid}_HH_{timestamp_start}_{timestamp_end}.csv

The HH prefix indicates half-hourly data. This file is ready for upload to the AmeriFlux data portal.


Step 4b – AmeriFlux Plot Review

Notebook: dugout4b_ameriflux_plots.ipynb Goal: Final visual review of the exported AmeriFlux CSV file. Every variable is plotted as an interactive time series to verify that the exported data looks correct and complete.

This reads the AmeriFlux CSV back in (converting −9999 to NaN for plotting), sets the datetime index, and loops through all columns generating plotly figures. This is the last visual check before submission.


Step 5 – Flux QAQC with Energy Balance

Notebook: dugout5_fluxqaqc.ipynb (in fluxqaqc/ subfolder) Goal: Run the AmeriFlux data through the flux-data-qaqc package to perform energy balance ratio (EBR) correction, gap-fill ET, and produce daily summaries and diagnostic reports.

Gap-Filling Redundant Sensors

Before running the QAQC, missing values in key variables are imputed using linear regression between redundant sensors:

  • NETRAD_1_1_1 gaps filled from NETRAD_1_1_2 (and vice versa) with R² ≈ 0.988.

  • G_1_1_A (mean of G plates 1 and 2) is computed where both are available. Gaps are filled from G_3_1_1 (SoilVue-derived) with R² ≈ 0.589, and vice versa.

  • Lag between G_1_1_A and G_3_1_1 is verified (optimal lag of 2 periods noted).

Flux-Data-QAQC Configuration

The analysis uses .ini configuration files that specify:

  • Which column to use for Rn (net radiation) – e.g., NETRAD_1_1_1_FINAL

  • Which column to use for G (ground heat flux) – e.g., G_1_1_A_FINAL

  • Column mappings for LE, H, air temperature, wind speed, VPD, SWC, etc.

QAQC Processing

The QaQc class processes the data with the following settings:

  • daily_frac = 1: days with any missing sub-daily measurements are dropped.

  • max_interp_hours = 2: maximum gap length to interpolate during daytime (Rn ≥ 0).

  • max_interp_hours_night = 4: maximum gap length to interpolate during nighttime (Rn < 0).

  • Method: Energy Balance Ratio (EBR) correction applied to LE.

  • ET gap-filling: uses filtered ETrF multiplied by gridMET reference ET (ETr).

Seasonal Analysis

The notebook subsets data by year and season (growing season: April 1 – October 31; winter: November 1 – March 31) to examine energy balance closure and ET patterns for individual periods.

Sensitivity Testing

Multiple runs are performed with different input combinations to evaluate sensitivity:

  • NETRAD_1_1_1_FINAL vs. NETRAD_1_1_2_FINAL as the Rn input.

  • G_1_1_A_FINAL vs. G_3_1_1_FINAL as the G input.

Each combination produces a separate HTML report for comparison.

Outputs

  • HTML diagnostic reports with interactive bokeh plots showing daily energy balance components, closure ratios, and ET.

  • Monthly status summaries showing counts of good data, gap-filled ET, and missing ET days.

  • Optional CSV export of daily corrected data.


Key Libraries and Dependencies

Library

Role in Pipeline

micromet

Core processing library: Reformatter (preprocess, finalize), validate, merge, data_cleaning, file_compile, interval_updates, columns, timestamps, fix_g_values, recalculate_albedo, eddy_plots, variable_limits

soil_heat

SoilVue-derived ground heat flux: storage_calculations, soil_heat modules

fluxdataqaqc

Energy balance ratio correction, ET gap-filling, daily aggregation (Data, QaQc, Plot classes)

pandas

DataFrame operations, time-series indexing, Parquet/CSV I/O

numpy

Numerical operations, NaN handling

scipy

Statistical analysis (cross-correlation, linear regression)

plotly

Interactive time-series and scatter plots with range sliders

matplotlib

Static regression plots, wind roses

bokeh

Interactive plots in flux-data-qaqc HTML reports

windrose

Monthly wind rose generation in variable review

prettytable

Formatted display of station visit notes and program updates

requests

REST API calls to UGS database for metadata and events


Data Flow Diagram

Raw .dat files (Met Statistics + Met AmeriFlux Stats + Eddy AmeriFlux + Eddy CSFormat)
    |
    v  [Notebook 1: Compile & Preprocess]
4x *_preprocessed.parquet
    |
    v  [Notebook 2: Create Raw Data - merge, fix time shifts, validate alignment]
*_raw.parquet
    |
    v  [Notebook 3: QC - calibrations, physical limits, flags, manual corrections]
*_qc.parquet
    |
    +---> [Notebook 3a: Variable Review - regression, wind roses, closure]
    +---> [Notebook 3b: Plot Review - all variables time series]
    |     (feedback loop: corrections go back to Notebook 3)
    |
    v  [Notebook 4: AmeriFlux Export - drop cols, signal filter, format timestamps]
*_HH_*.csv  (AmeriFlux submission file)
    |
    +---> [Notebook 4b: AmeriFlux Plot Review - final visual check]
    |
    v  [Notebook 5: Flux QAQC - EBR correction, ET gap-fill, sensitivity tests]
EBR-corrected daily ET  +  HTML diagnostic reports

Adapting the Workflow to Other Sites

The dugout notebooks serve as a template that can be copied and adapted for other UGS flux sites. The key site-specific elements that must be updated for each station are:

Parameters to Update

  • stationid – the AmeriFlux station code (e.g., US-UTW for Wellington).

  • interval – may differ between sites or change over time.

  • date_range – reflects the start/end timestamps of the available data.

Site-Specific Corrections (Notebook 3)

Each site will have its own set of corrections that must be determined from station visit logs, program update records, and data review. Common categories include:

  • Calibration correction dates and factors (soil, precipitation, etc.).

  • Sensor failure periods and the specific variables to null.

  • Wind direction offset between instruments.

  • Signal strength bad-period date ranges.

  • Manual spike removal dates.

Column Handling (Notebooks 2 and 4)

The specific columns selected for merging from the CS Format eddy data (csflux_join_cols in Notebook 2) and the columns dropped before AmeriFlux export (Notebook 4) may vary by site depending on what sensors are installed.

Flux QAQC Configuration (Notebook 5)

Each site needs its own .ini configuration file specifying column mappings for the flux-data-qaqc package. The choice of which NETRAD and G columns to use as primary inputs may also differ.