Eddy Covariance Flux Data Processing Workflow
Utah Geological Survey – Flux Monitoring Network
Based on the MicroMet/dugout processing notebooks
Author: Diane Menuz | Compiled: March 2026
Station Reference Implementation: US-UTD (Dugout Ranch)
Overview and Pipeline Summary
This document describes the end-to-end data processing workflow used by the Utah Geological Survey (UGS) Flux Monitoring Network to take raw eddy covariance and meteorological data from Campbell Scientific dataloggers and produce quality-controlled, AmeriFlux-formatted output files. The workflow is implemented as a series of numbered Jupyter Notebooks in the MicroMet/dugout directory, each handling a distinct stage of processing.
The pipeline follows a linear progression through five major stages (Notebooks 1 through 5), with review notebooks (3a, 3b, 4b) interspersed for interactive data exploration and visual quality assessment. Each notebook reads the output of the previous stage and writes intermediate or final products as Parquet or CSV files to a shared Google Drive.
Pipeline Stages at a Glance
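| Step | Notebook | Purpose | Output |
|---|---|---|---|
| 1 | dugout1_compile_and_preprocess.ipynb | Compile raw .dat files; preprocess met and eddy data | Four preprocessed Parquet files (two met, two eddy) |
| 2 | dugout2_create_raw_data.ipynb | Merge sources into single raw dataset; fix time shifts | {stationid}_{timestart}_{timeend}_raw.parquet |
| 3 | dugout3_qc_data.ipynb | Apply corrections, physical limits, flags, and manual QC | {stationid}_{daterange}_qc.parquet |
| *3a* | *dugout3a_variable_review.ipynb* | *Interactive exploration of QC data (not modifying)* | *Diagnostic plots (PNG files)* |
| *3b* | *dugout3b_plot_review.ipynb* | *Quick time-series plot of every variable* | *Visual review only* |
| 4 | dugout4_ameriflux.ipynb | Drop non-AmeriFlux columns; format and export | {stationid}_HH_{timestamp_start}_{timestamp_end}.csv |
| *4b* | *dugout4b_ameriflux_plots.ipynb* | *Plot every variable in the final AmeriFlux file* | *Visual review only* |
| 5 | dugout5_fluxqaqc.ipynb | Energy balance closure analysis with flux-data-qaqc | EBR-corrected daily ET; HTML reports |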
Italicized rows indicate review-only notebooks that do not modify or export data. Any issues found during review should be addressed by adding corrections back into the appropriate upstream notebook (primarily Notebook 3).
Prerequisites and Directory Structure
Required Software and Libraries
Python 3.x with pandas, numpy, scipy, matplotlib, plotly, geopandas
MicroMet library (custom UGS package at micromet_path) providing: Reformatter, validate, interval_updates, file_compile, eddy_plots, data_cleaning, merge, columns, timestamps, fix_g_values, recalculate_albedo, variable_limits
soil_heat library (custom UGS package) providing: storage_calculations, soil_heat modules
flux-data-qaqc (fluxdataqaqc) – third-party package for energy balance correction
Supporting tools: prettytable, windrose, bokeh
Data Source Directory Structure
Raw data resides on a shared Google Drive under the path:
M:/Shared drives/UGS_Flux/Data_Downloads/compiled/{stationid}/
Each station folder must contain the following subdirectories and files:
{stationname}_Flux_AmeriFluxFormat.dat – AmeriFlux-format eddy file from EasyFlux
{stationname}_Flux_CSFormat.dat – Campbell Scientific format eddy file from EasyFlux
AmeriFluxFormat/ – folder of eddy data downloaded directly from the datalogger
Statistics_Ameriflux/ – folder of met data in AmeriFlux naming from the datalogger
Statistics/ – folder of met data from the datalogger (TOA5 format, may need Card Convert)
Flux_CSFormat/ – folder of CS-format eddy data from the datalogger
Output Directory Structure
M:/Shared drives/UGS_Flux/Data_Processing/final_database_tables/
raw/ – merged raw parquet files from Notebook 2
qc/ – quality-controlled parquet files from Notebook 3
ameriflux/ – final AmeriFlux CSV files from Notebook 4
micromet_reports/ – physical limit reports from the Reformatter.finalize() step
Supporting Data
flux-met_processing_variables_*.csv – master list of AmeriFlux variable names for validation
Database API at ugs-koop providing eddy_events (station visit notes, program updates) and eddy_station_metadata (install dates)
Site Dictionary
The notebooks use a station ID to folder name mapping:
| Station ID | Folder Name | Site Name |
|---|---|---|
| US-UTD | Dugout_Ranch | Dugout Ranch |
| US-UTB | BSF | Bonneville Salt Flats |
| US-UTJ | Bluff | Bluff |
| US-UTW | Wellington | Wellington |
| US-UTE | Escalante | Escalante |
| US-UTM | Matheson | Matheson |
| US-UTP | Phrag | Phrag |
| US-CdM | Cedar_mesa | Cedar Mesa |
| US-UTV | Desert_View_Myton | Desert View Myton |
| US-UTN | Juab | Juab |
| US-UTG | Green_River | Green River |
| US-UTL | Pelican_Lake | Pelican Lake |
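As a minimal sketch, the mapping above can be held in a plain dictionary. The names SITE_FOLDERS, folder_for, and compiled_dir are hypothetical and chosen for illustration; the notebooks may define the dictionary differently.

```python
from pathlib import PurePosixPath

# Station ID -> folder name, copied from the site dictionary table.
SITE_FOLDERS = {
    "US-UTD": "Dugout_Ranch",
    "US-UTB": "BSF",
    "US-UTJ": "Bluff",
    "US-UTW": "Wellington",
    "US-UTE": "Escalante",
    "US-UTM": "Matheson",
    "US-UTP": "Phrag",
    "US-CdM": "Cedar_mesa",
    "US-UTV": "Desert_View_Myton",
    "US-UTN": "Juab",
    "US-UTG": "Green_River",
    "US-UTL": "Pelican_Lake",
}

# Root of the compiled downloads, as given under "Data Source Directory Structure".
COMPILED_ROOT = PurePosixPath("M:/Shared drives/UGS_Flux/Data_Downloads/compiled")


def folder_for(stationid):
    """Return the folder name for a station ID (hypothetical helper)."""
    if stationid not in SITE_FOLDERS:
        raise KeyError(f"Unknown station ID: {stationid}")
    return SITE_FOLDERS[stationid]


def compiled_dir(stationid):
    """Return the compiled/{stationid}/ directory for a known station."""
    folder_for(stationid)  # raises early for unknown IDs
    return COMPILED_ROOT / stationid
```

Failing on an unknown ID at lookup time keeps a mistyped station code from silently creating a new output folder.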
Step 1 – Compile and Preprocess
Notebook: dugout1_compile_and_preprocess.ipynb
Goal: Compile data files from multiple sources for a single station and run through the MicroMet preprocessing pipeline. Gap-fill where possible between overlapping data sources.
Parameters
interval: Measurement interval in minutes (30 or 60). Controls which records are retained via interval_updates.
stationid: AmeriFlux station identifier (e.g., US-UTD).
micromet_path: Path to the MicroMet library source code.
File Compilation
The first phase copies raw .dat files from individual download folders into an organized structure under the compiled/ directory. The file_compile.compile_files() function searches source folders by regex pattern and copies matching files into target subfolders. Data types compiled include:
Statistics_Ameriflux → Statistics_AmeriFlux/
Statistics_\d+ (raw TOA5 format) → Statistics_Raw/ (requires Card Convert before use)
Flux_AmeriFluxFormat → AmeriFluxFormat/
Flux_CSFormat → Flux_CSFormat/
Operatn_Notes, Config_Setting_Notes, Flux_Notes → respective folders
After the raw Statistics files are compiled, they must be converted manually with Campbell Scientific Card Convert and then compiled again from Statistics_Converted/ to Statistics/.
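The regex-driven copy can be sketched as follows. This is not the implementation of file_compile.compile_files(), whose signature is not shown in this document; copy_matching_files is a hypothetical stand-in that only illustrates the search-and-copy idea.

```python
import re
import shutil
from pathlib import Path


def copy_matching_files(source_dir, target_dir, pattern):
    """Copy .dat files whose names match a regex into target_dir.

    Hypothetical helper; the real compile_files() may behave differently.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    regex = re.compile(pattern)
    copied = []
    # rglob walks every download subfolder under source_dir.
    for path in sorted(source_dir.rglob("*.dat")):
        if regex.search(path.name):
            shutil.copy2(path, target_dir / path.name)
            copied.append(path.name)
    return copied
```

For example, the pattern `Statistics_\d+` selects raw TOA5 Statistics files while skipping the Statistics_AmeriFlux files, because the latter have letters rather than digits after the underscore. The target folder should sit outside the source tree so copied files are not picked up again on a later run.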
Met Data Preprocessing
Two parallel met data streams are processed through the preprocess_data() function:
Statistics Tables (TOA5 format)
Source folder: {stationid}/Statistics/
Glob pattern: TOA5*Statistics*.dat
Rows 0, 2, 3 are skipped (header metadata); -9999 and NAN treated as missing.
Column suffixes _Avg and _Tot are stripped from variable names.
Passed through micromet.Reformatter.preprocess() which standardizes naming and timestamps.
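The read settings listed above map directly onto pandas.read_csv. The sketch below covers only the row skipping, missing-value handling, and suffix stripping; read_toa5 is a hypothetical name, the sample text is invented for illustration, and the MicroMet preprocess step is not reproduced.

```python
import io

import pandas as pd


def read_toa5(source):
    """Read a TOA5 Statistics table with the settings described above (sketch)."""
    df = pd.read_csv(
        source,
        skiprows=[0, 2, 3],          # keep row 1 (column names) as the header
        na_values=["-9999", "NAN"],  # datalogger missing-value markers
    )
    # Strip the _Avg / _Tot suffixes from variable names.
    df.columns = df.columns.str.replace(r"_(Avg|Tot)$", "", regex=True)
    return df


# Invented four-line TOA5 header followed by two records.
sample = io.StringIO(
    '"TOA5","station","CR6","1234","OS","prog","sig","Statistics"\n'
    '"TIMESTAMP","RECORD","TA_1_1_1_Avg","P_Tot"\n'
    '"TS","RN","deg C","mm"\n'
    '"","","Avg","Tot"\n'
    '"2024-01-01 00:30:00",1,-2.5,0\n'
    '"2024-01-01 01:00:00",2,"NAN",-9999\n'
)
df = read_toa5(sample)
```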
Statistics AmeriFlux Tables
Source folder: {stationid}/Statistics_Ameriflux/
Glob pattern: *Statistics_AmeriFlux*.dat
Already in AmeriFlux naming; no row skipping needed.
Leaf wetness columns are checked and renamed for consistency (e.g., LWMWET_1_1_2 → LEAF_WET_1_2_1).
Columns that are entirely NA are dropped (artifact columns from corrupted files).
Eddy Data Preprocessing
Two parallel eddy data streams are similarly processed:
AmeriFlux Format (from datalogger)
Source folder: {stationid}/AmeriFluxFormat/
Glob pattern: *Flux_AmeriFluxFormat*.dat
All-NA columns dropped. Variable names validated against AmeriFlux master list using validate.compare_names_to_ameriflux().
CS Format (from datalogger)
Source folder: {stationid}/Flux_CSFormat/
Glob pattern: *_Flux_CSFormat*.dat
Contains additional diagnostic variables not in AmeriFlux format (e.g., BOWEN_RATIO, ENERGY_CLOSURE, FC_MASS, various density and QC fields).
Specific columns are dropped (e.g., WS_RSLT, SONIC_AZIMUTH, SUN_ELEVATION) and others renamed (e.g., CS65X_EC_1_1_1 → EC_1_1_1, LI7700_AMB_TMPR → TA_1_1_5).
Validation Checks
validate.compare_names_to_ameriflux() – flags any variable not in the AmeriFlux master list.
validate.validate_timestamp_consistency() – confirms DATETIME_END and TIMESTAMP_END agree.
Visual review via plotly interactive time series (ed_plot.plotlystuff).
Comparison plots overlay AmeriFlux and CS Format eddy data to identify coverage gaps.
Interval Subsetting
All datasets are passed through interval_updates.subset_interval() which filters to the target interval (30 or 60 minutes) based on a centralized dictionary (interval_update_dict). This handles sites that changed their reporting interval mid-record.
Outputs
Four Parquet files are exported per station to preprocessed_site_data/:
{stationid}_{interval}_metstats_preprocessed.parquet
{stationid}_{interval}_metstatsaf_preprocessed.parquet
{stationid}_{interval}_eddyaf_dl_preprocessed.parquet
{stationid}_{interval}_eddycsflux_dl_preprocessed.parquet
Step 2 – Create Raw Dataset
Notebook: dugout2_create_raw_data.ipynb
Goal: Combine the four preprocessed data sources into a single raw dataset per station, resolving overlaps, fixing time shifts, and validating alignment.
Database Lookups
Station metadata and event logs are fetched from the UGS database API (ugs-koop) to retrieve:
Install date – used to drop any data preceding the station installation.
Station visit notes – provide context for data anomalies (e.g., bird nesting on sensors, IRGA zeroing events).
Program update notes – track changes to datalogger programs (sampling intervals, calibration factors, sonic azimuth updates).
Eddy Data Merging
Eddy data from AmeriFlux and CS Format sources are compared and merged:
Read preprocessed Parquet files and pass through data_cleaning.prep_parquet() for index standardization.
Run validate.data_diff_check() to compare values rounded to 3 decimal places. Small differences (<0.5%) are expected due to rounding. Larger differences (e.g., G_1_1_A, FETCH_90) are noted.
The CS Format data contributes unique columns not found in the AmeriFlux format (G_PLATE values, additional diagnostic fields like FC_MASS, TKE, TSTAR, UX/UY/UZ components).
merge.fillna_with_second_df() is used to fill gaps in the AmeriFlux data using CS Format values, prioritizing the AmeriFlux stream where both have data.
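The priority rule in the last step (AmeriFlux first, CS Format as fallback) behaves like pandas DataFrame.combine_first. The sketch below uses that built-in as a stand-in; it is an assumption that merge.fillna_with_second_df() produces an equivalent result, and the column values are invented.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-06-01 00:30", periods=4, freq="30min")

# AmeriFlux stream with two gaps in H.
af = pd.DataFrame({"H": [10.0, np.nan, 12.0, np.nan]}, index=idx)

# CS Format stream: slightly different H values plus a column unique to it.
cs = pd.DataFrame(
    {"H": [10.1, 11.0, 12.1, np.nan], "TKE": [0.5, 0.6, 0.7, 0.8]},
    index=idx,
)

# combine_first keeps af values where present and falls back to cs otherwise;
# columns found only in cs (here TKE) are carried over.
merged = af.combine_first(cs)
```

Where both streams are missing, the merged value stays NaN, so the gap remains visible to later QC steps.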
Met Data Merging
Met data from the two Statistics table sources are compared and merged with special attention to the SoilVue time-shift issue.
SoilVue Time-Shift Detection and Correction
A known issue exists where SoilVue sensor data in the AmeriFlux Statistics tables may be offset by one time step (30 minutes). This is detected using cross-correlation analysis (validate.review_lags()) and corrected by shifting the affected columns:
Columns affected: all EC_3_*, K_3_*, SWC_3_*, and TS_3_* variables (SoilVue profile data).
The shift is applied using pandas shift(freq='30min') on the identified columns only.
After shifting, lags are re-verified to confirm alignment (optimal lag should be 0).
Non-SoilVue met variables (NETRAD, WD, WS, radiation) are not shifted.
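A lag check of this kind can be sketched with plain pandas correlation over a small range of shifts. The function best_lag is hypothetical and is not the implementation of validate.review_lags(); the synthetic series and the direction of the offset (values stamped one step too early, so that shift(freq='30min') is the fix) are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def best_lag(reference, candidate, max_lag=4):
    """Return the lag (in time steps) at which candidate best matches reference.

    A negative lag means candidate values appear earlier than the matching
    reference values.
    """
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        # Positional shift; Series.corr aligns on the index and skips NaNs.
        scores[lag] = reference.corr(candidate.shift(-lag))
    return max(scores, key=scores.get)


# Synthetic half-hourly signal standing in for a soil temperature series.
idx = pd.date_range("2024-06-01", periods=480, freq="30min")
rng = np.random.default_rng(0)
values = np.sin(2 * np.pi * np.arange(480) / 48) + 0.1 * rng.standard_normal(480)
reference = pd.Series(values, index=idx)

# The same values stamped one step too early.
early = pd.Series(reference.to_numpy()[1:], index=idx[:-1])

lag_before = best_lag(reference, early)
corrected = early.shift(freq="30min")  # move timestamps forward by one step
lag_after = best_lag(reference, corrected)
```

Re-running the check on the corrected series mirrors the re-verification step above: the optimal lag should come back as 0.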
Time-Shift Between Met and Eddy
After merging met and eddy independently, the combined streams are checked for alignment using cross-correlation of NETRAD and WS between the two systems. If a systematic offset is detected (e.g., the met data was shifted by 1 hour for a portion of the record), the older portion of the met data is shifted to align.
Final Combination and Export
Duplicate columns between met and eddy (FILE_NAME, TIMESTAMP, T_NR) are renamed with _EDDY or _MET suffixes.
Derived columns (G_1_1_A, SG_1_1_A) are dropped since they will be recalculated later.
Timestamp integrity is verified: all records must fall on 30-minute boundaries.
Column suffixes are standardized using columns.create_suffix_map() to append _1_1_1 suffixes to variables that lack positional indices.
Data before the station install date is dropped.
Output: {stationid}_{timestart}_{timeend}_raw.parquet in final_database_tables/raw/
Step 3 – Quality Control
Notebook: dugout3_qc_data.ipynb
Goal: Apply calibration corrections, physical limits, signal-strength flags, and manual data cleaning to produce a QC-level dataset.
Calibration Corrections
Site-specific calibration corrections are applied before running data through the MicroMet finalize step. Corrections are applied chronologically using date masks so that already-corrected data (post-program-update) is not double-corrected.
Soil Heat Flux Plate Calibration
The SG (storage) values had an incorrect soil thickness parameter (0.16 m instead of 0.05 m). Values before the program fix date are multiplied by the correction factor (0.05/0.16 = 0.3125) using fix_g_values.correct_vars_by_factor().
New G values are recalculated as G = SG + G_PLATE for each plate using fix_g_values.calculate_new_g_value().
G_PLATE_2 values were inverted for a period and must be multiplied by −1 before the correction date.
Precipitation Calibration
The tipping bucket calibration factor was incorrect (0.1 instead of 0.254). Values before the program fix date are multiplied by 2.54.
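Both calibration fixes follow the same pattern: multiply selected columns by a constant for records before a fix date. The sketch below illustrates that pattern only; scale_before is a hypothetical helper rather than fix_g_values.correct_vars_by_factor(), and the column names and fix date are placeholders.

```python
import pandas as pd


def scale_before(df, columns, fix_date, factor):
    """Multiply columns by factor for records before fix_date (hypothetical helper)."""
    out = df.copy()
    mask = out.index < pd.Timestamp(fix_date)
    out.loc[mask, columns] = out.loc[mask, columns] * factor
    return out


idx = pd.date_range("2024-03-01", periods=4, freq="D")
df = pd.DataFrame(
    {"SG_1_1_1": [16.0, 16.0, 5.0, 5.0], "P_1_1_1": [1.0, 0.0, 2.54, 0.0]},
    index=idx,
)

# Placeholder date, not the actual program update date.
fix_date = "2024-03-03"
df = scale_before(df, ["SG_1_1_1"], fix_date, 0.05 / 0.16)  # 0.3125
df = scale_before(df, ["P_1_1_1"], fix_date, 0.254 / 0.1)   # 2.54
```

Because the mask is tied to the fix date, records logged after the program update pass through unchanged, which is what prevents double correction.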
SoilVue G Calculation
A third soil heat flux estimate (G_3_1_1) is calculated from SoilVue temperature and moisture profiles using the soil_heat library:
soil_heat.storage_calculations.compute_soil_storage_integrated() – computes heat storage (SG_3_1_1) in the top 5 cm.
soil_heat.soil_heat.compute_heat_flux_conduction() – computes conductive heat flux at 5 cm depth using temperatures and moisture at 5 and 10 cm.
G_3_1_1 = SG_3_1_1 + G_SOILVUE
Important: SWC values must be expressed as fractions (0–1), not percent, at this stage. They are converted to percent during the finalize step.
MicroMet Finalize
The micromet.Reformatter.finalize() function applies the following:
Converts SWC from fraction to percent (multiply by 100).
Scales SSITC test values to standard 0/1/2 encoding.
Applies physical limits based on variable type (e.g., temperature ranges, radiation bounds). Values outside limits are set to NaN/−9999.
Reorders columns to match AmeriFlux conventions.
Produces a report CSV listing each variable, its limits, and the count/percentage of values flagged.
The report should be reviewed for variables with high flag percentages (e.g., >5%) which may indicate sensor issues.
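The limit check and its report can be sketched as below. The limits shown are invented for illustration (the real ones are defined in the MicroMet library), and apply_limits is a hypothetical function, not Reformatter.finalize().

```python
import numpy as np
import pandas as pd

# Hypothetical limits for two variables; not the values MicroMet uses.
LIMITS = {"TA_1_1_1": (-50.0, 50.0), "SW_IN_1_1_1": (-10.0, 1400.0)}


def apply_limits(df, limits):
    """Set out-of-range values to NaN and summarize how many were flagged."""
    out = df.copy()
    rows = []
    for col, (lo, hi) in limits.items():
        if col not in out:
            continue
        # Only existing values can be flagged; NaNs are already missing.
        bad = out[col].notna() & ~out[col].between(lo, hi)
        out.loc[bad, col] = np.nan
        rows.append({
            "variable": col,
            "min": lo,
            "max": hi,
            "n_flagged": int(bad.sum()),
            "pct_flagged": 100.0 * bad.sum() / len(out),
        })
    return out, pd.DataFrame(rows)
```

Filtering the returned report on pct_flagged above 5 gives the short list of variables worth a closer look.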
Common Data Issue Corrections
After the finalize step, several categories of manual corrections are applied:
Precipitation on Field Days
Precipitation events coinciding with station visits are reviewed. Spurious values caused by sensor maintenance (e.g., tipping the bucket during a sensor swap) are set to zero. Genuine rain events on visit days are preserved.
Ground Heat Flux Plate Zeros
G_PLATE values of exactly 0 are set to NaN along with their corresponding G values, as zeros typically indicate sensor disconnection rather than zero flux.
Soil Data Spikes
SoilVue data spikes are identified where K_3_7_1 drops below a threshold (e.g., 3.5). All EC_3, K_3, SWC_3, and TS_3 columns are set to NaN for the affected timestamps.
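A minimal sketch of this masking step, assuming the SoilVue columns can be selected by their name prefixes; null_soilvue_spikes is a hypothetical helper and the default threshold is the example value quoted above.

```python
import numpy as np
import pandas as pd


def null_soilvue_spikes(df, threshold=3.5, trigger="K_3_7_1"):
    """Set all SoilVue profile columns to NaN where the trigger drops below threshold."""
    out = df.copy()
    soilvue_cols = [
        c for c in out.columns
        if c.startswith(("EC_3_", "K_3_", "SWC_3_", "TS_3_"))
    ]
    spike = out[trigger] < threshold  # NaN compares as False, so gaps are not flagged
    out.loc[spike, soilvue_cols] = np.nan
    return out
```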
Wind Direction Corrections
Sonic azimuth was recorded as 217° but measured as 227°. The 10° difference is added to WD_1_1_1 for all data before the program update date.
The Young anemometer (WD_1_1_2) was offset from the IRGASON by approximately 80°. This offset is subtracted.
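Adding or subtracting an angular offset can push values outside 0° to 360°. The sketch below wraps the result with a modulo; the wrap is an assumption, since the notebooks' handling of values past 360° is not described here, and rotate_wd is a hypothetical helper. Date masking for the pre-update period is omitted.

```python
import pandas as pd


def rotate_wd(wd, offset_deg):
    """Add an offset to wind direction and wrap the result into [0, 360)."""
    return (wd + offset_deg) % 360


# Sonic: measured azimuth (227) minus recorded azimuth (217) = +10 degrees.
wd_sonic = pd.Series([5.0, 355.0, 180.0])
sonic_fixed = rotate_wd(wd_sonic, 227 - 217)

# Young anemometer: subtract the approximate 80 degree offset.
wd_young = pd.Series([30.0, 100.0])
young_fixed = rotate_wd(wd_young, -80)
```

With the wrap, 355° plus 10° becomes 5° rather than 365°, and 30° minus 80° becomes 310° rather than a negative angle.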
Miscellaneous Corrections
Barometric pressure spikes on specific dates set to NaN.
Albedo set to NaN where SW_IN or SW_OUT is missing (prevents misleading calculated albedo).
Early analog data issues (first few days after install) – radiation and soil variables set to NaN.
Leaf wetness sensor #2 failure after a known date – all LWM/LEAF_WET_1_2_1 values set to NaN.
Footprint distance outliers removed (FP_DIST_INTRST > 1000 m and UPWND_DIST_INTRST < 180 m set to NaN).
Temperature spikes on specific dates set to NaN.
Signal Strength Flagging
Custom quality flags are created for H2O and CO2 based on IRGA signal strength:
H2O Signal Flag (H2O_SIG_FLAG_1_1_1)
Flag = 0: signal strength ≥ 0.8 (good)
Flag = 1: signal strength < 0.8 (marginal)
Flag = 2: within a known continuous low-signal stretch (bad). These periods are identified from site visit logs and applied using data_cleaning.apply_internal_flags().
CO2 Signal Flag (CO2_SIG_FLAG_1_1_1)
Same structure as the H2O flag, using the same known bad periods (typically both gases degrade simultaneously from window contamination).
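The three-level flag can be sketched as a threshold test followed by an override for known bad periods. signal_flag is a hypothetical function and does not reproduce data_cleaning.apply_internal_flags(); the bad-period dates in the usage note are invented.

```python
import numpy as np
import pandas as pd


def signal_flag(signal, bad_periods, threshold=0.8):
    """Return 0 (good), 1 (below threshold) or 2 (inside a known bad period).

    Missing signal values compare as below threshold and receive flag 1.
    """
    flag = pd.Series(np.where(signal >= threshold, 0, 1), index=signal.index)
    for start, end in bad_periods:
        flag.loc[start:end] = 2  # label slicing includes both endpoints
    return flag
```

For example, a daily series with signal strengths 0.95, 0.7, 0.9, 0.5, 0.85 starting 2024-07-01 and a bad period of ("2024-07-03", "2024-07-04") yields flags 0, 1, 2, 2, 0. The same period list can be passed when building the CO2 flag.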
Wind Direction Flag (WD_1_1_1_FLAG)
Flag = 0: wind from the expected direction (32°–212°)
Flag = 1: wind from behind the tower or from known obstruction directions
Outputs
Exported file: {stationid}_{daterange}_qc.parquet in final_database_tables/qc/
Includes derived columns: day_of_year, time_of_day, days_since_20240101 (for coloring regression plots).
Step 3a – Variable Review
Notebook: dugout3a_variable_review.ipynb
Goal: Interactive exploration of the QC dataset to evaluate sensor agreement, data quality, and identify any remaining issues. This notebook does not modify data; corrections should be added back to Notebook 3.
Review Categories
Net Radiation and Radiation Components
Comparison of NETRAD, SW_IN, SW_OUT, LW_IN, LW_OUT between instruments 1 and 2 (CNR4 on eddy system vs. NR01 on met mast).
Daily-mean regression to detect drift over time.
Studentized residual plots to flag outlier days.
PPFD_IN vs. SW_IN regression to verify PAR sensor consistency.
Check for NETRAD records where component values are missing.
Albedo
ALB_1_1_1 vs. ALB_1_1_2 regression colored by SW_IN, day of year, and time of day.
Check for albedo values where SW_IN or SW_OUT is missing.
Wind Speed and Direction
Time-lag detection between IRGASON and Young anemometer using cross-correlation.
Linear regression of WS_1_1_1 vs. WS_1_1_2 colored by time.
Monthly wind rose plots for both instruments to verify directional consistency.
Soil Heat Flux
Comparison of G_1_1_1, G_2_1_1, G_3_1_1 (two heat flux plates plus SoilVue-derived).
SWC, TS, G_PLATE, SG intercomparisons.
Daily mean regression for G values to assess plate-to-plate and plate-to-SoilVue agreement.
Temperature
Comparison of five temperature sources: T_SONIC, TA_1_1_1 (EC100), TA_1_2_1 (sonic-derived), TA_1_3_1 (EE08 aspirated), TA_1_4_1 (secondary aspirated).
Regression colored by H2O signal strength to detect IRGA-related temperature bias.
Relative Humidity and VPD
Three RH sources compared; regression colored by H2O signal strength.
Separate analysis for low-signal vs. high-signal periods and flag=2 vs. flag<2 periods.
Energy Balance Closure
Calculated as (H + LE) vs. (NETRAD – G) using G_3_1_1.
Daily closure analysis with filtering by record completeness (48/48 half-hours), signal strength, and flag status.
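A daily closure ratio restricted to complete days can be sketched as follows. The column names H_1_1_1, LE_1_1_1, and NETRAD_1_1_1 are assumptions, daily_closure is a hypothetical function, and the sketch treats the index as marking the start of each half hour; signal-strength and flag filtering are left out.

```python
import pandas as pd


def daily_closure(df, g_col="G_3_1_1", records_per_day=48):
    """Daily (H + LE) / (NETRAD - G), keeping only days with a full set of records."""
    cols = ["H_1_1_1", "LE_1_1_1", "NETRAD_1_1_1", g_col]
    data = df[cols].dropna()  # require all four terms in every record
    turbulent = (data["H_1_1_1"] + data["LE_1_1_1"]).resample("D").sum()
    available = (data["NETRAD_1_1_1"] - data[g_col]).resample("D").sum()
    n_records = data[g_col].resample("D").count()
    ratio = turbulent / available
    return ratio[n_records == records_per_day]
```

Dropping incomplete days before taking the ratio avoids biasing closure toward whichever part of the day happens to have data.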
CO2 and H2O Concentrations
Time series of CO2_1_1_1, FC_1_1_1 colored by signal strength.
Review of flagged periods to confirm they capture low-quality data.
Step 3b – Plot Review
Notebook: dugout3b_plot_review.ipynb
Goal: Generate a quick interactive time-series plot of every variable in the QC dataset for a visual sweep. This serves as a rapid check for remaining spikes, gaps, or artifacts before exporting to AmeriFlux format.
The notebook iterates over all columns (excluding FILE_NAME, stationid) and calls ed_plot.plotlystuff() for each, producing interactive plotly figures with range sliders. It can be run on either the raw or qc data level by changing the level parameter.
Step 4 – AmeriFlux Export
Notebook: dugout4_ameriflux.ipynb
Goal: Prepare the QC dataset for submission to the AmeriFlux network by dropping non-standard variables, applying final signal-strength filters, formatting timestamps, and exporting to CSV.
Signal Strength Filtering
Variables derived from IRGA measurements are set to NaN where signal strength is below 0.8:
H2O signal < 0.8: H2O_1_1_1, H2O_SIGMA, LE_1_1_1, RH_1_1_1, RH_1_2_1, VPD_1_1_1, ET_1_1_1 set to NaN.
CO2 signal < 0.8: CO2_1_1_1, CO2_SIGMA, FC_1_1_1 set to NaN.
Column Cleanup
Drop columns that are entirely NaN.
Compare all remaining column names against the AmeriFlux variable list. Columns not in the list are flagged for removal.
Manually review the drop list to confirm no needed variables are lost.
PBLH_F is kept despite not matching the standard list.
Additional columns dropped: internal flags (WD_1_1_1_FLAG), file names, diagnostic fields (FETCH_MAX, FETCH_90, ZL), stationid, temporal helper columns.
Remaining Variables
The final dataset retains approximately 80 variables including: flux variables (FC, LE, H, TAU with SSITC tests), radiation (SW/LW IN/OUT for two instruments, NETRAD, ALB, PPFD_IN), temperature (4 air temp sources, T_SONIC, T_CANOPY), humidity (3 RH sources, VPD), soil (G from 3 sources, SG from 3 sources, SWC at 9 SoilVue depths plus 2 CS probes, TS at 9 depths plus 2 CS probes), wind (WS and WD from 2 instruments, WS_MAX, U/V/W sigmas, USTAR, WD_SIGMA), and other (PA, P, MO_LENGTH, CO2/H2O concentrations and sigmas, LEAF_WET, PBLH_F).
Final Formatting
NaN values are replaced with −9999 (AmeriFlux missing value convention).
TIMESTAMP_START and TIMESTAMP_END are recalculated from the datetime index using timestamps.add_ameriflux_timestamps() in YYYYMMDDHHmm format.
The start timestamp is verified against the initial AmeriFlux submission to ensure continuity.
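The formatting steps can be sketched with pandas alone. to_ameriflux is a hypothetical function standing in for timestamps.add_ameriflux_timestamps(), and it assumes the datetime index marks the end of each averaging period.

```python
import numpy as np
import pandas as pd


def to_ameriflux(df, interval_minutes=30):
    """Add AmeriFlux timestamp columns and replace NaN with -9999.

    Assumes the DatetimeIndex marks the END of each averaging period.
    """
    out = df.copy()
    end = out.index
    start = end - pd.Timedelta(minutes=interval_minutes)
    # strftime("%Y%m%d%H%M") produces the YYYYMMDDHHmm layout.
    out.insert(0, "TIMESTAMP_START", start.strftime("%Y%m%d%H%M").to_numpy())
    out.insert(1, "TIMESTAMP_END", end.strftime("%Y%m%d%H%M").to_numpy())
    return out.fillna(-9999)


idx = pd.date_range("2024-01-01 00:30", periods=2, freq="30min")
qc = pd.DataFrame({"TA_1_1_1": [1.5, np.nan]}, index=idx)
formatted = to_ameriflux(qc)
```

Writing the result with to_csv(path, index=False) leaves the two timestamp columns first and omits the datetime index from the file.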
Output
File: {stationid}_HH_{timestamp_start}_{timestamp_end}.csv
The HH code in the file name indicates half-hourly data. This file is ready for upload to the AmeriFlux data portal.
Step 4b – AmeriFlux Plot Review
Notebook: dugout4b_ameriflux_plots.ipynb
Goal: Final visual review of the exported AmeriFlux CSV file. Every variable is plotted as an interactive time series to verify that the exported data looks correct and complete.
This reads the AmeriFlux CSV back in (converting −9999 to NaN for plotting), sets the datetime index, and loops through all columns generating plotly figures. This is the last visual check before submission.
Step 5 – Flux QAQC with Energy Balance
Notebook: dugout5_fluxqaqc.ipynb (in fluxqaqc/ subfolder)
Goal: Run the AmeriFlux data through the flux-data-qaqc package to perform energy balance ratio (EBR) correction, gap-fill ET, and produce daily summaries and diagnostic reports.
Gap-Filling Redundant Sensors
Before running the QAQC, missing values in key variables are imputed using linear regression between redundant sensors:
NETRAD_1_1_1 gaps filled from NETRAD_1_1_2 (and vice versa) with R² ≈ 0.988.
G_1_1_A (mean of G plates 1 and 2) is computed where both are available. Gaps are filled from G_3_1_1 (SoilVue-derived) with R² ≈ 0.589, and vice versa.
Lag between G_1_1_A and G_3_1_1 is verified (optimal lag of 2 periods noted).
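Regression-based filling between two redundant sensors can be sketched with a least-squares line from numpy. fill_from_sensor is a hypothetical function; the notebook's actual regression routine and any lag handling are not reproduced, and the values in the usage note are invented.

```python
import numpy as np
import pandas as pd


def fill_from_sensor(target, predictor):
    """Fill NaNs in target using a least-squares line fitted on predictor.

    Returns the filled series and the R² of the fit.
    """
    both = target.notna() & predictor.notna()
    slope, intercept = np.polyfit(predictor[both], target[both], 1)
    r2 = np.corrcoef(predictor[both], target[both])[0, 1] ** 2
    estimate = slope * predictor + intercept
    # fillna aligns on the index, so only the gaps in target are replaced.
    return target.fillna(estimate), r2
```

For example, with predictor values 100, 200, 300, 400, 500 and target values 110, 210, NaN, 410, 510, the fitted line has slope 1 and intercept 10, so the gap is filled with 310. A low R², such as the 0.589 reported for the G comparison, is a cue to treat the filled values with more caution.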
Flux-Data-QAQC Configuration
The analysis uses .ini configuration files that specify:
Which column to use for Rn (net radiation) – e.g., NETRAD_1_1_1_FINAL
Which column to use for G (ground heat flux) – e.g., G_1_1_A_FINAL
Column mappings for LE, H, air temperature, wind speed, VPD, SWC, etc.
QAQC Processing
The QaQc class processes the data with the following settings:
daily_frac = 1: days with any missing sub-daily measurements are dropped.
max_interp_hours = 2: maximum gap length to interpolate during daytime (Rn ≥ 0).
max_interp_hours_night = 4: maximum gap length to interpolate during nighttime (Rn < 0).
Method: Energy Balance Ratio (EBR) correction applied to LE.
ET gap-filling: uses filtered ETrF multiplied by gridMET reference ET (ETr).
Seasonal Analysis
The notebook subsets data by year and season (growing season: April 1 – October 31; winter: November 1 – March 31) to examine energy balance closure and ET patterns for individual periods.
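The season split can be sketched by labeling each timestamp from its month. season_labels is a hypothetical helper; how the notebook groups a winter that spans two calendar years is not described here, so the sketch stops at labeling.

```python
import numpy as np
import pandas as pd


def season_labels(index):
    """Label timestamps 'growing' (Apr 1 to Oct 31) or 'winter' (Nov 1 to Mar 31)."""
    growing = (index.month >= 4) & (index.month <= 10)
    return pd.Series(np.where(growing, "growing", "winter"), index=index)


idx = pd.to_datetime(["2024-03-31", "2024-04-01", "2024-10-31", "2024-11-01"])
labels = season_labels(idx)
```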
Sensitivity Testing
Multiple runs are performed with different input combinations to evaluate sensitivity:
NETRAD_1_1_1_FINAL vs. NETRAD_1_1_2_FINAL as the Rn input.
G_1_1_A_FINAL vs. G_3_1_1_FINAL as the G input.
Each combination produces a separate HTML report for comparison.
Outputs
HTML diagnostic reports with interactive bokeh plots showing daily energy balance components, closure ratios, and ET.
Monthly status summaries showing counts of good data, gap-filled ET, and missing ET days.
Optional CSV export of daily corrected data.
Key Libraries and Dependencies
| Library | Role in Pipeline |
|---|---|
| micromet | Core processing library |
| soil_heat | SoilVue-derived ground heat flux |
| fluxdataqaqc | Energy balance ratio correction, ET gap-filling, daily aggregation |
| pandas | DataFrame operations, time-series indexing, Parquet/CSV I/O |
| numpy | Numerical operations, NaN handling |
| scipy | Statistical analysis (cross-correlation, linear regression) |
| plotly | Interactive time-series and scatter plots with range sliders |
| matplotlib | Static regression plots, wind roses |
| bokeh | Interactive plots in flux-data-qaqc HTML reports |
| windrose | Monthly wind rose generation in variable review |
| prettytable | Formatted display of station visit notes and program updates |
| requests | REST API calls to UGS database for metadata and events |
Data Flow Diagram
Raw .dat files (Met Statistics + Met AmeriFlux Stats + Eddy AmeriFlux + Eddy CSFormat)
|
v [Notebook 1: Compile & Preprocess]
4x *_preprocessed.parquet
|
v [Notebook 2: Create Raw Data - merge, fix time shifts, validate alignment]
*_raw.parquet
|
v [Notebook 3: QC - calibrations, physical limits, flags, manual corrections]
*_qc.parquet
|
+---> [Notebook 3a: Variable Review - regression, wind roses, closure]
+---> [Notebook 3b: Plot Review - all variables time series]
| (feedback loop: corrections go back to Notebook 3)
|
v [Notebook 4: AmeriFlux Export - drop cols, signal filter, format timestamps]
*_HH_*.csv (AmeriFlux submission file)
|
+---> [Notebook 4b: AmeriFlux Plot Review - final visual check]
|
v [Notebook 5: Flux QAQC - EBR correction, ET gap-fill, sensitivity tests]
EBR-corrected daily ET + HTML diagnostic reports
Adapting the Workflow to Other Sites
The dugout notebooks serve as a template that can be copied and adapted for other UGS flux sites. The key site-specific elements that must be updated for each station are:
Parameters to Update
stationid – the AmeriFlux station code (e.g., US-UTW for Wellington).
interval – may differ between sites or change over time.
date_range – reflects the start/end timestamps of the available data.
Site-Specific Corrections (Notebook 3)
Each site will have its own set of corrections that must be determined from station visit logs, program update records, and data review. Common categories include:
Calibration correction dates and factors (soil, precipitation, etc.).
Sensor failure periods and the specific variables to null.
Wind direction offset between instruments.
Signal strength bad-period date ranges.
Manual spike removal dates.
Column Handling (Notebooks 2 and 4)
The specific columns selected for merging from the CS Format eddy data (csflux_join_cols in Notebook 2) and the columns dropped before AmeriFlux export (Notebook 4) may vary by site depending on what sensors are installed.
Flux QAQC Configuration (Notebook 5)
Each site needs its own .ini configuration file specifying column mappings for the flux-data-qaqc package. The choice of which NETRAD and G columns to use as primary inputs may also differ.