micromet.report package

Submodules

micromet.report.eddy_plots module

micromet.report.eddy_plots.compare_to_sig_strength(df, var, signal_var='H2O_SIG_STRGTH_MIN', cutoff=0.8, scaling_factor=1, sig_plot=False)[source]

Create plotlystuff plots to view all data for a variable over time and values for that variable when the signal strength is below the indicated cutoff value.

Parameters:

df (pd.DataFrame) – DataFrame with a datetime index.
var (str) – Name of the variable to plot.
signal_var (str) – Name of the signal-strength variable to plot. Should be either H2O_SIG_STRGTH_MIN or CO2_SIG_STRGTH_MIN.
cutoff (float) – Signal-strength cutoff value to investigate.
scaling_factor (int) – Factor by which to scale signal_var so that signal strength and the variable of interest can be plotted together.
sig_plot (bool) – If True, draws a second plot showing the variable alongside the scaled signal strength.

micromet.report.eddy_plots.comparison_plot(df, var1, var2, title, xlabel, ylabel, output_path, print_plot=True)[source]

Generates a scatter plot to compare two variables from a DataFrame, including a linear regression line and a 1:1 reference line.

This function performs a linear regression on var1 (independent variable) and var2 (dependent variable), and visualizes the relationship. It drops any rows with missing values in these two columns before plotting. The plot includes several key features:

- Data points are displayed as hollow circles with a blue outline.
- A red line shows the best-fit linear regression.
- A black dashed line represents the ideal 1:1 relationship for comparison.
- The legend provides key statistics, including the slope and R-squared value of the linear fit.

The plot is saved to a file and displayed.

Parameters:

df (pandas.DataFrame) – The input DataFrame containing the data for the plot.
var1 (str) – The name of the column in df to be used for the x-axis and linear regression.
var2 (str) – The name of the column in df to be used for the y-axis and linear regression.
title (str) – The title for the plot.
xlabel (str) – The label for the x-axis.
ylabel (str) – The label for the y-axis.
output_path (str) – The path to which the plot is exported.

Returns:

None – This function does not return any value; it displays and saves a plot directly.

Dependencies:

- pandas as pd
- numpy as np
- matplotlib.pyplot as plt
- scipy.stats as stats

micromet.report.eddy_plots.create_grouped_boxplot(df, value_col, category_col)[source]

Creates an interactive Plotly Graph Objects boxplot grouped by a category.

Parameters:

df (pd.DataFrame) – The input DataFrame.
value_col (str) – The name of the numeric column to plot on the Y-axis.
category_col (str) – The name of the categorical column to group the boxplots by.

Returns:

The Plotly Figure object.

Return type:

go.Figure

micromet.report.eddy_plots.ols_plot(x, y, xlabel, ylabel, title)[source]

Create a scatterplot between two arrays, x and y, visualizing their relationship along with an Ordinary Least Squares (OLS) regression line and a 1:1 reference line.

This function calculates the OLS regression line for the given data, plots the scattered data points, the regression line with its equation and R-squared value, and a 1:1 diagonal line for comparison. It also adds a grid, custom labels, and a title.

Parameters:

x (array-like) – The independent variable data (e.g., predicted values).
y (array-like) – The dependent variable data (e.g., actual values).
xlabel (str) – The label for the x-axis.
ylabel (str) – The label for the y-axis.
title (str) – The title for the plot.

Returns:

None – This function does not return any value; it displays the plot directly.

Dependencies:

- matplotlib.pyplot as plt
- numpy as np
- scipy.stats.linregress
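
The regression statistics that ols_plot reports (slope, intercept, R-squared) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation, which relies on scipy.stats.linregress; the helper name ols_fit and the sample data are hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (slope, intercept, r_squared) for a simple least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a simple linear fit, R-squared equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data lying close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
slope, intercept, r_squared = ols_fit(x, y)
```

A slope near 1 and an intercept near 0 would place the fitted line on top of the 1:1 reference line that the plot draws for comparison.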

micromet.report.eddy_plots.plot_flux_vs_ustar(df, mode='night', ustar_col='USTAR', le_col='LE_1_1_1', h_col='H_1_1_1', netrad_col='NETRAD_1_1_2')[source]

Plot Latent Heat (LE), Sensible Heat (H), and their sum vs Friction Velocity (u*).

This diagnostic tool bins turbulent fluxes by atmospheric turbulence levels to identify the u* threshold and detect advective conditions (Oasis Effect).

Parameters:

df (DataFrame) – Input dataframe with a DatetimeIndex and required flux columns.
mode (str) – Filter for the analysis. ‘day’ uses Rn > 10 W/m², ‘night’ uses Rn <= 10 W/m².
ustar_col (str) – Column name for friction velocity [m/s].
le_col (str) – Column name for Latent Heat Flux [W/m²].
h_col (str) – Column name for Sensible Heat Flux [W/m²].
netrad_col (str) – Column name for Net Radiation [W/m²] used for day/night partitioning.

Return type:

None

Notes

Why this is important:

1. u* Threshold Detection: Under low turbulence (typically at night), measured fluxes often underestimate the true exchange. By plotting flux vs. $u_*$, we look for the “plateau”: the $u_*$ value above which the flux becomes independent of wind speed. This is your $u_*$ filter cutoff.
2. The Oasis Effect: In irrigated fields or wetlands surrounded by dry areas, LE can exceed Net Radiation ($R_n$). This plot helps identify if negative Sensible Heat ($H < 0$) is “feeding” evaporation, a classic indicator of regional advection.
3. Energy Balance Verification: Monitoring the sum $(H + LE)$ relative to $u_*$ helps determine if the “missing energy” in your balance is correlated with poor mixing or specific wind conditions.
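
The binning idea behind this diagnostic can be sketched in pure Python as follows. The bin width of 0.1 m/s, the helper name bin_flux_by_ustar, and the sample values are assumptions for illustration; the function's actual binning scheme is not documented here.

```python
from collections import defaultdict
from statistics import mean


def bin_flux_by_ustar(ustar, flux, bin_width=0.1):
    """Average flux within fixed-width u* bins.

    Returns a dict mapping each bin's lower edge to the mean flux in that bin.
    """
    bins = defaultdict(list)
    for u, f in zip(ustar, flux):
        bins[int(u // bin_width)].append(f)
    return {round(i * bin_width, 10): mean(v) for i, v in sorted(bins.items())}


# Hypothetical values: flux rises with u* and then levels off.
ustar = [0.05, 0.08, 0.15, 0.18, 0.25, 0.28, 0.35, 0.38]
flux = [2.0, 4.0, 10.0, 12.0, 19.0, 21.0, 20.0, 20.0]
binned = bin_flux_by_ustar(ustar, flux)
```

In this made-up series the binned mean stops increasing from the 0.2 m/s bin onward, which is the kind of plateau the note describes.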

micromet.report.eddy_plots.plot_interactive_regression_with_color(df, x_col, y_col, color_col, plot_size=500)[source]

Generates an interactive scatter plot with a linear regression line, a 1:1 line, and color-coding using Plotly. Index and variable values appear on hover.

Parameters:

df (DataFrame) – The DataFrame containing the data. DatetimeIndex is automatically handled for hover.
x_col (str) – The name of the column for the x-axis.
y_col (str) – The name of the column for the y-axis.
color_col (str) – The name of the column used to color the scatter plot points.

Return type:

None

micromet.report.eddy_plots.plot_linear_regression_with_color(data, x_col, y_col, color_col, output_path=None, print_plot=False)[source]

Generates a scatter plot with a linear regression line and a 1:1 line for data analysis.

This function is designed for plotting any three numerical columns from a pandas DataFrame. It performs a linear regression between the specified x and y columns and uses a third column to color the data points.

Parameters:

data (pandas.DataFrame) – The DataFrame containing the data to be plotted. It must contain the columns specified by x_col, y_col, and color_col.
x_col (str) – The name of the column for the x-axis, representing the independent variable.
y_col (str) – The name of the column for the y-axis, representing the dependent variable.
color_col (str) – The name of the column used to color the scatter plot points, useful for visualizing a third variable.

Returns:

None – This function displays a plot using matplotlib.

Dependencies:

- matplotlib.pyplot
- scipy.stats
- pandas (assumed for the input ‘data’ DataFrame)

The plot includes:

- Scatter points of y_col vs. x_col.
- A colorbar representing color_col. The twilight colormap is used, which is ideal for cyclical data.
- A linear regression best-fit line with its slope and R-squared value.
- A 1:1 line for visual comparison.
- A legend, a grid, and auto-adjusted axis labels based on the input column names.

micromet.report.eddy_plots.plot_wind_rose_from_df(df, wd_col, ws_col, title=None, save_path=None)[source]

Generates and plots a wind rose from a pandas DataFrame.

This function creates a wind rose plot using the specified wind direction and wind speed columns from a DataFrame. The plot displays the frequency of wind coming from different directions and the distribution of wind speeds.

Parameters:

df (pandas.DataFrame) – The DataFrame containing the wind data.
wd_col (str) – The name of the column in df that contains the wind direction data (in degrees).
ws_col (str) – The name of the column in df that contains the wind speed data.
title (str, optional) – The title for the wind rose plot. If not provided, no title will be set.
save_path (str, optional) – The file path to save the plot. If not provided, the plot will not be saved. Example: ‘my_wind_rose_plot.png’

Returns:

This function displays and/or saves a plot.

Return type:

None

micromet.report.eddy_plots.plotlystuff(datasets, colnames, chrttypes=None, datatitles=None, chrttitle='', colors=None, two_yaxes=False, axisdesig=None, axislabels=['Levels', 'Barometric Pressure'], opac=None, plot_height=300)[source]

Plots one or more datasets on a shared set of axes.

Parameters:

datasets – List of one or more datasets to plot; each must have a datetime index.
colnames – List of one or more column names to plot on the y-axis; must be one column name per dataset.
chrttypes – List of chart types to plot; defaults to line; can include lines and markers (points).
colors – List of colors to use in plots; defaults to [‘#228B22’, ‘#FF1493’, ‘#5acafa’, ‘#663399’, ‘#FF0000’].
two_yaxes – Presumably whether the data should be shown on two y-axes or one.
axisdesig – Purpose uncertain.
axislabels – List of names used to label the y-values of each dataset.
opac – List of opacity values for the datasets; default is 0.8.
plot_height – Integer height of the plot; default is 300.

micromet.report.eddy_plots.student_resid_plot(df, var1, var2, title)[source]

Generates an interactive scatter plot of studentized residuals from an OLS regression.

This function performs a simple Ordinary Least Squares (OLS) regression using var1 as the independent variable and var2 as the dependent variable from the input DataFrame df. It then calculates the studentized residuals and plots them against the DataFrame’s index (assumed to be temporal, e.g., ‘Date’). The plot includes horizontal lines at the ±1.96 thresholds and highlights points that exceed these thresholds as outliers.

Parameters:

df (pandas.DataFrame) – The input DataFrame containing the data for regression. The DataFrame’s index is used for the x-axis in the plot.
var1 (str) – The name of the column in df to be used as the independent variable in the OLS regression.
var2 (str) – The name of the column in df to be used as the dependent variable in the OLS regression.
title (str) – The title for the plot.

Returns:

None – This function does not return any value; it displays an interactive Plotly graph directly.

Dependencies:

- statsmodels.formula.api.ols
- plotly.express as px
- numpy as np (for np.abs)

micromet.report.fix_g_values module

micromet.report.fix_g_values.apply_limits_to_vars(df, limit_check_vars, limits)[source]

Sets values in specified columns that fall outside a given [min, max] range to NaN.

Parameters:

df (pd.DataFrame) – The input DataFrame.
limit_check_vars (list of str) – List of column names to apply the limits to.
limits (list or tuple) – A two-element sequence [min_value, max_value].

Returns:

A new DataFrame with out-of-range values set to NaN.

Return type:

pd.DataFrame
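
The range check described for apply_limits_to_vars can be sketched on a plain list standing in for one DataFrame column. Treating both bounds as inclusive is an assumption, and mask_out_of_range is a hypothetical helper rather than part of micromet.

```python
import math


def mask_out_of_range(values, limits):
    """Replace values outside [min_value, max_value] with NaN.

    Bounds are treated as inclusive here, which is an assumption.
    """
    low, high = limits
    return [v if low <= v <= high else math.nan for v in values]


# Hypothetical SG readings checked against limits of [-100, 250].
sg = [12.0, -150.0, 300.0, 80.0]
masked = mask_out_of_range(sg, (-100, 250))
```

Values that are already NaN fail the comparison and therefore stay NaN.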

micromet.report.fix_g_values.calc_mean_value_for_soil(df, var='G')[source]

Calculates the mean of two related variables (var_1_1_1 and var_2_1_1) and stores the result in a third variable (var_1_1_A).

Parameters:

df (pd.DataFrame) – The input DataFrame.
var (str, optional) – The variable prefix (e.g., ‘G’, ‘SG’). Default is ‘G’.

Returns:

A new DataFrame with the calculated mean value in the ‘var_1_1_A’ column.

Return type:

pd.DataFrame

micromet.report.fix_g_values.calculate_new_g_value(df, plate_num)[source]

Calculates the new G value (G_{plate_num}_1_1) by summing the G_PLATE and SG components.

Note: The sum operation automatically results in NaN if either source value is NaN.

Parameters:

df (pd.DataFrame) – The input DataFrame.
plate_num (str) – The plate number (e.g., ‘1’ or ‘2’) used to construct column names.

Returns:

A new DataFrame with the calculated G value.

Return type:

pd.DataFrame

micromet.report.fix_g_values.correct_vars_by_factor(df, correction_factor=0.3125, vars_to_correct=['SG_1_1_1', 'SG_2_1_1'], min_correction_date='2010-01-01', max_correction_date='2030-01-01')[source]

Applies a multiplicative correction factor to specified variables within a defined time window. The default min and max correction dates are intended to correct the full range of values.

Parameters:

df (pd.DataFrame) – The input DataFrame, expected to have a DatetimeIndex.
correction_factor (float, optional) – The factor by which the variables should be multiplied. Default is 0.3125 (that is, 0.05 / 0.16).
vars_to_correct (list of str, optional) – List of column names to apply the correction to.
min_correction_date (str, optional) – Start date (inclusive) for the correction window.
max_correction_date (str, optional) – End date (inclusive) for the correction window.

Returns:

A new DataFrame with the specified columns corrected within the date range.

Return type:

pd.DataFrame
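
As a rough illustration of a windowed multiplicative correction, the sketch below works on (timestamp, value) pairs instead of a DataFrame. The helper correct_in_window and the sample records are hypothetical; only the default factor and the inclusive date window come from the description above.

```python
from datetime import datetime


def correct_in_window(records, factor, start, end):
    """Multiply values whose timestamp lies in [start, end]; leave others as-is."""
    return [(ts, v * factor if start <= ts <= end else v) for ts, v in records]


factor = 0.05 / 0.16  # the documented default, 0.3125

# Hypothetical records: one before the default window, one inside it.
records = [
    (datetime(2009, 12, 31, 23, 30), 16.0),
    (datetime(2024, 6, 19, 12, 0), 16.0),
]
corrected = correct_in_window(
    records, factor, datetime(2010, 1, 1), datetime(2030, 1, 1)
)
```

Only the second record is scaled (16.0 becomes 5.0); the first falls before the window and is left unchanged.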

micromet.report.fix_g_values.run_soil_data_pipeline(df_input, sg_correction_factor=0.3125, sg_limits=[-100, 250], g_limits=[-250, 400])[source]

Executes the full data processing pipeline for soil data.

The steps are:

1. Applying the correction factor to SG variables.
2. Applying limits/quality control to SG variables.
3. Calculating G values for plate 1 and plate 2 by summing the G_PLATE and SG components.
4. Applying limits/quality control to the calculated G variables.
5. Calculating the mean G value (G_1_1_A).
6. Applying limits/quality control to the mean G variable.

Parameters:

df_input (pd.DataFrame) – The initial input DataFrame (e.g., ‘final_eddy’).
sg_correction_factor (float) – Correction factor for SG variables.
sg_limits (list or tuple) – Min/max limits for SG variables.
g_limits (list or tuple) – Min/max limits for G variables (G_1_1_1, G_2_1_1, G_1_1_A).

Returns:

The final processed DataFrame.

Return type:

pd.DataFrame

micromet.report.gap_summary module

micromet.report.gap_summary.compare_gap_summaries(gaps_a, gaps_b, expected_freq='30min', min_steps=1)[source]

Compare two gap-summary DataFrames (from summarize_gaps) and highlight where one dataset has coverage that could fill the other’s gaps.

Parameters:

gaps_a (DataFrame) – A DataFrame returned by summarize_gaps. Must include the columns: [‘STATIONID’, ‘COLUMN’, ‘GAP_START’, ‘GAP_END’, ‘N_STEPS_MISSING’, ‘HOURS_MISSING’, ‘GAP_KIND’].
gaps_b (DataFrame) – A DataFrame returned by summarize_gaps. Must include the columns: [‘STATIONID’, ‘COLUMN’, ‘GAP_START’, ‘GAP_END’, ‘N_STEPS_MISSING’, ‘HOURS_MISSING’, ‘GAP_KIND’].
expected_freq (str) – Sampling frequency. Used to compute discrete step counts and to treat intervals on the expected time grid.
min_steps (int) – Only report fillable segments with at least this many steps.

Returns:

One row per fillable segment. Columns:

TARGET_DATASET (“A” or “B”)

SOURCE_DATASET (“B” or “A”)

STATIONID

COLUMN

TARGET_GAP_START

TARGET_GAP_END

FILLABLE_START

FILLABLE_END

N_STEPS_FILLABLE

HOURS_FILLABLE

TARGET_N_STEPS_MISSING

COVERAGE_RATIO (steps_fillable / TARGET_N_STEPS_MISSING)

TARGET_GAP_KIND

Return type:

DataFrame
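
The COVERAGE_RATIO column can be illustrated with a small sketch on a 30-minute grid. It assumes that GAP_START and GAP_END are both inclusive grid timestamps, which the description does not state; the helpers grid and coverage_ratio are hypothetical and only mirror the idea, not the library's code.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=30)


def grid(start, end, step=STEP):
    """Timestamps from start to end (inclusive) on a regular grid."""
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += step
    return stamps


def coverage_ratio(target_gap, source_gaps):
    """Return (n_steps_fillable, ratio) for one target gap.

    A step is fillable when the source dataset is not also missing there.
    """
    target_steps = grid(*target_gap)
    source_missing = {t for gap in source_gaps for t in grid(*gap)}
    fillable = [t for t in target_steps if t not in source_missing]
    return len(fillable), len(fillable) / len(target_steps)


# Dataset A is missing 00:00-03:30 (8 steps); dataset B is missing 02:00-05:00.
day = datetime(2024, 6, 19)
gap_a = (day, day + timedelta(hours=3, minutes=30))
gaps_b = [(day + timedelta(hours=2), day + timedelta(hours=5))]
n_fillable, ratio = coverage_ratio(gap_a, gaps_b)
```

Here B covers the first four steps of A's gap, so half of the gap is fillable.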

micromet.report.gap_summary.summarize_gaps(df, station_level='STATIONID', time_level='DATETIME_END', expected_freq='30min', columns=None)[source]

Summarize runs of missing data (NaNs) per column for each station in a MultiIndex DataFrame indexed by (station, datetime).

Parameters:

df (DataFrame) – Input DataFrame with a MultiIndex (station_level, time_level).
station_level (str) – Name of the station level in the index.
time_level (str) – Name of the datetime level in the index.
expected_freq (str) – The expected sampling frequency. Used to build a complete timeline per station so that missing timestamps become explicit NaNs.
columns (list | None) – Subset of columns to analyze. Defaults to all columns.

Returns:

Columns:

STATIONID
COLUMN
GAP_START
GAP_END
N_STEPS_MISSING
HOURS_MISSING
GAP_KIND (“MissingTimestamp”, “NaN”, or “Mixed”)

Return type:

DataFrame

micromet.report.graphs module

micromet.report.graphs.bland_alt_plot(edmet, compare_dict, station, alpha=0.5, logger=None)[source]

Create a Bland-Altman plot to assess agreement between instruments.

This function generates a Bland-Altman plot to visualize the agreement between two instruments, including the bias and limits of agreement.

Parameters:

edmet (pd.DataFrame) – A DataFrame with a DatetimeIndex containing measurement data.
compare_dict (dict) – A dictionary mapping instrument column names to their metadata.
station (str) – The identifier for the station, used in the plot title.
alpha (float, optional) – The transparency level for the plot elements. Defaults to 0.5.
logger (Logger) – A logger for outputting statistics. Defaults to None.

Returns:

A tuple containing the matplotlib Figure and Axes objects.

Return type:

tuple[plt.Figure, plt.Axes]

micromet.report.graphs.energy_sankey(df, date_text='2024-06-19 12:00', logger=None)[source]

Create a Sankey diagram of energy balance for a specific time.

This function generates a Sankey diagram to visualize the flow of energy components in a system, such as incoming and outgoing radiation, and heat fluxes.

Parameters:

df (pd.DataFrame) – A DataFrame with a DatetimeIndex and columns for energy components like ‘SW_IN’, ‘LW_IN’, ‘NETRAD’, ‘G’, ‘LE’, ‘H’.
date_text (str, optional) – The date and time for which to plot the energy balance. Defaults to “2024-06-19 12:00”.
logger (Logger) – A logger for outputting debug information. Defaults to None.

Returns:

A Plotly Figure object containing the Sankey diagram.

Return type:

go.Figure

micromet.report.graphs.mean_diff_plot(m1, m2, sd_limit=1.96, ax=None, scatter_kwds=None, mean_line_kwds=None, limit_lines_kwds=None)[source]

Construct a Tukey/Bland-Altman Mean Difference Plot.

This plot shows the difference between two measurements against their mean, which is useful for assessing the agreement between two measurement methods.

Parameters:

m1 (array_like) – A 1-D array of measurements.
m2 (array_like) – A 1-D array of measurements.
sd_limit (float, optional) – The number of standard deviations for the limits of agreement. Defaults to 1.96.
ax (plt.Axes, optional) – An existing matplotlib Axes to draw the plot on. Defaults to None.
scatter_kwds (dict, optional) – Keyword arguments for the scatter plot. Defaults to None.
mean_line_kwds (dict, optional) – Keyword arguments for the mean difference line. Defaults to None.
limit_lines_kwds (dict, optional) – Keyword arguments for the limits of agreement lines. Defaults to None.

Returns:

The matplotlib Figure object.

Return type:

plt.Figure
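
The quantities drawn on a mean difference plot can be computed with a short sketch. Using the population standard deviation of the differences is an assumption, and bland_altman_stats is a hypothetical helper that does no plotting.

```python
from statistics import mean, pstdev


def bland_altman_stats(m1, m2, sd_limit=1.96):
    """Return per-pair means, differences, bias, and limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(m1, m2)]
    diffs = [a - b for a, b in zip(m1, m2)]
    bias = mean(diffs)
    # Population standard deviation is an assumption made for this sketch.
    half_width = sd_limit * pstdev(diffs)
    return means, diffs, bias, (bias - half_width, bias + half_width)


# Hypothetical paired measurements from two instruments.
m1 = [10.0, 12.0, 14.0, 16.0]
m2 = [9.0, 12.5, 13.0, 16.5]
means, diffs, bias, (lower, upper) = bland_altman_stats(m1, m2)
```

The plot places means on the x-axis and diffs on the y-axis, with horizontal lines at bias, lower, and upper.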

micromet.report.graphs.mean_squared_error(series1, series2)[source]

Calculate the Mean Squared Error (MSE) between two series.

MSE is a measure of the average squared difference between the estimated values and the actual value.

Parameters:

series1 (Series) – The first data series.
series2 (Series) – The second data series.

Returns:

The Mean Squared Error between the two series.

Return type:

float

Raises:

ValueError – If the input series are not of the same length.
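
A minimal sketch of the calculation, including the length check, might look like the following. The name mse and the sample lists are hypothetical; the library version operates on pandas Series.

```python
def mse(series1, series2):
    """Mean of the squared element-wise differences between two sequences."""
    if len(series1) != len(series2):
        raise ValueError("Input series must have the same length.")
    return sum((a - b) ** 2 for a, b in zip(series1, series2)) / len(series1)


# Only the last pair differs (by 2), so the MSE is 4 / 3.
error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```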

micromet.report.graphs.plot_timeseries_daterange(input_df, selected_station, selected_field, start_date, end_date)[source]

Plot a time series for a specific station and variable over a date range.

This function filters a DataFrame by station and date range, and then plots the selected variable over time.

Parameters:

input_df (pd.DataFrame) – A DataFrame with a MultiIndex (‘station’, ‘timestamp’).
selected_station (str) – The identifier of the station to plot.
selected_field (str) – The name of the column (variable) to plot.
start_date (str or pd.Timestamp) – The start date of the time range.
end_date (str or pd.Timestamp) – The end date of the time range.

Return type:

None

micromet.report.graphs.save_plot(b)[source]

Save the current matplotlib figure to a file.

This function is intended to be used as a callback for an interactive widget, such as a button in a Jupyter notebook.

Parameters:

b (object) – The triggering widget event (not used in the function).

Return type:

None

micromet.report.graphs.scatterplot_instrument_comparison(edmet, compare_dict, station, logger=None)[source]

Generate a scatter plot comparing two instrument measurements.

This function creates a scatter plot to compare measurements from two instruments, including a linear regression fit and a 1:1 reference line.

Parameters:

edmet (pd.DataFrame) – A DataFrame with a DatetimeIndex containing the measurement data.
compare_dict (dict) – A dictionary mapping instrument column names to their metadata.
station (str) – The identifier for the station, used in the plot title.
logger (Logger) – A logger for outputting regression statistics. Defaults to None.

Returns:

A tuple containing the slope, intercept, R-squared, p-value, standard error, and the matplotlib Figure and Axes objects.

Return type:

tuple

micromet.report.recalculate_albedo module

micromet.report.recalculate_albedo.update_albedo(df, suffix, threshold=0.1)[source]

Calculates the Shortwave Albedo percentage based on incoming and outgoing radiation.

Following the EasyFlux methodology, this function computes albedo where solar radiation is sufficient and ensures that missing sensor data or physical impossibilities (like night time or sensor shading) are handled correctly.

Parameters:

df (pd.DataFrame) – The dataset containing radiation measurements.
suffix (str) – The sensor or level identifier used in column naming (e.g., ‘1’ for ‘SW_IN_1’).
threshold (float) – Minimum incoming radiation (W/m²) to consider valid for the albedo calculation. Commonly 0.1 for NR01 and 10 for SN500.

Returns:

Calculated albedo values as a percentage (0-100%). Returns 0 during night/invalid conditions and NaN if inputs are missing.

Return type:

np.ndarray
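
For a single pair of readings, the rules described above can be sketched as follows. The formula 100 × SW_OUT / SW_IN and the strict comparison against the threshold are assumptions based on the usual definition of albedo; albedo_percent is a hypothetical scalar helper and does not cover every invalid condition the function may handle.

```python
import math


def albedo_percent(sw_in, sw_out, threshold=0.1):
    """Albedo (%) for one reading: NaN if inputs are missing, 0 below threshold."""
    if math.isnan(sw_in) or math.isnan(sw_out):
        return math.nan
    if sw_in <= threshold:
        # Night or too little incoming radiation for a meaningful ratio.
        return 0.0
    return 100.0 * sw_out / sw_in


midday = albedo_percent(800.0, 160.0)
night = albedo_percent(0.0, 0.0)
missing = albedo_percent(math.nan, 120.0)
```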

micromet.report.tools module

micromet.report.tools.aggregate_to_daily_centroid(df, date_column='Timestamp', x_column='X', y_column='Y', weighted=True)[source]

Aggregate half-hourly coordinate data to daily centroids.

This function calculates the daily centroid of a set of coordinates, with an option to weight the calculation by another variable.

Parameters:

df (pd.DataFrame) – A DataFrame with timestamp and coordinate data.
date_column (str, optional) – The name of the column containing the timestamps. Defaults to “Timestamp”.
x_column (str, optional) – The name of the column with the X coordinates. Defaults to “X”.
y_column (str, optional) – The name of the column with the Y coordinates. Defaults to “Y”.
weighted (bool, optional) – If True, the centroid calculation is weighted by the ‘ET’ column. Defaults to True.

Returns:

A DataFrame with the aggregated daily centroids.

Return type:

pd.DataFrame
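
A weighted daily centroid can be sketched with plain tuples in place of DataFrame rows. The tuple layout (timestamp, x, y, weight) and the helper name daily_weighted_centroid are assumptions for illustration; in the function itself the weight comes from the ‘ET’ column.

```python
from collections import defaultdict
from datetime import datetime


def daily_weighted_centroid(rows):
    """rows: iterable of (timestamp, x, y, weight). Returns {date: (x, y)}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for timestamp, x, y, weight in rows:
        acc = totals[timestamp.date()]
        acc[0] += x * weight
        acc[1] += y * weight
        acc[2] += weight
    # Days whose total weight is zero are skipped to avoid dividing by zero.
    return {day: (sx / w, sy / w) for day, (sx, sy, w) in totals.items() if w > 0}


# Hypothetical half-hourly coordinates with ET-like weights.
rows = [
    (datetime(2024, 6, 19, 12, 0), 0.0, 0.0, 1.0),
    (datetime(2024, 6, 19, 12, 30), 10.0, 20.0, 3.0),
    (datetime(2024, 6, 20, 12, 0), 4.0, 8.0, 2.0),
]
centroids = daily_weighted_centroid(rows)
```

On the first day the heavier second point pulls the centroid to (7.5, 15.0) rather than the unweighted midpoint (5.0, 10.0).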

micromet.report.tools.clean_extreme_variations(df, fields=None, frequency='D', variation_threshold=3.0, null_value=-9999, min_periods=2, replacement_method='nan')[source]

Clean extreme variations from time series data.

This function identifies and replaces extreme values in a DataFrame using one of several methods.

Parameters:

df (DataFrame) – A DataFrame with a DatetimeIndex.
fields (Union[str, List[str]]) – The column(s) to clean. If None, all numeric columns are used. Defaults to None.
frequency (str) – The frequency for analyzing variations. Defaults to ‘D’.
variation_threshold (float) – The threshold for detecting extreme variations. Defaults to 3.0.
null_value (Union[float, int]) – A value to treat as null. Defaults to -9999.
min_periods (int) – The minimum number of observations for calculation. Defaults to 2.
replacement_method (str) – The method for replacing extreme values (‘nan’, ‘interpolate’, ‘mean’, ‘median’). Defaults to ‘nan’.

Returns:

A dictionary containing the cleaned data, a summary of the cleaning process, and the points that were removed.

Return type:

Dict[str, DataFrame]

micromet.report.tools.compute_Cw(sigma_w, u_star, target=1.25)[source]

Compute the vertical velocity correction factor, Cw.

This function calculates Cw based on the ratio of the standard deviation of vertical velocity (sigma_w) to the friction velocity (u_star).

Parameters:

sigma_w (float) – The standard deviation of the vertical velocity.
u_star (float) – The friction velocity.
target (float, optional) – The target ratio for the correction. Defaults to 1.25.

Returns:

The calculated correction factor, Cw. Returns 1.0 if no correction is needed, or NaN if the ratio is invalid.

Return type:

float

micromet.report.tools.detect_extreme_variations(df, fields=None, frequency='D', variation_threshold=3.0, null_value=-9999, min_periods=2)[source]

Detect extreme variations in time series data.

This function analyzes one or more fields in a DataFrame to identify points that deviate significantly from the mean within a given time frequency.

Parameters:

df (DataFrame) – A DataFrame with a DatetimeIndex.
fields (Union[str, List[str]]) – The column(s) to analyze. If None, all numeric columns are used. Defaults to None.
frequency (str) – The frequency for grouping data (e.g., ‘D’ for daily). Defaults to ‘D’.
variation_threshold (float) – The number of standard deviations from the mean to be considered an extreme variation. Defaults to 3.0.
null_value (Union[float, int]) – A value to be treated as null. Defaults to -9999.
min_periods (int) – The minimum number of valid observations required to calculate variation. Defaults to 2.

Returns:

A dictionary containing the calculated variations, a boolean DataFrame of extreme points, and a summary of the analysis.

Return type:

Dict[str, DataFrame]
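
The detection rule, flagging points more than variation_threshold standard deviations from the mean of their daily group, can be sketched as below. Using the sample standard deviation and a calendar-day grouping are assumptions, and flag_extremes is a hypothetical helper that returns only the boolean flags.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def flag_extremes(records, variation_threshold=3.0, null_value=-9999, min_periods=2):
    """records: list of (timestamp, value). Returns a parallel list of booleans."""
    by_day = defaultdict(list)
    for timestamp, value in records:
        if value != null_value:
            by_day[timestamp.date()].append(value)

    flags = []
    for timestamp, value in records:
        group = by_day[timestamp.date()]
        # stdev needs at least two observations, whatever min_periods says.
        if value == null_value or len(group) < max(min_periods, 2):
            flags.append(False)
            continue
        spread = stdev(group)
        z = abs(value - mean(group)) / spread if spread > 0 else 0.0
        flags.append(z > variation_threshold)
    return flags


# Hypothetical half-hourly series: eleven steady values, one spike, one null.
start = datetime(2024, 6, 19)
values = [10.0] * 11 + [100.0] + [-9999]
records = [(start + timedelta(minutes=30 * i), v) for i, v in enumerate(values)]
flags = flag_extremes(records)
```

Note that with a 3-sigma threshold a single spike can only be flagged when its group is large enough; in a group of fewer than eleven points no value can lie more than three sample standard deviations from the mean.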

micromet.report.tools.filter_near_neutral(z_over_L, lower=-0.1, upper=0.0)[source]

Filter for near-neutral atmospheric stability conditions.

This function returns a boolean mask indicating where the stability parameter z/L falls within a specified range for near-neutral conditions.

Parameters:

z_over_L (array_like) – An array or Series of z/L values.
lower (float, optional) – The lower bound for near-neutral stability. Defaults to -0.1.
upper (float, optional) – The upper bound for near-neutral stability. Defaults to 0.0.

Returns:

A boolean mask that is True for near-neutral conditions.

Return type:

np.ndarray
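
The mask amounts to an element-wise range test, sketched here on a plain list. Whether the bounds are inclusive is not stated above, so the inclusive comparison is an assumption; near_neutral_mask is a hypothetical stand-in.

```python
def near_neutral_mask(z_over_l, lower=-0.1, upper=0.0):
    """True where lower <= z/L <= upper (inclusive bounds assumed)."""
    return [lower <= value <= upper for value in z_over_l]


# Hypothetical stability values: unstable, near-neutral, neutral, stable, missing.
mask = near_neutral_mask([-0.5, -0.05, 0.0, 0.2, float("nan")])
```

NaN values fail both comparisons and are therefore excluded from the near-neutral set.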

micromet.report.tools.find_gaps(df, columns, missing_value=-9999, min_gap_periods=1)[source]

Find and report gaps in time series data.

This function identifies continuous periods of missing data (either NaN or a specified missing value) in one or more columns of a DataFrame.

Parameters:

df (pd.DataFrame) – A DataFrame with a regular time series index.
columns (str or list of str) – The column(s) to check for gaps.
missing_value (numeric, optional) – A specific value to be treated as missing. Defaults to -9999.
min_gap_periods (int, optional) – The minimum number of consecutive missing periods to be considered a gap. Defaults to 1.

Returns:

A DataFrame with information about each detected gap, including start and end times, duration, and the number of missing records.

Return type:

pd.DataFrame
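
The run detection at the heart of gap finding can be sketched with itertools.groupby. This version reports index positions rather than timestamps, and find_missing_runs is a hypothetical helper, not the library function.

```python
import math
from itertools import groupby


def find_missing_runs(values, missing_value=-9999, min_gap_periods=1):
    """Return (first_index, last_index) for each run of missing values."""

    def is_missing(value):
        return value == missing_value or (
            isinstance(value, float) and math.isnan(value)
        )

    runs = []
    position = 0
    for missing, group in groupby(values, key=is_missing):
        length = len(list(group))
        if missing and length >= min_gap_periods:
            runs.append((position, position + length - 1))
        position += length
    return runs


# Hypothetical series mixing NaN and the -9999 sentinel.
series = [1.2, float("nan"), -9999, 3.4, 3.5, -9999, 4.0]
all_runs = find_missing_runs(series)
long_runs = find_missing_runs(series, min_gap_periods=2)
```

Adjacent NaN and sentinel values merge into one run, and raising min_gap_periods drops the single-step gap at index 5.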

micromet.report.tools.find_irr_dates(df, swc_col='SWC_1_1_1', do_plot=False, dist=20, height=30, prom=0.6)[source]

Detect irrigation events from a soil water content time series.

This function identifies peaks in soil water content (SWC) data that are likely to be irrigation events, based on their prominence, height, and the distance between them.

Parameters:

df (pd.DataFrame) – A DataFrame with a DatetimeIndex containing the SWC data.
swc_col (str, optional) – The name of the column containing SWC data. Defaults to ‘SWC_1_1_1’.
do_plot (bool, optional) – If True, a plot of the SWC time series with the detected irrigation events will be displayed. Defaults to False.
dist (int, optional) – The minimum number of time steps between detected peaks. Defaults to 20.
height (float, optional) – The minimum height of the peaks to be considered irrigation events. Defaults to 30.
prom (float, optional) – The minimum prominence of the peaks. Defaults to 0.6.

Returns:

A tuple containing the dates of the detected irrigation events and the corresponding SWC values.

Return type:

tuple[pd.DatetimeIndex, pd.Series]

micromet.report.tools.plot_gaps(gaps_df, title='Time Series Data Gaps')[source]

Create a Gantt chart visualization of gaps in time series data.

This function takes a DataFrame of gap information and creates a Gantt chart to visualize the duration and timing of data gaps for different variables.

Parameters:

gaps_df (pd.DataFrame) – A DataFrame containing gap information, as returned by the find_gaps function.
title (str, optional) – The title for the plot. Defaults to “Time Series Data Gaps”.

Returns:

An interactive Plotly figure showing the data gaps, or None if there are no gaps to plot.

Return type:

go.Figure or None

micromet.report.tools.polar_to_cartesian_dataframe(df, wd_column='WD', dist_column='Dist')[source]

Convert polar coordinates in a DataFrame to Cartesian coordinates.

Parameters:

df (pd.DataFrame) – A DataFrame containing the polar coordinate data.
wd_column (str, optional) – The name of the column with the wind direction in degrees. Defaults to “WD”.
dist_column (str, optional) – The name of the column with the distance. Defaults to “Dist”.

Returns:

The DataFrame with added ‘X’ and ‘Y’ columns.

Return type:

pd.DataFrame
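
One common convention for this conversion treats the direction as a compass bearing measured clockwise from north, so that X points east and Y points north. Whether polar_to_cartesian_dataframe uses this convention (or flips the sign to account for wind direction being the direction the wind comes from) is not stated above, so the sketch below is an assumption.

```python
import math


def polar_to_cartesian(wd_degrees, dist):
    """Convert a compass bearing (clockwise from north) and distance to (x, y)."""
    theta = math.radians(wd_degrees)
    return dist * math.sin(theta), dist * math.cos(theta)


# A bearing of 90 degrees points due east; 0 degrees points due north.
east_x, east_y = polar_to_cartesian(90.0, 100.0)
north_x, north_y = polar_to_cartesian(0.0, 100.0)
```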

micromet.report.tools.subset_year(df, year, subset_type='growing_season', gs_start='04-01', gs_end='10-31')[source]

Subsets a DataFrame based on a seasonal window for a specific year.

This function filters data based on a DatetimeIndex. It can extract either the “growing season” within a single year or the “winter” period that spans from the end of the specified year to the start of the next.

Parameters:

df (pd.DataFrame) – The input DataFrame. Must have a DatetimeIndex.
year (int) – The year to perform the subsetting for.
subset_type (str, optional) – The type of subset to return. Options are ‘growing_season’ or ‘winter’. Defaults to ‘growing_season’.
gs_start (str, optional) – The start date of the growing season in ‘MM-DD’ format. Defaults to ‘04-01’.
gs_end (str, optional) – The end date of the growing season in ‘MM-DD’ format. Defaults to ‘10-31’.

Returns:

A subsetted DataFrame if data is found; otherwise, None.

Return type:

pd.DataFrame or None

Raises:

AttributeError – If the DataFrame index is not a DatetimeIndex.
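
The two seasonal windows can be sketched as date ranges. The exact boundary handling (for example, whether winter starts the day after gs_end) is an assumption, and season_window is a hypothetical helper that returns only the window, not the filtered data.

```python
from datetime import date, timedelta


def season_window(year, subset_type="growing_season", gs_start="04-01", gs_end="10-31"):
    """Return (first_day, last_day) of the requested seasonal window."""
    start_month, start_day = map(int, gs_start.split("-"))
    end_month, end_day = map(int, gs_end.split("-"))
    if subset_type == "growing_season":
        return date(year, start_month, start_day), date(year, end_month, end_day)
    if subset_type == "winter":
        # Assumed: winter runs from the day after gs_end to the day before
        # the next year's gs_start.
        first = date(year, end_month, end_day) + timedelta(days=1)
        last = date(year + 1, start_month, start_day) - timedelta(days=1)
        return first, last
    raise ValueError(f"Unknown subset_type: {subset_type!r}")


growing = season_window(2023)
winter = season_window(2023, subset_type="winter")
```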

micromet.report.validate module

micromet.report.validate.compare_names_to_ameriflux(df_full, amflux)[source]

Cleans the column names of df_full by removing ‘_1’, ‘_2’, ‘_3’, and ‘_4’ suffixes, compares the cleaned names against an ‘amflux’ variable list, and returns a DataFrame of the results. Unmatched columns are also printed.

Parameters:

df_full – The DataFrame whose columns need to be cleaned and matched.
amflux – A DataFrame or Series that contains the ‘Variable’ column, or the Series of variables to match against.

Returns:

A DataFrame containing the original columns, the cleaned columns, and a boolean indicating whether each cleaned column is in the amflux list.

Return type:

DataFrame

micromet.report.validate.compare_to_raw(raw_file_path, micromet_df, test_var='NETRAD', threshold=0.1)[source]

Compares a specific variable between a raw data file and a micromet DataFrame.

The function reads a ‘raw’ DAT or CSV file from the provided path, merges it with the ‘micromet’ DataFrame based on TIMESTAMP to DATETIME_END fields, and calculates the absolute difference for a specified variable (test_var) between the two sources. It returns only the rows where this absolute difference is greater than the given threshold.

Args:

raw_file_path (str): The file path to the raw data CSV file. This file is: assumed to have a specific format (header on row 1, with rows 2 and 3 skipped).

micromet_df (pd.DataFrame): DataFrame containing the micrometeorological data. test_var (str, optional): The variable to compare (e.g., ‘LE’ for Latent Energy).

Defaults to ‘LE’. The function assumes the raw column is named ‘{test_var}_1_1_1’ and the micromet column is named ‘{test_var}’.

threshold (float, optional): The absolute difference threshold. Rows where: |raw_value - micromet_value| > threshold are returned. Defaults to 0.1.

Returns:

pd.DataFrame: A DataFrame containing the ‘DATETIME_END’ and the values of the test_var from both sources (‘{test_var}_1_1_1’ and ‘{test_var}’) for all rows where the absolute difference exceeds the threshold.
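A minimal sketch of the merge-and-threshold logic, using small in-memory frames in place of the raw file. Reading the file is omitted, and the data values are invented for illustration; only the column naming convention comes from the description above.

```python
import pandas as pd

stamps = pd.to_datetime(["2024-01-01 00:30", "2024-01-01 01:00", "2024-01-01 01:30"])

# Stand-in for the parsed raw file (illustrative values).
raw = pd.DataFrame({"TIMESTAMP": stamps, "NETRAD_1_1_1": [10.0, 20.0, 30.0]})
# Stand-in for the processed micromet data (illustrative values).
micromet = pd.DataFrame({"DATETIME_END": stamps, "NETRAD": [10.0, 20.5, 30.05]})

test_var, threshold = "NETRAD", 0.1
raw_col = f"{test_var}_1_1_1"

# Align the two sources on their timestamps.
merged = raw.merge(micromet, left_on="TIMESTAMP", right_on="DATETIME_END", how="inner")

# Keep only rows whose absolute difference exceeds the threshold.
abs_diff = (merged[raw_col] - merged[test_var]).abs()
mismatches = merged.loc[abs_diff > threshold, ["DATETIME_END", raw_col, test_var]]
print(mismatches)
```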

micromet.report.validate.data_diff_check(df1, df2)[source]

Calculates the percent of non-null fields that differ between two dataframes, for all column pairs with identical names.

Note: It can be helpful to round the dataframes first if you only want to note larger differences

Parameters:

df1 (DataFrame with a DatetimeIndex)
df2 (DataFrame with a DatetimeIndex)

Returns:

DataFrame with column names as the index and the percent (not proportion) of values that differ in that column between the two DataFrames

Return type:

pd.DataFrame
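One way to compute such a per-column percentage is sketched below. It assumes that "non-null" means both frames hold a value at that position; the package may define this differently, and the output column name percent_diff is hypothetical.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="30min")
df1 = pd.DataFrame({"TA": [1.0, 2.0, 3.0, None], "LE": [5.0, 5.0, 5.0, 5.0]}, index=idx)
df2 = pd.DataFrame({"TA": [1.0, 2.5, 3.0, 4.0], "LE": [5.0, 5.0, 5.0, 5.0]}, index=idx)

percent_diff = {}
for col in df1.columns.intersection(df2.columns):
    # Assumption: only positions where both values are present are counted.
    both = df1[col].notna() & df2[col].notna()
    differs = (df1[col] != df2[col]) & both
    percent_diff[col] = 100.0 * differs.sum() / both.sum()

result = pd.DataFrame({"percent_diff": pd.Series(percent_diff)})
print(result)
```

Rounding both frames first (for example with DataFrame.round) would make this count only the larger differences, as the note above suggests.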

micromet.report.validate.detect_sectional_offsets_indexed(df1, df2, value_col1, value_col2, freq='h', max_lag=24, window_size='7D')[source]

Evaluates time offsets between two datetime-indexed time series DataFrames in rolling sections. Returns the best lag for each time window.

Parameters:

df1, df2: DataFrames with a datetime index.
value_col1: Name of the column with numerical values to compare for df1.
value_col2: Name of the column with numerical values to compare for df2.
freq: Resampling frequency (e.g., ‘h’ for hourly).
max_lag: Maximum lag (in units of freq) to test.
window_size: Time window for sectional comparison (e.g., ‘7D’ or ‘12H’).

Returns:

DataFrame with lag information per window.
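The idea of a per-window lag search can be sketched as below. This is an assumption-laden illustration, not the package's code: it picks the lag with the largest absolute Pearson correlation, uses non-overlapping windows, and its sign convention and output column names may differ from the real function.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=14 * 24, freq="h")
rng = np.random.default_rng(0)
s1 = pd.Series(rng.normal(size=len(idx)), index=idx)
s2 = s1.shift(3)  # synthetic second series: s1 delayed by 3 hours

max_lag = 24
rows = []
for start, chunk1 in s1.groupby(pd.Grouper(freq="7D")):
    chunk2 = s2.reindex(chunk1.index)
    # Correlate chunk1 with chunk2 shifted by each candidate lag.
    corrs = pd.Series(
        {lag: chunk1.corr(chunk2.shift(lag)) for lag in range(-max_lag, max_lag + 1)}
    )
    best = corrs.abs().idxmax()
    rows.append({"window_start": start, "best_lag": int(best), "corr": corrs.loc[best]})

result = pd.DataFrame(rows)
print(result)
```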

micromet.report.validate.find_zero_chunks(df, var_name, days_threshold, aggregation_method='sum', tolerance=1e-06)[source]

Identifies continuous chunks of time where a variable is effectively zero or NaN, treating NaNs as part of the zero gap.

The function first resamples the high-frequency data to daily (‘D’) frequency using the specified aggregation method before checking for long zero periods.

Return type:

DataFrame

Args:

df: The pandas DataFrame with a DatetimeIndex (any frequency).

var_name: The name of the column to check for zero values.

days_threshold: The minimum number of consecutive days required to be identified as a “long zero chunk”.

aggregation_method: The method used to aggregate high-frequency data to daily. Options: ‘sum’ (default) or ‘max’.

tolerance: A small value used to check if a float is close to zero.

Returns:

A DataFrame listing the ‘Start Day’, ‘End Day’, and ‘Duration (Days)’ for each identified long zero chunk.
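The resample-then-scan approach described above might look like the following sketch. The run-detection technique shown here (labelling runs with a cumulative sum over value changes) is an assumption about one reasonable way to do it, and the precipitation-like column name P is made up.

```python
import numpy as np
import pandas as pd

# Ten days of half-hourly data; Jan 3 to Jan 6 are zero, with Jan 4 fully missing.
idx = pd.date_range("2024-01-01", periods=10 * 48, freq="30min")
values = np.ones(len(idx))
values[2 * 48 : 6 * 48] = 0.0
values[3 * 48 : 4 * 48] = np.nan
df = pd.DataFrame({"P": values}, index=idx)

days_threshold, tolerance = 3, 1e-6

daily = df["P"].resample("D").sum()
# Treat NaN days as part of the zero gap, as the description states.
is_zero = daily.isna() | (daily.abs() <= tolerance)

# Label consecutive runs of equal values so each run can be inspected.
run_id = is_zero.astype(int).diff().ne(0).cumsum()
chunks = []
for _, run in is_zero.groupby(run_id):
    if run.iloc[0] and len(run) >= days_threshold:
        chunks.append(
            {"Start Day": run.index[0], "End Day": run.index[-1], "Duration (Days)": len(run)}
        )

result = pd.DataFrame(chunks, columns=["Start Day", "End Day", "Duration (Days)"])
print(result)
```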

micromet.report.validate.plot_sectional_lags_plotly(corr_check, height=400)[source]

Plots the results of the detect_sectional_offsets_indexed function, showing the best lag for each time period.

micromet.report.validate.prep_for_comparison(df1, df2)[source]

Prepares two pandas DataFrames for comparison by:

1. Finding the intersection of columns.
2. Finding the intersection of indices.
3. Returning new DataFrames with only the common columns and indices.

Return type:

tuple[DataFrame, DataFrame]

Args:

df1: The first pandas DataFrame.

df2: The second pandas DataFrame.

Returns:

A tuple of two pandas DataFrames (df1_prep, df2_prep) ready for comparison.
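The three steps can be expressed with pandas Index.intersection, as in this short sketch. It mirrors the description rather than the package source, and the sample data is invented.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"TA": [1, 2, 3], "LE": [4, 5, 6]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
df2 = pd.DataFrame(
    {"TA": [1, 2, 9], "H": [7, 8, 9]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

common_cols = df1.columns.intersection(df2.columns)  # step 1
common_idx = df1.index.intersection(df2.index)       # step 2

# Step 3: restrict both frames to the shared columns and timestamps.
df1_prep = df1.loc[common_idx, common_cols]
df2_prep = df2.loc[common_idx, common_cols]
print(df1_prep.shape, df2_prep.shape)
```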

micromet.report.validate.review_lags(data1, data2, max_lag=4)[source]

Calculates the Cross-Correlation Function (CCF) to find the optimal time lag between two time series.

The optimal lag is the time shift that results in the maximum absolute correlation between the two series.

Parameters:

data1 (pd.Series) – The primary time series. Must have a DatetimeIndex and be the same frequency as data2.
data2 (pd.Series) – The secondary time series, which is shifted (lagged) relative to data1. Must have a DatetimeIndex and be the same frequency as data1.
max_lag (int) – The maximum number of periods (in both positive and negative directions) to test for the lag. The function tests lags from -max_lag to +max_lag.

Returns:

A Series containing the cross-correlation values. The index is the lag (in periods), and the values are the correlation coefficients.

Return type:

pd.Series

Notes

Lag Interpretation:

- A positive lag (k > 0) means data2 leads data1 by k periods.
- A negative lag (k < 0) means data1 leads data2 by |k| periods.

Missing Data: Pandas’ .corr() uses pairwise complete observations, meaning it only correlates non-NA values that align by date/time index. Shifting data2 introduces NAs at the start/end, automatically reducing the sample size, which is expected behavior for a lagged correlation.
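A lagged correlation of this kind can be sketched with Series.shift and Series.corr. The example assumes the lag is applied as data2.shift(k), which matches the sign convention in the notes; the synthetic data is constructed so that data2 leads data1 by two periods.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=200, freq="30min")
rng = np.random.default_rng(0)
data2 = pd.Series(rng.normal(size=len(idx)), index=idx)
data1 = data2.shift(2)  # data1 repeats data2 two periods later

max_lag = 4
# Assumption: lag k is tested by correlating data1 with data2 shifted by k periods.
ccf = pd.Series(
    {k: data1.corr(data2.shift(k)) for k in range(-max_lag, max_lag + 1)}
)

# The optimal lag has the largest absolute correlation.
best_lag = ccf.abs().idxmax()
print(best_lag)
```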

micromet.report.validate.validate_flags(df, flag_columns=['FC_SSITC_TEST', 'LE_SSITC_TEST', 'ET_SSITC_TEST', 'H_SSITC_TEST', 'TAU_SSITC_TEST'], allowed_values=[0, 1, 2])[source]

Checks specified DataFrame columns for values outside of the allowed set, including checking for NaN (missing) values.

This is typically used for quality control (QC) flag columns which should only contain specific integer values (like 0, 1, 2).

Parameters:

df (DataFrame) – The input DataFrame containing the flag columns.
flag_columns (List[str]) – A list of column names to check.
allowed_values (List[int]) – The list of values considered valid (defaults to [0, 1, 2]).

Returns:

A dictionary where keys are the column names that failed validation, and values are a list of the unique, invalid values found in that column, including the string “NaN” if missing values are present.

Return type:

Dict[str, List]
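The check described above could be written along these lines. This is a sketch under the stated return format, not the package's code, and the sample flag values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "FC_SSITC_TEST": [0, 1, 2, 1],
        "LE_SSITC_TEST": [0, 3, np.nan, 1],
    }
)
allowed_values = [0, 1, 2]

invalid = {}
for col in ["FC_SSITC_TEST", "LE_SSITC_TEST"]:
    series = df[col]
    # Unique non-missing values that fall outside the allowed set.
    bad = series[series.notna() & ~series.isin(allowed_values)].unique().tolist()
    # Missing values are reported with the string "NaN".
    if series.isna().any():
        bad.append("NaN")
    if bad:
        invalid[col] = bad

print(invalid)
```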

micromet.report.validate.validate_timeseries_data(df, interval_minutes, date_format='%Y%m%d%H%M')[source]

Performs several validation checks on a time-series DataFrame with a DatetimeIndex.

This version includes a type coercion step (astype(str) followed by a regex cleanup) to handle START/END columns that contain unparsed numeric data, such as floats ending in .0, which would otherwise cause comparison failures.

Return type:

Dict[str, Union[bool, str]]

Args:

df: The input DataFrame, expected to have a DatetimeIndex and columns named ‘TIMESTAMP_START’ and ‘TIMESTAMP_END’ containing datetime-like data.

interval_minutes: The expected interval between index entries (e.g., 30 or 60).

date_format: The format string for converting string/numeric dates (default is ‘%Y%m%d%H%M’).

Returns:

A dictionary summarizing the results of the three validation checks.
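The coercion step mentioned above can be illustrated as follows, together with one plausible interval check. The exact checks the function performs are not listed here, so the regularity test is an assumption added for illustration.

```python
import pandas as pd

# Timestamps that were read as floats, e.g. 202401010030.0
raw = pd.Series([202401010000.0, 202401010030.0, 202401010100.0])

# Coerce to string and strip the trailing ".0" left by the float representation.
cleaned = raw.astype(str).str.replace(r"\.0$", "", regex=True)
parsed = pd.to_datetime(cleaned, format="%Y%m%d%H%M")

# Assumed check: every step between consecutive timestamps equals the interval.
interval_minutes = 30
steps = parsed.diff().dropna()
regular = bool((steps == pd.Timedelta(minutes=interval_minutes)).all())
print(cleaned.tolist(), regular)
```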

micromet.report.validate.validate_timestamp_consistency(df)[source]

Checks for consistency between a standardized datetime column (DATETIME_END) and a string/integer timestamp column (TIMESTAMP_END) formatted as YYYYMMDDHHMM.

Parameters:

df (DataFrame) – The input DataFrame containing the columns to check.

Returns:

A DataFrame containing only the rows where the DATETIME_END and the converted TIMESTAMP_END columns do not match, along with both columns for inspection. Returns an empty DataFrame if all rows match.

Return type:

DataFrame
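A comparison of this kind might be sketched as below. It follows the Returns description in comparing DATETIME_END against a converted TIMESTAMP_END; the sample values are invented and this is not the package's implementation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "DATETIME_END": pd.to_datetime(["2024-01-01 00:30", "2024-01-01 01:00"]),
        "TIMESTAMP_END": [202401010030, 202401010130],  # second row is off by 30 min
    }
)

# Convert the YYYYMMDDHHMM integers to datetimes for comparison.
converted = pd.to_datetime(df["TIMESTAMP_END"].astype(str), format="%Y%m%d%H%M")

# Keep only rows where the two representations disagree.
mismatched = df.loc[df["DATETIME_END"] != converted, ["DATETIME_END", "TIMESTAMP_END"]]
print(mismatched)
```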

Module contents

This package contains modules for generating reports and visualizations.