micromet.qaqc package

Submodules

micromet.qaqc.data_cleaning module

micromet.qaqc.data_cleaning.adjust_wd(df, wd_col, public_azimuth, actual_azimuth, start_date, end_date)

Adjusts wind direction values in a DataFrame based on sensor orientation differences.

This function calculates the offset between a ‘public’ (reported) azimuth and the ‘actual’ sensor azimuth, then applies this correction to a specific time range within the dataset, ensuring the results stay within the [0, 360) degree range.

Args:

  • df (pd.DataFrame): The input dataframe, expected to have a DatetimeIndex.

  • wd_col (str): The name of the column containing wind direction values.

  • public_azimuth (float): The baseline or incorrectly assumed orientation angle.

  • actual_azimuth (float): The true physical orientation angle of the sensor.

  • start_date (str or pd.Timestamp): The beginning of the period to adjust (inclusive).

  • end_date (str or pd.Timestamp): The end of the period to adjust (exclusive).

Returns:

pd.DataFrame: A copy of the original DataFrame with corrected wind direction values.
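A minimal sketch of this correction, assuming the offset is actual_azimuth minus public_azimuth and that rows with start_date ≤ index < end_date are adjusted. The name adjust_wd_sketch is hypothetical and the sign convention is an assumption; check it against the package source before relying on it.

```python
import pandas as pd

def adjust_wd_sketch(df: pd.DataFrame, wd_col: str, public_azimuth: float,
                     actual_azimuth: float, start_date, end_date) -> pd.DataFrame:
    """Hypothetical re-implementation; the offset sign convention is assumed."""
    out = df.copy()
    # Assumed correction: difference between true and assumed orientation.
    offset = actual_azimuth - public_azimuth
    # Start inclusive, end exclusive, as documented above.
    in_range = (out.index >= pd.Timestamp(start_date)) & (out.index < pd.Timestamp(end_date))
    # Modulo 360 keeps corrected directions inside [0, 360).
    out.loc[in_range, wd_col] = (out.loc[in_range, wd_col] + offset) % 360
    return out
```

Working on a copy matches the documented return value ("a copy of the original DataFrame").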

micromet.qaqc.data_cleaning.adjust_wind_direction(series, degrees)

Adjusts wind direction in degrees and ensures the result stays within [0, 360).

Parameters:

  • series (pd.Series): The wind direction column.

  • degrees (float): Degrees to add (positive) or subtract (negative).

Returns: pd.Series: The adjusted values ready for assignment.
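The wrap into [0, 360) can be expressed with a single modulo, as in this sketch (the function name is hypothetical, not the package's code). It relies on pandas/NumPy remainder semantics, where the result takes the sign of the divisor, so negative sums wrap to the top of the circle.

```python
import pandas as pd

def adjust_wind_direction_sketch(series: pd.Series, degrees: float) -> pd.Series:
    # Remainder follows the sign of the divisor: -5 % 360 == 355,
    # and 365 % 360 == 5, so every result lands in [0, 360).
    return (series + degrees) % 360
```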

micromet.qaqc.data_cleaning.apply_internal_flags(df, flag_cols, start_date, end_date, flag_value)

Applies a specified flag value across multiple specified flag columns within a given date range. … (Docstring contents omitted for brevity, but they are correct)

micromet.qaqc.data_cleaning.apply_lag_shift(df, detected_lag, freq_unit)

Applies the inverse of the detected lag to a DataFrame’s datetime index to align it with the reference dataset.

Parameters:

  • df (pd.DataFrame): The DataFrame to be shifted (e.g., df1 passed to find_optimal_shift).

  • detected_lag (int): The lag detected by find_optimal_shift (e.g., -60).

  • freq_unit (str): The frequency unit used for the lag (e.g., ‘D’, ‘h’, ‘30min’).

Returns: pd.DataFrame: The DataFrame with the adjusted datetime index.
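One way to express "apply the inverse of the detected lag" is pandas' DataFrame.shift with a freq argument, which moves the index labels by periods × freq and leaves the data unchanged. This sketch assumes the inverse means shifting by -detected_lag steps, inferred from find_optimal_shift's lag convention; the function name is hypothetical.

```python
import pandas as pd

def apply_lag_shift_sketch(df: pd.DataFrame, detected_lag: int, freq_unit: str) -> pd.DataFrame:
    # With freq set, shift() relabels the index by periods * freq
    # instead of moving values between rows.
    return df.shift(periods=-detected_lag, freq=freq_unit)
```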

micromet.qaqc.data_cleaning.despike_data_nan_aware(data, filter_size=5, threshold_factor=3.0)

Remove outliers (spikes) from a 1D array using a NaN-aware median filter.

This function identifies spikes by comparing each data point to the median of its local neighborhood. It is specifically designed to handle arrays containing NaN values without allowing those NaNs to bias the filter or the noise statistics.

The process follows these steps:

  1. Pads the data at both edges using reflection.

  2. Calculates a moving baseline using a sliding-window median that ignores NaNs.

  3. Computes the residual noise (data minus baseline) and derives a threshold from the standard deviation of that noise.

  4. Replaces values whose residual exceeds the threshold with the local median.

Parameters:
  • data (array_like) – The input 1D signal or time-series data to be despiked. Can contain NaN values.

  • filter_size (int, optional) – The size of the sliding window used to calculate the local median. Must be an odd integer. Default is 5.

  • threshold_factor (float, optional) – The multiplier applied to the global standard deviation of the noise to determine the spike detection threshold. A higher value is less sensitive (detects fewer spikes). Default is 3.0.

Returns:

  • despiked_data (ndarray) – A copy of the input data where identified spikes have been replaced by the local median. Original NaN values are preserved.

  • spike_mask (ndarray (bool)) – A boolean mask of the same shape as data, where True indicates a detected spike location.

Notes

  • This function uses np.nanmedian and np.nanstd, which are slower than np.median and np.std but necessary when the data contain missing values.

  • If a window consists entirely of NaNs, the resulting baseline value for that window will be NaN.

Examples

>>> signal = [10, 11, 100, 12, np.nan, 11, 10]
>>> clean, mask = despike_data_nan_aware(signal, filter_size=3)
>>> clean
array([10., 11., 11., 12., nan, 11., 10.])
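The four steps can be sketched with NumPy alone. This is not the package's implementation: it assumes reflection padding via np.pad(mode="reflect"), windows built with numpy.lib.stride_tricks.sliding_window_view (NumPy ≥ 1.20), and a threshold of threshold_factor times the global np.nanstd of the residuals. With a global standard deviation, one large spike in a very short series inflates the noise estimate and can escape detection, so the check below uses a longer series.

```python
import warnings
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def despike_sketch(data, filter_size=5, threshold_factor=3.0):
    x = np.asarray(data, dtype=float)
    if filter_size % 2 == 0:
        raise ValueError("filter_size must be odd")
    half = filter_size // 2
    # 1. Reflect-pad so every point has a full, centered window.
    padded = np.pad(x, half, mode="reflect")
    windows = sliding_window_view(padded, filter_size)
    # 2. NaN-aware moving median; all-NaN windows yield NaN.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        baseline = np.nanmedian(windows, axis=1)
    # 3. Residual noise and a threshold from its global std.
    residual = x - baseline
    threshold = threshold_factor * np.nanstd(residual)
    # NaN residuals compare as False, so original NaNs are never flagged.
    spike_mask = np.abs(residual) > threshold
    # 4. Replace spikes with the local median; keep everything else.
    cleaned = np.where(spike_mask, baseline, x)
    return cleaned, spike_mask
```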

micromet.qaqc.data_cleaning.find_optimal_shift(df1, df2, value_col1, value_col2, freq='h', min_lag_units=100, max_lag_units=500, dropna_threshold=0.75)

Identifies the optimal time shift (lag) required to align two datetime-indexed pandas DataFrames by maximizing the cross-correlation between two specified columns.

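This version tests lags whose magnitude lies between min_lag_units and max_lag_units, in both directions (from -max_lag_units to -min_lag_units and from +min_lag_units to +max_lag_units).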

Interpreting the lag:

  • Positive lag: df2 is behind df1 (df2 needs to be shifted FORWARD).

  • Negative lag: df2 is ahead of df1 (df2 needs to be shifted BACKWARD).

Parameters:

  • df1, df2 (pd.DataFrame): DataFrames with a datetime index.

  • value_col1 (str): Name of the column in df1 to compare.

  • value_col2 (str): Name of the column in df2 to compare.

  • freq (str): Resampling frequency (e.g., ‘D’ for daily, ‘h’ for hourly). This determines the unit of the returned lag.

  • min_lag_units (int): The absolute minimum lag magnitude (in units of ‘freq’) to test.

  • max_lag_units (int): The absolute maximum lag magnitude (in units of ‘freq’) to test.

  • dropna_threshold (float): Minimum required fraction of non-NaN values after alignment for the data to be processed (e.g., 0.75 = 75% non-NaN data).

Returns: tuple: (best_lag, max_correlation)
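A sketch of the two-sided search, assuming both series are resampled with a mean, a positive lag is applied with Series.shift (which moves df2's values later in time), and lags whose overlap falls below dropna_threshold are skipped. The function name and the resampling choice are assumptions, not the package's code.

```python
import numpy as np
import pandas as pd

def find_optimal_shift_sketch(df1, df2, value_col1, value_col2, freq="h",
                              min_lag_units=100, max_lag_units=500,
                              dropna_threshold=0.75):
    # Common grid so that one lag step equals one `freq` unit.
    s1 = df1[value_col1].resample(freq).mean()
    s2 = df2[value_col2].resample(freq).mean()
    s1, s2 = s1.align(s2, join="outer")

    # Two-sided search: -max..-min and +min..+max.
    lags = (list(range(-max_lag_units, -min_lag_units + 1))
            + list(range(min_lag_units, max_lag_units + 1)))

    best_lag, best_corr = None, np.nan
    for lag in lags:
        shifted = s2.shift(lag)  # positive lag moves df2 forward in time
        valid = s1.notna() & shifted.notna()
        if valid.mean() < dropna_threshold:
            continue  # too little overlapping data at this lag
        corr = s1[valid].corr(shifted[valid])
        if pd.notna(corr) and (best_lag is None or corr > best_corr):
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

Under this convention, a df2 that is df1 moved three steps earlier is recovered as a lag of +3, matching "df2 needs to be shifted FORWARD".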

micromet.qaqc.data_cleaning.impute_missing_values(df, model, target_col, predictor_col)

Imputes missing values in target_col by predicting them from predictor_col with the supplied linear regression model, and returns the imputed column as a Series rather than a full DataFrame copy.

Return type:

Series
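A sketch of the imputation step, assuming model is any fitted estimator with a predict method that accepts a 2-D input (for example, the scikit-learn LinearRegression returned by train_linear_regression_model). Only rows where the target is missing and the predictor is present are filled. The function name is hypothetical.

```python
import pandas as pd

def impute_missing_values_sketch(df: pd.DataFrame, model, target_col: str,
                                 predictor_col: str) -> pd.Series:
    # Copy only the target column, not the whole DataFrame.
    imputed = df[target_col].copy()
    # Rows we can fill: target missing AND predictor available.
    fillable = imputed.isna() & df[predictor_col].notna()
    if fillable.any():
        # One-column 2-D input, as scikit-learn style estimators expect.
        X = df.loc[fillable, [predictor_col]]
        imputed.loc[fillable] = model.predict(X)
    return imputed
```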

micromet.qaqc.data_cleaning.mask_by_rolling_window_combined(df, sig_col='H2O_SIG_STRGTH_MIN', rolling_window=9, threshold_value=0.8)

Create a robust quality mask using instant and smoothed signal thresholds.

This function implements a ‘dual-condition’ filter to identify poor instrument performance (e.g., AGC or RSSI drops). It protects against over-masking transient spikes by requiring both the instantaneous signal AND a centered rolling median to fall below the threshold before a point is rejected.

Parameters:
  • df (DataFrame) – The input dataframe containing the signal strength telemetry.

  • sig_col (str) – The name of the column to evaluate.

  • rolling_window (int) – The size of the moving window (number of periods). An odd integer is recommended to ensure the window is perfectly centered on the timestamp.

  • threshold_value (float) – The minimum acceptable signal strength. Values below this are considered potential failures.

Returns:

A boolean Series (mask) where True indicates ‘Good Data’ (Keep) and False indicates ‘Bad Data’ (Filter).

Return type:

Series

Notes

  • Robustness: Uses a rolling median rather than a mean to ignore short-duration impulse noise (spikes) within the window.

  • Logic: A data point is masked ONLY if:

    (Instant Signal < Threshold) AND (Rolling Median < Threshold).

  • Edge Handling: Uses min_periods=1 to ensure valid masking at the beginning and end of the dataset.

  • Missing Data: Existing NaN values in sig_col are excluded from the printed quality report to provide an accurate ‘dropped points’ percentage.
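The dual-condition logic can be sketched with pandas' centered rolling median. This sketch omits the printed quality report, and because NaN compares as False, rows whose signal is NaN come out as True (keep). Whether the package treats them the same way is not stated. The function name is hypothetical.

```python
import pandas as pd

def rolling_mask_sketch(df: pd.DataFrame, sig_col: str = "H2O_SIG_STRGTH_MIN",
                        rolling_window: int = 9, threshold_value: float = 0.8) -> pd.Series:
    sig = df[sig_col]
    # Centered rolling median; min_periods=1 keeps the edges usable.
    smooth = sig.rolling(rolling_window, center=True, min_periods=1).median()
    # Reject only when BOTH the instant value and the median are low.
    bad = (sig < threshold_value) & (smooth < threshold_value)
    return ~bad  # True = good data (keep)
```

In the check below, a single low reading surrounded by good values is kept, while a sustained run of low values is masked.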

micromet.qaqc.data_cleaning.mask_wind_direction(df, wd_col, start_deg, end_deg)

Creates a boolean mask for bad wind directions.

Parameters:

  • df (pd.DataFrame): The input dataset.

  • wd_col (str): Name of the wind direction column (0-360).

  • start_deg (float): The start of the exclusion zone (clockwise).

  • end_deg (float): The end of the exclusion zone (clockwise).

Returns: pd.Series: A boolean mask where True = BAD data (inside the zone).
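Because the zone is defined clockwise, it can cross north (for example 350 to 10). This sketch assumes inclusive bounds and handles that wrap-around case. The function name is hypothetical.

```python
import pandas as pd

def mask_wind_direction_sketch(df: pd.DataFrame, wd_col: str,
                               start_deg: float, end_deg: float) -> pd.Series:
    wd = df[wd_col]
    if start_deg <= end_deg:
        # Zone does not cross north, e.g. 170 -> 190.
        bad = (wd >= start_deg) & (wd <= end_deg)
    else:
        # Zone wraps through 0/360, e.g. 350 -> 10.
        bad = (wd >= start_deg) | (wd <= end_deg)
    return bad  # True = inside the exclusion zone; NaN compares as False
```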

micromet.qaqc.data_cleaning.prep_parquet(station, df)

micromet.qaqc.data_cleaning.set_range_to_nan(df, column_name, start_date, end_date, index_is_datetime=True, date_col=None)

Sets values in a specified column to np.nan within a given datetime range.

Return type:

DataFrame

Args:

  • df: The pandas DataFrame.

  • column_name: The name of the column whose values will be set to NaN.

  • start_date: The start of the datetime range (inclusive). Can be a string (e.g., ‘2025-10-01’) or a pd.Timestamp.

  • end_date: The end of the datetime range (inclusive). Can be a string (e.g., ‘2025-10-02 12:00:00’) or a pd.Timestamp.

  • index_is_datetime: If True (default), the function uses the DataFrame’s index for filtering.

  • date_col: If index_is_datetime is False, the name of the column containing the datetime information used for filtering.

Returns:

The modified pandas DataFrame.
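A sketch of the inclusive range blanking that supports both a datetime index and a separate date column. It works on a copy (the text does not say whether the original modifies in place) and compares against pd.Timestamp, so a date-only end string means midnight of that day. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def set_range_to_nan_sketch(df: pd.DataFrame, column_name: str, start_date, end_date,
                            index_is_datetime: bool = True, date_col=None) -> pd.DataFrame:
    out = df.copy()
    times = out.index if index_is_datetime else pd.to_datetime(out[date_col])
    # Both ends inclusive, as documented.
    in_range = (times >= pd.Timestamp(start_date)) & (times <= pd.Timestamp(end_date))
    out.loc[in_range, column_name] = np.nan
    return out
```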

micromet.qaqc.data_cleaning.train_linear_regression_model(df, target_col, predictor_col)

Trains a Linear Regression model using complete data from two specified columns.

Return type:

Tuple[LinearRegression | None, Dict[str, Any]]

Args:

  • df: The input pandas DataFrame.

  • target_col: The name of the column containing the dependent variable (Y).

  • predictor_col: The name of the column containing the independent variable (X).

Returns:

A tuple containing:

  1. The trained LinearRegression model instance (or None if training fails).

  2. A dictionary of model results (e.g., intercept and coefficient).

micromet.qaqc.netrad_limits module

AmeriFlux-like Timestamp Alignment QA/QC

Implements the core ideas from the AmeriFlux Timestamp Alignment Module:

  • Compute potential incoming shortwave radiation at the top of the atmosphere (SW_IN_POT) in local standard time (no DST).

  • Build non-overlapping 15-day “maximum diurnal composites”.

  • Compute (1) the percentage of time the composite observed radiation exceeds the potential and (2) the lag (in time steps) at which the cross-correlation between observed and potential radiation is maximized.

Also provides heuristic flags for:

  • Time zone mismatch / DST usage

  • Timestamp START vs. END mis-specification

  • Stream desynchronization between SW_IN and PPFD_IN

  • Possible radiation sensor issues (shading, not level, unexpectedly high)
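The three core computations can be sketched as follows. Assumptions, none confirmed by the module: windows are (day_of_year - 1) // 15 within each year, the composite is the maximum per time of day, exceedance counts steps where observed > potential, and the lag is found by circularly rolling the composite (np.roll) over ±max_lag_steps. In this sketch, lag means "shift observed by lag steps to best match potential", and the module's sign convention may differ. All function names are hypothetical.

```python
import numpy as np
import pandas as pd

def max_diurnal_composite(s: pd.Series, window_days: int = 15) -> pd.Series:
    # s: regular DatetimeIndex in local standard time (assumed).
    window = (s.index.dayofyear - 1) // window_days
    tod = s.index.hour * 60 + s.index.minute  # minute of day
    return s.groupby([s.index.year, window, tod]).max()

def pct_exceed(comp_obs, comp_pot) -> float:
    obs, pot = np.asarray(comp_obs, float), np.asarray(comp_pot, float)
    valid = ~np.isnan(obs) & ~np.isnan(pot)
    return 100.0 * np.mean(obs[valid] > pot[valid])

def best_lag(comp_obs, comp_pot, max_lag_steps: int = 6):
    obs, pot = np.asarray(comp_obs, float), np.asarray(comp_pot, float)
    best = (None, -np.inf)
    for lag in range(-max_lag_steps, max_lag_steps + 1):
        # Circular shift: a diurnal composite wraps around midnight.
        r = np.corrcoef(np.roll(obs, lag), pot)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best
```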

References

AmeriFlux Data QA/QC: Timestamp Alignment Module (design: 15-day maximum diurnal composite; local standard time; exceedance and cross-correlation interpretation). See:

https://ameriflux.lbl.gov/data/flux-data-products/data-qaqc/timestamp-alignment-module/

class micromet.qaqc.netrad_limits.WindowComposite(year, window_id, step_minutes, steps_per_day, comp_pot, comp_sw, comp_ppfd, pct_exceed_sw, pct_exceed_ppfd, lag_sw, corr_sw, lag_ppfd, corr_ppfd)

Bases: object

A container for the results of a 15-day window analysis.

This dataclass holds the composite data and statistical results for a single 15-day analysis window.

year

The year of the analysis window.

Type:

int

window_id

The 15-day window number within the year.

Type:

int

step_minutes

The sampling interval in minutes.

Type:

int

steps_per_day

The number of time steps per day.

Type:

int

comp_pot

The maximum diurnal composite for potential incoming shortwave radiation.

Type:

np.ndarray

comp_sw

The maximum diurnal composite for observed shortwave radiation.

Type:

np.ndarray, optional

comp_ppfd

The maximum diurnal composite for observed PPFD.

Type:

np.ndarray, optional

pct_exceed_sw

The percentage of time the observed SW composite exceeds the potential.

Type:

float, optional

pct_exceed_ppfd

The percentage of time the observed PPFD composite exceeds the potential.

Type:

float, optional

lag_sw

The lag (in time steps) at which the SW cross-correlation is maximized.

Type:

int, optional

corr_sw

The maximum cross-correlation for shortwave radiation.

Type:

float, optional

lag_ppfd

The lag (in time steps) at which the PPFD cross-correlation is maximized.

Type:

int, optional

corr_ppfd

The maximum cross-correlation for PPFD.

Type:

float, optional

comp_pot: ndarray
comp_ppfd: ndarray | None
comp_sw: ndarray | None
corr_ppfd: float | None
corr_sw: float | None
lag_ppfd: int | None
lag_sw: int | None
pct_exceed_ppfd: float | None
pct_exceed_sw: float | None
step_minutes: int
steps_per_day: int
window_id: int
year: int

micromet.qaqc.netrad_limits.add_buffer(min_max, buffer=100)

Add a buffer to a min/max tuple, with a hard lower limit.

This function expands a range defined by a min/max tuple by adding a buffer. The minimum value is not allowed to go below -200.

Parameters:
  • min_max (tuple) – A tuple containing the minimum and maximum values.

  • buffer (float) – The buffer to add to the max value and subtract from the min value. Defaults to 100.

Returns:

The buffered minimum and maximum values.

Return type:

tuple[float, float]
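The documented behavior reduces to one clamped subtraction and one addition. This is a minimal sketch with a hypothetical name:

```python
def add_buffer_sketch(min_max, buffer=100):
    lo, hi = min_max
    # Widen both ends, but never let the minimum drop below -200.
    return max(lo - buffer, -200), hi + buffer
```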

micromet.qaqc.netrad_limits.analyze_timestamp_alignment(df, *, lat=39.5, lon=-111.5, std_utc_offset_hours=-7, time_from='CENTER', start_col='TIMESTAMP_START', end_col='TIMESTAMP_END', time_col=None, sw_col='SW_IN', ppfd_col='PPFD_IN', assume_naive_is_local=False, max_lag_steps=6)

Perform the main timestamp alignment analysis.

This function analyzes the timestamp alignment of radiation data by comparing observed values against potential top-of-atmosphere radiation. It computes composites, cross-correlations, and other statistics for 15-day windows.

Parameters:
  • df (DataFrame) – The input DataFrame containing timestamp and radiation data.

  • lat (float | None) – Latitude of the site.

  • lon (float | None) – Longitude of the site.

  • std_utc_offset_hours (int) – The UTC offset for local standard time.

  • time_from (str) – How to interpret the timestamp (‘CENTER’, ‘START’, ‘END’). Defaults to ‘CENTER’.

  • start_col (str) – Name of the start timestamp column. Defaults to ‘TIMESTAMP_START’.

  • end_col (str) – Name of the end timestamp column. Defaults to ‘TIMESTAMP_END’.

  • time_col (Optional[str]) – Name of a single datetime column. Defaults to None.

  • sw_col (str) – Name of the shortwave radiation column. Defaults to ‘SW_IN’.

  • ppfd_col (str) – Name of the PPFD column. Defaults to ‘PPFD_IN’.

  • assume_naive_is_local (bool) – Whether to assume naive timestamps are in local time. Defaults to False.

  • max_lag_steps (int) – Maximum lag for cross-correlation. Defaults to 6.

Returns:

A tuple containing a summary DataFrame of the analysis and a dictionary of WindowComposite objects for each window.

Return type:

Tuple[DataFrame, Dict[Tuple[int, int], WindowComposite]]
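AmeriFlux files encode TIMESTAMP_START and TIMESTAMP_END as YYYYMMDDHHMM numbers. A sketch of what time_from='CENTER' plausibly involves is to parse both columns and take their midpoint. The helper name is hypothetical, and the function's actual parsing may differ.

```python
import pandas as pd

def center_timestamps(df: pd.DataFrame, start_col: str = "TIMESTAMP_START",
                      end_col: str = "TIMESTAMP_END") -> pd.Series:
    # Cast to str so integer timestamps parse with an explicit format.
    start = pd.to_datetime(df[start_col].astype(str), format="%Y%m%d%H%M")
    end = pd.to_datetime(df[end_col].astype(str), format="%Y%m%d%H%M")
    # Midpoint of each averaging interval.
    return start + (end - start) / 2
```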

micromet.qaqc.netrad_limits.clear_sky_radiation(doy, hour, latitude=39.5)

Estimate the clear-sky incoming shortwave radiation.

This function provides a simplified estimation of the incoming shortwave radiation on a clear day, taking into account the solar constant, Earth-Sun distance, and atmospheric transmissivity.

Parameters:
  • doy (int) – The day of the year.

  • hour (int) – The hour of the day.

  • latitude (float) – The latitude in degrees. Defaults to LATITUDE (39.5).

Returns:

The estimated clear-sky radiation in W/m^2.

Return type:

float
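A common simplified form of this estimate multiplies the solar constant, an Earth-Sun distance factor, a bulk transmissivity, and the sine of the solar elevation. In this sketch, S0 = 1361 W/m^2, τ = 0.75, and the 1 + 0.033·cos(2π·doy/365) distance factor are assumed values. Elevation is passed in directly rather than computed from doy, hour, and latitude, and the function name is hypothetical.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2, assumed value
TRANSMISSIVITY = 0.75    # assumed bulk clear-sky transmissivity

def clear_sky_sketch(doy: int, elevation_rad: float) -> float:
    if elevation_rad <= 0:
        return 0.0  # Sun at or below the horizon
    # Eccentricity correction for the varying Earth-Sun distance.
    distance_factor = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)
    return SOLAR_CONSTANT * distance_factor * TRANSMISSIVITY * math.sin(elevation_rad)
```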

micromet.qaqc.netrad_limits.estimate_net_radiation_range(doy, hour)

Estimate the min/max net radiation for a given hour and day of the year.

This function calculates a plausible range for net radiation by estimating the components of the radiation budget (shortwave and longwave radiation) under typical conditions.

Parameters:
  • doy (int) – The day of the year.

  • hour (int) – The hour of the day.

Returns:

A tuple containing the minimum and maximum estimated net radiation in W/m^2.

Return type:

tuple[float, float]

micromet.qaqc.netrad_limits.flag_issues(summary)

Apply simple heuristics to flag likely timestamp and data quality issues.

This function uses a set of heuristics to identify potential issues such as timezone mismatches, DST usage, and sensor problems based on the summary of the timestamp alignment analysis.

Parameters:

summary (DataFrame) – A DataFrame containing the summary of the timestamp alignment analysis.

Returns:

A dictionary of flagged issues, where keys are issue types and values are descriptive messages.

Return type:

Dict[str, str]

micromet.qaqc.netrad_limits.hour_angle(hour)

Calculate the hour angle for a given hour of the day.

The hour angle is the angular displacement of the Sun east or west of the local meridian due to Earth’s rotation.

Parameters:

hour (int) – The hour of the day (0-23).

Returns:

The hour angle in radians.

Return type:

float
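The usual formulation is 15° per hour away from solar noon, negative in the morning. This sketch assumes hour is local solar time with no equation-of-time or longitude correction, and the function name is hypothetical.

```python
import math

def hour_angle_sketch(hour: float) -> float:
    # 360 deg / 24 h = 15 deg per hour; zero at solar noon.
    return math.radians(15.0 * (hour - 12.0))
```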

micromet.qaqc.netrad_limits.longwave_radiation(T_kelvin)

Estimate the longwave radiation using the Stefan-Boltzmann law.

This function calculates the longwave radiation emitted by a surface based on its temperature, using the Stefan-Boltzmann law.

Parameters:

T_kelvin (float) – The temperature of the surface in Kelvin.

Returns:

The estimated longwave radiation in W/m^2.

Return type:

float
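The Stefan-Boltzmann law gives L = εσT⁴. The documented function takes only T_kelvin. The emissivity argument here is an added assumption (default 1.0, i.e. a blackbody), and the function name is hypothetical.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W m^-2 K^-4 (CODATA 2018)

def longwave_radiation_sketch(T_kelvin: float, emissivity: float = 1.0) -> float:
    # Emitted longwave flux: emissivity * sigma * T^4
    return emissivity * STEFAN_BOLTZMANN * T_kelvin ** 4
```

At 300 K a blackbody emits roughly 459 W/m^2.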

micromet.qaqc.netrad_limits.plot_summary(summary, composites, which_year=None, outfile_prefix=None)

Create and display summary plots of the timestamp alignment analysis.

This function generates a set of plots to visualize the results of the timestamp alignment analysis, including percent exceedance, lag times, and a composite overlay for the “worst” window.

Parameters:
  • summary (DataFrame) – A DataFrame containing the summary of the analysis.

  • composites (Dict[Tuple[int, int], WindowComposite]) – A dictionary of WindowComposite objects for each analysis window.

  • which_year (Optional[int]) – The year to plot. If None, all years are plotted. Defaults to None.

  • outfile_prefix (Optional[str]) – A prefix for the output plot filenames. If provided, plots are saved to files. Defaults to None.

Returns:

A dictionary of matplotlib Figure handles for the generated plots.

Return type:

dict

micromet.qaqc.netrad_limits.solar_declination(doy)

Calculate the solar declination angle for a given day of the year.

The solar declination is the angle between the Earth’s equatorial plane and the line connecting the centers of the Earth and the Sun.

Parameters:

doy (int) – The day of the year (1-365 or 1-366).

Returns:

The solar declination angle in radians.

Return type:

float
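A widely used approximation is Cooper's formula, δ = 23.45° · sin(2π(284 + doy)/365). Whether the module uses this or another approximation is not stated, so treat this sketch (hypothetical name) as illustrative.

```python
import math

def solar_declination_sketch(doy: int) -> float:
    # Cooper's approximation, returned in radians.
    return math.radians(23.45) * math.sin(2 * math.pi * (284 + doy) / 365)
```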

micromet.qaqc.netrad_limits.solar_elevation(doy, hour, latitude=39.5)

Calculate the solar elevation angle.

The solar elevation angle is the angle of the Sun above the horizon.

Parameters:
  • doy (int) – The day of the year.

  • hour (int) – The hour of the day.

  • latitude (float) – The latitude in degrees. Defaults to LATITUDE (39.5).

Returns:

The solar elevation angle in radians.

Return type:

float
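Solar elevation α follows from sin α = sin φ sin δ + cos φ cos δ cos ω, with latitude φ, declination δ, and hour angle ω. To stay self-contained, this sketch takes δ and ω directly in radians instead of doy and hour. The function name and signature are hypothetical.

```python
import math

def solar_elevation_sketch(latitude_deg: float, declination_rad: float,
                           hour_angle_rad: float) -> float:
    lat = math.radians(latitude_deg)
    sin_elev = (math.sin(lat) * math.sin(declination_rad)
                + math.cos(lat) * math.cos(declination_rad) * math.cos(hour_angle_rad))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.asin(max(-1.0, min(1.0, sin_elev)))
```

At an equinox (δ = 0), solar noon (ω = 0), and 39.5° N, this gives 90 - 39.5 = 50.5°.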

micromet.qaqc.netrad_limits.sw_in_pot_noaa(dt_local_standard, lat_deg=40.7607, lon_deg=-111.8939, std_utc_offset_hours=-7)

Compute top-of-atmosphere shortwave irradiance using NOAA’s approximations.

This function calculates the potential incoming shortwave radiation at the top of the atmosphere on a horizontal surface, using solar position approximations from NOAA.

Parameters:
  • dt_local_standard (DatetimeIndex) – A DatetimeIndex in local standard time (fixed UTC offset, no DST).

  • lat_deg (float | None) – Latitude in degrees (positive for Northern Hemisphere).

  • lon_deg (float | None) – Longitude in degrees (positive for Eastern Hemisphere).

  • std_utc_offset_hours (int | None) – The local standard time UTC offset (e.g., -7 for MST).

Returns:

A Series of potential incoming shortwave radiation (W/m^2), indexed by the input DatetimeIndex.

Return type:

Series

Raises:

ValueError – If the input DatetimeIndex is not timezone-aware.
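A sketch using the fractional-year, equation-of-time, and declination fits from NOAA's "General Solar Position Calculations". It converts the index to UTC internally, so the time-zone term drops out and no offset argument is needed. S0 = 1361 W/m^2 and the simple 1 + 0.033·cos(2π·doy/365) distance factor are assumptions. The function name is hypothetical, and this is not the module's code.

```python
import numpy as np
import pandas as pd

SOLAR_CONSTANT = 1361.0  # W/m^2, assumed value

def sw_in_pot_sketch(dt: pd.DatetimeIndex, lat_deg: float, lon_deg: float) -> pd.Series:
    if dt.tz is None:
        raise ValueError("DatetimeIndex must be timezone-aware")
    utc = dt.tz_convert("UTC")
    doy = np.asarray(utc.dayofyear, dtype=float)
    hours = np.asarray(utc.hour + utc.minute / 60 + utc.second / 3600, dtype=float)

    # NOAA fractional year (radians).
    g = 2 * np.pi / 365 * (doy - 1 + (hours - 12) / 24)
    # Equation of time (minutes) and solar declination (radians).
    eqtime = 229.18 * (0.000075 + 0.001868 * np.cos(g) - 0.032077 * np.sin(g)
                       - 0.014615 * np.cos(2 * g) - 0.040849 * np.sin(2 * g))
    decl = (0.006918 - 0.399912 * np.cos(g) + 0.070257 * np.sin(g)
            - 0.006758 * np.cos(2 * g) + 0.000907 * np.sin(2 * g)
            - 0.002697 * np.cos(3 * g) + 0.00148 * np.sin(3 * g))

    # True solar time (minutes); in UTC the time-zone term is zero.
    tst = hours * 60 + eqtime + 4 * lon_deg
    ha = np.radians(tst / 4 - 180)

    lat = np.radians(lat_deg)
    cos_zen = np.sin(lat) * np.sin(decl) + np.cos(lat) * np.cos(decl) * np.cos(ha)
    # Distance correction, then clip night-time values to zero.
    ecc = 1 + 0.033 * np.cos(2 * np.pi * doy / 365)
    pot = np.clip(SOLAR_CONSTANT * ecc * cos_zen, 0, None)
    return pd.Series(pot, index=dt, name="SW_IN_POT")
```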

micromet.qaqc.variable_limits module

This module contains the limits dictionary, which defines the expected physical and plausible ranges for various meteorological and flux variables. The keys of the dictionary are variable names, and the values are dictionaries containing metadata such as description, units, and min/max values.
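A sketch of how a dictionary with this shape can drive range checks. The entry below is a placeholder. Its key names ("min", "max") and values are hypothetical and are not the module's actual limits, so adapt the lookups to the real keys.

```python
import numpy as np
import pandas as pd

# Hypothetical entry; key names and values are placeholders, not the module's.
limits = {
    "VAR_X": {"description": "example variable", "units": "unit", "min": 0.0, "max": 100.0},
}

def apply_limits(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    out = df.copy()
    for col, meta in limits.items():
        if col in out.columns:
            # Out-of-range values become NaN; in-range and missing values stay.
            bad = (out[col] < meta["min"]) | (out[col] > meta["max"])
            out.loc[bad, col] = np.nan
    return out
```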

Module contents

This package contains modules for quality assurance and quality control.