micromet.qaqc package
Submodules
micromet.qaqc.data_cleaning module
- micromet.qaqc.data_cleaning.adjust_wd(df, wd_col, public_azimuth, actual_azimuth, start_date, end_date)
Adjusts wind direction values in a DataFrame based on sensor orientation differences.
This function calculates the offset between a ‘public’ (reported) azimuth and the ‘actual’ sensor azimuth, then applies this correction to a specific time range within the dataset, ensuring the results stay within the [0, 360) degree range.
- Args:
df (pd.DataFrame): The input dataframe, expected to have a DatetimeIndex.
wd_col (str): The name of the column containing wind direction values.
public_azimuth (float): The baseline or incorrectly assumed orientation angle.
actual_azimuth (float): The true physical orientation angle of the sensor.
start_date (str or pd.Timestamp): The beginning of the period to adjust (inclusive).
end_date (str or pd.Timestamp): The end of the period to adjust (exclusive).
- Returns:
pd.DataFrame: A copy of the original DataFrame with corrected wind direction values.
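The correction described above can be sketched as follows. This is a minimal illustration rather than the package's implementation: the name `adjust_wd_sketch` is hypothetical, and the sign of the offset (actual minus public) is an assumption, since the docstring does not state it.

```python
import numpy as np
import pandas as pd


def adjust_wd_sketch(df, wd_col, public_azimuth, actual_azimuth, start_date, end_date):
    """Hypothetical re-creation of the documented behaviour."""
    out = df.copy()  # work on a copy, as the docstring says a copy is returned
    # ASSUMPTION: the correction is (actual - public); the real sign may differ.
    offset = actual_azimuth - public_azimuth
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    # Half-open interval: start inclusive, end exclusive, as documented.
    in_range = (out.index >= start) & (out.index < end)
    # Wrap the corrected values back into [0, 360).
    out.loc[in_range, wd_col] = np.mod(out.loc[in_range, wd_col] + offset, 360.0)
    return out


idx = pd.date_range("2024-01-01", periods=4, freq="D")
demo = pd.DataFrame({"WD": [350.0, 10.0, 180.0, 90.0]}, index=idx)
fixed = adjust_wd_sketch(demo, "WD", 0.0, 20.0, "2024-01-01", "2024-01-03")
```

Only the first two rows fall inside the half-open range, so the last two values are left unchanged.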
- micromet.qaqc.data_cleaning.adjust_wind_direction(series, degrees)
Adjusts wind direction in degrees and ensures the result stays within [0, 360).
Parameters:
series (pd.Series): The wind direction column.
degrees (float): Degrees to add (positive) or subtract (negative).
Returns:
pd.Series: The adjusted values ready for assignment.
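Keeping the result within [0, 360) is typically done with a modulo operation. The sketch below shows the idea; `wrap_degrees` is a hypothetical name, and whether the package uses exactly this expression is an assumption.

```python
import pandas as pd


def wrap_degrees(series, degrees):
    """Add an offset and wrap into [0, 360)."""
    # With a positive modulus, % never returns a negative value,
    # so negative sums wrap around correctly (e.g. -15 -> 345).
    return (series + degrees) % 360


wd = pd.Series([350.0, 5.0, 180.0])
plus_20 = wrap_degrees(wd, 20)
minus_20 = wrap_degrees(wd, -20)
```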
- micromet.qaqc.data_cleaning.apply_internal_flags(df, flag_cols, start_date, end_date, flag_value)
Applies a specified flag value across multiple specified flag columns within a given date range. … (Docstring contents omitted for brevity, but they are correct)
- micromet.qaqc.data_cleaning.apply_lag_shift(df, detected_lag, freq_unit)
Applies the inverse of the detected lag to a DataFrame’s datetime index to align it with the reference dataset.
Parameters:
- df (pd.DataFrame): The DataFrame to be shifted (e.g., df1 from the find_optimal_shift function).
- detected_lag (int): The lag detected by find_optimal_shift (e.g., -60).
- freq_unit (str): The frequency unit used for the lag (e.g., ‘D’, ‘H’, ‘30T’).
Returns:
- pd.DataFrame: The DataFrame with the adjusted datetime index.
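One way to apply the inverse of a detected lag to a datetime index is `DataFrame.shift` with a `freq` argument, which moves the index and leaves the values in place. The sketch below assumes that approach; `apply_lag_shift_sketch` is a hypothetical name and the package may do this differently.

```python
import pandas as pd


def apply_lag_shift_sketch(df, detected_lag, freq_unit):
    """Shift the index by the inverse of the detected lag."""
    # shift(periods, freq=...) changes the timestamps, not the data values.
    return df.shift(periods=-detected_lag, freq=freq_unit)


idx = pd.date_range("2024-01-01 00:00", periods=3, freq="h")
df1 = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=idx)
# A detected lag of -60 hours moves every timestamp 60 hours later.
shifted = apply_lag_shift_sketch(df1, detected_lag=-60, freq_unit="h")
```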
- micromet.qaqc.data_cleaning.despike_data_nan_aware(data, filter_size=5, threshold_factor=3.0)
Remove outliers (spikes) from a 1D array using a NaN-aware median filter.
This function identifies spikes by comparing each data point to the median of its local neighborhood. It is specifically designed to handle arrays containing NaN values without allowing those NaNs to bias the filter or the noise statistics.
The process follows these steps:
1. Pads the data to handle edges using reflection.
2. Calculates a moving baseline using a sliding window median (ignoring NaNs).
3. Computes the residual noise and determines a threshold based on the standard deviation of that noise.
4. Replaces values exceeding the threshold with the local median.
- Parameters:
data (array_like) – The input 1D signal or time-series data to be despiked. Can contain NaN values.
filter_size (int, optional) – The size of the sliding window used to calculate the local median. Must be an odd integer. Default is 5.
threshold_factor (float, optional) – The multiplier applied to the global standard deviation of the noise to determine the spike detection threshold. A higher value is less sensitive (detects fewer spikes). Default is 3.0.
- Returns:
despiked_data (ndarray) – A copy of the input data where identified spikes have been replaced by the local median. Original NaN values are preserved.
spike_mask (ndarray (bool)) – A boolean mask of the same shape as data, where True indicates a detected spike location.
Notes
This function uses np.nanmedian and np.nanstd, which are computationally more expensive than their standard counterparts but necessary if the dataset is missing values.
If a window consists entirely of NaNs, the resulting baseline value for that window will be NaN.
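The four steps listed above can be sketched with NumPy alone. This is an illustrative reconstruction under stated assumptions, not the package source: `despike_sketch` is a hypothetical name, and details such as how the residual is compared to the threshold (absolute value, strict inequality) are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def despike_sketch(data, filter_size=5, threshold_factor=3.0):
    """NaN-aware median despiking following the documented steps."""
    x = np.asarray(data, dtype=float)
    if filter_size % 2 == 0:
        raise ValueError("filter_size must be odd")
    half = filter_size // 2
    # Step 1: reflect-pad so every point has a full window.
    padded = np.pad(x, half, mode="reflect")
    # Step 2: sliding-window median that ignores NaNs.
    windows = sliding_window_view(padded, filter_size)
    baseline = np.nanmedian(windows, axis=1)
    # Step 3: residual noise and a global threshold from its std.
    residual = x - baseline
    threshold = threshold_factor * np.nanstd(residual)
    # ASSUMPTION: spikes are points whose absolute residual exceeds the threshold.
    spike_mask = np.abs(residual) > threshold  # NaN compares False, so gaps are kept
    # Step 4: replace spikes with the local median.
    despiked = x.copy()
    despiked[spike_mask] = baseline[spike_mask]
    return despiked, spike_mask


signal = np.full(20, 10.0)
signal[7] = 100.0    # an obvious spike
signal[12] = np.nan  # a gap that must survive
clean, mask = despike_sketch(signal)
```

Because the threshold is derived from the standard deviation of the residual itself, a single spike in a very short series can inflate the threshold enough to escape detection; the demo uses a longer series for that reason.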
Examples
>>> signal = [10, 11, 100, 12, np.nan, 11, 10]
>>> clean, mask = despike_data_nan_aware(signal, filter_size=3)
>>> clean
array([10., 11., 11., 12., nan, 11., 10.])
- micromet.qaqc.data_cleaning.find_optimal_shift(df1, df2, value_col1, value_col2, freq='h', min_lag_units=100, max_lag_units=500, dropna_threshold=0.75)
Identifies the optimal time shift (lag) required to align two datetime-indexed pandas DataFrames by maximizing the cross-correlation between two specified columns.
This version searches only lags whose magnitude lies between min_lag_units and max_lag_units, in both the positive and negative directions.
Interpreting the Lag:
- Positive Lag: df2 is behind df1 (df2 needs to be shifted FORWARD).
- Negative Lag: df2 is ahead of df1 (df2 needs to be shifted BACKWARD).
Parameters:
- df1, df2 (pd.DataFrame): DataFrames with a datetime index.
- value_col1 (str): Name of the column in df1 to compare.
- value_col2 (str): Name of the column in df2 to compare.
- freq (str): Resampling frequency (e.g., ‘D’ for daily, ‘H’ for hourly). This determines the unit of the returned lag.
- min_lag_units (int): The absolute minimum lag magnitude (in units of ‘freq’) to test.
- max_lag_units (int): The absolute maximum lag magnitude (in units of ‘freq’) to test.
- dropna_threshold (float): Minimum required fraction of non-NaN values after alignment for the data to be processed (e.g., 0.75 = 75% non-NaN data).
Returns:
- tuple: (best_lag, max_correlation)
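A lag search of this kind can be sketched as a loop over candidate shifts, keeping the one with the highest correlation. The code below is a simplified illustration: `best_lag_sketch` is a hypothetical name, it assumes both series already share the same regular index (no resampling step), and it assumes a positive lag means the second series is shifted forward in time.

```python
import numpy as np
import pandas as pd


def best_lag_sketch(s1, s2, min_lag=100, max_lag=500, dropna_threshold=0.75):
    """Return (best_lag, max_correlation) over lags with min_lag <= |lag| <= max_lag."""
    lags = list(range(-max_lag, -min_lag + 1)) + list(range(min_lag, max_lag + 1))
    best_lag, best_corr = None, -np.inf
    for lag in lags:
        # ASSUMPTION: positive lag shifts s2 forward (later) by `lag` steps.
        shifted = s2.shift(lag)
        valid = s1.notna() & shifted.notna()
        # Skip lags that leave too little overlapping data.
        if valid.mean() < dropna_threshold:
            continue
        corr = s1[valid].corr(shifted[valid])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=2000, freq="h")
s1 = pd.Series(rng.normal(size=2000), index=idx)
s2 = s1.shift(-120)  # s2 runs 120 hours "behind": shifting it forward realigns it
lag, corr = best_lag_sketch(s1, s2, min_lag=100, max_lag=150)
```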
- micromet.qaqc.data_cleaning.impute_missing_values(df, model, target_col, predictor_col)
Imputes missing values using Linear Regression and returns the resulting imputed column as a Series, without creating a full DataFrame copy.
- Return type:
Series
- micromet.qaqc.data_cleaning.mask_by_rolling_window_combined(df, sig_col='H2O_SIG_STRGTH_MIN', rolling_window=9, threshold_value=0.8)
Create a robust quality mask using instant and smoothed signal thresholds.
This function implements a ‘dual-condition’ filter to identify poor instrument performance (e.g., AGC or RSSI drops). It protects against over-masking transient spikes by requiring both the instantaneous signal AND a centered rolling median to fall below the threshold before a point is rejected.
- Parameters:
df (DataFrame) – The input dataframe containing the signal strength telemetry.
sig_col (str) – The name of the column to evaluate.
rolling_window (int) – The size of the moving window (number of periods). An odd integer is recommended to ensure the window is perfectly centered on the timestamp.
threshold_value (float) – The minimum acceptable signal strength. Values below this are considered potential failures.
- Returns:
A boolean Series (mask) where True indicates ‘Good Data’ (Keep) and False indicates ‘Bad Data’ (Filter).
- Return type:
Series
Notes
Robustness: Uses a rolling median rather than a mean to ignore short-duration impulse noise (spikes) within the window.
Logic: A data point is masked ONLY if (Instant Signal < Threshold) AND (Rolling Median < Threshold).
Edge Handling: Uses min_periods=1 to ensure valid masking at the beginning and end of the dataset.
Missing Data: Existing NaN values in sig_col are excluded from the printed quality report to provide an accurate ‘dropped points’ percentage.
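The dual-condition logic in the notes above can be sketched with a centered rolling median. This is a minimal illustration that operates on a single Series; `dual_condition_mask` is a hypothetical name, and the printed quality report mentioned in the notes is not reproduced.

```python
import pandas as pd


def dual_condition_mask(sig, rolling_window=9, threshold_value=0.8):
    """True = keep, False = reject (both conditions must fail)."""
    # Centered rolling median; min_periods=1 keeps the edges defined.
    rolling_median = sig.rolling(rolling_window, center=True, min_periods=1).median()
    # A point is bad only if BOTH the instant value and the median are low.
    bad = (sig < threshold_value) & (rolling_median < threshold_value)
    return ~bad


sig = pd.Series([0.9, 0.9, 0.5, 0.9, 0.9, 0.5, 0.5, 0.5, 0.9, 0.9])
keep = dual_condition_mask(sig, rolling_window=3, threshold_value=0.8)
```

In this example the isolated dip at position 2 is kept because its rolling median stays above the threshold, while the sustained drop at positions 5 to 7 is rejected.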
- micromet.qaqc.data_cleaning.mask_wind_direction(df, wd_col, start_deg, end_deg)
Creates a boolean mask for bad wind directions.
Parameters:
df (pd.DataFrame): Your dataset.
wd_col (str): Name of the wind direction column (0-360).
start_deg (float): The start of the exclusion zone (clockwise).
end_deg (float): The end of the exclusion zone (clockwise).
Returns:
pd.Series: A boolean mask where True = BAD data (inside the zone).
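An exclusion zone defined clockwise from start to end may cross north (for example 340 to 20 degrees). The sketch below shows one way to handle both cases; `sector_mask` is a hypothetical name, and the wraparound handling and inclusive bounds are assumptions not confirmed by the docstring.

```python
import pandas as pd


def sector_mask(wd, start_deg, end_deg):
    """True where the direction lies inside the clockwise sector start -> end."""
    if start_deg <= end_deg:
        # Ordinary sector that does not cross 0/360.
        return (wd >= start_deg) & (wd <= end_deg)
    # ASSUMPTION: start > end means the sector wraps through north.
    return (wd >= start_deg) | (wd <= end_deg)


wd = pd.Series([350.0, 10.0, 90.0, 180.0])
bad_north = sector_mask(wd, 340, 20)
bad_east = sector_mask(wd, 80, 100)
```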
- micromet.qaqc.data_cleaning.set_range_to_nan(df, column_name, start_date, end_date, index_is_datetime=True, date_col=None)
Sets values in a specified column to np.nan within a given datetime range.
- Return type:
DataFrame
- Args:
df: The pandas DataFrame.
column_name: The name of the column whose values will be set to NaN.
start_date: The start of the datetime range (inclusive). Can be a string (e.g., ‘2025-10-01’) or a pd.Timestamp.
end_date: The end of the datetime range (inclusive). Can be a string (e.g., ‘2025-10-02 12:00:00’) or a pd.Timestamp.
index_is_datetime: If True (default), the function uses the DataFrame’s index for filtering.
date_col: If index_is_datetime is False, provide the name of the column containing the datetime information for filtering.
- Returns:
The modified pandas DataFrame.
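The inclusive range logic can be sketched with a boolean mask over either the index or a datetime column. `set_range_to_nan_sketch` is a hypothetical name; working on a copy is an assumption, since the docstring does not say whether the original function modifies the DataFrame in place.

```python
import numpy as np
import pandas as pd


def set_range_to_nan_sketch(df, column_name, start_date, end_date,
                            index_is_datetime=True, date_col=None):
    """Blank out one column between two timestamps (both ends inclusive)."""
    out = df.copy()  # ASSUMPTION: the real function may modify df in place
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    # Pick the time axis: the index, or a named datetime column.
    times = out.index if index_is_datetime else out[date_col]
    in_range = (times >= start) & (times <= end)
    out.loc[in_range, column_name] = np.nan
    return out


idx = pd.date_range("2025-10-01", periods=5, freq="h")
raw = pd.DataFrame({"CO2": [400.0, 401.0, 402.0, 403.0, 404.0]}, index=idx)
cleaned = set_range_to_nan_sketch(raw, "CO2", "2025-10-01 01:00", "2025-10-01 03:00")
```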
- micromet.qaqc.data_cleaning.train_linear_regression_model(df, target_col, predictor_col)
Trains a Linear Regression model using complete data from two specified columns.
- Args:
df: The input pandas DataFrame.
target_col: The name of the column containing the dependent variable (Y).
predictor_col: The name of the column containing the independent variable (X).
- Returns:
A tuple containing:
1. The trained LinearRegression model instance (or None if training fails).
2. A dictionary of model results (e.g., intercept and coefficient).
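The train-then-impute workflow can be illustrated without the package. The sketch below swaps in `numpy.polyfit` for the LinearRegression model so that it depends only on NumPy and pandas; the function names and the keys of the results dictionary are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_line(df, target_col, predictor_col):
    """Fit y = intercept + coefficient * x on rows where both columns are present."""
    complete = df[[target_col, predictor_col]].dropna()
    if len(complete) < 2:
        return None  # not enough data to fit a line
    slope, intercept = np.polyfit(complete[predictor_col], complete[target_col], deg=1)
    return {"coefficient": float(slope), "intercept": float(intercept)}


def impute_with_line(df, fit, target_col, predictor_col):
    """Fill gaps in the target with predictions; observed values are kept."""
    predicted = fit["intercept"] + fit["coefficient"] * df[predictor_col]
    return df[target_col].fillna(predicted)


demo = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0],
                     "y": [1.0, 3.0, np.nan, 7.0, 9.0]})
fit = fit_line(demo, "y", "x")
filled = impute_with_line(demo, fit, "y", "x")
```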
micromet.qaqc.netrad_limits module
AmeriFlux-like Timestamp Alignment QA/QC
Implements the core ideas from the AmeriFlux Timestamp Alignment Module:
- Compute potential incoming shortwave at the top of atmosphere (SW_IN_POT) in local standard time (no DST).
- Build 15-day non-overlapping “maximum diurnal composites”.
- Compute (1) % of time composite observed radiation exceeds potential and (2) lag (in time steps) at which cross-correlation between observed and potential is maximized.
Also provides heuristic flags for:
- Time zone mismatch / DST usage
- Timestamp START vs END mis-specification
- Stream desynchronization between SW_IN and PPFD_IN
- Possible radiation sensor issues (shading/not level/unexpectedly high)
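The composite, exceedance, and lag ideas listed above can be sketched as follows. All function names are hypothetical, the data are assumed to cover a single year at a fixed time step, and the lag sign convention (positive means the observed curve trails the potential curve) is an assumption.

```python
import numpy as np
import pandas as pd


def max_diurnal_composite(series, step_minutes=30, window_days=15):
    """Per-window maximum at each time-of-day slot (rows: window, columns: slot)."""
    idx = series.index
    window_id = np.asarray((idx.dayofyear - 1) // window_days)
    slot = np.asarray((idx.hour * 60 + idx.minute) // step_minutes)
    return series.groupby([window_id, slot]).max().unstack()


def pct_exceed(comp_obs, comp_pot):
    """Percentage of slots where the observed composite exceeds the potential."""
    valid = ~np.isnan(comp_obs) & ~np.isnan(comp_pot)
    return 100.0 * np.mean(comp_obs[valid] > comp_pot[valid])


def lag_of_max_correlation(comp_obs, comp_pot, max_lag_steps=6):
    """Lag (in time steps) at which the circular cross-correlation peaks."""
    lags = np.arange(-max_lag_steps, max_lag_steps + 1)
    # ASSUMPTION: positive lag = observed curve trails the potential curve.
    corrs = [np.corrcoef(np.roll(comp_obs, -lag), comp_pot)[0, 1] for lag in lags]
    best = int(np.argmax(corrs))
    return int(lags[best]), float(corrs[best])


# 30 days of half-hourly data whose value is simply the day of year.
idx = pd.date_range("2024-01-01", periods=30 * 48, freq="30min")
series = pd.Series(np.asarray(idx.dayofyear, dtype=float), index=idx)
composite = max_diurnal_composite(series)

# Synthetic diurnal curve and an "observed" copy delayed by two steps.
slots = np.arange(48)
pot = 1000.0 * np.clip(np.sin(np.pi * (slots - 12) / 24.0), 0.0, None)
obs = 0.7 * np.roll(pot, 2)
lag, corr = lag_of_max_correlation(obs, pot)
exceed = pct_exceed(np.array([1.0, 5.0, 3.0, 0.0]), np.array([2.0, 4.0, 4.0, 0.0]))
```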
References
AmeriFlux Data QA/QC: Timestamp Alignment Module (Design: 15-day max diurnal composite; local standard time; exceedance & cross-correlation interpretation).
See: https://ameriflux.lbl.gov/data/flux-data-products/data-qaqc/timestamp-alignment-module/
- class micromet.qaqc.netrad_limits.WindowComposite(year, window_id, step_minutes, steps_per_day, comp_pot, comp_sw, comp_ppfd, pct_exceed_sw, pct_exceed_ppfd, lag_sw, corr_sw, lag_ppfd, corr_ppfd)
Bases: object
A container for the results of a 15-day window analysis.
This dataclass holds the composite data and statistical results for a single 15-day analysis window.
- comp_pot
The maximum diurnal composite for potential incoming shortwave radiation.
- Type:
np.ndarray
- comp_sw
The maximum diurnal composite for observed shortwave radiation.
- Type:
np.ndarray, optional
- comp_ppfd
The maximum diurnal composite for observed PPFD.
- Type:
np.ndarray, optional
- pct_exceed_sw
The percentage of time the observed SW composite exceeds the potential.
- Type:
float, optional
- pct_exceed_ppfd
The percentage of time the observed PPFD composite exceeds the potential.
- Type:
float, optional
- micromet.qaqc.netrad_limits.add_buffer(min_max, buffer=100)
Add a buffer to a min/max tuple, with a hard lower limit.
This function expands a range defined by a min/max tuple by adding a buffer. The minimum value is not allowed to go below -200.
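A minimal sketch of this behaviour is shown below. The name `add_buffer_sketch` and the `hard_min` parameter are hypothetical, and subtracting the buffer from the minimum while adding it to the maximum is an assumption about what "expands" means here.

```python
def add_buffer_sketch(min_max, buffer=100, hard_min=-200):
    """Widen a (min, max) range, clamping the lower bound at hard_min."""
    lo, hi = min_max
    # ASSUMPTION: the buffer is applied symmetrically to both ends.
    return (max(lo - buffer, hard_min), hi + buffer)


clamped = add_buffer_sketch((-150, 600))  # lower bound hits the -200 floor
unclamped = add_buffer_sketch((0, 500))
```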
- micromet.qaqc.netrad_limits.analyze_timestamp_alignment(df, *, lat=39.5, lon=-111.5, std_utc_offset_hours=-7, time_from='CENTER', start_col='TIMESTAMP_START', end_col='TIMESTAMP_END', time_col=None, sw_col='SW_IN', ppfd_col='PPFD_IN', assume_naive_is_local=False, max_lag_steps=6)
Perform the main timestamp alignment analysis.
This function analyzes the timestamp alignment of radiation data by comparing observed values against potential top-of-atmosphere radiation. It computes composites, cross-correlations, and other statistics for 15-day windows.
- Parameters:
df (DataFrame) – The input DataFrame containing timestamp and radiation data.
std_utc_offset_hours (int) – The UTC offset for local standard time.
time_from (str) – How to interpret the timestamp (‘CENTER’, ‘START’, ‘END’). Defaults to ‘CENTER’.
start_col (str) – Name of the start timestamp column. Defaults to ‘TIMESTAMP_START’.
end_col (str) – Name of the end timestamp column. Defaults to ‘TIMESTAMP_END’.
time_col (Optional[str]) – Name of a single datetime column. Defaults to None.
sw_col (str) – Name of the shortwave radiation column. Defaults to ‘SW_IN’.
ppfd_col (str) – Name of the PPFD column. Defaults to ‘PPFD_IN’.
assume_naive_is_local (bool) – Whether to assume naive timestamps are in local time. Defaults to False.
max_lag_steps (int) – Maximum lag for cross-correlation. Defaults to 6.
- Returns:
A tuple containing a summary DataFrame of the analysis and a dictionary of WindowComposite objects for each window.
- Return type:
Tuple[DataFrame, Dict[Tuple[int, int], WindowComposite]]
- micromet.qaqc.netrad_limits.clear_sky_radiation(doy, hour, latitude=39.5)
Estimate the clear-sky incoming shortwave radiation.
This function provides a simplified estimation of the incoming shortwave radiation on a clear day, taking into account the solar constant, Earth-Sun distance, and atmospheric transmissivity.
- micromet.qaqc.netrad_limits.estimate_net_radiation_range(doy, hour)
Estimate the min/max net radiation for a given hour and day of the year.
This function calculates a plausible range for net radiation by estimating the components of the radiation budget (shortwave and longwave radiation) under typical conditions.
- micromet.qaqc.netrad_limits.flag_issues(summary)
Apply simple heuristics to flag likely timestamp and data quality issues.
This function uses a set of heuristics to identify potential issues such as timezone mismatches, DST usage, and sensor problems based on the summary of the timestamp alignment analysis.
- micromet.qaqc.netrad_limits.hour_angle(hour)
Calculate the hour angle for a given hour of the day.
The hour angle is the angular displacement of the Sun east or west of the local meridian due to Earth’s rotation.
- micromet.qaqc.netrad_limits.longwave_radiation(T_kelvin)
Estimate the longwave radiation using the Stefan-Boltzmann law.
This function calculates the longwave radiation emitted by a surface based on its temperature, using the Stefan-Boltzmann law.
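The Stefan-Boltzmann law gives the emitted flux as emissivity times sigma times T to the fourth power. The sketch below illustrates it; `longwave_sketch` is a hypothetical name, and the `emissivity` parameter (defaulting to a perfect black body) is an assumption, since the documented function takes only a temperature.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def longwave_sketch(t_kelvin, emissivity=1.0):
    """Longwave emission in W/m^2 for a surface at t_kelvin."""
    # ASSUMPTION: emissivity of 1.0; real surfaces are slightly lower.
    return emissivity * SIGMA * t_kelvin ** 4


lw_15c = longwave_sketch(288.15)  # roughly 391 W/m^2 at 15 degrees C
```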
- micromet.qaqc.netrad_limits.plot_summary(summary, composites, which_year=None, outfile_prefix=None)
Create and display summary plots of the timestamp alignment analysis.
This function generates a set of plots to visualize the results of the timestamp alignment analysis, including percent exceedance, lag times, and a composite overlay for the “worst” window.
- Parameters:
summary (DataFrame) – A DataFrame containing the summary of the analysis.
composites (Dict[Tuple[int,int],WindowComposite]) – A dictionary of WindowComposite objects for each analysis window.
which_year (Optional[int]) – The year to plot. If None, all years are plotted. Defaults to None.
outfile_prefix (Optional[str]) – A prefix for the output plot filenames. If provided, plots are saved to files. Defaults to None.
- Returns:
A dictionary of matplotlib Figure handles for the generated plots.
- Return type:
dict
- micromet.qaqc.netrad_limits.solar_declination(doy)
Calculate the solar declination angle for a given day of the year.
The solar declination is the angle between the Earth’s equatorial plane and the line connecting the centers of the Earth and the Sun.
- micromet.qaqc.netrad_limits.solar_elevation(doy, hour, latitude=39.5)
Calculate the solar elevation angle.
The solar elevation angle is the angle of the Sun above the horizon.
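The hour angle, declination, and elevation are commonly combined with textbook approximations. The sketch below uses those standard formulas (15 degrees per hour from solar noon, and Cooper's declination approximation); whether the package uses these exact expressions is an assumption, and the function names are hypothetical.

```python
import math


def hour_angle_deg(hour):
    """Hour angle in degrees: zero at solar noon, 15 degrees per hour."""
    return 15.0 * (hour - 12.0)


def solar_declination_deg(doy):
    """Cooper's approximation of solar declination in degrees."""
    return 23.45 * math.sin(2.0 * math.pi * (284 + doy) / 365.0)


def solar_elevation_deg(doy, hour, latitude=39.5):
    """Elevation of the Sun above the horizon, in degrees."""
    lat = math.radians(latitude)
    dec = math.radians(solar_declination_deg(doy))
    ha = math.radians(hour_angle_deg(hour))
    sin_elev = (math.sin(lat) * math.sin(dec)
                + math.cos(lat) * math.cos(dec) * math.cos(ha))
    return math.degrees(math.asin(sin_elev))


# Near the March equinox the declination is about zero, so the noon
# elevation is close to 90 degrees minus the latitude.
noon_equinox = solar_elevation_deg(81, 12.0)
```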
- micromet.qaqc.netrad_limits.sw_in_pot_noaa(dt_local_standard, lat_deg=40.7607, lon_deg=-111.8939, std_utc_offset_hours=-7)
Compute top-of-atmosphere shortwave irradiance using NOAA’s approximations.
This function calculates the potential incoming shortwave radiation at the top of the atmosphere on a horizontal surface, using solar position approximations from NOAA.
- Parameters:
dt_local_standard (DatetimeIndex) – A DatetimeIndex in local standard time (fixed UTC offset, no DST).
lat_deg (float|None) – Latitude in degrees (positive for Northern Hemisphere).
lon_deg (float|None) – Longitude in degrees (positive for Eastern Hemisphere).
std_utc_offset_hours (int|None) – The local standard time UTC offset (e.g., -7 for MST).
- Returns:
A Series of potential incoming shortwave radiation (W/m^2), indexed by the input DatetimeIndex.
- Return type:
Series
- Raises:
ValueError – If the input DatetimeIndex is not timezone-aware.
micromet.qaqc.variable_limits module
This module contains the limits dictionary, which defines the expected physical and plausible ranges for various meteorological and flux variables. The keys of the dictionary are variable names, and the values are dictionaries containing metadata such as description, units, and min/max values.
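A dictionary of this shape can drive a simple range check. The sketch below is illustrative only: the entries, the metadata key names ("description", "units", "min", "max"), and the helper `flag_out_of_range` are all hypothetical and are not taken from the module.

```python
import pandas as pd

# Hypothetical entries; the real dictionary's keys and values may differ.
limits_example = {
    "TA": {"description": "Air temperature", "units": "deg C", "min": -50.0, "max": 50.0},
    "RH": {"description": "Relative humidity", "units": "%", "min": 0.0, "max": 100.0},
}


def flag_out_of_range(df, limits):
    """Boolean frame, True where a value falls outside its min/max range."""
    flags = pd.DataFrame(False, index=df.index, columns=df.columns)
    for col in df.columns:
        if col in limits:  # columns without limits are never flagged
            spec = limits[col]
            flags[col] = (df[col] < spec["min"]) | (df[col] > spec["max"])
    return flags


obs = pd.DataFrame({"TA": [21.5, 75.0], "RH": [45.0, -3.0]})
flags = flag_out_of_range(obs, limits_example)
```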
Module contents
This package contains modules for quality assurance and quality control.