micromet.format.transformers package
Submodules
micromet.format.transformers.cleanup module
Column cleanup and type conversion functions for the reformatter pipeline.
This module handles dropping unwanted columns, setting proper data types, and filtering soil-related columns.
- micromet.format.transformers.cleanup.drop_extra_soil_columns(df, config, logger)
Drop redundant or unused soil-related columns from the DataFrame.
This function identifies and removes soil-related columns that are considered extra or redundant based on the provided configuration.
- Returns:
The DataFrame with extra soil columns removed.
- micromet.format.transformers.cleanup.drop_extras(df, config)
Drop extra or unwanted columns from the DataFrame based on configuration.
This function removes columns from the DataFrame that are listed in the ‘drop_cols’ section of the configuration dictionary.
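As a rough illustration of the behaviour described for drop_extras, the sketch below drops every column named under ‘drop_cols’. It assumes that ‘drop_cols’ holds a flat list of column names and that names absent from the DataFrame are skipped silently; drop_listed_columns and the sample column names are hypothetical stand-ins, not the micromet implementation.

```python
import pandas as pd


def drop_listed_columns(df: pd.DataFrame, config: dict) -> pd.DataFrame:
    """Hypothetical stand-in: drop every column named in config['drop_cols']."""
    to_drop = list(config.get("drop_cols", []))
    # errors="ignore" skips names that are not present in the frame (assumption).
    return df.drop(columns=to_drop, errors="ignore")


frame = pd.DataFrame({"TA": [21.5], "RECORD": [1], "BATT": [12.6]})
trimmed = drop_listed_columns(frame, {"drop_cols": ["RECORD", "BATT", "MISSING"]})
print(list(trimmed.columns))
```

The input frame is left untouched because DataFrame.drop returns a new object by default.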
- micromet.format.transformers.cleanup.process_and_match_columns(df_full, amflux)
Clean the column names of df_full by removing ‘_1’, ‘_2’, ‘_3’, and ‘_4’ suffixes, compare the cleaned names against an ‘amflux’ variable list, and return a DataFrame of the results. The unmatched columns are also printed.
- Parameters:
df_full – The DataFrame whose columns need to be cleaned and matched.
amflux – A DataFrame or Series that contains the ‘Variable’ column, or the Series of variables to match against.
- Returns:
A DataFrame containing the original columns, the cleaned columns, and a boolean indicating whether the cleaned column is in the amflux list.
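The matching step can be sketched as follows. This is a minimal illustration under stated assumptions: suffixes are stripped repeatedly from the end of each name, the variable list is passed as a plain list, and the result column names (‘original’, ‘cleaned’, ‘in_amflux’) are hypothetical. The real function may differ on all three points.

```python
import re

import pandas as pd


def match_columns_sketch(columns, variables):
    """Hypothetical helper: strip trailing _1.._4 suffixes and flag known names."""
    # Assumption: suffixes are removed repeatedly from the end of the name,
    # so 'SWC_1_1_1' becomes 'SWC'.
    cleaned = [re.sub(r"(_[1-4])+$", "", name) for name in columns]
    known = set(variables)
    return pd.DataFrame(
        {
            "original": list(columns),
            "cleaned": cleaned,
            "in_amflux": [name in known for name in cleaned],
        }
    )


matches = match_columns_sketch(["SWC_1_1_1", "TA_1_2_1", "FOO"], ["SWC", "TA"])
unmatched = matches.loc[~matches["in_amflux"], "original"].tolist()
print(unmatched)
```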
- micromet.format.transformers.cleanup.set_number_types(df, logger)
Convert columns in a DataFrame to the appropriate numeric types.
This function iterates through the columns of a DataFrame and converts them to numeric types (integer or float) where appropriate. It handles special cases for certain columns and logs warnings for duplicate columns.
micromet.format.transformers.columns module
Column naming and organization functions for the reformatter pipeline.
This module handles column renaming, prefix normalization, legacy format updates, and column ordering operations.
- micromet.format.transformers.columns.col_order(df, logger)
Reorder DataFrame columns to place priority columns at the beginning.
This function moves specified columns (‘TIMESTAMP_END’, ‘TIMESTAMP_START’) to the front of the DataFrame for better readability and consistency.
- micromet.format.transformers.columns.make_unique(cols)
Make a list of column names unique by appending numeric suffixes to duplicates.
This function takes a list of column names and ensures that all names are unique by appending a numeric suffix (e.g., ‘.1’, ‘.2’) to any duplicate names.
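A minimal sketch of this kind of de-duplication is shown below. It assumes the first occurrence keeps its name and later occurrences receive ‘.1’, ‘.2’, and so on in order of appearance; make_unique_sketch is a hypothetical name and the actual helper may number duplicates differently.

```python
def make_unique_sketch(cols):
    """Hypothetical helper: append .1, .2, ... to repeated names."""
    seen = {}
    unique = []
    for name in cols:
        count = seen.get(name, 0)
        # The first occurrence is kept as-is; later ones get a numeric suffix.
        unique.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return unique


renamed = make_unique_sketch(["TA", "TA", "RH", "TA"])
print(renamed)  # ['TA', 'TA.1', 'RH', 'TA.2']
```

Note that this simple version does not guard against a collision with a column that is already literally named ‘TA.1’.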
- micromet.format.transformers.columns.make_unique_cols(df)
Ensure that all column names in a DataFrame are unique.
This function uses the make_unique helper function to append numeric suffixes to any duplicate column names, ensuring that every column has a unique identifier.
- micromet.format.transformers.columns.modernize_soil_legacy(df, logger)
Update legacy soil sensor column names to a standardized format.
This function identifies and renames legacy soil sensor columns to a modern, standardized format based on predefined mapping rules for depth and orientation.
- micromet.format.transformers.columns.normalize_prefixes(df, logger)
Normalize column name prefixes for soil and temperature measurements.
This function standardizes column name prefixes by renaming them based on a set of predefined patterns. For example, it can change ‘BulkEC_’ to ‘EC_’.
- micromet.format.transformers.columns.rename_columns(df, data_type, config, logger)
Rename DataFrame columns based on configuration and standardize their names.
This function renames columns using a predefined mapping from the configuration, normalizes soil and temperature-related prefixes, and converts all column names to uppercase.
- Parameters:
df (DataFrame) – The input DataFrame with columns to be renamed.
data_type (str) – The type of data (‘eddy’ or ‘met’), which determines which renaming map to use.
config (dict) – The configuration dictionary containing the renaming maps.
logger (Logger) – The logger for tracking the renaming process.
- Returns:
The DataFrame with renamed and standardized column names.
micromet.format.transformers.corrections module
Data correction functions for the reformatter pipeline.
This module contains variable-specific corrections and data value fixes, including handling special values, unit conversions, and merging duplicate columns.
- micromet.format.transformers.corrections.apply_fixes(df, logger)
Apply a set of minor, variable-specific data corrections.
This function serves as a pipeline for applying several small, targeted fixes to the data, such as correcting ‘TAU’ values, converting soil water content to percent, and scaling SSITC test values.
- micromet.format.transformers.corrections.fill_na_drop_dups(df)
Merge any number of duplicate columns with numeric suffixes (.1, .2, …), treating -9999 as missing, and drop redundant duplicates.
This function groups columns by their base name (the part before a trailing .<number> suffix). For each group, it merges values across the base column (if present) and all suffixed duplicates by preferring the first non-missing value at each row. During merging, the sentinel value -9999 is treated as missing (converted to NaN). After merging, remaining missing values are filled back with -9999 and all duplicate suffixed columns are dropped, preserving the base column as the canonical result.
- Parameters:
df (DataFrame) – Input DataFrame that may contain duplicate columns named with numeric suffixes (e.g., "A.1", "A.2", …). The unsuffixed base column (e.g., "A") is optional. Sentinel missing values are expected to be encoded as -9999.
- Returns:
A new DataFrame where, for each base column, all suffixed duplicates have been merged into the base column and the duplicates removed. Any remaining missing values are filled with -9999.
Notes
Columns are grouped by the regex pattern r"^(?P<base>.+?)\.(?P<idx>\d+)$". Columns not matching this pattern are treated as base columns.
Merge precedence follows ascending numeric suffix order, with the base column (if present) considered first.
The input DataFrame is not modified in place; a copy is returned.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
...     "A": [1, -9999, 3, -9999],
...     "A.1": [np.nan, 2, -9999, 4],
...     "A.2": [-9999, 9, np.nan, -9999],
...     "B.1": [10, -9999, np.nan, 13],  # no base 'B' column present
...     "B.3": [np.nan, 11, 12, -9999]
... })
>>> fill_na_drop_dups(df)
   A     B
0  1  10.0
1  2  11.0
2  3  12.0
3  4  13.0
- micromet.format.transformers.corrections.fix_swc_percent(df, logger)
Convert fractional soil water content (SWC) values to percentages.
This function checks soil water content columns (those starting with ‘SWC_’) and, if the values appear to be fractional (<= 1.5), multiplies them by 100 to convert them to percentages.
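The conversion rule can be illustrated with a short sketch. The text does not say how "appear to be fractional" is decided, so this version assumes a column is treated as fractional when its maximum non-missing value is at most 1.5; swc_to_percent and the sample columns are hypothetical.

```python
import pandas as pd


def swc_to_percent(df: pd.DataFrame, limit: float = 1.5) -> pd.DataFrame:
    """Hypothetical sketch: scale fractional SWC_ columns to percent."""
    out = df.copy()
    for col in out.columns:
        if not col.startswith("SWC_"):
            continue
        values = pd.to_numeric(out[col], errors="coerce")
        # Assumption: the whole column is judged by its maximum value.
        if values.notna().any() and values.max() <= limit:
            out[col] = values * 100
    return out


frame = pd.DataFrame(
    {"SWC_1_1_1": [0.25, 0.5], "SWC_2_1_1": [25.0, 50.0], "TA": [0.5, 1.0]}
)
converted = swc_to_percent(frame)
print(converted)
```

Only the first column is rescaled: the second is already in percent and the third does not start with ‘SWC_’.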
- micromet.format.transformers.corrections.rating(x)
Categorize a numeric value into a discrete rating level (0, 1, or 2).
This function categorizes a numeric value into one of three levels:
  - 0 for values between 0 and 3.
  - 1 for values between 4 and 6.
  - 2 for all other values.
- Parameters:
x (numeric or None) – The input value to be rated.
- Returns:
The rating level (0, 1, or 2).
- micromet.format.transformers.corrections.scale_and_convert(column)
Apply a rating transformation and convert the column to float type.
This function applies a ‘rating’ function to each element of the Series and then converts the entire Series to float.
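The rating rule and its element-wise application can be sketched as below. The bounds are read literally and inclusively (0 to 3 gives 0, 4 to 6 gives 1), which is an assumption; how None and values strictly between 3 and 4 are handled is not stated, so the sketch only covers numeric input and lets in-between values fall through to 2. Both function names are hypothetical.

```python
import pandas as pd


def rating_sketch(x):
    """Hypothetical reading of the three rating levels (numeric input only)."""
    if 0 <= x <= 3:
        return 0
    if 4 <= x <= 6:
        return 1
    return 2


def scale_and_convert_sketch(column: pd.Series) -> pd.Series:
    # Rate every element, then cast the whole Series to float.
    return column.apply(rating_sketch).astype(float)


flags = pd.Series([1, 3, 4, 6, 7, 9])
rated = scale_and_convert_sketch(flags)
print(rated.tolist())  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
```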
- micromet.format.transformers.corrections.ssitc_scale(df, logger)
Scale SSITC (steady state and integral turbulence characteristics) test columns.
This function checks specific SSITC columns and, if their values exceed a certain threshold (3), applies a scaling and rating transformation to them.
- micromet.format.transformers.corrections.tau_fixer(df, threshold=0.5, logger=None)
Replace zero values in the ‘TAU’ columns with NaN and flip the sign if needed.
The function loops through all columns with TAU in the name that do not also have SSITC or QC in the name.
For each such column, zero values and negative-infinity values are replaced with NaN. This is often done to handle cases where zero represents a missing or invalid measurement.
The function also determines whether to reverse the sign of TAU. If more than the fraction given by threshold (0.5 by default) of the TAU values are positive, it flips the sign of all TAU values.
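A minimal sketch of the two steps described for tau_fixer follows. It assumes the positive fraction is computed over the non-missing values that remain after the replacement, and that each TAU column is evaluated independently; tau_fixer_sketch is a hypothetical name and logging is omitted.

```python
import numpy as np
import pandas as pd


def tau_fixer_sketch(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical sketch: blank invalid TAU values, then fix the sign."""
    out = df.copy()
    tau_cols = [
        c for c in out.columns if "TAU" in c and "SSITC" not in c and "QC" not in c
    ]
    for col in tau_cols:
        series = out[col].astype(float).replace([0.0, -np.inf], np.nan)
        valid = series.dropna()
        # Assumption: the share of positive values is taken over valid rows only.
        if len(valid) and (valid > 0).mean() > threshold:
            series = -series
        out[col] = series
    return out


frame = pd.DataFrame(
    {"TAU": [0.0, 0.2, 0.3, -0.1, -np.inf], "TAU_SSITC_TEST": [0, 1, 2, 0, 1]}
)
fixed = tau_fixer_sketch(frame)
print(fixed)
```

Two of the three valid TAU values are positive, which exceeds the 0.5 threshold, so the column is negated; the SSITC column is skipped by name.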
micromet.format.transformers.interval_updates module
This module contains a dictionary of the datetimes at which the sampling frequency was updated from 30 minutes to 60 minutes for eddy data (first item in each list) and met data (second item in each list).
It also contains a function that subsets a DataFrame with a MultiIndex of STATIONID and DATETIME_END so that it only includes data from before or after the interval switch.
- micromet.format.transformers.interval_updates.subset_interval(df, date_dict, interval, data_type)
Subset a MultiIndex DataFrame based on station ID, a date cutoff, and a data_type, using a single vectorized boolean mask.
- Parameters:
df (pd.DataFrame) – MultiIndex DataFrame with levels ‘STATIONID’ and ‘DATETIME_END’.
date_dict (dict) – Dictionary where keys are ‘STATIONID’ values and values are a list of two date strings [date1, date2].
interval (int) – Condition for subsetting: 30 for dates <= cutoff, 60 for dates > cutoff.
data_type (str) – Determines which date to use as the cutoff: ‘eddy’ uses the first date (index 0); ‘met’ uses the second date (index 1).
- Returns:
pd.DataFrame – The subsetted DataFrame containing data from all relevant stations.
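The masking logic described above can be sketched as follows. This is an illustration, not the library code: subset_interval_sketch, the station IDs, and the dates are hypothetical, and the sketch assumes that stations missing from date_dict are dropped from the result.

```python
import pandas as pd


def subset_interval_sketch(df, date_dict, interval, data_type):
    """Hypothetical re-creation of the per-station cutoff mask."""
    position = {"eddy": 0, "met": 1}[data_type]
    cutoff_strings = {station: dates[position] for station, dates in date_dict.items()}

    stations = pd.Series(df.index.get_level_values("STATIONID"))
    times = df.index.get_level_values("DATETIME_END")
    # One cutoff per row. Stations missing from date_dict get NaT, which
    # compares False and therefore drops those rows (an assumption).
    cutoffs = pd.to_datetime(stations.map(cutoff_strings)).to_numpy()

    if interval == 30:
        mask = times <= cutoffs
    elif interval == 60:
        mask = times > cutoffs
    else:
        raise ValueError("interval must be 30 or 60")
    return df[mask]


index = pd.MultiIndex.from_tuples(
    [
        ("US-A", pd.Timestamp("2023-05-31 23:30")),
        ("US-A", pd.Timestamp("2023-06-01 01:00")),
        ("US-B", pd.Timestamp("2023-06-01 01:00")),
    ],
    names=["STATIONID", "DATETIME_END"],
)
data = pd.DataFrame({"NEE": [1.0, 2.0, 3.0]}, index=index)
dates = {
    "US-A": ["2023-06-01 00:00", "2023-07-01 00:00"],
    "US-B": ["2023-08-01 00:00", "2023-08-01 00:00"],
}
before = subset_interval_sketch(data, dates, 30, "eddy")
after = subset_interval_sketch(data, dates, 60, "eddy")
print(before["NEE"].tolist(), after["NEE"].tolist())
```

Because the cutoff is looked up per row, a single comparison covers all stations without looping over them.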
micromet.format.transformers.timestamp_update module
Various scripts for trying to address timestamp issues in the data.
- micromet.format.transformers.timestamp_update.process_by_interval(in_df, key, interval_dict, datatype)
Use the interval_updates dictionary to identify when the data switched from 30- to 60-minute sampling, then process the data accordingly.
- micromet.format.transformers.timestamp_update.resample_alternating_frequency_with_other(df, min_records_threshold=24)
Identifies contiguous blocks of data, resamples 30min/60min blocks, and assigns ‘OTHER’ to the timestep for unclassified (non-gap) blocks.
- micromet.format.transformers.timestamp_update.resample_single_frequency_switch(df, sample_size=100)
Resample a DataFrame based on a single detected frequency switch (30 min to 60 min). The function uses the mode of the first sample_size records (100 by default) to robustly determine the initial frequency, handling minor clock jitter and occasional gaps.
- Parameters:
df (pd.DataFrame) – DataFrame with a DatetimeIndex.
sample_size (int) – The number of initial records to analyze to determine the starting frequency.
- Returns:
pd.DataFrame – Resampled DataFrame with a ‘timestep’ column.
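The idea of taking the mode of the first records can be sketched as below. This only covers the detection of the initial time step, not the resampling itself. Rounding each difference to the nearest minute before taking the mode is an assumption made here to absorb clock jitter; initial_step is a hypothetical helper.

```python
import pandas as pd


def initial_step(index: pd.DatetimeIndex, sample_size: int = 100) -> pd.Timedelta:
    """Hypothetical helper: most common spacing among the first records."""
    deltas = pd.Series(index[:sample_size]).diff().dropna()
    # Round to whole minutes so a few seconds of jitter do not split the mode.
    return deltas.dt.round("min").mode().iloc[0]


stamps = pd.DatetimeIndex(
    [
        "2024-01-01 00:00:00",
        "2024-01-01 00:30:02",  # two seconds of jitter
        "2024-01-01 01:00:00",
        "2024-01-01 02:00:00",  # one missing record
        "2024-01-01 02:30:00",
        "2024-01-01 03:00:00",
    ]
)
step = initial_step(stamps)
print(step)
```

The single 60-minute gap is outvoted by the four spacings that round to 30 minutes, which is why the mode is more robust here than the first difference or the mean.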
micromet.format.transformers.timestamps module
Timestamp transformation functions for the reformatter pipeline.
This module handles all datetime-related operations including timestamp detection, conversion, resampling, and formatting.
- micromet.format.transformers.timestamps.add_ameriflux_timestamps(df, interval_minutes=30)
Creates TIMESTAMP_START and TIMESTAMP_END columns from a DatetimeIndex in the YYYYMMDDHHmm format required by AmeriFlux.
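The YYYYMMDDHHmm format corresponds to the strftime pattern "%Y%m%d%H%M". The sketch below shows one way to build both columns; it assumes the index marks the end of each averaging period and that the values are stored as strings, neither of which is confirmed by the text. ameriflux_stamps is a hypothetical name.

```python
import pandas as pd


def ameriflux_stamps(df: pd.DataFrame, interval_minutes: int = 30) -> pd.DataFrame:
    """Hypothetical sketch: derive start/end stamps from a DatetimeIndex."""
    out = df.copy()
    end = out.index  # assumption: the index holds the period end
    start = end - pd.Timedelta(minutes=interval_minutes)
    out["TIMESTAMP_START"] = start.strftime("%Y%m%d%H%M")
    out["TIMESTAMP_END"] = end.strftime("%Y%m%d%H%M")
    return out


frame = pd.DataFrame(
    {"TA": [1.0, 2.0]},
    index=pd.date_range("2024-01-01 00:30", periods=2, freq="30min"),
)
stamped = ameriflux_stamps(frame)
print(stamped)
```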
- micromet.format.transformers.timestamps.fix_timestamps(df, logger)
Convert the timestamp column to datetime objects and handle missing values.
This function identifies the timestamp column, converts it to datetime objects, and removes any rows where the timestamp could not be parsed.
- micromet.format.transformers.timestamps.infer_datetime_col(df, logger)
Infer the name of the timestamp column in a DataFrame.
This function searches for a timestamp column in the DataFrame by checking a list of common names (e.g., ‘TIMESTAMP_END’). If a matching column is found, its name is returned. Otherwise, it logs a warning and returns the name of the first column.
- micromet.format.transformers.timestamps.resample_timestamps(df, interval, logger)
Resample a DataFrame to 30- or 60-minute intervals.
This function resamples the DataFrame to a fixed 30- or 60-minute frequency based on the ‘DATETIME_END’ column. It also handles duplicate timestamps by selecting the first available value.
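A minimal sketch of this kind of resampling is shown below. It assumes that "first available value" means the first non-missing value within each interval, which is what pandas' Resampler.first() returns; resample_sketch is a hypothetical name and logging is omitted.

```python
import pandas as pd


def resample_sketch(df: pd.DataFrame, interval: int) -> pd.DataFrame:
    """Hypothetical sketch: regular grid, first value wins on duplicates."""
    # A stable sort keeps duplicate timestamps in their original order.
    indexed = df.set_index("DATETIME_END").sort_index(kind="mergesort")
    return indexed.resample(f"{interval}min").first()


frame = pd.DataFrame(
    {
        "DATETIME_END": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 00:00", "2024-01-01 01:00"]
        ),
        "TA": [1.0, 9.0, 3.0],
    }
)
regular = resample_sketch(frame, 30)
print(regular)
```

The duplicated 00:00 row collapses to its first value, and the missing 00:30 step appears as a NaN row so that the output has a regular 30-minute index.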
- micromet.format.transformers.timestamps.timestamp_reset(df, minutes=30)
Reset TIMESTAMP_START and TIMESTAMP_END columns based on the DataFrame index.
This function generates new ‘TIMESTAMP_START’ and ‘TIMESTAMP_END’ columns based on the DataFrame’s datetime index. The ‘TIMESTAMP_START’ is calculated by subtracting the specified number of minutes from the index timestamp.
micromet.format.transformers.validation module
Data validation and quality control functions for the reformatter pipeline.
This module handles applying physical limits to data values and detecting stuck or anomalous sensor readings.
- micromet.format.transformers.validation.apply_physical_limits(df, how='mask', inplace=False, prefer_longest_key=True, return_mask=False, round_et=True)
Apply physical Min/Max bounds to columns in a DataFrame.
This function applies physical limits (minimum and maximum) to the columns of a DataFrame. It can either mask out-of-bounds values with NaN or clip them to the limits.
- Parameters:
df (DataFrame) – The input DataFrame to which the limits will be applied.
how (str) – The method to use for applying limits: ‘mask’ (default) or ‘clip’.
inplace (bool) – If True, modify the DataFrame in place. Defaults to False.
prefer_longest_key (bool) – If True, prefer longer matching keys from the limits dictionary. Defaults to True.
return_mask (bool) – If True, return a boolean mask of the values that were flagged. Defaults to False.
round_et (bool) – If True, ET values below 0 are rounded to 1 digit before the variable limits are applied. Defaults to True.
- Returns:
A tuple containing:
  - The DataFrame with physical limits applied.
  - A boolean mask of flagged values (if return_mask is True).
  - A report summarizing the number of flagged values for each column.
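The difference between the two modes can be illustrated with a small sketch. The bounds below are made-up values for illustration, the limits are matched by exact column name (the real function appears to match keys more flexibly, given prefer_longest_key), and apply_limits_sketch is a hypothetical name that returns only the DataFrame.

```python
import pandas as pd

# Hypothetical bounds for illustration only; not the package's limits table.
LIMITS = {"TA": (-50.0, 60.0)}


def apply_limits_sketch(df: pd.DataFrame, limits: dict, how: str = "mask") -> pd.DataFrame:
    """Hypothetical sketch of masking versus clipping out-of-bounds values."""
    out = df.copy()
    for col, (low, high) in limits.items():
        if col not in out.columns:
            continue
        if how == "mask":
            # Keep in-range values, replace everything else with NaN.
            out[col] = out[col].where(out[col].between(low, high))
        elif how == "clip":
            # Pull out-of-range values back to the nearest bound.
            out[col] = out[col].clip(lower=low, upper=high)
        else:
            raise ValueError("how must be 'mask' or 'clip'")
    return out


frame = pd.DataFrame({"TA": [20.0, 75.0, -80.0]})
masked = apply_limits_sketch(frame, LIMITS, how="mask")
clipped = apply_limits_sketch(frame, LIMITS, how="clip")
print(masked["TA"].tolist(), clipped["TA"].tolist())
```

Masking preserves the information that a reading was implausible, whereas clipping produces values that look valid, so masking is usually the safer default for quality control.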
- micromet.format.transformers.validation.mask_stuck_values(df, threshold, columns=None, tolerance=None, mask_value=nan, return_mask=False)
Detect and mask ‘stuck’ values in a datetime-indexed DataFrame.
A run is considered ‘stuck’ when the series does not change (within an optional numeric tolerance) for at least threshold. Threshold can be a count of rows (int) or a time duration (str like ‘30min’ / ‘2H’ or pd.Timedelta).
- Parameters:
df (DataFrame) – DataFrame with a DatetimeIndex (required).
threshold (Union[int, str, Timedelta]) – Minimum length of a non-changing run to be masked.
  - If int: count of consecutive rows (e.g., 5).
  - If str or Timedelta: minimum duration (e.g., ‘30min’, pd.Timedelta(‘2H’)).
columns (Optional[Iterable[str]]) – Subset of columns to check. Defaults to all columns.
tolerance (Optional[float]) – For numeric columns only: treat changes with absolute difference <= tolerance as ‘no change’. If None, exact equality is used.
mask_value (any, default np.nan) – Value to assign to masked entries.
return_mask (bool) – If True, also return a boolean DataFrame mask where True marks masked cells.
- Return type:
Union[Tuple[DataFrame, DataFrame], Tuple[DataFrame, DataFrame, DataFrame]]
- Returns:
masked_df (pd.DataFrame) – Copy of df with stuck runs masked.
report (pd.DataFrame) – Tidy report with one row per masked run, columns: [‘column’, ‘value’, ‘start’, ‘end’, ‘n_rows’, ‘duration’, ‘threshold_type’, ‘threshold_value’]
mask_df (pd.DataFrame (optional)) – Boolean DataFrame (same shape as df[columns]) with True where values were masked.
Notes
NaNs act as boundaries and are never considered part of a ‘stuck’ run.
For irregular time steps and time-based thresholds, the run ‘duration’ is computed as end_time - start_time (inclusive of row timestamps).
Entire runs that meet/exceed the threshold are masked (not just the tail beyond threshold).
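The run-detection rules in the notes can be sketched for a single numeric Series and a row-count threshold. This is a simplified illustration: mask_runs is a hypothetical helper, time-based thresholds and the report are left out, and the tolerance is applied to consecutive differences, which is an assumption about how the real function compares values.

```python
import numpy as np
import pandas as pd


def mask_runs(series: pd.Series, min_rows: int, tolerance=None):
    """Hypothetical sketch: mask runs of unchanged values of length >= min_rows."""
    diffs = series.diff()
    if tolerance is None:
        changed = diffs.ne(0)
    else:
        changed = diffs.abs().gt(tolerance)
    # NaNs act as boundaries: a NaN row and the row after it both start new runs.
    changed = changed | series.isna() | series.shift().isna()
    run_id = changed.cumsum()
    run_length = series.groupby(run_id).transform("size")
    stuck = series.notna() & (run_length >= min_rows)
    # The entire run is masked, not just the part beyond the threshold.
    return series.mask(stuck), stuck


readings = pd.Series(
    [1.0, 2.0, 2.0, 2.0, 3.0, np.nan, 3.0],
    index=pd.date_range("2024-01-01", periods=7, freq="30min"),
)
masked, stuck = mask_runs(readings, min_rows=3)
print(masked)
```

The three consecutive 2.0 readings form one run and are all masked, while the two 3.0 readings are kept because the NaN between them splits them into separate runs of length one.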
Module contents
Data transformation functions for the reformatter pipeline.
This package contains modular transformation functions organized by category:
- timestamps: Datetime handling and resampling
- columns: Column naming, renaming, and organization
- validation: Data quality checks and boundary enforcement
- corrections: Variable-specific data fixes
- cleanup: Column filtering and type setting
For backward compatibility, all functions are re-exported at the package level.
The package-level names are listed below. Each is documented in full under its submodule above.
- micromet.format.transformers.apply_fixes(df, logger): see corrections.apply_fixes.
- micromet.format.transformers.apply_physical_limits(df, how='mask', inplace=False, prefer_longest_key=True, return_mask=False, round_et=True): see validation.apply_physical_limits.
- micromet.format.transformers.col_order(df, logger): see columns.col_order.
- micromet.format.transformers.drop_extra_soil_columns(df, config, logger): see cleanup.drop_extra_soil_columns.
- micromet.format.transformers.drop_extras(df, config): see cleanup.drop_extras.
- micromet.format.transformers.fill_na_drop_dups(df): see corrections.fill_na_drop_dups.
- micromet.format.transformers.fix_swc_percent(df, logger): see corrections.fix_swc_percent.
- micromet.format.transformers.fix_timestamps(df, logger): see timestamps.fix_timestamps.
- micromet.format.transformers.infer_datetime_col(df, logger): see timestamps.infer_datetime_col.
- micromet.format.transformers.make_unique(cols): see columns.make_unique.
- micromet.format.transformers.make_unique_cols(df): see columns.make_unique_cols.
- micromet.format.transformers.mask_stuck_values(df, threshold, columns=None, tolerance=None, mask_value=nan, return_mask=False): see validation.mask_stuck_values.
- micromet.format.transformers.modernize_soil_legacy(df, logger): see columns.modernize_soil_legacy.
- micromet.format.transformers.normalize_prefixes(df, logger): see columns.normalize_prefixes.
- micromet.format.transformers.process_and_match_columns(df_full, amflux): see cleanup.process_and_match_columns.
- micromet.format.transformers.rating(x): see corrections.rating.
- micromet.format.transformers.rename_columns(df, data_type, config, logger): see columns.rename_columns.
- micromet.format.transformers.resample_timestamps(df, interval, logger): see timestamps.resample_timestamps.
- micromet.format.transformers.scale_and_convert(column): see corrections.scale_and_convert.
- micromet.format.transformers.set_number_types(df, logger): see cleanup.set_number_types.
- micromet.format.transformers.ssitc_scale(df, logger): see corrections.ssitc_scale.
- micromet.format.transformers.tau_fixer(df, threshold=0.5, logger=None): see corrections.tau_fixer.
- micromet.format.transformers.timestamp_reset(df, minutes=30): see timestamps.timestamp_reset.