micromet package
Subpackages
- micromet.format package
  - Subpackages
    - micromet.format.transformers package
      - Submodules
        - micromet.format.transformers.cleanup module
        - micromet.format.transformers.columns module
        - micromet.format.transformers.corrections module
        - micromet.format.transformers.interval_updates module
        - micromet.format.transformers.timestamp_update module
        - micromet.format.transformers.timestamps module
        - micromet.format.transformers.validation module
      - Module contents
  - Submodules
    - micromet.format.compare module
    - micromet.format.file_compile module
    - micromet.format.headers module
      - Key Features
      - apply_header(), count_columns(), detect_delimiter_and_header(), find_header_donor(), fix_all_in_parent(), fix_directory_pairs(), get_first_line_raw(), header_line_is_valid(), looks_like_header(), name_similarity(), open_text(), patch_file(), prepend_header_in_place(), process_file(), read_colnames(), scan(), sniff_delimiter()
    - micromet.format.merge module
    - micromet.format.reformatter module
    - micromet.format.reformatter_vars module
  - Module contents
- micromet.qaqc package
  - Submodules
    - micromet.qaqc.data_cleaning module
    - micromet.qaqc.netrad_limits module
      - WindowComposite: year, window_id, step_minutes, steps_per_day, comp_pot, comp_sw, comp_ppfd, pct_exceed_sw, pct_exceed_ppfd, lag_sw, corr_sw, lag_ppfd, corr_ppfd
      - add_buffer(), analyze_timestamp_alignment(), clear_sky_radiation(), estimate_net_radiation_range(), flag_issues(), hour_angle(), longwave_radiation(), plot_summary(), solar_declination(), solar_elevation(), sw_in_pot_noaa()
    - micromet.qaqc.variable_limits module
  - Module contents
- micromet.report package
Submodules
micromet.pipeline module
Complete pipeline for processing micrometeorological data with Micromet.
This module provides high-level orchestration for the complete data processing workflow, from raw data files to cleaned, validated, and analyzed datasets.
Classes
- Pipeline : Main orchestration class for data processing
- PipelineConfig : Configuration container for pipeline settings
- ProcessingResult : Container for processing results and metadata
Functions
- run_pipeline : Convenience function to run complete pipeline
- process_station : Process a single station’s data
- batch_process : Process multiple stations
Examples
Basic usage:
>>> from micromet.pipeline import Pipeline
>>>
>>> # Process a single file
>>> pipeline = Pipeline()
>>> result = pipeline.process_file(
... 'data/US-UTW_Flux.dat',
... site_id='US-UTW'
... )
>>>
>>> # Batch process all stations
>>> results = pipeline.batch_process(
... input_dir='./raw_data',
... output_dir='./processed_data'
... )
Command-line usage:
$ python -m micromet.pipeline --site US-UTW --input data/ --output results/
$ python -m micromet.pipeline --batch --input data/ --output results/
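The two invocations above imply a small argument parser. The sketch below reproduces only the four flags shown (--site, --batch, --input, --output) using argparse; build_parser is a hypothetical name, and making --input and --output required is an assumption. The actual parser in micromet.pipeline may define more options, different help text, or stricter validation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags in the usage lines above."""
    parser = argparse.ArgumentParser(prog="python -m micromet.pipeline")
    parser.add_argument("--site", help="Site ID for a single-site run, e.g. US-UTW")
    parser.add_argument("--batch", action="store_true",
                        help="Process every matching file under --input")
    # Assumption: both directories must always be given.
    parser.add_argument("--input", required=True, help="Directory of raw data files")
    parser.add_argument("--output", required=True, help="Directory for processed results")
    return parser


# Parse the first usage line's arguments explicitly instead of reading sys.argv.
args = build_parser().parse_args(
    ["--site", "US-UTW", "--input", "data/", "--output", "results/"]
)
```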
- class micromet.pipeline.Pipeline(config=None, logger=None)
Bases: object
Main orchestration class for micrometeorological data processing.
This class coordinates the complete workflow from raw data files to cleaned, validated, and analyzed datasets.
- Parameters:
config (Optional[PipelineConfig]) – Configuration settings for the pipeline.
logger (Optional[Logger]) – Logger instance for tracking progress.
- config
Pipeline configuration.
- Type:
PipelineConfig
- logger
Logger instance.
- Type:
logging.Logger
- reader
Data reader instance.
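- batch_process(input_dir, output_dir, pattern='*Flux*.dat', data_type='eddy')
Process multiple files in a directory.
- Parameters:
input_dir – Directory to search for input files.
output_dir – Directory where processed files are written.
pattern (str) – Glob pattern used to select input files. Defaults to ‘*Flux*.dat’.
data_type (str) – The type of logger data (‘eddy’ or ‘met’). Defaults to ‘eddy’.
- Returns:
Results for all processed files.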
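- process_file(input_file, site_id=None, output_dir=None, data_type='eddy')
Process a single data file through the complete pipeline.
- Parameters:
input_file – Path to the raw data file.
site_id (str or None) – The site identifier (e.g., ‘US-UTW’). Defaults to None.
output_dir – Directory for processed output. Defaults to None.
data_type (str) – The type of logger data (‘eddy’ or ‘met’). Defaults to ‘eddy’.
- Returns:
Container with processing results and metadata.
- Return type:
ProcessingResult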
- class micromet.pipeline.PipelineConfig(check_timestamps=True, drop_soil=True, generate_reports=True, generate_plots=False, save_intermediate=False, var_limits_csv=None, expected_freq='30min', output_format='csv')
Bases: object
Configuration settings for the data processing pipeline.
- var_limits_csv
Path to custom variable limits CSV file.
- Type:
Path or None
- class micromet.pipeline.ProcessingResult(site_id, success, input_file, output_file=None, n_records_input=0, n_records_output=0, n_flagged=0, processing_time=0.0, timestamp_issues=None, error_message=None, reports=<factory>)
Bases: object
Container for processing results and metadata.
- input_file
Path to input file.
- Type:
Path
- output_file
Path to output file (if saved).
- Type:
Path or None
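Because batch runs return one result per file, a quick summary is often the first thing to check. The sketch below assumes the batch results can be iterated as ProcessingResult objects; ResultStub is a hypothetical stand-in carrying a subset of the documented fields so the example runs without micromet installed, and expressing n_flagged as a share of n_records_output is an assumption about how those counts relate.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ResultStub:
    """Hypothetical stand-in with a subset of the documented ProcessingResult fields."""
    site_id: str
    success: bool
    input_file: Path
    n_records_input: int = 0
    n_records_output: int = 0
    n_flagged: int = 0
    error_message: Optional[str] = None


def summarize(results):
    """Count successes/failures and pool the flagged fraction over successful runs."""
    ok = [r for r in results if r.success]
    failed = {r.site_id: r.error_message for r in results if not r.success}
    n_out = sum(r.n_records_output for r in ok)
    n_flagged = sum(r.n_flagged for r in ok)
    return {
        "n_ok": len(ok),
        "n_failed": len(failed),
        "failed": failed,
        # Assumption: n_flagged counts output records that carry a flag.
        "flagged_fraction": n_flagged / n_out if n_out else 0.0,
    }
```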
- micromet.pipeline.batch_process(input_dir, output_dir, **kwargs)
Convenience function for batch processing.
micromet.reader module
This module provides the AmerifluxDataProcessor class for reading and parsing AmeriFlux-style CSV files (TOA5 or AmeriFlux output) into a pandas DataFrame.
- class micromet.reader.AmerifluxDataProcessor(logger=None)
Bases: object
A class for reading and parsing AmeriFlux-style CSV files.
This class is designed to handle Campbell Scientific TOA5 files or standard AmeriFlux output files, parsing them into a pandas DataFrame.
- Parameters:
logger (Logger) – A logger for tracking the data processing. If not provided, a default logger is used.
- logger
The logger used for logging messages.
- Type:
logging.Logger
- NA_VALUES = ['-9999', 'NAN', 'NaN', 'nan', nan, -9999.0]
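As an illustration of the TOA5 layout and the NA_VALUES sentinels, here is a standard-library sketch. It relies on the usual TOA5 structure (line 1 file metadata, line 2 field names, lines 3 and 4 units and processing codes, then data rows); read_toa5 and parse_value are hypothetical helpers, not methods of this class, and the real reader returns a pandas DataFrame (with pandas, the same sentinels would typically be passed to read_csv through its na_values argument).

```python
import csv
import io
import math

# Sentinel strings treated as missing, following the documented NA_VALUES.
NA_STRINGS = {"-9999", "NAN", "NaN", "nan"}


def parse_value(text):
    """Return a float, None for NA sentinels or empty cells, or the text if non-numeric."""
    if text == "" or text in NA_STRINGS:
        return None
    try:
        value = float(text)
    except ValueError:
        return text  # e.g. the TIMESTAMP column stays a string
    if value == -9999.0 or math.isnan(value):
        return None
    return value


def read_toa5(text):
    """Parse TOA5 text into (field_names, list of row dicts)."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[1]  # line 2 holds the field names
    records = [
        {name: parse_value(cell) for name, cell in zip(header, row)}
        for row in rows[4:]  # skip metadata, names, units, processing lines
    ]
    return header, records
```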
- __init__(logger=None)
Initialize the AmerifluxDataProcessor.
- Parameters:
logger (Logger) – A logger for tracking the data processing. If not provided, a default logger is used.
- iterate_through_stations()
Iterate through all stations and compile their data.
This method iterates through a predefined list of stations, compiles the data for each station, and returns a dictionary of DataFrames.
- Returns:
A dictionary where keys are station IDs and values are DataFrames of the compiled data for each station.
- Return type:
dict
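- raw_file_compile(main_dir, station_folder_name, search_str='*Flux_AmeriFluxFormat*.dat')
Compile raw AmeriFlux datalogger files into a single DataFrame.
This method searches for files matching a given pattern within a station’s directory, processes each file, and concatenates them into a single DataFrame.
- Parameters:
main_dir – Root directory that contains the station folders.
station_folder_name (str) – Name of the station’s folder within main_dir.
search_str (str) – Glob pattern for the files to compile. Defaults to ‘*Flux_AmeriFluxFormat*.dat’.
- Returns:
A DataFrame containing the compiled data, or None if no valid files were found.
- Return type:
pandas.DataFrame or None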
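The search-and-concatenate behavior of raw_file_compile can be sketched with pathlib globbing. compile_station_files is a hypothetical stand-in that assumes each matching file is a CSV with a single header row containing TIMESTAMP_START and that later files win when records overlap; the actual method parses files into DataFrames and may handle headers, duplicates, and ordering differently.

```python
import csv
from pathlib import Path


def compile_station_files(main_dir, station_folder_name,
                          search_str="*Flux_AmeriFluxFormat*.dat"):
    """Glob matching files, read rows, de-duplicate on TIMESTAMP_START, sort.

    Returns None when no usable file is found, mirroring the documented contract.
    """
    station_dir = Path(main_dir) / station_folder_name
    rows_by_ts = {}
    for path in sorted(station_dir.glob(search_str)):
        with path.open(newline="") as fh:
            reader = csv.DictReader(fh)
            if not reader.fieldnames or "TIMESTAMP_START" not in reader.fieldnames:
                continue  # skip empty or unrecognized files
            for row in reader:
                # A later file overwrites an earlier one for the same timestamp.
                rows_by_ts[row["TIMESTAMP_START"]] = row
    if not rows_by_ts:
        return None
    # AmeriFlux YYYYMMDDHHMM strings sort chronologically as text.
    return [rows_by_ts[ts] for ts in sorted(rows_by_ts)]
```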
micromet.station_data_pull module
- class micromet.station_data_pull.StationDataDownloader(config, logger=None)
Bases: object
A class to manage downloading data from a station’s logger.
This class handles the connection and data download from a Campbell Scientific data logger via its web API.
- Parameters:
config (Union[ConfigParser, dict]) – A configuration object containing station details and credentials.
logger (Logger) – A logger for logging messages. If None, a new logger is created.
- config
The configuration object.
- Type:
Union[ConfigParser, dict]
- logger
The logger instance.
- Type:
logging.Logger
- logger_credentials
The authentication credentials for the logger.
- Type:
requests.auth.HTTPBasicAuth
- __init__(config, logger=None)
Initialize the StationDataDownloader.
- Parameters:
config (Union[ConfigParser, dict]) – A configuration object containing station details and credentials.
logger (Logger) – A logger for logging messages. If None, a new logger is created.
- download_from_station(station, loggertype='eddy', mode='since-time', p1='0', p2='0')
Download data from a station’s logger.
This method constructs a request to the station’s web API to download data based on the specified parameters.
- Parameters:
station (str) – The identifier for the station.
loggertype (str) – The type of logger (‘eddy’ or ‘met’). Defaults to ‘eddy’.
mode (str) – The data query mode (‘since-time’, ‘most-recent’, etc.). Defaults to ‘since-time’.
p1 (str) – The primary parameter for the query (e.g., start time). Defaults to “0”.
p2 (str) – The secondary parameter for the query (e.g., end time). Defaults to “0”.
- Returns:
A tuple containing the downloaded data as a DataFrame, the size of the data packet in MB, and the HTTP status code.
- Return type:
tuple
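For orientation, a Campbell Scientific logger’s web API is queried with a DataQuery command whose mode, p1, and p2 arguments correspond to the parameters above. The sketch only builds the URL; build_dataquery_url is a hypothetical helper, the host address and table name are placeholders (StationDataDownloader reads the real values from its config), and the exact query the class sends is not shown in this documentation.

```python
from urllib.parse import urlencode


def build_dataquery_url(host, table, mode="since-time", p1="0", p2="0", fmt="json"):
    """Build a Campbell Scientific web-API DataQuery URL (sketch, no network access)."""
    params = {
        "command": "DataQuery",
        "uri": f"dl:{table}",  # "dl:" addresses a table on the datalogger itself
        "format": fmt,
        "mode": mode,          # e.g. most-recent, since-time, date-range
        "p1": p1,
        "p2": p2,
    }
    return f"http://{host}/?{urlencode(params)}"
```

The request could then carry the stored credentials, for example requests.get(url, auth=downloader.logger_credentials) for a StationDataDownloader instance named downloader.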
- static get_station_id(stationid)
Extract the station ID from a full station identifier string.
- class micromet.station_data_pull.StationDataProcessor(config, engine, logger=None)
Bases: StationDataDownloader
A class for processing and managing station data.
This class extends StationDataDownloader to add functionality for reformatting data, interacting with a database, and managing the overall data processing workflow.
- Parameters:
config (Union[ConfigParser, dict]) – A configuration object with station details.
engine (Engine) – A SQLAlchemy engine for database connections.
logger (Logger) – A logger for logging messages.
- engine
The SQLAlchemy engine instance.
- Type:
sqlalchemy.engine.base.Engine
- __init__(config, engine, logger=None)
Initialize the StationDataProcessor.
- Parameters:
config (Union[ConfigParser, dict]) – A configuration object with station details.
engine (Engine) – A SQLAlchemy engine for database connections.
logger (Logger) – A logger for logging messages.
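- compare_sql_to_station(df, station, field='timestamp_end', loggertype='eddy')
Compare station data with records in the database and filter new entries.
- Parameters:
df (pandas.DataFrame) – Station data to check against the database.
station (str) – The identifier for the station.
field (str) – Timestamp column used for the comparison. Defaults to ‘timestamp_end’.
loggertype (str) – The type of logger (‘eddy’ or ‘met’). Defaults to ‘eddy’.
- Returns:
A DataFrame containing only the new records.
- Return type:
pandas.DataFrame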
- get_max_date(station, loggertype='eddy')
Get the maximum timestamp from the station’s data in the database.
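compare_sql_to_station and get_max_date together support incremental uploads. A minimal sketch of that idea follows, assuming “new” means strictly later than the latest stored timestamp_end. It uses an in-memory sqlite3 database and a hypothetical flux table for self-containment, whereas the class works through a SQLAlchemy engine; get_max_timestamp and filter_new_rows are hypothetical names.

```python
import sqlite3


def get_max_timestamp(conn, table, field="timestamp_end"):
    """Latest stored value of `field`, or None for an empty table.

    `table` and `field` are interpolated into the SQL, so they must be trusted
    identifiers, never user input.
    """
    (latest,) = conn.execute(f"SELECT MAX({field}) FROM {table}").fetchone()
    return latest


def filter_new_rows(rows, latest, field="timestamp_end"):
    """Keep only rows strictly newer than the latest stored timestamp."""
    if latest is None:
        return list(rows)
    # ISO-style timestamp strings compare chronologically as text.
    return [row for row in rows if row[field] > latest]
```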
- get_station_data(station, reformat=True, loggertype='eddy', config_path='./data/reformatter_vars.yml', var_limits_csv='./data/extreme_values.csv', drop_soil=False)
Fetch and process data for a single station.
This method downloads data from a station, optionally reformats it, and returns the processed data.
- Parameters:
station (str) – The identifier for the station.
reformat (bool) – Whether to reformat the downloaded data. Defaults to True.
loggertype (str) – The type of logger (‘eddy’ or ‘met’). Defaults to ‘eddy’.
config_path (str) – The path to the reformatter configuration file. Defaults to ‘./data/reformatter_vars.yml’.
var_limits_csv (str) – The path to the variable limits CSV file. Defaults to ‘./data/extreme_values.csv’.
drop_soil (bool) – Whether to drop soil-related data. Defaults to False.
- Returns:
A tuple containing the processed DataFrame and the size of the downloaded data packet in MB.
- Return type:
tuple
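- process_station_data(site_folders, config_path='./data/reformatter_vars.yml', var_limits_csv='./data/extreme_values.csv')
Process and upload data for all specified stations.
This method iterates through a dictionary of site folders, fetches data for each station, processes it, and uploads it to the database.
- Parameters:
site_folders (dict) – Dictionary of site folders for the stations to process.
config_path (str) – The path to the reformatter configuration file. Defaults to ‘./data/reformatter_vars.yml’.
var_limits_csv (str) – The path to the variable limits CSV file. Defaults to ‘./data/extreme_values.csv’.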
micromet.station_info module
This module contains station-specific information, such as site folders and logger IDs.
micromet.utils module
Utility functions for the micromet package.
- micromet.utils.create_reformatter_from_site(site_id, config_dir='src/micromet/data', check_timestamps=True, **reformatter_kwargs)
Create a Reformatter instance with site configuration loaded from an .ini file.
This is a convenience factory function that reads the site configuration and creates a properly configured Reformatter instance.
- Parameters:
site_id (str) – The site identifier (e.g., ‘US-CdM’, ‘US-UTW’).
config_dir (Path | str) – Directory containing the .ini files. Defaults to ‘src/micromet/data’.
check_timestamps (bool) – Whether to enable timestamp checking. Defaults to True.
**reformatter_kwargs – Additional keyword arguments passed to Reformatter (e.g., drop_soil, var_limits_csv).
- Returns:
A configured Reformatter instance.
- Return type:
Reformatter
Examples
>>> reformatter = create_reformatter_from_site('US-CdM')
>>> df_clean, report, ts_results = reformatter.process(raw_data, interval=30)
>>>
>>> # With additional options
>>> reformatter = create_reformatter_from_site(
... 'US-UTW',
... drop_soil=False,
... check_timestamps=True
... )
>>>
>>> # Disable timestamp checking for speed
>>> reformatter = create_reformatter_from_site(
... 'US-UTB',
... check_timestamps=False
... )
- micromet.utils.extract_config_for_reformatter(site_id, config_dir='src/micromet/data')
Extract only the values needed for Reformatter from a site config.
This is a convenience function that returns just the three values needed to initialize a Reformatter with timestamp checking.
- Parameters:
site_id (str) – The site identifier (e.g., ‘US-CdM’).
config_dir (Path | str) – Directory containing the .ini files. Defaults to ‘src/micromet/data’.
- Returns:
A tuple of (site_lat, site_lon, site_utc_offset).
- Return type:
tuple
Examples
>>> lat, lon, utc = extract_config_for_reformatter('US-CdM')
>>> lat, lon, utc
(37.5241, -109.7471, -7.0)
- micromet.utils.get_all_site_configs(config_dir='src/micromet/data')
Read all site configurations from .ini files in a directory.
- Parameters:
config_dir (Path | str) – Directory containing the .ini files. Defaults to ‘src/micromet/data’.
- Returns:
Dictionary mapping site_id to configuration dictionaries.
- Return type:
dict
Examples
>>> all_configs = get_all_site_configs()
>>> all_configs['US-CdM']['site_lat']
37.5241
>>> list(all_configs.keys())
['US-CdM', 'US-UTB', 'US-UTD', ...]
- micromet.utils.load_yaml(path)
Load a YAML file and return its contents as a dictionary.
- Parameters:
path – Path to the YAML file.
- Returns:
The contents of the YAML file as a dictionary.
- Return type:
dict
- Raises:
FileNotFoundError – If the specified file does not exist.
- micromet.utils.logger_check(logger)
Initialize and return a logger instance if none is provided.
This function checks if a logger object is provided. If not, it creates and configures a new logger.
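The pattern described for logger_check can be sketched as below. ensure_logger is a hypothetical name, and the “micromet” logger name, the format string, and the INFO level are assumptions rather than the package’s actual settings.

```python
import logging


def ensure_logger(logger=None):
    """Return `logger` unchanged, or build a console logger when it is None."""
    if logger is not None:
        return logger
    logger = logging.getLogger("micromet")  # assumed logger name
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Guarding on logger.handlers matters because logging.getLogger returns the same object for the same name, so an unguarded version would print each message once per earlier call.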
- micromet.utils.read_site_config(site_id, config_dir='src/micromet/data')
Read site configuration from an .ini file.
- Parameters:
site_id (str) – The site identifier (e.g., ‘US-CdM’).
config_dir (Path | str) – Directory containing the .ini files. Defaults to ‘src/micromet/data’.
- Returns:
Dictionary with keys:
- ‘site_lat’: float - Station latitude
- ‘site_lon’: float - Station longitude
- ‘site_utc_offset’: float - UTC offset in hours
- ‘site_elevation’: float - Station elevation in meters
- ‘site_name’: str - Full station name
- ‘site_id’: str - Station identifier
- Return type:
dict
- Raises:
FileNotFoundError – If the .ini file for the site is not found.
KeyError – If required metadata fields are missing.
Examples
>>> config = read_site_config('US-CdM')
>>> config['site_lat']
37.5241
>>> config['site_utc_offset']
-7.0
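The .ini reading behind read_site_config and get_all_site_configs can be sketched with configparser. The [METADATA] section name, the option names (latitude, longitude, utc_offset, elevation, name), and treating each file name’s stem as the site ID are all hypothetical; the real files in config_dir may be laid out differently. Missing fields raise KeyError, as documented above.

```python
import configparser
from pathlib import Path

# Hypothetical .ini layout; the package's real files may use other names.
SECTION = "METADATA"
FIELDS = {  # returned key -> (ini option, converter)
    "site_lat": ("latitude", float),
    "site_lon": ("longitude", float),
    "site_utc_offset": ("utc_offset", float),
    "site_elevation": ("elevation", float),
    "site_name": ("name", str),
}


def parse_site_config(text, site_id):
    """Parse one site's .ini text into the documented dictionary shape."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(SECTION):
        raise KeyError(f"{site_id}: missing [{SECTION}] section")
    section = parser[SECTION]
    config = {"site_id": site_id}
    for key, (option, convert) in FIELDS.items():
        if option not in section:
            raise KeyError(f"{site_id}: missing required field {option!r}")
        config[key] = convert(section[option])
    return config


def read_all_site_configs(config_dir):
    """Map each <site_id>.ini file in config_dir to its parsed configuration."""
    return {
        path.stem: parse_site_config(path.read_text(), path.stem)
        for path in sorted(Path(config_dir).glob("*.ini"))
    }
```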
Module contents
Micromet: A package for processing and analyzing micrometeorological data.
This package provides a collection of tools for reading, reformatting, performing quality control, and generating reports from micrometeorological and flux data, particularly from AmeriFlux-style data sources.
The main components of the package are:
- AmerifluxDataProcessor: For reading and parsing data files.
- Reformatter: For cleaning and standardizing data.
- tools: A collection of utility functions for analysis.
- graphs: For creating various plots and visualizations.
- StationDataDownloader: For downloading data from stations.
- StationDataProcessor: For processing and managing station data.
- class micromet.AmerifluxDataProcessor(logger=None)
Documented identically to micromet.reader.AmerifluxDataProcessor above; see that entry.
- class micromet.Reformatter(var_limits_csv=None, drop_soil=True, check_timestamps=False, site_lat=None, site_lon=None, site_utc_offset=-7, logger=None)
Bases: object
A class to clean and standardize station data for flux/met processing.
This class provides a pipeline for preparing raw station data by applying a series of transformations, including fixing timestamps, renaming columns, applying physical limits, and checking timestamp alignment.
- Parameters:
var_limits_csv (str | Path | None) – Path to a CSV file containing variable limits. If not provided, default limits are used.
drop_soil (bool) – If True, extra soil-related columns are dropped. Defaults to True.
check_timestamps (bool) – If True, perform timestamp alignment analysis on radiation data. Defaults to False.
site_lat (float | None) – Latitude of the site (required if check_timestamps=True).
site_lon (float | None) – Longitude of the site (required if check_timestamps=True).
site_utc_offset (int) – UTC offset in hours for the site (required if check_timestamps=True).
logger (Logger | None) – A logger for tracking the reformatting process. If not provided, a default logger is used.
- logger
The logger used for logging messages.
- Type:
logging.Logger
- varlimits
A DataFrame containing the physical limits for each variable.
- Type:
pd.DataFrame
- __init__(var_limits_csv=None, drop_soil=True, check_timestamps=False, site_lat=None, site_lon=None, site_utc_offset=-7, logger=None)
Initialize the Reformatter.
- Parameters:
var_limits_csv (str | Path | None) – Path to a CSV file containing variable limits.
drop_soil (bool) – If True, extra soil-related columns are dropped. Defaults to True.
check_timestamps (bool) – If True, perform timestamp alignment analysis. Defaults to False.
site_lat (float | None) – Latitude of the site (required if check_timestamps=True).
site_lon (float | None) – Longitude of the site (required if check_timestamps=True).
site_utc_offset (int) – UTC offset in hours (required if check_timestamps=True).
logger (Logger | None) – A logger for tracking the reformatting process.
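The site coordinates and UTC offset are needed when check_timestamps=True, presumably because the alignment check compares measured radiation with a potential-radiation curve that depends on solar position. The sketch below computes an approximate solar elevation and top-of-atmosphere shortwave on a horizontal surface. It uses Cooper’s declination formula and ignores the equation of time and the Earth-Sun distance correction, so it is simpler than (and not the implementation of) the NOAA-based sw_in_pot_noaa in micromet.qaqc.netrad_limits; both function names are hypothetical.

```python
import math


def solar_elevation_deg(lat, lon, utc_offset, day_of_year, local_hour):
    """Approximate solar elevation in degrees for local standard clock time."""
    # Cooper (1969) declination approximation.
    decl = math.radians(23.45) * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    # Clock time to local solar time: 4 minutes per degree of longitude away
    # from the time-zone meridian (15 degrees per hour of UTC offset).
    solar_hour = local_hour + (lon - 15.0 * utc_offset) / 15.0
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    phi = math.radians(lat)
    sin_elev = (math.sin(phi) * math.sin(decl)
                + math.cos(phi) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(sin_elev))


def sw_in_pot(lat, lon, utc_offset, day_of_year, local_hour, solar_constant=1361.0):
    """Potential shortwave on a horizontal surface in W m-2; 0 when the sun is down."""
    elev = solar_elevation_deg(lat, lon, utc_offset, day_of_year, local_hour)
    return max(0.0, solar_constant * math.sin(math.radians(elev)))
```

Shifting measured SW_IN against a curve like this and looking for the best-matching lag is one way to detect clock offsets; the lag_sw and corr_sw fields of WindowComposite suggest a comparison of that kind, though their exact definitions are not given here.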
- prepare(df, interval=30, data_type='eddy')
Retained for backward compatibility. Unlike process(), interval defaults to 30 here.
- preprocess(df, data_type='eddy', interval=30)
Preprocess the data by applying initial cleaning and standardization steps.
- process(df, interval, data_type='eddy')
Prepare the data by applying a series of cleaning and standardization steps.
This method takes a DataFrame of station data and applies a pipeline of transformations to clean and standardize it. The steps include fixing timestamps, renaming columns, setting numeric types, resampling, applying physical limits, and optionally checking timestamp alignment.
- Parameters:
df (pandas.DataFrame) – Raw station data.
interval – Data interval in minutes used for resampling (e.g., 30). Required; there is no default.
data_type (str) – The type of data (‘eddy’ or ‘met’). Defaults to ‘eddy’.
- Returns:
A tuple containing:
- The prepared DataFrame with standardized and cleaned data.
- A report DataFrame detailing the changes made during the application of physical limits.
- A dictionary with timestamp alignment results (if check_timestamps=True), or None otherwise. It contains the keys ‘summary’, ‘composites’, and ‘flags’.
- Return type:
tuple
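The physical-limits step that feeds the report can be pictured as a per-variable range filter. This sketch works on plain dicts instead of DataFrames; apply_limits is a hypothetical helper, the (min, max) values used with it are made-up illustrations rather than the package’s default limits, and whether the real step replaces, flags, or drops out-of-range values is not specified here (the sketch replaces them with NaN).

```python
import math


def apply_limits(records, limits):
    """Replace out-of-range values with NaN and count replacements per variable.

    records: list of dicts, one per timestamp.
    limits: {variable: (min, max)}; variables absent from limits are untouched.
    Returns (cleaned_records, report) and leaves the input records unmodified.
    """
    report = {var: 0 for var in limits}
    cleaned = []
    for rec in records:
        new = dict(rec)  # copy so the caller's data is not mutated
        for var, (lo, hi) in limits.items():
            value = new.get(var)
            if (isinstance(value, (int, float)) and not math.isnan(value)
                    and not lo <= value <= hi):
                new[var] = math.nan
                report[var] += 1
        cleaned.append(new)
    return cleaned, report
```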
- class micromet.StationDataDownloader(config, logger=None)
Documented identically to micromet.station_data_pull.StationDataDownloader above; see that entry.
- class micromet.StationDataProcessor(config, engine, logger=None)
Documented identically to micromet.station_data_pull.StationDataProcessor above; see that entry.