adi_py¶
This module provides the new ADI Python bindings which incorporate full XArray compatibility.
Submodules¶
Classes¶
Used to apply special ADI functions to an xarray data array (i.e., variable) |
|
Used to apply special ADI functions to an xarray dataset with the |
|
Used to easily reference different types of ADI datasets. |
|
This class provides python-like logging API facade around the dsproc |
|
Used to easily reference bit assessment values used in ADI QC |
|
NamedTuple class that holds various information used to identify a specific |
|
Generic enumeration. |
|
The base class for running an ADI process in Python. All Python processes |
|
Enumerates the special XArray variable attributes that are assigned |
|
Enumerates the split mode which is used to define the output file size |
|
Used to easily reference transformation metadata attrs used in ADI QC |
- exception adi_py.DatasetConversionException¶
Bases:
Exception
Exception used when converting from XArray to ADI or vice versa and the data are incompatible.
- exception adi_py.SkipProcessingIntervalException(msg: str = '', log_level: adi_py.logger.LogLevel = LogLevel.INFO)¶
Bases:
Exception
Processes should raise this exception if the current processing interval should be skipped. Any other exception is treated as a process failure.
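The skip-versus-fail contract described for SkipProcessingIntervalException can be sketched as follows. This is only an illustration that runs without ADI installed: SkipInterval is a hypothetical stand-in for adi_py.SkipProcessingIntervalException, and run_intervals is an invented driver loop, not the actual ADI implementation.

```python
class SkipInterval(Exception):
    """Hypothetical stand-in for adi_py.SkipProcessingIntervalException."""


def run_intervals(intervals, hook):
    """Invented driver loop: call the hook once per interval."""
    processed, skipped = [], []
    for begin, end in intervals:
        try:
            hook(begin, end)
        except SkipInterval:
            # The interval is skipped, but processing continues.
            skipped.append((begin, end))
            continue
        except Exception:
            # Any other exception is treated as a process failure.
            return 1, processed, skipped
        processed.append((begin, end))
    return 0, processed, skipped


def example_hook(begin, end):
    # Assumed condition for illustration: an empty interval has no data.
    if end <= begin:
        raise SkipInterval("no data in interval")


status, processed, skipped = run_intervals(
    [(0, 10), (10, 10), (10, 20)], example_hook
)
```

In a real process, the hook would raise adi_py.SkipProcessingIntervalException from one of the per-interval hooks instead of the stand-in class used here.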
- class adi_py.ADIAtts¶
- ANCILLARY_VARIABLES = ancillary_variables¶
- DESCRIPTION = description¶
- FILL_VALUE = ['_FillValue']¶
- LONG_NAME = long_name¶
- MISSING_VALUE = missing_value¶
- STANDARD_NAME = standard_name¶
- UNITS = units¶
- VALID_MAX = valid_max¶
- VALID_MIN = valid_min¶
- class adi_py.ADIDataArrayAccessor(xarray_obj)¶
Used to apply special ADI functions to an xarray data array (i.e., variable) with the namespace ‘adi’
Class Methods
Method Descriptions
- assign_coordinate_system(self, coordinate_system_name: str)¶
- assign_output_datastream(self, output_datastream_name: str, variable_name_in_datastream: str = None)¶
- property nsamples(self) int ¶
- property source_ds_name(self) str ¶
- property source_var_name(self) str ¶
- class adi_py.ADIDatasetAccessor(xarray_obj)¶
Used to apply special ADI functions to an xarray dataset with the namespace ‘adi’
Class Methods
Method Descriptions
- add_qc_variable(self, variable_name: str)¶
- add_variable(self, variable_name: str, dim_names: List[str], data: numpy.ndarray, long_name: str = None, standard_name: str = None, units: str = None, valid_min=None, valid_max=None, missing_value: numpy.ndarray = None, fill_value=None)¶
- convert_units(self, old_units: str, new_units: str, variable_names: List[str] = None, converter_function: Callable = None)¶
- drop_transform_metadata(self, variable_names: List[str]) xarray.Dataset ¶
- drop_variables(self, variable_names: List[str]) xarray.Dataset ¶
- get_companion_transform_variable_names(self, variable_name: str) List[str] ¶
- get_qc_variable(self, variable_name: str)¶
- record_qc_results(self, variable_name: str, bit_number: int = None, test_results: numpy.ndarray = None)¶
- variables_exist(self, variable_names: List[str] = []) numpy.ndarray ¶
- class adi_py.ADIDatasetType¶
Bases:
enum.Enum
Used to easily reference different types of ADI datasets.
- OUTPUT = 3¶
- RETRIEVED = 1¶
- TRANSFORMED = 2¶
- class adi_py.ADILogger¶
This class provides python-like logging API facade around the dsproc logging methods.
Class Methods
Use this method to log the stack trace of any raised exception to the process’s
Method Descriptions
- static debug(message, debug_level=1)¶
- static error(message)¶
- static exception(message)¶
Use this method to log the stack trace of any raised exception to the process’s ADI log file.
- Parameters
message (str) – An optional additional message to log, in addition to the stack trace.
- static info(message)¶
- static warning(message)¶
- class adi_py.BitAssessment¶
Bases:
enum.Enum
Used to easily reference bit assessment values used in ADI QC
- BAD = Bad¶
- INDETERMINATE = Indeterminate¶
- class adi_py.DatastreamIdentifier¶
Bases:
NamedTuple
NamedTuple class that holds various information used to identify a specific ADI dataset.
- datastream_name :str¶
- dsid :int¶
- facility :str¶
- site :str¶
- class adi_py.LogLevel¶
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- DEBUG = debug¶
- ERROR = error¶
- INFO = info¶
- WARNING = warning¶
- class adi_py.Process¶
The base class for running an ADI process in Python. All Python processes should extend this class.
Class Methods
Add a companion qc variable for the given variable
Create a new variable in the given xarray dataset with the specified dimensions,
Assign the given variable to the designated ADI coordinate system.
Assign the given variable to the designated output datastream.
For the specified variables, convert the units from old_units to new_units.
Get the debug level passed on the command line when running the process.
This method removes all associated companion variables that are generated
This method removes the given variables plus all associated companion
Get the facility where this invocation of the process is running
Find the input datastream where the given retrieved variable came
This hook will be called once just after the main data processing loop finishes. This function should be used
Get a mask of same shape as the variable’s data which contains True values
For the given variable, get a list of the companion/ancillary variables
Gets the corresponding dataset id for the given datastream (input or output)
Get True/False mask of same shape as passed variable(s) which is used to
Get a True/False mask of same shape as passed variable(s) that is used to
Get the ADI sample count for the given variable (i.e., the length
Get an ADI output dataset converted to an xr.Dataset.
Get an ADI output dataset converted to an xr.Dataset.
Return the companion qc variable for the given data variable.
Create a properly formatted file name where a quicklooks plot should be
Get an ADI retrieved dataset converted to an xr.Dataset.
Get the ADI retrieved datasets converted to a list of xarray Datasets.
For the given variable, get name of the input datastream
For the given variable, get the name of the variable
Get an ADI transformed dataset converted to an xr.Dataset.
Get an ADI transformed dataset converted to an xr.Dataset.
Setting controlling whether this process should provide debug dumps of the
This hook will be called once just before the main data processing loop begins and before the initial
Get the location where this invocation of the process is running.
This hook will be called once per processing interval just after data retrieval,
This hook will be called once per processing interval just after data
This hook will be called once per processing interval just prior to data retrieval.
This hook will be called once per processing interval just prior to data
This hook will be called once per processing interval just after the output
The processing model to use. It can be one of:
The name of the process that is currently being run.
The name(s) of the process(es) that could run this code. Subclasses must
The version of this process’s code. Subclasses must define the
This hook will be called once per processing interval just after all data
For the given variable, add bitwise test results to the companion qc
ADI setting controlling whether all the qc bits are rolled up into a
Run the process.
Apply a set of ADI control flags to a datastream as identified by the
This method should be called in your init_process_hook if you need to
This method should be called in your init_process_hook if you need to override
This method should be called in your init_process_hook (i.e., before the
This method should be called in your init_process_hook (i.e., before the
Get the site where this invocation of the process is running
Sync the contents of one or more XArray.Datasets with the corresponding ADI
Check if the given variables exist in the given dataset.
Method Descriptions
- static add_qc_variable(dataset: xarray.Dataset, variable_name: str)¶
Add a companion qc variable for the given variable
- Parameters
dataset (xr.Dataset) –
variable_name (str) –
- Returns
The newly created DataArray
- static add_variable(dataset: xarray.Dataset, variable_name: str, dim_names: List[str], data: numpy.ndarray, long_name: str = None, standard_name: str = None, units: str = None, valid_min: Any = None, valid_max: Any = None, missing_value: numpy.ndarray = None, fill_value: Any = None)¶
Create a new variable in the given xarray dataset with the specified dimensions, data, and attributes.
Important
If you want to add the created variable to a given coordinate system, then you should follow this with a call to assign_coordinate_system_to_variable. Similarly, if you want to add the created variable to a given output datastream, then you should follow this with a call to assign_output_datastream_to_variable.
See also
assign_coordinate_system_to_variable
assign_output_datastream_to_variable
- Parameters
dataset (xr.Dataset) – The xarray dataset to add the new variable to
variable_name (str) – The name of the variable
dim_names (List[str]) – A list of dimension names for the variable
data (np.ndarray) – A multidimensional array of the variable’s data. Must have the same shape as the dimensions.
long_name (str) – The long_name attribute for the variable
standard_name (str) – The standard_name attribute for the variable
units (str) – The units attribute for the variable
valid_min (Any) – The valid_min attribute for the variable. Must be the same data type as the variable.
valid_max (Any) – The valid_max attribute for the variable. Must be the same data type as the variable.
missing_value (np.ndarray) – An array of possible missing_value attributes for the variable. Must be the same data type as the variable.
fill_value (Any) – The fill_value attribute for the variable. Must be the same data type as the variable.
- Returns
The newly created variable (i.e., xr.DataArray object)
- static assign_coordinate_system_to_variable(variable: xarray.DataArray, coordinate_system_name: str)¶
Assign the given variable to the designated ADI coordinate system.
- Parameters
variable (xr.DataArray) – A data variable from an xarray dataset
coordinate_system_name (str) – The name of one of the process’s coordinate systems as specified in the PCM process definition.
- static assign_output_datastream_to_variable(variable: xarray.DataArray, output_datastream_name: str, variable_name_in_datastream: str = None)¶
Assign the given variable to the designated output datastream.
- Parameters
variable (xr.DataArray) – A data variable from an xarray dataset
output_datastream_name (str) – An output datastream name as specified in PCM process definition
variable_name_in_datastream (str) – The name of the variable as it should appear in the output datastream. If not specified, then the name of the given variable will be used.
- static convert_units(xr_datasets: List[xarray.Dataset], old_units: str, new_units: str, variable_names: List[str] = None, converter_function: Callable = None)¶
For the specified variables, convert the units from old_units to new_units. For applicable variables, this conversion will include changing the units attribute value and optionally converting all the data values if a converter function is provided.
This method is needed for special cases where the units conversion is not supported by udunits and the default ADI converters.
- Parameters
xr_datasets (List[xr.Dataset]) – One or more xarray datasets upon which to apply the conversion
old_units (str) – The old units (e.g., ‘degree F’)
new_units (str) – The new units (e.g., ‘K’)
variable_names (List[str]) – A list of specific variable names to convert. If not specified, it converts all variables with the given old_units to new_units.
converter_function (Callable) –
A function to run on an xarray variable (i.e., DataArray) that converts the variable’s values from old_units to new_units. If not specified, then only the units attribute value will be changed. This is useful when only the units attribute needs correcting, for example because of a typo.
The function should take one parameter, an xarray.DataArray, and operate in place on the variable’s values.
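A converter function of the shape described for convert_units might look like the sketch below. FakeVariable is a hypothetical stand-in for a one-dimensional xarray.DataArray, used only so the snippet runs without xarray installed; the Fahrenheit-to-Kelvin formula is standard, but the function and variable names are invented for illustration.

```python
class FakeVariable:
    """Hypothetical stand-in for a 1-D xarray.DataArray."""

    def __init__(self, values, units):
        self.values = list(values)
        self.attrs = {"units": units}


def degf_to_kelvin(variable):
    """Converter: takes one variable and rewrites its values in place."""
    variable.values[:] = [
        (f - 32.0) * 5.0 / 9.0 + 273.15 for f in variable.values
    ]


temp = FakeVariable([32.0, 212.0], "degree F")
degf_to_kelvin(temp)
# Mimic convert_units, which updates the units attribute itself.
temp.attrs["units"] = "K"
```

Going by the signature documented above, such a function would be passed as the converter_function argument, with old_units set to 'degree F' and new_units set to 'K'.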
- property debug_level(self) int ¶
Get the debug level passed on the command line when running the process.
- Returns
int – the debug level
- static drop_transform_metadata(dataset: xarray.Dataset, variable_names: List[str]) xarray.Dataset ¶
This method removes all associated companion variables that are generated by the transformation process (if they exist), as well as transformation attributes, but it does not remove the original variable.
- Parameters
dataset (xr.Dataset) – The dataset containing the transformed variables.
variable_names (List[str]) – The variable names for which to remove transformation metadata.
- Returns
xr.Dataset – A new dataset with the transform companion variables and metadata removed.
- static drop_variables(dataset: xarray.Dataset, variable_names: List[str]) xarray.Dataset ¶
This method removes the given variables plus all associated companion variables that were added as part of the transformation step (if they exist).
- Parameters
dataset (xr.Dataset) – The dataset containing the given variables.
variable_names (List[str]) – The variable names to remove.
- Returns
xr.Dataset – A new dataset with the given variables and their transform companion variables removed.
- property facility(self) str ¶
Get the facility where this invocation of the process is running
- Returns
str – The facility where this process is running
- static find_retrieved_variable(retrieved_variable_name) Optional[adi_py.utils.DatastreamIdentifier] ¶
Find the input datastream where the given retrieved variable came from. We may need this if there are complex retrieval rules and the given variable may be retrieved from different datastreams depending upon the site/facility where this process runs. We need to get the DatastreamIdentifier so we can load the correct xarray dataset if we need to modify the data values.
- Parameters
retrieved_variable_name (str) – The name of the retrieved variable to find
- Returns
A DatastreamIdentifier containing all the information needed to look up the given dataset or None if the retrieved variable was not found.
- finish_process_hook(self)¶
This hook will be called once just after the main data processing loop finishes. This function should be used to clean up any temporary files used.
- static get_bad_qc_mask(dataset: xarray.Dataset, variable_name: str, include_indeterminate: bool = False, bit_numbers: List[int] = None) numpy.ndarray ¶
Get a mask of same shape as the variable’s data which contains True values for each data point that has a corresponding bad qc bit set.
- Parameters
dataset (xr.Dataset) – The dataset containing the variables
variable_name (str) – The variable name to check qc for
include_indeterminate (bool) – Whether to include indeterminate bits when determining the mask. By default this is False and only bad bits are used to compute the mask.
bit_numbers (List[int]) – The specific bit numbers to include in the qc check (i.e., 1, 2, 3, 4, etc.). Note that if not specified, all bits will be used to compute the mask.
- Returns
np.ndarray – An array of same shape as the variable consisting of True/False values, where each True indicates that the corresponding data point had bad (or indeterminate if include_indeterminate is specified) qc for the specified bit numbers (all bits if bit_numbers not specified).
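The bit test behind such a mask can be sketched in plain Python. This sketch assumes that bit number n corresponds to the integer value 2^(n-1) and that the caller passes the bad bit numbers explicitly; the real method determines which bits count as bad from the dataset. The function name is hypothetical.

```python
def sketch_bad_qc_mask(qc_values, bad_bit_numbers):
    """Return True for each point with any of the given 1-based bits set."""
    bit_mask = 0
    for bit_number in bad_bit_numbers:
        # Assumed encoding: bit 1 -> value 1, bit 2 -> value 2, bit 3 -> value 4.
        bit_mask |= 1 << (bit_number - 1)
    return [(value & bit_mask) != 0 for value in qc_values]


qc_values = [0, 1, 2, 5]
mask = sketch_bad_qc_mask(qc_values, bad_bit_numbers=[1, 3])
```

Here the third point has only bit 2 set, so it is not flagged when just bits 1 and 3 are checked.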
- static get_companion_transform_variable_names(dataset: xarray.Dataset, variable_name: str) List[str] ¶
For the given variable, get a list of the companion/ancillary variables that were added as a result of the ADI transformation.
- Parameters
dataset (xr.Dataset) – The dataset
variable_name (str) – The name of a data variable in the dataset
- Returns
A list of string companion variable names that were created from the transform engine. This is used for cleaning up associated variables when a variable is deleted from a dataset.
- static get_datastream_files(datastream_name: str, begin_date: int, end_date: int) List[str] ¶
- static get_dsid(datastream_name: str, site: str = None, facility: str = None, dataset_type: adi_py.constants.ADIDatasetType = None) Optional[int] ¶
Gets the corresponding dataset id for the given datastream (input or output)
- Parameters
datastream_name (str) – The name of the datastream to find
site (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Site is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by site.
facility (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Facility is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by facility.
dataset_type (ADIDatasetType) – The type of the dataset to convert (RETRIEVED, TRANSFORMED, OUTPUT)
- Returns
Optional[int] – The dataset id or None if not found
- static get_missing_value_mask(*args) xarray.DataArray ¶
Get True/False mask of same shape as passed variable(s) which is used to select data points for which one or more of the values of any of the specified variables are missing.
- Parameters
*args (xr.DataArray) – Pass one or more xarray variables to check for missing values. All variables in the list must have the same shape.
- Returns
xr.DataArray – An array of True/False values of the same shape as the input variables where each True represents the case where one or more of the variables has a missing_value at that index.
- static get_non_missing_value_mask(*args) xarray.DataArray ¶
Get a True/False mask of same shape as passed variable(s) that is used to select data points for which none of the values of any of the specified variables are missing.
- Parameters
*args (xr.DataArray) – Pass one or more xarray variables to check. All variables in the list must have the same shape.
- Returns
xr.DataArray – An array of True/False values of the same shape as the input variables where each True represents the case where all variables passed in have non-missing value data at that index.
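The relationship between the two masks can be illustrated with plain lists. This is a sketch under stated assumptions: the MISSING sentinel and both function names are hypothetical, and the real methods operate on xarray variables and read each variable's missing values from its attributes.

```python
MISSING = -9999.0  # assumed missing_value sentinel for this illustration


def sketch_missing_value_mask(*variables):
    """True where one or more variables hold the missing sentinel."""
    return [any(v == MISSING for v in point) for point in zip(*variables)]


def sketch_non_missing_value_mask(*variables):
    """True where none of the variables hold the missing sentinel."""
    return [not m for m in sketch_missing_value_mask(*variables)]


temperature = [280.1, MISSING, 281.3, 282.0]
humidity = [55.0, 60.0, MISSING, 58.5]
missing = sketch_missing_value_mask(temperature, humidity)
present = sketch_non_missing_value_mask(temperature, humidity)
```

The second mask is the element-wise negation of the first, which is why all inputs must share the same shape.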
- static get_nsamples(xr_var: xarray.DataArray) int ¶
Get the ADI sample count for the given variable (i.e., the length of the first dimension or 1 if the variable has no dimensions)
- Parameters
xr_var (xr.DataArray) –
- Returns
int – The ADI sample count
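The sample-count rule stated above reduces to a one-line check on the variable's shape. The helper below is a hypothetical illustration that takes a shape tuple rather than an xarray variable.

```python
def sketch_nsamples(shape):
    """Length of the first dimension, or 1 for a dimensionless variable."""
    return shape[0] if len(shape) > 0 else 1


time_by_height = sketch_nsamples((1440, 7))
scalar = sketch_nsamples(())
```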
- static get_output_dataset(output_datastream_name: str) Optional[xarray.Dataset] ¶
Get an ADI output dataset converted to an xr.Dataset.
Note: This method will return at most a single xr.Dataset. If you expect multiple datasets, or would like to handle cases where multiple dataset files may be retrieved, please use the Process.get_output_datasets() function.
- Parameters
output_datastream_name (str) – The name of one of the process’ output datastreams as specified in the PCM.
- Returns
xr.Dataset | None – A single xr.Dataset, or None if no output datasets exist for the specified datastream / site / facility / coord system.
- static get_output_dataset_by_dsid(dsid: int) Optional[xarray.Dataset] ¶
- static get_output_datasets(output_datastream_name: str) List[xarray.Dataset] ¶
Get the ADI output datasets converted to a list of xr.Datasets.
- Parameters
output_datastream_name (str) – The name of one of the process’ output datastreams as specified in the PCM.
- Returns
List[xr.Dataset] – A list of xr.Datasets. If no output datasets exist for the specified datastream / site / facility / coord system, then the list will be empty.
- static get_output_datasets_by_dsid(dsid: int) List[xarray.Dataset] ¶
- static get_qc_variable(dataset: xarray.Dataset, variable_name: str) xarray.DataArray ¶
Return the companion qc variable for the given data variable.
- Parameters
dataset (xr.Dataset) –
variable_name (str) –
- Returns
xr.DataArray – The companion qc variable or None if it doesn’t exist
- get_quicklooks_file_name(self, datastream_name: str, begin_date: int, description: str = None, ext: str = 'png', mkdirs: bool = False)¶
Create a properly formatted file name where a quicklooks plot should be saved for the given processing interval. For example:
${QUICKLOOK_DATA}/ena/enamfrsrcldod1minC1.c1/2021/01/01/enamfrsrcldod1minC1.c1.20210101.000000.lwp.png
- Parameters
datastream_name (str) – The name of the datastream which this plot applies to. For example, mfrsrcldod1min.c1
begin_date (int) – The begin timestamp of the current processing interval as passed to the quicklook hook function
description (str) – The description of the plot to be used in the file name. For example, in the file enamfrsrcldod1minC1.c1.20210101.000000.lwp.png, the description is ‘lwp’.
ext (str) – The file extension for the image. Default is ‘png’
mkdirs (bool) – If True, then the folder path to the quicklooks file will be automatically created if it does not exist. Default is False.
- Returns
str – The full path to where the quicklooks file should be saved.
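The naming pattern in the example path above can be reproduced with a short helper. This is a sketch only: the function name is hypothetical, the site and facility are passed explicitly (the real method takes them from the running process), the root argument stands in for ${QUICKLOOK_DATA}, and begin_date is assumed to be seconds since the Unix epoch in UTC.

```python
import posixpath
from datetime import datetime, timezone


def sketch_quicklooks_file_name(root, site, facility, datastream_name,
                                begin_date, description=None, ext="png"):
    """Build a quicklooks path that follows the documented example."""
    # Split 'mfrsrcldod1min.c1' into the class name and the data level.
    base, level = datastream_name.rsplit(".", 1)
    full_name = f"{site}{base}{facility}.{level}"
    # Assumption: begin_date is a Unix timestamp in UTC.
    t = datetime.fromtimestamp(begin_date, tz=timezone.utc)
    parts = [full_name, t.strftime("%Y%m%d"), t.strftime("%H%M%S")]
    if description:
        parts.append(description)
    parts.append(ext)
    return posixpath.join(
        root, site, full_name,
        t.strftime("%Y"), t.strftime("%m"), t.strftime("%d"),
        ".".join(parts),
    )


path = sketch_quicklooks_file_name(
    "/data/quicklook", "ena", "C1", "mfrsrcldod1min.c1",
    1609459200, description="lwp",
)
```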
- static get_retrieved_dataset(input_datastream_name: str, site: Optional[str] = None, facility: Optional[str] = None) Optional[xarray.Dataset] ¶
Get an ADI retrieved dataset converted to an xr.Dataset.
Note: This method will return at most a single xr.Dataset. If you expect multiple datasets, or would like to handle cases where multiple dataset files may be retrieved, please use the Process.get_retrieved_datasets() function.
- Parameters
input_datastream_name (str) – The name of one of the process’ input datastreams as specified in the PCM.
site (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Site is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by site.
facility (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Facility is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by facility.
- Returns
xr.Dataset | None – A single xr.Dataset, or None if no retrieved datasets exist for the specified datastream / site / facility.
- static get_retrieved_dataset_by_dsid(dsid: int) Optional[xarray.Dataset] ¶
- static get_retrieved_datasets(input_datastream_name: str, site: Optional[str] = None, facility: Optional[str] = None) List[xarray.Dataset] ¶
Get the ADI retrieved datasets converted to a list of xarray Datasets.
- Parameters
input_datastream_name (str) – The name of one of the process’ input datastreams as specified in the PCM.
site (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Site is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by site.
facility (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Facility is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by facility.
- Returns
List[xr.Dataset] – A list of xr.Datasets. If no retrieved datasets exist for the specified datastream / site / facility / coord system, then the list will be empty.
- static get_retrieved_datasets_by_dsid(dsid: int) List[xarray.Dataset] ¶
- static get_source_ds_name(xr_var: xarray.DataArray) str ¶
For the given variable, get the name of the input datastream where it came from.
- Parameters
xr_var (xr.DataArray) –
- Returns
str – The name of the input datastream
- static get_source_var_name(xr_var: xarray.DataArray) str ¶
For the given variable, get the name of the variable used in the input datastream.
- Parameters
xr_var (xr.DataArray) –
- Returns
str – The name of the variable in the input datastream
- static get_transformed_dataset(input_datastream_name: str, coordinate_system_name: str, site: Optional[str] = None, facility: Optional[str] = None) Optional[xarray.Dataset] ¶
Get an ADI transformed dataset converted to an xr.Dataset.
Note: This method will return at most a single xr.Dataset. If you expect multiple datasets, or would like to handle cases where multiple dataset files may be retrieved, please use the Process.get_transformed_datasets() function.
- Parameters
input_datastream_name (str) – The name of one of the process’ input datastreams as specified in the PCM.
coordinate_system_name (str) – A coordinate system specified in the PCM or None if no coordinate system was specified.
site (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Site is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by site.
facility (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Facility is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by facility.
- Returns
xr.Dataset | None – A single xr.Dataset, or None if no transformed datasets exist for the specified datastream / site / facility / coord system.
- static get_transformed_dataset_by_dsid(dsid: int, coordinate_system_name: str) Optional[xarray.Dataset] ¶
- static get_transformed_datasets(input_datastream_name: str, coordinate_system_name: str, site: Optional[str] = None, facility: Optional[str] = None) List[xarray.Dataset] ¶
Get the ADI transformed datasets converted to a list of xr.Datasets.
- Parameters
input_datastream_name (str) – The name of one of the process’ input datastreams as specified in the PCM.
coordinate_system_name (str) – A coordinate system specified in the PCM or None if no coordinate system was specified.
site (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Site is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by site.
facility (str) – Optional parameter used only to find some input datasets (RETRIEVED or TRANSFORMED). Facility is only required if the retrieval rules in the PCM specify two different rules for the same datastream that differ by facility.
- Returns
List[xr.Dataset] – A list of xr.Datasets. If no transformed datasets exist for the specified datastream / site / facility / coord system, then the list will be empty.
- static get_transformed_datasets_by_dsid(dsid: int, coordinate_system_name: str) List[xarray.Dataset] ¶
- property include_debug_dumps(self) bool ¶
Setting controlling whether this process should provide debug dumps of the data after each hook.
- Returns
bool – Whether debug dumps should be automatically included. If True and debug level is > 1, then debug dumps will be performed automatically before and after each code hook.
- init_process_hook(self)¶
This hook will be called once just before the main data processing loop begins and before the initial database connection is closed.
- property location(self) dsproc3.PyProcLoc ¶
Get the location where this invocation of the process is running.
- Returns
dsproc.PyProcLoc – A class containing the alt, lat, and lon where the process is running.
- post_retrieval_hook(self, begin_date: int, end_date: int)¶
This hook will be called once per processing interval just after data retrieval, but before the retrieved observations are merged and QC is applied.
- Parameters
begin_date (int) – the begin time of the current processing interval
end_date (int) – the end time of the current processing interval
- post_transform_hook(self, begin_date: int, end_date: int)¶
This hook will be called once per processing interval just after data transformation, but before the output datasets are created.
- Parameters
begin_date (int) – the begin time of the current processing interval
end_date (int) – the end time of the current processing interval
- pre_retrieval_hook(self, begin_date: int, end_date: int)¶
This hook will be called once per processing interval just prior to data retrieval.
- Parameters
begin_date (int) – the begin time of the current processing interval
end_date (int) – the end time of the current processing interval
- pre_transform_hook(self, begin_date: int, end_date: int)¶
This hook will be called once per processing interval just prior to data transformation, and after the retrieved observations are merged and QC is applied.
- Parameters
begin_date (int) – the begin time of the current processing interval
end_date (int) – the end time of the current processing interval
- process_data_hook(self, begin_date: int, end_date: int)¶
This hook will be called once per processing interval just after the output datasets are created, but before they are stored to disk.
- Parameters
begin_date (int) – the begin time of the current processing interval
end_date (int) – the end time of the current processing interval
- property process_model(self) int ¶
The processing model to use. It can be one of:
- dsproc.PM_GENERIC
- dsproc.PM_INGEST
- dsproc.PM_RETRIEVER_INGEST
- dsproc.PM_RETRIEVER_VAP
- dsproc.PM_TRANSFORM_INGEST
- dsproc.PM_TRANSFORM_VAP
Default value is PM_TRANSFORM_VAP. Subclasses can override in their constructor.
- Returns
int – The processing model (see dsproc.ProcModel cdeftype)
- property process_name(self) str ¶
The name of the process that is currently being run.
- Returns
str – the name of the current process
- property process_names(self) List[str] ¶
The name(s) of the process(es) that could run this code. Subclasses must define the self._names field in their constructor.
- Returns
List[str] – One or more process names
- property process_version(self) str ¶
The version of this process’s code. Subclasses must define the self._process_version field in their constructor.
- Returns
str – The process version
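The constructor requirements for process_names and process_version can be sketched as below. StandInProcess is a hypothetical, heavily simplified stand-in for adi_py.Process, defined here only so the snippet runs without ADI installed; the process name and version string are invented. With the real library, the subclass would extend adi_py.Process instead.

```python
from typing import List


class StandInProcess:
    """Hypothetical stand-in for adi_py.Process (not the real class)."""

    def __init__(self):
        self._names: List[str] = []
        self._process_version: str = ""

    @property
    def process_names(self) -> List[str]:
        return self._names

    @property
    def process_version(self) -> str:
        return self._process_version

    def init_process_hook(self):
        pass


class ExampleVap(StandInProcess):
    def __init__(self):
        super().__init__()
        # Fields the documentation says subclasses must define.
        self._names = ["example_vap"]  # hypothetical process name
        self._process_version = "1.0"  # hypothetical version string

    def init_process_hook(self):
        # One-time setup before the main processing loop would go here.
        self.initialized = True


process = ExampleVap()
process.init_process_hook()
```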
- quicklook_hook(self, begin_date: int, end_date: int)¶
This hook will be called once per processing interval just after all data is stored.
- Parameters
begin_date (int) – the begin timestamp of the current processing interval
end_date (int) – the end timestamp of the current processing interval
- static record_qc_results(xr_dataset: xarray.Dataset, variable_name: str, bit_number: int = None, test_results: numpy.ndarray = None)¶
For the given variable, add bitwise test results to the companion qc variable for the given test.
- Parameters
xr_dataset (xr.Dataset) – The xr dataset
variable_name (str) – The name of the data variable (e.g., rh_ambient).
bit_number (int) – The bit/test number to record. Note that bit numbering starts at 1 (i.e., 1, 2, 3, 4, etc.)
test_results (np.ndarray) – A ndarray mask of the same shape as the variable with True/False values for each data point. True means the test failed for that data point. False means the test passed for that data point.
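Recording a test result amounts to setting one bit on every point that failed. The sketch below uses plain lists and a hypothetical function name, and it assumes that bit number n maps to the integer value 2^(n-1); the real method updates the companion qc variable in the dataset.

```python
def sketch_record_qc_results(qc_values, bit_number, test_results):
    """Set the 1-based bit on every point whose test result is True (failed)."""
    bit_value = 1 << (bit_number - 1)  # assumed encoding: bit 2 -> value 2
    return [
        (qc | bit_value) if failed else qc
        for qc, failed in zip(qc_values, test_results)
    ]


qc = [0, 0, 1]
updated = sketch_record_qc_results(
    qc, bit_number=2, test_results=[True, False, True]
)
```

Bits already set by earlier tests are preserved, so the third point ends up with both bit 1 and bit 2 set.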
- property rollup_qc(self) bool ¶
ADI setting controlling whether all the qc bits are rolled up into a single 0/1 value or not.
- Returns
bool – Whether this process should rollup qc or not
- run(self) int ¶
Run the process.
- Returns
int – The processing status:
1 if an error occurred
0 if successful
- static set_datastream_flags(dsid: int, flags: int)¶
Apply a set of ADI control flags to a datastream as identified by the dsid. Multiple flags can be combined together using a bitwise OR (e.g., dsproc.DS_STANDARD_QC | dsproc.DS_FILTER_NANS). The allowed flags are identified below:
- dsproc.DS_STANDARD_QC = Apply standard QC before storing a dataset.
- dsproc.DS_FILTER_NANS = Replace NaN and Inf values with missing values before storing a dataset.
- dsproc.DS_OVERLAP_CHECK = Check for overlap with previously processed data. This flag will be ignored and the overlap check will be skipped if reprocessing mode is enabled, or asynchronous processing mode is enabled.
- dsproc.DS_PRESERVE_OBS = Preserve distinct observations when retrieving data. Only observations that start within the current processing interval will be read in.
- dsproc.DS_DISABLE_MERGE = Do not merge multiple observations in retrieved data. Only data for the current processing interval will be read in.
- dsproc.DS_SKIP_TRANSFORM = Skip the transformation logic for all variables in this datastream.
- dsproc.DS_ROLLUP_TRANS_QC = Consolidate the transformation QC bits for all variables when mapped to the output datasets.
- dsproc.DS_SCAN_MODE = Enable scan mode for datastreams that are not expected to be continuous. This prevents warning messages from being generated when data is not found within a processing interval. Instead, a message will be written to the log file indicating that the processing interval was skipped.
- dsproc.DS_OBS_LOOP = Loop over observations instead of time intervals. This also sets the DS_PRESERVE_OBS flag.
- dsproc.DS_FILTER_VERSIONED_FILES = Check for files with .v# version extensions and filter out lower versioned files. Files without a version extension take precedence.
Call self.get_dsid() to obtain the dsid value for a specific datastream. If the flags value is < 0, then the following default flags will be set:
dsprc.DS_STANDARD_QC ‘b’ level datastreams dsproc.DS_FILTER_NANS ‘a’ and ‘b’ level datastreams dsproc.DS_OVERLAP_CHECK all output datastreams dsproc.DS_FILTER_VERSIONED_FILES input datastreams that are not level ‘0’
- Parameters
dsid (int) – Datastream ID
flags (int) – Flags to set
- Returns
int – The processing model (see dsproc.ProcModel cdeftype)
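Combining and testing flags with bitwise operators can be sketched as follows. The DSFlags enum is a hypothetical stand-in for the dsproc.DS_* constants, and its numeric values are made up for illustration; the real values come from dsproc.

```python
from enum import IntFlag


class DSFlags(IntFlag):
    """Stand-in for the dsproc.DS_* constants; the numeric values are made up."""
    STANDARD_QC = 0x001
    FILTER_NANS = 0x002
    OVERLAP_CHECK = 0x004


# Combine flags with a bitwise OR, as you would before passing them as `flags`.
flags = DSFlags.STANDARD_QC | DSFlags.FILTER_NANS

# Test whether an individual flag is part of the combined value.
has_filter_nans = bool(flags & DSFlags.FILTER_NANS)
has_overlap_check = bool(flags & DSFlags.OVERLAP_CHECK)
```

In a real process, the combined value would be passed as the flags argument of set_datastream_flags, together with a dsid obtained from self.get_dsid().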
- static set_datastream_split_mode(output_datastream_name: str, split_mode: adi_py.constants.SplitMode, split_start: int, split_interval: int)¶
This method should be called in your init_process_hook if you need to change the size of the output file for a given datastream, for example to create monthly output files.
- Parameters
output_datastream_name (str) – The name of the output datastream whose file output size will be changed.
split_mode (SplitMode) – One of the options from the SplitMode enum
split_start (int) – Depends on the split_mode selected
split_interval (int) – Depends on the split_mode selected
- static set_retriever_time_offsets(input_datastream_name: str, begin_offset: int, end_offset: int)¶
This method should be called in your init_process_hook if you need to override the offsets per input datastream. By default, PCM only allows you to set global offsets that apply to all datastreams. If you need to change only one datastream, then you can do it via this method.
- Parameters
input_datastream_name (str) – The specific input datastream to change the processing interval for.
begin_offset (int) – Seconds of data to fetch BEFORE the process interval starts
end_offset (int) – Seconds of data to fetch AFTER the process interval ends
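The effect of the two offsets on the retrieval window can be sketched with simple timestamp arithmetic. This assumes both offsets are non-negative seconds that widen the window outward, as the parameter descriptions suggest; retrieval_window is a hypothetical helper, not part of adi_py.

```python
def retrieval_window(begin_date, end_date, begin_offset, end_offset):
    """Hypothetical helper: widen a processing interval by the retriever offsets.

    All values are in seconds (timestamps for the dates, durations for the offsets).
    """
    return begin_date - begin_offset, end_date + end_offset


# One-day processing interval starting at 2021-01-01 00:00:00 UTC.
begin, end = 1609459200, 1609545600

# Fetch one extra hour before the interval and 30 extra minutes after it.
window = retrieval_window(begin, end, begin_offset=3600, end_offset=1800)
```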
- static shift_output_interval(output_datastream_name: str, hours: int)¶
This method should be called in your init_process_hook (i.e., before the processing loop begins) if you need to shift the output interval to account for the timezone difference at the data location. For example, if you shift the output interval by -6 hours at SGP, the file will be split at 6:00 a.m. GMT.
- Parameters
output_datastream_name (str) – The name of the output datastream whose file output will be shifted.
hours (int) – Number of hours to shift
- static shift_processing_interval(seconds: int)¶
This method should be called in your init_process_hook (i.e., before the processing loop begins) if you need to shift the processing interval.
- Parameters
seconds (int) – Number of seconds to shift
- property site(self) str ¶
Get the site where this invocation of the process is running.
- Returns
str – The site where this process is running
- static sync_datasets(*args: xarray.Dataset)¶
Sync the contents of one or more XArray.Datasets with the corresponding ADI data structure.
Important
This method MUST be called at the end of a hook function if any changes have been made to the XArray Dataset so that updates can be pushed back to ADI.
Important
This dataset must have been previously loaded via one of the get_*_dataset methods in order to have the correct embedded metadata to be able to sync to ADI. Specifically, this will include datastream name, coordinate system, dataset type, and obs_index.
See also
get_retrieved_dataset
get_transformed_dataset
get_output_dataset
- Parameters
*args (xr.Dataset) – One or more xr.Datasets to sync. Note: These datasets must have been previously loaded via one of the get_*_dataset methods in order to have the correct embedded metadata to be able to sync to ADI. Specifically, this will include datastream name, coordinate system, dataset type, and obs_index.
- static variables_exist(xr_dataset: xarray.Dataset, variable_names: numpy.ndarray = np.array([])) numpy.ndarray ¶
Check if the given variables exist in the given dataset.
- Parameters
xr_dataset (xr.Dataset) – The dataset
variable_names (np.ndarray[str]) – The variable names to check. Any array-like object can be provided, but ndarrays work best in order to easily select missing or existing variables from the results array (mask).
- Returns
np.ndarray – Array of the same length as variable_names where each value is True if the corresponding variable exists and False otherwise. Use np.ndarray.all() to check if all the variables exist.
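The mask pattern described above can be sketched in plain Python. Here a dict stands in for the xarray Dataset and a list for the returned ndarray; variables_exist_sketch and the variable names other than rh_ambient are hypothetical and only mirror the documented behavior.

```python
def variables_exist_sketch(dataset, variable_names):
    """Stand-in for illustration: one True/False entry per requested name."""
    return [name in dataset for name in variable_names]


# A dict stands in for an xarray Dataset (variable name -> values).
dataset = {
    "time": [0, 60],
    "rh_ambient": [41.0, 42.5],
    "temp_ambient": [20.1, 20.3],
}

names = ["rh_ambient", "temp_ambient", "wind_speed"]
exists = variables_exist_sketch(dataset, names)

# Use the mask to pick out the missing names, or to check that all are present.
missing = [name for name, ok in zip(names, exists) if not ok]
all_present = all(exists)
```

With ndarrays, the same selection would typically be written as boolean indexing on the names array using the inverted mask.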
- class adi_py.SpecialXrAttributes¶
Enumerates the special XArray variable attributes that are assigned temporarily to help sync data between XArray and ADI.
- COORDINATE_SYSTEM = __coordsys_name¶
- DATASET_TYPE = __dataset_type¶
- DATASTREAM_DSID = __datastream_dsid¶
- OBS_INDEX = __obs_index¶
- OUTPUT_TARGETS = __output_targets¶
- SOURCE_DS_NAME = __source_ds_name¶
- SOURCE_VAR_NAME = __source_var_name¶