Preprocessing functionalities
Simple preprocessing of MLR model input.
Description
This diagnostic performs user-defined preprocessing operations on datasets used as MLR model input. It can also be used to process the output of MLR models for plotting.
Project
CRESCENDO
Configuration options in recipe
- aggregate_by: dict, optional
  Aggregate over given coordinates (dict values; given as list of str) using a desired aggregator (dict key; given as str). Allowed aggregators are 'max', 'mean', 'median', 'min', 'sum', 'std', 'var', and 'trend'.
- apply_common_mask: bool, optional (default: False)
  Apply a common mask to all datasets. Requires identical shapes for all datasets.
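To illustrate what a common mask means, here is a minimal numpy sketch (not the diagnostic's actual code): a point is masked in every dataset if it is masked in any of them.

```python
import numpy as np

# Two datasets with identical shapes but different missing points.
a = np.ma.masked_invalid([1.0, np.nan, 3.0, 4.0])
b = np.ma.masked_invalid([5.0, 6.0, np.nan, 8.0])

# The common mask marks a point as missing if it is missing in ANY dataset.
common_mask = np.ma.getmaskarray(a) | np.ma.getmaskarray(b)

a = np.ma.masked_where(common_mask, a)
b = np.ma.masked_where(common_mask, b)

print(a.count(), b.count())  # each dataset now keeps the same 2 valid points
```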
- area_weighted: bool, optional (default: True)
  Use weighted aggregation when collapsing over latitude and/or longitude using collapse. Weights are estimated using grid cell bounds. Only possible for datasets on regular grids that contain latitude and longitude coordinates.
- argsort: dict, optional
  Calculate numpy.ma.argsort() along a given coordinate to get a ranking. The coordinate is specified by the coord key. If descending is set to True, use descending order instead of ascending.
- collapse: dict, optional
  Collapse over given coordinates (dict values; given as list of str) using a desired aggregator (dict key; given as str). Allowed aggregators are 'max', 'mean', 'median', 'min', 'sum', 'std', 'var', and 'trend'.
- convert_units_to: str, optional
  Convert units of the input data. Can also be given as a dataset option.
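The effect of area_weighted can be sketched with cosine-of-latitude weights (an assumption for illustration; the diagnostic itself derives weights from grid cell bounds):

```python
import numpy as np

# Zonal means at three latitudes of a regular grid.
lats = np.array([-60.0, 0.0, 60.0])
data = np.array([1.0, 2.0, 10.0])

# Grid cell area on a regular grid is roughly proportional to cos(latitude).
weights = np.cos(np.deg2rad(lats))

weighted_mean = np.average(data, weights=weights)
print(round(weighted_mean, 2))  # 3.75, vs. an unweighted mean of ~4.33
```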
- extract: dict, optional
  Extract certain values (dict values, given as int, float or iterable of them) for certain coordinates (dict keys, given as str).
- extract_ignore_bounds: bool, optional (default: False)
  If True, ignore coordinate bounds when using extract or extract_range. If False, consider coordinate bounds when using extract or extract_range. For time coordinates, bounds are always ignored.
- extract_range: dict, optional
  Like extract, but instead of specific values extract ranges (dict values, given as iterable of exactly two ints or floats) for certain coordinates (dict keys, given as str).
- ignore: list of dict, optional
  Ignore specific datasets by specifying multiple dicts of metadata.
- landsea_fraction_weighted: str, optional
  When given, use land/sea fraction for weighted aggregation when collapsing over latitude and/or longitude using collapse. Only possible for datasets on regular grids that contain latitude and longitude coordinates. Must be one of 'land', 'sea'.
- mask: dict of dict
  Mask datasets. Keys have to be numpy.ma conversion operations (see https://docs.scipy.org/doc/numpy/reference/routines.ma.html) and values all the keyword arguments of them.
- n_jobs: int (default: 1)
  Maximum number of jobs spawned by this diagnostic script. Use -1 to use all processors.
- normalize_by_mean: bool, optional (default: False)
  Remove the total mean of the dataset in the last step (the resulting mean will be 0.0). Calculates a weighted mean if area_weighted, time_weighted or landsea_fraction_weighted is set and the cube contains the corresponding coordinates. Does not apply to error datasets.
- normalize_by_std: bool, optional (default: False)
  Scale by the total standard deviation of the dataset in the last step (the resulting standard deviation will be 1.0).
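The two normalization options correspond to standard centering and scaling, as this minimal numpy sketch shows:

```python
import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])

# normalize_by_mean: remove the total mean (resulting mean is 0.0).
centered = data - data.mean()

# normalize_by_std: scale by the total standard deviation (resulting std is 1.0).
normalized = centered / data.std()

print(round(centered.mean(), 10), round(normalized.std(), 10))  # 0.0 1.0
```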
- output_attributes: dict, optional
  Write additional attributes to netcdf files, e.g. 'tag'.
- pattern: str, optional
  Pattern matched against ancestor file names.
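The 'trend' aggregator (and the trend-related reference calculations below) amounts to an ordinary least-squares linear fit; a minimal numpy sketch, with the slope standard error computed by the textbook formula (illustrative, not the diagnostic's actual code):

```python
import numpy as np

# x axis (e.g. time, or a reference dataset) and target data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.9, 5.1, 7.0, 9.1])

# Ordinary least-squares fit of a linear trend.
slope, intercept = np.polyfit(x, y, 1)

# Standard error of the slope, as returned when return_trend_stderr is True.
residuals = y - (slope * x + intercept)
dof = len(x) - 2
stderr = np.sqrt((residuals ** 2).sum() / dof / ((x - x.mean()) ** 2).sum())

print(round(slope, 2), round(stderr, 3))  # 2.03 0.025
```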
- ref_calculation: str, optional
  Perform calculations involving a reference dataset. Must be one of merge (simply merge two datasets by adding the data of the reference dataset as iris.coords.AuxCoord to the original dataset), add (add the reference dataset), divide (divide by the reference dataset), multiply (multiply with the reference dataset), subtract (subtract the reference dataset) or trend (use the reference dataset as x axis for the calculation of a linear trend along a specified axis, see ref_kwargs).
- ref_kwargs: dict, optional
  Keyword arguments for calculations involving reference datasets. Allowed keyword arguments are:
  - return_trend_stderr: bool, optional (default: True)
    Return the standard error of the slope in case of trend calculations (as var_type prediction_input_error).
- scalar_operations: dict, optional
  Operations involving scalars. Allowed keys are add, divide, multiply or subtract. The corresponding values (float or int) are scalars that are used with the operations.
- time_weighted: bool, optional (default: True)
  Use weighted aggregation when collapsing over the time dimension using collapse. Weights are estimated using time bounds.
- unify_coords_to: dict, optional
  If given, replace the coordinates of all datasets with those of a reference cube (if necessary and possible, broadcast beforehand). The reference dataset is determined by the keyword arguments given to this option (the keyword arguments must point to exactly one dataset).
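Taken together, a script entry in a recipe using some of these options might look like the following sketch (the script path and all option values are illustrative assumptions, not taken from an actual recipe):

```yaml
scripts:
  preprocess_tas:
    script: mlr/preprocess.py  # assumed path of this diagnostic
    pattern: '*tas*.nc'
    apply_common_mask: true
    collapse:
      mean: [time]             # aggregator (dict key) over coordinates (dict value)
    area_weighted: true
    convert_units_to: celsius
    normalize_by_mean: true
```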