Preprocessing functionalities
Simple preprocessing of MLR model input.
Description
This diagnostic performs user-defined preprocessing operations on datasets used as MLR model input. It can also be used to process the output of MLR models for plotting.
Project
CRESCENDO
Configuration options in recipe
- aggregate_by: dict, optional
  Aggregate over given coordinates (dict values; given as list of str) using a desired aggregator (dict key; given as str). Allowed aggregators are 'max', 'mean', 'median', 'min', 'sum', 'std', 'var', and 'trend'.
- apply_common_mask: bool, optional (default: False)
  Apply a common mask to all datasets. Requires identical shapes for all datasets.
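To illustrate what a common mask means, here is a minimal numpy sketch (not the diagnostic's actual code): a point is masked in every dataset if it is masked in any of them.

```python
import numpy as np

# Two datasets with identical shapes but different missing points.
a = np.ma.masked_invalid([1.0, np.nan, 3.0, 4.0])
b = np.ma.masked_invalid([5.0, 6.0, np.nan, 8.0])

# The common mask marks a point as missing if it is missing in ANY dataset.
common_mask = np.ma.getmaskarray(a) | np.ma.getmaskarray(b)

a = np.ma.masked_where(common_mask, a)
b = np.ma.masked_where(common_mask, b)

print(a.count(), b.count())  # each dataset now keeps the same 2 valid points
```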
- area_weighted: bool, optional (default: True)
  Use weighted aggregation when collapsing over latitude and/or longitude using collapse. Weights are estimated using grid cell bounds. Only possible for datasets on regular grids that contain latitude and longitude coordinates.
- argsort: dict, optional
  Calculate numpy.ma.argsort() along a given coordinate to get a ranking. The coordinate is specified by the coord key. If descending is set to True, use descending order instead of ascending.
- collapse: dict, optional
  Collapse over given coordinates (dict values; given as list of str) using a desired aggregator (dict key; given as str). Allowed aggregators are 'max', 'mean', 'median', 'min', 'sum', 'std', 'var', and 'trend'.
- convert_units_to: str, optional
  Convert units of the input data. Can also be given as a dataset option.
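The effect of area_weighted can be sketched with cosine-of-latitude weights (an assumption for illustration; the diagnostic itself derives weights from grid cell bounds):

```python
import numpy as np

# Zonal means at three latitudes of a regular grid.
lats = np.array([-60.0, 0.0, 60.0])
data = np.array([1.0, 2.0, 10.0])

# Grid cell area on a regular grid is roughly proportional to cos(latitude).
weights = np.cos(np.deg2rad(lats))

weighted_mean = np.average(data, weights=weights)
print(round(weighted_mean, 2))  # 3.75, vs. an unweighted mean of ~4.33
```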
- extract: dict, optional
  Extract certain values (dict values, given as int, float or iterable of them) for certain coordinates (dict keys, given as str).
- extract_ignore_bounds: bool, optional (default: False)
  If True, ignore coordinate bounds when using extract or extract_range. If False, consider coordinate bounds when using extract or extract_range. For time coordinates, bounds are always ignored.
- extract_range: dict, optional
  Like extract, but instead of specific values extract ranges (dict values, given as iterable of exactly two ints or floats) for certain coordinates (dict keys, given as str).
- ignore: list of dict, optional
  Ignore specific datasets by specifying multiple dicts of metadata.
- landsea_fraction_weighted: str, optional
  When given, use land/sea fraction for weighted aggregation when collapsing over latitude and/or longitude using collapse. Only possible for datasets on regular grids that contain latitude and longitude coordinates. Must be one of 'land', 'sea'.
- mask: dict of dict
  Mask datasets. Keys have to be numpy.ma conversion operations (see https://docs.scipy.org/doc/numpy/reference/routines.ma.html) and values all the keyword arguments of them.
- n_jobs: int (default: 1)
  Maximum number of jobs spawned by this diagnostic script. Use -1 to use all processors.
- normalize_by_mean: bool, optional (default: False)
  Remove the total mean of the dataset in the last step (the resulting mean will be 0.0). Calculates a weighted mean if area_weighted, time_weighted or landsea_fraction_weighted is set and the cube contains the corresponding coordinates. Does not apply to error datasets.
- normalize_by_std: bool, optional (default: False)
  Scale by the total standard deviation of the dataset in the last step (the resulting standard deviation will be 1.0).
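The two normalization options correspond to standard centering and scaling, as this minimal numpy sketch shows:

```python
import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])

# normalize_by_mean: remove the total mean (resulting mean is 0.0).
centered = data - data.mean()

# normalize_by_std: scale by the total standard deviation (resulting std is 1.0).
normalized = centered / data.std()

print(round(centered.mean(), 10), round(normalized.std(), 10))  # 0.0 1.0
```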
- output_attributes: dict, optional
  Write additional attributes to netcdf files, e.g. 'tag'.
- pattern: str, optional
  Pattern matched against ancestor file names.
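The 'trend' aggregator (and the trend-related reference calculations below) amounts to an ordinary least-squares linear fit; a minimal numpy sketch, with the slope standard error computed by the textbook formula (illustrative, not the diagnostic's actual code):

```python
import numpy as np

# x axis (e.g. time, or a reference dataset) and target data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.9, 5.1, 7.0, 9.1])

# Ordinary least-squares fit of a linear trend.
slope, intercept = np.polyfit(x, y, 1)

# Standard error of the slope, as returned when return_trend_stderr is True.
residuals = y - (slope * x + intercept)
dof = len(x) - 2
stderr = np.sqrt((residuals ** 2).sum() / dof / ((x - x.mean()) ** 2).sum())

print(round(slope, 2), round(stderr, 3))  # 2.03 0.025
```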
- ref_calculation: str, optional
  Perform calculations involving a reference dataset. Must be one of merge (simply merge two datasets by adding the data of the reference dataset as iris.coords.AuxCoord to the original dataset), add (add the reference dataset), divide (divide by the reference dataset), multiply (multiply with the reference dataset), subtract (subtract the reference dataset) or trend (use the reference dataset as x axis for the calculation of a linear trend along a specified axis, see ref_kwargs).
- ref_kwargs: dict, optional
  Keyword arguments for calculations involving reference datasets. Allowed keyword arguments are:
  - return_trend_stderr: bool, optional (default: True)
    Return the standard error of the slope in case of trend calculations (as var_type prediction_input_error).
- scalar_operations: dict, optional
  Operations involving scalars. Allowed keys are add, divide, multiply or subtract. The corresponding values (float or int) are scalars that are used with the operations.
- time_weighted: bool, optional (default: True)
  Use weighted aggregation when collapsing over the time dimension using collapse. Weights are estimated using time bounds.
- unify_coords_to: dict, optional
  If given, replace the coordinates of all datasets with those of a reference cube (if necessary and possible, broadcast beforehand). The reference dataset is determined by the keyword arguments given to this option (the keyword arguments must point to exactly one dataset).
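Taken together, a script entry in a recipe using some of these options might look like the following sketch (the script path and all option values are illustrative assumptions, not taken from an actual recipe):

```yaml
scripts:
  preprocess_tas:
    script: mlr/preprocess.py  # assumed path of this diagnostic
    pattern: '*tas*.nc'
    apply_common_mask: true
    collapse:
      mean: [time]             # aggregator (dict key) over coordinates (dict value)
    area_weighted: true
    convert_units_to: celsius
    normalize_by_mean: true
```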