Auxiliary functions for emergent constraints scripts
Convenience functions for emergent constraints diagnostics.
Functions:
| cdf | Calculate cumulative distribution function for a 1-dimensional PDF. |
| check_metadata | Check metadata. |
| constraint_info_array | Get array with all relevant parameters of emergent constraint. |
| create_simple_scatterplot | Create simple scatterplot of an emergent relationship (without saving). |
| export_csv | Export CSV file. |
| | Construct caption from plotting attributes for (feature, label) pair. |
| get_colors | Get color palette. |
| get_constraint | Get constraint on target variable. |
| get_constraint_from_df | Get constraint on target variable from pandas.DataFrame. |
| get_groups | Extract groups from training data. |
| get_input_data | Extract input data. |
| get_input_files | Get input files. |
| get_provenance_record | Get provenance record. |
| get_xy_data_without_nans | Get (X, Y) data for (feature, label) combination without nans. |
| pandas_object_to_cube | Convert pandas object to iris.cube.Cube. |
| plot_individual_scatterplots | Plot individual scatterplots for the different groups. |
| plot_merged_scatterplots | Plot merged scatterplots (all groups in one plot). |
| plot_target_distributions | Plot distributions of target variable for every feature. |
| regression_line | Return x and y coordinates of the regression line (mean and error). |
| set_plot_appearance | Set appearance of a plot. |
| standard_prediction_error | Return a function to calculate standard prediction error. |
| target_pdf | Calculate probability density function (PDF) for target variable. |
- esmvaltool.diag_scripts.emergent_constraints.cdf(data, pdf)[source]
Calculate cumulative distribution function for a 1-dimensional PDF.
- Parameters
data (numpy.ndarray) – Data points (1D array).
pdf (numpy.ndarray) – Corresponding probability density function (PDF).
- Returns
Corresponding cumulative distribution function (CDF).
- Return type
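As an illustration of what such a function computes, the following is a minimal sketch that integrates a sampled PDF with the trapezoidal rule and normalizes the result. The name cdf_sketch and the choice of integration rule are assumptions made for this example; the actual implementation may differ.

```python
import numpy as np


def cdf_sketch(data, pdf):
    """Integrate a 1D PDF with the trapezoidal rule (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # Area of each trapezoid between neighbouring data points
    areas = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(data)
    # Running sum of the areas; the CDF is 0 at the first data point
    cumulative = np.concatenate(([0.0], np.cumsum(areas)))
    # Normalize so that the last value is exactly 1
    return cumulative / cumulative[-1]


# Usage: standard normal PDF sampled on a symmetric grid
x = np.linspace(-5.0, 5.0, 1001)
gauss = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
cdf_values = cdf_sketch(x, gauss)
```

The returned array has the same shape as data, increases monotonically and ends at 1.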
- esmvaltool.diag_scripts.emergent_constraints.check_metadata(metadata, allowed_var_types=None)[source]
Check metadata.
- Parameters
- Raises
KeyError – Metadata does not contain necessary keys 'var_type' and 'tag'.
ValueError – Got invalid value for key 'var_type'.
- esmvaltool.diag_scripts.emergent_constraints.constraint_info_array(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)[source]
Get array with all relevant parameters of emergent constraint.
- Parameters
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.
necessary_p_value (float, optional) – If given, replace constrained mean and standard deviation with unconstrained values when p-value of emergent relationship is greater than the given necessary p-value.
- Returns
- Array of shape (8,) with the elements:
Constrained mean of target variable.
Constrained standard deviation of target variable.
Unconstrained mean of target variable.
Unconstrained standard deviation of target variable.
Slope of emergent relationship.
Intercept of emergent relationship.
Correlation coefficient r of emergent relationship.
p-value of emergent relationship.
- Return type
- esmvaltool.diag_scripts.emergent_constraints.create_simple_scatterplot(x_data, y_data, obs_mean, obs_std)[source]
Create simple scatterplot of an emergent relationship (without saving).
- Parameters
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
- esmvaltool.diag_scripts.emergent_constraints.export_csv(data_frame, attributes, basename, cfg, tags=None)[source]
Export CSV file.
- Parameters
data_frame (pandas.DataFrame) – Data to export.
attributes (dict) – Plot attributes for the different features and the label data. Used to retrieve provenance information.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
tags (iterable of str, optional) – Tags for which provenance information should be retrieved (using attributes). If not specified, use (last level of) columns of the given data_frame.
- Returns
Path to the new CSV file.
- Return type
Construct caption from plotting attributes for (feature, label) pair.
- esmvaltool.diag_scripts.emergent_constraints.get_colors(cfg, groups=None)[source]
Get color palette.
- Parameters
- Returns
List of colors that can be used for matplotlib.
- Return type
- esmvaltool.diag_scripts.emergent_constraints.get_constraint(x_data, y_data, obs_mean, obs_std, confidence_level=0.66)[source]
Get constraint on target variable.
- Parameters
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.
- Returns
Lower confidence limit, best estimate and upper confidence limit of target variable.
- Return type
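To make the role of confidence_level concrete, here is a minimal sketch of how a lower limit, best estimate and upper limit could be read off a sampled PDF of the target variable. It assumes symmetric quantiles (0.17 and 0.83 for a level of 0.66) and uses the PDF-weighted mean as best estimate; both choices, and the name constraint_from_pdf, are assumptions for illustration and not necessarily what get_constraint does.

```python
import numpy as np


def constraint_from_pdf(y_values, y_pdf, confidence_level=0.66):
    """Return (lower limit, best estimate, upper limit); hypothetical helper."""
    y_values = np.asarray(y_values, dtype=float)
    y_pdf = np.asarray(y_pdf, dtype=float)
    # CDF by trapezoidal integration, normalized to 1
    areas = 0.5 * (y_pdf[1:] + y_pdf[:-1]) * np.diff(y_values)
    y_cdf = np.concatenate(([0.0], np.cumsum(areas)))
    y_cdf /= y_cdf[-1]
    # Symmetric quantiles around the median (assumption)
    lower_q = (1.0 - confidence_level) / 2.0
    upper_q = 1.0 - lower_q
    # Invert the CDF by linear interpolation
    lower, upper = np.interp([lower_q, upper_q], y_cdf, y_values)
    # Best estimate: PDF-weighted mean on the (uniform) grid (assumption)
    best = np.sum(y_values * y_pdf) / np.sum(y_pdf)
    return lower, best, upper


# Usage: Gaussian PDF with mean 3 and standard deviation 2
y_grid = np.linspace(-9.0, 15.0, 2401)
y_gauss = np.exp(-0.5 * ((y_grid - 3.0) / 2.0)**2) / (2.0 * np.sqrt(2.0 * np.pi))
y_low, y_best, y_high = constraint_from_pdf(y_grid, y_gauss)
```

For a Gaussian PDF the resulting range is centered on the mean, and a confidence level of 0.66 corresponds to roughly plus or minus 0.95 standard deviations.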
- esmvaltool.diag_scripts.emergent_constraints.get_constraint_from_df(training_data, pred_input_data, confidence_level=0.66)[source]
Get constraint on target variable from pandas.DataFrame.
- Parameters
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
confidence_level (float, optional (default: 0.66)) – Confidence level to estimate the range of the target variable.
- Returns
Lower confidence limit, best estimate and upper confidence limit of target variable.
- Return type
- esmvaltool.diag_scripts.emergent_constraints.get_groups(training_data, add_combined_group=False)[source]
Extract groups from training data.
- Parameters
training_data (pandas.DataFrame) – Training data (features, label).
add_combined_group (bool, optional (default: False)) – Add combined group of all other groups at the beginning of the returned list.
- Returns
Groups.
- Return type
- esmvaltool.diag_scripts.emergent_constraints.get_input_data(cfg)[source]
Extract input data.
Return training data, prediction input data and corresponding attributes.
- Parameters
cfg (dict) – Recipe configuration.
- Returns
A tuple containing the training data (pandas.DataFrame), the prediction input data (pandas.DataFrame) and the corresponding attributes (dict).
- Return type
- esmvaltool.diag_scripts.emergent_constraints.get_input_files(cfg, patterns=None, ignore_patterns=None)[source]
Get input files.
- esmvaltool.diag_scripts.emergent_constraints.get_provenance_record(attributes, tags, **kwargs)[source]
Get provenance record.
- Parameters
attributes (dict) – Plot attributes. All provenance keys need to start with 'provenance_'.
tags (list of str) – Tags used to retrieve data from the attributes dict, i.e. features and/or label.
**kwargs (Keyword arguments) – Additional key:value pairs directly passed to the provenance record dict. All values may include the format strings {feature} and {label}.
- Returns
Provenance record.
- Return type
- esmvaltool.diag_scripts.emergent_constraints.get_xy_data_without_nans(data_frame, feature, label)[source]
Get (X, Y) data for (feature, label) combination without nans.
- Parameters
data_frame (pandas.DataFrame) – Training data.
feature (str) – Name of the feature data.
label (str) – Name of the label data.
- Returns
Tuple containing a pandas.DataFrame for the X axis (feature) and a pandas.DataFrame for the Y axis (label) without missing values.
- Return type
- esmvaltool.diag_scripts.emergent_constraints.pandas_object_to_cube(pandas_object, index_droplevel=None, columns_droplevel=None, **kwargs)[source]
Convert pandas object to iris.cube.Cube.
- Parameters
pandas_object (pandas.DataFrame or pandas.Series) – Data to convert.
index_droplevel (int or list of int, optional) – Drop levels of index if not None.
columns_droplevel (int or list of int, optional) – Drop levels of columns if not None. Can only be used if pandas_object is a pandas.DataFrame.
**kwargs (Keyword arguments) – Keyword arguments used for the cube metadata, e.g. standard_name, var_name, etc.
- Returns
Data cube.
- Return type
- Raises
TypeError – columns_droplevel is used when pandas_object is not a pandas.DataFrame.
- esmvaltool.diag_scripts.emergent_constraints.plot_individual_scatterplots(training_data, pred_input_data, attributes, basename, cfg)[source]
Plot individual scatterplots for the different groups.
Plot scatterplots for all pairs of (feature, label) data (separate plot for each group).
- Parameters
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
- esmvaltool.diag_scripts.emergent_constraints.plot_merged_scatterplots(training_data, pred_input_data, attributes, basename, cfg)[source]
Plot merged scatterplots (all groups in one plot).
Plot scatterplots for all pairs of (feature, label) data (all groups in one plot).
- Parameters
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
- esmvaltool.diag_scripts.emergent_constraints.plot_target_distributions(training_data, pred_input_data, attributes, basename, cfg)[source]
Plot distributions of target variable for every feature.
- Parameters
training_data (pandas.DataFrame) – Training data (features, label).
pred_input_data (pandas.DataFrame) – Prediction input data (mean and error).
attributes (dict) – Plot attributes for the different features and the label data.
basename (str) – Basename for the name of the file.
cfg (dict) – Recipe configuration.
- esmvaltool.diag_scripts.emergent_constraints.regression_line(x_data, y_data, n_points=1000)[source]
Return x and y coordinates of the regression line (mean and error).
- Parameters
x_data (numpy.ndarray) – X data used to fit the linear regression.
y_data (numpy.ndarray) – Y data used to fit the linear regression.
n_points (int, optional (default: 1000)) – Number of points for the regression lines.
- Returns
numpy.ndarrays for the keys 'x', 'y', 'y_minus_err', 'y_plus_err', 'slope', 'intercept', 'pvalue' and 'rvalue'.
- Return type
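The following minimal sketch shows how a subset of these keys could be computed with an ordinary least-squares fit in NumPy. The function name regression_line_sketch and the choice to span the line from the minimum to the maximum of x_data are assumptions; the error band keys ('y_minus_err', 'y_plus_err') and 'pvalue' are omitted here to keep the example dependency-free.

```python
import numpy as np


def regression_line_sketch(x_data, y_data, n_points=1000):
    """Fit y = slope * x + intercept and sample the line (hypothetical helper)."""
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    # Ordinary least-squares fit of a first-degree polynomial
    slope, intercept = np.polyfit(x_data, y_data, 1)
    # Pearson correlation coefficient r
    rvalue = np.corrcoef(x_data, y_data)[0, 1]
    # Sample the line over the range of the data (assumption)
    x_line = np.linspace(x_data.min(), x_data.max(), n_points)
    return {
        'x': x_line,
        'y': slope * x_line + intercept,
        'slope': slope,
        'intercept': intercept,
        'rvalue': rvalue,
    }


# Usage: perfectly linear data y = 2x + 1
line = regression_line_sketch([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```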
- esmvaltool.diag_scripts.emergent_constraints.set_plot_appearance(axes, attributes, **kwargs)[source]
Set appearance of a plot.
- Parameters
axes (matplotlib.axes.Axes) – Matplotlib Axes object which contains the plot.
attributes (dict) – Plot attributes.
**kwargs (Keyword arguments) – Keyword arguments of the form plot_option=tag, where plot_option is something like plot_title, plot_xlabel, plot_xlim, etc. and tag is a key for the plot attributes dict that describes which attributes should be considered for that plot_option.
- esmvaltool.diag_scripts.emergent_constraints.standard_prediction_error(x_data, y_data)[source]
Return a function to calculate standard prediction error.
The standard prediction error of a linear regression is the error when predicting a data point which was not used to fit the regression line in the first place.
- Parameters
x_data (numpy.ndarray) – X data used to fit the linear regression.
y_data (numpy.ndarray) – Y data used to fit the linear regression.
- Returns
Function that takes a float as single argument (representing the X value of a new data point) and returns the standard prediction error for that point.
- Return type
callable
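For reference, the textbook expression for the standard prediction error of a simple linear regression at a new point x_new is sqrt(s^2 * (1 + 1/n + (x_new - x_mean)^2 / sum((x_i - x_mean)^2))), where s^2 is the residual variance with n - 2 degrees of freedom. The sketch below implements that textbook formula; the name standard_prediction_error_sketch is hypothetical, and it is an assumption that the function documented here follows the same expression.

```python
import numpy as np


def standard_prediction_error_sketch(x_data, y_data):
    """Return a callable giving the textbook standard prediction error."""
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    n_data = x_data.size
    # Least-squares fit and residual variance with n - 2 degrees of freedom
    slope, intercept = np.polyfit(x_data, y_data, 1)
    residuals = y_data - (slope * x_data + intercept)
    mse = np.sum(residuals**2) / (n_data - 2)
    x_mean = x_data.mean()
    ssx = np.sum((x_data - x_mean)**2)

    def spe(x_new):
        """Standard prediction error at the new X value."""
        return np.sqrt(mse * (1.0 + 1.0 / n_data + (x_new - x_mean)**2 / ssx))

    return spe


# Usage: the error grows with distance from the mean of the X data
spe_func = standard_prediction_error_sketch(
    [0.0, 1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 2.2, 2.8, 4.0])
error_at_mean = spe_func(2.0)
error_far_away = spe_func(10.0)
```

The error is smallest at the mean of the X data and increases symmetrically on both sides, which is why extrapolating far outside the range of the training data yields wide uncertainty bands.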
- esmvaltool.diag_scripts.emergent_constraints.target_pdf(x_data, y_data, obs_mean, obs_std, n_points=1000, necessary_p_value=None)[source]
Calculate probability density function (PDF) for target variable.
- Parameters
x_data (numpy.ndarray) – X data of the emergent constraint.
y_data (numpy.ndarray) – Y data of the emergent constraint.
obs_mean (float) – Mean of observational data.
obs_std (float) – Standard deviation of observational data.
n_points (int, optional (default: 1000)) – Number of sampled points for PDF of target variable.
necessary_p_value (float, optional) – If given, return unconstrained PDF (using Gaussian distribution with unconstrained mean and standard deviation) when p-value of emergent relationship is greater than the given necessary p-value.
- Returns
x and y values for the PDF.
- Return type
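A common way to build such a PDF in the emergent constraints literature is to combine the observational PDF of x with the conditional PDF of y given x from the regression: P(y) = integral of P(y|x) P(x) dx. The sketch below assumes Gaussian distributions for both factors, uses the textbook standard prediction error as the width of P(y|x), and picks the integration grids ad hoc; the name target_pdf_sketch and all of these choices are assumptions for illustration, and the necessary_p_value fallback is not covered.

```python
import numpy as np


def gaussian(values, mean, std):
    """Gaussian probability density."""
    return np.exp(-0.5 * ((values - mean) / std)**2) / (std * np.sqrt(2.0 * np.pi))


def target_pdf_sketch(x_data, y_data, obs_mean, obs_std, n_points=1000):
    """Return (y values, PDF values) of the constrained target variable."""
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    n_data = x_data.size
    # Linear regression and residual variance (n - 2 degrees of freedom)
    slope, intercept = np.polyfit(x_data, y_data, 1)
    residuals = y_data - (slope * x_data + intercept)
    mse = np.sum(residuals**2) / (n_data - 2)
    ssx = np.sum((x_data - x_data.mean())**2)
    # Integration grids (ranges are an assumption of this sketch)
    x_grid = np.linspace(obs_mean - 5.0 * obs_std, obs_mean + 5.0 * obs_std,
                         n_points)
    y_span = y_data.max() - y_data.min()
    y_grid = np.linspace(y_data.min() - y_span, y_data.max() + y_span, n_points)
    # P(x): Gaussian PDF of the observation
    obs_pdf = gaussian(x_grid, obs_mean, obs_std)
    # Width of P(y|x): standard prediction error at every x
    spe = np.sqrt(mse * (1.0 + 1.0 / n_data
                         + (x_grid - x_data.mean())**2 / ssx))
    # P(y|x) on a (n_y, n_x) grid, centered on the regression line
    cond_pdf = gaussian(y_grid[:, np.newaxis], slope * x_grid + intercept, spe)
    # Integrate over x with a simple Riemann sum
    delta_x = x_grid[1] - x_grid[0]
    y_pdf = np.sum(cond_pdf * obs_pdf, axis=1) * delta_x
    return y_grid, y_pdf


# Usage: noisy linear relationship y = 2x + 1 and an observation x = 5 +/- 0.5
x_models = np.linspace(0.0, 10.0, 20)
y_models = 2.0 * x_models + 1.0 + 0.5 * np.sin(3.0 * x_models)
pdf_y, pdf_values = target_pdf_sketch(x_models, y_models, 5.0, 0.5)
```

The resulting PDF is centered near the regression line evaluated at the observed mean, and its width reflects both the observational uncertainty and the scatter of the data around the regression line.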