Making a new dataset#

If you are contributing a new dataset, please have a look at Writing a CMORizer script for an additional dataset for how to do so. Please always create separate pull requests for CMORizer scripts, even when introducing a new dataset or updating an existing dataset with a new recipe.

If you are updating a CMORizer script to support a different dataset version, please have a look at Support for multiple versions of a dataset for how to handle multiple dataset versions.

Dataset documentation#

The documentation required for a CMORizer script is the following:

For more general information on writing documentation, see Documentation.

Testing#

When contributing a new script, add an entry for the CMORized data to recipes/examples/recipe_check_obs.yml and run the recipe, to make sure the CMOR checks pass without warnings or errors.

To test a pull request for a new CMORizer script:

  1. Download the data following the instructions included in the script and place it in the RAWOBS path specified in your config-user.yml

  2. Alternatively, if a downloader script is available, use it by running esmvaltool data download --config_file <config-file> <dataset>

  3. Run the CMORization by running esmvaltool data format --config_file <config-file> <dataset>

  4. Copy the resulting data to the OBS (for CMIP5 compliant data) or OBS6 (for CMIP6 compliant data) path specified in your config-user.yml

  5. Run recipes/examples/recipe_check_obs.yml with the new dataset to check that the data can be used
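The two command-line steps above can be scripted for repeated testing. The following is a minimal sketch, not part of ESMValTool: the helper names build_commands and run_commands are hypothetical, and the command lines are assumed to match the ones quoted in this guide. By default the commands are only printed, so nothing is downloaded or formatted unless dry_run is set to False.

```python
import subprocess


def build_commands(config_file, dataset, use_downloader=True):
    """Return the esmvaltool command lines for the test steps, in order."""
    commands = []
    if use_downloader:
        # Only applicable when a downloader script exists for the dataset.
        commands.append(
            ["esmvaltool", "data", "download",
             "--config_file", config_file, dataset]
        )
    # CMORize the raw data.
    commands.append(
        ["esmvaltool", "data", "format",
         "--config_file", config_file, dataset]
    )
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # "<dataset>" is a placeholder, exactly as in the steps above.
    run_commands(build_commands("config-user.yml", "<dataset>"))
```

Copying the output to the OBS or OBS6 path and running recipe_check_obs.yml remain manual steps in this sketch.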

Scientific sanity check#

When contributing a new dataset, we expect that the numbers and units of the dataset look physically meaningful. The scientific reviewer needs to check this.
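A quick numerical summary can help the scientific reviewer judge whether the values look plausible. The sketch below is only an illustration using plain Python lists; the function names summarize and within_range are hypothetical, and the bounds in the example (150 K to 350 K for a near-surface air temperature) are an assumption chosen for demonstration, not a threshold defined by ESMValTool. In practice the values would come from the CMORized NetCDF output.

```python
import math


def summarize(values):
    """Return min, max and mean of the finite values, plus the invalid count."""
    values = list(values)
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None  # nothing but NaN/inf: no meaningful statistics
    return {
        "min": min(finite),
        "max": max(finite),
        "mean": sum(finite) / len(finite),
        "n_invalid": len(values) - len(finite),
    }


def within_range(values, lower, upper):
    """True if every finite value lies in the closed interval [lower, upper]."""
    return all(lower <= v <= upper for v in values if math.isfinite(v))


if __name__ == "__main__":
    # Hypothetical air temperatures in K, with one missing value.
    sample = [270.0, 280.0, float("nan"), 290.0]
    print(summarize(sample))
    print(within_range(sample, 150.0, 350.0))
```

A check like this only flags gross problems such as wrong units (for example degrees Celsius stored as K); it does not replace the reviewer's judgement.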

Data availability#

Once your pull request has been approved by the reviewers, ask a member of @OBS-maintainers to add the new dataset to the data pool at DKRZ and CEDA-Jasmin. This team is in charge of merging CMORizer pull requests.

Detailed checklist for reviews#

This (non-exhaustive) checklist provides ideas for things to check when reviewing pull requests for new or updated CMORizer scripts.

Dataset description#

Check that the new dataset has been added to the table of observations in the section Obtaining input data of the ESMValTool user’s guide (generated from doc/sphinx/source/input.rst). Check that the new dataset has also been added to the file datasets.yml.

BibTeX info file#

Check that a BibTeX file, i.e. <dataset>.bibtex, defining the reference for the new dataset has been created in esmvaltool/references/.

recipe_check_obs.yml#

Check that the new dataset has been added to the testing recipe esmvaltool/recipes/examples/recipe_check_obs.yml.

Downloader script#

If present, check that the new downloader script esmvaltool/cmorizers/data/downloaders/datasets/<dataset>.py meets standards. This includes the following items:

  • Code quality checks

    1. Code quality

    2. No Codacy errors reported

CMORizer script#

Check that the new CMORizer script esmvaltool/cmorizers/data/formatters/datasets/<dataset>.{py,ncl} meets standards. This includes the following items:

  • In-code documentation (header) contains

    1. Download instructions

    2. Reference(s)

  • Code quality checks

    1. Code quality (e.g. no hardcoded pathnames)

    2. No Codacy errors reported

Config file#

If present, check config file <dataset>.yml in esmvaltool/cmorizers/data/cmor_config/ for correctness. Use yamllint to check for syntax errors and common mistakes.
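The file locations named in this checklist can be verified in one pass before the detailed review starts. The following sketch assumes that <dataset> is used literally as the file stem, as written in the paths above, and that it is run against a checkout of the repository; the function names expected_files and find_missing are hypothetical. It checks only that the files exist, not their content.

```python
import tempfile
from pathlib import Path


def expected_files(repo_root, dataset):
    """Map each checklist item to the candidate paths named in this guide."""
    package = Path(repo_root) / "esmvaltool"
    data = package / "cmorizers" / "data"
    return {
        "BibTeX file": [package / "references" / f"{dataset}.bibtex"],
        "CMORizer script": [
            data / "formatters" / "datasets" / f"{dataset}.py",
            data / "formatters" / "datasets" / f"{dataset}.ncl",
        ],
        "downloader script (optional)": [
            data / "downloaders" / "datasets" / f"{dataset}.py",
        ],
        "config file (optional)": [data / "cmor_config" / f"{dataset}.yml"],
    }


def find_missing(repo_root, dataset):
    """Return the checklist items for which no candidate file exists."""
    return [
        item
        for item, candidates in expected_files(repo_root, dataset).items()
        if not any(path.is_file() for path in candidates)
    ]


if __name__ == "__main__":
    # Demonstration on a temporary directory holding only a BibTeX file.
    with tempfile.TemporaryDirectory() as tmp:
        references = Path(tmp) / "esmvaltool" / "references"
        references.mkdir(parents=True)
        (references / "mydataset.bibtex").touch()
        print(find_missing(tmp, "mydataset"))
```

Items marked optional may legitimately be reported as missing, since a downloader script and a config file are only checked if present.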

Run downloader script#

If available, make sure the downloader script is working by running esmvaltool data download --config_file <config-file> <dataset>

Run CMORizer#

Make sure CMORizer is working by running esmvaltool data format --config_file <config-file> <dataset>

Check output of CMORizer#

After successfully running the new CMORizer, check that:

  • Output contains (some) valid values (e.g. not only nan or zeros)

  • Metadata is defined properly

Then run esmvaltool/recipes/examples/recipe_check_obs.yml for the new dataset.
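The two output checks can be expressed as small predicates. This is a sketch under stated assumptions: the data values and global attributes are assumed to have already been read from the CMORized file into a plain list and a dict, and the list of required attribute names is supplied by the caller (the names used in the example are hypothetical, not a list mandated by ESMValTool).

```python
import math


def has_valid_values(values):
    """True if at least one value is finite and non-zero."""
    return any(math.isfinite(v) and v != 0 for v in values)


def missing_metadata(attributes, required):
    """Return the required attribute names that are absent or empty."""
    return sorted(
        name for name in required
        if attributes.get(name) in (None, "")
    )


if __name__ == "__main__":
    print(has_valid_values([float("nan"), 0.0, 0.0]))
    print(has_valid_values([float("nan"), 0.0, 287.4]))
    # Hypothetical attribute names, for illustration only.
    attrs = {"units": "K", "standard_name": ""}
    print(missing_metadata(attrs, ["units", "standard_name", "long_name"]))
```

These predicates complement, rather than replace, the CMOR checks performed when recipe_check_obs.yml is run.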

RAW data#

Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and request that the RAW data be copied to RAWOBS/Tier2 (or Tier3).

CMORized data#

Contact the team in charge of the ESMValTool data pool (@OBS-maintainers) and ask them to:

  • Merge the pull request

  • Copy the CMORized dataset to OBS/Tier2 (or Tier3)

  • Set file access rights for the new dataset