Traceability#

Configuration#

An Autosubmit experiment is created by running the command autosubmit expid with a given version of Autosubmit. The generated experiment contains a minimal YAML configuration to bootstrap it.

For an Autosubmit experiment of type Git, the rest of the experiment configuration is stored in a location like <EXPID>/proj/git_project/ (the proj part is constant, but the git_project part is configurable) and imported by Autosubmit. The <EXPID>/proj/git_project/ subdirectory contains a clone of a Git repository (i.e. there is a proj/git_project/.git).

Note

Autosubmit combines multiple YAML files and generates a merged YAML file at <EXPID>/conf/metadata/experiment_data.yml. This file can be used to analyse the final configuration used for the run and compare with the information from trace files.

The cloned repository may contain YAML configuration files in a location such as <EXPID>/proj/git_project/conf, with settings for models and applications and for Autosubmit jobs. It may also contain the template scripts (e.g. under <EXPID>/proj/git_project/templates, or anywhere the user may choose).

These configuration files, template scripts, and the Git information from <EXPID>/proj/git_project/ (and any Git submodules), are one part of the traces used for provenance and reproducibility of the Autosubmit experiments. The rest of the traces and the data produced by running the experiment workflow jobs are explained in the following sections.

Logs#

Most of the Autosubmit commands that take an expid argument (autosubmit create, autosubmit setstatus, autosubmit run, etc.) write to log files persisted on the computer where the command is issued, along with the rest of the workflow configuration and other traces. The only exception is autosubmit delete <EXPID>, which writes to the global log path because it deletes the experiment folder, including its ASLOGS folder.

By default, these command logs are saved under <EXPID>/tmp/ASLOGS, include the timestamp of the command in their names, and always come in pairs of “.log” and “_err.log” files (one for the command standard output, and one for the error output).

If the user issuing the command is not the owner of the experiment, then Autosubmit will try to write the log file in the ASLOGS folder first, and should that fail, it will try to write to the tmp folder or to the global log path, depending on the file system permissions for the user.

Note

Autosubmit keeps 10 logs of each command, i.e. up to 10 logs of autosubmit create, 10 logs of autosubmit run, etc., and then removes older log files when new ones are created.

Autosubmit commands that do not take an expid argument (e.g. autosubmit expid, autosubmit testcase, autosubmit readme, etc.) write to the global log path, which can be configured in the .autosubmitrc configuration file.

Note

There are commands that do not produce any log, e.g. autosubmit delete.

The logs of the workflow tasks are retrieved from remote platforms by Autosubmit and written to <EXPID>/tmp/LOG_<EXPID>/. They contain the output and errors, as well as the trace output of the template script after parameter expansion (done via the set -x mode in Bash Shell).

The parent directory, <EXPID>/tmp, contains other trace files:

  • .cmd files, which are the scripts created by Autosubmit from the templates and used to run each task (locally or on a remote platform with Slurm, for example);

  • *_COMPLETED files that confirm a task was marked as completed by the platform;

  • *_STAT files that contain the latest start and end date of the job; and

  • *_TOTAL_STATS files that aggregate the information of all *_STAT files for the current and previous jobs.
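The trace files listed above can be told apart by their name suffixes. The following is a minimal sketch, not part of Autosubmit, that groups the files found directly under a tmp directory by trace type. The function name classify_traces and the category labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from pathlib import Path

# Suffixes taken from the list above; the category labels are hypothetical.
TRACE_SUFFIXES = {
    ".cmd": "script",
    "_COMPLETED": "completed",
    "_STAT": "stat",
    "_TOTAL_STATS": "total_stats",
}


def classify_traces(tmp_dir):
    """Group the files directly under tmp_dir by trace type.

    Directories (such as ASLOGS or LOG_<EXPID>) and files with an
    unrecognised suffix are ignored.
    """
    groups = defaultdict(list)
    for path in sorted(Path(tmp_dir).iterdir()):
        if not path.is_file():
            continue
        for suffix, kind in TRACE_SUFFIXES.items():
            if path.name.endswith(suffix):
                groups[kind].append(path.name)
                break
    return dict(groups)
```

Calling classify_traces on the tmp directory of an experiment returns a dictionary that maps each category to a sorted list of file names.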

Data#

The Autosubmit experiment ID acts as a persistent identifier (PID), which can be used to link the data produced, the traces, and the configuration.

For example, it is possible to use the experiment ID in directories or as metadata to data written to remote file systems and databases. This way, one can verify if the experiment produced the expected data, or what experiment produced certain data.
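One way to apply this idea is sketched below. It is only an illustration under stated assumptions: the directory layout (one subdirectory per experiment ID), the ".meta.json" sidecar file, and the function names write_with_provenance and find_producer are all hypothetical and are not defined by Autosubmit.

```python
import json
from pathlib import Path


def write_with_provenance(base_dir, expid, filename, payload):
    """Write payload (bytes) to base_dir/<expid>/<filename>.

    A JSON sidecar file next to the data records the experiment ID,
    so the producer can be recovered even if the file is moved
    together with its sidecar.
    """
    out_dir = Path(base_dir) / expid
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / filename
    data_path.write_bytes(payload)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps({"expid": expid, "file": filename}))
    return data_path, sidecar


def find_producer(data_path):
    """Return the experiment ID recorded in the sidecar of data_path."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    return json.loads(sidecar.read_text())["expid"]
```

With this layout, listing base_dir/<expid>/ answers whether the experiment produced the expected data, and find_producer answers which experiment produced a given file.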

Users must decide on the policy to maintain experiments. Depending on the number of experiments (thousands, millions) and storage limitations (user quota) it may be necessary to remove experiments and any data in the experiment directory.

It is possible to archive Autosubmit experiments, or delete old experiments. Another possibility is to compress logs and traces generated by experiments, keeping the experiments in the Autosubmit experiments directory.
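As an illustration of the last option, the sketch below compresses the tmp directory of an experiment (which holds the logs and traces) into a gzip-compressed tar archive using the Python standard library. It is not an Autosubmit command. The function name compress_logs and the archive name are assumptions, and the original files are deliberately left in place so that removing them remains an explicit decision.

```python
import tarfile
from pathlib import Path


def compress_logs(exp_dir, archive_name="logs.tar.gz"):
    """Create <exp_dir>/<archive_name> with the contents of <exp_dir>/tmp.

    The original files are not removed.
    """
    exp_dir = Path(exp_dir)
    archive = exp_dir / archive_name
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps the paths inside the archive relative to the
        # experiment directory (tmp/ASLOGS/..., tmp/LOG_<EXPID>/...).
        tar.add(exp_dir / "tmp", arcname="tmp")
    return archive
```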

Version control#

If your Autosubmit project uses Git (i.e. you have PROJECT.PROJECT_TYPE=git), then Autosubmit will check out your project code and keep track of the Git information for traceability and provenance.

Your experiment configuration will contain the Git details like the URL (GIT.PROJECT_ORIGIN), the branch (GIT.PROJECT_BRANCH), and the commit used (GIT.PROJECT_COMMIT).

If you have access to the project destination folder (PROJECT.PROJECT_DESTINATION) in the environment where you are running Autosubmit, then you can also obtain the same information using the command-line utility git in that directory, or inspect the contents of the .git sub-directory.

Autosubmit inspects the Git repository of your project and extracts the current commit. That value is then written to the configuration key AUTOSUBMIT.WORKFLOW_COMMIT.

The table below contains the complete list of Git parameters and their description:

Table 2: Parameters Description

| Parameters | Description |
| --- | --- |
| GIT.PROJECT_ORIGIN | URL pointing to the project repository |
| GIT.PROJECT_BRANCH | Branch to check out once the repository is downloaded |
| GIT.PROJECT_COMMIT | Git commit used by the project |
| GIT.PROJECT_TAG | Tag of the project |
| GIT.PROJECT_SUBMODULES | Creates a folder within the main project that holds a submodule inside the main repository, keeping history and content separate and making only a specific commit available |
| GIT.PROJECT_SUBMODULES_DEPTH | Allows a shallow clone of the submodule with the specified depth. This produces smaller local clones, but limits commit reachability |
| GIT.FETCH_SINGLE_BRANCH | Limits the data of the submodule to a single branch |
| GIT.REMOTE_CLONE_ROOT | Clones a Git project on the main HPC platform |

Note

Depending on how you configure and run your experiment, the GIT.PROJECT_COMMIT may differ from what was actually used by Autosubmit (AUTOSUBMIT.WORKFLOW_COMMIT).

For example, you may set GIT.PROJECT_COMMIT to a commit and run autosubmit refresh <EXPID>. Later, you may add more commits to your local Git working copy, or check out another branch. In either case, subsequent autosubmit commands use the current state of your local project folder, not the configured commit, unless you run autosubmit refresh again.

For traceability and provenance, we recommend the use of AUTOSUBMIT.WORKFLOW_COMMIT.
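A check for this situation could look like the sketch below. It assumes that the merged experiment configuration has already been loaded into nested Python dictionaries (for example with a YAML parser), and that the dotted names map to nested keys, so that GIT.PROJECT_COMMIT is found at config["GIT"]["PROJECT_COMMIT"]. That layout and the function name commit_mismatch are assumptions made for illustration.

```python
def commit_mismatch(config):
    """Compare the configured commit with the commit actually used.

    Returns a (configured, actual) tuple when both values are present
    and differ, and None otherwise. The comparison is an exact string
    match, so an abbreviated hash in the configuration is reported as
    a mismatch.
    """
    configured = config.get("GIT", {}).get("PROJECT_COMMIT")
    actual = config.get("AUTOSUBMIT", {}).get("WORKFLOW_COMMIT")
    if configured and actual and configured != actual:
        return configured, actual
    return None
```

A non-None result indicates that the local project folder has moved away from the configured commit, in which case AUTOSUBMIT.WORKFLOW_COMMIT is the value to record.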

A practical example#

Given an experiment ID, such as a001, the experiment directory on a machine could be something similar to $HOME/a001/ (configurable). For brevity, the rest of this section will use relative directories like tmp/ instead of $HOME/a001/tmp/.

The YAML configuration files of the experiments are stored in the conf/ subdirectory and may import other YAML files from proj/git_project/ (where proj is a directory common to all Autosubmit experiments, but git_project is configurable).

The complete YAML configuration used by Autosubmit, after all files have been included by Autosubmit, is stored at conf/metadata/experiment_data.yml.

The autosubmit commands issued for the experiment a001 have access to this YAML configuration and are logged to files on the configured platforms (local or remote). Autosubmit later retrieves the log files automatically and saves them on the machine where the autosubmit command was issued. The command logs are stored in the directory tmp/ASLOGS.

Running autosubmit setstatus, for example, would produce files such as tmp/ASLOGS/20240319_141712_setstatus.log and tmp/ASLOGS/20240319_141712_setstatus_err.log. These two files contain the standard output and error output of the autosubmit setstatus command, issued on 2024-03-19 at 14:17:12 (computer time). The “.log” file contains the output produced by Autosubmit, whereas the “_err.log” file contains the errors, or is empty if no error occurred.

2024-03-19 14:17:17,772 Autosubmit is running with 4.1.0
2024-03-19 14:17:17,782 Preparing .lock file to avoid multiple instances with same expid.
2024-03-19 14:17:17,782 Exp ID: a001
2024-03-19 14:17:17,782 Save: False
2024-03-19 14:17:17,782 Final status: WAITING
2024-03-19 14:17:17,782 List of jobs to change: a001_20200101_fc0_285_SIM a001_20200101_fc0_284_SIM
2024-03-19 14:17:17,782 Chunks to change: None
2024-03-19 14:17:17,782 Status of jobs to change: None
2024-03-19 14:17:17,782 Sections to change: None
...
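The naming scheme of the command logs can be parsed programmatically, for example to find out when a command was issued. The sketch below assumes that all command log names follow the pattern seen in the two file names above (a YYYYMMDD_HHMMSS timestamp, the command name, and an optional _err marker). The function name parse_command_log is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from names such as 20240319_141712_setstatus.log and
# 20240319_141712_setstatus_err.log; other names are not matched.
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{8}_\d{6})_(?P<command>.+?)(?P<err>_err)?\.log$"
)


def parse_command_log(name):
    """Split a command log file name into timestamp, command and log kind.

    Returns None when the name does not follow the expected pattern.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M%S"),
        "command": match.group("command"),
        "is_error_log": match.group("err") is not None,
    }
```

Two names that yield the same timestamp and command, and differ only in is_error_log, belong to the same invocation.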

The workflow task logs are stored in the directory tmp/LOG_<EXPID>, which is tmp/LOG_a001/ in this example. The task logs are written on the remote platforms used in the experiment configuration (e.g. a cloud server, or HPC). Autosubmit automatically copies these files to the computer where the autosubmit command was issued.

These log files, like the autosubmit command logs described before, also come in pairs: “.out” and “.err”. However, in this case the “.err” file contains the source of the workflow task script generated by Autosubmit, with the expanded parameters (produced with the Bash Shell option -x). The file name also contains a timestamp from when the job was started.

[INFO] JOBID=6709774
job_name_ptrn='/scratch/a001/LOG_a001/a001_20200101_fc0_337_SIM'
+ job_name_ptrn=/scratch/a001/LOG_a001/a001_20200101_fc0_337_SIM
echo $(date +%s) > ${job_name_ptrn}_STAT
++ date +%s
+ echo 1711509353
...

The .err and .out files both contain the JOBID value, which for remote platforms like HPC batch systems (e.g. Slurm) is the Job ID, as well as any other output from the workflow task.
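The Job ID can be extracted from the text of a task log with a regular expression. The sketch below assumes that the ID appears as JOBID= followed by digits, as in the excerpt above. The function name extract_job_id is hypothetical.

```python
import re

# Matches "JOBID=" followed by digits. Optional asterisks before the
# number are tolerated in case the ID is wrapped in emphasis markers.
JOBID_LINE = re.compile(r"JOBID=\**(\d+)")


def extract_job_id(log_text):
    """Return the first Job ID found in log_text as an int, or None."""
    match = JOBID_LINE.search(log_text)
    return int(match.group(1)) if match else None
```

The returned value can then be passed to the batch system tools on the remote platform to look up the job.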

Users can also access the jobs data stored by Autosubmit in <AUTOSUBMIT>/metadata/data/job_data_a001.db, to query for information from previous jobs:

$ sqlite3 ~/job_data_a001.db "select job_id from job_data where job_name = 'a001_20200101_fc0_337_SIM';"
6709774
$ # Use sacct, scontrol, etc. in the remote platform to query the Job information
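The same query can be issued from Python with the standard sqlite3 module. This is a minimal sketch that relies only on the table name (job_data) and the two columns (job_id, job_name) used in the command above. The function name job_ids_for is hypothetical, and the rest of the database schema is not assumed.

```python
import sqlite3


def job_ids_for(db_path, job_name):
    """Return every job_id stored for job_name in the job_data table."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            # A bound parameter avoids quoting problems in job names.
            "SELECT job_id FROM job_data WHERE job_name = ?",
            (job_name,),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

A job that ran more than once may return several IDs, one per stored row.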