Welcome to autosubmit’s documentation!

Changelog

This page shows the main changes from AS3 to AS4.

Major changes:

  • The required Python version has changed from 2.7 to 3.7.3.
  • Configuration language has changed to YAML.
  • All parameters are now unified into a single dictionary.
  • All sections are now uppercase.
  • All parameters, except for job-related ones, now follow a hierarchy.
  • A special key, FOR, has been added. It allows creating multiple jobs that share almost the same configuration.
  • The configuration of Autosubmit is now more flexible.
  • A new command, updateproj, has been added. It updates the scripts and the Autosubmit configuration.
  • The wrapper definition has changed.
  • The task dependencies system has changed.

Warning

updateproj may not translate all the scripts; we recommend reviewing your scripts before running Autosubmit.

Configuration changes

Autosubmit is now composed of two kinds of YAML configuration files: the default ones, which are the same as before, and the custom ones.

The custom files define configurations that override the default ones; to override a value, you only have to put the key in a custom configuration file. Custom files can be anywhere and have any name. By default they are located inside <expid>/conf, but you can change this path with the DEFAULT.CUSTOM_CONFIG_DIR key in the expdef.yml file.
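For instance, a minimal sketch of a custom file that overrides a single key (the file name is illustrative; only the keys you want to change need to be present):

# <expid>/conf/custom_conf/overrides.yml
EXPERIMENT:
  NUMCHUNKS: '4'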

Additionally, you must be aware of the following changes:

  • All section keys are normalized to UPPERCASE, while values remain as the user wrote them. Beware of scripts that rely on %CURRENT_HPCARCH% or other variables that refer to a platform, because those values will always be in UPPERCASE. Normalize your scripts accordingly.
  • To define a job, you must put it under the jobs key in any custom configuration file.
  • To define a platform, you must put it under the platforms key in any custom configuration file.
  • To define a loop, you must put the “FOR” key as the first key of the section.
  • You can put any %placeholder% in the proj.conf and custom files, and you can also put %ROOTDIR% in the expdef.yml.
  • All configuration is now based on a hierarchical structure, so to export a variable you must use the syntax %KEY.SUBKEY.SUBSUBKEY%. The same goes for overriding it (see the sketch after this list).
  • YAML takes data types into account.
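For example, a sketch of how a script template could reference hierarchical keys (the key names are taken from the configuration examples below; the availability of any specific key as a placeholder is an assumption):

# Inside a user script template
echo "Main platform: %DEFAULT.HPCARCH%"
echo "Chunks per member: %EXPERIMENT.NUMCHUNKS%"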

Examples

Example files using the new configuration, with the directory structure as follows:

$/autosubmit/a00q/conf$ ls
autosubmit_a00q.yml  custom_conf  expdef_a00q.yml  jobs_a00q.yml  platforms_a00q.yml
$/autosubmit/a00q/conf/custom_conf$ ls
more_jobs.yml

Configuration

autosubmit_expid.yml

config:
  AUTOSUBMIT_VERSION: 4.0.0b
  MAXWAITINGJOBS: '3000'
  TOTALJOBS: '3000'
  SAFETYSLEEPTIME: 0
  RETRIALS: '10'
mail:
  NOTIFICATIONS: 'False'
  TO: daniel.beltran@bsc.es

expdef_expid.yml

DEFAULT:
  EXPID: a02u
  HPCARCH: local
  CUSTOM_CONFIG_DIR: "%ROOTDIR%/conf/custom_conf"
experiment:
  DATELIST: '20210811'
  MEMBERS: CompilationEfficiency HardwareBenchmarks WeakScaling StrongScaling
  CHUNKSIZEUNIT: hour
  CHUNKSIZE: '6'
  NUMCHUNKS: '2'
  CALENDAR: standard
rerun:
  RERUN: 'FALSE'
  CHUNKLIST: ''
project:
  PROJECT_TYPE: local
  PROJECT_DESTINATION: r_test
git:
  PROJECT_ORIGIN: https://earth.bsc.es/gitlab/ces/automatic_performance_profiling.git
  PROJECT_BRANCH: autosubmit-makefile1
  PROJECT_COMMIT: ''
svn:
  PROJECT_URL: ''
  PROJECT_REVISION: ''
local:
  PROJECT_PATH: /home/dbeltran/r_test
project_files:
  FILE_PROJECT_CONF: ''
  FILE_JOBS_CONF: ''

jobs_expid.yml

JOBS:
  LOCAL_SETUP:
    FILE: LOCAL_SETUP.sh
    PLATFORM: LOCAL
    RUNNING: "once"
  REMOTE_SETUP:
    FILE: REMOTE_SETUP.sh
    DEPENDENCIES: LOCAL_SETUP
    WALLCLOCK: '00:05'
    RUNNING: once
    NOTIFY_ON: READY SUBMITTED QUEUING COMPLETED
  INI:
    FILE: INI.sh
    DEPENDENCIES: REMOTE_SETUP
    RUNNING: member
    WALLCLOCK: '00:05'
    NOTIFY_ON: READY SUBMITTED QUEUING COMPLETED

  SIM:
    FOR:
      NAME: [20,40,80]
      PROCESSORS: [2,4,8]
      THREADS: [1,1,1]
      DEPENDENCIES: [INI SIM_20-1 CLEAN-2, INI SIM_40-1 CLEAN-2, INI SIM_80-1 CLEAN-2]
      NOTIFY_ON: READY SUBMITTED QUEUING COMPLETED

    FILE: SIM.sh
    DEPENDENCIES: INI SIM_20-1 CLEAN-2
    RUNNING: chunk
    WALLCLOCK: '00:05'
    TASKS: '1'
    NOTIFY_ON: READY SUBMITTED QUEUING COMPLETED

  POST:
    FOR:
      NAME: [ 20,40,80 ]
      PROCESSORS: [ 20,40,80 ]
      THREADS: [ 1,1,1 ]
      DEPENDENCIES: [ SIM_20 POST_20-1,SIM_40 POST_40-1,SIM_80 POST_80-1 ]
    FILE: POST.sh
    RUNNING: chunk
    WALLCLOCK: '00:05'
  CLEAN:
    FILE: CLEAN.sh
    DEPENDENCIES: POST_20 POST_40 POST_80
    RUNNING: chunk
    WALLCLOCK: '00:05'
  TRANSFER:
    FILE: TRANSFER.sh
    PLATFORM: LOCAL
    DEPENDENCIES: CLEAN
    RUNNING: member

platforms_expid.yml

Platforms:
  MaReNoStRuM4:
    TYPE: slurm
    HOST: bsc
    PROJECT: bsc32
    USER: bsc32070
    QUEUE: debug
    SCRATCH_DIR: /gpfs/scratch
    ADD_PROJECT_TO_HOST: False
    MAX_WALLCLOCK: '48:00'
    USER_TO: pr1enx13
    TEMP_DIR: ''
    SAME_USER: False
    PROJECT_TO: pr1enx00
    HOST_TO: bscprace
  marenostrum_archive:
    TYPE: ps
    HOST: dt02.bsc.es
    PROJECT: bsc32
    USER: bsc32070
    SCRATCH_DIR: /gpfs/scratch
    ADD_PROJECT_TO_HOST: 'False'
    TEST_SUITE: 'False'
    USER_TO: pr1enx13
    TEMP_DIR: /gpfs/scratch/bsc32/bsc32070/test_migrate
    SAME_USER: false
    PROJECT_TO: pr1enx00
    HOST_TO: transferprace
  transfer_node:
    TYPE: ps
    HOST: dt01.bsc.es
    PROJECT: bsc32
    USER: bsc32070
    ADD_PROJECT_TO_HOST: false
    SCRATCH_DIR: /gpfs/scratch
    USER_TO: pr1enx13
    TEMP_DIR: /gpfs/scratch/bsc32/bsc32070/test_migrate
    SAME_USER: false
    PROJECT_TO: pr1enx00
    HOST_TO: transferprace
  transfer_node_bscearth000:
    TYPE: ps
    HOST: bscearth000
    USER: dbeltran
    PROJECT: Earth
    ADD_PROJECT_TO_HOST: false
    QUEUE: serial
    SCRATCH_DIR: /esarchive/scratch
    USER_TO: dbeltran
    TEMP_DIR: ''
    SAME_USER: true
    PROJECT_TO: Earth
    HOST_TO: bscpraceearth000
  bscearth000:
    TYPE: ps
    HOST: bscearth000
    USER: dbeltran
    PROJECT: Earth
    ADD_PROJECT_TO_HOST: false
    QUEUE: serial
    SCRATCH_DIR: /esarchive/scratch
  nord3:
    TYPE: SLURM
    HOST: nord1.bsc.es
    PROJECT: bsc32
    USER: bsc32070
    QUEUE: debug
    SCRATCH_DIR: /gpfs/scratch
    MAX_WALLCLOCK: '48:00'
    USER_TO: pr1enx13
    TEMP_DIR: ''
    SAME_USER: true
    PROJECT_TO: pr1enx00
  ecmwf-xc40:
    TYPE: ecaccess
    VERSION: pbs
    HOST: cca
    USER: c3d
    PROJECT: spesiccf
    ADD_PROJECT_TO_HOST: false
    SCRATCH_DIR: /scratch/ms
    QUEUE: np
    SERIAL_QUEUE: ns
    MAX_WALLCLOCK: '48:00'

custom_conf/more_jobs.yml

jobs:
  Additional_job_1:
    FILE: extrajob.sh
    DEPENDENCIES: POST_20
    RUNNING: once
  additional_job_2:
    FILE: extrajob.sh
    RUNNING : once

Wrappers definition

To define the wrappers:

wrappers:
  wrapper_sim20:
    TYPE: "vertical"
    JOBS_IN_WRAPPER: "SIM_20"
  wrapper_sim40:
    TYPE: "vertical"
    JOBS_IN_WRAPPER: "SIM_40"

Loops definition

To define a loop, you need to use the FOR key and also the NAME key.

In order to generate the following jobs:

POST_20:
      FILE: POST.sh
      RUNNING: chunk
      WALLCLOCK: '00:05'
      PROCESSORS: 20
      THREADS: 1
      DEPENDENCIES: SIM_20 POST_20-1
POST_40:
      FILE: POST.sh
      RUNNING: chunk
      WALLCLOCK: '00:05'
      PROCESSORS: 40
      THREADS: 1
      DEPENDENCIES: SIM_40 POST_40-1
POST_80:
      FILE: POST.sh
      RUNNING: chunk
      WALLCLOCK: '00:05'
      PROCESSORS: 80
      THREADS: 1
      DEPENDENCIES: SIM_80 POST_80-1

One can now use the following configuration:

POST:
    FOR:
      NAME: [ 20,40,80 ]
      PROCESSORS: [ 20,40,80 ]
      THREADS: [ 1,1,1 ]
      DEPENDENCIES: [ SIM_20 POST_20-1,SIM_40 POST_40-1,SIM_80 POST_80-1 ]
    FILE: POST.sh
    RUNNING: chunk
    WALLCLOCK: '00:05'

Warning

Only the parameters that change must be included inside the FOR key.

DEPENDENCIES

The DEPENDENCIES key is used to define the dependencies of a job. It can be used in the following ways:

  • Basic: The dependencies are a list of jobs, separated by spaces, that must run before the current task is submitted.
  • New: The dependencies are a list of YAML sections, one per line, that must run before the current job is submitted.
    • For each dependency section, you can designate the following keywords to control the current job-affected tasks:
      • DATES_FROM: Selects the job dates that you want to alter.
      • MEMBERS_FROM: Selects the job members that you want to alter.
      • CHUNKS_FROM: Selects the job chunks that you want to alter.
    • For each dependency section and *_FROM keyword, you can designate the following keywords to control the destination of the dependency:
      • DATES_TO: Links current selected tasks to the dependency tasks of the dates specified.
      • MEMBERS_TO: Links current selected tasks to the dependency tasks of the members specified.
      • CHUNKS_TO: Links current selected tasks to the dependency tasks of the chunks specified.
    • Important keywords for [DATES|MEMBERS|CHUNKS]_TO:
      • “natural”: Keeps the default linkage.
      • “all”: Links all selected tasks of the dependency with the currently selected tasks.
      • “none”: Unlinks the selected tasks of the dependency from the currently selected tasks.

For the new format, note that the priority is hierarchical and goes like this: DATES_FROM -(includes)-> MEMBERS_FROM -(includes)-> CHUNKS_FROM.

  • You can define a DATES_FROM inside the DEPENDENCY.
  • You can define a MEMBERS_FROM inside the DEPENDENCY and DEPENDENCY.DATES_FROM.
  • You can define a CHUNKS_FROM inside the DEPENDENCY, DEPENDENCY.DATES_FROM, DEPENDENCY.MEMBERS_FROM, and DEPENDENCY.DATES_FROM.MEMBERS_FROM.

For the examples, we will consider that our experiment has the following configuration:

EXPERIMENT:
    DATELIST: 202201[01-02]
    MEMBERS: FC1 FC2
    NUMCHUNKS: 4

Basic

JOBS:
  JOB_1:
      FILE: job1.sh
      RUNNING: chunk
  JOB_2:
      FILE: job2.sh
      DEPENDENCIES: JOB_1
      RUNNING: chunk
  JOB_3:
      FILE: job3.sh
      DEPENDENCIES: JOB_2
      RUNNING: chunk
  SIM:
      FILE: sim.sh
      DEPENDENCIES: JOB_3 SIM-1
      RUNNING: chunk
  POST:
      FILE: post.sh
      DEPENDENCIES: SIM
      RUNNING: chunk
  TEST:
      FILE: test.sh
      DEPENDENCIES: POST
      RUNNING: chunk

New format

JOBS:
  JOB_1:
      FILE: job1.sh
      RUNNING: chunk
  JOB_2:
      FILE: job2.sh
      DEPENDENCIES:
          JOB_1:
              dates_to: "natural"
              members_to: "natural"
              chunks_to: "natural"
      RUNNING: chunk
  JOB_3:
      FILE: job3.sh
      DEPENDENCIES:
          JOB_2:
              dates_to: "natural"
              members_to: "natural"
              chunks_to: "natural"
      RUNNING: chunk
  SIM:
      FILE: sim.sh
      DEPENDENCIES:
          JOB_3:
              dates_to: "natural"
              members_to: "natural"
              chunks_to: "natural"
          SIM-1:
              dates_to: "natural"
              members_to: "natural"
              chunks_to: "natural"
      RUNNING: chunk
  POST:
      FILE: post.sh
      DEPENDENCIES:
          SIM:
              dates_to: "natural"
              members_to: "natural"
              chunks_to: "natural"
      RUNNING: chunk
  TEST:
      FILE: test.sh
      DEPENDENCIES:
          POST:
              dates_to: "natural"
              members_to: "natural"
              chunks_to: "natural"
      RUNNING: chunk

Example 1: New format with specific dependencies

JOBS:
    JOB_1:
        FILE: job1.sh
        RUNNING: chunk
    JOB_2:
        FILE: job2.sh
        DEPENDENCIES:
            JOB_1:
                dates_to: "natural"
                members_to: "natural"
                chunks_to: "natural"
        RUNNING: chunk
    JOB_3:
        FILE: job3.sh
        DEPENDENCIES:
            JOB_2:
                dates_to: "natural"
                members_to: "natural"
                chunks_to: "natural"
        RUNNING: chunk
    SIM:
        FILE: sim.sh
        DEPENDENCIES:
            JOB_3:
            SIM-1:
            SIM:
                MEMBERS_FROM:
                  FC2:
                    CHUNKS_FROM:
                     1:
                      dates_to: "all"
                      members_to: "FC1"
                      chunks_to: "4"
        RUNNING: chunk
    POST:
        FILE: post.sh
        DEPENDENCIES:
            SIM:
        RUNNING: chunk
    TEST:
        FILE: test.sh
        DEPENDENCIES:
            POST:
              members_to: "FC2"
              chunks_to: 4
        RUNNING: once


Introduction

What is Autosubmit?

Autosubmit is a Python-based workflow manager used to create, manage and monitor experiments on computing clusters, HPC machines and supercomputers, accessed remotely via SSH. It supports experiments running on more than one HPC and different workflow configurations.
Autosubmit is currently used at the Barcelona Supercomputing Centre (BSC) to run models (EC-Earth, MONARCH, NEMO, CALIOPE, HERMES...), operational toolchains (S2S4E), data-download workflows (ECMWF MARS), and many others.
Autosubmit has been used to manage models running at supercomputers at BSC, ECMWF, IC3, CESGA, EPCC, PDC and OLCF.
Autosubmit is available as a PyPI package under the terms of the GNU General Public License.

Get involved or contact us:

GitLab: https://earth.bsc.es/gitlab/es/autosubmit
Mail: support-autosubmit@bsc.es

Why is Autosubmit needed?

Autosubmit is the only existing tool that satisfies the following requirements from the weather and climate community:

  • Automation: Job submission to machines and dependencies between jobs are managed by Autosubmit. No user intervention is needed.
  • Data provenance: Assigns unique identifiers for each experiment and stores information about model version, experiment configuration and computing facilities used in the whole process.
  • Failure tolerance: Automatic retrials and ability to rerun chunks in case of corrupted or missing data.
  • Resource management: Autosubmit manages supercomputer particularities, allowing users to run their experiments on the available machine without having to adapt the code. Autosubmit also allows submitting tasks from the same experiment to different platforms.

How does Autosubmit work?

You can find help on how to use Autosubmit and a list of available commands just by executing:

autosubmit -h

Execute autosubmit <command> -h for detailed help for each command:

autosubmit expid -h

Experiment creation

To create a new experiment, run the command:

autosubmit expid -H HPCname -d Description

HPCname is the name of the main HPC platform for the experiment: it will be the default platform for the tasks. Description is a brief experiment description.

This command assigns a unique four-character identifier to the experiment (xxxx: a letter followed by three alphanumeric characters) and creates a new folder in the experiments repository with the structure shown in Figure 1.

Figure 1: Example of an experiment directory tree.

Experiment configuration

To configure the experiment, edit expdef_xxxx.conf, jobs_xxxx.conf and platforms_xxxx.conf in the conf folder of the experiment (see contents in Figure 2).

Figure 2: Configuration files content.

After that, you are expected to run the command:

autosubmit create xxxx

This command creates the experiment project in the proj folder. The experiment project contains the scripts specified in jobs_xxxx.conf and a copy of model source code and data specified in expdef_xxxx.conf.

Experiment run

To run the experiment, just execute the command:

autosubmit run xxxx

Autosubmit will start submitting jobs to the relevant platforms (both HPC and supporting computers) using the scripts specified in jobs_xxxx.conf. Autosubmit will substitute variables present in the scripts wherever placeholders appear in the %variable_name% format. Autosubmit provides variables for the current chunk, start date, member, computer configuration and more, and will also replace variables from proj_xxxx.conf.
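For example, a job template could use placeholders like the following (a sketch; %SDATE%, %MEMBER% and %CHUNK% are assumed to be among the variables Autosubmit provides, and %MODEL% stands for a variable coming from proj_xxxx.conf):

#!/bin/bash
# Rendered by Autosubmit before submission
echo "Processing start date %SDATE%, member %MEMBER%, chunk %CHUNK%"
echo "Model configured in the proj file: %MODEL%"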

To monitor the status of the experiment, the command:

autosubmit monitor xxxx

is available. This will plot the workflow of the experiment and the current status.

experiment plot

Example of monitoring plot for EC-Earth run with Autosubmit for 1 start date, 1 member and 3 chunks.

Tutorial start guide

This tutorial is a starter’s guide to running a dummy experiment with Autosubmit.

Dummy experiments run workflows with inexpensive empty tasks and are therefore ideal for teaching and testing purposes.

Real experiments instead run workflows with complex tasks. To read information about how to develop parameterizable tasks for Autosubmit workflows, refer to Developing a project.

Pre-requisites

Autosubmit needs to establish password-less SSH connections in order to run and monitor workflows on remote platforms.

Ensure that you have a password-less connection to all platforms you want to use in your experiment. If you are unsure how to do this, please follow these instructions:

  • Open a terminal and run `ssh-keygen -t rsa -b 4096 -C "email@email.com" -m PEM`
  • Copy the resulting public key to your platform of choice, via scp or ssh-copy-id (see the sketch below).
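A minimal sketch of the whole sequence (the user name and host are placeholders):

# Generate a PEM-format key pair
ssh-keygen -t rsa -b 4096 -C "email@email.com" -m PEM
# Install the public key on the remote platform
ssh-copy-id -i ~/.ssh/id_rsa.pub myuser@myplatform.example
# Verify that no password is requested
ssh myuser@myplatform.example hostname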

Description of most used commands

Command Short description
expid Creates a new experiment and generates a new entry in the database, giving it a serial id composed of 4 characters. In addition, it also creates the experiment folder and the basic folder structure.
create <expid> Generates the experiment workflow.
run <expid> Runs the experiment workflow.
monitor <expid> Shows the experiment workflow structure and status.
inspect <expid> Generates Autosubmit scripts and batch scripts for inspection, by processing the tasks’ templates with the experiment parameters.
refresh <expid> Updates the project directory.
recovery <expid> Recovers the experiment workflow by retrieving the completed jobs of the last run.
setstatus <expid> Sets the status of one or multiple jobs to a given value.

Create a new experiment

autosubmit expid -dm -H "local" -d "Tutorial"
  • -dm: Generates a dummy experiment.
  • -H: Sets the principal experiment platform.
  • -d: Sets a short description for the experiment.

The output of the command will show the expid of the experiment and generate the following directory structure:

Experiment folder Contains
conf Experiment configuration files.
pkl Workflow pkl files.
plot Visualization output files
tmp Logs, templates and misc files.
proj User scripts and project code. (empty)

Then, run autosubmit create <expid> -np and Autosubmit will generate the workflow graph.

Run and monitoring:

To run an experiment, use `autosubmit run <expid>`. Autosubmit runs experiments by performing the following operations:

  • First, it checks the experiment configuration. If it is wrong, it won’t proceed further.
  • Second, it runs the experiment while retrieving all logs from completed or failed tasks as they run.
  • Third, it manages all the workflow steps by following the dependencies defined by the user until all jobs are in COMPLETED or FAILED status. There can be jobs left in WAITING status if their dependencies are in FAILED status.

While the experiment is running, it can be visualized via autosubmit monitor <expid>.

The monitoring plot illustrates the output of autosubmit monitor. It describes all the possible statuses of the workflow jobs as well as their actual status.

At the same time, the <expid>/tmp gets filled with the cmd scripts generated by Autosubmit to run the local and remote tasks (in this case, they are sent and submitted to the remote platform(s)).

Meanwhile, the ASLOGS and LOG_a000 folders fill up with Autosubmit command logs and job logs.

Configuration summary:

In the folder <expid>/conf there are different files that define the actual experiment configuration.
File Content
expdef.conf
  • It contains the default platform, the one set with -H.
  • Allows changing the start dates, members and chunks.
  • Allows changing the experiment project source ( git, local, svn or dummy)
platforms.conf
  • It contains the list of platforms to use in the experiment.
  • This file contains the definitions for managing clusters, fat-nodes and support computers.
  • This file must be filled up with the platform(s) configuration(s).
  • Several platforms can be defined and used in the same experiment.
jobs.conf
  • It contains the tasks’ definitions in sections. Depending on the parameters, one section can generate multiple similar tasks.
  • This file must be filled up with the tasks’ definitions.
  • Several sections can be defined and used in the same experiment.
autosubmit.conf
  • This file contains the definitions that impact the workflow behavior.
  • It changes workflow behavior with parameters such as job limitations, remote_dependencies and retrials.
  • It extends autosubmit functionalities with parameters such as wrappers and mail notification.
proj.conf
  • This file contains the configuration used by the user scripts.
  • This file is fully customizable for the current experiment. It allows setting user parameters that will be readable by the Autosubmit jobs.

Final step: Modify and run

It is time to look into the configuration files of the dummy experiment and modify them with a remote platform to run a workflow with a few more chunks.

Open expdef.conf

[DEFAULT]
EXPID = a000 #<- don't change
HPCARCH = local # Change to your new main platform name, e.g. marenostrum4

# Locate and change these parameters, e.g. NUMCHUNKS = 3
[experiment]
DATELIST = 20000101
MEMBERS = fc0
NUMCHUNKS = 1
(...)

Now open platforms.conf. Note: This will be an example for marenostrum4

[marenostrum4]
# Queue type. Options: ps, SGE, LSF, SLURM, PBS, ecaccess
TYPE = slurm # scheduler type
HOST = mn1.bsc.es,mn2.bsc.es,mn3.bsc.es
PROJECT = bsc32 # <- your project
USER = bsc32070 # <- your user
SCRATCH_DIR = /gpfs/scratch
ADD_PROJECT_TO_HOST = False
# use 72:00 if you are using a PRACE account, 48:00 for the bsc account
MAX_WALLCLOCK = 02:00
# use 19200 if you are using a PRACE account, 2400 for the bsc account
MAX_PROCESSORS = 2400
PROCESSORS_PER_NODE = 48
SERIAL_QUEUE = debug
QUEUE = debug

autosubmit create <expid> (without -np) will generate the new workflow, and autosubmit run <expid> will run the experiment with the latest changes.
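For instance, assuming the dummy experiment was assigned the id a000:

# Regenerate the workflow with the new configuration
autosubmit create a000
# Run the experiment with the latest changes
autosubmit run a000
# In another terminal, monitor the progress
autosubmit monitor a000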

Developing a project

This section contains some examples on how to develop a new project.

All files, with the exception of user-defined scripts, are located in the <expid>/conf directory.

Configuration files are written in INI format. On the other hand, the user-defined scripts are written in bash, Python or R.

To configure the experiment, edit autosubmit_cxxx.conf, expdef_cxxx.conf, jobs_cxxx.conf, platforms_cxxx.conf and proj_cxxx.conf in the conf folder of the experiment.

Expdef configuration

vi <experiments_directory>/cxxx/conf/expdef_cxxx.conf
[DEFAULT]
# Experiment identifier
# No need to change
EXPID = cxxx
# HPC name.
# No need to change
HPCARCH = ithaca

[experiment]
# Supply the list of start dates. Available formats: YYYYMMDD YYYYMMDDhh YYYYMMDDhhmm
# Also you can use an abbreviated syntax for multiple dates with common parts:
# 200001[01 15] <=> 20000101 20000115
# DATELIST = 19600101 19650101 19700101
# DATELIST = 1960[0101 0201 0301]
DATELIST = 19900101
# Supply the list of members. LIST = fc0 fc1 fc2 fc3 fc4
MEMBERS = fc0
# Chunk size unit. STRING = hour, day, month, year
CHUNKSIZEUNIT = month
# Chunk size. NUMERIC = 4, 6, 12
CHUNKSIZE = 1
# Total number of chunks in experiment. NUMERIC = 30, 15, 10
NUMCHUNKS = 2
# Calendar used. LIST: standard, noleap
CALENDAR = standard
# List of members that can be included in this run. Optional.
# RUN_ONLY_MEMBERS = fc0 fc1 fc2 fc3 fc4
# RUN_ONLY_MEMBERS = fc[0-4]
RUN_ONLY_MEMBERS =

[rerun]
# Is a rerun or not? [Default: Do set FALSE]. BOOLEAN = TRUE, FALSE
RERUN = FALSE
# If RERUN = TRUE then supply the list of jobs to rerun
RERUN_JOBLIST =

[project]
# Select project type. STRING = git, svn, local, none
# If PROJECT_TYPE is set to none, Autosubmit self-contained dummy templates will be used
PROJECT_TYPE = git
# Destination folder name for project. type = STRING, default = leave empty,
PROJECT_DESTINATION = model

# If PROJECT_TYPE is not git, no need to change
[git]
# Repository URL  STRING = 'https://github.com/torvalds/linux.git'
PROJECT_ORIGIN = https://gitlab.cfu.local/cfu/auto-ecearth3.git
# Select branch or tag, STRING, default = 'master',
# help = {'master' (default), 'develop', 'v3.1b', ...}
PROJECT_BRANCH = develop
# type = STRING, default = leave empty, help = if model branch is a TAG leave empty
PROJECT_COMMIT =

# If PROJECT_TYPE is not svn, no need to change
[svn]
# type = STRING, help = 'https://svn.ec-earth.org/ecearth3'
PROJECT_URL =
# Select revision number. NUMERIC = 1778
PROJECT_REVISION =

# If PROJECT_TYPE is not local, no need to change
[local]
# type = STRING, help = /foo/bar/ecearth
PROJECT_PATH =

# If PROJECT_TYPE is none, no need to change
[project_files]
# Where is PROJECT CONFIGURATION file location relative to project root path
FILE_PROJECT_CONF = templates/ecearth3/ecearth3.conf
# Where is JOBS CONFIGURATION file location relative to project root path
FILE_JOBS_CONF = templates/common/jobs.conf

Autosubmit configuration

vi <experiments_directory>/cxxx/conf/autosubmit_cxxx.conf
[config]
# Experiment identifier
# No need to change
EXPID =
# No need to change.
# Autosubmit version identifier
AUTOSUBMIT_VERSION =
# Default maximum number of jobs to be waiting in any platform
# Default = 3
MAXWAITINGJOBS = 3
# Default maximum number of jobs to be running at the same time at any platform
# Can be set at platform level on the platform_cxxx.conf file
# Default = 6
TOTALJOBS = 6
# Time (seconds) between connections to the HPC queue scheduler to poll already submitted jobs status
# Default = 10
SAFETYSLEEPTIME = 10
# Number of retrials if a job fails. Can be overridden at job level
# Default = 0
RETRIALS = 0
## Allows adding a delay between retries if a job fails. If not specified, the delay will be static
# DELAY_RETRY_TIME = 11
# DELAY_RETRY_TIME = +11 # will wait 11,22,33,44...
# DELAY_RETRY_TIME = *11 # will wait 11,110,1110,11110...
# Default output type for CREATE, MONITOR, SET STATUS, RECOVERY. Available options: pdf, svg, png, ps, txt
# Default = pdf
OUTPUT = pdf
# [wrappers]

Jobs configuration

vi <experiments_directory>/cxxx/conf/jobs_cxxx.conf
# Example job with all options specified

## Job name
# [JOBNAME]
## Script to execute. If not specified, job will be omitted from workflow.
## Path relative to the project directory
# FILE =
## Platform to execute the job. If not specified, defaults to HPCARCH in expedf file.
## LOCAL is always defined and refers to current machine
# PLATFORM =
## Queue to add the job to. If not specified, uses PLATFORM default.
# QUEUE =
## Defines dependencies from job as a list of parents jobs separated by spaces.
## Dependencies to jobs in previous chunk, member or startdate, use -(DISTANCE)
# DEPENDENCIES = INI SIM-1 CLEAN-2
## Defines if the job runs once, once per startdate, once per member or once per chunk. Options: once, date, member, chunk.
## If not specified, defaults to once
# RUNNING = once
## Specifies that the job only has to be run after X dates, members or chunks. A job will always be created for the last one
## If not specified, defaults to 1
# FREQUENCY = 3
## On a job with FREQUENCY > 1, if True, the dependencies are evaluated against all
## jobs in the frequency interval, otherwise only evaluate dependencies against current
## iteration.
## If not specified, defaults to True
# WAIT = False
## Defines if job is only to be executed in reruns. If not specified, defaults to false.
# RERUN_ONLY = False
## Wallclock to be submitted to the HPC queue in format HH:MM
# WALLCLOCK = 00:05

## Processors number to be submitted to the HPC. If not specified, defaults to 1.
## Wallclock chunk increase (WALLCLOCK will be increased according to the formula WALLCLOCK + WCHUNKINC * (chunk - 1)).
## Ideal for sequences of jobs that change their expected running time according to the current chunk.
# WCHUNKINC = 00:01
# PROCESSORS = 1
## Threads number to be submitted to the HPC. If not specified, defaults to 1.
# THREADS = 1
## Enables hyper-threading. If not specified, defaults to false.
# HYPERTHREADING = false
## Tasks number to be submitted to the HPC. If not specified, defaults to 1.
# Tasks = 1
## Memory requirements for the job in MB
# MEMORY = 4096
##  Number of retrials if a job fails. If not specified, defaults to the value given on experiment's autosubmit.conf
# RETRIALS = 4
## Allows adding a delay between retries if a job fails. If not specified, the delay will be static
# DELAY_RETRY_TIME = 11
# DELAY_RETRY_TIME = +11 # will wait 11,22,33,44...
# DELAY_RETRY_TIME = *11 # will wait 11,110,1110,11110...
## Some jobs can not be checked before running previous jobs. Set this option to false if that is the case
# CHECK = False
## Select the interpreter that will run the job. Options: bash, python, r. Default: bash
# TYPE = bash
## Specify the path to the interpreter. If empty, use system default based on job type  . Default: empty
# EXECUTABLE = /my_python_env/python3


[LOCAL_SETUP]
FILE = LOCAL_SETUP.sh
PLATFORM = LOCAL

[REMOTE_SETUP]
FILE = REMOTE_SETUP.sh
DEPENDENCIES = LOCAL_SETUP
WALLCLOCK = 00:05

[INI]
FILE = INI.sh
DEPENDENCIES = REMOTE_SETUP
RUNNING = member
WALLCLOCK = 00:05

[SIM]
FILE = SIM.sh
DEPENDENCIES = INI SIM-1 CLEAN-2
RUNNING = chunk
WALLCLOCK = 00:05
PROCESSORS = 2
THREADS = 1

[POST]
FILE = POST.sh
DEPENDENCIES = SIM
RUNNING = chunk
WALLCLOCK = 00:05

[CLEAN]
FILE = CLEAN.sh
DEPENDENCIES = POST
RUNNING = chunk
WALLCLOCK = 00:05

[TRANSFER]
FILE = TRANSFER.sh
PLATFORM = LOCAL
DEPENDENCIES = CLEAN
RUNNING = member

Platform configuration

vi <experiments_directory>/cxxx/conf/platforms_cxxx.conf
# Example platform with all options specified

## Platform name
# [PLATFORM]
## Queue type. Options: PBS, SGE, PS, LSF, ecaccess, SLURM
# TYPE =
## Version of queue manager to use. Needed only in PBS (options: 10, 11, 12) and ecaccess (options: pbs, loadleveler)
# VERSION =
## Hostname of the HPC
# HOST =
## Project for the machine scheduler
# PROJECT =
## Budget account for the machine scheduler. If omitted, takes the value defined in PROJECT
# BUDGET =
## Option to add project name to host. This is required for some HPCs
# ADD_PROJECT_TO_HOST = False
## User for the machine scheduler
# USER =
## Path to the scratch directory for the machine
# SCRATCH_DIR = /scratch
## If true, autosubmit test command can use this queue as a main queue. Defaults to false
# TEST_SUITE = False
## If given, autosubmit will add jobs to the given queue
# QUEUE =
## If specified, autosubmit will run jobs with only one processor in the specified platform.
# SERIAL_PLATFORM = SERIAL_PLATFORM_NAME
## If specified, autosubmit will run jobs with only one processor in the specified queue.
## Autosubmit will ignore this configuration if SERIAL_PLATFORM is provided
# SERIAL_QUEUE = SERIAL_QUEUE_NAME
## Default number of processors per node to be used in jobs
# PROCESSORS_PER_NODE =
## Default Maximum number of jobs to be waiting in any platform queue
## Default = 3
# MAX_WAITING_JOBS = 3
## Default maximum number of jobs to be running at the same time at the platform.
## Applies at platform level. Considers QUEUEING + RUNNING jobs.
## Ideal for configurations where some remote platform has a low upper limit of allowed jobs per user at the same time.
## Default = 6
# TOTAL_JOBS = 6

[ithaca]
# Queue type. Options: ps, SGE, LSF, SLURM, PBS, ecaccess
TYPE = SGE
HOST = ithaca
PROJECT = cfu
ADD_PROJECT_TO_HOST = true
USER = dbeltran
SCRATCH_DIR = /scratch/cfu
TEST_SUITE = True

Proj configuration

After filling in the experiment configuration and running autosubmit create cxxx -np, the user can go into proj, which holds a copy of the model.

The experiment project contains the scripts specified in jobs_cxxx.conf and a copy of model source code and data specified in expdef_xxxx.conf.

To configure experiment project parameters for the experiment, edit proj_cxxx.conf.

proj_cxxx.conf contains:
  • The project-dependent experiment variables that Autosubmit will substitute in the scripts to be run.

Warning

The proj_cxxx.conf file has to be defined in INI style, so it must have at least one section header.

Example:

vi <experiments_directory>/cxxx/conf/proj_cxxx.conf
[common]
# No need to change.
MODEL = ecearth
# No need to change.
VERSION = v3.1
# No need to change.
TEMPLATE_NAME = ecearth3
# Select the model output control class. STRING = Option
# listed under the section : https://earth.bsc.es/wiki/doku.php?id=overview_outclasses
OUTCLASS = specs
# After transferring output at /cfunas/exp remove a copy available at permanent storage of HPC
# [Default: Do set "TRUE"]. BOOLEAN = TRUE, FALSE
MODEL_output_remove = TRUE
# Activate cmorization [Default: leave empty]. BOOLEAN = TRUE, FALSE
CMORIZATION = TRUE
# Essential if cmorization is activated.
# STRING =  (http://www.specs-fp7.eu/wiki/images/1/1c/SPECS_standard_output.pdf)
CMORFAMILY =
# Supply the name of the experiment associated (if there is any) otherwise leave it empty.
# STRING (with space) = seasonal r1p1, seaiceinit r?p?
ASSOCIATED_EXPERIMENT =
# Essential if cmorization is activated (Forcing). STRING = Nat,Ant (Nat and Ant is a single option)
FORCING =
# Essential if cmorization is activated (Initialization description). STRING = N/A
INIT_DESCR =
# Essential if cmorization is activated (Physics description). STRING = N/A
PHYS_DESCR =
# Essential if cmorization is activated (Associated model). STRING = N/A
ASSOC_MODEL =

[grid]
# AGCM grid resolution, horizontal (truncation T) and vertical (levels L).
# STRING = T159L62, T255L62, T255L91, T511L91, T799L62 (IFS)
IFS_resolution = T511L91
# OGCM grid resolution. STRING = ORCA1L46, ORCA1L75, ORCA025L46, ORCA025L75 (NEMO)
NEMO_resolution = ORCA025L75

[oasis]
# Coupler (OASIS) options.
OASIS3 = yes
# Number of pseudo-parallel cores for coupler [Default: Do set "7"]. NUMERIC = 1, 7, 10
OASIS_nproc = 7
# Handling the creation of coupling fields dynamically [Default: Do set "TRUE"].
# BOOLEAN = TRUE, FALSE
OASIS_flds = TRUE

[ifs]
# Atmospheric initial conditions ready to be used.
# STRING = ID found here : https://earth.bsc.es/wiki/doku.php?id=initial_conditions:atmospheric
ATM_ini =
# A different IC member per EXPID member ["PERT"] or which common IC member
# for all EXPID members ["fc0" / "fc1"]. String = PERT/fc0/fc1...
ATM_ini_member =
# Set timestep (in sec) w.r.t resolution.
# NUMERIC = 3600 (T159), 2700 (T255), 900 (T511), 720 (T799)
IFS_timestep = 900
# Number of parallel cores for AGCM component. NUMERIC = 28, 100
IFS_nproc = 640
# Coupling frequency (in hours) [Default: Do set "3"]. NUMERIC = 3, 6
RUN_coupFreq = 3
# Post-processing frequency (in hours) [Default: Do set "6"]. NUMERIC = 3, 6
NFRP = 6
# [Default: Do set "TRUE"]. BOOLEAN = TRUE, FALSE
LCMIP5 = TRUE
# Choose RCP value [Default: Do set "2"]. NUMERIC = 0, 1=3-PD, 2=4.5, 3=6, 4=8.5
NRCP = 0
# [Default: Do set "TRUE"]. BOOLEAN = TRUE, FALSE
LHVOLCA = TRUE
# [Default: Do set "0"]. NUMERIC = 1850, 2005
NFIXYR = 0
# Save daily output or not [Default: Do set "FALSE"]. BOOLEAN = TRUE, FALSE
SAVEDDA = FALSE
# Save reduced daily output or not [Default: Do set "FALSE"]. BOOLEAN = TRUE, FALSE
ATM_REDUCED_OUTPUT = FALSE
# Store grib codes from SH files [User need to refer defined  ppt* files for the experiment]
ATM_SH_CODES =
# Store levels against "ATM_SH_CODES" e.g: level1,level2,level3, ...
ATM_SH_LEVELS =
# Store grib codes from GG files [User need to refer defined  ppt* files for the experiment]
ATM_GG_CODES =
# Store levels against "ATM_GG_CODES" (133.128, 246.128, 247.128, 248.128)
# e.g: level1,level2,level3, ...
ATM_GG_LEVELS =
# SPPT stochastic physics active or not [Default: set "FALSE"]. BOOLEAN = TRUE, FALSE
LSPPT = FALSE
# Write the perturbation patterns for SPPT or not [Default: set "FALSE"].
# BOOLEAN = TRUE, FALSE
LWRITE_ARP =
# Number of scales for SPPT [Default: set 3]. NUMERIC = 1, 2, 3
NS_SPPT =
# Standard deviations of each scale [Default: set 0.50,0.25,0.125]
# NUMERIC values separated by ,
SDEV_SPPT =
# Decorrelation times (in seconds) for each scale [Default: set 2.16E4,2.592E5,2.592E6]
# NUMERIC values separated by ,
TAU_SPPT =
# Decorrelation lengths (in meters) for each scale [Default: set 500.E3,1000.E3,2000.E3]
# NUMERIC values separated by ,
XLCOR_SPPT =
# Clipping ratio (number of standard deviations) for SPPT [Default: set 2] NUMERIC
XCLIP_SPPT =
# Stratospheric tapering in SPPT [Default: set "TRUE"]. BOOLEAN = TRUE, FALSE
LTAPER_SPPT =
# Top of stratospheric tapering layer in Pa [Default: set to 50.E2] NUMERIC
PTAPER_TOP =
# Bottom of stratospheric tapering layer in Pa [Default: set to 100.E2] NUMERIC
PTAPER_BOT =
## ATMOSPHERIC NUDGING PARAMETERS ##
# Atmospheric nudging towards re-interpolated ERA-Interim data. BOOLEAN = TRUE, FALSE
ATM_NUDGING = FALSE
# Atmospheric nudging reference data experiment name. [T255L91: b0ir]
ATM_refnud =
# Nudge vorticity. BOOLEAN = TRUE, FALSE
NUD_VO =
# Nudge divergence. BOOLEAN = TRUE, FALSE
NUD_DI =
# Nudge temperature. BOOLEAN = TRUE, FALSE
NUD_TE =
# Nudge specific humidity. BOOLEAN = TRUE, FALSE
NUD_Q =
# Nudge liquid water content. BOOLEAN = TRUE, FALSE
NUD_QL =
# Nudge ice water content. BOOLEAN = TRUE, FALSE
NUD_QI =
# Nudge cloud fraction. BOOLEAN = TRUE, FALSE
NUD_QC =
# Nudge log of surface pressure. BOOLEAN = TRUE, FALSE
NUD_LP =
# Relaxation coefficient for vorticity. NUMERIC in ]0,inf[;
# 1 means half way between model value and ref value
ALPH_VO =
# Relaxation coefficient for divergence. NUMERIC in ]0,inf[;
# 1 means half way between model value and ref value
ALPH_DI =
# Relaxation coefficient for temperature. NUMERIC in ]0,inf[;
# 1 means half way between model value and ref value
ALPH_TE =
# Relaxation coefficient for specific humidity. NUMERIC in ]0,inf[;
# 1 means half way between model value and ref value
ALPH_Q =
# Relaxation coefficient for log surface pressure. NUMERIC in ]0,inf[;
# 1 means half way between model value and ref value
ALPH_LP =
# Nudging area Northern limit [Default: Do set "90"]
NUD_NLAT =
# Nudging area Southern limit [Default: Do set "-90"]
NUD_SLAT =
# Nudging area Western limit NUMERIC in [0,360] [Default: Do set "0"]
NUD_WLON =
# Nudging area Eastern limit NUMERIC in [0,360] [Default: Do set "360"; E<W will span Greenwich]
NUD_ELON =
# Nudging vertical levels : lower level [Default: Do set "1"]
NUD_VMIN =
# Nudging vertical levels : upper level [Default: Do set to number of vertical levels]
NUD_VMAX =

[nemo]
# Ocean initial conditions ready to be used. [Default: leave empty].
# STRING = ID found here : https://earth.bsc.es/wiki/doku.php?id=initial_conditions:oceanic
OCEAN_ini =
# A different IC member per EXPID member ["PERT"] or which common IC member
# for all EXPID members ["fc0" / "fc1"]. String = PERT/fc0/fc1...
OCEAN_ini_member =
# Set timestep (in sec) w.r.t resolution. NUMERIC = 3600 (ORCA1), 1200 (ORCA025)
NEMO_timestep = 1200
# Number of parallel cores for OGCM component. NUMERIC = 16, 24, 36
NEMO_nproc = 960
# Ocean Advection Scheme [Default: Do set "tvd"]. STRING = tvd, cen2
ADVSCH = cen2
# Nudging activation. BOOLEAN = TRUE, FALSE
OCEAN_NUDGING = FALSE
# Toward which data to nudge; essential if "OCEAN_NUDGING" is TRUE.
# STRING = fa9p, s4, glorys2v1
OCEAN_NUDDATA = FALSE
# Rebuild and store restarts to HSM for an immediate prediction experiment.
# BOOLEAN = TRUE, FALSE
OCEAN_STORERST = FALSE

[ice]
# Sea-Ice Model [Default: Do set "LIM2"]. STRING = LIM2, LIM3
ICE = LIM3
# Sea-ice initial conditions ready to be used. [Default: leave empty].
# STRING = ID found here : https://earth.bsc.es/wiki/doku.php?id=initial_conditions:sea_ice
ICE_ini =
# A different IC member per EXPID member ["PERT"] or which common IC member
# for all EXPID members ["fc0" / "fc1"]. String = PERT/fc0/fc1...
ICE_ini_member =
# Set timestep (in sec) w.r.t resolution. NUMERIC = 3600 (ORCA1), 1200 (ORCA025)
LIM_timestep = 1200

[pisces]
# Activate PISCES (TRUE) or not (FALSE) [Default: leave empty]
PISCES = FALSE
# PISCES initial conditions ready to be used. [Default: leave empty].
# STRING = ID found here : https://earth.bsc.es/wiki/doku.php?id=initial_conditions:biogeochemistry
PISCES_ini =
# Set timestep (in sec) w.r.t resolution. NUMERIC = 3600 (ORCA1), 3600 (ORCA025)
PISCES_timestep = 3600

Proj configuration:: Full example

This section contains a full example of a valid proj file with a valid user script.

Configuration of proj.conf

vi <expid>/conf/proj_cxxx.conf
PROJECT_ROOT = /gpfs/scratch/bsc32/bsc32070/a000/automatic_performance_profile
REFRESH_GIT_REPO = false

Write your original script in the user project directory:

vi <expid>/proj/template/autosubmit/remote_setup.sh
cd %CURRENT_ROOTDIR% # This comes from autosubmit.
# Clone repository to the remote for needed files
# if exist or force refresh is true
if [ ! -d %PROJECT_ROOT% ] || [ %REFRESH_GIT_REPO% == true ];
then
    chmod +w -R %PROJECT_ROOT% || :
    rm -rf %PROJECT_ROOT% || :
    git clone (...)
fi
(...)

The final script, generated by autosubmit run or autosubmit inspect:

cat <experiments_directory>/cxxx/tmp/remote_setup.cmd
cd /gpfs/scratch/bsc32/bsc32070/a000
# Clone repository to the remote for needed files
# if exist or force refresh is true
if [ ! -d /gpfs/scratch/bsc32/bsc32070/a000/automatic_performance_profile ] || [ false == true ];
then
    chmod +w -R /gpfs/scratch/bsc32/bsc32070/a000/automatic_performance_profile || :
    rm -rf /gpfs/scratch/bsc32/bsc32070/a000/automatic_performance_profile || :
    git clone (...)
fi
(...)

Detailed platform configuration

In this section, we describe the platform configuration used to specify a QOS and also a PARTITION.

vi <expid>/conf/platform_cxxx.conf
[marenostrum0]
TYPE = ps
HOST = mn0.bsc.es
PROJECT = bsc32
USER = bsc32070
ADD_PROJECT_TO_HOST = false
SCRATCH_DIR = /gpfs/scratch

[marenostrum4]
# Queue type. Options: ps, SGE, LSF, SLURM, PBS, ecaccess
TYPE = slurm
HOST = mn1.bsc.es,mn2.bsc.es,mn3.bsc.es
PROJECT = bsc32
USER = bsc32070
SCRATCH_DIR = /gpfs/scratch
ADD_PROJECT_TO_HOST = False
# use 72:00 if you are using a PRACE account, 48:00 for the bsc account
MAX_WALLCLOCK = 02:00
# use 19200 if you are using a PRACE account, 2400 for the bsc account
MAX_PROCESSORS = 2400
PROCESSORS_PER_NODE = 48
#SERIAL_QUEUE = debug
#QUEUE = debug
CUSTOM_DIRECTIVES = ["#SBATCH -p small", "#SBATCH --no-requeue", "#SBATCH --usage"]

[marenostrum_archive]
TYPE = ps
HOST = dt02.bsc.es
PROJECT = bsc32
USER = bsc32070
SCRATCH_DIR = /gpfs/scratch
ADD_PROJECT_TO_HOST = False
TEST_SUITE = False

[power9]
TYPE = slurm
HOST = plogin1.bsc.es
PROJECT = bsc32
USER = bsc32070
SCRATCH_DIR = /gpfs/scratch
ADD_PROJECT_TO_HOST = False
TEST_SUITE = False
SERIAL_QUEUE = debug
QUEUE = debug

[nord3]
TYPE = lsf
HOST = nord1.bsc.es
PROJECT = bsc32
USER = bsc32070
ADD_PROJECT_TO_HOST = False
SCRATCH_DIR = /gpfs/scratch
TEST_SUITE = False
MAX_WALLCLOCK = 48:00
MAX_PROCESSORS = 1024
PROCESSORS_PER_NODE = 16

[transfer_node]
TYPE = ps
HOST = dt01.bsc.es
PROJECT = bsc32
USER = bsc32070
ADD_PROJECT_TO_HOST = false
SCRATCH_DIR = /gpfs/scratch

[transfer_node_bscearth000]
TYPE = ps
HOST = bscearth000
USER = dbeltran
PROJECT = Earth
ADD_PROJECT_TO_HOST = false
QUEUE = serial
SCRATCH_DIR = /esarchive/scratch

[bscearth000]
TYPE = ps
HOST = bscearth000
PROJECT = Earth
USER = dbeltran
SCRATCH_DIR = /esarchive/scratch

Warning

The TYPE field is mandatory. The HOST field is mandatory. The PROJECT field is mandatory. The USER field is mandatory. The SCRATCH_DIR field is mandatory. The ADD_PROJECT_TO_HOST field is mandatory.

Warning

The TEST_SUITE field is optional. The MAX_WALLCLOCK field is optional. The MAX_PROCESSORS field is optional. The PROCESSORS_PER_NODE field is optional.

Warning

The SERIAL_QUEUE and QUEUE fields are used to specify a QOS. To specify a partition, you must use CUSTOM_DIRECTIVES. To specify the memory usage, you must use MEMORY, but only in jobs.conf.
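For example, a sketch of a job requesting memory in jobs.conf (the section name and values are illustrative):

vi <expid>/conf/jobs_cxxx.conf
[SIM]
FILE = SIM.sh
RUNNING = chunk
WALLCLOCK = 00:30
# Memory requirements for the job in MB (jobs.conf only)
MEMORY = 4096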

Custom directives can be used to set multiple parameters at the same time using the following syntax.

vi <expid>/conf/platform_cxxx.conf
[puhti]
#Check your partition (test/small/large)
CUSTOM_DIRECTIVES = ["#SBATCH -p test", "#SBATCH --no-requeue", "#SBATCH --usage"]
### Batch job system / queue at HPC
TYPE = slurm
### Hostname of the HPC
HOST = puhti
### Project name-ID at HPC (WEATHER)
PROJECT = project_test
### User name at HPC
USER = dbeltran
### Path to the scratch directory for the project at HPC
SCRATCH_DIR = /scratch
# Should already be false, just in case it is not
ADD_PROJECT_TO_HOST = False

#Check your partition ( test[00:15]/small[72:00]/large[72:00]) max_wallclock
MAX_WALLCLOCK = 00:15
# test [80] // small [40] // large [1040]
MAX_PROCESSORS = 80
# test [40] / small [40] // large [40]
PROCESSORS_PER_NODE = 40

Installation

How to install

The Autosubmit code is maintained in PyPI, the main source for Python packages.

  • Pre-requisites: bash, python2, sqlite3, git-scm > 1.8.2, subversion, dialog, curl, python-tk(tkinter in centOS), python2-dev, graphviz >= 2.41, pip2

Important

(SYSTEM) The Graphviz version must be >= 2.38, except 2.40, which does not work. You can check the version using dot -v.

  • Python dependencies: argparse, python-dateutil, pyparsing, numpy, pydotplus, matplotlib, paramiko, python2-pythondialog, portalocker, requests, typing, six >= 1.10

Important

The dot -v output should contain “dot”, pdf, png, svg and xlib in the device section.

Important

The host machine has to be able to access HPCs/clusters via password-less SSH. Make sure that the SSH key is in PEM format: ssh-keygen -t rsa -b 4096 -C "email@email.com" -m PEM.

To install autosubmit just execute:

pip install autosubmit

or download, unpack and:

python setup.py install

Hint

To check if Autosubmit has been installed, run autosubmit -v. This command will print Autosubmit's current version.

Hint

To read autosubmit’s readme file, run autosubmit readme

Hint

To see the changelog, use autosubmit changelog

How to configure

After installation, you have to configure the database and paths for Autosubmit. In order to use the default settings, just create a directory called autosubmit in your home directory before running the configure command. The experiments will be created in this folder, and the database, named autosubmit.db, in your home directory.

autosubmit configure

For advanced options you can add --advanced to the configure command. It will allow you to choose different directories (they must exist) for the experiments and database, as well as configure SMTP server and an email account in order to use the email notifications feature.

autosubmit configure --advanced

Hint

The dialog (GUI) library is optional. Otherwise the configuration parameters will be prompted (CLI). Use autosubmit configure -h to see all the allowed options.

To install the Autosubmit database in the configured folder, when no database has been created at the given path, execute:

autosubmit install

Danger

Be careful! autosubmit install will create a blank database.

Lastly, if autosubmit configure doesn’t work for you, or you need to configure additional information, create or modify the /etc/autosubmitrc file or ~/.autosubmitrc with the following content:

[database]
path = path to autosubmit db
filename = autosubmit.db

[local]
path = path to experiment folders

[conf]
jobs = path to any experiment jobs conf # If not working on esarchive, you must create one from scratch (check the how-to)
platforms = path to any experiment platform conf # If not working on esarchive, you must create one from scratch (check the how-to)

[mail]
smtp_server = mail.bsc.es
mail_from = automail@bsc.es

[structures]
path =  path to experiment folders

[globallogs]
path =  path to global logs (for expid,delete and migrate commands)

[historicdb]
path = <experiment_folder>/historic

[autosubmitapi]
url = url of Autosubmit API (The API is provided inside the BSC network)
# Autosubmit API provides extra information for some Autosubmit functions. It is not mandatory to have access to it to use Autosubmit.

[hosts]
authorized = [run bscearth000,bscesautosubmit01,bscesautosubmit02] [stats,clean,describe,check,report,dbfix,pklfix,updatedescript,updateversion all]
forbidden = [expid,create,recovery,delete,inspect,monitor,recovery,migrate,configure,setstatus,testcase,test,refresh,archive,unarchive bscearth000,bscesautosubmit01,bscesautosubmit02]

Hosts: From 3.14+ onwards, autosubmit commands can be tailored to run on specific machines. Previously, only run was affected by the deprecated whitelist parameter.

  • authorized: [<command1,commandN> <machine1,machineN>] list of machines that can run given autosubmit commands.
  • forbidden: [<command1,commandN> <machine1,machineN>] list of machines that cannot run given autosubmit commands.
  • If no commands are defined, all commands are authorized.
  • If no machines are defined, all machines are authorized.

Now you are ready to use Autosubmit!

Examples

Sequence of instructions to install Autosubmit and its dependencies in Ubuntu.

# Update repositories
apt update

# Avoid interactive stuff
export DEBIAN_FRONTEND=noninteractive

# Dependencies
apt install wget curl python2 python-tk python2-dev graphviz -y -q

# Additional dependencies related with pycrypto
apt install build-essential libssl-dev libffi-dev -y -q

# Download get pip script and launch it
wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
python2 get-pip.py

# Install autosubmit using pip
pip2 install autosubmit

# Check that we can execute autosubmit commands
autosubmit -h

# Configure
autosubmit configure

# Install
autosubmit install

# Get expid
autosubmit expid -H TEST -d "Test exp."

# Create with -np
# Since it was a new install the expid will be a000
autosubmit create a000 -np

Sequence of instructions to install Autosubmit and its dependencies with conda.

# Download conda
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
# Launch it
chmod +x ./Miniconda3-py39_4.12.0-Linux-x86_64.sh ; ./Miniconda3-py39_4.12.0-Linux-x86_64.sh
# Download git
apt install git -y -q
# Download autosubmit
git clone https://earth.bsc.es/gitlab/es/autosubmit.git -b v3.14.0
cd autosubmit
# Create conda environment
conda env update -f environment.yml -n autosubmit python=2
# Activate env
source activate autosubmit
# Test autosubmit
autosubmit -v
# Configure autosubmitrc and install database as indicated in this doc

Usage

Command list

-expid Create a new experiment
-create Create specified experiment workflow
-check Check configuration for specified experiment
-describe Show details for specified experiment
-run Run specified experiment
-inspect Generate cmd files
-test Test experiment
-testcase Test case experiment
-monitor Plot specified experiment
-stats Plot statistics for specified experiment
-setstatus Sets job status for an experiment
-recovery Recover specified experiment
-clean Clean specified experiment
-refresh Refresh project directory for an experiment
-delete Delete specified experiment
-configure Configure database and path for autosubmit
-install Install database for Autosubmit on the configured folder
-archive Clean, compress and remove from the experiments’ folder a finalized experiment
-unarchive Restores an archived experiment
-migrate_exp Migrates an experiment from one user to another
-report Extracts experiment parameters
-updateversion Updates the Autosubmit version of your experiment with the current version of the module you are using
-dbfix Fixes the database malformed error in the historical database of your experiment
-pklfix Fixes the blank pkl error of your experiment
-updatedescrip Updates the description of your experiment (See: How to update the description of your experiment)

Defining the workflow

One of the most important steps you have to take when planning to use Autosubmit for an experiment is the definition of the workflow the experiment will use. In this section you will learn about the workflow definition syntax so you will be able to exploit Autosubmit’s full potential.

Warning

This section is NOT intended to show how to define your jobs. Please go to Tutorial start guide section for a comprehensive list of job options.

Simple workflow

The simplest workflow that can be defined is a sequence of two jobs, with the second one triggering at the end of the first. To define it, we define the two jobs and then add a DEPENDENCIES attribute to the second job referring to the first one.

It is important to remember when defining workflows that DEPENDENCIES on autosubmit always refer to jobs that should be finished before launching the job that has the DEPENDENCIES attribute.

JOBS:
    One:
      FILE: "one.sh"
    Two:
      FILE: "two.sh"
      DEPENDENCIES: "One"

The resulting workflow can be seen in Figure 5

simple workflow plot

Example showing a simple workflow with two sequential jobs

Running jobs once per startdate, member or chunk

Autosubmit is capable of running ensembles made of various startdates and members. It also has the capability to divide member execution into different chunks.

To set at what level a job has to run you have to use the RUNNING attribute. It has four possible values: once, date, member and chunk corresponding to running once, once per startdate, once per member or once per chunk respectively.

JOBS:
    once:
      FILE: "Once.sh"
    date:
      FILE: "date.sh"
      DEPENDENCIES: "once"
      RUNNING: "date"
    member:
      FILE: "Member.sh"
      DEPENDENCIES: "date"
      RUNNING: "member"
    chunk:
      FILE: "Chunk.sh"
      DEPENDENCIES: "member"
      RUNNING: "chunk"

The resulting workflow can be seen in Figure 6 for an experiment with 2 startdates, 2 members and 2 chunks.

simple workflow plot

Example showing how to run jobs once per startdate, member or chunk.

Dependencies

Dependencies in Autosubmit were introduced in the first example, but in this section you will learn about some special cases that will be very useful in your workflows.

Dependencies with previous jobs

Autosubmit can manage dependencies between jobs that are part of different chunks, members or startdates. The next example shows how to make a simulation job wait for the previous chunk of the simulation. To do that, we add sim-1 to the DEPENDENCIES attribute. As you can see, you can add as many dependencies as you like, separated by spaces.

JOBS:
    ini:
      FILE: "ini.sh"
      RUNNING: "member"
    sim:
      FILE: "sim.sh"
      DEPENDENCIES: "ini sim-1"
      RUNNING: "chunk"
    postprocess:
      FILE: "postprocess.sh"
      DEPENDENCIES: "sim"
      RUNNING: "chunk"

The resulting workflow can be seen in Figure 7

Warning

Autosubmit simplifies the dependencies, so the final graph usually does not show all the lines that you may expect to see. In this example you can see that there are no lines between the ini and the sim jobs for chunks 2 to 5 because that dependency is redundant with the one on the previous sim.

simple workflow plot

Example showing dependencies between sim jobs on different chunks.

Dependencies between running levels

In the previous examples we have seen that when a job depends on a job at a higher level (a chunk-running job depending on a member-running job), all the jobs wait for the higher running level job to be finished. That is the case of the ini-sim dependency in the next example.

In the opposite case, when a job depends on a lower running level job, the higher level job will wait for ALL the lower level jobs to be finished. That is the case of the postprocess-combine dependency in the next example.

JOBS:
    ini:
      FILE: "ini.sh"
      RUNNING: "member"
    sim:
      FILE: "sim.sh"
      DEPENDENCIES: "ini sim-1"
      RUNNING: "chunk"
    postprocess:
      FILE: "postprocess.sh"
      DEPENDENCIES: "sim"
      RUNNING: "chunk"
    combine:
      FILE: "combine.sh"
      DEPENDENCIES: "postprocess"
      RUNNING: "member"

The resulting workflow can be seen in Figure dependencies

simple workflow plot

Example showing dependencies between jobs running at different levels.

Job frequency

Sometimes you don’t need a job to run on every chunk or member. For example, you may want to launch the postprocessing job only after several chunks have completed. This behaviour can be achieved using the FREQUENCY attribute. You can specify an integer I for this attribute and the job will run only once for every I iterations at its running level.

Hint

You don’t need to adjust the frequency to be a divisor of the total number of jobs. A job will always execute at the last iteration of its running level.

JOBS:
    ini:
      FILE: "ini.sh"
      RUNNING: "member"
    sim:
      FILE: "sim.sh"
      DEPENDENCIES: "ini sim-1"
      RUNNING: "chunk"
    postprocess:
      FILE: "postprocess.sh"
      DEPENDENCIES: "sim"
      RUNNING: "chunk"
      FREQUENCY: "3"
    combine:
      FILE: "combine.sh"
      DEPENDENCIES: "postprocess"
      RUNNING: "member"

The resulting workflow can be seen in Figure 9

Figure 9: Example showing dependencies between jobs running at different frequencies.

Job synchronize

For a job running at chunk level that has dependencies, you may not want it to run for every experiment chunk, but only once for all the members/dates while maintaining the chunk granularity. In these cases you can use the SYNCHRONIZE job parameter to determine which kind of synchronization you want. See the examples below, with and without this parameter.

Hint

This job parameter only works for jobs whose RUNNING parameter is set to 'chunk'.

JOBS:
    ini:
      FILE: "ini.sh"
      RUNNING: "member"
    sim:
      FILE: "sim.sh"
      DEPENDENCIES: "INI SIM-1"
      RUNNING: "chunk"
    ASIM:
      FILE: "asim.sh"
      DEPENDENCIES: "SIM"
      RUNNING: "chunk"

The resulting workflow can be seen in Figure 10

Figure 10: Example showing dependencies between chunk jobs running without synchronize.

JOBS:
    ASIM:
        SYNCHRONIZE: member

The resulting workflow of setting SYNCHRONIZE parameter to ‘member’ can be seen in Figure 11

Figure 11: Example showing dependencies between chunk jobs running with member synchronize.

JOBS:
    ASIM:
        SYNCHRONIZE: date

The resulting workflow of setting SYNCHRONIZE parameter to ‘date’ can be seen in Figure 12

Figure 12: Example showing dependencies between chunk jobs running with date synchronize.

Job split

For jobs running at chunk level, it may be useful to split each chunk into different parts. This behaviour can be achieved using the SPLITS attribute to specify the number of parts. It is possible to define dependencies on specific splits using square brackets, as well as on a list or range of splits, in the format [1:3,7,10] or [1,2,3].

Hint

This job parameter only works for jobs whose RUNNING parameter is set to 'chunk'.

JOBS:
    ini:
      FILE: "ini.sh"
      RUNNING: "member"
    sim:
      FILE: "sim.sh"
      DEPENDENCIES: "ini sim-1"
      RUNNING: "chunk"
    asim:
      FILE: "asim.sh"
      DEPENDENCIES: "sim"
      RUNNING: "chunk"
      SPLITS: "3"
    post:
      FILE: "post.sh"
      RUNNING: "chunk"
      DEPENDENCIES: "asim[1] asim[1]+1"

The resulting workflow can be seen in Figure 13

Figure 13: Example showing the job ASIM divided into 3 parts for each chunk.
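
Dependencies on several splits can also be written as a list directly in the DEPENDENCIES string. Below is a minimal sketch showing a variant of the post job above that waits only for the first two splits of asim; the exact split selection is illustrative.

JOBS:
    post:
      FILE: "post.sh"
      RUNNING: "chunk"
      # wait only for the first two splits of asim in the same chunk
      DEPENDENCIES: "asim[1,2]"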

Job delay

Sometimes you need a job to run only after a certain number of chunks. For example, you may want to launch the asim job only after several chunks have completed. This behaviour can be achieved using the DELAY attribute. You can specify an integer N for this attribute and the job will only run after the first N chunks.

Hint

This job parameter only works for jobs whose RUNNING parameter is set to 'chunk'.

JOBS:
    ini:
      FILE: "ini.sh"
      RUNNING: "member"
    sim:
      FILE: "sim.sh"
      DEPENDENCIES: "ini sim-1"
      RUNNING: "chunk"
    asim:
      FILE: "asim.sh"
      DEPENDENCIES: "sim asim-1"
      RUNNING: "chunk"
      DELAY: "2"
    post:
      FILE: "post.sh"
      DEPENDENCIES: "sim asim"
      RUNNING: "chunk"

The resulting workflow can be seen in Figure 14

Figure 14: Example showing the asim job starting only from chunk 3.

Frequent Questions and Answers

The latest version of Autosubmit implements a code system that guides you through the process of fixing some of the common problems you might find. Consequently, the FAQ section has been replaced by Error codes and solutions, where you will find the list of error codes, their descriptions, and solutions.

Troubleshooting

How to change the job status stopping autosubmit

Review How to change the job status stopping autosubmit.

How to change the job status without stopping autosubmit

Review How to change the job status without stopping autosubmit.

My project parameters are not being substituted in the templates

Explanation: If there is a duplicated section or option anywhere in the Autosubmit configuration, including the proj files, Autosubmit won't be able to recognize which option pertains to which section in which file.

Solution: Don't repeat section names or parameter names until the Autosubmit 4.0 release.

Unable to recover remote log files.

Explanation: There may be limitations on the remote platform regarding multiple connections.

Solution: You can try DISABLE_RECOVERY_THREADS = TRUE under the [platform_name] section in the platform.conf.
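
As a sketch of how this could look in the YAML platforms configuration of AS4 (the platform name is only an example; the option itself is taken from the solution above):

PLATFORMS:
    MARENOSTRUM4:                        # example platform name
      DISABLE_RECOVERY_THREADS: "TRUE"   # do not spawn the log-recovery threads for this platform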

Error on create caused by a configuration parsing error

When running create you can come across an error similar to:

[ERROR] Trace: '%' must be followed by '%' or '(', found: u'%HPCROOTDIR%/remoteconfig/%CURRENT_ARCH%_launcher.sh'

The important part of this error is the message '%' must be followed by '%'. It indicates that the source of the error is the configparser library. This library is part of the Python standard library, so you shouldn't have any other version of it installed in your environment. Execute pip list; if you see configparser in the list, run pip uninstall configparser. Then, try to create your experiment again.

Other possible errors

I see the `database malformed` error on my experiment log.

Explanation: The latest version of autosubmit uses a database to efficiently track changes in the jobs of your experiment. It might happen that this small database gets corrupted.

Solution: run autosubmit dbfix expid, where expid is the identifier of your experiment. This command will rebuild the database, saving as much information as possible (usually all of it).

The pkl file of my experiment is empty but there is a job_list_%expid%_backup.pkl file that seems to be the real one.

Solution: run autosubmit pklfix expid; it will restore the backup file if possible.

Error codes and solutions

Experiment Locked - Critical Error 7000

Code | Details | Solution
7000 | Experiment is locked because another instance of Autosubmit is using it | Halt the other running instance or delete <expid>/tmp/autosubmit.lock

Database Issues - Critical Error codes [7001-7005]

Code | Details | Solution
7001 | Connection to the database could not be established | Check if the database exists
7002 | Wrong version | Check the system sqlite version
7003 | Database doesn't exist | Check if the database exists
7004 | Can't create a new database | Check your user permissions
7005 | AS database is corrupted or locked | Please open a new issue ASAP (if you are on the BSC environment)

Default Solution

These issues usually come from the server side; please ask first in the Autosubmit Git repository if you don't have a custom installation.


Wrong User Input - Critical Error codes [7010-7030]

Code | Details | Solution
7010 | Experiment has been halted manually
7011 | Wrong arguments for a specific command | Check the command section for more info
7012 | Insufficient permissions for a specific experiment | Check that you have enough permissions, that the experiment exists, and that the specified expid has no typo
7013 | Pending commits | You must commit/synchronize pending changes in the experiment proj folder
7014 | Wrong configuration | Check your experiment/conf files; also take a look at the detailed output in ASLOG/command.log

Default Solution

These issues are usually caused by mistakes in the user input; check the available logs and the resolved Git issues. Alternatively, you can ask the Autosubmit team for help.


Platform issues - Critical Error codes. Local [7040-7050] and remote [7050-7060]

Code | Details | Solution
7040 | Invalid experiment pkl/db, likely due to a local platform failure | Should be recovered automatically; if not, check if there is a backup file and restore it manually
7041 | Weird job status | Try to recover the experiment (check the recovery how-to for more info); if this issue persists, please report it to GitLab
7050 | Connection can't be established | Check your experiment platform configuration
7050 | Failure after a restart, connection can't be restored | Check or ask (manually) if the remote platforms have any known issues
7051 | Invalid ssh configuration | Check the .ssh/config file. Additionally, check if you can perform a passwordless connection to that platform
7052 | Scheduler is not installed or correctly configured | Check if there is a scheduler installed in the remote machine

Default Solution

Check the Autosubmit log for detailed information; there will be additional error codes.


Uncatalogued codes - Critical Error codes [7060+]

Code | Details | Solution
7060 | Display issues during monitoring | Use a different output format, e.g. txt
7061 | Stat command failed | Check the ASLOG command output and open a Git issue
7062 | SVN issues | Check, in the expdef file, if the URL exists
7063 | cp/rsync issues | Check if the destination path exists
7064 | Git issues | Check that the proj folder is a well-configured git folder. Also, check the [GIT] expdef config
7065 | Wrong git configuration | Invalid git URL. Check the [GIT] expdef config. If the issue persists, check whether the proj folder is a well-configured git folder
7066 | Pre-submission feature issues | New feature; this message shouldn't appear. Please report it via Git
7067 | Historical database not found | Configure [historicdb] PATH = <file_path>
7068 | Monitor output can't be loaded | Try another output method or check if the experiment is reachable
7069 | Monitor output format invalid | Try another output method
7070 | Bug in code | Contact us via Git or e-mail
7071 | AS can't run on this host | If you think that this is an error, check the .autosubmitrc and modify the allowed/forbidden directives

Default Solution

Check the Autosubmit log for detailed information; there will be additional error codes.


Minor errors - Error codes [6000+]

Code | Details | Solution
6001 | Failed to retrieve log files | Recovered automatically if there aren't bigger issues
6002 | Failed reconnection | Recovered automatically if there aren't bigger issues
6003 | Failed connection, wrong configuration | Check your platform.conf file
6004 | Input/output issues | Recovered automatically if there aren't bigger issues
6005 | Unable to execute the command | Recovered automatically if there aren't bigger issues
6006 | Failed command | Check the err output for more info; the command worked but some issue was detected
6007 | Broken sFTP connection | Recovered automatically if there aren't bigger issues
6008 | Inconsistent/unexpected job status | Recovered automatically if there aren't bigger issues
6009 | Failed job checker | Recovered automatically if there aren't bigger issues
6010 | Corrupted job_list, using backup | Recovered automatically; if it fails, perform mv <expid>/pkl/job_list_backup.pkl <expid>/pkl/job_list.pkl
6011 | Incorrect mail notifier configuration | Double-check your mail configuration in job.conf (job status) and autosubmit.conf (email)
6012 | Migrate/archive/unarchive I/O issues | Check the migrate how-to configuration
6013 | Configuration issues | Check the log output for more info
6014 | Git can't clone repository submodule | Check the submodule URL and perform a refresh
6015 | Submission failed | Recovered automatically if there aren't bigger issues

Developing a project

Autosubmit is used at BSC to run EC-Earth. To do that, a git repository has been created that contains the model source code and the scripts used to run the tasks.

EC-Earth experiment

Example of monitoring plot for EC-Earth run with Autosubmit for 1 start date, 1 member and 3 chunks.

The workflow is defined using seven job types, as shown in the figure above; a minimal configuration sketch is given after the list. These job types are:

  • Local_setup: prepares a patch for model changes and copies it to HPC.
  • Remote_setup: creates a model copy and applies the patch to it.
  • Ini: prepares model to start the simulation of one member.
  • Sim: runs a simulation chunk (usually 1 to 3 months).
  • Post: post-process outputs for one simulation chunk.
  • Clean: removes unnecessary outputs from the simulated chunk.
  • Transfer: transfers post-processed outputs to definitive storage.
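
The following is a minimal configuration sketch of how these seven job types could be declared; the file names, dependencies and running levels are illustrative and do not reproduce the actual EC-Earth project configuration:

JOBS:
    LOCAL_SETUP:
      FILE: "local_setup.sh"
      PLATFORM: LOCAL
      RUNNING: "once"
    REMOTE_SETUP:
      FILE: "remote_setup.sh"
      DEPENDENCIES: "LOCAL_SETUP"
      RUNNING: "once"
    INI:
      FILE: "ini.sh"
      DEPENDENCIES: "REMOTE_SETUP"
      RUNNING: "member"
    SIM:
      FILE: "sim.sh"
      DEPENDENCIES: "INI SIM-1"
      RUNNING: "chunk"
    POST:
      FILE: "post.sh"
      DEPENDENCIES: "SIM"
      RUNNING: "chunk"
    CLEAN:
      FILE: "clean.sh"
      DEPENDENCIES: "POST"
      RUNNING: "chunk"
    TRANSFER:
      FILE: "transfer.sh"
      DEPENDENCIES: "CLEAN"
      RUNNING: "member"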

Since Autosubmit 2.2 the user can select the desired source repository for the experiment project, and using a given concrete branch is possible. This introduces better version control for the project and more options to create new experiments based on different developments by the user. The different projects contain the shell scripts to run for each job type (local setup, remote setup, ini, sim, post, clean and transfer), which are platform independent. Additionally, the user can modify the sources under the proj folder. The executable scripts are created at runtime, so the modifications on the sources can be done on the fly.

Warning

Autosubmit automatically adds small shell-script code blocks to the header and the trailer of your scripts in order to control the workflow. Please remove any exit command at the end of your scripts, e.g. exit 0.

Important

For a complete reference on how to develop an EC-Earth project, please have a look in the following wiki page: https://earth.bsc.es/wiki/doku.php?id=models:models

Variables reference

Autosubmit uses a variable substitution system to facilitate the development of the templates. These variables can be used in the templates in the form %VARIABLE_NAME%.

Job variables

These variables are relative to the current job (a short configuration sketch using one of them is shown after the list).

  • TASKTYPE: type of the job, as given on job configuration file.
  • JOBNAME: current job full name.
  • FAIL_COUNT: number of failed attempts to run this job.
  • SDATE: current startdate.
  • MEMBER: current member.
  • CHUNK: current chunk.
  • SPLIT: current split.
  • DELAY: current delay.
  • DAY_BEFORE: day before the startdate
  • Chunk_End_IN_DAYS: chunk’s length in days
  • Chunk_START_DATE: chunk’s start date
  • Chunk_START_YEAR: chunk’s start year
  • Chunk_START_MONTH: chunk’s start month
  • Chunk_START_DAY: chunk’s start day
  • Chunk_START_HOUR: chunk’s start hour
  • Chunk_END_DATE: chunk’s end date
  • Chunk_END_YEAR: chunk’s end year
  • Chunk_END_MONTH: chunk’s end month
  • Chunk_END_DAY: chunk’s end day
  • Chunk_END_HOUR: chunk’s end hour
  • PREV: days since startdate at the chunk’s start
  • Chunk_FIRST: True if the current chunk is the first, false otherwise.
  • Chunk_LAST: True if the current chunk is the last, false otherwise.
  • NUMPROC: Number of processors that the job will use.
  • NUMTHREADS: Number of threads that the job will use.
  • NUMTASKS: Number of tasks that the job will use.
  • HYPERTHREADING: Detects if hyperthreading is enabled or not.
  • WALLCLOCK: Wallclock time requested by the job.
  • SCRATCH_FREE_SPACE: Percentage of free space required on the scratch.
  • NOTIFY_ON: Determines the job statuses for which you want to be notified.
  • WRAPPER: Wrapper type, None if wrapper is not being used
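
As a hypothetical illustration, job variables can also be used inside configuration values, for example in a job's custom directives; the scheduler flag below is an assumption and not taken from this documentation:

JOBS:
    SIM:
      FILE: "sim.sh"
      RUNNING: "chunk"
      # hypothetical Slurm directive using the JOBNAME variable
      CUSTOM_DIRECTIVES: [ "#SBATCH --job-name=%JOBNAME%" ]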

Platform variables

These variables are related to the platforms defined in the jobs configuration. A full set of the following variables is defined for each platform in the platforms configuration file, substituting {PLATFORM_NAME} with each platform's name. In addition, a suite of variables is defined for the current platform, where {PLATFORM_NAME} is replaced by CURRENT.

  • {PLATFORM_NAME}_ARCH: Platform name
  • {PLATFORM_NAME}_HOST: Platform url
  • {PLATFORM_NAME}_USER: Platform user
  • {PLATFORM_NAME}_PROJ: Platform project
  • {PLATFORM_NAME}_BUDG: Platform budget
  • {PLATFORM_NAME}_RESERVATION: You can configure your reservation id for the given platform.
  • {PLATFORM_NAME}_EXCLUSIVITY: True if you want to request exclusivity nodes.
  • {PLATFORM_NAME}_TYPE: Platform scheduler type
  • {PLATFORM_NAME}_VERSION: Platform scheduler version
  • {PLATFORM_NAME}_SCRATCH_DIR: Platform’s scratch folder path
  • {PLATFORM_NAME}_ROOTDIR: Platform’s experiment folder path
  • {PLATFORM_NAME}_CUSTOM_DIRECTIVES: Platform’s custom directives for the resource manager.

Hint

The variables _USER, _PROJ and _BUDG have no value on the LOCAL platform.

Hint

For now, the variables _RESERVATION and _EXCLUSIVITY are only available for MN.

A suite of variables is also defined for the experiment's default platform:

  • HPCARCH: Default HPC platform name
  • HPCHOST: Default HPC platform url
  • HPCUSER: Default HPC platform user
  • HPCPROJ: Default HPC platform project
  • HPCBUDG: Default HPC platform budget
  • HPCTYPE: Default HPC platform scheduler type
  • HPCVERSION: Default HPC platform scheduler version
  • SCRATCH_DIR: Default HPC platform scratch folder path
  • HPCROOTDIR: Default HPC platform experiment’s folder path

Project variables

  • NUMMEMBERS: number of members of the experiment
  • NUMCHUNKS: number of chunks of the experiment
  • CHUNKSIZE: size of each chunk
  • CHUNKSIZEUNIT: unit of the chunk size. Can be hour, day, month or year.
  • CALENDAR: calendar used for the experiment. Can be standard or noleap.
  • ROOTDIR: local path to experiment’s folder
  • PROJDIR: local path to experiment’s proj folder

Performance Metrics

Currently, these variables apply only to the report function of Autosubmit. See How to extract information about the experiment parameters.

  • SYPD: Simulated years per day.
  • ASYPD: Actual simulated years per day.
  • RSYPD: Raw simulated years per day.
  • CHSY: Core hours per simulated year.
  • JPSY: Joules per simulated year.
  • Parallelization: Number of cores requested for the simulation job.

For more information about these metrics please visit:

https://earth.bsc.es/gitlab/wuruchi/autosubmitreact/-/wikis/Performance-Metrics.

Module documentation

autosubmit

class autosubmit.autosubmit.Autosubmit

Bases: object

Interface class for autosubmit.

static archive(expid, noclean=True, uncompress=True)

Archives an experiment: calls clean (if the experiment is of version 3 or later), compresses the folder to tar.gz and moves it to the year's folder

Parameters:
  • clean,compress
  • expid (str) – experiment identifier
Returns:

static change_status(final, final_status, job, save)

Set job status to final

Parameters:
  • final
  • final_status
  • job
static check(experiment_id, notransitive=False)

Checks experiment configuration and warns about any detected error or inconsistency.

Parameters:experiment_id (str) – experiment identifier:
static clean(expid, project, plot, stats)

Clean experiment’s directory to save storage space. It removes project directory and outdated plots or stats.

Parameters:
  • expid (str) – identifier of experiment to clean
  • project (bool) – set True to delete project directory
  • plot (bool) – set True to delete outdated plots
  • stats (bool) – set True to delete outdated stats
static configure(advanced, database_path, database_filename, local_root_path, platforms_conf_path, jobs_conf_path, smtp_hostname, mail_from, machine, local)

Configure several paths for autosubmit: database, local root and others. Can be configured at system, user or local levels. Local level configuration precedes user level and user level precedes system configuration.

Parameters:
  • database_path (str) – path to autosubmit database
  • database_filename (str) – database filename
  • local_root_path (str) – path to autosubmit’s experiments’ directory
  • platforms_conf_path (str) – path to platforms conf file to be used as model for new experiments
  • jobs_conf_path (str) – path to jobs conf file to be used as model for new experiments
  • machine (bool) – True if this configuration has to be stored for all the machine users
  • local (bool) – True if this configuration has to be stored in the local path
  • mail_from (str) –
  • smtp_hostname (str) –
static configure_dialog()

Configure several paths for autosubmit interactively: database, local root and others. Can be configured at system, user or local levels. Local level configuration precedes user level and user level precedes system configuration.

static create(expid, noplot, hide, output='pdf', group_by=None, expand=[], expand_status=[], notransitive=False, check_wrappers=False, detail=False)

Creates job list for given experiment. Configuration files must be valid before executing this process.

Parameters:
  • expid (str) – experiment identifier
  • noplot (bool) – if True, method omits final plotting of the jobs list. Only needed on large experiments when plotting time can be much larger than creation time
  • hide (bool) – hides plot window
  • output (str) – plot’s file format. It can be pdf, png, ps or svg
Returns:

True if successful, False if not

Return type:

bool

static database_fix(expid)

Database methods. Performs a sql dump of the database and restores it.

Parameters:expid (str) – experiment identifier
Returns:
Return type:
static delete(expid, force)

Deletes an experiment from the database and the experiment’s folder

Parameters:
  • expid (str) – identifier of the experiment to delete
  • force (bool) – if True, does not ask for confirmation
Returns:

True if successful, False if not

Return type:

bool

static describe(experiment_id)

Show details for specified experiment

Parameters:experiment_id (str) – experiment identifier:
experiment_data

Returns the current experiment data.

static expid(hpc, description, copy_id='', dummy=False, test=False, operational=False, root_folder='')

Creates a new experiment for given HPC

Parameters:
  • operational (bool) – if true, creates an operational experiment
  • hpc (str) – name of the main HPC for the experiment
  • description (str) – short experiment’s description.
  • copy_id (str) – experiment identifier of experiment to copy
  • dummy (bool) – if true, writes a default dummy configuration for testing
  • test – if true, creates an experiment for testing
Returns:

experiment identifier. If method fails, returns ‘’.

Return type:

str

static generate_scripts_andor_wrappers(as_conf, job_list, jobs_filtered, packages_persistence, only_wrappers=False)
Parameters:
  • as_conf (AutosubmitConfig() Object) – Class that handles basic configuration parameters of Autosubmit.
  • job_list (JobList() Object) – Representation of the jobs of the experiment, keeps the list of jobs inside.
  • jobs_filtered (List() of Job Objects) – list of jobs that are relevant to the process.
  • packages_persistence (JobPackagePersistence() Object) – Object that handles local db persistence.
  • only_wrappers (Boolean) – True when coming from Autosubmit.create(). False when coming from Autosubmit.inspect(),
Returns:

Nothing

Return type:

static inspect(expid, lst, filter_chunks, filter_status, filter_section, notransitive=False, force=False, check_wrapper=False)

Generates the cmd files for the experiment.

Parameters:expid (str) – identifier of experiment to be run
Returns:True if run to the end, False otherwise
Return type:bool
static install()

Creates a new database instance for autosubmit at the configured path

static migrate(experiment_id, offer, pickup, only_remote)

Migrates experiment files from the current user to another user. It takes mapping information for the new user from the config files.

Parameters:
  • experiment_id – experiment identifier:
  • pickup
  • offer
  • only_remote
static monitor(expid, file_format, lst, filter_chunks, filter_status, filter_section, hide, txt_only=False, group_by=None, expand='', expand_status=[], hide_groups=False, notransitive=False, check_wrapper=False, txt_logfiles=False, detail=False)

Plots workflow graph for a given experiment with status of each job coded by node color. Plot is created in experiment’s plot folder with name <expid>_<date>_<time>.<file_format>

Parameters:
  • expid (str) – identifier of the experiment to plot
  • file_format (str) – plot’s file format. It can be pdf, png, ps or svg
  • lst (str) – list of jobs to change status
  • filter_chunks (str) – chunks to change status
  • filter_status (str) – current status of the jobs to change status
  • filter_section (str) – sections to change status
  • hide (bool) – hides plot window
  • txt_only (bool) – workflow will only be written as text
  • group_by – criteria to group the jobs in the plot
  • expand (str) – filtering of jobs for its visualization
  • expand_status (str) – filtering of jobs for its visualization
  • hide_groups (bool) – simplified workflow illustration by encapsulating the jobs
  • notransitive (bool) – some dependencies will be omitted to simplify the workflow
  • check_wrapper (bool) – shows a preview of how the wrappers will look
  • detail (bool) – better text format representation but more expensive
static parse_args()

Parse arguments given to an executable and start execution of command given

static pkl_fix(expid)

Tries to find a backup of the pkl file and restores it. Verifies that autosubmit is not running on this experiment.

Parameters:expid (str) – experiment identifier
Returns:
Return type:
static recovery(expid, noplot, save, all_jobs, hide, group_by=None, expand=[], expand_status=[], notransitive=False, no_recover_logs=False, detail=False, force=False)

Method to check all active jobs. If COMPLETED file is found, job status will be changed to COMPLETED, otherwise it will be set to WAITING. It will also update the jobs list.

Parameters:
  • expid (str) – identifier of the experiment to recover
  • save (bool) – If true, recovery saves changes to the jobs list
  • all_jobs (bool) – if True, it tries to get completed files for all jobs, not only active.
  • hide (bool) – hides plot window
  • force (bool) – Allows to restore the workflow even if there are running jobs
static refresh(expid, model_conf, jobs_conf)

Refresh project folder for given experiment

Parameters:
  • model_conf (bool) –
  • jobs_conf (bool) –
  • expid (str) – experiment identifier
static report(expid, template_file_path='', show_all_parameters=False, folder_path='', placeholders=False)

Show report for specified experiment

Parameters:
  • expid (str) – experiment identifier
  • template_file_path (str) – path to template file
  • show_all_parameters (bool) – show all parameters
  • folder_path (str) – path to folder
  • placeholders (bool) – show placeholders

static rerun_recovery(expid, job_list, rerun_list, as_conf)

Method to check all active jobs. If COMPLETED file is found, job status will be changed to COMPLETED, otherwise it will be set to WAITING. It will also update the jobs list.

Parameters:
  • expid (str) – identifier of the experiment to recover
  • job_list (JobList) – job list to update
  • rerun_list (list) – list of jobs to rerun
  • as_conf (AutosubmitConfig) – AutosubmitConfig object
Returns:

static run_experiment(expid, notransitive=False, update_version=False, start_time=None, start_after=None, run_members=None)

Runs an experiment (submitting all the jobs properly and repeating its execution in case of failure).

Parameters:expid (str) – identifier of experiment to be run
Returns:True if run to the end, False otherwise
Return type:bool
static set_status(expid, noplot, save, final, lst, filter_chunks, filter_status, filter_section, filter_type_chunk, hide, group_by=None, expand=[], expand_status=[], notransitive=False, check_wrapper=False, detail=False)

Set status

Parameters:
  • expid (str) – experiment identifier
  • save (bool) – if true, saves the new jobs list
  • final (str) – status to set on jobs
  • lst (str) – list of jobs to change status
  • filter_chunks (str) – chunks to change status
  • filter_status (str) – current status of the jobs to change status
  • filter_section (str) – sections to change status
  • hide (bool) – hides plot window
static statistics(expid, filter_type, filter_period, file_format, hide, notransitive=False)

Plots statistics graph for a given experiment. Plot is created in experiment’s plot folder with name <expid>_<date>_<time>.<file_format>

Parameters:
  • expid (str) – identifier of the experiment to plot
  • filter_type – type of the jobs to plot
  • filter_period – period to plot
  • file_format (str) – plot’s file format. It can be pdf, png, ps or svg
  • hide (bool) – hides plot window
  • notransitive – Reduces workflow linkage complexity
static submit_ready_jobs(as_conf, job_list, platforms_to_test, packages_persistence, inspect=False, only_wrappers=False, hold=False)

Gets READY jobs and send them to the platforms if there is available space on the queues

Parameters:
  • as_conf (AutosubmitConfig object) – autosubmit config object
  • job_list (JobList object) – job list to check
  • platforms_to_test (set of Platform Objects, e.g. SgePlatform(), LsfPlatform()) – platforms used
  • packages_persistence (JobPackagePersistence object) – Handles database per experiment.
  • inspect (Boolean) – True if coming from generate_scripts_andor_wrappers().
  • only_wrappers (Boolean) – True if it comes from create -cw, False if it comes from inspect -cw.
Returns:

True if at least one job was submitted, False otherwise

Return type:

Boolean

static test(expid, chunks, member=None, start_date=None, hpc=None, branch=None)

Method to conduct a test for a given experiment. It creates a new experiment for a given experiment with a given number of chunks with a random start date and a random member to be run on a random HPC.

Parameters:
  • expid (str) – experiment identifier
  • chunks (int) – number of chunks to be run by the experiment
  • member (str) – member to be used by the test. If None, it uses a random one from which are defined on the experiment.
  • start_date (str) – start date to be used by the test. If None, it uses a random one from which are defined on the experiment.
  • hpc (str) – HPC to be used by the test. If None, it uses a random one from which are defined on the experiment.
  • branch (str) – branch or revision to be used by the test. If None, it uses configured branch.
Returns:

True if test was successful, False otherwise

Return type:

bool

static testcase(copy_id, description, chunks=None, member=None, start_date=None, hpc=None, branch=None)

Method to create a test case. It creates a new experiment whose id starts by ‘t’.

Parameters:
  • copy_id (str) – experiment identifier
  • description (str) – test case experiment description
  • chunks (int) – number of chunks to be run by the experiment. If None, it uses configured chunk(s).
  • member (str) – member to be used by the test. If None, it uses configured member(s).
  • start_date (str) – start date to be used by the test. If None, it uses configured start date(s).
  • hpc (str) – HPC to be used by the test. If None, it uses configured HPC.
  • branch (str) – branch or revision to be used by the test. If None, it uses configured branch.
Returns:

test case id

Return type:

str

static unarchive(experiment_id, uncompressed=True)

Unarchives an experiment: uncompress folder from tar.gz and moves to experiments root folder

Parameters:
  • experiment_id (str) – experiment identifier
  • uncompressed (bool) – if True, the tar file is uncompressed
static update_version(expid)

Refresh experiment version with the current autosubmit version

Parameters:expid (str) – experiment identifier

class autosubmit.autosubmit.MyParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True)

Bases: argparse.ArgumentParser

add_argument(dest, ..., name=value, ...)

add_argument(option_string, option_string, …, name=value, …)

error(message: string)

Prints a usage message incorporating the message to stderr and exits.

If you override this in a subclass, it should not return – it should either exit or raise an exception.

autosubmit.autosubmit.signal_handler(signal_received, frame)

Used to handle interrupt signals, allowing autosubmit to clean before exit

Parameters:
  • signal_received
  • frame
autosubmit.autosubmit.signal_handler_create(signal_received, frame)

Used to handle KeyboardInterrupt signals while the create method is being executed

Parameters:
  • signal_received
  • frame

autosubmit.config

autosubmit.config.basicConfig

class autosubmit.config.basicConfig.BasicConfig

Bases: object

Class to manage configuration for Autosubmit path, database and default values for new experiments

static read()

Reads configuration from .autosubmitrc files: first from /etc, then from the user directory and last from the current path.

autosubmit.config.config_common

class autosubmit.config.config_common.AutosubmitConfig(expid, basic_config, parser_factory)

Bases: object

Class to handle experiment configuration coming from file or database

Parameters:expid (str) – experiment identifier
check_autosubmit_conf()

Checks experiment’s autosubmit configuration file.

Returns:True if everything is correct, False if it finds any error
Return type:bool
check_conf_files(running_time=False, first_load=True)

Checks configuration files (autosubmit, experiment jobs and platforms), looking for invalid values and missing required options. Prints the results in the log

Returns:True if everything is correct, False if it finds any error
Return type:bool
check_expdef_conf()

Checks experiment’s experiment configuration file.

Returns:True if everything is correct, False if it finds any error
Return type:bool
check_jobs_conf()

Checks experiment’s jobs configuration file.

Returns:True if everything is correct, False if it finds any error
Return type:bool
check_platforms_conf()

Checks experiment’s queues configuration file.

check_proj()

Checks project config file

Returns:True if everything is correct, False if it finds any error
Return type:bool
check_proj_file()

Adds a section header to the project’s configuration file (if it does not exist)

deep_normalize(data)

Normalizes a nested dictionary or similar mapping to uppercase. Modifies the source in place.

deep_parameters_export(data)

Exports all variables of this experiment. The resultant format will be Section.{subsection1…subsectionN} = Value. In other words, it flattens the dictionary into one level

deep_read_loops(data, for_keys=[], long_key='')

Update a nested dictionary or similar mapping. Modify source in place.

deep_update(unified_config, new_dict)

Update a nested dictionary or similar mapping. Modify source in place.

experiment_file

Returns experiment’s config file name

file_modified(file, prev_mod_time)

Function to check if a file has been modified. :param file: path :return: bool,new_time

get_chunk_ini(default=1)

Returns the first chunk from where the experiment will start

Parameters:default
Returns:initial chunk
Return type:int
get_chunk_size(default=1)

Chunk Size as defined in the expdef file.

Returns:Chunksize, 1 as default.
Return type:int
get_chunk_size_unit()

Unit for the chunk length

Returns:Unit for the chunk length Options: {hour, day, month, year}
Return type:str
get_communications_library()

Returns the communications library from autosubmit’s config file. Paramiko by default.

Returns:communications library
Return type:str
get_copy_remote_logs()

Returns if the user has enabled the logs local copy from autosubmit’s config file

Returns:if logs local copy
Return type:str
get_current_host(section)

Returns the user to be changed from platform config file.

Returns:migrate user to
Return type:str
get_current_project(section)

Returns the project to be changed from platform config file.

Returns:migrate user to
Return type:str
get_current_user(section)

Returns the user to be changed from platform config file.

Returns:migrate user to
Return type:str
get_custom_directives(section)

Gets custom directives needed for the given job type :param section: job type :type section: str :return: custom directives needed :rtype: str

get_date_list()

Returns startdates list from experiment’s config file

Returns:experiment’s startdates
Return type:list
get_default_job_type()

Returns the default job type from experiment’s config file

Returns:default type such as bash, python, r…
Return type:str
get_delay_retry_time()

Returns delay time from autosubmit’s config file

Returns:delay time
Return type:int
get_dependencies(section='None')

Returns dependencies list from jobs config file

Returns:experiment’s members
Return type:list
get_disable_recovery_threads(section)

Returns FALSE/TRUE :return: recovery_threads_option :rtype: str

get_export(section)

Gets command line for being submitted with :param section: job type :type section: str :return: wallclock time :rtype: str

get_extensible_wallclock(wrapper={})

Gets extend_wallclock for the given wrapper

Parameters:wrapper (dict) – wrapper
Returns:extend_wallclock
Return type:int
get_fetch_single_branch()

Returns fetch single branch from experiment’s config file Default is -single-branch :return: fetch_single_branch(Y/N) :rtype: str

get_file_jobs_conf()

Returns path to project config file from experiment config file

Returns:path to project config file
Return type:str
get_file_project_conf()

Returns path to project config file from experiment config file

Returns:path to project config file
Return type:str
get_full_config_as_json()

Return config as json object

get_git_project_branch()

Returns git branch from experiment’s config file

Returns:git branch
Return type:str
get_git_project_commit()

Returns git commit from experiment’s config file

Returns:git commit
Return type:str
get_git_project_origin()

Returns git origin from experiment config file

Returns:git origin
Return type:str
get_git_remote_project_root()

Returns remote machine ROOT PATH

Returns:remote machine root path
Return type:str
get_jobs_sections()

Returns the list of sections defined in the job’s config file

Returns:sections
Return type:list
get_local_project_path()

Gets path to origin for local project

Returns:path to local project
Return type:str
get_mails_to()

Returns the address where notifications will be sent from autosubmit’s config file

Returns:mail address
Return type:[str]
get_max_processors()

Returns max processors from autosubmit’s config file

Return type:str
get_max_waiting_jobs()

Returns max number of waiting jobs from autosubmit’s config file

Returns:main platforms
Return type:int
get_max_wallclock()

Returns max wallclock

Return type:str
get_max_wrapped_jobs(wrapper={})

Returns the maximum number of jobs that can be wrapped together as configured in autosubmit’s config file

Returns:maximum number of jobs (or total jobs)
Return type:int
get_max_wrapped_jobs_horizontal(wrapper={})

Returns the maximum number of jobs that can be wrapped together as configured in autosubmit’s config file

Returns:maximum number of jobs (or total jobs)
Return type:int
get_max_wrapped_jobs_vertical(wrapper={})

Returns the maximum number of jobs that can be wrapped together as configured in autosubmit’s config file

Returns:maximum number of jobs (or total jobs)
Return type:int
get_member_list(run_only=False)

Returns members list from experiment’s config file

Returns:experiment’s members
Return type:list
get_memory(section)

Gets memory needed for the given job type :param section: job type :type section: str :return: memory needed :rtype: str

get_memory_per_task(section)

Gets memory per task needed for the given job type :param section: job type :type section: str :return: memory per task needed :rtype: str

get_migrate_duplicate(section)

Returns the user to change to from platform config file.

Returns:migrate user to
Return type:str
get_migrate_host_to(section)

Returns the host to change to from platform config file.

Returns:host_to
Return type:str
get_migrate_project_to(section)

Returns the project to change to from platform config file.

Returns:migrate project to
Return type:str
get_migrate_user_to(section)

Returns the user to change to from platform config file.

Returns:migrate user to
Return type:str
get_min_wrapped_jobs(wrapper={})
Returns the minimum number of jobs that can be wrapped together as configured in autosubmit’s config file
Returns:minimum number of jobs (or total jobs)
Return type:int
get_min_wrapped_jobs_horizontal(wrapper={})

Returns the minimum number of jobs that can be wrapped together as configured in autosubmit’s config file

Returns:minimum number of jobs (or total jobs)
Return type:int
get_min_wrapped_jobs_vertical(wrapper={})

Returns the minimum number of jobs that can be wrapped together as configured in autosubmit’s config file

Returns:minimum number of jobs (or total jobs)
Return type:int
get_notifications()

Returns if the user has enabled the notifications from autosubmit’s config file

Returns:if notifications
Return type:string
get_notifications_crash()

Returns if the user has enabled the notifications from autosubmit’s config file

Returns:if notifications
Return type:string
get_num_chunks()

Returns number of chunks to run for each member

Returns:number of chunks
Return type:int
get_output_type()

Returns default output type, pdf if none

Returns:output type
Return type:string
get_parse_two_step_start()

Returns two-step start jobs

Returns:jobs_list
Return type:str
static get_parser(parser_factory, file_path)

Gets parser for given file

Parameters:
  • parser_factory
  • file_path (Path) – path to file to be parsed
Returns:

parser

Return type:

YAMLParser

get_platform()

Returns main platforms from experiment’s config file

Returns:main platforms
Return type:str
get_processors(section)

Gets processors needed for the given job type :param section: job type :type section: str :return: number of processors :rtype: str

get_project_destination()

Returns the project destination from the experiment’s config file

Returns:project destination
Return type:str
get_project_dir()

Returns experiment’s project directory

Returns:experiment’s project directory
Return type:str
get_project_type()

Returns project type from experiment config file

Returns:project type
Return type:str
get_remote_dependencies()

Returns if the user has enabled the PRESUBMISSION configuration parameter from autosubmit’s config file

Returns:if remote dependencies
Return type:string
get_rerun()

Returns the rerun option from the experiment’s config file

Returns:rerun value
Return type:bool
get_rerun_jobs()

Returns rerun jobs

Returns:jobs_list
Return type:str
get_retrials()

Returns max number of retrials for job from autosubmit’s config file

Returns:number of retrials
Return type:int
get_safetysleeptime()

Returns safety sleep time from autosubmit’s config file

Returns:safety sleep time
Return type:int
get_scratch_free_space(section)

Gets scratch free space needed for the given job type :param section: job type :type section: str :return: percentage of scratch free space needed :rtype: int

get_section(section, d_value='', must_exists=False)

Gets any section if it exists within the dictionary, else returns None or error if must exist. :param section: section to get :type section: list :param d_value: default value to return if section does not exist :type d_value: str :param must_exists: if true, error is raised if section does not exist :type must_exists: bool :return: section value :rtype: str

get_storage_type()

Returns the storage system from autosubmit’s config file. Pkl by default.

Returns:communications library
Return type:str
get_submodules_list()

Returns submodules list from experiment’s config file Default is –recursive :return: submodules to load :rtype: list

get_svn_project_revision()

Get revision for subversion project

Returns:revision for subversion project
Return type:str
get_svn_project_url()

Gets subversion project url

Returns:subversion project url
Return type:str
get_synchronize(section)

Gets the synchronize value for the given job type :param section: job type :type section: str :return: synchronize value :rtype: str

get_tasks(section)

Gets tasks needed for the given job type :param section: job type :type section: str :return: tasks (processes) per host :rtype: str

get_threads(section)

Gets threads needed for the given job type :param section: job type :type section: str :return: threads needed :rtype: str

get_total_jobs()

Returns max number of running jobs from autosubmit’s config file

Returns:max number of running jobs
Return type:int
get_version()

Returns version number of the current experiment from autosubmit’s config file

Returns:version
Return type:str
get_wallclock(section)

Gets wallclock for the given job type :param section: job type :type section: str :return: wallclock time :rtype: str

get_wchunkinc(section)

Gets the chunk increase to wallclock :param section: job type :type section: str :return: wallclock increase per chunk :rtype: str

get_wrapper_check_time(wrapper=None)

Returns time to check the status of jobs in the wrapper

Returns:wrapper check time
Return type:int
get_wrapper_export(wrapper={})

Returns modules variable from wrapper

Returns:string
Return type:string
get_wrapper_jobs(wrapper=None)

Returns the jobs that should be wrapped, configured in the autosubmit’s config

Returns:expression (or none)
Return type:string
get_wrapper_machinefiles(wrapper={})

Returns the strategy for creating the machinefiles in wrapper jobs

Returns:machinefiles function to use
Return type:string
get_wrapper_method(wrapper={})

Returns the method of make the wrapper

Returns:method
Return type:string
get_wrapper_policy(wrapper={})

Returns what kind of policy (flexible, strict, mixed ) the user has configured in the autosubmit’s config

Returns:wrapper type (or none)
Return type:string
get_wrapper_queue(wrapper={})

Returns the wrapper queue; if not defined, it will be the queue of the first wrapped job

Returns:expression (or none)
Return type:string
get_wrapper_retrials(wrapper={})

Returns max number of retrials for job from autosubmit’s config file

Returns:number of retrials
Return type:int
get_wrapper_type(wrapper={})

Returns what kind of wrapper (VERTICAL, MIXED-VERTICAL, HORIZONTAL, HYBRID, MULTI NONE) the user has configured in the autosubmit’s config

Returns:wrapper type (or none)
Return type:string
get_wrappers()

Returns the jobs that should be wrapped, configured in the autosubmit’s config

Returns:expression
Return type:dict
get_x11(section)

Active X11 for this section :param section: job type :type section: str :return: false/true :rtype: str

get_x11_jobs()

Returns the jobs that should support x11, configured in the autosubmit’s config

Returns:expression (or none)
Return type:string
jobs_file

Returns project’s jobs file name

load_parameters()

Load all experiment data :return: a dictionary containing tuples [parameter_name, parameter_value] :rtype: dict

load_platform_parameters()

Load parameters from platform config files.

Returns:a dictionary containing tuples [parameter_name, parameter_value]
Return type:dict
load_section_parameters(job_list, as_conf, submitter)

Load parameters from job config files.

Returns:a dictionary containing tuples [parameter_name, parameter_value]
Return type:dict
normalize_variables(data)

Normalizes the format of some in-memory variables (currently only dependencies).

platforms_file

Returns experiment’s platforms config file name

Returns:platforms config file’s name
Return type:str
platforms_parser

Returns experiment’s platforms parser object

Returns:platforms config parser object
Return type:SafeConfigParser
project_file

Returns project’s config file name

reload(first_load=False)

Creates parser objects for configuration files

set_expid(exp_id)

Set experiment identifier in autosubmit and experiment config files

Parameters:exp_id (str) – experiment identifier to store
set_git_project_commit(as_conf)

Function to register in the configuration the commit SHA of the git project version. :param as_conf: Configuration class for experiment :type as_conf: AutosubmitConfig

set_new_host(section, new_host)

Sets new host for given platform :param new_host: :param section: platform name :type: str

set_new_project(section, new_project)

Sets new project for given platform :param new_project: :param section: platform name :type: str

set_new_user(section, new_user)

Sets new user for given platform :param new_user: :param section: platform name :type: str

set_platform(hpc)

Sets main platforms in experiment’s config file

Parameters:hpc – main platforms
Type:str
set_safetysleeptime(sleep_time)

Sets autosubmit’s safety sleep time in autosubmit’s config file

Parameters:sleep_time (int) – value to set
set_version(autosubmit_version)

Sets autosubmit’s version in autosubmit’s config file

Parameters:autosubmit_version (str) – autosubmit’s version
unify_conf()

Unifies all configuration files into a single dictionary. Custom files will be able to override the default configuration.

autosubmit.database

Module containing functions to manage autosubmit’s database.

exception autosubmit.database.db_common.DbException(message)

Exception class for database errors

autosubmit.database.db_common.check_db()

Checks if the database file exists

Returns:None if exists, terminates program if not
autosubmit.database.db_common.check_experiment_exists(name, error_on_inexistence=True)

Checks if an experiment with the given name exists. Anti-lock version.

Parameters:
  • error_on_inexistence (bool) – if True, adds an error log if experiment does not exist
  • name (str) – Experiment name
Returns:

If experiment exists returns true, if not returns false

Return type:

bool

autosubmit.database.db_common.close_conn(conn, cursor)

Commits changes and close connection to database

Parameters:
  • conn (sqlite3.Connection) – connection to close
  • cursor (sqlite3.Cursor) – cursor to close
autosubmit.database.db_common.create_db(qry)

Creates a new database for autosubmit

Parameters:qry (str) – query to create the new database
autosubmit.database.db_common.delete_experiment(experiment_id)

Removes experiment from database. Anti-lock version.

Parameters:experiment_id (str) – experiment identifier
Returns:True if delete is successful
Return type:bool
autosubmit.database.db_common.get_autosubmit_version(expid)

Get the minimum autosubmit version needed for the experiment. Anti-lock version.

Parameters:expid (str) – Experiment name
Returns:If experiment exists returns the autosubmit version for it, if not returns None
Return type:str
autosubmit.database.db_common.last_name_used(test=False, operational=False)

Gets last experiment identifier used. Anti-lock version.

Parameters:
  • test (bool) – flag for test experiments
  • operational – flag for operational experiments
Returns:

last experiment identifier used, ‘empty’ if there is none

Return type:

str

autosubmit.database.db_common.open_conn(check_version=True)

Opens a connection to database

Parameters:check_version (bool) – If true, check if the database is compatible with this autosubmit version
Returns:connection object, cursor object
Return type:sqlite3.Connection, sqlite3.Cursor
autosubmit.database.db_common.save_experiment(name, description, version)

Stores experiment in database. Anti-lock version.

Parameters:
  • version (str) –
  • name (str) – experiment’s name
  • description (str) – experiment’s description
autosubmit.database.db_common.update_experiment_descrip_version(name, description=None, version=None)

Updates the experiment’s description and/or version. Anti-lock version.

Parameters:
  • name (str) – experiment name (expid)
  • description (str) – experiment new description
  • version (str) – experiment autosubmit version
Returns:

If the description has been updated, True; otherwise, False.

Return type:

bool

autosubmit.git

class autosubmit.git.autosubmit_git.AutosubmitGit(expid)

Class to handle experiment git repository

Parameters:expid (str) – experiment identifier
static check_commit(as_conf)

Function to check uncommitted changes

Parameters:as_conf (autosubmit.config.AutosubmitConfig) – experiment configuration
static clean_git(as_conf)

Function to clean space on BasicConfig.LOCAL_ROOT_DIR/git directory.

Parameters:as_conf (autosubmit.config.AutosubmitConfig) – experiment configuration
static clone_repository(as_conf, force, hpcarch)

Clones a specified git repository on the project folder

Parameters:
  • as_conf (autosubmit.config.AutosubmitConfig) – experiment configuration
  • force (bool) – if True, it will overwrite any existing clone
  • hpcarch – current main platform
Returns:

True if clone was successful, False otherwise

autosubmit.job

Main module for Autosubmit. Only contains an interface class to all functionality implemented on Autosubmit

class autosubmit.job.job.Job(name, job_id, status, priority)

Class to handle all the tasks with Jobs at HPC. A job is created by default with a name, a jobid, a status and a type. It can have children and parents. The inheritance reflects the dependency between jobs. If Job2 must wait until Job1 is completed then Job2 is a child of Job1. Inversely Job1 is a parent of Job2

Parameters:
  • name (str) – job’s name
  • job_id (int) – job’s id
  • status (Status) – job initial status
  • priority (int) – job’s priority
add_edge_info(parent_name, special_variables)

Adds edge information to the job

Parameters:
  • parent_name (str) – parent name
  • special_variables (dict) – special variables
add_parent(*parents)

Add parents for the job. It also adds current job as a child for all the new parents

Parameters:parents (*Job) – job’s parents to add
check_completion(default_status=-1, over_wallclock=False)

Check the presence of COMPLETED file. Change status to COMPLETED if COMPLETED file exists and to FAILED otherwise. :param default_status: status to set if job is not completed. By default, is FAILED :type default_status: Status

check_end_time()

Returns end time from stat file

Returns:date and time
Return type:str
check_retrials_end_time()

Returns list of end datetime for retrials from total stats file

Returns:date and time
Return type:list[int]
check_retrials_start_time()

Returns list of start datetime for retrials from total stats file

Returns:date and time
Return type:list[int]
check_retrials_submit_time()

Returns list of submit datetime for retrials from total stats file

Returns:date and time
Return type:list[int]
check_running_after(date_limit)

Checks if the job was running after the given date :param date_limit: reference date :type date_limit: datetime.datetime :return: True if job was running after the given date, false otherwise :rtype: bool

check_script(as_conf, parameters, show_logs=False)

Checks if script is well-formed

Parameters:
  • parameters (dict) – script parameters
  • as_conf (AutosubmitConfig) – configuration file
  • show_logs (Bool) – Display output
Returns:

true if not problem has been detected, false otherwise

Return type:

bool

check_start_time()

Returns job’s start time

Returns:start time
Return type:str
check_started_after(date_limit)

Checks if the job started after the given date :param date_limit: reference date :type date_limit: datetime.datetime :return: True if job started after the given date, false otherwise :rtype: bool

children

Returns a list containing all children of the job

Returns:child jobs
Return type:set
children_names_str

Comma separated list of children’s names

compare_by_id(other)

Compare jobs by ID

Parameters:other (Job) – job to compare
Returns:comparison result
Return type:bool
compare_by_name(other)

Compare jobs by name

Parameters:other (Job) – job to compare
Returns:comparison result
Return type:bool
compare_by_status(other)

Compare jobs by status value

Parameters:other (Job) – job to compare
Returns:comparison result
Return type:bool
create_script(as_conf)

Creates script file to be run for the job

Parameters:as_conf (AutosubmitConfig) – configuration object
Returns:script’s filename
Return type:str
delete_child(child)

Removes a child from the job

Parameters:child (Job) – child to remove
delete_parent(parent)

Remove a parent from the job

Parameters:parent (Job) – parent to remove
get_last_retrials()

Returns the retrials of a job, including the last COMPLETED run. The selection stops at, and does not include, the previous COMPLETED run, or when the list of registers is exhausted.

Returns:list of dates of retrial [submit, start, finish] in datetime format
Return type:list of list
has_children()

Returns true if job has any children, else return false

Returns:true if job has any children, otherwise return false
Return type:bool
has_parents()

Returns true if job has any parents, else return false

Returns:true if job has any parent, otherwise return false
Return type:bool
inc_fail_count()

Increments fail count

static is_a_completed_retrial(fields)

Returns true only if there are 4 fields: submit start finish status, and status equals COMPLETED.

is_ancestor(job)

Check if the given job is an ancestor :param job: job to be checked if is an ancestor :return: True if job is an ancestor, false otherwise :rtype bool

is_over_wallclock(start_time, wallclock)

Check if the job is over the wallclock time, it is an alternative method to avoid platform issues :param start_time: :param wallclock: :return:

is_parent(job)

Check if the given job is a parent :param job: job to be checked if is a parent :return: True if job is a parent, false otherwise :rtype bool

log_job()

Prints job information in log

long_name

Job’s long name. If not set, returns the name

Returns:long name
Return type:str
parents

Returns parent jobs list

Returns:parent jobs
Return type:set
platform

Returns the platform to be used by the job. Chooses between serial and parallel platforms

:return HPCPlatform object for the job to use :rtype: HPCPlatform

print_job()

Prints debug information about the job

print_parameters()

Prints job parameters in the log

queue

Returns the queue to be used by the job. Chooses between the serial and parallel queue

:return: queue for the job to use :rtype: str

remove_redundant_parents()

Checks if a parent is also an ancestor, if true, removes the link in both directions. Useful to remove redundant dependencies.
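The idea behind redundant-dependency removal can be sketched with plain dictionaries (a simplified stand-in for Job objects, not Autosubmit's implementation): a direct parent link is dropped when that parent is also reachable through another parent.

def reachable(parents_of, start, target):
    # Depth-first search over the parents graph.
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents_of.get(node, ()))
    return False

def prune_redundant(parents_of):
    for job, parents in parents_of.items():
        for parent in list(parents):
            others = [p for p in parents if p != parent]
            if any(reachable(parents_of, other, parent) for other in others):
                parents.discard(parent)

deps = {"C": {"A", "B"}, "B": {"A"}, "A": set()}
prune_redundant(deps)
print(deps)  # C only keeps B, since A is already an ancestor through B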

status_str

String representation of the current status

total_processors

Number of processors requested by job. Reduces ‘:’ separated format if necessary.

update_content(as_conf)

Create the script content to be run for the job

Parameters:as_conf (config) – config
Returns:script code
Return type:str
update_parameters(as_conf, parameters, default_parameters={'M': '%M%', 'M_': '%M_%', 'Y': '%Y%', 'Y_': '%Y_%', 'd': '%d%', 'd_': '%d_%', 'm': '%m%', 'm_': '%m_%'})

Refresh parameters value

Parameters:
  • default_parameters (dict) –
  • as_conf (AutosubmitConfig) –
  • parameters (dict) –
update_status(as_conf, failed_file=False)

Updates job status, checking COMPLETED file if needed

Parameters:
  • copy_remote_logs – boolean, if True, copies remote logs to local
  • failed_file – boolean, if True, checks if the job failed
Returns:

write_end_time(completed, enabled=False)

Writes ends date and time to TOTAL_STATS file :param completed: True if job was completed successfully, False otherwise :type completed: bool

write_start_time(enabled=False)

Writes start date and time to TOTAL_STATS file :return: True if successful, False otherwise :rtype: bool

write_submit_time(enabled=False, hold=False)

Writes submit date and time to TOTAL_STATS file. It doesn’t write if hold == True.

write_total_stat_by_retries(total_stats, first_retrial=False)

Writes all data to TOTAL_STATS file :param total_stats: data gathered by the wrapper :type total_stats: dict :param first_retrial: True if this is the first retry, False otherwise :type first_retrial: bool

class autosubmit.job.job.WrapperJob(name, job_id, status, priority, job_list, total_wallclock, num_processors, platform, as_config, hold)

Defines a wrapper from a package.

Calls Job constructor.

Parameters:
  • name (String) – Name of the Package
  • job_id (Integer) – ID of the first Job of the package
  • status (String) – ‘READY’ when coming from submit_ready_jobs()
  • priority (Integer) – 0 when coming from submit_ready_jobs()
  • job_list (List() of Job() objects) – List of jobs in the package
  • total_wallclock (String Formatted) – Wallclock of the package
  • num_processors (Integer) – Number of processors for the package
  • platform (Platform Object. e.g. EcPlatform()) – Platform object defined for the package
  • as_config (AutosubmitConfig object) – Autosubmit basic configuration object
class autosubmit.job.job_common.StatisticsSnippetBash

Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs

class autosubmit.job.job_common.StatisticsSnippetEmpty

Class to handle the statistics snippet of a job. It contains header and footer for local and remote jobs

class autosubmit.job.job_common.StatisticsSnippetPython(version='3')

Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs

class autosubmit.job.job_common.StatisticsSnippetR

Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs

class autosubmit.job.job_common.Status

Class to handle the status of a job

class autosubmit.job.job_common.Type

Class to handle the type of a job

autosubmit.job.job_common.increase_wallclock_by_chunk(current, increase, chunk)

Receives the wallclock time and increases it according to a quantity times the number of the current chunk. The result cannot be larger than 48:00. If chunk = 0 then there is no increment.

Parameters:
  • current (str) – WALLCLOCK HH:MM
  • increase (str) – WCHUNKINC HH:MM
  • chunk (int) – chunk number
Returns:

HH:MM wallclock

Return type:

str
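A worked sketch of the arithmetic described above, assuming the increment is applied once per chunk and capped at 48:00 (the exact formula used internally may differ):

def increase_wallclock_by_chunk(current, increase, chunk):
    def to_minutes(hhmm):
        hours, minutes = map(int, hhmm.split(":"))
        return hours * 60 + minutes
    if chunk == 0:
        return current
    total = min(to_minutes(current) + to_minutes(increase) * chunk, 48 * 60)
    return "%02d:%02d" % (total // 60, total % 60)

print(increase_wallclock_by_chunk("02:00", "00:30", 3))  # 03:30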

autosubmit.job.job_common.parse_output_number(string_number)

Parses number in format 1.0K 1.0M 1.0G

Parameters:string_number (str) – String representation of number
Returns:number in float format
Return type:float
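A minimal sketch of the described parsing; the decimal multipliers are an assumption made for illustration:

def parse_output_number(string_number):
    multipliers = {"K": 1e3, "M": 1e6, "G": 1e9}
    suffix = string_number[-1].upper()
    if suffix in multipliers:
        return float(string_number[:-1]) * multipliers[suffix]
    return float(string_number)

print(parse_output_number("1.5K"))  # 1500.0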
class autosubmit.job.job_list.JobList(expid, config, parser_factory, job_list_persistence, as_conf)

Class to manage the list of jobs to be run by autosubmit

add_logs(logs)

Adds logs to the current job_list :return: logs :rtype: dict(tuple)

backup_load()

Recreates a stored job list from the persistence

Returns:loaded job list object
Return type:JobList
backup_save()

Persists the job list

check_scripts(as_conf)

When we have created the scripts, all parameters should have been substituted. %PARAMETER% placeholders are not allowed

Parameters:as_conf (AutosubmitConfig) – experiment configuration
expid

Returns the experiment identifier

Returns:experiment’s identifier
Return type:str
generate(date_list, member_list, num_chunks, chunk_ini, parameters, date_format, default_retrials, default_job_type, wrapper_type=None, wrapper_jobs={}, new=True, notransitive=False, update_structure=False, run_only_members=[], show_log=True, jobs_data={}, as_conf='')

Creates all jobs needed for the current workflow

Parameters:
  • default_job_type (str) – default type for jobs
  • date_list (list) – start dates
  • member_list (list) – members
  • num_chunks (int) – number of chunks to run
  • chunk_ini (int) – the experiment will start by the given chunk
  • parameters (dict) – experiment parameters
  • date_format (str) – option to format dates
  • default_retrials (int) – default retrials for each job
  • new (bool) – is it a new generation?
  • wrapper_type – Type of wrapper defined by the user in autosubmit_.yml [wrapper] section.
  • wrapper_jobs (String) – Job types defined in autosubmit_.yml [wrapper sections] to be wrapped.
get_active(platform=None, wrapper=False)

Returns a list of active jobs (In platforms queue + Ready)

Parameters:platform (HPCPlatform) – job platform
Returns:active jobs
Return type:list
get_all(platform=None, wrapper=False)

Returns a list of all jobs

Parameters:platform (HPCPlatform) – job platform
Returns:all jobs
Return type:list
get_chunk_list()

Get inner chunk list

Returns:chunk list
Return type:list
get_completed(platform=None, wrapper=False)

Returns a list of completed jobs

Parameters:platform (HPCPlatform) – job platform
Returns:completed jobs
Return type:list
get_date_list()

Get inner date list

Returns:date list
Return type:list
get_delayed(platform=None)

Returns a list of delayed jobs

Parameters:platform (HPCPlatform) – job platform
Returns:delayed jobs
Return type:list
get_failed(platform=None, wrapper=False)

Returns a list of failed jobs

Parameters:platform (HPCPlatform) – job platform
Returns:failed jobs
Return type:list
get_finished(platform=None, wrapper=False)

Returns a list of jobs finished (Completed, Failed)

Parameters:platform (HPCPlatform) – job platform
Returns:finished jobs
Return type:list
get_held_jobs(platform=None)

Returns a list of jobs in the platforms (Held)

Parameters:platform (HPCPlatform) – job platform
Returns:jobs in platforms
Return type:list
get_in_queue(platform=None, wrapper=False)

Returns a list of jobs in the platforms (Submitted, Running, Queuing, Unknown,Held)

Parameters:platform (HPCPlatform) – job platform
Returns:jobs in platforms
Return type:list
get_job_by_name(name)

Returns the job whose name matches the given name

Parameters:name (str) – name to look for
Returns:found job
Return type:job
get_job_list()

Get inner job list

Returns:job list
Return type:list
get_job_names(lower_case=False)

Returns a list of all job names :param: lower_case: if true, returns lower case job names :type: lower_case: bool

Returns:all job names
Return type:list
Parameters:
  • select_jobs_by_name – job name
  • select_all_jobs_by_section – section name
  • filter_jobs_by_section – section, date , member? , chunk?
Returns:

jobs_list names

Return type:

list

get_jobs_by_section(section_list)

Returns the jobs whose section matches one of the given sections :param section_list: list of sections to look for :type section_list: list :return: found jobs :rtype: list

get_logs()

Returns a dict of logs keyed by job name

Returns:logs
Return type:dict(tuple)
get_member_list()

Get inner member list

Returns:member list
Return type:list
get_not_in_queue(platform=None, wrapper=False)

Returns a list of jobs NOT in the platforms (Ready, Waiting)

Parameters:platform (HPCPlatform) – job platform
Returns:jobs not in platforms
Return type:list
get_ordered_jobs_by_date_member(section)

Get the dictionary of jobs ordered according to wrapper’s expression divided by date and member

Returns:jobs ordered divided by date and member
Return type:dict
get_prepared(platform=None)

Returns a list of prepared jobs

Parameters:platform (HPCPlatform) – job platform
Returns:prepared jobs
Return type:list
get_queuing(platform=None, wrapper=False)

Returns a list of jobs queuing

Parameters:platform (HPCPlatform) – job platform
Returns:queued jobs
Return type:list
get_ready(platform=None, hold=False, wrapper=False)

Returns a list of ready jobs

Parameters:platform (HPCPlatform) – job platform
Returns:ready jobs
Return type:list
get_running(platform=None, wrapper=False)

Returns a list of jobs running

Parameters:platform (HPCPlatform) – job platform
Returns:running jobs
Return type:list
get_skipped(platform=None)

Returns a list of skipped jobs

Parameters:platform (HPCPlatform) – job platform
Returns:skipped jobs
Return type:list
get_submitted(platform=None, hold=False, wrapper=False)

Returns a list of submitted jobs

Parameters:platform (HPCPlatform) – job platform
Returns:submitted jobs
Return type:list
get_suspended(platform=None, wrapper=False)

Returns a list of suspended jobs

Parameters:platform (HPCPlatform) – job platform
Returns:suspended jobs
Return type:list
get_uncompleted(platform=None, wrapper=False)

Returns a list of uncompleted jobs

Parameters:platform (HPCPlatform) – job platform
Returns:uncompleted jobs
Return type:list
get_uncompleted_and_not_waiting(platform=None, wrapper=False)

Returns a list of uncompleted jobs that are not waiting

Parameters:platform (HPCPlatform) – job platform
Returns:uncompleted and not waiting jobs
Return type:list
get_unknown(platform=None, wrapper=False)

Returns a list of jobs on unknown state

Parameters:platform (HPCPlatform) – job platform
Returns:unknown state jobs
Return type:list
get_unsubmitted(platform=None, wrapper=False)

Returns a list of unsubmitted jobs

Parameters:platform (HPCPlatform) – job platform
Returns:unsubmitted jobs
Return type:list
get_waiting(platform=None, wrapper=False)

Returns a list of jobs waiting

Parameters:platform (HPCPlatform) – job platform
Returns:waiting jobs
Return type:list
get_waiting_remote_dependencies(platform_type='slurm')

Returns a list of jobs waiting on slurm scheduler :param platform_type: platform type :type platform_type: str :return: waiting jobs :rtype: list

graph

Returns the graph

Returns:graph
Return type:networkx graph
load()

Recreates a stored job list from the persistence

Returns:loaded job list object
Return type:JobList
static load_file(filename)

Recreates a stored joblist from the pickle file

Parameters:filename (str) – pickle file to load
Returns:loaded joblist object
Return type:JobList
parameters

List of parameters common to all jobs :return: parameters :rtype: dict

print_with_status(statusChange=None, nocolor=False, existingList=None)

Returns the string representation of the dependency tree of the Job List

Parameters:
  • statusChange (List of strings) – List of changes in the list, supplied in set status
  • nocolor (Boolean) – True if the result should not include color codes
  • existingList (List of Job Objects) – External List of Jobs that will be printed, this excludes the inner list of jobs.
Returns:

String representation

Return type:

String

remove_rerun_only_jobs(notransitive=False)

Removes all jobs to be run only in reruns

rerun(job_list_unparsed, monitor=False)

Updates job list to rerun the jobs specified by a job list :param job_list_unparsed: list of jobs to rerun :type job_list_unparsed: list :param monitor: if True, the job list will be monitored :type monitor: bool

static retrieve_packages(BasicConfig, expid, current_jobs=None)

Retrieves dictionaries that map the collection of packages in the experiment

Parameters:
  • BasicConfig (Configuration Object) – Basic configuration
  • expid (String) – Experiment ID
  • current_jobs (list) – list of names of current jobs
Returns:

job to package, package to job, package to package_id, package to symbol

Return type:

Dictionary(Job Object, Package), Dictionary(Package, List of Job Objects), Dictionary(String, String), Dictionary(String, String)

static retrieve_times(status_code, name, tmp_path, make_exception=False, job_times=None, seconds=False, job_data_collection=None)

Retrieve job timestamps from database.

Parameters:
  • status_code (Integer) – Code of the Status of the job
  • name (String) – Name of the job
  • tmp_path (String) – Path to the tmp folder of the experiment
  • make_exception (Boolean) – flag for testing purposes
  • job_times (Dictionary Key: job name, Value: 5-tuple (submit time, start time, finish time, status, detail id)) – Detail from as_times.job_times for the experiment
Returns:

minutes the job has been queuing, minutes the job has been running, and the text that represents it

Return type:

int, int, str

save()

Persists the job list

sort_by_id()

Returns a list of jobs sorted by id

Returns:jobs sorted by ID
Return type:list
sort_by_name()

Returns a list of jobs sorted by name

Returns:jobs sorted by name
Return type:list
sort_by_status()

Returns a list of jobs sorted by status

Returns:job sorted by status
Return type:list
sort_by_type()

Returns a list of jobs sorted by type

Returns:job sorted by type
Return type:list
update_from_file(store_change=True)

Updates the job list on the fly from an update file :param store_change: if True, renames the update file to avoid reloading it at the next iteration

update_genealogy(new=True, notransitive=False, update_structure=False)

When we have created the job list, every type of job is created. Update genealogy removes jobs that have no templates :param new: if it is a new job list or not :type new: bool

update_list(as_conf, store_change=True, fromSetStatus=False, submitter=None, first_time=False)

Updates job list, resetting failed jobs and changing to READY all WAITING jobs with all parents COMPLETED

Parameters:as_conf (AutosubmitConfig) – autosubmit config object
Returns:True if job status were modified, False otherwise
Return type:bool
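The core promotion rule (WAITING jobs become READY once all their parents are COMPLETED) can be illustrated with a simplified sketch using plain dictionaries instead of Job objects:

def promote_waiting_jobs(jobs):
    changed = False
    for job in jobs:
        if job["status"] == "WAITING" and all(
            parent["status"] == "COMPLETED" for parent in job["parents"]
        ):
            job["status"] = "READY"
            changed = True
    return changed

ini = {"name": "INI", "status": "COMPLETED", "parents": []}
sim = {"name": "SIM", "status": "WAITING", "parents": [ini]}
print(promote_waiting_jobs([ini, sim]), sim["status"])  # True READY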

autosubmit.monitor

class autosubmit.monitor.monitor.Monitor

Class to handle monitoring of Jobs at HPC.

static clean_plot(expid)

Function to clean space on BasicConfig.LOCAL_ROOT_DIR/plot directory. Removes all plots except last two.

Parameters:expid (str) – experiment’s identifier
static clean_stats(expid)

Function to clean space on BasicConfig.LOCAL_ROOT_DIR/plot directory. Removes all stats’ plots except last two.

Parameters:expid (str) – experiment’s identifier
static color_status(status)

Return color associated to given status

Parameters:status (Status) – status
Returns:color
Return type:str
create_tree_list(expid, joblist, packages, groups, hide_groups=False)

Create graph from joblist

Parameters:
  • expid (str) – experiment’s identifier
  • joblist (JobList) – joblist to plot
Returns:

created graph

Return type:

pydotplus.Dot

generate_output(expid, joblist, path, output_format='pdf', packages=None, show=False, groups={}, hide_groups=False, job_list_object=None)

Plots graph for joblist and stores it in a file

Parameters:
  • expid (str) – experiment’s identifier
  • joblist (List of Job objects) – list of jobs to plot
  • output_format (str (png, pdf, ps)) – file format for plot
  • show (bool) – if true, will open the new plot with the default viewer
  • job_list_object (JobList object) – Object that has the main txt generation method
generate_output_stats(expid, joblist, output_format='pdf', period_ini=None, period_fi=None, show=False, queue_time_fixes=None)

Plots stats for joblist and stores it in a file

Parameters:
  • expid (str) – experiment’s identifier
  • joblist (JobList) – joblist to plot
  • output_format (str (png, pdf, ps)) – file format for plot
  • period_ini (datetime) – initial datetime of filtered period
  • period_fi (datetime) – final datetime of filtered period
  • show (bool) – if true, will open the new plot with the default viewer
generate_output_txt(expid, joblist, path, classictxt=False, job_list_object=None)

Function that generates a representation of the jobs in a txt file :param expid: experiment’s identifier :type expid: str :param joblist: experiment’s list of jobs :type joblist: list :param job_list_object: Object that has the main txt generation method :type job_list_object: JobList object

static get_general_stats(expid)

Returns all the options in the sections of the %expid%_GENERAL_STATS. Options with values larger than GENERAL_STATS_OPTION_MAX_LENGTH characters are not added.

Parameters:expid (str) – experiment’s identifier
Returns:list of tuples (section, ‘’), (option, value), (option, value), (section, ‘’), (option, value), …
Return type:list

autosubmit.platform

class autosubmit.platforms.ecplatform.EcPlatform(expid, name, config, scheduler)

Bases: autosubmit.platforms.paramiko_platform.ParamikoPlatform

Class to manage queues with ecaccess

Parameters:
  • expid (str) – experiment’s identifier
  • scheduler (str (pbs, loadleveler)) – scheduler to use
check_Alljobs(job_list, as_conf, retries=5)

Checks jobs running status :param job_list: list of jobs :type job_list: list :param as_conf: autosubmit configuration :type as_conf: autosubmit.config.config.Config :param retries: retries :type retries: int :return: list of jobs with their status :rtype: list

connect()

In this case, it does nothing because connection is established for each command

Returns:True
Return type:bool
delete_file(filename)

Deletes a file from this platform

Parameters:filename (str) – file name
Returns:True if successful or the file does not exist
Return type:bool
get_checkjob_cmd(job_id)

Returns command to check job status on remote platforms

Parameters:
  • job_id – id of job to check
  • job_id – int
Returns:

command to check job status

Return type:

str

get_file(filename, must_exist=True, relative_path='', ignore_log=False, wrapper_failed=False)

Copies a file from the current platform to experiment’s tmp folder

Parameters:
  • filename (str) – file name
  • must_exist (bool) – If True, raises an exception if file can not be copied
  • relative_path (str) – path inside the tmp folder
Returns:

True if file is copied successfully, false otherwise

Return type:

bool

get_mkdir_cmd()

Gets command to create directories on HPC

Returns:command to create directories on HPC
Return type:str
get_ssh_output()

Gets output from last command executed

Returns:output from last command
Return type:str
get_submit_cmd(job_script, job, hold=False, export='')

Get command to add job to scheduler

Parameters:
  • job_type
  • job_script – path to job script
  • job_script – str
  • hold – submit a job in a held status
  • hold – boolean
  • export – modules that should’ve been downloaded
  • export – string
Returns:

command to submit job to platforms

Return type:

str

get_submitted_job_id(output, x11=False)

Parses submit command output to extract job id :param output: output to parse :type output: str :return: job id :rtype: str

jobs_in_queue()

Returns an empty list because ecaccess does not support this command

Returns:empty list
Return type:list
move_file(src, dest, must_exist=False)

Moves a file on the platform (includes .err and .out) :param src: source name :type src: str :param dest: destination name :type dest: str :param must_exist: ignore whether the file exists or not :type must_exist: bool

parse_job_output(output)

Parses check job command output, so it can be interpreted by autosubmit

Parameters:output (str) – output to parse
Returns:job status
Return type:str
restore_connection()

In this case, it does nothing because connection is established for each command

Returns:True
Return type:bool
send_command(command, ignore_log=False, x11=False)

Sends given command to HPC

Parameters:command (str) – command to send
Returns:True if executed, False if failed
Return type:bool
send_file(filename, check=True)

Sends a local file to the platform :param filename: name of the file to send :type filename: str

test_connection()

In this case, it does nothing because connection is established for each command

Returns:True
Return type:bool
update_cmds()

Updates commands for platforms

class autosubmit.platforms.lsfplatform.LsfPlatform(expid, name, config)

Bases: autosubmit.platforms.paramiko_platform.ParamikoPlatform

Class to manage jobs to host using LSF scheduler

Parameters:expid (str) – experiment’s identifier
check_Alljobs(job_list, as_conf, retries=5)

Checks jobs running status :param job_list: list of jobs :type job_list: list :param as_conf: autosubmit configuration :type as_conf: autosubmit.config.config.Config :param retries: retries :type retries: int :return: list of jobs with their status :rtype: list

get_checkjob_cmd(job_id)

Returns command to check job status on remote platforms

Parameters:
  • job_id – id of job to check
  • job_id – int
Returns:

command to check job status

Return type:

str

get_mkdir_cmd()

Gets command to create directories on HPC

Returns:command to create directories on HPC
Return type:str
get_submit_cmd(job_script, job, export='')

Get command to add job to scheduler

Parameters:
  • job_type
  • job_script – path to job script
  • job_script – str
  • hold – submit a job in a held status
  • hold – boolean
  • export – modules that should’ve been downloaded
  • export – string
Returns:

command to submit job to platforms

Return type:

str

get_submitted_job_id(output, x11=False)

Parses submit command output to extract job id :param output: output to parse :type output: str :return: job id :rtype: str

parse_job_output(output)

Parses check job command output, so it can be interpreted by autosubmit

Parameters:output (str) – output to parse
Returns:job status
Return type:str
update_cmds()

Updates commands for platforms

class autosubmit.platforms.pbsplatform.PBSPlatform(expid, name, config, version)

Bases: autosubmit.platforms.paramiko_platform.ParamikoPlatform

Class to manage jobs to host using PBS scheduler

Parameters:
  • expid (str) – experiment’s identifier
  • version (str) – scheduler version
check_Alljobs(job_list, as_conf, retries=5)

Checks jobs running status :param job_list: list of jobs :type job_list: list :param as_conf: autosubmit configuration :type as_conf: autosubmit.config.config.Config :param retries: retries :type retries: int :return: list of jobs with their status :rtype: list

get_checkjob_cmd(job_id)

Returns command to check job status on remote platforms

Parameters:
  • job_id – id of job to check
  • job_id – int
Returns:

command to check job status

Return type:

str

get_mkdir_cmd()

Gets command to create directories on HPC

Returns:command to create directories on HPC
Return type:str
get_submit_cmd(job_script, job, export='')

Get command to add job to scheduler

Parameters:
  • job_type
  • job_script – path to job script
  • job_script – str
  • hold – submit a job in a held status
  • hold – boolean
  • export – modules that should’ve been downloaded
  • export – string
Returns:

command to submit job to platforms

Return type:

str

get_submitted_job_id(output, x11=False)

Parses submit command output to extract job id :param output: output to parse :type output: str :return: job id :rtype: str

parse_job_output(output)

Parses check job command output, so it can be interpreted by autosubmit

Parameters:output (str) – output to parse
Returns:job status
Return type:str
update_cmds()

Updates commands for platforms

class autosubmit.platforms.sgeplatform.SgePlatform(expid, name, config)

Bases: autosubmit.platforms.paramiko_platform.ParamikoPlatform

Class to manage jobs to host using SGE scheduler

Parameters:expid (str) – experiment’s identifier
check_Alljobs(job_list, as_conf, retries=5)

Checks jobs running status :param job_list: list of jobs :type job_list: list :param as_conf: autosubmit configuration :type as_conf: autosubmit.config.config.Config :param retries: retries :type retries: int :return: list of jobs with their status :rtype: list

connect()

In this case, it does nothing because connection is established for each command

Returns:True
Return type:bool
get_checkjob_cmd(job_id)

Returns command to check job status on remote platforms

Parameters:
  • job_id – id of job to check
  • job_id – int
Returns:

command to check job status

Return type:

str

get_mkdir_cmd()

Gets command to create directories on HPC

Returns:command to create directories on HPC
Return type:str
get_submit_cmd(job_script, job, export='')

Get command to add job to scheduler

Parameters:
  • job_type
  • job_script – path to job script
  • job_script – str
  • hold – submit a job in a held status
  • hold – boolean
  • export – modules that should’ve been downloaded
  • export – string
Returns:

command to submit job to platforms

Return type:

str

get_submitted_job_id(output, x11=False)

Parses submit command output to extract job id :param output: output to parse :type output: str :return: job id :rtype: str

parse_job_output(output)

Parses check job command output, so it can be interpreted by autosubmit

Parameters:output (str) – output to parse
Returns:job status
Return type:str
restore_connection()

In this case, it does nothing because connection is established for each command

Returns:True
Return type:bool
test_connection()

In this case, it does nothing because connection is established for each command

Returns:True
Return type:bool
update_cmds()

Updates commands for platforms

class autosubmit.platforms.slurmplatform.SlurmPlatform(expid, name, config)

Bases: autosubmit.platforms.paramiko_platform.ParamikoPlatform

Class to manage jobs to host using SLURM scheduler

Parameters:expid (str) – experiment’s identifier
get_checkAlljobs_cmd(jobs_id)

Returns command to check jobs status on remote platforms

Parameters:
  • jobs_id – id of jobs to check
  • jobs_id – str
Returns:

command to check job status

Return type:

str

get_checkjob_cmd(job_id)

Returns command to check job status on remote platforms

Parameters:
  • job_id – id of job to check
  • job_id – int
Returns:

command to check job status

Return type:

str

get_mkdir_cmd()

Gets command to create directories on HPC

Returns:command to create directories on HPC
Return type:str
get_submit_cmd(job_script, job, hold=False, export='')

Get command to add job to scheduler

Parameters:
  • job_type
  • job_script – path to job script
  • job_script – str
  • hold – submit a job in a held status
  • hold – boolean
  • export – modules that should’ve been downloaded
  • export – string
Returns:

command to submit job to platforms

Return type:

str

get_submitted_job_id(outputlines, x11=False)

Parses submit command output to extract job id :param output: output to parse :type output: str :return: job id :rtype: str
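Slurm's sbatch typically answers with a line such as “Submitted batch job 123456”; a hedged sketch of extracting the id (the regex is an assumption for illustration, not the platform's actual parsing) could be:

import re

def extract_slurm_job_id(outputlines):
    match = re.search(r"Submitted batch job (\d+)", outputlines)
    return match.group(1) if match else None

print(extract_slurm_job_id("Submitted batch job 123456"))  # 123456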

open_submit_script()

Opens Submit script file

parse_Alljobs_output(output, job_id)

Parses check jobs command output, so it can be interpreted by autosubmit :param output: output to parse :param job_id: select the job to parse :type output: str :return: job status :rtype: str

parse_job_finish_data(output, packed)

Parses the output of the sacct query to SLURM for a single job. Only normal jobs return submit, start, finish, joules, ncpus, nnodes.

When a wrapper has finished, capture finish time.

Parameters:
  • output (str) – The sacct output
  • packed (bool) – true if job belongs to package
Returns:

submit, start, finish, joules, ncpus, nnodes, detailed_data

Return type:

int, int, int, int, int, int, json object (str)

parse_job_output(output)

Parses check job command output, so it can be interpreted by autosubmit

Parameters:output (str) – output to parse
Returns:job status
Return type:str
submit_Script(hold=False)

Sends a submit script file, executes it on the platform and retrieves the job IDs of all jobs at once.

Parameters:hold (bool) – if True, the job will be held
Returns:job id for submitted jobs
Return type:list(str)
update_cmds()

Updates commands for platforms

class autosubmit.platforms.locplatform.LocalPlatform(expid, name, config)

Bases: autosubmit.platforms.paramiko_platform.ParamikoPlatform

Class to manage jobs to localhost

Parameters:expid (str) – experiment’s identifier
check_Alljobs(job_list, as_conf, retries=5)

Checks jobs running status :param job_list: list of jobs :type job_list: list :param as_conf: autosubmit configuration :type as_conf: autosubmit.config.config.Config :param retries: retries :type retries: int :return: list of jobs with their status :rtype: list

check_file_exists(src, wrapper_failed=False)

Checks if a file exists on the platform :param src: source name :type src: str :param wrapper_failed: if True, the wrapper failed :type wrapper_failed: bool

connect()

Creates ssh connection to host

Returns:True if connection is created, False otherwise
Return type:bool
delete_file(filename, del_cmd=False)

Deletes a file from this platform

Parameters:filename (str) – file name
Returns:True if successful or the file does not exist
Return type:bool
get_checkjob_cmd(job_id)

Returns command to check job status on remote platforms

Parameters:
  • job_id – id of job to check
  • job_id – int
Returns:

command to check job status

Return type:

str

get_file(filename, must_exist=True, relative_path='', ignore_log=False, wrapper_failed=False)

Copies a file from the current platform to experiment’s tmp folder

Parameters:
  • filename (str) – file name
  • must_exist (bool) – If True, raises an exception if file can not be copied
  • relative_path (str) – path inside the tmp folder
Returns:

True if file is copied successfully, false otherwise

Return type:

bool

get_logs_files(exp_id, remote_logs)

Overriding the parent’s implementation. Do nothing because the log files are already in the local platform (redundancy).

Parameters:
  • exp_id (str) – experiment id
  • remote_logs ((str, str)) – names of the log files
get_mkdir_cmd()

Gets command to create directories on HPC

Returns:command to create directories on HPC
Return type:str
get_ssh_output()

Gets output from last command executed

Returns:output from last command
Return type:str
get_submit_cmd(job_script, job, hold=False, export='')

Get command to add job to scheduler

Parameters:
  • job_type
  • job_script – path to job script
  • job_script – str
  • hold – submit a job in a held status
  • hold – boolean
  • export – modules that should’ve been downloaded
  • export – string
Returns:

command to submit job to platforms

Return type:

str

get_submitted_job_id(output, x11=False)

Parses submit command output to extract job id :param output: output to parse :type output: str :return: job id :rtype: str

move_file(src, dest, must_exist=False)

Moves a file on the platform (includes .err and .out) :param src: source name :type src: str :param dest: destination name :type dest: str :param must_exist: ignore whether the file exists or not :type must_exist: bool

parse_job_output(output)

Parses check job command output, so it can be interpreted by autosubmit

Parameters:output (str) – output to parse
Returns:job status
Return type:str
send_command(command, ignore_log=False, x11=False)

Sends given command to HPC

Parameters:command (str) – command to send
Returns:True if executed, False if failed
Return type:bool
send_file(filename)

Sends a local file to the platform :param filename: name of the file to send :type filename: str

test_connection()

Test if the connection is still alive, reconnect if not.

update_cmds()

Updates commands for platforms

Autosubmit GUI

Autosubmit GUI Main Page

Inside the Barcelona Supercomputing Center internal network you can find the latest version of the Autosubmit GUI deployed for BSC users. It can be accessed at http://bscesweb04.bsc.es/autosubmitapp/ or https://earth.bsc.es/autosubmitapp/. This is a graphical user interface that allows you to easily monitor your experiments and those of your colleagues. The web app introduces many useful features for experiment monitoring, and we are continuously improving it.

Note

The Web App can also be accessed through the VPN Client provided by BSC.

When you enter the site, you will be presented with the following page:

autosubmit guide

Welcome page

Here you can search for any ongoing or past experiment by typing some text in the Search input box and pressing Search: the search engine will look for matches between your input string and the description, owner, or name fields of the experiment. The results are shown below, ordered by status; RUNNING experiments appear in the first rows. You can also click on the Running button to list all experiments that are currently running. The results will look like:

If you click on Show Detailed Data, summary data for each experiment (result) will be loaded. These are data details from the experiment run, useful to see its status at a glance. Progress bars and status will use different colors to highlight the important information.

result search plus

Search Result plus Detailed Data

For each experiment, you see the following data:

result search plus description

Description of Detailed Data

  1. Experiment Name
  2. Progress Bar: Shows completed jobs / total jobs. It turns red when there are failed jobs in the experiment (provided Show Detailed Data has been requested).
  3. Experiment Status: RUNNING or NOT RUNNING.
  4. Owner
  5. Experiment Description
  6. Refresh button: It will say Summary when the detailed data has not been requested. If it says Summary and you click on it, it will load detailed data for that experiment, otherwise it will refresh the existing detailed data.
  7. More button: Opens the Experiment Page.
  8. Average Queue Time for all jobs.
  9. Average Run Time for all jobs.
  10. Number of Running Jobs
  11. Number of Queuing Jobs
  12. Number of Submitted Jobs
  13. Number of Suspended Jobs
  14. Number of Failed Jobs: If there are Failed jobs, a list of the names of those jobs will be displayed.
result search plus description + sim

Average Times Feature

In experiments that include SIM jobs, you will also see the average queuing and running time for these jobs. In the latest version the time format has been updated to HH:mm:ss. The text for the SIM average follows the format avg. queue HH:mm:ss (M) | run HH:mm:ss (N) where M is the number of jobs considered for the avg. queue calculation and N is the number of jobs considered for run calculation.

After clicking on the MORE button, you will be presented with the Experiment Page, which is the main view that Autosubmit provides. These are its main components:

Experiment Information

This component offers the main information about your experiment.

experiment_view

Experiment Information

At the top left you see the Autosubmit Searcher home link that will take you back to the Autosubmit GUI Main Page, next to it the Home link that serves the same purpose, and then the About link that takes you to a page with important information about the application (including the link to this documentation). Then you see the experiment name and status, which is updated every 5 minutes. Next, you see the run history button, which opens a panel showing information about previous runs of the experiment; it only works for experiments running the latest version of Autosubmit. Finally, you see the esarchive status badge, which shows the current status of the esarchive file system.

At the bottom you see some relevant metadata, including the branch of the model that was used in the experiment, the HPC name targeted by the experiment, the owner, the Autosubmit version that this experiment is running on, the DB version of Autosubmit, and the number of jobs in the experiment.

In the center you see the Tree Representation, which is loaded automatically when you open this page.

Tree Representation

The Tree Representation offers a structured view of the experiment.

Experiment Tree 1

Experiment Tree Representation

The view is organized in groups by date, and date-member. Each group has a folder icon, and next to the icon you can find the progress of the group as completed / total jobs (when all the jobs in a group have been completed, a check symbol will appear); then, an indicator of how many jobs inside that group are RUNNING, QUEUING, or have FAILED. Furthermore, if wrappers exist in the experiment, independent groups will be added for each wrapper that will contain the list of jobs included in the corresponding wrapper. This implies that a job can be repeated: once inside its date-member group and once in its wrapper group.

Inside each group you will find the list of jobs that belong to that group. The jobs are shown following this format: job name + # job status + ( + queuing time + ) + running time. Jobs that belong to a wrapper also have a badge with the code of the wrapper.

When you click on a Job, you can see on the right panel (Selection Panel) the following information:

  • Start: Starting date.
  • End: Ending date.
  • Section: Also known as job type.
  • Member
  • Chunk
  • Platform: Remote platform.
  • Id: Id in the remote platform.
  • Processors: Number of processors required by the job.
  • Wallclock: Time requested by the job.
  • Queue: Time spent in queue, in minutes.
  • Run: Time spent running, in minutes.
  • Status: Job status.
  • Out: Button that opens a list of jobs that depend on the one selected.
  • In: Button that opens a list of jobs on which the selected job depends.
  • out path: Path to the .out log file.
  • err path: Path to the .err log file.
  • Submit: Submit time of the job (If applicable).
  • Start: Start time of the job (If applicable).
  • Finish: Finish time of the job (If applicable).

Important

Next to the out and err paths, you see a Copy out/err button that copies the path to your clipboard. Then you see an eye symbol button that, when clicked, will show the last 150 lines of the out/err file.

Selection

When you click on a job in the tree view, a Change Status button will appear in the top bar; if you click it, you will be presented with the option to generate a change status command that can be run on autosubmit, or to generate a format that can be used to change the status of the job while the experiment is running.

You can select many jobs at the same time by maintaining CTRL pressed and clicking on the jobs, then the generated command will include all these jobs.

Monitoring

If the experiment status is RUNNING, you will see a button called Refresh at the top right corner. This button will update the information of the jobs in the tree if necessary. Next to this button, you will see the button Start Job Monitor. When you click on it, a live Job Monitor will be initialized and the status of the jobs and wrappers will be queried every minute, any change will be updated in the Tree View. Also, if the Job Monitor is running, the detected changes will be listed in a panel Monitor Panel below the Selection Panel. You can stop this process by clicking on the button Stop Job Monitor.

The button Clear Tree View will clear the Tree Representation. It is also a valid way to refresh the Tree View.

Filter

At the top left you can find the Filter text input box. Insert any string and the list will show only those jobs whose description coincides with that string. For example #COMPLETED will show only completed jobs, Wrapped will show only those jobs that belong to a wrapper, _fc0_ will show only those jobs that belong to the fc0 member. Press Clear to reset the filter. On the right side of this bar, you will see the total number of jobs, and the chunk unit used in the experiment.

Advanced Filter

It is possible to use the key char * to separate keywords in the name of the job, in order. For example:

  • 1850*fc0*_1_: List all the jobs that have the string 1850 and then at least 1 occurrence of the string fc0 and then at least 1 occurrence of the string _1_. This will effectively list all the jobs for the DATE that starts with 1850 for the member fc0 and the chunk _1_.
  • 000*_5: List all the jobs that have the string 000 followed by at least one occurrence of the string _5. This will effectively list all the jobs that have member 000 and chunk number that starts with the digit 5.
  • 000*_5*PREPROCVAR: It will also add the filter for jobs of type PREPROCVAR.

As you might infer, the logic is fairly straightforward: Start your string with the word or part of the word you are looking for, then add * and the word or part of the word that follows, and so on. The algorithm will split your string by * and then search for each part in order, once it finds the part in the title of the job, it takes a substring of the job title to not repeat the next search in the same string, it continues looking for the next part in the new reduced string, and so on.

You can extend this functionality considering that date, member, section, chunk names start with the symbol _ and finish with the same symbol.

Important

This view is designed to show a structured view of your experiment, if you want a more dependency oriented view that shows better the execution sequence of your jobs, you can refer to Graph Representation.

Graph Representation

The Graph Representation of the experiment offers a dependency oriented view.

Experiment Graph 1

Experiment Graph Representation

This view offers a graph representation of the experiments where a node represents a job and an edge represents a directed dependency relationship between nodes. To open it you must click on the button Classic, which is the basic representation that uses either GraphViz or a heuristic approach depending on experiment complexity; we explain the other options later.

Once the graph representation is loaded, it will focus on a relevant node according to some established rules. The color of each node represents the status of the job it represents: you can see a color guide at the bottom of the page in the form of buttons. If you click in any of those buttons, the graph will focus on the last node with that status, except in the case of WAITING where the graph will focus on the first one. You can navigate the graph in this way, but there are other navigation buttons at the left and right corners of the graph canvas. You can also use your mouse or trackpad to navigate the graph, zoom in or zoom out. Below each node you can see the job name of the job it represents.

Important

For some experiments you will get a well distributed and generally good looking graph representation, for others you get a more straightforward representation. It depends on the size and dependency complexity of your experiments, not all experiments can be modeled as a good looking graph in reasonable time.

When you click on a node, you can see on the right panel (Selection Panel) the following information:

  • Start: Starting date.
  • End: Ending date.
  • Section: Also known as job type.
  • Member
  • Chunk
  • Platform: Remote platform.
  • Id: Id in the remote platform.
  • Processors: Number of processors required by the job.
  • Wallclock: Time requested by the job.
  • Queue: Time spent in queue, in minutes.
  • Run: Time spent running, in minutes.
  • Status: Job status.
  • Out: Button that opens a list of jobs that depend on the one selected.
  • In: Button that opens a list of jobs on which the selected job depends.
  • out path: Path to the .out log file.
  • err path: Path to the .err log file.
  • Submit: Submit time of the job (If applicable).
  • Start: Start time of the job (If applicable).
  • Finish: Finish time of the job (If applicable).

Important

Next to the out and err paths, you see a Copy out/err button that copies the path to your clipboard. Then you see an eye symbol button that, when clicked, will show the last 150 lines of the out/err file.

Selection

When you click on a node in the graph view, a Change Status button will appear in the top bar; if you click it, you will be presented with the option to generate a change status command that can be run on autosubmit, or to generate a format that can be used to change the status of the job while the experiment is running.

You can select many nodes at the same time by maintaining CTRL pressed and clicking on the nodes, then the generated command will include all these jobs.

Wrappers Representation

Wrappers are an important feature of Autosubmit, and as such, it should be possible to visualize them in the graph representation.

Experiment Graph Wrapper

Wrapper Graph Representation

Wrappers are represented by nodes that have dashed border, hexagon or square shape (no difference between them), and that share green background edges. On the right side of the graph you can find the Wrappers Tab and it will display a list of the existing wrappers as buttons. If you click on any of these buttons, the nodes that belong to that wrapper will be highlighted.

Monitoring

If the experiment is RUNNING you will see at the top right corner the button Start Job Monitor. When you click on it, a live Job Monitor will be initialized and the status of the jobs and wrappers will be queried every minute, any change will be updated in the graph. Also, if the Job Monitor is running, the detected changes will be listed in a panel Monitor Panel below the Selection Panel. You can stop this process by clicking on the button Stop Job Monitor.

Important

While this is a good option to monitor the progress of your experiment, you can also use the Autosubmit Log.

Grouped by Date Member

By clicking on the button Grouped by D-M you get a graph representation where the nodes are clustered by date and member. For example, if your experiment has only one starting date and one member, then you will have only one cluster in this view. These clusters are represented by rectangular boxes whose color gives a general idea of the status of the jobs inside it.

Important

You can double click on any cluster to “open” it, meaning that the nodes that belong to that cluster will be freed and positioned individually.

Grouped by Status

By clicking on the button Grouped by Status you get a graph representation where the nodes are clustered by status into 3 clusters: WAITING, COMPLETED, and SUSPENDED. Same rules mentioned for Grouped by Date Member apply.

Laplacian

By clicking on the button Laplacian you get a graph representation where the (x,y) coordinates of each node are calculated based on the second and third smallest eigenvector of the Graph Laplacian. All functionality is supported.

Autosubmit Log

When you click on the Log tab, you will see the button Show Log:

Experiment Log 1

Experiment Log

Important

The main Autosubmit log is usually stored in the folder /tmp/ of your experiment, and this is the first path the system will scan.

When you click on the Show Log button, the last 150 lines of the log will be displayed:

Experiment Log 2

Experiment Log Open

At the top of the log you will see the name of the log file that is being displayed along with the timestamp of the last time the log was requested, and to the right you see this timestamp in datetime format.

If the experiment is currently running, the log will be updated periodically to keep you up with recent updates in the experiment execution. It is possible to scroll this view.

If you click on Hide Log the log will be cleared and the periodic updates will stop.

Performance Metrics

The Performance Metrics tabs offers a set of metrics that try to measure the efficiency of the simulation performed, and other aspects of it.

Performance Metrics 1

Performance Metrics Tab

On the left you have the values of the main performance metrics that apply to the experiment. Then, on the right, you see the list of jobs considered for this calculation with their data, also, SYPD and ASYPD are calculated individually for these jobs. This list is scrollable.

You can also access a Show warnings button that opens a list of important information that might affect the calculation of the metrics. You can click again on this button to close the list.

Further information about the metrics is included in the tab.

Autosubmit Statistics

When you click on the Statistics tab, you will see two input boxes: Section and Hours, followed by the button Get Statistics:

Experiment Stat 1

Experiment Statistics

There is also a brief explanation of the input fields and expected result. Basically, Section allows you to narrow your search to certain job types, and Hours allows you to set a time limit to look into the past in hours.

In this example we have queried 3 hours into the past:

Experiment Stat 2

Experiment Statistics (Last 3 hours)

Click on the button Clear Statistics to clear the results and submit another query.

Important

For more details about Autosubmit statistics, refer to: How to monitor job statistics.

Important

To improve response times, Autosubmit GUI will try to store the dependency structure of your experiment in a database file called structure_expid.db, where expid is the name of your experiment. This file will be located in /esarchive/autosubmit/expid/pkl/.