Defining the workflow#

One of the most important step that you have to do when planning to use autosubmit for an experiment is the definition of the workflow the experiment will use. In this section you will learn about the workflow definition syntax so you will be able to exploit autosubmit’s full potential

Warning

This section is NOT intended to show how to define your jobs. Please go to Getting Started section for a comprehensive list of job options.

Simple workflow#

The simplest workflow that can be defined it is a sequence of two jobs, with the second one triggering at the end of the first. To define it, we define the two jobs and then add a DEPENDENCIES attribute on the second job referring to the first one.

It is important to remember when defining workflows that DEPENDENCIES on autosubmit always refer to jobs that should be finished before launching the job that has the DEPENDENCIES attribute.

One:
    FILE: one.sh

Two:
    FILE: two.sh
    DEPENDENCIES: One

The resulting workflow can be seen in Fi165gure 1

simple workflow plot — 1 Example showing a simple workflow with two sequential jobs#

Running jobs once per startdate, member or chunk#

Autosubmit is capable of running ensembles made of various startdates and members. It also has the capability to divide member execution on different chunks.

To set at what level a job has to run you have to use the RUNNING attribute. It has four possible values: once, date, member and chunk corresponding to running once, once per startdate, once per member or once per chunk respectively.

once:
    FILE: Once.sh

date:
    FILE: date.sh
    DEPENDENCIES: once
    RUNNING: date

member:
    FILE: Member.sh
    DEPENDENCIES: date
    RUNNING: member

chunk:
    FILE: Chunk.sh
    DEPENDENCIES: member
    RUNNING: chunk

The resulting workflow can be seen in Figure 2 for a experiment with 2 startdates, 2 members and 2 chunks.

Dependencies#

Dependencies on autosubmit were introduced on the first example, but in this section you will learn about some special cases that will be very useful on your workflows.

Dependencies with previous jobs#

Autosubmit can manage dependencies between jobs that are part of different chunks, members or startdates. The next example will show how to make a simulation job wait for the previous chunk of the simulation. To do that, we add sim-1 on the DEPENDENCIES attribute. As you can see, you can add as much dependencies as you like separated by spaces

ini:
    FILE: ini.sh
    RUNNING: member

sim:
    FILE: sim.sh
    DEPENDENCIES: ini sim-1
    RUNNING: chunk

postprocess:
    FILE: postprocess.sh
    DEPENDENCIES: sim
    RUNNING: chunk

The resulting workflow can be seen in Figure 3

Warning

Autosubmit simplifies the dependencies, so the final graph usually does not show all the lines that you may expect to see. In this example you can see that there are no lines between the ini and the sim jobs for chunks 2 to 5 because that dependency is redundant with the one on the previous sim

Dependencies between running levels#

On the previous examples we have seen that when a job depends on a job on a higher level (a running chunk job depending on a member running job) all jobs wait for the higher running level job to be finished. That is the case on the ini sim dependency on the next example.

In the other case, a job depending on a lower running level job, the higher level job will wait for ALL the lower level jobs to be finished. That is the case of the postprocess combine dependency on the next example.

ini:
    FILE: ini.sh
    RUNNING: member

sim:
    FILE: sim.sh
    DEPENDENCIES: ini sim-1
    RUNNING: chunk

postprocess:
    FILE: postprocess.sh
    DEPENDENCIES: sim
    RUNNING: chunk

combine:
    FILE: combine.sh
    DEPENDENCIES: postprocess
    RUNNING: member

The resulting workflow can be seen in Figure 4

Dependencies rework#

The DEPENDENCIES key is used to define the dependencies of a job. It can be used in the following ways:

Basic: The dependencies are a list of jobs, separated by “ “, that runs before the current task is submitted.
New: The dependencies is a list of YAML sections, separated by “n”, that runs before the current job is submitted.
- For each dependency section, you can designate the following keywords to control the current job-affected tasks:
  - DATES_FROM: Selects the job dates that you want to alter.
  - MEMBERS_FROM: Selects the job members that you want to alter.
  - CHUNKS_FROM: Selects the job chunks that you want to alter.
- For each dependency section and *_FROM keyword, you can designate the following keywords to control the destination of the dependency:
  - DATES_TO: Links current selected tasks to the dependency tasks of the dates specified.
  - MEMBERS_TO: Links current selected tasks to the dependency tasks of the members specified.
  - CHUNKS_TO: Links current selected tasks to the dependency tasks of the chunks specified.
- Important keywords for [DATES|MEMBERS|CHUNKS]_TO:
  - “natural”: Will keep the default linkage. Will link if it would be normally. Example, SIM_FC00_CHUNK_1 -> DA_FC00_CHUNK_1.
  - “all”: Will link all selected tasks of the dependency with current selected tasks. Example, SIM_FC00_CHUNK_1 -> DA_FC00_CHUNK_1, DA_FC00_CHUNK_2, DA_FC00_CHUNK_3…
  - “none”: Will unlink selected tasks of the dependency with current selected tasks.

For the new format, consider that the priority is hierarchy and goes like this DATES_FROM -(includes)-> MEMBERS_FROM -(includes)-> CHUNKS_FROM.

You can define a DATES_FROM inside the DEPENDENCY.
You can define a MEMBERS_FROM inside the DEPENDENCY and DEPENDENCY.DATES_FROM.
You can define a CHUNKS_FROM inside the DEPENDENCY, DEPENDENCY.DATES_FROM, DEPENDENCY.MEMBERS_FROM, DEPENDENCY.DATES_FROM.MEMBERS_FROM

Start conditions#

Sometimes you want to run a job only when a certain condition is met. For example, you may want to run a job only when a certain task is running. This can be achieved using the START_CONDITIONS feature based on the dependencies rework.

Start conditions are achieved by adding the keyword STATUS and optionally FROM_STEP keywords into any dependency that you want.

The STATUS keyword can be used to select the status of the dependency that you want to check. The possible values ( case-insensitive ) are:

“WAITING”: The task is waiting for its dependencies to be completed.
“DELAYED”: The task is delayed by a delay condition.
“PREPARED”: The task is prepared to be submitted.
“READY”: The task is ready to be submitted.
“SUBMITTED”: The task is submitted.
“HELD”: The task is held.
“QUEUING”: The task is queuing.
“RUNNING”: The task is running.
“SKIPPED”: The task is skipped.
“FAILED”: The task is failed.
“UNKNOWN”: The task is unknown.
“COMPLETED”: The task is completed. # Default
“SUSPENDED”: The task is suspended.

The status are ordered, so if you select “RUNNING” status, the task will be run if the parent is in any of the following statuses: “RUNNING”, “QUEUING”, “HELD”, “SUBMITTED”, “READY”, “PREPARED”, “DELAYED”, “WAITING”.

ini:
    FILE: ini.sh
    RUNNING: member

sim:
    FILE: sim.sh
    DEPENDENCIES: ini sim-1
    RUNNING: chunk

postprocess:
    FILE: postprocess.sh
    DEPENDENCIES:
        SIM:
            STATUS: "RUNNING"
    RUNNING: chunk

The FROM_STEP keyword can be used to select the internal step of the dependency that you want to check. The possible value is an integer. Additionally, the target dependency, must call to %AS_CHECKPOINT% inside their scripts. This will create a checkpoint that will be used to check the amount of steps processed.

A:
  FILE: a.sh
  RUNNING: once
  SPLITS: 2
A_2:
  FILE: a_2.sh
  RUNNING: once
  DEPENDENCIES:
    A:
      SPLIT_TO: "2"
      STATUS: "RUNNING"
      FROM_STEP: 2

There is now a new function that is automatically added in your scripts which is called as_checkpoint. This is the function that is generating the checkpoint file. You can see the function below:

###################
# AS CHECKPOINT FUNCTION
###################
# Creates a new checkpoint file upon call based on the current numbers of calls to the function

AS_CHECKPOINT_CALLS=0
function as_checkpoint {
    AS_CHECKPOINT_CALLS=$((AS_CHECKPOINT_CALLS+1))
    touch ${job_name_ptrn}_CHECKPOINT_${AS_CHECKPOINT_CALLS}
}

And what you would have to include in your target dependency or dependencies is the call to this function which in this example is a.sh.

The amount of calls is strongly related to the FROM_STEP value.

$expid/proj/$projname/as.sh

##compute somestuff
as_checkpoint
## compute some more stuff
as_checkpoint

To select an specific task, you have to combine the STATUS and CHUNKS_TO , MEMBERS_TO and DATES_TO, SPLITS_TO keywords.

A:
  FILE: a
  RUNNING: once
  SPLITS: 1
B:
  FILE: b
  RUNNING: once
  SPLITS: 2
  DEPENDENCIES: A
C:
  FILE: c
  RUNNING: once
  SPLITS: 1
  DEPENDENCIES: B
RECOVER_B_2:
  FILE: fix_b
  RUNNING: once
  DEPENDENCIES:
    B:
      SPLIT_TO: "2"
      STATUS: "RUNNING"

Job frequency#

Some times you just don’t need a job to be run on every chunk or member. For example, you may want to launch the postprocessing job after various chunks have completed. This behaviour can be achieved using the FREQUENCY attribute. You can specify an integer I for this attribute and the job will run only once for each I iterations on the running level.

Hint

You don’t need to adjust the frequency to be a divisor of the total jobs. A job will always execute at the last iteration of its running level

ini:
    FILE: ini.sh
    RUNNING: member

sim:
    FILE: sim.sh
    DEPENDENCIES: ini sim-1
    RUNNING: chunk

postprocess:
    FILE: postprocess.sh
    DEPENDENCIES: sim
    RUNNING: chunk
    FREQUENCY: 3

combine:
    FILE: combine.sh
    DEPENDENCIES: postprocess
    RUNNING: member

The resulting workflow can be seen in Figure 5

Job synchronize#

For jobs running at chunk level, and this job has dependencies, you could want not to run a job for each experiment chunk, but to run once for all member/date dependencies, maintaining the chunk granularity. In this cases you can use the SYNCHRONIZE job parameter to determine which kind of synchronization do you want. See the below examples with and without this parameter.

Hint

This job parameter works with jobs with RUNNING parameter equals to ‘chunk’.

ini:
    FILE: ini.sh
    RUNNING: member

sim:
    FILE: sim.sh
    DEPENDENCIES: INI SIM-1
    RUNNING: chunk

ASIM:
    FILE: asim.sh
    DEPENDENCIES: SIM
    RUNNING: chunk

The resulting workflow can be seen in Figure 6

ASIM:
    SYNCHRONIZE: member

The resulting workflow of setting SYNCHRONIZE parameter to ‘member’ can be seen in Figure 7

ASIM:
    SYNCHRONIZE: date

The resulting workflow of setting SYNCHRONIZE parameter to ‘date’ can be seen in Figure 8

Job split#

For jobs running at any level, it may be useful to split each task into different parts. This behaviour can be achieved using the SPLITS attribute to specify the number of parts.

It is also possible to specify the splits for each task using the SPLITS_FROM and SPLITS_TO attributes.

There is also an special character ‘*’ that can be used to specify that the split is 1-to-1 dependency. In order to use this character, you have to specify both SPLITS_FROM and SPLITS_TO attributes.

ini:
    FILE: ini.sh
    RUNNING: once

sim:
    FILE: sim.sh
    DEPENDENCIES: ini sim-1
    RUNNING: once

asim:
    FILE: asim.sh
    DEPENDENCIES: sim
    RUNNING: once
    SPLITS: 3

post:
    FILE: post.sh
    RUNNING: once
    DEPENDENCIES:
        asim:
            SPLITS_FROM:
                2,3: # [2:3] is also valid
                    splits_to: 1,2*,3* # 1,[2:3]* is also valid, you can also specify the step with [2:3:step]
    SPLITS: 3

In this example:

Post job will be split into 2 parts. Each part will depend on the 1st part of the asim job. The 2nd part of the post job will depend on the 2nd part of the asim job. The 3rd part of the post job will depend on the 3rd part of the asim job.

Example2: N-to-1 dependency

TEST:
  FILE: TEST.sh
  RUNNING: once
  SPLITS: '4'
TEST2:
  FILE: TEST2.sh
  DEPENDENCIES:
    TEST:
      SPLITS_FROM:
        "[1:2]":
          SPLITS_TO: "[1:4]*\\2"
  RUNNING: once
  SPLITS: '2'

Example3: 1-to-N dependency

TEST:
  FILE: TEST.sh
  RUNNING: once
  SPLITS: '2'
TEST2:
  FILE: TEST2.sh
  DEPENDENCIES:
    TEST:
      SPLITS_FROM:
        "[1:4]":
          SPLITS_TO: "[1:2]*\\2"
  RUNNING: once
  SPLITS: '4'

Job delay#

Some times you need a job to be run after a certain number of chunks. For example, you may want to launch the asim job after various chunks have completed. This behaviour can be achieved using the DELAY attribute. You can specify an integer N for this attribute and the job will run only after N chunks.

Hint

This job parameter works with jobs with RUNNING parameter equals to ‘chunk’.

ini:
    FILE: ini.sh
    RUNNING: member

sim:
    FILE: sim.sh
    DEPENDENCIES: ini sim-1
    RUNNING: chunk

asim:
    FILE: asim.sh
    DEPENDENCIES:  sim asim-1
    RUNNING:  chunk
    DELAY:  2

post:
    FILE:  post.sh
    DEPENDENCIES:  sim asim
    RUNNING:  chunk

The resulting workflow can be seen in Figure 9

simple workflow with delay option — 9 Example showing the asim job starting only from chunk 3.#

Workflow examples:#

Example 1: How to select an specific chunk#

Warning

This example illustrates the old select_chunk.

SIM:
    FILE: templates/sim.tmpl.sh
    DEPENDENCIES: INI SIM-1 POST-1 CLEAN-5
        INI:
        SIM-1:
        POST-1:
          CHUNKS_FROM:
            all:
                chunks_to: 1
        CLEAN-5:
    RUNNING: chunk
    WALLCLOCK: 0:30
    PROCESSORS: 768

Example 2: SKIPPABLE#

In this workflow you can see an illustrated example of SKIPPABLE parameter used in an dummy workflow.

JOBS:
    SIM:
        FILE: sim.sh
        DEPENDENCIES: INI POST-1
        WALLCLOCK: 00:15
        RUNNING: chunk
        QUEUE: debug
        SKIPPABLE: TRUE

    POST:
        FILE: post.sh
        DEPENDENCIES: SIM
        WALLCLOCK: 00:05
        RUNNING: member
        #QUEUE: debug

Example 3: Weak dependencies#

In this workflow you can see an illustrated example of weak dependencies.

Weak dependencies, work like this way:

X job only has one parent. X job parent can have “COMPLETED or FAILED” as status for current job to run.
X job has more than one parent. One of the X job parent must have “COMPLETED” as status while the rest can be “FAILED or COMPLETED”.

JOBS:
    GET_FILES:
        FILE: templates/fail.sh
        RUNNING: chunk

    IT:
        FILE: templates/work.sh
        RUNNING: chunk
        QUEUE: debug

    CALC_STATS:
        FILE: templates/work.sh
        DEPENDENCIES: IT GET_FILES?
        RUNNING: chunk
        SYNCHRONIZE: member

Example 4: Select Member#

In this workflow you can see an illustrated example of select member. Using 4 members 1 datelist and 4 different job sections.

Expdef:

experiment:
    DATELIST: 19600101
    MEMBERS: "00 01 02 03"
    CHUNKSIZE: 1
    NUMCHUNKS: 2

Jobs_conf:

JOBS:
    SIM:
        ...
        RUNNING: chunk
        QUEUE: debug

    DA:
        ...
        DEPENDENCIES:
            SIM:
                members_from:
                    all:
                        members_to: 00,01,02
        RUNNING: chunk
        SYNCHRONIZE: member

    REDUCE:
        ...
        DEPENDENCIES:
            SIM:
                members_from:
                    all:
                        members_to: 03
        RUNNING: member
        FREQUENCY: 4

    REDUCE_AN:
        ...
        FILE: templates/05b_sim.sh
        DEPENDENCIES: DA
        RUNNING: chunk
        SYNCHRONIZE: member

Loops definition#

You need to use the FOR and NAME keys to define a loop.

To generate the following jobs:

experiment:
  DATELIST: 19600101
  MEMBERS: "00"
  CHUNKSIZEUNIT: day
  CHUNKSIZE: '1'
  NUMCHUNKS: '2'
  CALENDAR: standard
JOBS:
  POST_20:

    DEPENDENCIES:
      POST_20:
      SIM_20:
    FILE: POST.sh
    PROCESSORS: '20'
    RUNNING: chunk
    THREADS: '1'
    WALLCLOCK: 00:05
  POST_40:

    DEPENDENCIES:
      POST_40:
      SIM_40:
    FILE: POST.sh
    PROCESSORS: '40'
    RUNNING: chunk
    THREADS: '1'
    WALLCLOCK: 00:05
  POST_80:

    DEPENDENCIES:
      POST_80:
      SIM_80:
    FILE: POST.sh
    PROCESSORS: '80'
    RUNNING: chunk
    THREADS: '1'
    WALLCLOCK: 00:05
  SIM_20:

    DEPENDENCIES:
      SIM_20-1:
    FILE: POST.sh
    PROCESSORS: '20'
    RUNNING: chunk
    THREADS: '1'
    WALLCLOCK: 00:05
  SIM_40:

    DEPENDENCIES:
      SIM_40-1:
    FILE: POST.sh
    PROCESSORS: '40'
    RUNNING: chunk
    THREADS: '1'
    WALLCLOCK: 00:05
  SIM_80:

    DEPENDENCIES:
      SIM_80-1:
    FILE: POST.sh
    PROCESSORS: '80'
    RUNNING: chunk
    THREADS: '1'
    WALLCLOCK: 00:05

One can use now the following configuration:

experiment:
  DATELIST: 19600101
  MEMBERS: "00"
  CHUNKSIZEUNIT: day
  CHUNKSIZE: '1'
  NUMCHUNKS: '2'
  CALENDAR: standard
JOBS:
  SIM:
    FOR:
      NAME: [ 20,40,80 ]
      PROCESSORS: [ 20,40,80 ]
      THREADS: [ 1,1,1 ]
      DEPENDENCIES: [ SIM_20-1,SIM_40-1,SIM_80-1 ]
    FILE: POST.sh
    RUNNING: chunk
    WALLCLOCK: '00:05'
  POST:
      FOR:
        NAME: [ 20,40,80 ]
        PROCESSORS: [ 20,40,80 ]
        THREADS: [ 1,1,1 ]
        DEPENDENCIES: [ SIM_20 POST_20,SIM_40 POST_40,SIM_80 POST_80 ]
      FILE: POST.sh
      RUNNING: chunk
      WALLCLOCK: '00:05'

Warning

The mutable parameters must be inside the FOR key.