How to add a new job
====================

To add a new job, open the /cxxx/conf/jobs_cxxx.conf file, where cxxx is the experiment identifier, and add this text:

.. code-block:: ini

    [new_job]
    FILE =

This will create a new job named "new_job" that will be executed once on the default platform. This job will use the template given in FILE (the path is relative to the project folder).

This is the minimum job definition and is usually not enough. You will usually need to add some other parameters:

* PLATFORM: allows you to execute the job on a platform of your choice. It must be defined in the experiment's platforms.conf file or have the value 'LOCAL', which always refers to the machine running Autosubmit.

* RUNNING: defines whether the job runs only once or once per start date, member or chunk. Options are: once, date, member, chunk.

* DEPENDENCIES: defines the dependencies of the job as a list of parent jobs separated by spaces. For example, if 'new_job' has to wait for "old_job" to finish, you must add the line "DEPENDENCIES = old_job".

  * For dependencies on jobs running in previous chunks, members or start dates, append -(DISTANCE) to the parent job name. For example, for a job "SIM" waiting for the previous "SIM" job to finish, you have to add "DEPENDENCIES = SIM-1".

  * For dependencies that are not mandatory for the normal workflow behaviour, you must add the character '?' at the end of the dependency.

* SELECT_CHUNKS (optional): by default, a section depends on all jobs of the sections listed in the DEPENDENCIES parameter. With this parameter, you can select specific chunks of a given job section. Examples of this feature appear at the end of this document. The syntax is as follows:

  .. code-block:: ini

      [jobs]
      SELECT_CHUNKS = SIM*[1]*[3] # Enables the dependency of chunk 1 with chunk 3, while chunks 2 and 4 won't be linked.
      SELECT_CHUNKS = SIM*[1:3] # Enables the dependency of chunks 1, 2 and 3, while 4 won't be linked.
      SELECT_CHUNKS = SIM*[1,3] # Enables the dependency of chunks 1 and 3, while 2 and 4 won't be linked.
      SELECT_CHUNKS = SIM*[1] # Enables the dependency of chunk 1, while 2, 3 and 4 won't be linked.

* SELECT_MEMBERS (optional): by default, a section depends on all jobs of the sections listed in the DEPENDENCIES parameter. With this parameter, you can select specific members of a given job section. Examples of this feature appear at the end of this document. Caution: you must pick the member index, not the member name.

  .. code-block:: ini

      [expdef.conf]
      ...
      MEMBERS = AA BB CC DD
      ...
      [jobs.conf]
      SELECT_MEMBERS = SIM*[1]*[3] # Enables the dependency of member BB with member DD, while AA and CC won't be linked.
      SELECT_MEMBERS = SIM*[1:3] # Enables the dependency of members BB, CC and DD, while AA won't be linked.
      SELECT_MEMBERS = SIM*[1,3] # Enables the dependency of members BB and DD, while AA and CC won't be linked.
      SELECT_MEMBERS = SIM*[1] # Enables the dependency of member BB, while AA, CC and DD won't be linked.

* EXCLUDED_CHUNKS (optional): with this parameter, you can prevent the generation of jobs for a list of chunks.

* EXCLUDED_MEMBERS (optional): with this parameter, you can prevent the generation of jobs for a list of members.

For jobs running on HPC platforms, you usually have to provide information about processors, wallclock times and more. To do this, use:

* WALLCLOCK: wallclock time to be submitted to the HPC queue, in the format HH:MM.

* PROCESSORS: number of processors to be submitted to the HPC. If not specified, defaults to 1.

* THREADS: number of threads to be submitted to the HPC. If not specified, defaults to 1.

* TASKS: number of tasks to be submitted to the HPC. If not specified, defaults to 1.

* HYPERTHREADING: enables hyperthreading, which doubles the maximum number of threads. Defaults to false. (Not available on Slurm platforms.)

* QUEUE: queue to add the job to. If not specified, uses the PLATFORM default.

* RETRIALS: number of retrials if the job fails.

* DELAY_RETRY_TIME: sets a delay between retries. It is triggered when a job fails. If not specified, Autosubmit will retry the job as soon as possible. Accepted formats are: a plain number (there will be a constant delay between retrials, of as many seconds as specified), a plus (+) sign followed by a number (the delay will steadily increase by the addition of this number of seconds), or a multiplication (*) sign followed by a number (the delay after n retries will be the number multiplied by 10*n). With this in mind, the plain-number or plus (+) form is the better choice when the HPC has few issues or the experiment will run for a short time. Otherwise, the multiplication (*) form is preferable.

  .. code-block:: ini

      #DELAY_RETRY_TIME = 11
      #DELAY_RETRY_TIME = +11 # will wait 11 + number specified
      #DELAY_RETRY_TIME = *11 # will wait 11, 110, 1110, 11110... * by 10 to prevent a too big number

There are also other, less used features that you can use:

* FREQUENCY: specifies that a job only has to be run after X dates, members or chunks. A job will always be created for the last one. If not specified, defaults to 1.

* SYNCHRONIZE: specifies that a job with RUNNING=chunk has to synchronize its dependency chunks at a 'date' or 'member' level, which means that the jobs will be unified: one per chunk for all members or dates. If not specified, the synchronization is for each chunk of the whole experiment.

* RERUN_ONLY: determines whether a job is only to be executed in reruns. If not specified, defaults to false.

* CUSTOM_DIRECTIVES: custom directives for the HPC resource manager headers of the platform used for that job.

* SKIPPABLE: when true, the job can skip its work if a higher chunk or member is already in ready, running, queuing or completed status.

* EXPORT: allows running an environment script or loading some modules before running this job.

* EXECUTABLE: allows wrapping a job so that it is launched with a set of environment variables.

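As a rough sketch of how the parameters described in this section can be combined, the following job definition runs once per chunk on the default platform, waits for another job and for its own previous chunk, and retries with an increasing delay. The job names, the template path and all values are hypothetical and chosen only for illustration. The sketch also assumes that an [old_job] section is defined elsewhere in the same file.

.. code-block:: ini

    # Hypothetical example: names, path and values are assumptions.
    [new_job]
    FILE = templates/new_job.sh
    RUNNING = chunk
    # Wait for old_job and for new_job in the previous chunk.
    DEPENDENCIES = old_job new_job-1
    WALLCLOCK = 00:30
    PROCESSORS = 4
    RETRIALS = 3
    # Plus form: the delay grows by 30 seconds on each retry.
    DELAY_RETRY_TIME = +30
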
Workflow examples:
==================

Example 1:
----------

This first example shows 3 jobs, in which the last job (POST) illustrates the use of SELECT_CHUNKS:

.. code-block:: ini

    [INI]
    FILE = templates/common/ini.tmpl.sh
    RUNNING = member
    WALLCLOCK = 00:30
    QUEUE = debug
    CHECK = true

    [SIM]
    FILE = templates/ecearth3/ecearth3.sim
    DEPENDENCIES = INI
    RUNNING = chunk
    WALLCLOCK = 04:00
    PROCESSORS = 1616
    THREADS = 1

    [POST]
    FILE = templates/common/post.tmpl.sh
    DEPENDENCIES = SIM
    RUNNING = chunk
    WALLCLOCK = 01:00
    QUEUE = debug
    CHECK = true
    # Then you can select the specific chunks of dependency SIM with one of these lines:
    SELECT_CHUNKS = SIM*[1]*[3] # Enables the dependency of chunk 1 with chunk 3, while chunks 2 and 4 won't be linked.
    SELECT_CHUNKS = SIM*[1:3] # Enables the dependency of chunks 1, 2 and 3, while 4 won't be linked.
    SELECT_CHUNKS = SIM*[1,3] # Enables the dependency of chunks 1 and 3, while 2 and 4 won't be linked.
    SELECT_CHUNKS = SIM*[1] # Enables the dependency of chunk 1, while 2, 3 and 4 won't be linked.

Example 2: select_chunks
------------------------

This workflow shows an illustrated example of select_chunks used in an actual workflow. To avoid an excess of information, only the configuration of a single job is shown:

.. code-block:: ini

    [SIM]
    FILE = templates/sim.tmpl.sh
    DEPENDENCIES = INI SIM-1 POST-1 CLEAN-5
    SELECT_CHUNKS = POST*[1]
    RUNNING = chunk
    WALLCLOCK = 00:30
    PROCESSORS = 768

.. figure:: ../../workflows/select_chunks.png
   :name: simple
   :width: 100%
   :align: center
   :alt: select_chunks_workflow

Example 3: SKIPPABLE
--------------------

This workflow shows an illustrated example of the SKIPPABLE parameter used in a dummy workflow.

.. code-block:: ini

    [SIM]
    FILE = sim.sh
    DEPENDENCIES = INI POST-1
    WALLCLOCK = 00:15
    RUNNING = chunk
    QUEUE = debug
    SKIPPABLE = TRUE

    [POST]
    FILE = post.sh
    DEPENDENCIES = SIM
    WALLCLOCK = 00:05
    RUNNING = member
    #QUEUE = debug

.. figure:: ../../workflows/skip.png
   :name: simple
   :width: 100%
   :align: center
   :alt: skip_workflow

Example 4: Weak dependencies
----------------------------

This workflow shows an illustrated example of weak dependencies.

Weak dependencies work as follows:

* If job X has only one parent, that parent may have either "COMPLETED" or "FAILED" status for X to run.

* If job X has more than one parent, at least one parent must have "COMPLETED" status, while the rest can be "FAILED" or "COMPLETED".

.. code-block:: ini

    [GET_FILES]
    FILE = templates/fail.sh
    RUNNING = chunk

    [IT]
    FILE = templates/work.sh
    RUNNING = chunk
    QUEUE = debug

    [CALC_STATS]
    FILE = templates/work.sh
    DEPENDENCIES = IT GET_FILES?
    RUNNING = chunk
    SYNCHRONIZE = member

.. figure:: ../../workflows/Dashed.png
   :name: simple
   :width: 100%
   :align: center
   :alt: dashed_workflow

Example 5: Select Member
------------------------

This workflow shows an illustrated example of select member, using 4 members, 1 datelist and 4 different job sections.

Expdef:

.. code-block:: ini

    [experiment]
    DATELIST = 19600101
    MEMBERS = 00 01 02 03
    CHUNKSIZE = 1
    NUMCHUNKS = 2

Jobs_conf:

.. code-block:: ini

    [SIM]
    ...
    RUNNING = chunk
    QUEUE = debug

    [DA]
    ...
    DEPENDENCIES = SIM
    SELECT_MEMBERS = SIM*[0:2]
    RUNNING = chunk
    SYNCHRONIZE = member

    [REDUCE]
    ...
    DEPENDENCIES = SIM
    SELECT_MEMBERS = SIM*[3]
    RUNNING = member
    FREQUENCY = 4

    [REDUCE_AN]
    ...
    FILE = templates/05b_sim.sh
    DEPENDENCIES = DA
    RUNNING = chunk
    SYNCHRONIZE = member

.. figure:: ../../workflows/Select_members.png
   :name: simple
   :width: 100%
   :align: center
   :alt: select_members
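
The SELECT_MEMBERS lines of this example can be read against the member list 00 01 02 03. Assuming the zero-based, inclusive index ranges described for SELECT_MEMBERS earlier in this document, the selections would resolve as annotated below. This is an interpretation of the documented syntax, not output from Autosubmit.

.. code-block:: ini

    [DA]
    # Assumed reading: indices 0 to 2 select members 00, 01 and 02 of SIM.
    SELECT_MEMBERS = SIM*[0:2]

    [REDUCE]
    # Assumed reading: index 3 selects member 03 of SIM.
    SELECT_MEMBERS = SIM*[3]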