autosubmit.job

Main module for Autosubmit. Only contains an interface class to all functionality implemented on Autosubmit

class autosubmit.job.job.Job(name, job_id, status, priority)

Class to handle all the tasks with Jobs at HPC. A job is created by default with a name, a job ID, a status and a type. It can have children and parents; these links reflect the dependency between jobs. If Job2 must wait until Job1 is completed, then Job2 is a child of Job1. Conversely, Job1 is a parent of Job2.

Parameters:
  • name (str) – job’s name
  • job_id (int) – job’s identifier
  • status (Status) – job initial status
  • priority (int) – job’s priority
add_parent(*parents)

Add parents for the job. It also adds current job as a child for all the new parents

Parameters:parents (*Job) – job’s parents to add
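
The two-way link described for add_parent can be pictured with a minimal sketch. SketchJob below is a hypothetical stand-in written for illustration, not the Autosubmit Job class, and it only models the parent and child sets.

```python
class SketchJob:
    """Hypothetical stand-in for Job; only models parent/child links."""

    def __init__(self, name):
        self.name = name
        self.parents = set()
        self.children = set()

    def add_parent(self, *parents):
        # Register each parent and, symmetrically, register this job
        # as a child of that parent.
        for parent in parents:
            self.parents.add(parent)
            parent.children.add(self)


job1 = SketchJob("Job1")
job2 = SketchJob("Job2")
job2.add_parent(job1)  # Job2 must wait until Job1 is completed
```

After the call, job1 appears in job2.parents and job2 appears in job1.children, which is the symmetry the description above relies on.
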
check_completion(default_status=-1, over_wallclock=False)

Check the presence of COMPLETED file. Change status to COMPLETED if COMPLETED file exists and to FAILED otherwise.

Parameters:default_status (Status) – status to set if job is not completed. By default is FAILED

check_end_time()

Returns end time from stat file

Returns:date and time
Return type:str
check_retrials_end_time()

Returns list of end datetime for retrials from total stats file

Returns:date and time
Return type:list[int]
check_retrials_start_time()

Returns list of start datetime for retrials from total stats file

Returns:date and time
Return type:list[int]
check_retrials_submit_time()

Returns list of submit datetime for retrials from total stats file

Returns:date and time
Return type:list[int]
check_running_after(date_limit)

Checks if the job was running after the given date

Parameters:date_limit (datetime.datetime) – reference date
Returns:True if job was running after the given date, false otherwise
Return type:bool

check_script(as_conf, parameters, show_logs=False)

Checks if script is well formed

Parameters:
  • parameters (dict) – script parameters
  • as_conf (AutosubmitConfig) – configuration file
  • show_logs (Bool) – Display output
Returns:

True if no problem has been detected, False otherwise

Return type:

bool

check_start_time()

Returns job’s start time

Returns:start time
Return type:str
check_started_after(date_limit)

Checks if the job started after the given date

Parameters:date_limit (datetime.datetime) – reference date
Returns:True if job started after the given date, false otherwise
Return type:bool

children

Returns a list containing all children of the job

Returns:child jobs
Return type:set
children_names_str

Comma separated list of children’s names

compare_by_id(other)

Compare jobs by ID

Parameters:other (Job) – job to compare
Returns:comparison result
Return type:bool
compare_by_name(other)

Compare jobs by name

Parameters:other (Job) – job to compare
Returns:comparison result
Return type:bool
compare_by_status(other)

Compare jobs by status value

Parameters:other (Job) – job to compare
Returns:comparison result
Return type:bool
create_script(as_conf)

Creates script file to be run for the job

Parameters:as_conf (AutosubmitConfig) – configuration object
Returns:script’s filename
Return type:str
delete_child(child)

Removes a child from the job

Parameters:child (Job) – child to remove
delete_parent(parent)

Remove a parent from the job

Parameters:parent (Job) – parent to remove
get_last_retrials()

Returns the retrials of a job, including the last COMPLETED run. The selection stops at the previous COMPLETED retrial, which is not included, or when the list of registers is exhausted.

Returns:list of list of dates of retrial [submit, start, finish] in datetime format
Return type:list of list
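
The selection rule for get_last_retrials can be sketched as a backwards walk over the stored records. This is only an illustration under assumptions: sketch_last_retrials is a hypothetical helper, each record is assumed to be a list of [submit, start, finish, status], and the records are assumed to be ordered oldest first. The real method returns dates in datetime format, which this sketch does not attempt.

```python
def sketch_last_retrials(records):
    """Hypothetical helper: keep the newest records back to, but not
    including, the previous COMPLETED record."""
    selected = []
    seen_completed = False
    for fields in reversed(records):
        completed = len(fields) == 4 and fields[3] == "COMPLETED"
        if completed:
            if seen_completed:
                break  # previous COMPLETED run: stop, do not include it
            seen_completed = True
        selected.insert(0, fields)  # keep chronological order
    return selected


history = [
    ["t0", "t1", "t2", "COMPLETED"],
    ["t3", "t4", "t5", "FAILED"],
    ["t6", "t7", "t8", "COMPLETED"],
]
latest = sketch_last_retrials(history)
```

With this history, latest holds the FAILED record and the final COMPLETED record, and the first COMPLETED record is left out.
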
has_children()

Returns true if job has any children, else return false

Returns:true if job has any children, otherwise return false
Return type:bool
has_parents()

Returns true if job has any parents, else return false

Returns:true if job has any parent, otherwise return false
Return type:bool
inc_fail_count()

Increments fail count

static is_a_completed_retrial(fields)

Returns true only if there are 4 fields (submit, start, finish, status) and status equals COMPLETED.
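
A minimal sketch of this check, assuming the fields arrive as a list of strings in the order submit, start, finish, status. The function name and the sample timestamps are hypothetical.

```python
def sketch_is_completed_retrial(fields):
    """Hypothetical helper mirroring the rule described above."""
    # Exactly four fields are required, and the last one must be COMPLETED.
    return len(fields) == 4 and fields[3] == "COMPLETED"


ok = sketch_is_completed_retrial(
    ["20200101120000", "20200101120500", "20200101130000", "COMPLETED"]
)
missing_status = sketch_is_completed_retrial(
    ["20200101120000", "20200101120500", "20200101130000"]
)
```
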

is_ancestor(job)

Check if the given job is an ancestor

Parameters:job – job to be checked if is an ancestor
Returns:True if job is an ancestor, false otherwise
Return type:bool

is_over_wallclock(start_time, wallclock)

Check if the job is over the wallclock time. It is an alternative method to avoid platform issues.

Parameters:
  • start_time –
  • wallclock –

is_parent(job)

Check if the given job is a parent

Parameters:job – job to be checked if is a parent
Returns:True if job is a parent, false otherwise
Return type:bool

log_job()

Prints job information in log

long_name

Job’s long name. If not set, returns name

Returns:long name
Return type:str
parents

Returns parent jobs list

Returns:parent jobs
Return type:set
platform

Returns the platform to be used by the job. Chooses between serial and parallel platforms

Returns:HPCPlatform object for the job to use
Return type:HPCPlatform

print_job()

Prints debug information about the job

print_parameters()

Prints job parameters in log

queue

Returns the queue to be used by the job. Chooses between serial and parallel platforms

Returns:HPCPlatform object for the job to use
Return type:HPCPlatform

read_header_tailer_script(script_path, as_conf)

Opens and reads a script. It fails if the script is not a BASH script.

Will strip away the line with the hash bang (#!)

Parameters:
  • script_path (string) – path to the script, relative to the experiment directory
  • as_conf (config) – Autosubmit configuration file
remove_redundant_parents()

Checks if a parent is also an ancestor, if true, removes the link in both directions. Useful to remove redundant dependencies.
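
The idea behind remove_redundant_parents can be sketched on a small graph. SketchJob is a hypothetical stand-in, and this sketch assumes that "ancestor" means a job reachable through the parents of a parent (distance two or more); the library may define it differently.

```python
class SketchJob:
    """Hypothetical stand-in for Job; only models parent/child links."""

    def __init__(self, name):
        self.name = name
        self.parents = set()
        self.children = set()

    def add_parent(self, *parents):
        for parent in parents:
            self.parents.add(parent)
            parent.children.add(self)

    def is_ancestor(self, job):
        # Assumption: an ancestor is reachable via a parent's parents.
        stack = [gp for p in self.parents for gp in p.parents]
        seen = set()
        while stack:
            current = stack.pop()
            if current is job:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(current.parents)
        return False

    def remove_redundant_parents(self):
        # A direct parent that is also an ancestor adds no information,
        # so the link is dropped in both directions.
        for parent in list(self.parents):
            if self.is_ancestor(parent):
                self.parents.discard(parent)
                parent.children.discard(self)


a, b, c = SketchJob("a"), SketchJob("b"), SketchJob("c")
b.add_parent(a)
c.add_parent(a, b)  # a -> c is implied by a -> b -> c
c.remove_redundant_parents()
```

After the call, c keeps only b as a parent, while the chain a, b, c still encodes the same ordering.
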

shape

Returns the shape of the job. Chooses between serial and parallel platforms

Returns:HPCPlatform object for the job to use
Return type:HPCPlatform

status_str

String representation of the current status

total_processors

Number of processors requested by job. Reduces ‘:’ separated format if necessary.
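
As a rough illustration, and assuming that "reduces" here means adding up the ':' separated components (the text above does not spell this out), the conversion could look like the sketch below. The function name is hypothetical.

```python
def sketch_total_processors(processors):
    """Hypothetical helper: collapse a value such as '16:32:8' to one int."""
    text = str(processors)
    if ":" in text:
        # Assumption: the components are summed.
        return sum(int(part) for part in text.split(":"))
    return int(text)


combined = sketch_total_processors("16:32:8")
single = sketch_total_processors("4")
```
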

update_content(as_conf)

Create the script content to be run for the job

Parameters:as_conf (config) – config
Returns:script code
Return type:str
update_parameters(as_conf, parameters, default_parameters={'M': '%M%', 'M_': '%M_%', 'Y': '%Y%', 'Y_': '%Y_%', 'd': '%d%', 'd_': '%d_%', 'm': '%m%', 'm_': '%m_%'})

Refresh parameters value

Parameters:
  • default_parameters (dict) –
  • as_conf (AutosubmitConfig) –
  • parameters (dict) –
update_status(copy_remote_logs=False, failed_file=False)

Updates job status, checking COMPLETED file if needed

Parameters:
  • new_status (Status) – job status retrieved from the platform
  • copy_remote_logs – should copy remote logs when finished?

write_end_time(completed, enabled=False)

Writes end date and time to TOTAL_STATS file

Parameters:completed (bool) – True if job was completed successfully, False otherwise

write_start_time(enabled=False)

Writes start date and time to TOTAL_STATS file

Returns:True if successful, False otherwise
Return type:bool

write_submit_time(enabled=False, hold=False)

Writes submit date and time to TOTAL_STATS file. It doesn’t write if hold == True.

write_total_stat_by_retries(total_stats, first_retrial=False)

Writes all data to TOTAL_STATS file

Parameters:total_stats (str) – data gathered by the wrapper

class autosubmit.job.job.WrapperJob(name, job_id, status, priority, job_list, total_wallclock, num_processors, platform, as_config, hold)

Defines a wrapper from a package.

Calls Job constructor.

Parameters:
  • name (String) – Name of the Package
  • job_id (Integer) – Id of the first Job of the package
  • status (String) – ‘READY’ when coming from submit_ready_jobs()
  • priority (Integer) – 0 when coming from submit_ready_jobs()
  • job_list (List() of Job() objects) – List of jobs in the package
  • total_wallclock (String Formatted) – Wallclock of the package
  • num_processors (Integer) – Number of processors for the package
  • platform (Platform Object. e.g. EcPlatform()) – Platform object defined for the package
  • as_config (AutosubmitConfig object) – Autosubmit basic configuration object
class autosubmit.job.job_common.StatisticsSnippetBash

Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs

class autosubmit.job.job_common.StatisticsSnippetEmpty

Class to handle the statistics snippet of a job. It contains header and footer for local and remote jobs

class autosubmit.job.job_common.StatisticsSnippetPython(version='2')

Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs

class autosubmit.job.job_common.StatisticsSnippetR

Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs

class autosubmit.job.job_common.Status

Class to handle the status of a job

class autosubmit.job.job_common.Type

Class to handle the type of a job

autosubmit.job.job_common.increase_wallclock_by_chunk(current, increase, chunk)

Receives the wallclock time and increases it by a quantity multiplied by the number of the current chunk. The result cannot be larger than 48:00. If chunk = 0, there is no increment.

Parameters:
  • current (str) – WALLCLOCK HH:MM
  • increase (str) – WCHUNKINC HH:MM
  • chunk (int) – chunk number
Returns:

HH:MM wallclock

Return type:

str
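
The arithmetic described above can be sketched with datetime.timedelta. This is an illustration written from the description only: sketch_increase_wallclock and its helper are hypothetical names, and the library function may differ in details such as how the chunk number is offset.

```python
from datetime import timedelta

MAX_WALLCLOCK = timedelta(hours=48)  # upper bound stated in the description


def _parse_hhmm(text):
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


def sketch_increase_wallclock(current, increase, chunk):
    """Hypothetical helper: current + increase * chunk, capped at 48:00."""
    if not chunk or chunk <= 0:
        return current  # chunk 0 means no increment
    total = _parse_hhmm(current) + _parse_hhmm(increase) * chunk
    total = min(total, MAX_WALLCLOCK)
    total_minutes = int(total.total_seconds() // 60)
    return "{0:02d}:{1:02d}".format(total_minutes // 60, total_minutes % 60)


grown = sketch_increase_wallclock("01:00", "00:30", 2)    # 01:00 + 2 * 00:30
capped = sketch_increase_wallclock("40:00", "10:00", 3)   # exceeds the cap
```
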

autosubmit.job.job_common.parse_output_number(string_number)

Parses number in format 1.0K 1.0M 1.0G

Parameters:string_number (str) – String representation of number
Returns:number in float format
Return type:float
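
A minimal sketch of this kind of parsing follows. It assumes decimal multipliers (K = 1e3, M = 1e6, G = 1e9); the text above does not say whether the library uses decimal or binary units, and the function name is hypothetical.

```python
# Assumption: decimal multipliers.
_SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9}


def sketch_parse_output_number(string_number):
    """Hypothetical helper: turn '1.5K' style strings into floats."""
    text = string_number.strip()
    if text and text[-1].upper() in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1].upper()]
    return float(text)  # no suffix: plain number


kilo = sketch_parse_output_number("1.5K")
plain = sketch_parse_output_number("42")
```
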
class autosubmit.job.job_list.JobList(expid, config, parser_factory, job_list_persistence)

Class to manage the list of jobs to be run by autosubmit

add_logs(logs)

Adds logs to the current job_list

Parameters:platform (HPCPlatform) – job platform
Returns:logs
Return type:dict(tuple)
backup_load()

Recreates a stored job list from the persistence

Returns:loaded job list object
Return type:JobList
backup_save()

Persists the job list

check_scripts(as_conf)

When we have created the scripts, all parameters should have been substituted. %PARAMETER% placeholders are not allowed

Parameters:as_conf (AutosubmitConfig) – experiment configuration
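
One way to picture this check is a scan for leftover %NAME% placeholders in a generated script. The pattern and the function name below are assumptions made for illustration; the library may accept a different set of characters in parameter names.

```python
import re

# Assumption: a placeholder is a name made of letters, digits, '_' or '.'
# wrapped in percent signs.
PLACEHOLDER = re.compile(r"%[A-Za-z_][A-Za-z0-9_.]*%")


def find_unsubstituted(script_text):
    """Hypothetical helper: list placeholders still present in a script."""
    return sorted(set(PLACEHOLDER.findall(script_text)))


leftovers = find_unsubstituted("echo %EXPID%\nsleep 5\necho %CHUNK% %EXPID%")
clean = find_unsubstituted("echo done")
```

An empty result would mean the script passes this particular check.
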
expid

Returns the experiment identifier

Returns:experiment’s identifier
Return type:str
generate(date_list, member_list, num_chunks, chunk_ini, parameters, date_format, default_retrials, default_job_type, wrapper_type=None, wrapper_jobs={}, new=True, notransitive=False, update_structure=False, run_only_members=[], show_log=True)

Creates all jobs needed for the current workflow

Parameters:
  • default_job_type (str) – default type for jobs
  • date_list (list) – start dates
  • member_list (list) – members
  • num_chunks (int) – number of chunks to run
  • chunk_ini (int) – the experiment will start by the given chunk
  • parameters (dict) – parameters for the jobs
  • date_format (str) – option to format dates
  • default_retrials (int) – default retrials for each job
  • new (bool) – is it a new generation?
  • wrapper_type – Type of wrapper defined by the user in autosubmit_.conf [wrapper] section.
  • wrapper_jobs (String) – Job types defined in autosubmit_.conf [wrapper sections] to be wrapped.
get_active(platform=None, wrapper=False)

Returns a list of active jobs (In platforms queue + Ready)

Parameters:platform (HPCPlatform) – job platform
Returns:active jobs
Return type:list
get_all(platform=None, wrapper=False)

Returns a list of all jobs

Parameters:platform (HPCPlatform) – job platform
Returns:all jobs
Return type:list
get_chunk_list()

Get inner chunk list

Returns:chunk list
Return type:list
get_completed(platform=None, wrapper=False)

Returns a list of completed jobs

Parameters:platform (HPCPlatform) – job platform
Returns:completed jobs
Return type:list
get_date_list()

Get inner date list

Returns:date list
Return type:list
get_delayed(platform=None)

Returns a list of delayed jobs

Parameters:platform (HPCPlatform) – job platform
Returns:delayed jobs
Return type:list
get_failed(platform=None, wrapper=False)

Returns a list of failed jobs

Parameters:platform (HPCPlatform) – job platform
Returns:failed jobs
Return type:list
get_finished(platform=None, wrapper=False)

Returns a list of jobs finished (Completed, Failed)

Parameters:platform (HPCPlatform) – job platform
Returns:finished jobs
Return type:list
get_held_jobs(platform=None)

Returns a list of jobs in the platforms (Held)

Parameters:platform (HPCPlatform) – job platform
Returns:jobs in platforms
Return type:list
get_in_queue(platform=None, wrapper=False)

Returns a list of jobs in the platforms (Submitted, Running, Queuing, Unknown, Held)

Parameters:platform (HPCPlatform) – job platform
Returns:jobs in platforms
Return type:list
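
The get_* selectors above share one pattern: filter the job list by status and, optionally, by platform. A minimal sketch of that pattern for the in-queue case follows. SketchJob, the status strings and the platform names are hypothetical; the library uses its own Status values and platform objects.

```python
from collections import namedtuple

# Hypothetical record type standing in for Job.
SketchJob = namedtuple("SketchJob", ["name", "status", "platform"])

# Statuses listed in the get_in_queue description above.
IN_QUEUE = {"SUBMITTED", "RUNNING", "QUEUING", "UNKNOWN", "HELD"}


def sketch_get_in_queue(jobs, platform=None):
    """Hypothetical helper: jobs in a queue, optionally for one platform."""
    return [
        job for job in jobs
        if job.status in IN_QUEUE
        and (platform is None or job.platform == platform)
    ]


jobs = [
    SketchJob("a000_INI", "COMPLETED", "local"),
    SketchJob("a000_SIM", "RUNNING", "hpc"),
    SketchJob("a000_POST", "WAITING", "hpc"),
    SketchJob("a000_CLEAN", "QUEUING", "local"),
]
queued_anywhere = [job.name for job in sketch_get_in_queue(jobs)]
queued_on_hpc = [job.name for job in sketch_get_in_queue(jobs, platform="hpc")]
```
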
get_job_by_name(name)

Returns the job whose name matches the name parameter

Parameters:name (str) – name to look for
Returns:found job
Return type:job
get_job_list()

Get inner job list

Returns:job list
Return type:list
get_job_names(lower_case=False)

Returns a list of all job names

Parameters:platform (HPCPlatform) – job platform
Returns:all jobs
Return type:list
Parameters:
  • select_jobs_by_name – job name
  • select_all_jobs_by_section – section name
  • filter_jobs_by_section – section, date , member? , chunk?
Returns:

jobs_list names

Return type:

list

get_jobs_by_section(section_list)

Returns the job whose section matches the section parameter

Parameters:name – name to look for
Returns:found job
Return type:job
get_logs()

Returns a dict of logs keyed by job name

Parameters:platform (HPCPlatform) – job platform
Returns:logs
Return type:dict(tuple)
get_member_list()

Get inner member list

Returns:member list
Return type:list
get_not_in_queue(platform=None, wrapper=False)

Returns a list of jobs NOT in the platforms (Ready, Waiting)

Parameters:platform (HPCPlatform) – job platform
Returns:jobs not in platforms
Return type:list
get_ordered_jobs_by_date_member(section)

Get the dictionary of jobs ordered according to wrapper’s expression divided by date and member

Returns:jobs ordered divided by date and member
Return type:dict
get_prepared(platform=None)

Returns a list of prepared jobs

Parameters:platform (HPCPlatform) – job platform
Returns:prepared jobs
Return type:list
get_queuing(platform=None, wrapper=False)

Returns a list of jobs queuing

Parameters:platform (HPCPlatform) – job platform
Returns:queued jobs
Return type:list
get_ready(platform=None, hold=False, wrapper=False)

Returns a list of ready jobs

Parameters:platform (HPCPlatform) – job platform
Returns:ready jobs
Return type:list
get_running(platform=None, wrapper=False)

Returns a list of jobs running

Parameters:platform (HPCPlatform) – job platform
Returns:running jobs
Return type:list
get_skipped(platform=None)

Returns a list of skipped jobs

Parameters:platform (HPCPlatform) – job platform
Returns:skipped jobs
Return type:list
get_submitted(platform=None, hold=False, wrapper=False)

Returns a list of submitted jobs

Parameters:platform (HPCPlatform) – job platform
Returns:submitted jobs
Return type:list
get_suspended(platform=None, wrapper=False)

Returns a list of suspended jobs

Parameters:platform (HPCPlatform) – job platform
Returns:suspended jobs
Return type:list
get_uncompleted(platform=None, wrapper=False)

Returns a list of uncompleted jobs

Parameters:platform (HPCPlatform) – job platform
Returns:uncompleted jobs
Return type:list
get_uncompleted_and_not_waiting(platform=None, wrapper=False)

Returns a list of jobs that are neither completed nor waiting

Parameters:platform (HPCPlatform) – job platform
Returns:uncompleted jobs that are not waiting
Return type:list
get_unknown(platform=None, wrapper=False)

Returns a list of jobs in unknown state

Parameters:platform (HPCPlatform) – job platform
Returns:unknown state jobs
Return type:list
get_unsubmitted(platform=None, wrapper=False)

Returns a list of unsubmitted jobs

Parameters:platform (HPCPlatform) – job platform
Returns:unsubmitted jobs
Return type:list
get_waiting(platform=None, wrapper=False)

Returns a list of jobs waiting

Parameters:platform (HPCPlatform) – job platform
Returns:waiting jobs
Return type:list
get_waiting_remote_dependencies(platform_type='slurm')

Returns a list of jobs waiting on slurm scheduler

Parameters:platform (HPCPlatform) – job platform
Returns:waiting jobs
Return type:list
graph

Returns the graph

Returns:graph
Return type:networkx graph
load()

Recreates a stored job list from the persistence

Returns:loaded job list object
Return type:JobList
static load_file(filename)

Recreates a stored job list from the pickle file

Parameters:filename (str) – pickle file to load
Returns:loaded joblist object
Return type:JobList
parameters

List of parameters common to all jobs

Returns:parameters
Return type:dict

print_with_status(statusChange=None, nocolor=False, existingList=None)

Returns the string representation of the dependency tree of the Job List

Parameters:
  • statusChange (List of strings) – List of changes in the list, supplied in set status
  • nocolor (Boolean) – True if the result should not include color codes
  • existingList (List of Job Objects) – External List of Jobs that will be printed, this excludes the inner list of jobs.
Returns:

String representation

Return type:

String

remove_rerun_only_jobs(notransitive=False)

Removes all jobs to be run only in reruns

rerun(job_list_unparsed, monitor=False)

Updates job list to rerun the jobs specified by a job list

Parameters:chunk_list (str) – list of chunks to rerun
static retrieve_packages(BasicConfig, expid, current_jobs=None)

Retrieves dictionaries that map the collection of packages in the experiment

Parameters:
  • BasicConfig (Configuration Object) – Basic configuration
  • expid (String) – Experiment Id
  • current_jobs (list) – list of names of current jobs
Returns:

job to package, package to jobs, package to package_id, package to symbol

Return type:

Dictionary(Job Object, Package), Dictionary(Package, List of Job Objects), Dictionary(String, String), Dictionary(String, String)

static retrieve_times(status_code, name, tmp_path, make_exception=False, job_times=None, seconds=False, job_data_collection=None)

Retrieve job timestamps from database.

Parameters:
  • status_code (Integer) – Code of the Status of the job
  • name (String) – Name of the job
  • tmp_path (String) – Path to the tmp folder of the experiment
  • make_exception (Boolean) – flag for testing purposes
  • job_times (Dictionary Key: job name, Value: 5-tuple (submit time, start time, finish time, status, detail id)) – Detail from as_times.job_times for the experiment
Returns:

minutes the job has been queuing, minutes the job has been running, and the text that represents it

Return type:

int, int, str

save()

Persists the job list

sort_by_id()

Returns a list of jobs sorted by id

Returns:jobs sorted by ID
Return type:list
sort_by_name()

Returns a list of jobs sorted by name

Returns:jobs sorted by name
Return type:list
sort_by_status()

Returns a list of jobs sorted by status

Returns:job sorted by status
Return type:list
sort_by_type()

Returns a list of jobs sorted by type

Returns:job sorted by type
Return type:list
update_from_file(store_change=True)

Updates jobs list on the fly from an update file

Parameters:store_change – if True, renames the update file to avoid reloading it at the next iteration

update_genealogy(new=True, notransitive=False, update_structure=False)

When we have created the job list, every type of job is created. Update genealogy removes jobs that have no templates

Parameters:new (bool) – if it is a new job list or not

update_list(as_conf, store_change=True, fromSetStatus=False, submitter=None, first_time=False)

Updates job list, resetting failed jobs and changing to READY all WAITING jobs with all parents COMPLETED

Parameters:as_conf (AutosubmitConfig) – autosubmit config object
Returns:True if any job status was modified, False otherwise
Return type:bool
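
The promotion rule in update_list (WAITING jobs become READY once all their parents are COMPLETED) can be sketched as follows. SketchJob and the status strings are hypothetical stand-ins, and the sketch leaves out the resetting of failed jobs.

```python
class SketchJob:
    """Hypothetical stand-in for Job with a status and a set of parents."""

    def __init__(self, name, status="WAITING"):
        self.name = name
        self.status = status
        self.parents = set()


def sketch_promote_waiting(jobs):
    """Hypothetical helper: returns True if any status was modified."""
    changed = False
    for job in jobs:
        # all() is True for a job without parents, so such a job
        # is promoted as well.
        if job.status == "WAITING" and all(
            parent.status == "COMPLETED" for parent in job.parents
        ):
            job.status = "READY"
            changed = True
    return changed


a = SketchJob("a", status="COMPLETED")
b = SketchJob("b")
c = SketchJob("c")
b.parents.add(a)
c.parents.add(b)

first_pass = sketch_promote_waiting([a, b, c])
second_pass = sketch_promote_waiting([a, b, c])
```

In the first pass only b is promoted, because c still has a parent that is not COMPLETED. The second pass changes nothing and reports that through its return value.
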