autosubmit.job¶
Main module for Autosubmit. Only contains an interface class to all functionality implemented on Autosubmit
-
class
autosubmit.job.job.
Job
(name, job_id, status, priority)¶ Class to handle all the tasks with Jobs at HPC. A job is created by default with a name, a jobid, a status and a type. It can have children and parents. The inheritance reflects the dependency between jobs. If Job2 must wait until Job1 is completed then Job2 is a child of Job1. Inversely Job1 is a parent of Job2
Parameters: - name (str) – job’s name
- job_id (int) – job’s identifier
- status (Status) – job initial status
- priority (int) – job’s priority
-
add_parent
(*parents)¶ Add parents for the job. It also adds current job as a child for all the new parents
Parameters: parents (*Job) – job’s parents to add
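The two-way linking described for add_parent can be illustrated with a minimal sketch. `ToyJob` is a hypothetical stand-in written for this example, not Autosubmit's `Job` class; it only mirrors the documented behaviour that adding a parent also registers the current job as a child of that parent.

```python
class ToyJob:
    """Hypothetical stand-in for illustration; not autosubmit.job.job.Job."""

    def __init__(self, name):
        self.name = name
        self.parents = set()
        self.children = set()

    def add_parent(self, *parents):
        # Link in both directions so the dependency graph stays consistent.
        for parent in parents:
            self.parents.add(parent)
            parent.children.add(self)


job1 = ToyJob("job1")
job2 = ToyJob("job2")
job2.add_parent(job1)  # job2 must wait until job1 is completed
```

After the call, `job1` appears in `job2.parents` and `job2` appears in `job1.children`.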
-
check_completion
(default_status=-1, over_wallclock=False)¶ Checks for the presence of the COMPLETED file. Changes status to COMPLETED if the COMPLETED file exists and to default_status otherwise.
Parameters: default_status (Status) – status to set if job is not completed. Defaults to FAILED
-
check_end_time
()¶ Returns end time from stat file
Returns: date and time Return type: str
-
check_retrials_end_time
()¶ Returns list of end datetime for retrials from total stats file
Returns: date and time Return type: list[int]
-
check_retrials_start_time
()¶ Returns list of start datetime for retrials from total stats file
Returns: date and time Return type: list[int]
-
check_retrials_submit_time
()¶ Returns list of submit datetime for retrials from total stats file
Returns: date and time Return type: list[int]
-
check_running_after
(date_limit)¶ Checks if the job was running after the given date
Parameters: date_limit (datetime.datetime) – reference date Returns: True if job was running after the given date, false otherwise Return type: bool
-
check_script
(as_conf, parameters, show_logs=False)¶ Checks if script is well formed
Parameters: - parameters (dict) – script parameters
- as_conf (AutosubmitConfig) – configuration file
- show_logs (Bool) – Display output
Returns: true if no problem has been detected, false otherwise
Return type: bool
-
check_start_time
()¶ Returns job’s start time
Returns: start time Return type: str
-
check_started_after
(date_limit)¶ Checks if the job started after the given date
Parameters: date_limit (datetime.datetime) – reference date Returns: True if job started after the given date, false otherwise Return type: bool
-
children
¶ Returns a list containing all children of the job
Returns: child jobs Return type: set
-
children_names_str
¶ Comma separated list of children’s names
-
compare_by_id
(other)¶ Compare jobs by ID
Parameters: other (Job) – job to compare Returns: comparison result Return type: bool
-
compare_by_name
(other)¶ Compare jobs by name
Parameters: other (Job) – job to compare Returns: comparison result Return type: bool
-
compare_by_status
(other)¶ Compare jobs by status value
Parameters: other (Job) – job to compare Returns: comparison result Return type: bool
-
create_script
(as_conf)¶ Creates script file to be run for the job
Parameters: as_conf (AutosubmitConfig) – configuration object Returns: script’s filename Return type: str
-
get_last_retrials
()¶ Returns the retrials of a job, including the last COMPLETED run. The selection stops at the previous COMPLETED run, which is not included, or when the list of registers is exhausted.
Returns: list of list of dates of retrial [submit, start, finish] in datetime format Return type: list of list
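The selection rule can be sketched as a backwards walk over the recorded attempts. This is an illustrative reading of the description, not Autosubmit's code: the record layout (lists of four strings), the placeholder timestamps and the helper names `is_completed` and `last_retrials` are all assumptions. The completion test follows the four-field rule documented for is_a_completed_retrial.

```python
def is_completed(fields):
    # Documented rule: exactly 4 fields (submit, start, finish, status)
    # and the status equals COMPLETED.
    return len(fields) == 4 and fields[3] == "COMPLETED"


def last_retrials(records):
    """Collect the most recent group of attempts, in chronological order."""
    selected = []
    for index, fields in enumerate(reversed(records)):
        # The newest record is always kept, even if it is COMPLETED.
        if index > 0 and is_completed(fields):
            break  # previous COMPLETED run: stop and do not include it
        selected.append(fields)
    selected.reverse()  # restore chronological order
    return selected


# Placeholder timestamps; real entries would hold dates.
records = [
    ["t0", "t1", "t2", "COMPLETED"],
    ["t3", "t4", "t5", "FAILED"],
    ["t6", "t7", "t8", "COMPLETED"],
]
result = last_retrials(records)
```

Here `result` holds the FAILED attempt and the final COMPLETED run, while the earlier COMPLETED run is left out.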
-
has_children
()¶ Returns true if job has any children, else return false
Returns: true if job has any children, otherwise return false Return type: bool
-
has_parents
()¶ Returns true if job has any parents, else return false
Returns: true if job has any parent, otherwise return false Return type: bool
-
inc_fail_count
()¶ Increments fail count
-
static
is_a_completed_retrial
(fields)¶ Returns true only if there are 4 fields (submit, start, finish, status) and status equals COMPLETED.
-
is_ancestor
(job)¶ Checks if the given job is an ancestor
Parameters: job – job to be checked if is an ancestor Returns: True if job is an ancestor, false otherwise Return type: bool
-
is_over_wallclock
(start_time, wallclock)¶ Checks if the job is over the wallclock time. It is an alternative method to avoid platform issues
Parameters: - start_time –
- wallclock –
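A wallclock check of this kind can be sketched as follows. The parameter types are not documented, so this example assumes `start_time` is a `datetime.datetime` and `wallclock` is an `HH:MM` string; the function name `over_wallclock` and the extra `now` argument (added so the example is deterministic) are hypothetical.

```python
from datetime import datetime, timedelta


def over_wallclock(start_time, wallclock, now=None):
    """Return True if more than `wallclock` (HH:MM) has elapsed since start_time."""
    hours, minutes = (int(part) for part in wallclock.split(":"))
    limit = timedelta(hours=hours, minutes=minutes)
    now = now or datetime.now()
    return now - start_time > limit


start = datetime(2020, 1, 1, 12, 0)
check_time = datetime(2020, 1, 1, 14, 0)  # two hours after the start
```

With these values, a wallclock of `01:30` is exceeded and a wallclock of `02:30` is not.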
-
is_parent
(job)¶ Checks if the given job is a parent
Parameters: job – job to be checked if is a parent Returns: True if job is a parent, false otherwise Return type: bool
-
log_job
()¶ Prints job information in log
-
long_name
¶ Job’s long name. If not set, returns name
Returns: long name Return type: str
-
parents
¶ Returns parent jobs list
Returns: parent jobs Return type: set
-
platform
¶ Returns the platform to be used by the job. Chooses between serial and parallel platforms
Returns: HPCPlatform object for the job to use Return type: HPCPlatform
-
print_job
()¶ Prints debug information about the job
-
print_parameters
()¶ Prints job parameters in log
-
queue
¶ Returns the queue to be used by the job. Chooses between serial and parallel platforms
Returns: HPCPlatform object for the job to use Return type: HPCPlatform
-
read_header_tailer_script
(script_path, as_conf)¶ Opens and reads a script. If it is not a BASH script it will fail :(
Will strip away the line with the hash bang (#!)
Parameters: - script_path (string) – relative to the experiment directory path to the script
- as_conf (config) – Autosubmit configuration file
-
remove_redundant_parents
()¶ Checks if a parent is also an ancestor, if true, removes the link in both directions. Useful to remove redundant dependencies.
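The pruning idea can be sketched on a small graph: a direct parent is redundant when another parent already leads up to it. `Node` and its `reaches` helper are hypothetical names for this example, not Autosubmit's implementation, and the sketch assumes the dependency graph has no cycles.

```python
class Node:
    """Hypothetical minimal job node, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.parents = set()
        self.children = set()

    def add_parent(self, parent):
        self.parents.add(parent)
        parent.children.add(self)

    def reaches(self, target):
        """True if target is a parent of this node or of any node above it."""
        return any(p is target or p.reaches(target) for p in self.parents)

    def remove_redundant_parents(self):
        for parent in list(self.parents):
            # Redundant if another direct parent already depends on this one.
            others = self.parents - {parent}
            if any(other.reaches(parent) for other in others):
                # Remove the link in both directions.
                self.parents.discard(parent)
                parent.children.discard(self)


a, b, c = Node("a"), Node("b"), Node("c")
b.add_parent(a)
c.add_parent(b)
c.add_parent(a)  # redundant: c already depends on a through b
c.remove_redundant_parents()
```

Afterwards `c` keeps only `b` as a parent, and `a` no longer lists `c` among its children.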
-
shape
¶ Returns the shape of the job. Chooses between serial and parallel platforms
Returns: HPCPlatform object for the job to use Return type: HPCPlatform
-
status_str
¶ String representation of the current status
-
total_processors
¶ Number of processors requested by job. Reduces ‘:’ separated format if necessary.
-
update_content
(as_conf)¶ Create the script content to be run for the job
Parameters: as_conf (config) – config Returns: script code Return type: str
-
update_parameters
(as_conf, parameters, default_parameters={'M': '%M%', 'M_': '%M_%', 'Y': '%Y%', 'Y_': '%Y_%', 'd': '%d%', 'd_': '%d_%', 'm': '%m%', 'm_': '%m_%'})¶ Refresh parameters value
Parameters: - default_parameters (dict) –
- as_conf (AutosubmitConfig) –
- parameters (dict) –
-
update_status
(copy_remote_logs=False, failed_file=False)¶ Updates job status, checking COMPLETED file if needed
Parameters: - new_status – job status retrieved from the platform
- copy_remote_logs – should copy remote logs when finished?
-
write_end_time
(completed, enabled=False)¶ Writes end date and time to TOTAL_STATS file
Parameters: completed (bool) – True if job was completed successfully, False otherwise
-
write_start_time
(enabled=False)¶ Writes start date and time to TOTAL_STATS file
Returns: True if successful, False otherwise Return type: bool
-
write_submit_time
(enabled=False, hold=False)¶ Writes submit date and time to TOTAL_STATS file. It doesn’t write if hold == True.
-
write_total_stat_by_retries
(total_stats, first_retrial=False)¶ Writes all data to TOTAL_STATS file
Parameters: total_stats (str) – data gathered by the wrapper
-
class
autosubmit.job.job.
WrapperJob
(name, job_id, status, priority, job_list, total_wallclock, num_processors, platform, as_config, hold)¶ Defines a wrapper from a package.
Calls Job constructor.
Parameters: - name (String) – Name of the Package
- job_id (Integer) – Id of the first Job of the package
- status (String) – ‘READY’ when coming from submit_ready_jobs()
- priority (Integer) – 0 when coming from submit_ready_jobs()
- job_list (List() of Job() objects) – List of jobs in the package
- total_wallclock (String Formatted) – Wallclock of the package
- num_processors (Integer) – Number of processors for the package
- platform (Platform Object. e.g. EcPlatform()) – Platform object defined for the package
- as_config (AutosubmitConfig object) – Autosubmit basic configuration object
-
class
autosubmit.job.job_common.
StatisticsSnippetBash
¶ Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs
-
class
autosubmit.job.job_common.
StatisticsSnippetEmpty
¶ Class to handle the statistics snippet of a job. It contains header and footer for local and remote jobs
-
class
autosubmit.job.job_common.
StatisticsSnippetPython
(version='2')¶ Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs
-
class
autosubmit.job.job_common.
StatisticsSnippetR
¶ Class to handle the statistics snippet of a job. It contains header and tailer for local and remote jobs
-
class
autosubmit.job.job_common.
Status
¶ Class to handle the status of a job
-
class
autosubmit.job.job_common.
Type
¶ Class to handle the type of a job
-
autosubmit.job.job_common.
increase_wallclock_by_chunk
(current, increase, chunk)¶ Receives the wallclock time and increases it by a quantity multiplied by the number of the current chunk. The result cannot be larger than 48:00. If chunk = 0, no increment is applied.
Parameters: - current (str) – WALLCLOCK HH:MM
- increase (str) – WCHUNKINC HH:MM
- chunk (int) – chunk number
Returns: HH:MM wallclock
Return type: str
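The arithmetic described for increase_wallclock_by_chunk can be sketched as below. This is one plausible reading of the description, not the library's code: it assumes the increment is exactly `increase` multiplied by `chunk`, and the name `increase_wallclock` is hypothetical.

```python
def increase_wallclock(current, increase, chunk):
    """Add increase * chunk to current (both HH:MM), capped at 48:00."""

    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    total = to_minutes(current) + to_minutes(increase) * chunk
    total = min(total, 48 * 60)  # documented ceiling of 48:00
    return "{:02d}:{:02d}".format(total // 60, total % 60)
```

Under this assumption, a `01:00` wallclock with a `00:30` increment stays at `01:00` for chunk 0 and becomes `02:30` for chunk 3.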
-
autosubmit.job.job_common.
parse_output_number
(string_number)¶ Parses number in format 1.0K 1.0M 1.0G
Parameters: string_number (str) – String representation of number Returns: number in float format Return type: float
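A parser for this format might look like the sketch below. The multipliers are an assumption: the documentation does not say whether K, M and G are decimal (1000-based) or binary (1024-based), and decimal factors are used here only for illustration. The name `parse_number` is hypothetical.

```python
def parse_number(text):
    """Parse strings such as '1.0K', '2.5M' or '3G' into a float."""
    multipliers = {"K": 1e3, "M": 1e6, "G": 1e9}  # assumed decimal factors
    text = text.strip()
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text)  # no suffix: plain number
```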
-
class
autosubmit.job.job_list.
JobList
(expid, config, parser_factory, job_list_persistence)¶ Class to manage the list of jobs to be run by autosubmit
-
add_logs
(logs)¶ Adds logs to the current job_list
Parameters: platform (HPCPlatform) – job platform Returns: logs Return type: dict(tuple)
-
backup_load
()¶ Recreates a stored job list from the persistence
Returns: loaded job list object Return type: JobList
-
backup_save
()¶ Persists the job list
-
check_scripts
(as_conf)¶ When we have created the scripts, all parameters should have been substituted. %PARAMETER% handlers not allowed
Parameters: as_conf (AutosubmitConfig) – experiment configuration
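The idea of rejecting leftover %PARAMETER% placeholders can be sketched with a regular expression. The helper names `substitute` and `unresolved`, the template text and the parameter names `SDATE` and `MEMBER` are hypothetical; Autosubmit's own substitution logic may differ.

```python
import re

PLACEHOLDER = re.compile(r"%(\w+)%")


def substitute(template, parameters):
    """Replace each %KEY% with parameters[KEY]; unknown keys are left in place."""
    return PLACEHOLDER.sub(
        lambda match: str(parameters.get(match.group(1), match.group(0))),
        template,
    )


def unresolved(script):
    """Return the names of placeholders still present in the script."""
    return PLACEHOLDER.findall(script)


template = "run_model --start %SDATE% --member %MEMBER%"
script = substitute(template, {"SDATE": "19900101"})
leftover = unresolved(script)  # a non-empty list means the script is not well formed
```

In this example `MEMBER` has no value, so it remains in `script` and is reported in `leftover`.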
-
expid
¶ Returns the experiment identifier
Returns: experiment’s identifier Return type: str
-
generate
(date_list, member_list, num_chunks, chunk_ini, parameters, date_format, default_retrials, default_job_type, wrapper_type=None, wrapper_jobs={}, new=True, notransitive=False, update_structure=False, run_only_members=[], show_log=True)¶ Creates all jobs needed for the current workflow
Parameters: - default_job_type (str) – default type for jobs
- date_list (list) – start dates
- member_list (list) – members
- num_chunks (int) – number of chunks to run
- chunk_ini (int) – the experiment will start by the given chunk
- parameters (dict) – parameters for the jobs
- date_format (str) – option to format dates
- default_retrials (int) – default retrials for each job
- new (bool) – is it a new generation?
- wrapper_type – Type of wrapper defined by the user in autosubmit_.conf [wrapper] section.
- wrapper_jobs (String) – Job types defined in autosubmit_.conf [wrapper sections] to be wrapped.
-
get_active
(platform=None, wrapper=False)¶ Returns a list of active jobs (In platforms queue + Ready)
Parameters: platform (HPCPlatform) – job platform Returns: active jobs Return type: list
-
get_all
(platform=None, wrapper=False)¶ Returns a list of all jobs
Parameters: platform (HPCPlatform) – job platform Returns: all jobs Return type: list
-
get_chunk_list
()¶ Get inner chunk list
Returns: chunk list Return type: list
-
get_completed
(platform=None, wrapper=False)¶ Returns a list of completed jobs
Parameters: platform (HPCPlatform) – job platform Returns: completed jobs Return type: list
-
get_date_list
()¶ Get inner date list
Returns: date list Return type: list
-
get_delayed
(platform=None)¶ Returns a list of delayed jobs
Parameters: platform (HPCPlatform) – job platform Returns: delayed jobs Return type: list
-
get_failed
(platform=None, wrapper=False)¶ Returns a list of failed jobs
Parameters: platform (HPCPlatform) – job platform Returns: failed jobs Return type: list
-
get_finished
(platform=None, wrapper=False)¶ Returns a list of jobs finished (Completed, Failed)
Parameters: platform (HPCPlatform) – job platform Returns: finished jobs Return type: list
-
get_held_jobs
(platform=None)¶ Returns a list of jobs in the platforms (Held)
Parameters: platform (HPCPlatform) – job platform Returns: jobs in platforms Return type: list
-
get_in_queue
(platform=None, wrapper=False)¶ Returns a list of jobs in the platforms (Submitted, Running, Queuing, Unknown, Held)
Parameters: platform (HPCPlatform) – job platform Returns: jobs in platforms Return type: list
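The get_* accessors in this class all filter the job list by status and, optionally, by platform. A minimal sketch of that pattern for the in-queue case follows. Jobs are modelled as plain dictionaries and statuses as strings purely for illustration; the real classes use `Job` objects and the `Status` class.

```python
# Statuses listed in the description above, written as plain strings.
IN_QUEUE = {"SUBMITTED", "RUNNING", "QUEUING", "UNKNOWN", "HELD"}


def get_in_queue(jobs, platform=None):
    """Return jobs whose status is in IN_QUEUE, optionally for one platform."""
    return [
        job for job in jobs
        if job["status"] in IN_QUEUE
        and (platform is None or job["platform"] == platform)
    ]


jobs = [
    {"name": "a", "status": "RUNNING", "platform": "hpc1"},
    {"name": "b", "status": "READY", "platform": "hpc1"},
    {"name": "c", "status": "HELD", "platform": "hpc2"},
]
```

Calling `get_in_queue(jobs)` keeps jobs `a` and `c`, and passing `platform="hpc2"` narrows the result to `c`.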
-
get_job_by_name
(name)¶ Returns the job whose name matches the name parameter
Parameters: name (str) – name to look for Returns: found job Return type: job
-
get_job_list
()¶ Get inner job list
Returns: job list Return type: list
-
get_job_names
(lower_case=False)¶ Returns a list of all job names
Parameters: platform (HPCPlatform) – job platform Returns: all jobs Return type: list
Parameters: - select_jobs_by_name – job name
- select_all_jobs_by_section – section name
- filter_jobs_by_section – section, date , member? , chunk?
Returns: jobs_list names
Return type: list
-
get_jobs_by_section
(section_list)¶ Returns the jobs whose section matches one of the sections in section_list
Parameters: name – name to look for Returns: found job Return type: job
-
get_logs
()¶ Returns a dict of logs keyed by job name
Parameters: platform (HPCPlatform) – job platform Returns: logs Return type: dict(tuple)
-
get_member_list
()¶ Get inner member list
Returns: member list Return type: list
-
get_not_in_queue
(platform=None, wrapper=False)¶ Returns a list of jobs NOT in the platforms (Ready, Waiting)
Parameters: platform (HPCPlatform) – job platform Returns: jobs not in platforms Return type: list
-
get_ordered_jobs_by_date_member
(section)¶ Get the dictionary of jobs ordered according to wrapper’s expression divided by date and member
Returns: jobs ordered divided by date and member Return type: dict
-
get_prepared
(platform=None)¶ Returns a list of prepared jobs
Parameters: platform (HPCPlatform) – job platform Returns: prepared jobs Return type: list
-
get_queuing
(platform=None, wrapper=False)¶ Returns a list of jobs queuing
Parameters: platform (HPCPlatform) – job platform Returns: queued jobs Return type: list
-
get_ready
(platform=None, hold=False, wrapper=False)¶ Returns a list of ready jobs
Parameters: platform (HPCPlatform) – job platform Returns: ready jobs Return type: list
-
get_running
(platform=None, wrapper=False)¶ Returns a list of jobs running
Parameters: platform (HPCPlatform) – job platform Returns: running jobs Return type: list
-
get_skipped
(platform=None)¶ Returns a list of skipped jobs
Parameters: platform (HPCPlatform) – job platform Returns: skipped jobs Return type: list
-
get_submitted
(platform=None, hold=False, wrapper=False)¶ Returns a list of submitted jobs
Parameters: platform (HPCPlatform) – job platform Returns: submitted jobs Return type: list
-
get_suspended
(platform=None, wrapper=False)¶ Returns a list of suspended jobs
Parameters: platform (HPCPlatform) – job platform Returns: suspended jobs Return type: list
-
get_uncompleted
(platform=None, wrapper=False)¶ Returns a list of uncompleted jobs
Parameters: platform (HPCPlatform) – job platform Returns: uncompleted jobs Return type: list
-
get_uncompleted_and_not_waiting
(platform=None, wrapper=False)¶ Returns a list of jobs that are neither completed nor waiting
Parameters: platform (HPCPlatform) – job platform Returns: uncompleted and not waiting jobs Return type: list
-
get_unknown
(platform=None, wrapper=False)¶ Returns a list of jobs in unknown state
Parameters: platform (HPCPlatform) – job platform Returns: unknown state jobs Return type: list
-
get_unsubmitted
(platform=None, wrapper=False)¶ Returns a list of unsubmitted jobs
Parameters: platform (HPCPlatform) – job platform Returns: unsubmitted jobs Return type: list
-
get_waiting
(platform=None, wrapper=False)¶ Returns a list of jobs waiting
Parameters: platform (HPCPlatform) – job platform Returns: waiting jobs Return type: list
-
get_waiting_remote_dependencies
(platform_type='slurm')¶ Returns a list of jobs waiting on slurm scheduler
Parameters: platform (HPCPlatform) – job platform Returns: waiting jobs Return type: list
-
graph
¶ Returns the graph
Returns: graph Return type: networkx graph
-
load
()¶ Recreates a stored job list from the persistence
Returns: loaded job list object Return type: JobList
-
static
load_file
(filename)¶ Recreates a stored joblist from the pickle file
Parameters: filename (str) – pickle file to load Returns: loaded joblist object Return type: JobList
-
parameters
¶ List of parameters common to all jobs
Returns: parameters Return type: dict
-
print_with_status
(statusChange=None, nocolor=False, existingList=None)¶ Returns the string representation of the dependency tree of the Job List
Parameters: - statusChange (List of strings) – List of changes in the list, supplied in set status
- nocolor (Boolean) – True if the result should not include color codes
- existingList (List of Job Objects) – External List of Jobs that will be printed, this excludes the inner list of jobs.
Returns: String representation
Return type: String
-
remove_rerun_only_jobs
(notransitive=False)¶ Removes all jobs to be run only in reruns
-
rerun
(job_list_unparsed, monitor=False)¶ Updates job list to rerun the jobs specified by a job list
Parameters: chunk_list (str) – list of chunks to rerun
-
static
retrieve_packages
(BasicConfig, expid, current_jobs=None)¶ Retrieves dictionaries that map the collection of packages in the experiment
Parameters: - BasicConfig (Configuration Object) – Basic configuration
- expid (String) – Experiment Id
- current_jobs (list) – list of names of current jobs
Returns: job to package, package to jobs, package to package_id, package to symbol
Return type: Dictionary(Job Object, Package), Dictionary(Package, List of Job Objects), Dictionary(String, String), Dictionary(String, String)
-
static
retrieve_times
(status_code, name, tmp_path, make_exception=False, job_times=None, seconds=False, job_data_collection=None)¶ Retrieve job timestamps from database.
Parameters: - status_code (Integer) – Code of the Status of the job
- name (String) – Name of the job
- tmp_path (String) – Path to the tmp folder of the experiment
- make_exception (Boolean) – flag for testing purposes
- job_times (Dictionary Key: job name, Value: 5-tuple (submit time, start time, finish time, status, detail id)) – Detail from as_times.job_times for the experiment
Returns: minutes the job has been queuing, minutes the job has been running, and the text that represents it
Return type: int, int, str
-
save
()¶ Persists the job list
-
sort_by_id
()¶ Returns a list of jobs sorted by id
Returns: jobs sorted by ID Return type: list
-
sort_by_name
()¶ Returns a list of jobs sorted by name
Returns: jobs sorted by name Return type: list
-
sort_by_status
()¶ Returns a list of jobs sorted by status
Returns: job sorted by status Return type: list
-
sort_by_type
()¶ Returns a list of jobs sorted by type
Returns: job sorted by type Return type: list
-
update_from_file
(store_change=True)¶ Updates jobs list on the fly from an update file
Parameters: store_change – if True, renames the update file to avoid reloading it at the next iteration
-
update_genealogy
(new=True, notransitive=False, update_structure=False)¶ When the job list is created, every type of job is created. Update genealogy removes jobs that have no templates
Parameters: new (bool) – if it is a new job list or not
-
update_list
(as_conf, store_change=True, fromSetStatus=False, submitter=None, first_time=False)¶ Updates job list, resetting failed jobs and changing to READY all WAITING jobs with all parents COMPLETED
Parameters: as_conf (AutosubmitConfig) – autosubmit config object Returns: True if job statuses were modified, False otherwise Return type: bool
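The promotion step (WAITING jobs become READY once all their parents are COMPLETED) can be sketched as below. Jobs are modelled as a dictionary of plain records and statuses as strings; these are assumptions for the example, as is the name `promote_waiting`. The resetting of failed jobs is not covered.

```python
WAITING, READY, COMPLETED = "WAITING", "READY", "COMPLETED"


def promote_waiting(jobs):
    """Set WAITING jobs to READY when every parent is COMPLETED.

    jobs maps a job name to {"status": str, "parents": [names]}.
    Returns True if any status changed.
    """
    changed = False
    for job in jobs.values():
        if job["status"] != WAITING:
            continue
        if all(jobs[p]["status"] == COMPLETED for p in job["parents"]):
            job["status"] = READY
            changed = True
    return changed


jobs = {
    "ini": {"status": COMPLETED, "parents": []},
    "sim": {"status": WAITING, "parents": ["ini"]},
    "post": {"status": WAITING, "parents": ["sim"]},
}
changed = promote_waiting(jobs)
```

Only `sim` is promoted here: `post` stays WAITING because its parent is READY, not COMPLETED.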
-