ATLAS Offline Software
EL::SlurmDriver Class Reference (final)

a Driver for running on SLURM batch systems

#include <SlurmDriver.h>


Public Member Functions

void testInvariant () const
 effects: test the invariant of this object; guarantee: no-fail
 
 SlurmDriver ()
 effects: standard default constructor; guarantee: strong; failures: low level errors I
 
void SetJobName (std::string job_name)
 
void SetAccount (std::string account)
 
void SetPartition (std::string partition)
 
void SetRunTime (std::string run_time)
 
void SetMemory (std::string memory)
 
void SetConstrain (std::string constraint)
 
SH::MetaObject * options ()
 the list of options to jobs with this driver
 
const SH::MetaObject * options () const
 
std::string submit (const Job &job, const std::string &location) const
 submit the given job with the given output location and wait for it to finish
 
std::string submitOnly (const Job &job, const std::string &location) const
 submit the given job with the given output location and return immediately
 

Static Public Member Functions

static void resubmit (const std::string &location, const std::string &option)
 resubmit all failed sub-jobs for the job in the given location
 
static bool retrieve (const std::string &location)
 retrieve all the output for the job in the given location
 
static bool wait (const std::string &location, unsigned time=60)
 retrieve all the output for the job in the given location and wait until it is finished completely.
 
static void updateLocation (const std::string &location)
 update the internal location of files, after moving the submission directory
 
static void mergedOutputSave (Detail::ManagerData &data)
 create and save a sample handler assuming we created all the merged files at the requested locations
 
static void diskOutputSave (Detail::ManagerData &data)
 make the output sample handler for the given job or stream from the information stored in the histogram files.
 

Public Attributes

std::string m_job_name
 
std::string m_account
 
std::string m_partition
 
std::string m_run_time
 
std::string m_memory
 
std::string m_constraint
 
bool m_b_job_name
 
bool m_b_account
 
bool m_b_run_time
 
std::string shellInit
 description: these shell commands are run verbatim on each worker node before execution
 

Protected Member Functions

virtual ::StatusCode doManagerStep (Detail::ManagerData &data) const override
 
 ClassDef (SlurmDriver, 1)
 

Static Protected Attributes

static bool abortRetrieve
 this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.
 

Private Member Functions

std::string defaultReleaseSetup (const Detail::ManagerData &data) const
 the code for setting up the release
 
void makeScript (Detail::ManagerData &data, std::size_t njobs) const
 effects: create the run script to be used; guarantee: basic, may create a partial script; failures: out of memory II, i/o errors
 

Static Private Member Functions

static bool mergeHists (Detail::ManagerData &data)
 effects: merge the fetched histograms; returns: whether all histograms have been fetched; guarantee: strong; failures: out of memory II, i/o errors
 

Private Attributes

SH::MetaObject m_options
 members directly corresponding to accessors
 

Detailed Description

a Driver for running on SLURM batch systems

Definition at line 22 of file SlurmDriver.h.

Constructor & Destructor Documentation

◆ SlurmDriver()

EL::SlurmDriver::SlurmDriver ( )

standard default constructor

Guarantee
strong
Failures
low level errors I

Member Function Documentation

◆ ClassDef()

EL::SlurmDriver::ClassDef ( SlurmDriver, 1 )
protected

◆ defaultReleaseSetup()

std::string EL::BatchDriver::defaultReleaseSetup ( const Detail::ManagerData &  data) const
private, inherited

the code for setting up the release

Guarantee
strong
Failures
out of memory II
failed to read environment variables

◆ diskOutputSave()

static void EL::Driver::diskOutputSave ( Detail::ManagerData &  data)
static, inherited

make the output sample handler for the given job or stream from the information stored in the histogram files.

This is optional, but it is convenient for drivers that use (conventional) writers

Guarantee
basic
Failures
out of memory II
i/o errors

◆ doManagerStep()

virtual ::StatusCode EL::SlurmDriver::doManagerStep ( Detail::ManagerData &  data) const
override, protected

◆ makeScript()

void EL::BatchDriver::makeScript ( Detail::ManagerData &  data,
std::size_t  njobs
) const
private, inherited

create the run script to be used

Guarantee
basic, may create a partial script
Failures
out of memory II
i/o errors

◆ mergedOutputSave()

static void EL::Driver::mergedOutputSave ( Detail::ManagerData &  data)
static, inherited

create and save a sample handler assuming we created all the merged files at the requested locations

This is optional, but it is convenient for drivers that want to keep their outputs locally.

Guarantee
basic
Failures
out of memory II
i/o errors

◆ mergeHists()

static bool EL::BatchDriver::mergeHists ( Detail::ManagerData &  data)
static, private, inherited

merge the fetched histograms

Returns
whether all histograms have been fetched
Guarantee
strong
Failures
out of memory II
i/o errors

◆ options() [1/2]

SH::MetaObject* EL::Driver::options ( )
inherited

the list of options to jobs with this driver

Guarantee
no-fail
Postcondition
result != 0

◆ options() [2/2]

const SH::MetaObject* EL::Driver::options ( ) const
inherited

◆ resubmit()

static void EL::Driver::resubmit ( const std::string &  location,
const std::string &  option 
)
static, inherited

resubmit all failed sub-jobs for the job in the given location

Parameters
option  driver-specific option string selecting which jobs to resubmit (and how)

Guarantee
basic, may partially resubmit
Failures
out of memory III
job resubmission errors
job can't be read
job was made with different driver

◆ retrieve()

static bool EL::Driver::retrieve ( const std::string &  location)
static, inherited

retrieve all the output for the job in the given location

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Returns
whether the job completed successfully
Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver

◆ SetAccount()

void EL::SlurmDriver::SetAccount ( std::string  account)

◆ SetConstrain()

void EL::SlurmDriver::SetConstrain ( std::string  constraint)

◆ SetJobName()

void EL::SlurmDriver::SetJobName ( std::string  job_name)

◆ SetMemory()

void EL::SlurmDriver::SetMemory ( std::string  memory)

◆ SetPartition()

void EL::SlurmDriver::SetPartition ( std::string  partition)

◆ SetRunTime()

void EL::SlurmDriver::SetRunTime ( std::string  run_time)
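
The setters above carry no description, and this page does not say how the stored values reach SLURM. As a rough sketch only, assuming each value ends up as the sbatch option with the matching meaning (--job-name, --account, --partition, --time, --mem, --constraint), the header of a batch script could be assembled as follows. SlurmSettings, addDirective and sbatchHeader are hypothetical names invented for this illustration and are not part of EL::SlurmDriver.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the values held by the driver; the field names
// mirror the public attributes listed on this page (without the m_ prefix).
struct SlurmSettings {
  std::string job_name, account, partition, run_time, memory, constraint;
};

// Emit one "#SBATCH --flag=value" line, or nothing if the value is unset.
inline void addDirective(std::ostream& out, const std::string& flag,
                         const std::string& value) {
  assert(!flag.empty());
  if (!value.empty()) out << "#SBATCH --" << flag << "=" << value << "\n";
}

// Build a batch script header from the settings. The setter-to-flag mapping
// is an assumption of this sketch, not taken from the SlurmDriver sources.
inline std::string sbatchHeader(const SlurmSettings& s) {
  std::ostringstream out;
  out << "#!/bin/bash\n";
  addDirective(out, "job-name", s.job_name);
  addDirective(out, "account", s.account);
  addDirective(out, "partition", s.partition);
  addDirective(out, "time", s.run_time);
  addDirective(out, "mem", s.memory);
  addDirective(out, "constraint", s.constraint);
  return out.str();
}
```

Unset values are skipped in this sketch, so a caller that only sets a job name and a partition gets a header with just those two directives.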

◆ submit()

std::string EL::Driver::submit ( const Job &  job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and wait for it to finish

This is mostly for small jobs and backward compatibility. For longer jobs use submitOnly instead.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors

◆ submitOnly()

std::string EL::Driver::submitOnly ( const Job &  job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and return immediately

This method allows you to submit jobs to your local batch system, log out and at a later point log back in again.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors
Warning
not all drivers support this. Some will do all their work in the submit function.
Warning
you normally need to call wait() or retrieve() before you can use the output.

◆ testInvariant()

void EL::SlurmDriver::testInvariant ( ) const

test the invariant of this object

Guarantee
no-fail

◆ updateLocation()

static void EL::Driver::updateLocation ( const std::string &  location)
static, inherited

update the internal location of files, after moving the submission directory

Guarantee
basic, may update partially
Failures
out of memory II
Warning
only move the submission directory after all your jobs are finished, or the results will be unpredictable

◆ wait()

static bool EL::Driver::wait ( const std::string &  location,
unsigned  time = 60 
)
static, inherited

retrieve all the output for the job in the given location and wait until it is finished completely.

poll the output once every time seconds, where time is the second argument (default 60).

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Typically sleeping for 60 seconds is an appropriate interval, but if it doesn't work for you, you can change it here.

Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver
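
The polling behaviour described for wait() can be pictured with a small loop. This is a minimal sketch of the general pattern, not the actual EL::Driver implementation: pollUntilDone is a hypothetical helper, check stands in for a single retrieve attempt, and the abort flag plays the role that abortRetrieve has in the real class.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of EL::Driver: calls `check` until it reports
// completion or `abort` becomes true, sleeping `seconds` between attempts.
// Returns true if `check` reported completion, false if aborted first.
inline bool pollUntilDone(const std::function<bool()>& check, unsigned seconds,
                          const std::atomic<bool>& abort) {
  assert(check);  // an empty std::function cannot be called
  while (!abort.load()) {
    if (check()) return true;
    std::this_thread::sleep_for(std::chrono::seconds(seconds));
  }
  return false;
}
```

A shorter interval gives faster feedback at the cost of more frequent checks; the 60 second default mentioned above is the value the documentation itself suggests as a reasonable starting point.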

Member Data Documentation

◆ abortRetrieve

bool EL::Driver::abortRetrieve
static, protected, inherited

this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.

drivers can use it to abort long-running operations in doRetrieve before completion

Definition at line 212 of file Driver.h.
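
A flag that is set when SIGINT arrives is a common way to hand control back to the user. The sketch below shows that general pattern with the standard std::signal facility. It is an illustration under assumptions, not the code in Driver.h: g_abortRequested, onSigint, Handler and installSigintFlag are hypothetical names.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical flag playing the role of abortRetrieve in this sketch.
volatile std::sig_atomic_t g_abortRequested = 0;

// Signal handler: it only sets the flag, which keeps it async-signal-safe.
extern "C" void onSigint(int) { g_abortRequested = 1; }

using Handler = void (*)(int);

// Install the handler and return the previous one so that the caller can
// restore it once the long-running operation has finished.
inline Handler installSigintFlag() {
  Handler previous = std::signal(SIGINT, onSigint);
  assert(previous != SIG_ERR);
  return previous;
}
```

A long-running loop would then test g_abortRequested between steps and return early once it is non-zero, restoring the previous handler with std::signal(SIGINT, previous) afterwards.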

◆ m_account

std::string EL::SlurmDriver::m_account

Definition at line 49 of file SlurmDriver.h.

◆ m_b_account

bool EL::SlurmDriver::m_b_account

Definition at line 56 of file SlurmDriver.h.

◆ m_b_job_name

bool EL::SlurmDriver::m_b_job_name

Definition at line 55 of file SlurmDriver.h.

◆ m_b_run_time

bool EL::SlurmDriver::m_b_run_time

Definition at line 57 of file SlurmDriver.h.

◆ m_constraint

std::string EL::SlurmDriver::m_constraint

Definition at line 53 of file SlurmDriver.h.

◆ m_job_name

std::string EL::SlurmDriver::m_job_name

Definition at line 48 of file SlurmDriver.h.

◆ m_memory

std::string EL::SlurmDriver::m_memory

Definition at line 52 of file SlurmDriver.h.

◆ m_options

SH::MetaObject EL::Driver::m_options
private, inherited

members directly corresponding to accessors

Definition at line 233 of file Driver.h.

◆ m_partition

std::string EL::SlurmDriver::m_partition

Definition at line 50 of file SlurmDriver.h.

◆ m_run_time

std::string EL::SlurmDriver::m_run_time

Definition at line 51 of file SlurmDriver.h.

◆ shellInit

std::string EL::BatchDriver::shellInit
inherited

description: these shell commands are run verbatim on each worker node before execution

Definition at line 45 of file BatchDriver.h.
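
To make "run verbatim ... before execution" concrete, here is a minimal sketch of a worker script in which the shellInit text is copied unchanged ahead of the payload command. How BatchDriver really assembles its run script is not shown on this page, so makeWorkerScript and the example commands are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a worker script in which the user's shellInit
// text is copied unchanged ahead of the payload command.
inline std::string makeWorkerScript(const std::string& shellInit,
                                    const std::string& command) {
  assert(!command.empty());
  std::string script = "#!/bin/bash\n";
  if (!shellInit.empty()) {
    script += shellInit;  // copied verbatim, no quoting or escaping
    if (shellInit.back() != '\n') script += '\n';
  }
  script += command;
  script += '\n';
  return script;
}
```

Because the text is inserted without quoting or escaping, anything placed in shellInit must already be valid shell for the worker node's environment.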

