ATLAS Offline Software
EL::ExecDriver Class Reference (final)

a Driver for running batch jobs locally as a new process

#include <ExecDriver.h>


Public Member Functions

void testInvariant () const
 effects: test the invariant of this object; guarantee: no-fail

 ExecDriver ()
 effects: standard default constructor; guarantee: strong; failures: low level errors I

SH::MetaObject * options ()
 the list of options to jobs with this driver

const SH::MetaObject * options () const

std::string submit (const Job &job, const std::string &location) const
 submit the given job with the given output location and wait for it to finish

std::string submitOnly (const Job &job, const std::string &location) const
 submit the given job with the given output location and return immediately
 

Static Public Member Functions

static void resubmit (const std::string &location, const std::string &option)
 resubmit all failed sub-jobs for the job in the given location

static bool retrieve (const std::string &location)
 retrieve all the output for the job in the given location

static bool wait (const std::string &location, unsigned time=60)
 retrieve all the output for the job in the given location and wait until it is finished completely.

static void updateLocation (const std::string &location)
 update the internal location of files, after moving the submission directory

static void mergedOutputSave (Detail::ManagerData &data)
 create and save a sample handler assuming we created all the merged files at the requested locations

static void diskOutputSave (Detail::ManagerData &data)
 make the output sample handler for the given job or stream from the information stored in the histogram files.
 

Public Attributes

std::string shellInit
 description: these shell commands are run verbatim on each worker node before execution
 

Protected Member Functions

virtual ::StatusCode doManagerStep (Detail::ManagerData &data) const override
 
 ClassDef (ExecDriver, 1)
 

Static Protected Attributes

static bool abortRetrieve
 this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.
 

Private Member Functions

std::string defaultReleaseSetup (const Detail::ManagerData &data) const
 the code for setting up the release

void makeScript (Detail::ManagerData &data, std::size_t njobs) const
 effects: create the run script to be used; guarantee: basic, may create a partial script; failures: out of memory II, i/o errors
 

Static Private Member Functions

static bool mergeHists (Detail::ManagerData &data)
 effects: merge the fetched histograms; returns: whether all histograms have been fetched; guarantee: strong; failures: out of memory II, i/o errors
 

Private Attributes

SH::MetaObject m_options
 members directly corresponding to accessors
 

Detailed Description

a Driver for running batch jobs locally as a new process

The main purpose here is to get rid of the memory overhead from configuration. With all the necessary ROOT dictionaries loaded into Python, the configuration process consumes more than 1 GB of memory. The strategy is therefore to run the configuration in one process, and then replace that process with a new one for running the job.

Definition at line 26 of file ExecDriver.h.
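
The process-replacement strategy described above can be illustrated with a minimal sketch. It assumes that "replace the existing process" maps onto the POSIX exec family; the helper names makeArgv and replaceProcess are hypothetical and are not part of ExecDriver, whose actual implementation is not shown on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <unistd.h>  // execvp (POSIX)

// Hypothetical helper: build a null-terminated argv array whose entries
// point into `args`. `args` must outlive the returned vector.
std::vector<char*> makeArgv (std::vector<std::string>& args)
{
  std::vector<char*> argv;
  for (std::string& arg : args)
    argv.push_back (&arg[0]);
  argv.push_back (nullptr);  // execvp requires a terminating null pointer
  return argv;
}

// Hypothetical helper: replace the current process image with the given
// command. On success this never returns, and the memory held by the
// calling (configuration) process is released by the operating system.
// It returns -1 only if execvp itself fails.
int replaceProcess (std::vector<std::string> args)
{
  assert (!args.empty());
  std::vector<char*> argv = makeArgv (args);
  execvp (argv[0], argv.data());
  return -1;  // only reached when the exec call failed
}
```

Because exec keeps the process ID but discards the old address space, everything loaded during configuration is dropped before the job starts, which is the effect the description above is after.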

Constructor & Destructor Documentation

◆ ExecDriver()

EL::ExecDriver::ExecDriver ( )

standard default constructor

Guarantee
strong
Failures
low level errors I

Member Function Documentation

◆ ClassDef()

EL::ExecDriver::ClassDef ( ExecDriver, 1 )
protected

◆ defaultReleaseSetup()

std::string EL::BatchDriver::defaultReleaseSetup ( const Detail::ManagerData & data ) const
private, inherited

the code for setting up the release

Guarantee
strong
Failures
out of memory II
failed to read environment variables

◆ diskOutputSave()

static void EL::Driver::diskOutputSave ( Detail::ManagerData & data )
static, inherited

make the output sample handler for the given job or stream from the information stored in the histogram files.

This is optional, but it is convenient for drivers that use (conventional) writers

Guarantee
basic
Failures
out of memory II
i/o errors

◆ doManagerStep()

virtual ::StatusCode EL::ExecDriver::doManagerStep ( Detail::ManagerData & data ) const
override, protected

◆ makeScript()

void EL::BatchDriver::makeScript ( Detail::ManagerData & data,
std::size_t  njobs 
) const
private, inherited

create the run script to be used

Guarantee
basic, may create a partial script
Failures
out of memory II
i/o errors

◆ mergedOutputSave()

static void EL::Driver::mergedOutputSave ( Detail::ManagerData & data )
static, inherited

create and save a sample handler assuming we created all the merged files at the requested locations

This is optional, but it is convenient for drivers that want to keep their outputs locally.

Guarantee
basic
Failures
out of memory II
i/o errors

◆ mergeHists()

static bool EL::BatchDriver::mergeHists ( Detail::ManagerData & data )
static, private, inherited

merge the fetched histograms

Returns
whether all histograms have been fetched
Guarantee
strong
Failures
out of memory II
i/o errors

◆ options() [1/2]

SH::MetaObject* EL::Driver::options ( )
inherited

the list of options to jobs with this driver

Guarantee
no-fail
Postcondition
result != 0

◆ options() [2/2]

const SH::MetaObject* EL::Driver::options ( ) const
inherited

◆ resubmit()

static void EL::Driver::resubmit ( const std::string &  location,
const std::string &  option 
)
static, inherited

resubmit all failed sub-jobs for the job in the given location

Parameters
option  driver-specific option string selecting which jobs to resubmit (and how)

Guarantee
basic, may partially resubmit
Failures
out of memory III
job resubmission errors
job can't be read
job was made with different driver

◆ retrieve()

static bool EL::Driver::retrieve ( const std::string &  location)
static, inherited

retrieve all the output for the job in the given location

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Returns
whether the job completed successfully
Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver

◆ submit()

std::string EL::Driver::submit ( const Job & job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and wait for it to finish

This is mostly for small jobs and backward compatibility. For longer jobs use submitOnly instead.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors

◆ submitOnly()

std::string EL::Driver::submitOnly ( const Job & job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and return immediately

This method allows you to submit jobs to your local batch system, log out and at a later point log back in again.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors
Warning
not all drivers support this. Some will do all their work in the submit function.
Warning
you normally need to call wait() or retrieve() before you can use the output.

◆ testInvariant()

void EL::ExecDriver::testInvariant ( ) const

test the invariant of this object

Guarantee
no-fail

◆ updateLocation()

static void EL::Driver::updateLocation ( const std::string &  location)
static, inherited

update the internal location of files, after moving the submission directory

Guarantee
basic, may update partially
Failures
out of memory II
Warning
only move the submission directory after all your jobs are finished, or the results will be unpredictable

◆ wait()

static bool EL::Driver::wait ( const std::string &  location,
unsigned  time = 60 
)
static, inherited

retrieve all the output for the job in the given location and wait until it is finished completely.

The output is polled every time seconds, where time is the second argument (60 by default).

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Typically sleeping for 60 seconds is an appropriate interval, but if it doesn't work for you, you can change it here.

Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver
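
The polling behaviour described for wait() can be sketched as a simple loop. This is an illustration only, not the Driver implementation: pollUntilDone, its retrieveOnce callback (standing in for a retrieve()-like check) and the maxPolls safety cap are all hypothetical names introduced here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch: call `retrieveOnce` until it reports completion,
// sleeping `seconds` between attempts (60 mirrors the default of wait()).
// `maxPolls` is an assumption added for this example so the loop can be
// bounded; 0 means "poll without limit".
bool pollUntilDone (const std::function<bool()>& retrieveOnce,
                    unsigned seconds = 60,
                    unsigned maxPolls = 0)
{
  assert (retrieveOnce != nullptr);
  for (unsigned polls = 1; ; ++polls)
  {
    if (retrieveOnce())
      return true;                      // job finished completely
    if (maxPolls != 0 && polls >= maxPolls)
      return false;                     // gave up before completion
    std::this_thread::sleep_for (std::chrono::seconds (seconds));
  }
}
```

A shorter interval makes the caller notice completion sooner at the cost of more frequent checks, which is the trade-off the time argument exposes.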

Member Data Documentation

◆ abortRetrieve

bool EL::Driver::abortRetrieve
static, protected, inherited

this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.

drivers can use it to abort long running operations in doRetrieve before completion

Definition at line 212 of file Driver.h.
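
The abort-flag pattern described here can be sketched in isolation. The example below is an assumption-laden illustration rather than the Driver code: it uses a volatile std::sig_atomic_t stand-in named abortRequested instead of the bool member documented above, and onSigint and runInterruptible are hypothetical names.

```cpp
#include <cassert>
#include <csignal>
#include <functional>

// Stand-in for the abortRetrieve flag. sig_atomic_t is used here because
// it is the type the C++ standard allows a signal handler to write to.
volatile std::sig_atomic_t abortRequested = 0;

// Signal handler: only records that an interrupt was requested.
extern "C" void onSigint (int) { abortRequested = 1; }

// Hypothetical long-running operation: performs up to `steps` units of
// work and checks the flag before each one, so that control returns to
// the caller soon after a SIGINT. Returns the number of steps completed.
int runInterruptible (int steps, const std::function<void(int)>& step)
{
  assert (step != nullptr);
  abortRequested = 0;
  auto previous = std::signal (SIGINT, onSigint);
  int completed = 0;
  for (int i = 0; i < steps && !abortRequested; ++i)
  {
    step (i);
    ++completed;
  }
  std::signal (SIGINT, previous);  // restore the earlier handler
  return completed;
}
```

Keeping the handler down to a single flag write and doing the actual abort in the main loop avoids running non-trivial code in signal context.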

◆ m_options

SH::MetaObject EL::Driver::m_options
private, inherited

members directly corresponding to accessors

Definition at line 233 of file Driver.h.

◆ shellInit

std::string EL::BatchDriver::shellInit
inherited

description: these shell commands are run verbatim on each worker node before execution

Definition at line 45 of file BatchDriver.h.
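
To illustrate what "run verbatim before execution" could look like, here is a minimal sketch of a script builder. The function makeRunScript, the bash shebang and the script layout are assumptions made for this example; the script actually produced by makeScript() is not shown on this page.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: build a run script in which the contents of
// `shellInit` are copied unchanged ahead of the job command.
std::string makeRunScript (const std::string& shellInit,
                           const std::string& command)
{
  assert (!command.empty());
  std::ostringstream script;
  script << "#!/bin/bash\n";
  if (!shellInit.empty())
    script << shellInit << "\n";  // inserted verbatim, no quoting or escaping
  script << command << "\n";
  return script.str();
}
```

Since the commands are inserted without any escaping, whatever is placed in shellInit has to be valid shell code for the worker node as written.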

