ATLAS Offline Software
a Driver for running batch jobs locally as a new process
#include <ExecDriver.h>
Public Member Functions
void | testInvariant () const
effects: test the invariant of this object; guarantee: no-fail
ExecDriver ()
effects: standard default constructor; guarantee: strong; failures: low level errors I
SH::MetaObject * | options ()
the list of options to jobs with this driver
const SH::MetaObject * | options () const
std::string | submit (const Job &job, const std::string &location) const
submit the given job with the given output location and wait for it to finish
std::string | submitOnly (const Job &job, const std::string &location) const
submit the given job with the given output location and return immediately
Static Public Member Functions
static void | resubmit (const std::string &location, const std::string &option)
resubmit all failed sub-jobs for the job in the given location
static bool | retrieve (const std::string &location)
retrieve all the output for the job in the given location
static bool | wait (const std::string &location, unsigned time=60)
retrieve all the output for the job in the given location and wait until it is finished completely.
static void | updateLocation (const std::string &location)
update the internal location of files, after moving the submission directory
static void | mergedOutputSave (Detail::ManagerData &data)
create and save a sample handler assuming we created all the merged files at the requested locations
static void | diskOutputSave (Detail::ManagerData &data)
make the output sample handler for the given job or stream from the information stored in the histogram files.
Public Attributes
std::string | shellInit
description: these shell commands are run verbatim on each worker node before execution
Protected Member Functions
virtual ::StatusCode | doManagerStep (Detail::ManagerData &data) const override
ClassDef (ExecDriver, 1)
Static Protected Attributes
static bool | abortRetrieve
this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.
Private Member Functions
std::string | defaultReleaseSetup (const Detail::ManagerData &data) const
the code for setting up the release
void | makeScript (Detail::ManagerData &data, std::size_t njobs) const
effects: create the run script to be used; guarantee: basic, may create a partial script; failures: out of memory II; failures: i/o errors
Static Private Member Functions
static bool | mergeHists (Detail::ManagerData &data)
effects: merge the fetched histograms; returns: whether all histograms have been fetched; guarantee: strong; failures: out of memory II; failures: i/o errors
Private Attributes
SH::MetaObject | m_options
members directly corresponding to accessors
a Driver for running batch jobs locally as a new process
The main purpose is to eliminate the memory overhead of configuration. With all the necessary ROOT dictionaries loaded into Python, the configuration process consumes more than 1 GB of memory. The strategy is therefore to run the configuration in one process and then replace that process with a new one that runs the job.
Definition at line 26 of file ExecDriver.h.
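The "replace the existing process" strategy described above can be illustrated with the POSIX exec family. The following is only a minimal sketch, not the code in ExecDriver.h: the helper names makeArgv and replaceProcess are hypothetical, and it assumes a POSIX system where execvp is available.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <unistd.h>  // POSIX execvp
#include <vector>

// Hypothetical helper: build a null-terminated argv array pointing into `args`.
// The pointers stay valid only while `args` is alive and unmodified.
std::vector<char*> makeArgv (std::vector<std::string>& args)
{
  std::vector<char*> argv;
  argv.reserve (args.size() + 1);
  for (std::string& arg : args)
    argv.push_back (&arg[0]);
  argv.push_back (nullptr);  // execvp requires a terminating null pointer
  return argv;
}

// Hypothetical helper: replace the current process image with the given
// command. On success this never returns, so all memory held by the
// configuration step is released together with the old process image.
[[noreturn]] void replaceProcess (std::vector<std::string> args)
{
  if (args.empty())
    throw std::invalid_argument ("no command given");
  std::vector<char*> argv = makeArgv (args);
  ::execvp (argv[0], argv.data());
  // execvp only returns on failure
  throw std::runtime_error (std::string ("execvp failed: ") + std::strerror (errno));
}
```

Because the process ID is kept and only the image is replaced, the caller that started the configuration process simply sees the job run in its place.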
EL::ExecDriver::ExecDriver ()
effects: standard default constructor; guarantee: strong; failures: low level errors I
ClassDef (ExecDriver, 1) [protected]
std::string defaultReleaseSetup (const Detail::ManagerData &data) const [private, inherited]
the code for setting up the release
static void diskOutputSave (Detail::ManagerData &data) [static, inherited]
make the output sample handler for the given job or stream from the information stored in the histogram files.
This is optional, but it is convenient for drivers that use (conventional) writers.
virtual ::StatusCode doManagerStep (Detail::ManagerData &data) const override [override, protected]
void makeScript (Detail::ManagerData &data, std::size_t njobs) const [private, inherited]
effects: create the run script to be used; guarantee: basic, may create a partial script; failures: out of memory II; failures: i/o errors
static void mergedOutputSave (Detail::ManagerData &data) [static, inherited]
create and save a sample handler assuming we created all the merged files at the requested locations
This is optional, but it is convenient for drivers that want to keep their outputs locally.
static bool mergeHists (Detail::ManagerData &data) [static, private, inherited]
effects: merge the fetched histograms; returns: whether all histograms have been fetched; guarantee: strong; failures: out of memory II; failures: i/o errors
SH::MetaObject * options () [inherited]
the list of options to jobs with this driver
const SH::MetaObject * options () const [inherited]
static void resubmit (const std::string &location, const std::string &option) [static, inherited]
resubmit all failed sub-jobs for the job in the given location
Parameters
option | driver-specific option string selecting which jobs to resubmit (and how)
static bool retrieve (const std::string &location) [static, inherited]
retrieve all the output for the job in the given location
While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.
std::string submit (const Job &job, const std::string &location) const
submit the given job with the given output location and wait for it to finish
This is mostly for small jobs and backward compatibility. For longer jobs, use submitOnly instead.
std::string submitOnly (const Job &job, const std::string &location) const [inherited]
submit the given job with the given output location and return immediately
This method allows you to submit jobs to your local batch system, log out, and log back in at a later point.
void EL::ExecDriver::testInvariant () const
effects: test the invariant of this object; guarantee: no-fail
static void updateLocation (const std::string &location) [static, inherited]
update the internal location of files, after moving the submission directory
static bool wait (const std::string &location, unsigned time=60) [static, inherited]
retrieve all the output for the job in the given location and wait until it is finished completely.
The output is polled every 'time' seconds.
While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.
Sleeping for 60 seconds (the default) is typically an appropriate interval; pass a different value if it does not suit you.
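The polling behaviour of wait(), together with the SIGINT handling described for abortRetrieve, can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: pollUntilDone, retrieveOnce, abortRequested and onSigint are hypothetical names, the callback stands in for a retrieve() call on a fixed location, and the interval is given in milliseconds rather than seconds to keep the sketch quick to try out.

```cpp
#include <algorithm>
#include <chrono>
#include <csignal>
#include <functional>
#include <thread>

// Hypothetical stand-in for the abortRetrieve flag: set by the signal handler.
// sig_atomic_t is the type that is safe to write from a signal handler.
volatile std::sig_atomic_t abortRequested = 0;

extern "C" void onSigint (int) { abortRequested = 1; }

// Call `retrieveOnce` repeatedly until it reports completion or SIGINT arrives.
// Returns true if the job finished, false if the wait was interrupted.
bool pollUntilDone (const std::function<bool()>& retrieveOnce,
                    std::chrono::milliseconds interval)
{
  abortRequested = 0;
  auto previous = std::signal (SIGINT, onSigint);
  bool done = false;
  while (!abortRequested) {
    done = retrieveOnce();
    if (done)
      break;
    // Sleep in short slices so that an interrupt is noticed promptly.
    const std::chrono::milliseconds slice (50);
    auto remaining = interval;
    while (remaining.count() > 0 && !abortRequested) {
      auto step = std::min (remaining, slice);
      std::this_thread::sleep_for (step);
      remaining -= step;
    }
  }
  // Restore the handler that was installed before the wait started.
  if (previous != SIG_ERR)
    std::signal (SIGINT, previous);
  return done;
}
```

Checking the flag between short sleep slices, rather than sleeping for the full interval in one call, is what lets control return to the user soon after the interrupt.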
static bool abortRetrieve [static, protected, inherited]
SH::MetaObject m_options [private, inherited]
std::string shellInit [inherited]
description: these shell commands are run verbatim on each worker node before execution
Definition at line 45 of file BatchDriver.h.
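To make "run verbatim before execution" concrete, the sketch below shows one way a run script could place the shellInit text ahead of the job command. It is purely illustrative: composeRunScript and the example command are hypothetical, and the script actually written by makeScript is not documented on this page.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: compose a shell script in which the user-supplied
// shellInit text is copied unchanged ahead of the command that runs the job.
std::string composeRunScript (const std::string& shellInit,
                              const std::string& command)
{
  std::ostringstream script;
  script << "#!/bin/bash\n";
  if (!shellInit.empty())
    script << shellInit << "\n";  // copied verbatim, no quoting or escaping
  script << command << "\n";
  return script.str();
}
```

Since the text is copied without any escaping, whatever is stored in shellInit must already be valid shell syntax.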