ATLAS Offline Software
a Driver to submit jobs via prun
#include <PrunDriver.h>
static void status (const std::string &location)
static void setState (const std::string &location, const std::string &task, const std::string &state)
static void resubmit (const std::string &location, const std::string &option)
    resubmit all failed sub-jobs for the job in the given location
static bool retrieve (const std::string &location)
    retrieve all the output for the job in the given location
static bool wait (const std::string &location, unsigned time=60)
    retrieve all the output for the job in the given location and wait until it is finished completely.
static void updateLocation (const std::string &location)
    update the internal location of files, after moving the submission directory
static void mergedOutputSave (Detail::ManagerData &data)
    create and save a sample handler assuming we created all the merged files at the requested locations
static void diskOutputSave (Detail::ManagerData &data)
    make the output sample handler for the given job or stream from the information stored in the histogram files.
static bool abortRetrieve
    this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.
Definition at line 23 of file PrunDriver.h.
◆ PrunDriver()
EL::PrunDriver::PrunDriver ()
◆ ClassDef()
◆ diskOutputSave()
make the output sample handler for the given job or stream from the information stored in the histogram files.
This is optional, but it is convenient for drivers that use (conventional) writers
- Guarantee: basic
- Failures: out of memory II; i/o errors
◆ doManagerStep()
Definition at line 473 of file PrunDriver.cxx.
using namespace msgEventLoop;
...
const std::string jobELGDir = data.submitDir + "/elg";
const std::string runShFile = jobELGDir + "/runjob.sh";
...
const std::string mergeShFile = jobELGDir + "/elg_merge";
...
const std::string jobDefFile = jobELGDir + "/jobdef.root";
gSystem->Exec(Form("mkdir -p %s", jobELGDir.c_str()));
gSystem->Exec(Form("cp %s %s", runShOrig.c_str(), runShFile.c_str()));
gSystem->Exec(Form("chmod +x %s", runShFile.c_str()));
gSystem->Exec(Form("cp %s %s", mergeShOrig.c_str(), mergeShFile.c_str()));
gSystem->Exec(Form("chmod +x %s", mergeShFile.c_str()));
...
if (listToShipToGrid.size()){
...
"Creating symbolic links for additional files or directories to be sent to grid.\n"
"For root or heavy files you should also add their name (not the full path) to EL::Job::optUserFiles.\n"
"Otherwise prun ignores those files."
...
std::vector<std::string> vect_filesOrDirToShip;
...
boost::split(vect_filesOrDirToShip,listToShipToGrid,boost::is_any_of(","));
...
for (const std::string & fileOrDirToShip: vect_filesOrDirToShip){
ANA_MSG_INFO (("Creating symbolic link for: " +fileOrDirToShip).c_str());
...
std::string outputSampleName = meta.castString("nc_outputSampleName");
if (outputSampleName.empty()) {
outputSampleName = "user.%nickname%.%in:name%";
...
meta.setString("nc_outDS", formatOutputName(meta, outputSampleName));
...
meta.setString("nc_writeInputToTxt", "IN:input.txt");
...
const std::string execstr = "runjob.sh " + (*s)->name();
...
saveJobDef(jobDefFile, *data.job, sh);
...
shOut.save(data.submitDir + "/output-" + out->label());
...
shHist.save(data.submitDir + "/output-hist");
...
sh.save(data.submitDir + "/input");
data.submitted = true;
...
return ::StatusCode::SUCCESS;
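The excerpt above splits the comma-separated string listToShipToGrid with boost::split and then handles one entry at a time. As a rough illustration of that splitting step using only the standard library, a minimal sketch could look like the following. The helper name splitCommaList is hypothetical and not part of PrunDriver. Skipping empty entries is an assumption of this sketch: boost::split with its default settings keeps them.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of PrunDriver): split a comma-separated
// list into its entries. Empty entries are skipped, which is an
// assumption of this sketch rather than the behaviour of boost::split.
std::vector<std::string> splitCommaList(const std::string& list) {
    std::vector<std::string> entries;
    std::istringstream stream(list);
    std::string entry;
    // std::getline with ',' as delimiter yields one token per call.
    while (std::getline(stream, entry, ',')) {
        if (!entry.empty()) {
            entries.push_back(entry);
        }
    }
    return entries;
}
```

Each returned entry would then correspond to one file or directory for which the driver creates a symbolic link.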
◆ doRetrieve()
Definition at line 570 of file PrunDriver.cxx.
TmpCd tmpDir(data.submitDir);
...
processAllInState(sh, JobState::DOWNLOAD, nDlThreads);
...
std::cout << std::endl;
...
JobState::Enum state = sampleState(*s);
...
std::cout << (*s)->name() << "\t";
...
case JobState::DOWNLOAD:
...
std::cout << "\033[1;32m" << JobState::name[state] << "\033[0m\t";
...
case JobState::FAILED:
std::cout << "\033[1;31m" << JobState::name[state] << "\033[0m\t";
...
std::cout << details << std::endl;
...
std::cout << std::endl;
...
data.retrieved = true;
data.completed = allDone;
return ::StatusCode::SUCCESS;
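The excerpt above wraps the state name in ANSI escape sequences: "\033[1;32m" (bold green) appears after the DOWNLOAD case, "\033[1;31m" (bold red) after the FAILED case, and "\033[0m" resets the style. The following is a minimal, self-contained sketch of that colouring idea. The State enum and the colorize function are hypothetical stand-ins, not the driver's JobState, and the sketch only models the two cases visible in the excerpt.

```cpp
#include <string>

// Hypothetical stand-in for the driver's JobState enum; only the two
// states visible in the excerpt are modelled, plus a catch-all.
enum class State { Download, Failed, Other };

// Wrap a state label in ANSI escape codes: bold green for Download,
// bold red for Failed, unchanged otherwise. "\033[0m" resets the style.
std::string colorize(State state, const std::string& label) {
    switch (state) {
        case State::Download: return "\033[1;32m" + label + "\033[0m";
        case State::Failed:   return "\033[1;31m" + label + "\033[0m";
        default:              return label;
    }
}
```

Returning a string instead of printing directly keeps the colouring testable and lets the caller decide whether the terminal supports colour.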
◆ mergedOutputSave()
create and save a sample handler assuming we created all the merged files at the requested locations
This is optional, but it is convenient for drivers that want to keep their outputs locally.
- Guarantee: basic
- Failures: out of memory II; i/o errors
◆ options() [1/2]
the list of options to jobs with this driver
- Guarantee: no-fail
- Postcondition: result != 0
◆ options() [2/2]
◆ resubmit()
static void EL::Driver::resubmit (const std::string &location, const std::string &option)
static, inherited
resubmit all failed sub-jobs for the job in the given location
- Parameters: option: driver-specific option string selecting which jobs to resubmit (and how)
- Guarantee: basic, may partially resubmit
- Failures: out of memory III; job resubmission errors; job can't be read; job was made with different driver
◆ retrieve()
static bool EL::Driver::retrieve (const std::string &location)
static, inherited
retrieve all the output for the job in the given location
While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.
- Returns: whether the job completed successfully
- Guarantee: basic, may partially retrieve
- Failures: out of memory III; job failures; job can't be read; job was made with different driver
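Since the description of retrieve() says it can typically be retried after a failure, a caller might wrap it in a small retry loop. The sketch below is only an illustration: retryUntilSuccess is a hypothetical helper, not part of EL::Driver, and it assumes that failures are reported as exceptions derived from std::exception. In real use the callable could wrap a call such as EL::Driver::retrieve(location).

```cpp
#include <exception>
#include <functional>

// Hypothetical helper (not part of EL::Driver): call `attempt` up to
// `maxAttempts` times and stop at the first call that returns true.
// Assumption: failures surface as exceptions derived from std::exception;
// such an exception is counted as a failed attempt and the loop goes on.
bool retryUntilSuccess(const std::function<bool()>& attempt,
                       unsigned maxAttempts) {
    for (unsigned i = 0; i < maxAttempts; ++i) {
        try {
            if (attempt()) {
                return true;
            }
        } catch (const std::exception&) {
            // Failed attempt: fall through and try again.
        }
    }
    return false;
}
```

A bounded number of attempts is used here so that a permanently broken job cannot keep the caller looping forever.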
◆ setState()
void EL::PrunDriver::setState (const std::string &location, const std::string &task, const std::string &state)
static
Definition at line 643 of file PrunDriver.cxx.
TmpCd tmpDir(location);
...
if (not sh.get(task)) {
std::cout << "Unknown task: " << task << std::endl;
std::cout << "Choose one of: " << std::endl;
...
sh.get(task)->meta()->setString("nc_ELG_state", state);
◆ status()
void EL::PrunDriver::status (const std::string &location)
static
Definition at line 625 of file PrunDriver.cxx.
TmpCd tmpDir(location);
...
JobState::Enum state = sampleState(*s);
...
<< "\t" << details << std::endl;
◆ submit()
std::string EL::Driver::submit (const Job &job, const std::string &location) const
inherited
submit the given job with the given output location and wait for it to finish
This is mostly for small jobs and backward compatibility. For longer jobs use submitOnly instead.
- Returns: the actual location of the submit directory, if the job was configured to generate a unique directory.
- Guarantee: basic, may partially submit
- Failures: out of memory II; can't create directory at location; submission errors
◆ submitOnly()
std::string EL::Driver::submitOnly (const Job &job, const std::string &location) const
inherited
submit the given job with the given output location and return immediately
This method allows you to submit jobs to your local batch system, log out and at a later point log back in again.
- Returns: the actual location of the submit directory, if the job was configured to generate a unique directory.
- Guarantee: basic, may partially submit
- Failures: out of memory II; can't create directory at location; submission errors
- Warning: not all drivers support this. Some will do all their work in the submit function.
- Warning: you normally need to call wait() or retrieve() before you can use the output.
◆ testInvariant()
void EL::PrunDriver::testInvariant () const
◆ updateLocation()
static void EL::Driver::updateLocation (const std::string &location)
static, inherited
update the internal location of files, after moving the submission directory
- Guarantee: basic, may update partially
- Failures: out of memory II
- Warning: only move the submission directory after all your jobs are finished, or the results will be unpredictable
◆ wait()
static bool EL::Driver::wait (const std::string &location, unsigned time = 60)
static, inherited
retrieve all the output for the job in the given location and wait until it is finished completely.
The job is polled every time seconds, where time is the second argument (default 60).
While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.
Typically sleeping for 60 seconds is an appropriate interval, but if it doesn't work for you, you can change it here.
- Guarantee: basic, may partially retrieve
- Failures: out of memory III; job failures; job can't be read; job was made with different driver
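The polling behaviour described for wait() can be pictured as a loop that checks for completion, sleeps for the chosen interval, and checks again. The sketch below only illustrates that pattern and is not the driver's implementation. pollUntilDone is a hypothetical name, and the interval is given in milliseconds here (rather than seconds) purely so the example runs quickly.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch of a polling wait (not the EL::Driver code):
// call `poll` until it reports completion, sleeping `interval`
// between calls. Returns how many times `poll` was called.
unsigned pollUntilDone(const std::function<bool()>& poll,
                       std::chrono::milliseconds interval) {
    unsigned polls = 1;
    while (!poll()) {
        std::this_thread::sleep_for(interval);
        ++polls;
    }
    return polls;
}
```

With this shape, a longer interval means fewer status queries at the cost of noticing completion later, which is the trade-off the time argument exposes.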
◆ abortRetrieve
bool EL::Driver::abortRetrieve
static, protected, inherited
this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.
drivers can use it to abort long running operations in doRetrieve before completion
Definition at line 212 of file Driver.h.
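The abort-flag pattern described above (a flag set when SIGINT is caught, checked by long-running code) can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not the Driver implementation: g_abortRequested, onSigint and runUntilAborted are hypothetical names, and the flag is declared as volatile std::sig_atomic_t because that is the type the C++ standard guarantees can be written safely from a signal handler, whereas the documented member is a plain bool.

```cpp
#include <csignal>

// Hypothetical stand-in for the driver's abort flag.
volatile std::sig_atomic_t g_abortRequested = 0;

// Signal handler: only records that an abort was requested.
extern "C" void onSigint(int) { g_abortRequested = 1; }

// Run up to `steps` units of work, checking the flag before each one
// and stopping early once it is set. Returns the number of steps that
// were actually executed.
template <typename Step>
unsigned runUntilAborted(unsigned steps, Step step) {
    unsigned done = 0;
    for (unsigned i = 0; i < steps; ++i) {
        if (g_abortRequested) {
            break;
        }
        step(i);
        ++done;
    }
    return done;
}
```

A caller would install the handler with std::signal(SIGINT, onSigint) before starting the loop. Checking the flag between units of work lets the current unit finish cleanly instead of being interrupted midway.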
◆ m_options
members directly corresponding to accessors
Definition at line 233 of file Driver.h.
The documentation for this class was generated from the following files: