ATLAS Offline Software
EL::PrunDriver Class Reference (final)

a Driver to submit jobs via prun

#include <PrunDriver.h>

Public Member Functions

 PrunDriver ()
 
void testInvariant () const
 
SH::MetaObject * options ()
 the list of options to jobs with this driver
 
const SH::MetaObject * options () const
 
std::string submit (const Job &job, const std::string &location) const
 submit the given job with the given output location and wait for it to finish
 
std::string submitOnly (const Job &job, const std::string &location) const
 submit the given job with the given output location and return immediately
 

Static Public Member Functions

static void status (const std::string &location)
 
static void setState (const std::string &location, const std::string &task, const std::string &state)
 
static void resubmit (const std::string &location, const std::string &option)
 resubmit all failed sub-jobs for the job in the given location
 
static bool retrieve (const std::string &location)
 retrieve all the output for the job in the given location
 
static bool wait (const std::string &location, unsigned time=60)
 retrieve all the output for the job in the given location and wait until it is finished completely.
 
static void updateLocation (const std::string &location)
 update the internal location of files, after moving the submission directory
 
static void mergedOutputSave (Detail::ManagerData &data)
 create and save a sample handler assuming we created all the merged files at the requested locations
 
static void diskOutputSave (Detail::ManagerData &data)
 make the output sample handler for the given job or stream from the information stored in the histogram files.
 

Protected Member Functions

virtual ::StatusCode doManagerStep (Detail::ManagerData &data) const override
 

Static Protected Attributes

static bool abortRetrieve
 this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.
 

Private Member Functions

::StatusCode doRetrieve (Detail::ManagerData &data) const
 
 ClassDef (EL::PrunDriver, 1)
 

Private Attributes

SH::MetaObject m_options
 members directly corresponding to accessors
 

Detailed Description

a Driver to submit jobs via prun

Definition at line 23 of file PrunDriver.h.

Constructor & Destructor Documentation

◆ PrunDriver()

EL::PrunDriver::PrunDriver ( )

Definition at line 468 of file PrunDriver.cxx.

{
  RCU_NEW_INVARIANT(this);
}

Member Function Documentation

◆ ClassDef()

EL::PrunDriver::ClassDef ( EL::PrunDriver , 1 )
private

◆ diskOutputSave()

static void EL::Driver::diskOutputSave ( Detail::ManagerData & data )
static, inherited

make the output sample handler for the given job or stream from the information stored in the histogram files.

This is optional, but it is convenient for drivers that use (conventional) writers.

Guarantee
basic
Failures
out of memory II
i/o errors

◆ doManagerStep()

StatusCode EL::PrunDriver::doManagerStep ( Detail::ManagerData & data ) const
override, protected

Definition at line 473 of file PrunDriver.cxx.

{
  using namespace msgEventLoop;
  ANA_CHECK (Driver::doManagerStep (data));
  switch (data.step)
  {
  case Detail::ManagerStep::submitJob:
    {
      const std::string jobELGDir = data.submitDir + "/elg";
      const std::string runShFile = jobELGDir + "/runjob.sh";
      //const std::string runShOrig = "$ROOTCOREBIN/data/EventLoopGrid/runjob.sh";
      const std::string mergeShFile = jobELGDir + "/elg_merge";
      //const std::string mergeShOrig =
      //  "$ROOTCOREBIN/user_scripts/EventLoopGrid/elg_merge";
      const std::string runShOrig = PathResolverFindCalibFile("EventLoopGrid/runjob.sh");
      const std::string mergeShOrig = PathResolverFindCalibFile("EventLoopGrid/elg_merge");

      const std::string jobDefFile = jobELGDir + "/jobdef.root";
      gSystem->Exec(Form("mkdir -p %s", jobELGDir.c_str()));
      gSystem->Exec(Form("cp %s %s", runShOrig.c_str(), runShFile.c_str()));
      gSystem->Exec(Form("chmod +x %s", runShFile.c_str()));
      gSystem->Exec(Form("cp %s %s", mergeShOrig.c_str(), mergeShFile.c_str()));
      gSystem->Exec(Form("chmod +x %s", mergeShFile.c_str()));

      // create symbolic links for additional files/directories, if any, to ship to the grid
      std::string listToShipToGrid = data.options.castString(EL::Job::optGridPrunShipAdditionalFilesOrDirs, "");
      // parse the list of comma separated files and/or directories to ship to the grid
      if (listToShipToGrid.size()){
        ANA_MSG_INFO (
          "Creating symbolic links for additional files or directories to be sent to grid.\n"
          "For root or heavy files you should also add their name (not the full path) to EL::Job::optUserFiles.\n"
          "Otherwise prun ignores those files."
        );

        std::vector<std::string> vect_filesOrDirToShip;
        // split string based on comma separators
        boost::split(vect_filesOrDirToShip,listToShipToGrid,boost::is_any_of(","));

        // Create symbolic links of files or directories to the submission directory
        for (const std::string & fileOrDirToShip: vect_filesOrDirToShip){
          ANA_MSG_INFO (("Creating symbolic link for: " +fileOrDirToShip).c_str());
          RCU::Shell::exec("ln -sf " + fileOrDirToShip + " " + jobELGDir);
        }
        ANA_MSG_INFO ("Finished creation of symbolic links");
      }

      const SH::SampleHandler& sh = data.job->sampleHandler();

      for (SH::SampleHandler::iterator s = sh.begin(); s != sh.end(); ++s) {
        SH::MetaObject& meta = *(*s)->meta();
        meta.fetchDefaults(data.options);
        meta.fetchDefaults(defaultOpts());
        meta.setString("nc_outputs", outputFileNames(*data.job));
        std::string outputSampleName = meta.castString("nc_outputSampleName");
        if (outputSampleName.empty()) {
          outputSampleName = "user.%nickname%.%in:name%";
        }
        meta.setString("nc_outDS", formatOutputName(meta, outputSampleName));
        meta.setString("nc_inDS", meta.castString("nc_grid", (*s)->name()));
        meta.setString("nc_writeInputToTxt", "IN:input.txt");
        meta.setString("nc_match", meta.castString("nc_grid_filter"));
        const std::string execstr = "runjob.sh " + (*s)->name();
        meta.setString("nc_exec", execstr);
      }

      saveJobDef(jobDefFile, *data.job, sh);

      for (EL::Job::outputIter out = data.job->outputBegin();
           out != data.job->outputEnd(); ++out) {
        SH::SampleHandler shOut = outputSH(sh, out->label());
        shOut.save(data.submitDir + "/output-" + out->label());
      }
      SH::SampleHandler shHist = outputSH(sh, "hist-output");
      shHist.save(data.submitDir + "/output-hist");

      TmpCd keepDir(jobELGDir);

      processAllInState(sh, JobState::INIT, 0);

      sh.save(data.submitDir + "/input");
      data.submitted = true;
    }
    break;

  case Detail::ManagerStep::doRetrieve:
    {
      ANA_CHECK (doRetrieve (data));
    }
    break;

  default:
    (void) true; // safe to do nothing
  }
  return ::StatusCode::SUCCESS;
}
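
The handling of EL::Job::optGridPrunShipAdditionalFilesOrDirs in the submitJob step can be looked at in isolation. The following is a minimal sketch rather than the driver's code: it uses only the standard library in place of boost::split and RCU::Shell::exec, the helper names splitCommaList and buildLinkCommands are hypothetical, and it only builds the command strings instead of executing them. Unlike boost::split, it skips empty entries.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated list into its entries.
// Standard-library stand-in for boost::split; empty entries are skipped.
std::vector<std::string> splitCommaList(const std::string& list) {
  std::vector<std::string> entries;
  std::istringstream stream(list);
  std::string entry;
  while (std::getline(stream, entry, ',')) {
    if (!entry.empty()) entries.push_back(entry);
  }
  return entries;
}

// Build one "ln -sf <entry> <targetDir>" command per list entry,
// following the string concatenation used in the listing above.
std::vector<std::string> buildLinkCommands(const std::string& list,
                                           const std::string& targetDir) {
  assert(!targetDir.empty());  // the link target directory must be known
  std::vector<std::string> commands;
  for (const std::string& entry : splitCommaList(list)) {
    commands.push_back("ln -sf " + entry + " " + targetDir);
  }
  return commands;
}
```

Calling buildLinkCommands("/data/a.txt,/data/dir", "submit/elg") yields two commands, one per entry. Because each entry is pasted into the shell command unquoted, paths containing spaces would need quoting.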

◆ doRetrieve()

StatusCode EL::PrunDriver::doRetrieve ( Detail::ManagerData & data ) const
private

Definition at line 570 of file PrunDriver.cxx.

{
  RCU_READ_INVARIANT(this);
  RCU_REQUIRE(not data.submitDir.empty());

  TmpCd tmpDir(data.submitDir);

  SH::SampleHandler sh;
  sh.load("input");
  RCU_ASSERT(sh.size());

  const size_t nRunThreads = options()->castDouble("nc_run_threads", 0);
  const size_t nDlThreads = options()->castDouble("nc_download_threads", 0);
  processAllInState(sh, JobState::INIT, 0);
  processAllInState(sh, JobState::RUN, nRunThreads);
  processAllInState(sh, JobState::DOWNLOAD, nDlThreads);
  processAllInState(sh, JobState::MERGE, 0);

  sh.save("input");

  std::cout << std::endl;

  bool allDone = true;
  for (SH::SampleHandler::iterator s = sh.begin(); s != sh.end(); ++s) {
    JobState::Enum state = sampleState(*s);
    std::string details = (*s)->meta()->castString("nc_ELG_state_details", "", SH::MetaObject::CAST_NOCAST_DEFAULT);
    if (not details.empty()) { details = '(' + details + ')'; }

    std::cout << (*s)->name() << "\t";
    switch (state) {
    case JobState::INIT:
    case JobState::RUN:
    case JobState::DOWNLOAD:
    case JobState::MERGE:
      std::cout << JobState::name[state] << "\t";
      break;
    case JobState::FINISHED:
      std::cout << "\033[1;32m" << JobState::name[state] << "\033[0m\t";
      break;
    case JobState::FAILED:
      std::cout << "\033[1;31m" << JobState::name[state] << "\033[0m\t";
      break;
    }
    std::cout << details << std::endl;

    allDone &= (state == JobState::FINISHED || state == JobState::FAILED);
  }

  std::cout << std::endl;

  data.retrieved = true;
  data.completed = allDone;
  return ::StatusCode::SUCCESS;
}
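
The listing above reports each sample's state, colouring the two terminal states, and marks the job as completed only once every sample is FINISHED or FAILED. A minimal standalone sketch of that logic follows. The State enum is a hypothetical stand-in for JobState::Enum with the enumerators seen in the listing, and isTerminal, colorize and allDone are illustrative names, not part of the driver.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for JobState::Enum; enumerators as in the listing above.
enum class State { INIT, RUN, DOWNLOAD, MERGE, FINISHED, FAILED };

// A sample needs no further processing once it is FINISHED or FAILED.
bool isTerminal(State state) {
  return state == State::FINISHED || state == State::FAILED;
}

// Wrap the label in the ANSI escape sequences used by the listing:
// bold green for FINISHED, bold red for FAILED, plain text otherwise.
std::string colorize(State state, const std::string& label) {
  assert(!label.empty());
  switch (state) {
    case State::FINISHED: return "\033[1;32m" + label + "\033[0m";
    case State::FAILED:   return "\033[1;31m" + label + "\033[0m";
    default:              return label;
  }
}

// The job counts as completed only when every sample is in a terminal state.
bool allDone(const std::vector<State>& states) {
  bool done = true;
  for (State state : states) done = done && isTerminal(state);
  return done;
}
```

Note that a job with failed samples still counts as completed in this sense: completion means no sample is left to process, not that every sample succeeded.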

◆ mergedOutputSave()

static void EL::Driver::mergedOutputSave ( Detail::ManagerData & data )
static, inherited

create and save a sample handler assuming we created all the merged files at the requested locations

This is optional, but it is convenient for drivers that want to keep their outputs locally.

Guarantee
basic
Failures
out of memory II
i/o errors

◆ options() [1/2]

SH::MetaObject* EL::Driver::options ( )
inherited

the list of options to jobs with this driver

Guarantee
no-fail
Postcondition
result != 0

◆ options() [2/2]

const SH::MetaObject* EL::Driver::options ( ) const
inherited

◆ resubmit()

static void EL::Driver::resubmit ( const std::string &  location,
const std::string &  option 
)
static, inherited

resubmit all failed sub-jobs for the job in the given location

Parameters
option  driver-specific option string selecting which jobs to resubmit (and how)

Guarantee
basic, may partially resubmit
Failures
out of memory III
job resubmission errors
job can't be read
job was made with different driver

◆ retrieve()

static bool EL::Driver::retrieve ( const std::string &  location)
static, inherited

retrieve all the output for the job in the given location

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Returns
whether the job completed successfully
Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver

◆ setState()

void EL::PrunDriver::setState ( const std::string &  location,
const std::string &  task,
const std::string &  state 
)
static

Definition at line 643 of file PrunDriver.cxx.

{
  RCU_REQUIRE(not location.empty());
  RCU_REQUIRE(not task.empty());
  RCU_REQUIRE(not state.empty());
  TmpCd tmpDir(location);
  SH::SampleHandler sh;
  sh.load("input");
  RCU_ASSERT(sh.size());
  if (not sh.get(task)) {
    std::cout << "Unknown task: " << task << std::endl;
    std::cout << "Choose one of: " << std::endl;
    sh.print();
    return;
  }
  JobState::parse(state);
  sh.get(task)->meta()->setString("nc_ELG_state", state);
  sh.save("input");
}
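
In setState the result of JobState::parse(state) is discarded, which suggests the call is there only to reject unknown state names before they are stored. Its implementation is not shown on this page, so the following is a minimal sketch under two assumptions: that state names are spelled like the enumerators seen in the listings, and that an unknown name raises an exception. The State enum, kStateNames and parseState are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for JobState::Enum.
enum class State { INIT, RUN, DOWNLOAD, MERGE, FINISHED, FAILED };

// Assumed spellings, in enumerator order (stand-in for JobState::name).
const std::array<const char*, 6> kStateNames = {
    {"INIT", "RUN", "DOWNLOAD", "MERGE", "FINISHED", "FAILED"}};

// Convert a state name to its enumerator; throw if the name is unknown.
State parseState(const std::string& text) {
  assert(!text.empty());  // mirrors RCU_REQUIRE(not state.empty())
  for (std::size_t i = 0; i < kStateNames.size(); ++i) {
    if (text == kStateNames[i]) return static_cast<State>(i);
  }
  throw std::invalid_argument("unknown state: " + text);
}
```

Validating before the setString call keeps a mistyped state from being written into the saved sample handler, where later processing would no longer recognise it.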

◆ status()

void EL::PrunDriver::status ( const std::string &  location)
static

Definition at line 625 of file PrunDriver.cxx.

{
  RCU_REQUIRE(not location.empty());
  TmpCd tmpDir(location);
  SH::SampleHandler sh;
  sh.load("input");
  RCU_ASSERT(sh.size());
  processAllInState(sh, JobState::RUN, 0);
  sh.save("input");
  for (SH::SampleHandler::iterator s = sh.begin(); s != sh.end(); ++s) {
    JobState::Enum state = sampleState(*s);
    std::string details = (*s)->meta()->castString("nc_ELG_state_details", "", SH::MetaObject::CAST_NOCAST_DEFAULT);
    if (not details.empty()) { details = '(' + details + ')'; }
    std::cout << (*s)->name() << "\t" << JobState::name[state]
              << "\t" << details << std::endl;
  }
}
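
Each line printed by status() consists of three tab-separated fields: the sample name, the state name, and the optional details in parentheses. A minimal sketch of that formatting, using a hypothetical helper formatStatusLine that takes plain strings instead of a sample object:

```cpp
#include <cassert>
#include <string>

// Build one status line: "<sample>\t<state>\t(<details>)".
// The parentheses are added only when details is non-empty.
std::string formatStatusLine(const std::string& sample,
                             const std::string& state,
                             std::string details) {
  assert(!sample.empty());
  if (!details.empty()) details = '(' + details + ')';
  return sample + "\t" + state + "\t" + details;
}
```

The trailing tab is kept even when there are no details, so every line has the same number of fields, which is convenient when the output is parsed by a script.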

◆ submit()

std::string EL::Driver::submit ( const Job &  job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and wait for it to finish

This is mostly for small jobs and backward compatibility. For longer jobs use submitOnly instead.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors

◆ submitOnly()

std::string EL::Driver::submitOnly ( const Job &  job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and return immediately

This method allows you to submit jobs to your local batch system, log out and at a later point log back in again.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors
Warning
not all drivers support this. Some will do all their work in the submit function.
Warning
you normally need to call wait() or retrieve() before you can use the output.

◆ testInvariant()

void EL::PrunDriver::testInvariant ( ) const

Definition at line 463 of file PrunDriver.cxx.

{
  RCU_INVARIANT(this != 0);
}

◆ updateLocation()

static void EL::Driver::updateLocation ( const std::string &  location)
static, inherited

update the internal location of files, after moving the submission directory

Guarantee
basic, may update partially
Failures
out of memory II
Warning
only move the submission directory after all your jobs are finished, or the results will be unpredictable

◆ wait()

static bool EL::Driver::wait ( const std::string &  location,
unsigned  time = 60 
)
static, inherited

retrieve all the output for the job in the given location and wait until it is finished completely.

The output is polled once every time seconds, where time is the second argument.

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Typically sleeping for 60 seconds is an appropriate interval, but if it doesn't work for you, you can change it here.

Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver
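
The behaviour described for wait() amounts to a polling loop: retrieve, check for completion, sleep, and stop early when an abort is requested. The implementation of Driver::wait is not shown on this page, so the following is only a sketch of that pattern. The names waitUntilDone and abortRequested are hypothetical, with abortRequested playing the role that abortRetrieve is described as having, and the return value (true on completion, false on abort) is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the abortRetrieve flag; a SIGINT handler
// would set it to true to hand control back to the user.
std::atomic<bool> abortRequested{false};

// Call retrieveOnce until it reports completion or an abort is requested.
// Returns true if the job completed, false if polling was aborted.
bool waitUntilDone(const std::function<bool()>& retrieveOnce,
                   unsigned seconds = 60) {
  assert(retrieveOnce);  // a retrieval callback must be provided
  while (!abortRequested.load()) {
    if (retrieveOnce()) return true;
    std::this_thread::sleep_for(std::chrono::seconds(seconds));
  }
  return false;
}
```

Checking the flag once per iteration means an abort takes effect only after the current retrieval and sleep have ended, which is why long-running retrieval code may want to consult the flag itself.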

Member Data Documentation

◆ abortRetrieve

bool EL::Driver::abortRetrieve
static, protected, inherited

this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.

drivers can use it to abort long running operations in doRetrieve before completion

Definition at line 212 of file Driver.h.

◆ m_options

SH::MetaObject EL::Driver::m_options
private, inherited

members directly corresponding to accessors

Definition at line 233 of file Driver.h.


The documentation for this class was generated from the following files:
PrunDriver.h
PrunDriver.cxx