ATLAS Offline Software
EL::PrunDriver Class Reference (final)

a Driver to submit jobs via prun

#include <PrunDriver.h>


Public Member Functions

 PrunDriver ()
 
void testInvariant () const
 
SH::MetaObject * options ()
 the list of options to jobs with this driver
 
const SH::MetaObject * options () const
 
std::string submit (const Job &job, const std::string &location) const
 submit the given job with the given output location and wait for it to finish
 
std::string submitOnly (const Job &job, const std::string &location) const
 submit the given job with the given output location and return immediately
 

Static Public Member Functions

static void status (const std::string &location)
 
static void setState (const std::string &location, const std::string &task, const std::string &state)
 
static void resubmit (const std::string &location, const std::string &option)
 resubmit all failed sub-jobs for the job in the given location
 
static bool retrieve (const std::string &location)
 retrieve all the output for the job in the given location
 
static bool wait (const std::string &location, unsigned time=60)
 retrieve all the output for the job in the given location and wait until it is finished completely.
 
static void updateLocation (const std::string &location)
 update the internal location of files, after moving the submission directory
 
static void mergedOutputSave (Detail::ManagerData &data)
 create and save a sample handler assuming we created all the merged files at the requested locations
 
static void diskOutputSave (Detail::ManagerData &data)
 make the output sample handler for the given job or stream from the information stored in the histogram files.
 

Protected Member Functions

virtual ::StatusCode doManagerStep (Detail::ManagerData &data) const override
 

Static Protected Attributes

static bool abortRetrieve
 this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.
 

Private Member Functions

::StatusCode doRetrieve (Detail::ManagerData &data) const
 
 ClassDef (EL::PrunDriver, 1)
 

Private Attributes

SH::MetaObject m_options
 members directly corresponding to accessors
 

Detailed Description

a Driver to submit jobs via prun

Definition at line 23 of file PrunDriver.h.

Constructor & Destructor Documentation

◆ PrunDriver()

EL::PrunDriver::PrunDriver ( )

Definition at line 491 of file PrunDriver.cxx.

{
  RCU_NEW_INVARIANT(this);
}

Member Function Documentation

◆ ClassDef()

EL::PrunDriver::ClassDef ( EL::PrunDriver, 1 )
private

◆ diskOutputSave()

static void EL::Driver::diskOutputSave ( Detail::ManagerData & data )
static, inherited

make the output sample handler for the given job or stream from the information stored in the histogram files.

This is optional, but it is convenient for drivers that use (conventional) writers.

Guarantee
basic
Failures
out of memory II
i/o errors

◆ doManagerStep()

StatusCode EL::PrunDriver::doManagerStep ( Detail::ManagerData & data ) const
override, protected

Definition at line 496 of file PrunDriver.cxx.

{
  using namespace msgEventLoop;
  ANA_CHECK (Driver::doManagerStep (data));
  switch (data.step)
  {
  case Detail::ManagerStep::submitJob:
    {
      const std::string jobELGDir = data.submitDir + "/elg";
      const std::string runShFile = jobELGDir + "/runjob.sh";
      //const std::string runShOrig = "$ROOTCOREBIN/data/EventLoopGrid/runjob.sh";
      const std::string mergeShFile = jobELGDir + "/elg_merge";
      //const std::string mergeShOrig =
      //  "$ROOTCOREBIN/user_scripts/EventLoopGrid/elg_merge";
      const std::string runShOrig = PathResolverFindCalibFile("EventLoopGrid/runjob.sh");
      const std::string mergeShOrig = PathResolverFindCalibFile("EventLoopGrid/elg_merge");

      const std::string jobDefFile = jobELGDir + "/jobdef.root";
      gSystem->Exec(Form("mkdir -p %s", jobELGDir.c_str()));
      gSystem->Exec(Form("cp %s %s", runShOrig.c_str(), runShFile.c_str()));
      gSystem->Exec(Form("chmod +x %s", runShFile.c_str()));
      gSystem->Exec(Form("cp %s %s", mergeShOrig.c_str(), mergeShFile.c_str()));
      gSystem->Exec(Form("chmod +x %s", mergeShFile.c_str()));

      // create symbolic links for additional files/directories if any to ship to the grid
      std::string listToShipToGrid = data.options.castString(EL::Job::optGridPrunShipAdditionalFilesOrDirs, "");
      // parse the list of comma separated files and/or directories to ship to the grid
      if (listToShipToGrid.size()){
        ANA_MSG_INFO (
          "Creating symbolic links for additional files or directories to be sent to grid.\n"
          "For root or heavy files you should also add their name (not the full path) to EL::Job::optUserFiles.\n"
          "Otherwise prun ignores those files."
        );

        std::vector<std::string> vect_filesOrDirToShip;
        // split string based on comma separators
        boost::split(vect_filesOrDirToShip,listToShipToGrid,boost::is_any_of(","));

        // Create symbolic links of files or directories to the submission directory
        for (const std::string & fileOrDirToShip: vect_filesOrDirToShip){
          ANA_MSG_INFO (("Creating symbolic link for: " +fileOrDirToShip).c_str());
          RCU::Shell::exec("ln -sf " + fileOrDirToShip + " " + jobELGDir);
        }
        ANA_MSG_INFO ("Finished creation of symbolic links");
      }

      const SH::SampleHandler& sh = data.job->sampleHandler();

      for (SH::SampleHandler::iterator s = sh.begin(); s != sh.end(); ++s) {
        SH::MetaObject& meta = *(*s)->meta();
        meta.fetchDefaults(data.options);
        meta.fetchDefaults(defaultOpts());
        meta.setString("nc_outputs", outputFileNames(*data.job));
        std::string outputSampleName = meta.castString("nc_outputSampleName");
        if (outputSampleName.empty()) {
          outputSampleName = "user.%nickname%.%in:name%";
        }
        meta.setString("nc_outDS", formatOutputName(meta, outputSampleName));
        meta.setString("nc_inDS", meta.castString("nc_grid", (*s)->name()));
        meta.setString("nc_writeInputToTxt", "IN:input.txt");
        meta.setString("nc_match", meta.castString("nc_grid_filter"));
        const std::string execstr = "runjob.sh " + (*s)->name();
        meta.setString("nc_exec", execstr);
        meta.setString("nc_framework", "EventLoopGrid");
      }

      saveJobDef(jobDefFile, *data.job, sh);

      for (EL::Job::outputIter out = data.job->outputBegin();
           out != data.job->outputEnd(); ++out) {
        SH::SampleHandler shOut = outputSH(sh, out->label());
        shOut.save(data.submitDir + "/output-" + out->label());
      }
      SH::SampleHandler shHist = outputSH(sh, "hist-output");
      shHist.save(data.submitDir + "/output-hist");

      TmpCd keepDir(jobELGDir);

      processAllInState(sh, JobState::INIT, 0);

      sh.save(data.submitDir + "/input");
      data.submitted = true;
    }
    break;

  case Detail::ManagerStep::doRetrieve:
    {
      ANA_CHECK (doRetrieve (data));
    }
    break;

  default:
    (void) true; // safe to do nothing
  }
  return ::StatusCode::SUCCESS;
}
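
The handling of EL::Job::optGridPrunShipAdditionalFilesOrDirs in the listing above boils down to splitting a comma-separated string and issuing one "ln -sf" per entry. The following is a minimal standard-library sketch of that idea, not the driver's code: splitCommaList and buildLinkCommands are hypothetical helper names, the commands are only built (never executed), and empty entries are skipped here, which is an assumption that differs from a plain boost::split.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated option value into its entries.
// Assumption of this sketch: empty entries (e.g. from "a,,b") are dropped.
std::vector<std::string> splitCommaList(const std::string& list) {
  std::vector<std::string> entries;
  std::istringstream stream(list);
  std::string entry;
  while (std::getline(stream, entry, ',')) {
    if (!entry.empty()) entries.push_back(entry);
  }
  return entries;
}

// Build (but do not run) one "ln -sf <entry> <targetDir>" command per entry.
std::vector<std::string> buildLinkCommands(const std::string& list,
                                           const std::string& targetDir) {
  assert(!targetDir.empty());  // mirrors the precondition style used in this class
  std::vector<std::string> commands;
  for (const std::string& entry : splitCommaList(list)) {
    commands.push_back("ln -sf " + entry + " " + targetDir);
  }
  return commands;
}
```

Keeping command construction separate from execution makes the list parsing easy to check without touching the file system.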

◆ doRetrieve()

StatusCode EL::PrunDriver::doRetrieve ( Detail::ManagerData & data ) const
private

Definition at line 594 of file PrunDriver.cxx.

{
  RCU_READ_INVARIANT(this);
  RCU_REQUIRE(not data.submitDir.empty());

  TmpCd tmpDir(data.submitDir);

  SH::SampleHandler sh;
  sh.load("input");
  RCU_ASSERT(sh.size());

  const size_t nRunThreads = options()->castDouble("nc_run_threads", 0);
  const size_t nDlThreads = options()->castDouble("nc_download_threads", 0);
  processAllInState(sh, JobState::INIT, 0);
  processAllInState(sh, JobState::RUN, nRunThreads);
  processAllInState(sh, JobState::DOWNLOAD, nDlThreads);
  processAllInState(sh, JobState::MERGE, 0);

  sh.save("input");

  std::cout << std::endl;

  bool allDone = true;
  for (SH::SampleHandler::iterator s = sh.begin(); s != sh.end(); ++s) {
    JobState::Enum state = sampleState(*s);
    std::string details = (*s)->meta()->castString("nc_ELG_state_details", "", SH::MetaObject::CAST_NOCAST_DEFAULT);
    if (not details.empty()) { details = '(' + details + ')'; }

    std::cout << (*s)->name() << "\t";
    switch (state) {
    case JobState::INIT:
    case JobState::RUN:
    case JobState::DOWNLOAD:
    case JobState::MERGE:
      std::cout << JobState::name[state] << "\t";
      break;
    case JobState::FINISHED:
      std::cout << "\033[1;32m" << JobState::name[state] << "\033[0m\t";
      break;
    case JobState::FAILED:
      std::cout << "\033[1;31m" << JobState::name[state] << "\033[0m\t";
      break;
    }
    std::cout << details << std::endl;

    allDone &= (state == JobState::FINISHED || state == JobState::FAILED);
  }

  std::cout << std::endl;

  data.retrieved = true;
  data.completed = allDone;
  return ::StatusCode::SUCCESS;
}
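
Note that the completion flag in the listing above means "nothing is still in flight", not "everything succeeded": a sample in the FAILED state counts as done. A small self-contained sketch of that rule follows; the State enum is a stand-in that only borrows the state names from the listing, and isTerminal and allDone are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the driver's JobState::Enum; only the names come from the listing.
enum class State { INIT, RUN, DOWNLOAD, MERGE, FINISHED, FAILED };

// A state is terminal when no further processing will happen for that sample.
bool isTerminal(State s) {
  return s == State::FINISHED || s == State::FAILED;
}

// True only if every sample has reached a terminal state.
bool allDone(const std::vector<State>& states) {
  bool done = true;
  for (State s : states) done &= isTerminal(s);
  return done;
}

// Usage: a failed sample does not keep the job "incomplete", a running one does.
inline void exampleUsage() {
  assert(allDone({State::FINISHED, State::FAILED}));
  assert(!allDone({State::FINISHED, State::RUN}));
}
```

Callers that care about success, rather than completion, would need to inspect the per-sample states separately.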

◆ mergedOutputSave()

static void EL::Driver::mergedOutputSave ( Detail::ManagerData data)
staticinherited

create and save a sample handler assuming we created all the merged files at the requested locations

This is optional, but it is convenient for drivers that want to keep their outputs locally.

Guarantee
basic
Failures
out of memory II
i/o errors

◆ options() [1/2]

SH::MetaObject* EL::Driver::options ( )
inherited

the list of options to jobs with this driver

Guarantee
no-fail
Postcondition
result != 0

◆ options() [2/2]

const SH::MetaObject* EL::Driver::options ( ) const
inherited

◆ resubmit()

static void EL::Driver::resubmit ( const std::string &  location,
const std::string &  option 
)
static, inherited

resubmit all failed sub-jobs for the job in the given location

Parameters
option	driver-specific option string selecting which jobs to resubmit (and how)

Guarantee
basic, may partially resubmit
Failures
out of memory III
job resubmission errors
job can't be read
job was made with different driver

◆ retrieve()

static bool EL::Driver::retrieve ( const std::string &  location)
static, inherited

retrieve all the output for the job in the given location

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Returns
whether the job completed successfully
Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver

◆ setState()

void EL::PrunDriver::setState ( const std::string &  location,
const std::string &  task,
const std::string &  state 
)
static

Definition at line 667 of file PrunDriver.cxx.

{
  RCU_REQUIRE(not location.empty());
  RCU_REQUIRE(not task.empty());
  RCU_REQUIRE(not state.empty());
  TmpCd tmpDir(location);
  SH::SampleHandler sh;
  sh.load("input");
  RCU_ASSERT(sh.size());
  if (not sh.get(task)) {
    std::cout << "Unknown task: " << task << std::endl;
    std::cout << "Choose one of: " << std::endl;
    sh.print();
    return;
  }
  JobState::parse(state);
  sh.get(task)->meta()->setString("nc_ELG_state", state);
  sh.save("input");
}
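
In the listing above, the state string is passed through JobState::parse before it is written to the sample meta-data, so an unknown name is presumably rejected before anything is saved. The sketch below shows one way such a validation could look. It is an illustration only: the enum, the name table and parseState are hypothetical, and the exact spellings and failure behavior of the real JobState::parse are not shown on this page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-in enum; only the state names are taken from the listing.
enum class State { INIT, RUN, DOWNLOAD, MERGE, FINISHED, FAILED };

// Hypothetical spellings, in enum order. The real table is not shown here.
const std::array<const char*, 6> kStateNames = {
    "INIT", "RUN", "DOWNLOAD", "MERGE", "FINISHED", "FAILED"};

// Map a state name to its enum value, throwing on anything unknown so that
// a typo never ends up stored as a task state.
State parseState(const std::string& text) {
  assert(!text.empty());  // same precondition as setState's RCU_REQUIRE
  for (std::size_t i = 0; i < kStateNames.size(); ++i) {
    if (text == kStateNames[i]) return static_cast<State>(i);
  }
  throw std::invalid_argument("unknown job state: " + text);
}
```

Validating before the write keeps the saved sample list consistent even when the caller passes a misspelled state.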

◆ status()

void EL::PrunDriver::status ( const std::string &  location)
static

Definition at line 649 of file PrunDriver.cxx.

{
  RCU_REQUIRE(not location.empty());
  TmpCd tmpDir(location);
  SH::SampleHandler sh;
  sh.load("input");
  RCU_ASSERT(sh.size());
  processAllInState(sh, JobState::RUN, 0);
  sh.save("input");
  for (SH::SampleHandler::iterator s = sh.begin(); s != sh.end(); ++s) {
    JobState::Enum state = sampleState(*s);
    std::string details = (*s)->meta()->castString("nc_ELG_state_details", "", SH::MetaObject::CAST_NOCAST_DEFAULT);
    if (not details.empty()) { details = '(' + details + ')'; }
    std::cout << (*s)->name() << "\t" << JobState::name[state]
              << "\t" << details << std::endl;
  }
}

◆ submit()

std::string EL::Driver::submit ( const Job &  job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and wait for it to finish

This is mostly for small jobs and backward compatibility. For longer jobs use submitOnly instead.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors

◆ submitOnly()

std::string EL::Driver::submitOnly ( const Job &  job,
const std::string &  location 
) const
inherited

submit the given job with the given output location and return immediately

This method allows you to submit jobs to your local batch system, log out and at a later point log back in again.

Returns
The actual location of the submit directory, if the job was configured to generate a unique directory.
Guarantee
basic, may partially submit
Failures
out of memory II
can't create directory at location
submission errors
Warning
not all drivers support this; some will do all their work in the submit function.
Warning
you normally need to call wait() or retrieve() before you can use the output.

◆ testInvariant()

void EL::PrunDriver::testInvariant ( ) const

Definition at line 486 of file PrunDriver.cxx.

{
  RCU_INVARIANT(this != 0);
}

◆ updateLocation()

static void EL::Driver::updateLocation ( const std::string &  location)
static, inherited

update the internal location of files, after moving the submission directory

Guarantee
basic, may update partially
Failures
out of memory II
Warning
only move the submission directory after all your jobs are finished, or the results will be unpredictable

◆ wait()

static bool EL::Driver::wait ( const std::string &  location,
unsigned  time = 60 
)
static, inherited

retrieve all the output for the job in the given location and wait until it is finished completely.

The output is polled every time seconds, where time is the second argument.

While job failures will cause this method to fail, you can typically retry it multiple times if you can use partial results.

Typically sleeping for 60 seconds is an appropriate interval, but if it doesn't work for you, you can change it here.

Guarantee
basic, may partially retrieve
Failures
out of memory III
job failures
job can't be read
job was made with different driver
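
The polling behavior described for wait() can be pictured as "retrieve, and if not finished, sleep for the interval and try again". The sketch below illustrates only that loop shape under stated assumptions: pollUntilDone and its callback parameter are hypothetical, the callback stands in for a single retrieve attempt, and the return value (number of attempts) is chosen for the illustration and is not what wait() returns.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls retrieveOnce until it reports completion, sleeping between attempts.
// Returns the number of retrieval attempts that were made.
unsigned pollUntilDone(const std::function<bool()>& retrieveOnce,
                       std::chrono::seconds interval = std::chrono::seconds(60)) {
  assert(retrieveOnce);  // an empty callback would be a caller error
  unsigned attempts = 1;
  while (!retrieveOnce()) {
    std::this_thread::sleep_for(interval);  // wait before polling again
    ++attempts;
  }
  return attempts;
}
```

A caller could pass a lambda that wraps one retrieval step and returns true once the job is complete; a shorter interval only makes sense if the backend can be queried that often.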

Member Data Documentation

◆ abortRetrieve

bool EL::Driver::abortRetrieve
static, protected, inherited

this flag is set to true when the wait() function is running and a SIGINT is caught, meaning that control should be returned to the user as soon as possible.

drivers can use it to abort long-running operations in doRetrieve before completion

Definition at line 212 of file Driver.h.
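
The general pattern behind such a flag is a signal handler that only sets a variable, plus a worker loop that checks it between units of work. The following is a self-contained sketch of that pattern and not the EventLoop implementation: g_abortRequested, onInterrupt and processUntilAborted are hypothetical names, and the std::raise call merely simulates a Ctrl-C so the example can be exercised without user input.

```cpp
#include <cassert>
#include <csignal>

// Written from the signal handler; volatile std::sig_atomic_t is the type
// the standard guarantees can be safely set there.
static volatile std::sig_atomic_t g_abortRequested = 0;

// The handler does nothing except record that an abort was requested.
static void onInterrupt(int) { g_abortRequested = 1; }

// Processes up to nItems items, checking the flag before each one.
// interruptAt is the item index at which a SIGINT is simulated (-1 for never).
// Returns how many items were processed before the abort took effect.
int processUntilAborted(int nItems, int interruptAt) {
  assert(nItems >= 0);
  g_abortRequested = 0;
  auto previous = std::signal(SIGINT, onInterrupt);
  int processed = 0;
  for (int i = 0; i < nItems; ++i) {
    if (g_abortRequested) break;               // return control as soon as possible
    if (i == interruptAt) std::raise(SIGINT);  // simulated Ctrl-C for this sketch
    ++processed;                               // stands in for one unit of real work
  }
  std::signal(SIGINT, previous);               // restore the earlier handler
  return processed;
}
```

The item during which the interrupt arrives is still finished, and the loop stops before starting the next one, which is the "as soon as possible" behavior the flag is meant to enable.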

◆ m_options

SH::MetaObject EL::Driver::m_options
private, inherited

members directly corresponding to accessors

Definition at line 233 of file Driver.h.


The documentation for this class was generated from the following files:
PrunDriver.h
PrunDriver.cxx