[Collaboration diagram for python.TrfUtils.JobRunnerTransform]

Public Member Functions
def	__init__ (self, inputParamName, outputParamName, templateOutputName='outputfile', jobDirOutputName='', mandatoryArgs=[], optionalArgs=[])

def	setProdDir (self, dir)

def	setProdTaskDatabase (self, taskdb)

def	getJobRunner (self, **jobRunnerArgs)

def	addOutput (self, paramName, templateName, jobDirName='')

def	showParams (self)

def	configure (self)

def	addTaskToDatabase (self, comment='')

def	run (self)

def	go (self, commentForTaskDb='')

def	report (self, errAcronym='', moreText='')

Public Attributes
	inputParamName

	outputParamName

	mandatoryArgs

	optionalArgs

	runner

	templateOutputName

	jobDirOutputName

	outputList

	reportName

	prodDir

	prodTaskDb

	argdictFileName

	argdict

	inputfiles

	outputfile

	outputds

	dataset

	taskname

	jobname

Detailed Description

Job transform for running a JobRunner job at T0 or at the CAF Task Management
   System. Note that this class may abort execution by calling sys.exit() when it
   detects an error. A jobReport is always produced, except when OptionParser itself
   rejects the command line because of a syntax error.

Definition at line 117 of file TrfUtils.py.
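
The command-line contract described above can be sketched as follows. This is a minimal sketch and not the class itself: `load_argdict` and `demo` are hypothetical helpers, `json.load` is an assumed stand-in for the `readJSON` helper (whose implementation is not shown on this page), and the keys of the sample argdict are invented for illustration. Unlike the constructor, the sketch does not write a job report before calling `parser.error`.

```python
import json
import os
import tempfile
from optparse import OptionParser


def load_argdict(argv):
    """Parse --argJSON from argv and return (file name, argdict).

    Hypothetical helper mirroring the option handling in __init__;
    json.load is an assumed stand-in for readJSON.
    """
    parser = OptionParser(usage="%prog --argJSON=<JSON file>")
    parser.add_option('-a', '--argJSON', dest='argJSON',
                      help='Local file with JSON-serialized dictionary of key/value pairs')
    options, args = parser.parse_args(argv)
    if args:
        # parser.error prints the usage message and exits with status 2
        parser.error('wrong number of command line arguments')
    if not options.argJSON:
        parser.error('option --argJSON is mandatory')
    with open(options.argJSON) as f:
        return options.argJSON, json.load(f)


def demo():
    """Write an invented argdict to a temporary file and read it back."""
    sample = {'exampleKey': 'exampleValue', '_attempt': 1}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'argdict.json')
        with open(path, 'w') as f:
            json.dump(sample, f)
        return load_argdict(['--argJSON=' + path])
```

Because `parser.error` terminates the process, a caller that wants to recover from a bad command line has to catch `SystemExit`.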

Constructor & Destructor Documentation

◆ __init__()

def python.TrfUtils.JobRunnerTransform.__init__	(	self,
		inputParamName,
		outputParamName,
		templateOutputName = `'outputfile'`,
		jobDirOutputName = `''`,
		mandatoryArgs = `[]`,
		optionalArgs = `[]`
	)

Definition at line 123 of file TrfUtils.py.

     def __init__(self, inputParamName, outputParamName, templateOutputName='outputfile', jobDirOutputName='',
                  mandatoryArgs = [], optionalArgs = []):
         self.inputParamName = inputParamName
         self.outputParamName = outputParamName
         self.mandatoryArgs = mandatoryArgs
         if inputParamName not in mandatoryArgs:
             mandatoryArgs.append(inputParamName)
         if outputParamName not in mandatoryArgs:
             mandatoryArgs.append(outputParamName)
         self.optionalArgs = optionalArgs
         self.runner = None
         self.templateOutputName = templateOutputName
         self.jobDirOutputName = jobDirOutputName
         self.outputList = [ ]   # List of qualified output files (ie file names including dataset name)
         self.reportName = 'jobReport.json'
         self.prodDir = '.'
         self.prodTaskDb = ''
  
         # Process command line args and extract argdict
         parser = OptionParser(usage="%prog --argJSON=<JSON file>")
         parser.add_option('-a', '--argJSON', dest='argJSON', help='Local file with JSON-serialized dictionary of key/value pairs')
         (options,args) = parser.parse_args()
         if len(args) != 0:
             self.report('WRONGARGSNUMBER_ERROR','Wrong number of command line arguments')
             parser.error('wrong number of command line arguments')
         if not options.argJSON:
             self.report('NOARGDICT_ERROR','Must use --argJSON to specify argdict.json file')
             parser.error('option --argJSON is mandatory')
         try:
             self.argdictFileName = options.argJSON 
             self.argdict = readJSON(options.argJSON)
         except Exception as e:
             self.report('ARGDICTNOTREADABLE_ERROR','File %s with JSON-serialized argdict cannot be read' % options.argJSON)
             print ('ERROR: file %s with JSON-serialized argdict cannot be read' % options.argJSON)
             print ('DEBUG: Exception =',e)
             sys.exit(1)
  
         # Print argdict
         print ('\nInput argdict (%s):\n' % options.argJSON)
         print (pprint.pformat(self.argdict))
         print ('\n')
  
         # Check for all mandatory parameters
         missingArgs = [ x for x in mandatoryArgs if x not in self.argdict ]
         if missingArgs:
             self.report('MISSINGPARAM_ERROR','Mandatory parameter(s) missing from argdict: '+str(missingArgs))
             print ('ERROR: mandatory parameter(s) missing from argdict:', missingArgs)
             sys.exit(1)
  
         # Extract input and output dataset and file names
         # NOTE: inputs come from a list, output is a single file (but there might be others)
         self.inputds, self.inputfiles = parseQualifiedFileNames(self.argdict[inputParamName])
         if not self.inputfiles:
             self.report('NOINPUTFILE_ERROR','No input file specified (only dataset name?)')
             print ('ERROR: no input file specified')
             sys.exit(1)
         self.outputfile = getFileName(self.argdict[outputParamName])
         #self.outputList.append(self.argdict[outputParamName])
         self.outputds = getDataSetName(self.argdict[outputParamName])
         if not self.outputds:
             self.report('NODATASET_ERROR','No dataset given in parameter '+outputParamName)
             print ('ERROR: No dataset given in parameter',outputParamName)
             sys.exit(1)
         splitDSName = self.outputds.split('.')
         if len(splitDSName)<5:
             self.report('DATASETNAME_ERROR',"Output dataset name %s doesn't conform to standard naming convention" % self.outputds)
             print ("ERROR: Output dataset name %s doesn't conform to standard naming convention" % self.outputds)
             sys.exit(1)
         self.dataset = '.'.join(splitDSName[:-3])
         self.taskname = '.'.join(splitDSName[-2:])
         self.jobname = os.path.basename(self.outputfile)
         if '_attempt' in self.argdict:
             self.jobname = self.jobname + '.v%s' % self.argdict['_attempt']
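
The last lines of the constructor derive `dataset`, `taskname` and `jobname` with plain string operations. The sketch below isolates that logic so the slicing is easier to follow. The function names and the sample dataset name are hypothetical, and the sketch raises `ValueError` where the constructor writes a report and calls `sys.exit(1)`.

```python
import os


def split_output_dataset(outputds):
    """Return (dataset, taskname) using the same slices as __init__."""
    parts = outputds.split('.')
    if len(parts) < 5:
        raise ValueError("Output dataset name %s doesn't conform to "
                         "standard naming convention" % outputds)
    dataset = '.'.join(parts[:-3])    # everything except the last three fields
    taskname = '.'.join(parts[-2:])   # the last two fields
    return dataset, taskname


def make_jobname(outputfile, argdict):
    """Return the job name: output file base name plus optional attempt suffix."""
    jobname = os.path.basename(outputfile)
    if '_attempt' in argdict:
        jobname = jobname + '.v%s' % argdict['_attempt']
    return jobname


# Invented six-field name, used only to show which fields end up where
dataset, taskname = split_output_dataset('projA.00001234.streamB.stepC.typeD.tagE')
# dataset  == 'projA.00001234.streamB'
# taskname == 'typeD.tagE'
```

Note that the third field from the end (`stepC` in the invented name) appears in neither `dataset` nor `taskname`.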
  

Member Function Documentation

◆ addOutput()

def python.TrfUtils.JobRunnerTransform.addOutput	(	self,
		paramName,
		templateName,
		jobDirName = `''`
	)

Add an additional output file to the output dataset. If jobDirName is set, the
   output file will also be copied under that name to the job directory.

Definition at line 235 of file TrfUtils.py.

     def addOutput(self, paramName, templateName, jobDirName=''):
         """Add an additional output file to the output dataset. If jobDirName is set, the
            output file will also be copied under that name to the job directory."""
         #self.outputList.append(self.argdict[paramName])
         f = getFileName(self.argdict[paramName])
         self.runner.setParam(templateName,f)
         if jobDirName:
             self.runner.appendParam('cmdjobpostprocessing',
                                     'cp %s %s/%s-%s' % (f,self.runner.getParam('jobdir'),self.jobname,jobDirName))
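
To make the resulting post-processing command concrete, the sketch below builds the same `cp` string outside the class. `job_dir_copy_command` is a hypothetical helper and all paths are invented. The optional `quote` flag is an addition that is not in the listing: it applies `shlex.quote`, which may be worth considering if file names can contain spaces or shell metacharacters.

```python
import shlex


def job_dir_copy_command(filename, jobdir, jobname, job_dir_name, quote=False):
    """Build the 'cp' command that addOutput() appends to cmdjobpostprocessing."""
    target = '%s/%s-%s' % (jobdir, jobname, job_dir_name)
    if quote:
        # Not part of the original listing: protect against spaces etc.
        filename, target = shlex.quote(filename), shlex.quote(target)
    return 'cp %s %s' % (filename, target)


cmd = job_dir_copy_command('out.root', '/prod/ds/task/job1', 'job1', 'hist.root')
# cmd == 'cp out.root /prod/ds/task/job1/job1-hist.root'
```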

◆ addTaskToDatabase()

def python.TrfUtils.JobRunnerTransform.addTaskToDatabase	(	self,
		comment = `''`
	)

Definition at line 266 of file TrfUtils.py.

     def addTaskToDatabase(self,comment=''):
         if self.prodTaskDb:
             try:
                 with TaskManager(self.prodTaskDb) as taskman:
                     taskman.addTask(self.dataset,
                                     self.taskname,
                                     self.runner.getParam('joboptionpath'),
                                     self.runner.getParam('release'),
                                     self.runner.getNJobs(),
                                     self.runner.getParam('taskpostprocsteps'),
                                     comment=comment)
             except Exception as e:
                 print ('ERROR: Unable to add task to task manager database '+self.prodTaskDb)
                 print ('DEBUG: Exception =',e)
         else:
             print ('WARNING: No task manager database configured')
  

◆ configure()

def python.TrfUtils.JobRunnerTransform.configure ( self )

Definition at line 248 of file TrfUtils.py.

     def configure(self):
         self.runner.configure()
         # If JobRunner configuration was successful, move any earlier jobs
         # out of the way if _attempt is specified. This should guarantee
         # that the task directory contains only valid jobs
         if '_attempt' in self.argdict:
             currentAttempt = int(self.argdict['_attempt'])
             for i in range(-1,currentAttempt):
                 if i==-1:
                     d = '%s/%s/%s/%s' % (self.prodDir,self.dataset,self.taskname,os.path.basename(self.outputfile))
                 else:
                     d = '%s/%s/%s/%s.v%s' % (self.prodDir,self.dataset,self.taskname,os.path.basename(self.outputfile),i)
                 if os.path.exists(d):
                     retriedJobDir = '%s/%s/%s/RETRIED_JOBS' % (self.prodDir,self.dataset,self.taskname)
                     print ('\nMoving previous job directory %s to %s' % (d,retriedJobDir))
                     os.system('mkdir -p %s' % (retriedJobDir))
                     os.system('mv -f %s %s' % (d,retriedJobDir))
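
The retry handling above checks the job directory without an attempt suffix first and then every suffix below the current attempt. The following sketch reproduces that enumeration and performs the move with `os.makedirs` and `shutil.move` instead of shelling out. Both function names are hypothetical, and the behaviour is not identical to the listing: `shutil.move` raises an error if the target already exists inside `RETRIED_JOBS`, whereas the exit status of `os.system` is ignored in the original.

```python
import os
import shutil


def previous_job_dirs(prod_dir, dataset, taskname, output_basename, current_attempt):
    """List candidate directories of earlier attempts, in the order configure() checks them."""
    task_dir = os.path.join(prod_dir, dataset, taskname)
    # i == -1 in the listing: job directory created without an _attempt suffix
    dirs = [os.path.join(task_dir, output_basename)]
    # i == 0 .. current_attempt - 1: directories of earlier numbered attempts
    dirs += [os.path.join(task_dir, '%s.v%s' % (output_basename, i))
             for i in range(current_attempt)]
    return dirs


def retire_previous_jobs(prod_dir, dataset, taskname, output_basename, current_attempt):
    """Move existing earlier job directories into RETRIED_JOBS; return what was moved."""
    retried = os.path.join(prod_dir, dataset, taskname, 'RETRIED_JOBS')
    moved = []
    for d in previous_job_dirs(prod_dir, dataset, taskname,
                               output_basename, current_attempt):
        if os.path.exists(d):
            os.makedirs(retried, exist_ok=True)
            shutil.move(d, retried)   # moves d inside the existing directory
            moved.append(d)
    return moved
```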
  

◆ getJobRunner()

def python.TrfUtils.JobRunnerTransform.getJobRunner	(	self,
		**jobRunnerArgs
	)

Definition at line 209 of file TrfUtils.py.

     def getJobRunner(self,**jobRunnerArgs):
         if self.runner:
             print ('WARNING: Overwriting already configured JobRunner')
         self.runner = JobRunner.JobRunner(jobdir=self.prodDir+'/'+self.dataset+'/'+self.taskname+'/'+self.jobname,
                                           jobname=self.jobname,
                                           inputds=self.inputds,
                                           inputfiles=self.inputfiles,
                                           outputds=self.outputds,
                                           filesperjob=len(self.inputfiles),
                                           setuprelease=False,
                                           addinputtopoolcatalog=False,
                                           returnstatuscode=True)
         self.runner.appendParam('cmdjobpreprocessing',
                                 'cp %s %s/%s.argdict.json' % (self.argdictFileName, self.runner.getParam('jobdir'),self.jobname))
         self.runner.setParam(self.templateOutputName,self.outputfile)
         if self.jobDirOutputName:
             self.runner.appendParam('cmdjobpostprocessing',
                                     'cp %s %s/%s-%s' % (self.outputfile, self.runner.getParam('jobdir'),self.jobname,self.jobDirOutputName))
         for k,v in jobRunnerArgs.items():
             self.runner.setParam(k,v)
         for k,v in self.argdict.items():
             if k in self.mandatoryArgs: continue
             if k in self.optionalArgs: continue
             self.runner.setParam(k,v)
         return self.runner
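
The two loops at the end of `getJobRunner` decide which parameters reach the JobRunner: keyword arguments are set first, then every argdict entry that is neither mandatory nor optional is set as well and therefore overrides a keyword argument of the same name. The sketch below models this with a plain dictionary. `runner_params` is a hypothetical helper, the parameter names in the usage example are invented, and the real `setParam` may do more than store a value.

```python
def runner_params(argdict, mandatory_args, optional_args, **job_runner_args):
    """Return the parameters that getJobRunner() would pass to setParam()."""
    params = dict(job_runner_args)          # keyword arguments are applied first
    for k, v in argdict.items():
        if k in mandatory_args or k in optional_args:
            continue                        # handled by the transform itself
        params[k] = v                       # argdict wins over keyword arguments
    return params


params = runner_params(
    {'inFile': 'x', 'opt': 1, 'extra': 2, 'batch': 'from-argdict'},
    ['inFile'], ['opt'],
    batch='from-kwargs', n=1)
# params == {'batch': 'from-argdict', 'n': 1, 'extra': 2}
```

An `_attempt` entry in the argdict is forwarded in the same way unless it is listed in one of the two argument lists.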
  

◆ go()

def python.TrfUtils.JobRunnerTransform.go	(	self,
		commentForTaskDb = `''`
	)

Show parameters, configure job, update task database, run job and produce report.
   This method will ensure that a job report is always produced, independently of any errors.

Definition at line 288 of file TrfUtils.py.

     def go(self,commentForTaskDb=''):
         """Show parameters, configure job, update task database, run job and produce report.
            This method will ensure that a job report is always produced, independently of any errors."""
         try:
             self.showParams()
             self.configure()
         except Exception as e:
             self.report('JOBRUNNER_CONFIGURE_ERROR','Unable to configure JobRunner job - perhaps same job was already configured / run before?')
             print ("ERROR: Unable to configure JobRunner job - perhaps same job was already configured / run before?")
             print ('DEBUG: Exception =',e)
         else:
             try:
                 self.addTaskToDatabase(commentForTaskDb)
                 self.run()
             finally:
                 self.report()
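
The `try`/`except`/`else`/`finally` structure of `go` can be exercised in isolation with a stub that only records which steps are reached. `GoFlowSketch` is a hypothetical class written for this illustration; it involves no JobRunner, and its failures are simulated.

```python
class GoFlowSketch:
    """Stub that records which steps go() reaches."""

    def __init__(self, fail_in=None):
        self.fail_in = fail_in      # name of the step that should raise, if any
        self.calls = []

    def _step(self, name):
        self.calls.append(name)
        if name == self.fail_in:
            raise RuntimeError('simulated failure in %s' % name)

    def report(self, errAcronym=''):
        self.calls.append('report(%s)' % errAcronym)

    def go(self):
        try:
            self._step('showParams')
            self._step('configure')
        except Exception:
            # Configuration failed: report the error, skip database update and run
            self.report('JOBRUNNER_CONFIGURE_ERROR')
        else:
            try:
                self._step('addTaskToDatabase')
                self._step('run')
            finally:
                # Executed whether or not run() raised; an exception still propagates
                self.report()
```

With `fail_in='configure'` the recorded calls end with `report(JOBRUNNER_CONFIGURE_ERROR)` and `go` returns normally. With `fail_in='run'` the report is still written, but the exception reaches the caller afterwards.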
  
  

◆ report()

def python.TrfUtils.JobRunnerTransform.report	(	self,
		errAcronym = `''`,
		moreText = `''`
	)

Definition at line 306 of file TrfUtils.py.

     def report(self,errAcronym='',moreText=''):
         if errAcronym:
             jobStatus = 999
             jobStatusAcronym = errAcronym
         else:
             try:
                 jobStatus = self.runner.jobStatus[0]   # Assume we always run single jobs
                 jobStatusAcronym = 'OK' if jobStatus==0 else 'ATHENA_ERROR'
             except Exception:
                 jobStatus = 999
                 jobStatusAcronym = 'NOJOBSTATUS_ERROR'
                 moreText = "Jobrunner terminated abnormally and w/o a job status; athena job may or may not have run"
  
         jobStatusAcronym = jobStatusAcronym[:128]   # 128 char limit in T0 DB
         report =  {'exitCode': jobStatus,
                    'exitAcronym': jobStatusAcronym,
                    'files': { 'output':[] }
                   }
         if moreText:
             report['exitMsg'] = moreText
  
         # If there was no error, store outputs (request from Armin to not give any outputs for failed jobs).
         # Must also check that output file indeed exists.
         if jobStatus==0:
             for f in self.outputList:
                 if os.path.exists(getFileName(f)):
                     report['files']['output'].append(getFileDescription(f))
  
         # Write jobReport file
         writeJSON(self.reportName,report)
  
         # Copy  jobReport file to job directory - note we do this only if there was
         # no error, otherwise we might overwrite an older report from an OK job
         if jobStatus==0:
             try:
                 os.system('cp %s %s/%s.%s' % (self.reportName,self.runner.getParam('jobdir'),self.jobname,self.reportName) )
             except Exception as e:
                 print ('WARNING: Copying of job report file (%s) to job directory failed' % self.reportName)
                 print ('DEBUG: Exception =',e)
  
         # Nicely print job report to stdout
         print ('\n\nJob report (jobReport.json):\n')
         print (pprint.pformat(report))
         print ('\n')
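
The content of the job report follows from three cases: an explicit error acronym, a job status read from the runner, or no status at all. The sketch below builds the same dictionary without touching the file system. `build_report` is a hypothetical helper; it takes ready-made output descriptions through `outputs`, whereas the method calls `getFileDescription` on each existing file in `outputList`.

```python
def build_report(job_status=None, err_acronym='', more_text='', outputs=()):
    """Return the dictionary that report() would write to jobReport.json."""
    if err_acronym:
        status, acronym = 999, err_acronym
    elif job_status is None:
        # Corresponds to the 'except Exception' branch: no status available
        status, acronym = 999, 'NOJOBSTATUS_ERROR'
        more_text = ("Jobrunner terminated abnormally and w/o a job status; "
                     "athena job may or may not have run")
    else:
        status = job_status
        acronym = 'OK' if status == 0 else 'ATHENA_ERROR'

    report = {'exitCode': status,
              'exitAcronym': acronym[:128],   # 128 char limit in T0 DB
              'files': {'output': []}}
    if more_text:
        report['exitMsg'] = more_text
    if status == 0:
        # Outputs are listed only for successful jobs
        report['files']['output'].extend(outputs)
    return report
```

Two details of the listing are worth keeping in mind when reading it. The calls that append to `outputList` are commented out in both `__init__` and `addOutput`, so within this listing nothing fills the list. And `os.system` returns the exit status of the command rather than raising, so the `except` branch around the `cp` call does not catch a failed copy.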

◆ run()

def python.TrfUtils.JobRunnerTransform.run ( self )

Definition at line 283 of file TrfUtils.py.

     def run(self):
         self.runner.run()
         #print (self.runner.jobStatus)
  
  

◆ setProdDir()

def python.TrfUtils.JobRunnerTransform.setProdDir	(	self,
		dir
	)

Definition at line 197 of file TrfUtils.py.

     def setProdDir(self,dir):
         if os.access(dir,os.W_OK):
             self.prodDir = dir
         else:
             # Continue anyway or abort?
             print ('ERROR: No write access to production directory',dir,'- will use current working directory instead:', os.getcwd())
             self.prodDir = os.getcwd()
             sys.exit(1)
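
In the listing above, the fallback assignment to `os.getcwd()` is immediately followed by `sys.exit(1)`, so the method aborts instead of continuing in the current working directory, despite what the error message says. The sketch below shows the same writability check with an explicit failure. `checked_prod_dir` is a hypothetical helper, and raising `PermissionError` instead of exiting is an assumption made for illustration.

```python
import os


def checked_prod_dir(path):
    """Return path if it is writable, otherwise raise instead of exiting."""
    if not os.access(path, os.W_OK):
        raise PermissionError('No write access to production directory %s' % path)
    return path
```

`os.access` also returns `False` for a path that does not exist, so a mistyped directory name is reported in the same way as a missing write permission.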
  

◆ setProdTaskDatabase()

def python.TrfUtils.JobRunnerTransform.setProdTaskDatabase	(	self,
		taskdb
	)

Definition at line 206 of file TrfUtils.py.

     def setProdTaskDatabase(self,taskdb):
         self.prodTaskDb = taskdb
  

◆ showParams()

def python.TrfUtils.JobRunnerTransform.showParams ( self )

Definition at line 244 of file TrfUtils.py.

     def showParams(self):
         print ('JobRunner parameters:\n')
         self.runner.showParams()
  

Member Data Documentation

◆ argdict

python.TrfUtils.JobRunnerTransform.argdict

Definition at line 152 of file TrfUtils.py.

◆ argdictFileName

python.TrfUtils.JobRunnerTransform.argdictFileName

Definition at line 151 of file TrfUtils.py.

◆ dataset

python.TrfUtils.JobRunnerTransform.dataset

Definition at line 190 of file TrfUtils.py.

◆ inputfiles

python.TrfUtils.JobRunnerTransform.inputfiles

Definition at line 173 of file TrfUtils.py.

◆ inputParamName

python.TrfUtils.JobRunnerTransform.inputParamName

Definition at line 124 of file TrfUtils.py.

◆ jobDirOutputName

python.TrfUtils.JobRunnerTransform.jobDirOutputName

Definition at line 134 of file TrfUtils.py.

◆ jobname

python.TrfUtils.JobRunnerTransform.jobname

Definition at line 192 of file TrfUtils.py.

◆ mandatoryArgs

python.TrfUtils.JobRunnerTransform.mandatoryArgs

Definition at line 126 of file TrfUtils.py.
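
Because the constructor stores the `mandatoryArgs` argument itself and then appends to it, this attribute refers to the caller's list, or to the shared default list when the argument is omitted. The sketch below demonstrates the effect with two hypothetical stand-in classes; `CopiedDefault` shows one common way to avoid it and is not taken from TrfUtils.py.

```python
class SharedDefault:
    """Mirrors the pattern in __init__: the default list is mutated in place."""

    def __init__(self, name, mandatoryArgs=[]):
        self.mandatoryArgs = mandatoryArgs
        if name not in mandatoryArgs:
            mandatoryArgs.append(name)


class CopiedDefault:
    """Alternative: copy the argument so each instance owns its list."""

    def __init__(self, name, mandatoryArgs=None):
        self.mandatoryArgs = list(mandatoryArgs or [])
        if name not in self.mandatoryArgs:
            self.mandatoryArgs.append(name)


def demo():
    """Create two instances of each class and return their argument lists."""
    a, b = SharedDefault('in1'), SharedDefault('in2')
    c, d = CopiedDefault('in1'), CopiedDefault('in2')
    return a.mandatoryArgs, b.mandatoryArgs, c.mandatoryArgs, d.mandatoryArgs
```

On the first call of `demo`, both `SharedDefault` instances end up with the same list object containing `'in1'` and `'in2'`, while the `CopiedDefault` instances keep separate lists. A script that creates a single transform per process would not normally notice the difference.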

◆ optionalArgs

python.TrfUtils.JobRunnerTransform.optionalArgs

Definition at line 131 of file TrfUtils.py.

◆ outputds

python.TrfUtils.JobRunnerTransform.outputds

Definition at line 180 of file TrfUtils.py.

◆ outputfile

python.TrfUtils.JobRunnerTransform.outputfile

Definition at line 178 of file TrfUtils.py.

◆ outputList

python.TrfUtils.JobRunnerTransform.outputList

Definition at line 135 of file TrfUtils.py.

◆ outputParamName

python.TrfUtils.JobRunnerTransform.outputParamName

Definition at line 125 of file TrfUtils.py.

◆ prodDir

python.TrfUtils.JobRunnerTransform.prodDir

Definition at line 137 of file TrfUtils.py.

◆ prodTaskDb

python.TrfUtils.JobRunnerTransform.prodTaskDb

Definition at line 138 of file TrfUtils.py.

◆ reportName

python.TrfUtils.JobRunnerTransform.reportName

Definition at line 136 of file TrfUtils.py.

◆ runner

python.TrfUtils.JobRunnerTransform.runner

Definition at line 132 of file TrfUtils.py.

◆ taskname

python.TrfUtils.JobRunnerTransform.taskname

Definition at line 191 of file TrfUtils.py.

◆ templateOutputName

python.TrfUtils.JobRunnerTransform.templateOutputName

Definition at line 133 of file TrfUtils.py.

The documentation for this class was generated from the following file:

TrfUtils.py
