ATLAS Offline Software
python.transform.transform Class Reference

Core transform class.

Inheritance diagram for python.transform.transform:
Collaboration diagram for python.transform.transform:

Public Member Functions

 __init__ (self, standardSignalHandlers=True, standardTrfArgs=True, standardValidationArgs=True, trfName=None, executor=None, exeArgs=None, description='')
 Initialise a job transform.
 name (self)
 exitCode (self)
 exitMsg (self)
 argdict (self)
 dataDictionary (self)
 report (self)
 transformStart (self)
 transformSetupCpuTime (self)
 transformSetupWallTime (self)
 inFileValidationCpuTime (self)
 inFileValidationWallTime (self)
 outFileValidationCpuTime (self)
 outFileValidationWallTime (self)
 outFileValidationStop (self)
 trfPredata (self)
 executors (self)
 processedEvents (self)
 getProcessedEvents (self)
 appendToExecutorSet (self, executors)
 parseCmdLineArgs (self, args)
 Parse command line arguments for a transform.
 setGlobalLogLevel (self)
 Check transform argument dictionary and set the correct root logger option.
 execute (self)
 Execute transform.
 setupSplitting (self)
 Setup executor splitting.
 lastExecuted (self)
 Return the last executor which actually executed.
 generateReport (self, reportType=None, fast=False, fileReport=defaultFileReport)
 Transform report generator.
 updateValidationDict (self, newValidationOptions)
 Setter for transform's validation dictionary.
 getValidationDict (self)
 Getter function for transform validation dictionary.
 getValidationOption (self, key)
 Getter for a specific validation option.
 getFiles (self, io=None)
 Return a list of fileArgs used by the transform.
 validateInFiles (self)
 validateOutFiles (self)

Public Attributes

 parser
 validation
 name

Protected Member Functions

 _setupGraph (self)
 Setup the executor graph.
 _tracePath (self)
 Trace the path through the executor graph.
 _doSteering (self, steeringDict=None)
 Setup steering, which manipulates the graph before we trace the path for this transform.
 _exitWithReport (self, signum, frame)
 Common signal handler.

Protected Attributes

 _transformStart = os.times()
 Get transform starting timestamp as early as possible.
 _inFileValidationStart = None
 _inFileValidationStop = None
 _outFileValidationStart = None
 _outFileValidationStop = None
 _trfPredata = os.environ.get('TRF_PREDATA')
 Get trf pre-data as early as possible.
 _name = trfName or path.basename(sys.argv[0]).rsplit('.py', 1)[0]
 Transform _name.
 _argdict = dict()
 Argument dictionary for this transform.
 _dataDictionary = dict()
 Data dictionary placeholder (this maps data types to their argFile instances).
 _executors = set()
dict _executorDictionary = {}
int _exitCode = None
 Transform exit code/message holders.
str _exitMsg = None
 _report = trfJobReport(parentTrf = self)
 Report object for this transform.
 _processedEvents = None
 Transform processed events.
 _exitWithReport
 _executorPath
 _executorGraph
 _inputData = list()
 _outputData = list()

Detailed Description

Core transform class.

Note
Only one instance of the transform class should be instantiated per transform.

Definition at line 40 of file transform.py.

Constructor & Destructor Documentation

◆ __init__()

python.transform.transform.__init__ ( self,
standardSignalHandlers = True,
standardTrfArgs = True,
standardValidationArgs = True,
trfName = None,
executor = None,
exeArgs = None,
description = '' )

Initialise a job transform.

Parameters
standardSignalHandlers Boolean to set signal handlers. Default True.
standardValidationArgs Boolean to set standard validation options. Default True.
trfName Name of the transform. Default is the executable name with the trailing .py stripped.
executor Executor list
Transform class initialiser

Definition at line 47 of file transform.py.

48 trfName = None, executor = None, exeArgs = None, description = ''):
49 '''Transform class initialiser'''
50 msg.debug('Welcome to ATLAS job transforms')
51
52 ## @brief Get transform starting timestamp as early as possible
53 self._transformStart = os.times()
54 msg.debug('transformStart time is {0}'.format(self._transformStart))
55
56 self._inFileValidationStart = None
57 self._inFileValidationStop = None
58 self._outFileValidationStart = None
59 self._outFileValidationStop = None
60
61 ## @brief Get trf pre-data as early as possible
62 self._trfPredata = os.environ.get('TRF_PREDATA')
63
64 ## Transform _name
65 self._name = trfName or path.basename(sys.argv[0]).rsplit('.py', 1)[0]
66
67 ## @note Holder for arguments this trf understands
68 # Use @c argparse.SUPPRESS to have non-given arguments unset, rather than None
69 # Support reading arguments from a file using the notation @c @file
70 self.parser = trfArgParser(description='Transform {0}. {1}'.format(self.name, description),
71 argument_default=argparse.SUPPRESS,
72 fromfile_prefix_chars='@')
73
74 if standardTrfArgs:
75 addStandardTrfArgs(self.parser)
76
77 if standardValidationArgs:
78 addValidationArguments(self.parser)
79 addFileValidationArguments(self.parser)
80
81
82 ## Argument dictionary for this transform
83 self._argdict = dict()
84
85 ## Data dictionary placeholder (this maps data types to their argFile instances)
86 self._dataDictionary = dict()
87
88
89 # Transform executor list - initialise with an empty set
90 self._executors = set()
91 self._executorDictionary = {}
92
93 # Append the given executors or a default one to the set:
94 if executor is not None:
95 self.appendToExecutorSet(executor or {transformExecutor()})
96
97 ## Transform exit code/message holders
98 self._exitCode = None
99 self._exitMsg = None
100
101 ## Report object for this transform
102 self._report = trfJobReport(parentTrf = self)
103
104 ## Transform processed events
105 self._processedEvents = None
106
107 # Setup standard signal handling if asked
108 if standardSignalHandlers:
109 setTrfSignalHandlers(self._exitWithReport)
110 msg.debug('Standard signal handlers established')
111
112
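The parser construction above relies on two stock argparse behaviours: argument_default=argparse.SUPPRESS keeps options the user did not give out of the namespace entirely (rather than storing None), and fromfile_prefix_chars='@' lets an @file token on the command line pull arguments from a file. A minimal standalone sketch with plain argparse (the option names here are illustrative, not the transform's real arguments):

```python
import argparse

# SUPPRESS: options the user did not give are absent from the namespace,
# rather than present with value None.
# fromfile_prefix_chars='@': an '@somefile' token on the command line is
# expanded to the arguments read from that file.
parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS,
                                 fromfile_prefix_chars='@')
parser.add_argument('--maxEvents', type=int)
parser.add_argument('--skipEvents', type=int)

args = vars(parser.parse_args(['--maxEvents', '10']))
print(args)  # only the option actually given appears: {'maxEvents': 10}
```

This is why the transform can later test `'steering' in self._argdict` and similar: an argument that was never supplied simply has no key.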

Member Function Documentation

◆ _doSteering()

python.transform.transform._doSteering ( self,
steeringDict = None )
protected

Setup steering, which manipulates the graph before we trace the path for this transform.

Parameters
steeringDict Manual steering dictionary (if specified, this is used instead of the dictionary from the steering argument; pay attention to the input structure!)

Definition at line 615 of file transform.py.

615 def _doSteering(self, steeringDict = None):
616 if not steeringDict:
617 steeringDict = self._argdict['steering'].value
618 for substep, steeringValues in steeringDict.items():
619 foundSubstep = False
620 for executor in self._executors:
621 if executor.name == substep or executor.substep == substep:
622 foundSubstep = True
623 msg.debug('Updating {0} with {1}'.format(executor.name, steeringValues))
624 # Steering consists of tuples with (in/out, +/-, datatype)
625 for steeringValue in steeringValues:
626 if steeringValue[0] == 'in':
627 startSet = executor.inData
628 else:
629 startSet = executor.outData
630 origLen = len(startSet)
631 msg.debug('Data values to be modified are: {0}'.format(startSet))
632 if steeringValue[1] == '+':
633 startSet.add(steeringValue[2])
634 if len(startSet) != origLen + 1:
635 raise trfExceptions.TransformSetupException(trfExit.nameToCode('TRF_GRAPH_STEERING_ERROR'),
636 'Attempting to add data type {0} from {1} {2} fails (original set of data: {3}). Was this datatype already there?'.format(steeringValue[2], executor.name, steeringValue[1], startSet))
637 else:
638 startSet.discard(steeringValue[2])
639 if len(startSet) != origLen - 1:
640 raise trfExceptions.TransformSetupException(trfExit.nameToCode('TRF_GRAPH_STEERING_ERROR'),
641 'Attempting to remove data type {0} from {1} {2} fails (original set of data: {3}). Was this datatype even present?'.format(steeringValue[2], executor.name, steeringValue[1], startSet))
642 msg.debug('Updated data values to: {0}'.format(startSet))
643 if not foundSubstep:
644 raise trfExceptions.TransformSetupException(trfExit.nameToCode('TRF_GRAPH_STEERING_ERROR'),
645 'This transform has no executor/substep {0}'.format(substep))
646
647
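The steering loop above applies tuples of (in/out, +/-, datatype) to an executor's input or output data sets. A standalone sketch of that bookkeeping on plain Python sets (applySteering and the data type names are illustrative, and the set-size error checking from the real code is omitted):

```python
def applySteering(inData, outData, steeringValues):
    """Apply (in/out, +/-, datatype) steering tuples to two data sets."""
    for direction, op, datatype in steeringValues:
        target = inData if direction == 'in' else outData
        if op == '+':
            target.add(datatype)
        else:
            target.discard(datatype)

# e.g. swap the input of a hypothetical substep from RDO to RDO_TRIG
inData, outData = {'RDO'}, {'ESD'}
applySteering(inData, outData, [('in', '-', 'RDO'), ('in', '+', 'RDO_TRIG')])
print(inData, outData)  # {'RDO_TRIG'} {'ESD'}
```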

◆ _exitWithReport()

python.transform.transform._exitWithReport ( self,
signum,
frame )
protected

Common signal handler.

This function is installed in place of the default signal handler and attempts to terminate the transform gracefully. When a signal is caught by the transform, the stdout from the running application process (i.e. athena.py) is allowed to continue uninterrupted and write its stdout to the log file (so that the traceback can be retrieved) before the associated job report records the fact that a signal was caught and completes the report accordingly.

Parameters
signum Signal number. Not used, since this is a common handler assigned to predefined signals using _installSignalHandlers(); the parameter is still required to satisfy the signature expected by signal.signal().
frame Not used. Provided here to satisfy the requirements of signal.signal().
Returns
Does not return. Raises SystemExit exception.
Exceptions
SystemExit()

Definition at line 754 of file transform.py.

754 def _exitWithReport(self, signum, frame):
755 msg.critical('Transform received signal {0}'.format(signum))
756 msg.critical('Stack trace now follows:\n{0!s}'.format(''.join(traceback.format_stack(frame))))
757 self._exitCode = 128+signum
758 self._exitMsg = 'Transform received signal {0}'.format(signum)
759
760 # Reset signal handlers now - we don't want to recurse if the same signal arrives again (e.g. multiple ^C)
761 resetTrfSignalHandlers()
762
763 msg.critical('Attempting to write reports with known information...')
764 self.generateReport(fast=True)
765 if ('orphanKiller' in self._argdict):
766 infanticide(message=True, listOrphans=True)
767 else:
768 infanticide(message=True)
769
770 sys.exit(self._exitCode)
771
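The exit code set above follows the common shell convention of 128 plus the signal number. A self-contained sketch of that convention with a disposable handler (POSIX-only; SIGUSR1 is chosen purely for the demonstration and is not one of the signals the transform itself traps):

```python
import os
import signal
import sys

def handler(signum, frame):
    # Same convention as the transform: exit code is 128 + signal number
    sys.exit(128 + signum)

signal.signal(signal.SIGUSR1, handler)

caught = None
try:
    os.kill(os.getpid(), signal.SIGUSR1)  # deliver the signal to ourselves
except SystemExit as e:
    caught = e.code
print(caught == 128 + signal.SIGUSR1)  # True
```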

◆ _setupGraph()

python.transform.transform._setupGraph ( self)
protected

Setup the executor graph.

Note
This function might need to be called again when the number of 'substeps' cannot be determined from the input data types alone - e.g., DigiMReco jobs don't know how many RDOtoESD steps they need to run until after digitisation.

Definition at line 523 of file transform.py.

523 def _setupGraph(self):
524 # Get input/output data
525 self._inputData = list()
526 self._outputData = list()
527
528 for key, value in self._argdict.items():
529 # Note specifier [A-Za-z0-9_]+? makes this match non-greedy (avoid swallowing the optional 'File' suffix)
530 m = re.match(r'(input|output|tmp)([A-Za-z0-9_]+?)(File)?$', key)
531 # N.B. Protect against taking arguments which are not type argFile
532 if m:
533 if isinstance(value, argFile):
534 if m.group(1) == 'input':
535 self._inputData.append(m.group(2))
536 else:
537 self._outputData.append(m.group(2))
538 self._dataDictionary[m.group(2)] = value
539 elif isinstance(value, list) and value and isinstance(value[0], argFile):
540 if m.group(1) == 'input':
541 self._inputData.append(m.group(2))
542 else:
543 self._outputData.append(m.group(2))
544 self._dataDictionary[m.group(2)] = value
545
546
548 if len(self._inputData) == 0:
549 self._inputData.append('inNULL')
550 if len(self._outputData) == 0:
551 self._outputData.append('outNULL')
552 msg.debug('Transform has this input data: {0}; output data {1}'.format(self._inputData, self._outputData))
553
554 # Now see if we have any steering - manipulate the substep inputs and outputs before we
555 # setup the graph
556 if 'steering' in self._argdict:
557 msg.debug('Now applying steering to graph: {0}'.format(self._argdict['steering'].value))
558 self._doSteering()
559
560 # Setup the graph and topo sort it
561 self._executorGraph = executorGraph(self._executors, self._inputData, self._outputData)
562 self._executorGraph.doToposort()
563
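The argument classification above hinges on the non-greedy regular expression mentioned in the comment: without the `+?`, the data type group would swallow the optional 'File' suffix. The pattern can be exercised on its own (the argument names are just examples):

```python
import re

# Non-greedy ([A-Za-z0-9_]+?) stops the data type group from swallowing
# the optional 'File' suffix.
pattern = re.compile(r'(input|output|tmp)([A-Za-z0-9_]+?)(File)?$')

for key in ('inputESDFile', 'outputAODFile', 'tmpRDO', 'maxEvents'):
    m = pattern.match(key)
    print(key, '->', m.groups() if m else 'no match')
# inputESDFile -> ('input', 'ESD', 'File')
# outputAODFile -> ('output', 'AOD', 'File')
# tmpRDO -> ('tmp', 'RDO', None)
# maxEvents -> no match
```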

◆ _tracePath()

python.transform.transform._tracePath ( self)
protected

Trace the path through the executor graph.

Note
This function might need to be called again when the number of 'substeps' cannot be determined from the input data types alone - e.g., DigiMReco jobs don't know how many RDOtoESD steps they need to run until after digitisation.

Definition at line 600 of file transform.py.

600 def _tracePath(self):
601 self._executorGraph.findExecutionPath()
602
603 self._executorPath = self._executorGraph.execution
604 if len(self._executorPath) == 0:
605 raise trfExceptions.TransformSetupException(trfExit.nameToCode('TRF_SETUP'),
606 'Execution path finding resulted in no substeps being executed'
607 '(Did you correctly specify input data for this transform?)')
608 # Tell the first executor that they are the first
609 self._executorDictionary[self._executorPath[0]['name']].conf.firstExecutor = True
610

◆ appendToExecutorSet()

python.transform.transform.appendToExecutorSet ( self,
executors )

Definition at line 221 of file transform.py.

221 def appendToExecutorSet(self, executors):
222 # Normalise to something iterable
223 if isinstance(executors, transformExecutor):
224 executors = [executors,]
225 elif not isinstance(executors, (list, tuple, set)):
226 raise trfExceptions.TransformInternalException(trfExit.nameToCode('TRF_INTERNAL'),
227 'Transform was initialised with an executor which was not a simple executor or an executor set')
228
229 # TRY TO DEPRECATE SETTING trf IN THE EXECUTOR - USE CONF!
230 # Executor book keeping: set parent link back to me for all executors
231 # Also setup a dictionary, indexed by executor name and check that name is unique
232
233 for executor in executors:
234 executor.trf = self
235 if executor.name in self._executorDictionary:
236 raise trfExceptions.TransformInternalException(trfExit.nameToCode('TRF_INTERNAL'),
237 'Transform has been initialised with two executors with the same name ({0})'
238 ' - executor names must be unique'.format(executor.name))
239 self._executors.add(executor)
240 self._executorDictionary[executor.name] = executor
241
242
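The normalisation and uniqueness bookkeeping above can be sketched outside the transform with a minimal stand-in executor class (Executor, normalise and registry are illustrative names, not the transform API):

```python
class Executor:
    """Minimal stand-in for transformExecutor: just a name attribute."""
    def __init__(self, name):
        self.name = name

def normalise(executors):
    # A single executor is wrapped in a list; anything that is neither an
    # executor nor a simple collection of them is rejected.
    if isinstance(executors, Executor):
        return [executors]
    if not isinstance(executors, (list, tuple, set)):
        raise TypeError('not an executor or an executor collection')
    return executors

registry = {}
for exe in normalise(Executor('athena')):
    if exe.name in registry:
        raise ValueError('duplicate executor name: {0}'.format(exe.name))
    registry[exe.name] = exe
print(sorted(registry))  # ['athena']
```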

◆ argdict()

python.transform.transform.argdict ( self)

Definition at line 134 of file transform.py.

134 def argdict(self):
135 return self._argdict
136

◆ dataDictionary()

python.transform.transform.dataDictionary ( self)

Definition at line 138 of file transform.py.

138 def dataDictionary(self):
139 return self._dataDictionary
140

◆ execute()

python.transform.transform.execute ( self)

Execute transform.

This function calls the actual transform execution class and sets self.exitCode, self.exitMsg and self.processedEvents transform data members.

Returns
None.

Definition at line 384 of file transform.py.

384 def execute(self):
385 msg.debug('Entering transform execution phase')
386
387 #Warn if a CA-based transform has deprecated command line args
388 #Print warning once per deprecated arg
389 deprecationWarningPrinted = False
390 for exe in self._executors:
391 if isinstance(exe, athenaExecutor):
392 if (not deprecationWarningPrinted) and exe.skeletonCA:
393 toBeRemoved = ["autoConfiguration","trigStream","topOptions","valid"]
394 for deprecatedArg in toBeRemoved:
395 if deprecatedArg in self._argdict:
396 msg.warning("!!!Detected use of "+deprecatedArg+" in command line arguments for CA-based transform!!!")
397 msg.warning(deprecatedArg+" is DEPRECATED and due for removal. Applying it will not do anything, please remove it from your Transform definition.")
398 deprecationWarningPrinted = True
399 msg.debug(deprecatedArg+" detected in executor "+exe.name)
400
401 try:
402 # Intercept a few special options here
403 if 'dumpargs' in self._argdict:
404 self.parser.dumpArgs()
405 sys.exit(0)
406
407 # Graph stuff!
408 msg.info('Resolving execution graph')
409 self._setupGraph()
410
411 if 'showSteps' in self._argdict:
412 for exe in self._executors:
413 print("Executor Step: {0} (alias {1})".format(exe.name, exe.substep))
414 if msg.level <= logging.DEBUG:
415 print(" {0} -> {1}".format(exe.inData, exe.outData))
416 sys.exit(0)
417
418 if 'showGraph' in self._argdict:
419 print(self._executorGraph)
420 sys.exit(0)
421
422 # Graph stuff!
423 msg.info('Starting to trace execution path')
424 self._tracePath()
425 msg.info('Execution path found with {0} step(s): {1}'.format(len(self._executorPath),
426 ' '.join([exe['name'] for exe in self._executorPath])))
427
428 if 'showPath' in self._argdict:
429 msg.debug('Execution path list is: {0}'.format(self._executorPath))
430 # Now print it nice
431 print('Executor path is:')
432 for node in self._executorPath:
433 print(' {0}: {1} -> {2}'.format(node['name'], list(node['input']), list(node['output'])))
434 sys.exit(0)
435
436 msg.debug('Execution path is {0}'.format(self._executorPath))
437
438 # Prepare files for execution (separate method?)
439 for dataType in [ data for data in self._executorGraph.data if 'NULL' not in data ]:
440 if dataType in self._dataDictionary:
441 msg.debug('Data type {0} maps to existing argument {1}'.format(dataType, self._dataDictionary[dataType]))
442 else:
443 fileName = 'tmp.' + dataType
444 # How to pick the correct argFile class?
445 for (prefix, suffix) in (('tmp', ''), ('output', 'File'), ('input', 'File')):
446 stdArgName = prefix + dataType + suffix
447 if stdArgName in self.parser._argClass:
448 msg.debug('Matched data type {0} to argument {1}'.format(dataType, stdArgName))
449 self._dataDictionary[dataType] = self.parser._argClass[stdArgName](fileName)
450 self._dataDictionary[dataType].io = 'temporary'
451 break
452 if dataType not in self._dataDictionary:
453 if 'HIST' in fileName:
454 self._dataDictionary[dataType] = argHISTFile(fileName, io='temporary', type=dataType.lower())
455
456 else:
457 self._dataDictionary[dataType] = argFile(fileName, io='temporary', type=dataType.lower())
458 msg.debug('Did not find any argument matching data type {0} - setting to plain argFile: {1}'.format(dataType, self._dataDictionary[dataType]))
459 self._dataDictionary[dataType].name = fileName
460
461 # Do splitting if required
462 self.setupSplitting()
463
464 # Error if more than one executor in MPI mode
465 if 'mpi' in self._argdict:
466 if len(self._executorPath) > 1:
467 msg.error("MPI mode is not supported for jobs with more than one execution step!")
468 msg.error(f"We have {len(self._executorPath)}: {self._executorPath}")
469 sys.exit(1)
470
471 # Now we can set the final executor configuration properly, with the final dataDictionary
472 for executor in self._executors:
473 executor.conf.setFromTransform(self)
474
475 self.validateInFiles()
476
477 for executionStep in self._executorPath:
478 msg.debug('Now preparing to execute {0}'.format(executionStep))
479 executor = self._executorDictionary[executionStep['name']]
480 executor.preExecute(input = executionStep['input'], output = executionStep['output'])
481 try:
482 executor.execute()
483 executor.postExecute()
484 finally:
485 # Swap out the output files for the version with [] lists expanded
486 if 'mpi' in self._argdict:
487 new_data_dict = {**self._dataDictionary, **trfMPITools.mpiConfig["outputs"]}
488 self._dataDictionary = new_data_dict
489 executor.conf._dataDictionary = new_data_dict
490 executor.validate()
491
492 self._processedEvents = self.getProcessedEvents()
493 self.validateOutFiles()
494
495 msg.debug('Transform executor succeeded')
496 self._exitCode = 0
497 self._exitMsg = trfExit.codeToName(self._exitCode)
498
499 except trfExceptions.TransformNeedCheckException as e:
500 msg.warning('Transform executor signaled NEEDCHECK condition: {0}'.format(e.errMsg))
501 self._exitCode = e.errCode
502 self._exitMsg = e.errMsg
503 self.generateReport(fast=False)
504
505 except trfExceptions.TransformException as e:
506 msg.critical('Transform executor raised %s: %s' % (e.__class__.__name__, e.errMsg))
507 self._exitCode = e.errCode
508 self._exitMsg = e.errMsg
509 # Try and write a job report...
510 self.generateReport(fast=True)
511
512 finally:
513 # Clean up any orphaned processes and exit here if things went bad
514 infanticide(message=True)
515 if self._exitCode:
516 msg.warning('Transform now exiting early with exit code {0} ({1})'.format(self._exitCode, self._exitMsg))
517 sys.exit(self._exitCode)
518
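For data types without an explicit file argument, the loop above tries tmp&lt;type&gt;, output&lt;type&gt;File and input&lt;type&gt;File in turn against the parser's known argument classes. A standalone sketch of that lookup (knownArgs is a hypothetical stand-in for self.parser._argClass):

```python
knownArgs = {'outputESDFile', 'inputRDOFile', 'tmpHIST'}  # hypothetical

def matchArgName(dataType):
    """Return the first standard argument name matching this data type."""
    for prefix, suffix in (('tmp', ''), ('output', 'File'), ('input', 'File')):
        candidate = prefix + dataType + suffix
        if candidate in knownArgs:
            return candidate
    return None  # caller falls back to a plain argFile

print(matchArgName('ESD'))   # outputESDFile
print(matchArgName('HIST'))  # tmpHIST
print(matchArgName('AOD'))   # None
```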

◆ executors()

python.transform.transform.executors ( self)

Definition at line 206 of file transform.py.

206 def executors(self):
207 return self._executors
208

◆ exitCode()

python.transform.transform.exitCode ( self)

Definition at line 118 of file transform.py.

118 def exitCode(self):
119 if self._exitCode is None:
120 msg.warning('Transform exit code getter: _exitCode is unset, returning "TRF_UNKNOWN"')
121 return trfExit.nameToCode('TRF_UNKNOWN')
122 else:
123 return self._exitCode
124

◆ exitMsg()

python.transform.transform.exitMsg ( self)

Definition at line 126 of file transform.py.

126 def exitMsg(self):
127 if self._exitMsg is None:
128 msg.warning('Transform exit message getter: _exitMsg is unset, returning empty string')
129 return ''
130 else:
131 return self._exitMsg
132

◆ generateReport()

python.transform.transform.generateReport ( self,
reportType = None,
fast = False,
fileReport = defaultFileReport )

Transform report generator.

Parameters
fast If True, ensure that no external calls are made for file metadata (this is used to generate reports in a hurry after a crash or a forced exit)
fileReport Dictionary giving the type of report to make for each type of file. This dictionary has to have all io types as keys and valid values are: None - skip this io type; 'full' - provide all details; 'name' - only dataset and filename will be reported on.
reportType Iterable with report types to generate, otherwise a sensible default is used (~everything, plus the Tier0 report at Tier0)

Definition at line 672 of file transform.py.

672 def generateReport(self, reportType=None, fast=False, fileReport = defaultFileReport):
673 msg.debug('Transform report generator')
674 if 'mpi' in self.argdict and not trfMPITools.mpiShouldValidate():
675 msg.debug("Not in rank 0 -- not generating reports")
676 return
677
678 if 'reportType' in self._argdict:
679 if reportType is not None:
680 msg.info('Transform requested report types {0} overridden by command line to {1}'.format(reportType, self._argdict['reportType'].value))
681 reportType = self._argdict['reportType'].value
682
683 if reportType is None:
684 reportType = ['json', ]
685 # Only generate the Tier0 report at Tier0 ;-)
686 # (It causes spurious warnings for some grid jobs with background files (e.g., digitisation)
687 if 'TZHOME' in os.environ:
688 reportType.append('gpickle')
689
690 if not isInteractiveEnv():
691 reportType.append('text')
692 msg.debug('Detected Non-Interactive environment. Enabled text report')
693
694 if 'reportName' in self._argdict:
695 baseName = classicName = self._argdict['reportName'].value
696 else:
697 baseName = 'jobReport'
698 classicName = 'metadata'
699
700 try:
701 # Text: Writes environment variables and machine report in text format.
702 if reportType is None or 'text' in reportType:
703 envName = baseName if 'reportName' in self._argdict else 'env' # Use fallback name 'env.txt' if it's not specified.
704 self._report.writeTxtReport(filename='{0}.txt'.format(envName), fast=fast, fileReport=fileReport)
705 # JSON
706 if reportType is None or 'json' in reportType:
707 self._report.writeJSONReport(filename='{0}.json'.format(baseName), fast=fast, fileReport=fileReport)
708 # Classic XML
709 if reportType is None or 'classic' in reportType:
710 self._report.writeClassicXMLReport(filename='{0}.xml'.format(classicName), fast=fast)
711 # Classic gPickle
712 if reportType is None or 'gpickle' in reportType:
713 self._report.writeGPickleReport(filename='{0}.gpickle'.format(baseName), fast=fast)
714 # Pickled version of the JSON report for pilot
715 if reportType is None or 'pilotPickle' in reportType:
716 self._report.writePilotPickleReport(filename='{0}Extract.pickle'.format(baseName), fast=fast, fileReport=fileReport)
717
718 except trfExceptions.TransformTimeoutException as reportException:
719 msg.error('Received timeout when writing report ({0})'.format(reportException))
720 msg.error('Report writing is aborted - sorry. Transform will exit with TRF_METADATA_CALL_FAIL status.')
721 if ('orphanKiller' in self._argdict):
722 infanticide(message=True, listOrphans=True)
723 else:
724 infanticide(message=True)
725 sys.exit(trfExit.nameToCode('TRF_METADATA_CALL_FAIL'))
726
727 except trfExceptions.TransformException as reportException:
728 # This is a bad one!
729 msg.critical('Attempt to write job report failed with exception {0!s}: {1!s}'.format(reportException.__class__.__name__, reportException))
730 msg.critical('Stack trace now follows:\n{0}'.format(traceback.format_exc()))
731 msg.critical('Job reports are likely to be missing or incomplete - sorry')
732 msg.critical('Please report this as a transforms bug!')
733 msg.critical('Before calling the report generator the transform status was: {0}; exit code {1}'.format(self._exitMsg, self._exitCode))
734 msg.critical('Now exiting with a transform internal error code')
735 if ('orphanKiller' in self._argdict):
736 infanticide(message=True, listOrphans=True)
737 else:
738 infanticide(message=True)
739 sys.exit(trfExit.nameToCode('TRF_INTERNAL'))
740
741
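The default report-type selection above can be condensed into a small pure function (defaultReportTypes is an illustrative rewrite, taking the environment and interactivity as explicit inputs rather than reading them globally as the real code does):

```python
def defaultReportTypes(environ, interactive):
    reportType = ['json']
    # The Tier0 gpickle report is only produced at Tier0 (TZHOME set)
    if 'TZHOME' in environ:
        reportType.append('gpickle')
    # Non-interactive environments additionally get a text report
    if not interactive:
        reportType.append('text')
    return reportType

print(defaultReportTypes({}, interactive=True))   # ['json']
print(defaultReportTypes({'TZHOME': '/tz'}, interactive=False))
# ['json', 'gpickle', 'text']
```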

◆ getFiles()

python.transform.transform.getFiles ( self,
io = None )

Return a list of fileArgs used by the transform.

Parameters

io Filter files by io attribute

Returns
List of argFile instances

Definition at line 798 of file transform.py.

798 def getFiles(self, io = None):
799 res = []
800 msg.debug('Looking for file arguments matching: io={0}'.format(io))
801 for argName, arg in self._argdict.items():
802 if isinstance(arg, argFile):
803 msg.debug('Argument {0} is argFile type ({1!s})'.format(argName, arg))
804 if io is not None and arg.io != io:
805 continue
806 msg.debug('Argument {0} matches criteria'.format(argName))
807 res.append(arg)
808 return res
809
810
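The filtering above reduces to an isinstance check plus an optional match on the io attribute. A standalone sketch with a minimal stand-in for argFile (FileArg and the argument names are illustrative):

```python
class FileArg:
    """Stand-in for argFile: only the attributes getFiles looks at."""
    def __init__(self, name, io):
        self.name = name
        self.io = io

argdict = {'inputRDOFile': FileArg('in.RDO', 'input'),
           'outputESDFile': FileArg('out.ESD', 'output'),
           'maxEvents': 100}

def getFiles(argdict, io=None):
    # Keep only file arguments; if io is given, keep only matching ones
    return [arg for arg in argdict.values()
            if isinstance(arg, FileArg) and (io is None or arg.io == io)]

print([f.name for f in getFiles(argdict, io='input')])  # ['in.RDO']
print(len(getFiles(argdict)))  # 2
```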

◆ getProcessedEvents()

python.transform.transform.getProcessedEvents ( self)

Definition at line 213 of file transform.py.

213 def getProcessedEvents(self):
214 nEvts = None
215 for executionStep in self._executorPath:
216 executor = self._executorDictionary[executionStep['name']]
217 if executor.conf.firstExecutor:
218 nEvts = executor.eventCount
219 return nEvts
220

◆ getValidationDict()

python.transform.transform.getValidationDict ( self)

Getter function for transform validation dictionary.

Returns
Validation dictionary

Definition at line 783 of file transform.py.

783 def getValidationDict(self):
784 return self.validation
785

◆ getValidationOption()

python.transform.transform.getValidationOption ( self,
key )

Getter for a specific validation option.

Parameters
keyValidation dictionary key
Returns
Validation key value, or None if the key is absent

Definition at line 789 of file transform.py.

789 def getValidationOption(self, key):
790 if key in self.validation:
791 return self.validation[key]
792 else:
793 return None
794

◆ inFileValidationCpuTime()

python.transform.transform.inFileValidationCpuTime ( self)

Definition at line 166 of file transform.py.

166 def inFileValidationCpuTime(self):
167 inFileValidationCpuTime = None
168 if self._inFileValidationStart and self._inFileValidationStop:
169 inFileValidationCpuTime = calcCpuTime(self._inFileValidationStart, self._inFileValidationStop)
170
171 return inFileValidationCpuTime
172

◆ inFileValidationWallTime()

python.transform.transform.inFileValidationWallTime ( self)

Definition at line 174 of file transform.py.

174 def inFileValidationWallTime(self):
175 inFileValidationWallTime = None
176 if self._inFileValidationStart and self._inFileValidationStop:
177 inFileValidationWallTime = calcWallTime(self._inFileValidationStart, self._inFileValidationStop)
178
179 return inFileValidationWallTime
180

◆ lastExecuted()

python.transform.transform.lastExecuted ( self)

Return the last executor which actually executed.

Returns
Last executor which has _hasExecuted == True, or the very first executor if we didn't even start yet

Definition at line 651 of file transform.py.

651 def lastExecuted(self):
652 # Just make sure we have the path traced
653 if not hasattr(self, '_executorPath') or len(self._executorPath) == 0:
654 return None
655
656 lastExecutor = self._executorDictionary[self._executorPath[0]['name']]
657 for executorStep in self._executorPath[1:]:
658 if self._executorDictionary[executorStep['name']].hasExecuted:
659 lastExecutor = self._executorDictionary[executorStep['name']]
660 return lastExecutor
661
662

◆ name()

python.transform.transform.name ( self)

Definition at line 114 of file transform.py.

114 def name(self):
115 return self._name
116

◆ outFileValidationCpuTime()

python.transform.transform.outFileValidationCpuTime ( self)

Definition at line 182 of file transform.py.

182 def outFileValidationCpuTime(self):
183 outFileValidationCpuTime = None
184 if self._outFileValidationStart and self._outFileValidationStop:
185 outFileValidationCpuTime = calcCpuTime(self._outFileValidationStart, self._outFileValidationStop)
186
187 return outFileValidationCpuTime
188

◆ outFileValidationStop()

python.transform.transform.outFileValidationStop ( self)

Definition at line 198 of file transform.py.

198 def outFileValidationStop(self):
199 return self._outFileValidationStop
200

◆ outFileValidationWallTime()

python.transform.transform.outFileValidationWallTime ( self)

Definition at line 190 of file transform.py.

190 def outFileValidationWallTime(self):
191 outFileValidationWallTime = None
192 if self._outFileValidationStart and self._outFileValidationStop:
193 outFileValidationWallTime = calcWallTime(self._outFileValidationStart, self._outFileValidationStop)
194
195 return outFileValidationWallTime
196

◆ parseCmdLineArgs()

python.transform.transform.parseCmdLineArgs ( self,
args )

Parse command line arguments for a transform.

Definition at line 244 of file transform.py.

244 def parseCmdLineArgs(self, args):
245 msg.info('Transform command line was: %s', ' '.join(shQuoteStrings(sys.argv)))
246
247 try:
248 # Use the argparse infrastructure to get the actual command line arguments
249 self._argdict=vars(self.parser.parse_args(args))
250
251 # Need to know if any input or output files were set - if so then we suppress the
252 # corresponding parameters from AMI
253 inputFiles = outputFiles = False
254 for k, v in self._argdict.items():
255 if k.startswith('input') and isinstance(v, argFile):
256 inputFiles = True
257 elif k.startswith('output') and isinstance(v, argFile):
258 outputFiles = True
259 msg.debug("CLI Input files: {0}; Output files {1}".format(inputFiles, outputFiles))
260
261 # Now look for special arguments, which expand out to other parameters
262 # Note that the pickled argdict beats AMIConfig because dict.update() will overwrite
263 # (However, we defend the real command line against updates from either source)
264 extraParameters = {}
265 # AMI configuration?
266 if 'AMIConfig' in self._argdict:
267 msg.debug('Given AMI tag configuration {0}'.format(self._argdict['AMIConfig']))
268 from PyJobTransforms.trfAMI import TagInfo
269 tag=TagInfo(self._argdict['AMIConfig'].value)
270 updateDict = {}
271 for k, v in dict(tag.trfs[0]).items():
272 # Convert to correct internal key form
273 k = cliToKey(k)
274 if inputFiles and k.startswith('input'):
275 msg.debug('Suppressing argument {0} from AMI'
276 ' because input files have been specified on the command line'.format(k))
277 continue
278 if outputFiles and k.startswith('output'):
279 msg.debug('Suppressing argument {0} from AMI'
280 ' because output files have been specified on the command line'.format(k))
281 continue
282 updateDict[k] = v
283 extraParameters.update(updateDict)
284
285 # JSON arguments?
286 if 'argJSON' in self._argdict:
287 try:
288 import json
289 msg.debug('Given JSON encoded arguments in {0}'.format(self._argdict['argJSON']))
290 argfile = open(self._argdict['argJSON'], 'r')
291 jsonParams = json.load(argfile)
292 msg.debug('Read: {0}'.format(jsonParams))
293 extraParameters.update(convertToStr(jsonParams))
294 argfile.close()
295 except Exception as e:
296 raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_ARG_ERROR'), 'Error when deserialising JSON file {0} ({1})'.format(self._argdict['argJSON'], e))
297
298 # Event Service
299 if 'eventService' in self._argdict and self._argdict['eventService'].value:
300 updateDict = {}
301 updateDict['athenaMPMergeTargetSize'] = '*:0'
302 updateDict['checkEventCount'] = False
303 updateDict['outputFileValidation'] = False
304 extraParameters.update(updateDict)
305
306 # Process anything we found
307 # List of command line arguments
308 argsList = [ i.split("=", 1)[0].lstrip('-') for i in args if i.startswith('-')]
309 for k,v in extraParameters.items():
310 msg.debug('Found this extra argument: {0} with value: {1} ({2})'.format(k, v, type(v)))
311 if k not in self.parser._argClass and k not in self.parser._argAlias:
312 raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_ARG_ERROR'), 'Argument "{0}" not known (try "--help")'.format(k))
313 # Check if it is an alias
314 if k in self.parser._argAlias:
315 msg.debug('Resolving alias from {0} to {1}'.format(k, self.parser._argAlias[k]))
316 k = self.parser._argAlias[k]
317 # Check if argument has already been set on the command line
318 if k in argsList:
319 msg.debug('Ignored {0}={1} as extra parameter because this argument was given on the command line.'.format(k, v))
320 continue
321 # For callable classes we instantiate properly, otherwise we set the value for simple arguments
322 if '__call__' in dir(self.parser._argClass[k]):
323 self._argdict[k] = self.parser._argClass[k](v)
324 else:
325 self._argdict[k] = v
326 msg.debug('Argument {0} set to {1}'.format(k, self._argdict[k]))
327
328 # Set the key name as an argument property - useful to be able to look back at where this
329 # argument came from
330 for k, v in self._argdict.items():
331 if isinstance(v, argument):
332 v.name = k
333 elif isinstance(v, list):
334 for it in v:
335 if isinstance(it, argument):
336 it.name = k
337
338 # Now that all arguments are parsed, if a pickle dump is requested do it here and exit
339 if 'dumpPickle' in self._argdict:
340 msg.info('Now dumping pickled version of command line to {0}'.format(self._argdict['dumpPickle']))
341 pickledDump(self._argdict)
342 sys.exit(0)
343
344 # Now that all arguments are parsed, if a JSON dump is requested do it here and exit
345 if 'dumpJSON' in self._argdict:
346 msg.info('Now dumping JSON version of command line to {0}'.format(self._argdict['dumpJSON']))
347 JSONDump(self._argdict)
348 sys.exit(0)
349
350 except trfExceptions.TransformArgException as e:
351 msg.critical('Argument parsing failure: {0!s}'.format(e))
352 self._exitCode = e.errCode
353 self._exitMsg = e.errMsg
354 self._report.fast = True
355 self.generateReport()
356 sys.exit(self._exitCode)
357
358 except trfExceptions.TransformAMIException as e:
359 msg.critical('AMI failure: {0!s}'.format(e))
360 self._exitCode = e.errCode
361 self._exitMsg = e.errMsg
362 sys.exit(self._exitCode)
363
364 self.setGlobalLogLevel()
365
366
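The merging logic above can be summarised as: extra parameters found via an AMI tag or an argJSON file are folded into the argument dictionary, but anything the user gave explicitly on the command line wins. A minimal standalone sketch (with a hypothetical helper name, not part of the transform API) of that precedence rule:

```python
# Hypothetical standalone sketch of the precedence rule in parseCmdLineArgs:
# extras from AMI/JSON are merged in, but explicit command-line values win.
def merge_extra_parameters(args, argdict, extra):
    # Names actually given on the command line: strip leading dashes and any
    # "=value" part, mirroring the argsList construction above.
    cli_names = [a.split("=", 1)[0].lstrip("-") for a in args if a.startswith("-")]
    for k, v in extra.items():
        if k in cli_names:
            continue  # explicit command-line value beats the AMI/JSON extra
        argdict[k] = v
    return argdict
```

For example, `merge_extra_parameters(["--maxEvents=10"], {"maxEvents": 10}, {"maxEvents": 500, "geometryVersion": "X"})` keeps `maxEvents` at 10 and only adds the new key.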

◆ processedEvents()

python.transform.transform.processedEvents ( self)

Definition at line 210 of file transform.py.

210 def processedEvents(self):
211 return self._processedEvents
212

◆ report()

python.transform.transform.report ( self)

Definition at line 142 of file transform.py.

142 def report(self):
143 return self._report
144

◆ setGlobalLogLevel()

python.transform.transform.setGlobalLogLevel ( self)

Check transform argument dictionary and set the correct root logger option.

Definition at line 368 of file transform.py.

368 def setGlobalLogLevel(self):
369 if 'verbose' in self._argdict:
370 setRootLoggerLevel(stdLogLevels['DEBUG'])
371 elif 'loglevel' in self._argdict:
372 if self._argdict['loglevel'] in stdLogLevels:
373 msg.info("Loglevel option found - setting root logger level to %s",
374 logging.getLevelName(stdLogLevels[self._argdict['loglevel']]))
375 setRootLoggerLevel(stdLogLevels[self._argdict['loglevel']])
376 else:
377 msg.warning('Unrecognised loglevel ({0}) given - ignored'.format(self._argdict['loglevel']))
378
379
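The precedence implemented above is: `--verbose` always forces DEBUG; otherwise a recognised `loglevel` value is applied; anything else leaves the root logger untouched. A standalone sketch (`STD_LOG_LEVELS` and `pick_root_level` are stand-ins, not the real `stdLogLevels`/`setRootLoggerLevel` helpers):

```python
# Stand-in mapping for illustration; the real stdLogLevels lives in the trf code.
STD_LOG_LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}

def pick_root_level(argdict):
    if "verbose" in argdict:               # --verbose always wins
        return STD_LOG_LEVELS["DEBUG"]
    level = argdict.get("loglevel")
    if level in STD_LOG_LEVELS:            # recognised loglevel option
        return STD_LOG_LEVELS[level]
    return None  # unrecognised or absent: leave the root logger unchanged
```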

◆ setupSplitting()

python.transform.transform.setupSplitting ( self)

Setup executor splitting.

Definition at line 565 of file transform.py.

565 def setupSplitting(self):
566 if 'splitConfig' not in self._argdict:
567 return
568
569 split = []
570 for executionStep in self._executorPath:
571 baseStepName = executionStep['name']
572 if baseStepName in split:
573 continue
574
575 baseExecutor = self._executorDictionary[baseStepName]
576 splitting = getTotalExecutorSteps(baseExecutor, argdict=self._argdict)
577 if splitting <= 1:
578 continue
579
580 msg.info('Splitting {0} into {1} substeps'.format(executionStep, splitting))
581 index = self._executorPath.index(executionStep)
582 baseStep = self._executorPath.pop(index)
583 for i in range(splitting):
584 name = baseStepName + executorStepSuffix + str(i)
585 step = copy.deepcopy(baseStep)
586 step['name'] = name
587 self._executorPath.insert(index + i, step)
588 executor = copy.deepcopy(baseExecutor)
589 executor.name = name
590 executor.conf.executorStep = i
591 executor.conf.totalExecutorSteps = splitting
592 self._executors.add(executor)
593 self._executorDictionary[name] = executor
594 split.append(name)
595
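The renaming scheme above replaces one step in the execution path with N deep-copied substeps at the same position. A standalone sketch, using `"_substep"` as a hypothetical stand-in for the real `executorStepSuffix`:

```python
import copy

# Sketch of setupSplitting's path manipulation: pop the base step and insert
# n deep copies, each renamed with a suffix and its substep index.
# "_substep" is an assumed placeholder for the real executorStepSuffix.
def split_step(path, index, n, suffix="_substep"):
    base = path.pop(index)
    for i in range(n):
        step = copy.deepcopy(base)
        step["name"] = base["name"] + suffix + str(i)
        path.insert(index + i, step)
    return path
```

For example, splitting a single `RAWtoESD` step three ways yields steps named `RAWtoESD_substep0`, `RAWtoESD_substep1`, `RAWtoESD_substep2`.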

◆ transformSetupCpuTime()

python.transform.transform.transformSetupCpuTime ( self)

Definition at line 150 of file transform.py.

150 def transformSetupCpuTime(self):
151 transformSetupCpuTime = None
152 if self._transformStart and self._inFileValidationStart:
153 transformSetupCpuTime = calcCpuTime(self._transformStart, self._inFileValidationStart)
154
155 return transformSetupCpuTime
156

◆ transformSetupWallTime()

python.transform.transform.transformSetupWallTime ( self)

Definition at line 158 of file transform.py.

158 def transformSetupWallTime(self):
159 transformSetupWallTime = None
160 if self._transformStart and self._inFileValidationStart:
161 transformSetupWallTime = calcWallTime(self._transformStart, self._inFileValidationStart)
162
163 return transformSetupWallTime
164

◆ transformStart()

python.transform.transform.transformStart ( self)

Definition at line 146 of file transform.py.

146 def transformStart(self):
147 return self._transformStart
148

◆ trfPredata()

python.transform.transform.trfPredata ( self)

Definition at line 202 of file transform.py.

202 def trfPredata(self):
203 return self._trfPredata
204

◆ updateValidationDict()

python.transform.transform.updateValidationDict ( self,
newValidationOptions )

Setter for transform's validation dictionary.

This function updates the transform's validation dictionary with the values passed in the newValidationOptions argument.

Parameters
newValidationOptions	Dictionary (or tuples) with which to update the validation dictionary
Returns
None

Definition at line 778 of file transform.py.

778 def updateValidationDict(self, newValidationOptions):
779 self.validation.update(newValidationOptions)
780
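Since this is a plain `dict.update`, only the keys present in `newValidationOptions` are overridden; all other validation options are retained. The option names below are illustrative examples, not a definitive list:

```python
# dict.update semantics: only the passed-in keys change.
# Option names here are examples for illustration only.
validation = {"testIntegrity": True, "checkEventCount": True}
validation.update({"checkEventCount": False})
print(validation)  # {'testIntegrity': True, 'checkEventCount': False}
```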

◆ validateInFiles()

python.transform.transform.validateInFiles ( self)

Definition at line 811 of file transform.py.

811 def validateInFiles(self):
812 if self._inFileValidationStart is None:
813 self._inFileValidationStart = os.times()
814 msg.debug('inFileValidationStart time is {0}'.format(self._inFileValidationStart))
815
816 if (('skipFileValidation' in self._argdict and self._argdict['skipFileValidation'] is True) or
817 ('skipInputFileValidation' in self._argdict and self._argdict['skipInputFileValidation'] is True) or
818 ('fileValidation' in self._argdict and self._argdict['fileValidation'].value is False) or
819 ('inputFileValidation' in self._argdict and self._argdict['inputFileValidation'].value is False)
820 ):
821 msg.info('Standard input file validation turned off for transform %s.', self.name)
822 else:
823 msg.info('Validating input files')
824 if 'parallelFileValidation' in self._argdict:
825 trfValidation.performStandardFileValidation(dictionary=self._dataDictionary, io='input', parallelMode=self._argdict['parallelFileValidation'].value )
826 else:
827 trfValidation.performStandardFileValidation(dictionary=self._dataDictionary, io='input')
828
829 self._inFileValidationStop = os.times()
830 msg.debug('inFileValidationStop time is {0}'.format(self._inFileValidationStop))
831
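Input validation is skipped if any one of four arguments requests it: either of the `skip*` flags set to True, or either of the `*FileValidation` arguments set to False. A standalone sketch of that boolean test (the helper name is hypothetical; real transform arguments wrap their payload in a `.value` attribute, which `getattr` covers here alongside plain values):

```python
# Hypothetical sketch of the skip test in validateInFiles. Real trf arguments
# carry their payload in .value; getattr handles both wrapped and plain values.
def input_validation_skipped(argdict):
    def flag(name):
        v = argdict.get(name)
        return getattr(v, "value", v)
    return (flag("skipFileValidation") is True
            or flag("skipInputFileValidation") is True
            or flag("fileValidation") is False
            or flag("inputFileValidation") is False)
```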

◆ validateOutFiles()

python.transform.transform.validateOutFiles ( self)

Definition at line 832 of file transform.py.

832 def validateOutFiles(self):
833 if self._outFileValidationStart is None:
834 self._outFileValidationStart = os.times()
835 msg.debug('outFileValidationStart time is {0}'.format(self._outFileValidationStart))
836
837 if (('skipFileValidation' in self._argdict and self._argdict['skipFileValidation'] is True) or
838 ('skipOutputFileValidation' in self._argdict and self._argdict['skipOutputFileValidation'] is True) or
839 ('fileValidation' in self._argdict and self._argdict['fileValidation'].value is False) or
840 ('outputFileValidation' in self._argdict and self._argdict['outputFileValidation'].value is False)
841 ):
842 msg.info('Standard output file validation turned off for transform %s.', self.name)
843 elif 'mpi' in self.argdict and not trfMPITools.mpiShouldValidate():
844 msg.info("MPI mode and not in rank 0 ∴ not validating partial outputs")
845 else:
846 msg.info('Validating output files')
847 parparallelMode = False
848 # Make MT file validation default
849 parmultithreadedMode = True
850 if 'parallelFileValidation' in self._argdict:
851 parparallelMode = self._argdict['parallelFileValidation'].value
852 if 'multithreadedFileValidation' in self._argdict:
853 parmultithreadedMode = self._argdict['multithreadedFileValidation'].value
854 trfValidation.performStandardFileValidation(dictionary=self._dataDictionary, io='output', parallelMode=parparallelMode, multithreadedMode=parmultithreadedMode)
855
856 self._outFileValidationStop = os.times()
857 msg.debug('outFileValidationStop time is {0}'.format(self._outFileValidationStop))

Member Data Documentation

◆ _argdict

python.transform.transform._argdict = dict()
protected

Argument dictionary for this transform.

Definition at line 83 of file transform.py.

◆ _dataDictionary

python.transform.transform._dataDictionary = dict()
protected

Data dictionary placeholder (this maps data types to their argFile instances).

Definition at line 86 of file transform.py.

◆ _executorDictionary

dict python.transform.transform._executorDictionary = {}
protected

Definition at line 91 of file transform.py.

◆ _executorGraph

python.transform.transform._executorGraph
protected

Definition at line 419 of file transform.py.

◆ _executorPath

python.transform.transform._executorPath
protected

Definition at line 215 of file transform.py.

◆ _executors

python.transform.transform._executors = set()
protected

Definition at line 90 of file transform.py.

◆ _exitCode

python.transform.transform._exitCode = None
protected

Transform exit code/message holders.

Definition at line 98 of file transform.py.

◆ _exitMsg

python.transform.transform._exitMsg = None
protected

Definition at line 99 of file transform.py.

◆ _exitWithReport

python.transform.transform._exitWithReport
protected

Definition at line 109 of file transform.py.

◆ _inFileValidationStart

python.transform.transform._inFileValidationStart = None
protected

Definition at line 56 of file transform.py.

◆ _inFileValidationStop

python.transform.transform._inFileValidationStop = None
protected

Definition at line 57 of file transform.py.

◆ _inputData

python.transform.transform._inputData = list()
protected
Note
If we have no real data then add the pseudo datatype NULL, which allows us to manage transforms that can run without data.

Definition at line 525 of file transform.py.

◆ _name

python.transform.transform._name = trfName or path.basename(sys.argv[0]).rsplit('.py', 1)[0]
protected

Transform _name.

Definition at line 65 of file transform.py.

◆ _outFileValidationStart

python.transform.transform._outFileValidationStart = None
protected

Definition at line 58 of file transform.py.

◆ _outFileValidationStop

python.transform.transform._outFileValidationStop = None
protected

Definition at line 59 of file transform.py.

◆ _outputData

python.transform.transform._outputData = list()
protected

Definition at line 526 of file transform.py.

◆ _processedEvents

python.transform.transform._processedEvents = None
protected

Transform processed events.

Definition at line 105 of file transform.py.

◆ _report

python.transform.transform._report = trfJobReport(parentTrf = self)
protected

Report object for this transform.

Definition at line 102 of file transform.py.

◆ _transformStart

python.transform.transform._transformStart = os.times()
protected

Get transform starting timestamp as early as possible.

Definition at line 53 of file transform.py.

◆ _trfPredata

python.transform.transform._trfPredata = os.environ.get('TRF_PREDATA')
protected

Get trf pre-data as early as possible.

Definition at line 62 of file transform.py.

◆ name

python.transform.transform.name

Definition at line 821 of file transform.py.

◆ parser

python.transform.transform.parser
Initial value:
= trfArgParser(description='Transform {0}. {1}'.format(self.name, description),
argument_default=argparse.SUPPRESS,
fromfile_prefix_chars='@')

Definition at line 70 of file transform.py.
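The `argument_default=argparse.SUPPRESS` setting explains why the methods above test membership (`'verbose' in self._argdict`) rather than truthiness: options the user did not supply are simply absent from the parsed namespace instead of defaulting to None. A minimal demonstration with plain argparse:

```python
import argparse

# With argument_default=argparse.SUPPRESS, unsupplied options do not appear
# in the parsed namespace at all.
parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
parser.add_argument("--verbose", action="store_true")
print(vars(parser.parse_args([])))             # -> {}
print(vars(parser.parse_args(["--verbose"])))  # -> {'verbose': True}
```

The `fromfile_prefix_chars='@'` setting additionally lets arguments be read from a file given as `@filename` on the command line.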

◆ validation

python.transform.transform.validation

Definition at line 790 of file transform.py.


The documentation for this class was generated from the following file: