ATLAS Offline Software
python.trfReports.trfJobReport Class Reference

Class holding a transform job report. More...

Inheritance diagram for python.trfReports.trfJobReport:
Collaboration diagram for python.trfReports.trfJobReport:

Public Member Functions

def __init__ (self, parentTrf)
 Constructor. More...
 
def python (self, fast=False, fileReport=defaultFileReport)
 Generate the python transform job report. More...
 
def classicEltree (self, fast=False)
 Classic metadata.xml report. More...
 
def classicPython (self, fast=False)
 Classic Tier 0 metadata python object. More...
 
def roundoff (self, value)
 
def __str__ (self)
 String representation of the job report. More...
 
def json (self, fast=False)
 Method which returns a JSON representation of a report. More...
 
def writeJSONReport (self, filename, sort_keys=True, indent=2, fast=False, fileReport=defaultFileReport)
 
def writeTxtReport (self, filename, dumpEnv=True, fast=False, fileReport=defaultFileReport)
 
def writeGPickleReport (self, filename, fast=False)
 
def writeClassicXMLReport (self, filename, fast=False)
 
def writePilotPickleReport (self, filename, fast=False, fileReport=defaultFileReport)
 

Private Attributes

 _trf
 
 _precisionDigits
 
 _dbDataTotal
 
 _dbTimeTotal
 
 _dataDictionary
 

Static Private Attributes

 _reportVersion
 This is the version counter for transform job reports; any changes to the format must be reflected by incrementing this. More...
 
 _metadataKeyMap
 
 _maxMsgLen
 
 _truncationMsg
 

Detailed Description

Class holding a transform job report.

Definition at line 117 of file trfReports.py.

Constructor & Destructor Documentation

◆ __init__()

def python.trfReports.trfJobReport.__init__ (   self,
  parentTrf 
)

Constructor.

Parameters
parentTrf: Mandatory link to the transform this job report represents

Definition at line 127 of file trfReports.py.

127  def __init__(self, parentTrf):
128  super(trfJobReport, self).__init__()
129  self._trf = parentTrf
130  self._precisionDigits = 3
131  self._dbDataTotal = 0
132  self._dbTimeTotal = 0.0
133 
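As a usage illustration only: the report is normally created by the transform itself, which passes itself as parentTrf so that the report can read the transform's state when a report is generated. A minimal sketch of that pattern; the import path and the surrounding class are assumptions, not taken from this page:

 # Sketch only: assumes the usual PyJobTransforms package layout.
 from PyJobTransforms.trfReports import trfJobReport

 class mySimpleTransform(object):
     """Illustrative stand-in for a real transform object."""
     def __init__(self):
         self.name = 'mySimpleTransform'
         # The job report keeps a back-reference to its parent transform and
         # reads attributes such as exitCode, argdict and _dataDictionary
         # from it only when a report is actually generated.
         self._report = trfJobReport(parentTrf=self)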

Member Function Documentation

◆ __str__()

def python.trfReports.trfReport.__str__ (   self)
inherited

String representation of the job report.

Uses the pprint module to output the python object as text.

Note
This is a 'property', so no fast option is available

Definition at line 47 of file trfReports.py.

47  def __str__(self):
48  return pprint.pformat(self.python())
49 

◆ classicEltree()

def python.trfReports.trfJobReport.classicEltree (   self,
  fast = False 
)

Classic metadata.xml report.

Reimplemented from python.trfReports.trfReport.

Definition at line 253 of file trfReports.py.

253  def classicEltree(self, fast = False):
254  trfTree = ElementTree.Element('POOLFILECATALOG')
255  # Extract some executor parameters here
256  for exeKey in ('preExec', 'postExec', 'preInclude', 'postInclude'):
257  if exeKey in self._trf.argdict:
258  for substep, pyfrag in self._trf.argdict[exeKey].value.items():
259  if substep == 'all':
260  ElementTree.SubElement(trfTree, 'META', type = 'string', name = exeKey, value = str(pyfrag))
261  else:
262  ElementTree.SubElement(trfTree, 'META', type = 'string', name = exeKey + '_' + substep, value = str(pyfrag))
263  for exeKey in ('autoConfiguration', 'AMIConfig', 'AMITag'):
264  if exeKey in self._trf.argdict:
265  if exeKey in self._metadataKeyMap:
266  classicName = self._metadataKeyMap[exeKey]
267  else:
268  classicName = exeKey
269  ElementTree.SubElement(trfTree, 'META', type = 'string', name = classicName,
270  value = str(self._trf.argdict[exeKey].value))
271 
272  # Now add information about output files
273  for dataArg in self._trf._dataDictionary.values():
274  if isinstance(dataArg, list): # Always skip lists from the report (auxiliary files)
275  continue
276  if dataArg.io == 'output':
277  for fileEltree in trfFileReport(dataArg).classicEltreeList(fast = fast):
278  trfTree.append(fileEltree)
279 
280  return trfTree
281 
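The return value is a plain xml.etree.ElementTree.Element rooted at POOLFILECATALOG, with one META child per recorded key plus the per-file elements. A sketch of turning it into text directly with the standard library, without the pretty printing used by writeClassicXMLReport; report is assumed to be a trfJobReport instance:

 import xml.etree.ElementTree as ElementTree

 # Serialise the classic metadata tree to a string; fast=True asks for the
 # quickest possible report, as for the other report methods on this page.
 xmlText = ElementTree.tostring(report.classicEltree(fast=True),
                                encoding='unicode')
 print(xmlText)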

◆ classicPython()

def python.trfReports.trfJobReport.classicPython (   self,
  fast = False 
)

Classic Tier 0 metadata python object.

Metadata in python nested dictionary form, which will produce a Tier 0 .gpickle when pickled

Reimplemented from python.trfReports.trfReport.

Definition at line 285 of file trfReports.py.

285  def classicPython(self, fast = False):
286  # Things we can get directly from the transform
287  trfDict = {'jobInputs' : [], # Always empty?
288  'jobOutputs' : [], # Filled in below...
289  'more' : {'Machine' : 'unknown'},
290  'trfAcronym' : trfExit.codeToName(self._trf.exitCode),
291  'trfCode' : self._trf.exitCode,
292  'trfExitCode' : self._trf.exitCode,
293  }
294 
295  if self._trf.lastExecuted is not None:
296  trfDict.update({'athAcronym' : self._trf.lastExecuted.errMsg,
297  'athCode' : self._trf.lastExecuted.rc})
298 
299 
300  # Emulate the NEEDCHECK behaviour
301  if hasattr(self._trf, '_executorPath'):
302  for executor in self._trf._executorPath:
303  if hasattr(executor, '_logScan') and self._trf.exitCode == 0:
304  if executor._logScan._levelCounter['FATAL'] > 0 or executor._logScan._levelCounter['CRITICAL'] > 0:
305  # This should not happen!
306  msg.warning('Found FATAL/CRITICAL errors and exit code 0 - reseting to TRF_LOGFILE_FAIL')
307  self._trf.exitCode = trfExit.nameToCode('TRF_LOGFILE_FAIL')
308  trfDict['trfAcronym'] = 'TRF_LOGFILE_FAIL'
309  elif executor._logScan._levelCounter['ERROR'] > 0:
310  msg.warning('Found errors in logfile scan - changing exit acronymn to NEEDCHECK.')
311  trfDict['trfAcronym'] = 'NEEDCHECK'
312 
313  # Now add files
314  fileArgs = self._trf.getFiles(io = 'output')
315  for fileArg in fileArgs:
316  # N.B. In the original Tier 0 gpickles there was executor
317  # information added for each file (such as autoConfiguration, preExec).
318  # However, Luc tells me it is ignored, so let's not bother.
319  trfDict['jobOutputs'].extend(trfFileReport(fileArg).classicPython(fast = fast))
320  # AMITag and friends is added per-file, but it's known only to the transform, so set it here:
321  for argdictKey in ('AMITag', 'autoConfiguration',):
322  if argdictKey in self._trf.argdict:
323  trfDict['jobOutputs'][-1]['more']['metadata'][argdictKey] = self._trf.argdict[argdictKey].value
324  # Mangle substep arguments back to the old format
325  for substepKey in ('preExec', 'postExec', 'preInclude', 'postInclude'):
326  if substepKey in self._trf.argdict:
327  for substep, values in self._trf.argdict[substepKey].value.items():
328  if substep == 'all':
329  trfDict['jobOutputs'][-1]['more']['metadata'][substepKey] = values
330  else:
331  trfDict['jobOutputs'][-1]['more']['metadata'][substepKey + '_' + substep] = values
332 
333  # Now retrieve the input event count
334  nentries = 'UNKNOWN'
335  for fileArg in self._trf.getFiles(io = 'input'):
336  thisArgNentries = fileArg.nentries
337  if isinstance(thisArgNentries, int):
338  if nentries == 'UNKNOWN':
339  nentries = thisArgNentries
340  elif thisArgNentries != nentries:
341  msg.warning('Found a file with different event count than others: {0} != {1} for {2}'.format(thisArgNentries, nentries, fileArg))
342  # Take highest number?
343  if thisArgNentries > nentries:
344  nentries = thisArgNentries
345  trfDict['nevents'] = nentries
346 
347  # Tier 0 expects the report to be in a top level dictionary under the prodsys key
348  return {'prodsys' : trfDict}
349 
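For orientation, the structure returned can be sketched as follows; the concrete values are made up, only the key layout follows the code above:

 classic = report.classicPython(fast=True)
 # Illustrative layout of the result (values are made up):
 # {
 #     'prodsys': {
 #         'jobInputs':   [],                      # always empty here
 #         'jobOutputs':  [{...}, {...}],          # one dict per output file
 #         'more':        {'Machine': 'unknown'},
 #         'trfAcronym':  'OK',
 #         'trfCode':     0,
 #         'trfExitCode': 0,
 #         'nevents':     1000,
 #     }
 # }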

◆ json()

def python.trfReports.trfReport.json (   self,
  fast = False 
)
inherited

Method which returns a JSON representation of a report.

Parameters
fast: Boolean which forces the fastest possible report to be written

Calls json.dumps on the python representation

Definition at line 58 of file trfReports.py.

58  def json(self, fast = False):
59  return json.dumps(self.python, type)
60 

◆ python()

def python.trfReports.trfJobReport.python (   self,
  fast = False,
  fileReport = defaultFileReport 
)

Generate the python transform job report.

Parameters
type: The general type of this report (e.g. fast)
fileReport: Dictionary giving the type of report to make for each type of file. This dictionary must have all io types as keys; valid values are: None - skip this io type; 'full' - provide all details; 'name' - only dataset and filename will be reported on.
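As an illustration of those rules, a fileReport dictionary could look like the sketch below; the exact contents of defaultFileReport are not reproduced here, only the expected shape, and jobReport is an assumed trfJobReport instance:

 # One entry per io type; 'full', 'name' or None as described above.
 fileReport = {'input': 'name', 'output': 'full', 'temporary': None}
 reportDict = jobReport.python(fast=False, fileReport=fileReport)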

Reimplemented from python.trfReports.trfReport.

Definition at line 140 of file trfReports.py.

140  def python(self, fast = False, fileReport = defaultFileReport):
141  myDict = {'name': self._trf.name,
142  'reportVersion': self._reportVersion,
143  'cmdLine': ' '.join(shQuoteStrings(sys.argv)),
144  'exitAcronym': trfExit.codeToName(self._trf.exitCode),
145  'exitCode': self._trf.exitCode,
146  'created': isodate(),
147  'resource': {'executor': {}, 'transform': {}},
148  'files': {}
149  }
150  if len(self._trf.exitMsg) > self._maxMsgLen:
151  myDict['exitMsg'] = self._trf.exitMsg[:self._maxMsgLen-len(self._truncationMsg)] + self._truncationMsg
152  myDict['exitMsgExtra'] = self._trf.exitMsg[self._maxMsgLen-len(self._truncationMsg):]
153  else:
154  myDict['exitMsg'] = self._trf.exitMsg
155  myDict['exitMsgExtra'] = ""
156 
157  # Iterate over files
158  for fileType in ('input', 'output', 'temporary'):
159  if fileReport[fileType]:
160  myDict['files'][fileType] = []
161  # Should have a dataDictionary, unless something went wrong very early...
162  for dataType, dataArg in self._trf._dataDictionary.items():
163  if isinstance(dataArg, list): # Always skip lists from the report (auxiliary files)
164  continue
165  if dataArg.auxiliaryFile: # Always skip auxiliary files from the report
166  continue
167  if fileReport[dataArg.io]:
168  entry = {"type": dataType}
169  entry.update(trfFileReport(dataArg).python(fast = fast, type = fileReport[dataArg.io]))
170  # Suppress RAW if all subfiles had nentries == 0
171  if 'subFiles' in entry and len(entry['subFiles']) == 0 and isinstance(dataArg, trfArgClasses.argBSFile) :
172  msg.info('No subFiles for entry {0}, suppressing from report.'.format(entry['argName']))
173  else:
174  myDict['files'][dataArg.io].append(entry)
175 
176  # We report on all executors, in execution order
177  myDict['executor'] = []
178  if hasattr(self._trf, '_executorPath'):
179  for executionStep in self._trf._executorPath:
180  exe = self._trf._executorDictionary[executionStep['name']]
181  myDict['executor'].append(trfExecutorReport(exe).python(fast = fast))
182  # Executor resources are gathered here to unify where this information is held
183  # and allow T0/PanDA to just store this JSON fragment on its own
184  myDict['resource']['executor'][exe.name] = exeResourceReport(exe, self)
185  for mergeStep in exe.myMerger:
186  myDict['resource']['executor'][mergeStep.name] = exeResourceReport(mergeStep, self)
187  if self._dbDataTotal > 0 or self._dbTimeTotal > 0:
188  myDict['resource']['dbDataTotal'] = self._dbDataTotal
189  myDict['resource']['dbTimeTotal'] = self.roundoff(self._dbTimeTotal)
190  # Resource consumption
191  reportTime = os.times()
192 
193  # Calculate total cpu time we used -
194  myCpuTime = reportTime[0] + reportTime[1]
195  childCpuTime = reportTime[2] + reportTime[3]
196  wallTime = reportTime[4] - self._trf.transformStart[4]
197  cpuTime = myCpuTime
198  cpuTimeTotal = 0
199  cpuTimePerWorker = myCpuTime
200  maxWorkers = 1
201  msg.debug('Raw cpu resource consumption: transform {0}, children {1}'.format(myCpuTime, childCpuTime))
202  # Reduce childCpuTime by times reported in the executors (broken for MP...?)
203  for exeName, exeReport in myDict['resource']['executor'].items():
204  if 'mpworkers' in exeReport:
205  if exeReport['mpworkers'] > maxWorkers : maxWorkers = exeReport['mpworkers']
206  try:
207  msg.debug('Subtracting {0}s time for executor {1}'.format(exeReport['cpuTime'], exeName))
208  childCpuTime -= exeReport['cpuTime']
209  except TypeError:
210  pass
211  try:
212  cpuTime += exeReport['cpuTime']
213  cpuTimeTotal += exeReport['total']['cpuTime']
214  if 'cpuTimePerWorker' in exeReport:
215  msg.debug('Adding {0}s to cpuTimePerWorker'.format(exeReport['cpuTimePerWorker']))
216  cpuTimePerWorker += exeReport['cpuTimePerWorker']
217  else:
218  msg.debug('Adding nonMP cpuTime {0}s to cpuTimePerWorker'.format(exeReport['cpuTime']))
219  cpuTimePerWorker += exeReport['cpuTime']
220  except TypeError:
221  pass
222 
223  msg.debug('maxWorkers: {0}, cpuTimeTotal: {1}, cpuTimePerWorker: {2}'.format(maxWorkers, cpuTime, cpuTimePerWorker))
224  reportGenerationCpuTime = reportGenerationWallTime = None
225  if self._trf.outFileValidationStop and reportTime:
226  reportGenerationCpuTime = calcCpuTime(self._trf.outFileValidationStop, reportTime)
227  reportGenerationWallTime = calcWallTime(self._trf.outFileValidationStop, reportTime)
228 
229  myDict['resource']['transform'] = {'cpuTime': self.roundoff(myCpuTime),
230  'cpuTimeTotal': self.roundoff(cpuTimeTotal),
231  'externalCpuTime': self.roundoff(childCpuTime),
232  'wallTime': self.roundoff(wallTime),
233  'transformSetup': {'cpuTime': self.roundoff(self._trf.transformSetupCpuTime),
234  'wallTime': self.roundoff(self._trf.transformSetupWallTime)},
235  'inFileValidation': {'cpuTime': self.roundoff(self._trf.inFileValidationCpuTime),
236  'wallTime': self.roundoff(self._trf.inFileValidationWallTime)},
237  'outFileValidation': {'cpuTime': self.roundoff(self._trf.outFileValidationCpuTime),
238  'wallTime': self.roundoff(self._trf.outFileValidationWallTime)},
239  'reportGeneration': {'cpuTime': self.roundoff(reportGenerationCpuTime),
240  'wallTime': self.roundoff(reportGenerationWallTime)}, }
241  if self._trf.processedEvents:
242  myDict['resource']['transform']['processedEvents'] = self._trf.processedEvents
243  myDict['resource']['transform']['trfPredata'] = self._trf.trfPredata
244  # check for division by zero for fast jobs, unit tests
245  if wallTime > 0:
246  myDict['resource']['transform']['cpuEfficiency'] = round(cpuTime/maxWorkers/wallTime, 4)
247  myDict['resource']['transform']['cpuPWEfficiency'] = round(cpuTimePerWorker/wallTime, 4)
248  myDict['resource']['machine'] = machineReport().python(fast = fast)
249 
250  return myDict
251 
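The two efficiency figures at the end follow directly from the quantities accumulated above; a worked example with made-up numbers for an 8-worker AthenaMP job:

 # Hypothetical numbers, for illustration only.
 cpuTime = 6400.0          # transform CPU plus summed executor CPU
 cpuTimePerWorker = 850.0  # transform CPU plus per-worker executor CPU
 maxWorkers = 8
 wallTime = 1000.0

 cpuEfficiency = round(cpuTime / maxWorkers / wallTime, 4)   # 0.8
 cpuPWEfficiency = round(cpuTimePerWorker / wallTime, 4)     # 0.85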

◆ roundoff()

def python.trfReports.trfJobReport.roundoff (   self,
  value 
)

Definition at line 352 of file trfReports.py.

352  def roundoff(self, value):
353  return round(value, self._precisionDigits) if (value is not None) else value
354 
355 
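With the constructor's _precisionDigits of 3 this behaves as sketched below; None is passed through unchanged, which is why the resource block above can feed it optional timings:

 report.roundoff(12.34567)   # -> 12.346
 report.roundoff(None)       # -> None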

◆ writeClassicXMLReport()

def python.trfReports.trfReport.writeClassicXMLReport (   self,
  filename,
  fast = False 
)
inherited

Definition at line 104 of file trfReports.py.

104  def writeClassicXMLReport(self, filename, fast = False):
105  with open(filename, 'w') as report:
106  print(prettyXML(self.classicEltree(fast = fast), poolFileCatalogFormat = True), file=report)
107 

◆ writeGPickleReport()

def python.trfReports.trfReport.writeGPickleReport (   self,
  filename,
  fast = False 
)
inherited

Definition at line 100 of file trfReports.py.

100  def writeGPickleReport(self, filename, fast = False):
101  with open(filename, 'wb') as report:
102  pickle.dump(self.classicPython(fast = fast), report)
103 

◆ writeJSONReport()

def python.trfReports.trfReport.writeJSONReport (   self,
  filename,
  sort_keys = True,
  indent = 2,
  fast = False,
  fileReport = defaultFileReport 
)
inherited

Definition at line 71 of file trfReports.py.

71  def writeJSONReport(self, filename, sort_keys = True, indent = 2, fast = False, fileReport = defaultFileReport):
72  with open(filename, 'w') as report:
73  try:
74  if not self._dataDictionary:
75  self._dataDictionary = self.python(fast=fast, fileReport=fileReport)
76 
77  json.dump(self._dataDictionary, report, sort_keys = sort_keys, indent = indent)
78  except TypeError as e:
79  # TypeError means we had an unserialisable object - re-raise as a trf internal
80  message = 'TypeError raised during JSON report output: {0!s}'.format(e)
81  msg.error(message)
82  raise trfExceptions.TransformReportException(trfExit.nameToCode('TRF_INTERNAL_REPORT_ERROR'), message)
83 
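A minimal usage sketch; the filename is illustrative (jobReport.json is merely a conventional choice), the import path assumes the usual PyJobTransforms layout, and the exception type is the one raised in the code above:

 from PyJobTransforms import trfExceptions

 try:
     report.writeJSONReport('jobReport.json', sort_keys=True, indent=2)
 except trfExceptions.TransformReportException as e:
     # Raised when the python report contains something json cannot serialise.
     print('Report writing failed: {0}'.format(e))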

◆ writePilotPickleReport()

def python.trfReports.trfReport.writePilotPickleReport (   self,
  filename,
  fast = False,
  fileReport = defaultFileReport 
)
inherited

Definition at line 108 of file trfReports.py.

108  def writePilotPickleReport(self, filename, fast = False, fileReport = defaultFileReport):
109  with open(filename, 'w') as report:
110  if not self._dataDictionary:
111  self._dataDictionary = self.python(fast = fast, fileReport = fileReport)
112 
113  pickle.dump(self._dataDictionary, report)
114 
115 

◆ writeTxtReport()

def python.trfReports.trfReport.writeTxtReport (   self,
  filename,
  dumpEnv = True,
  fast = False,
  fileReport = defaultFileReport 
)
inherited

Definition at line 84 of file trfReports.py.

84  def writeTxtReport(self, filename, dumpEnv = True, fast = False, fileReport = defaultFileReport):
85  with open(filename, 'w') as report:
86  if not self._dataDictionary:
87  self._dataDictionary = self.python(fast = fast, fileReport = fileReport)
88 
89  print('# {0} file generated on'.format(self.__class__.__name__), isodate(), file=report)
90  print(pprint.pformat(self._dataDictionary), file=report)
91  if dumpEnv:
92  print('# Environment dump', file=report)
93  eKeys = list(os.environ)
94  eKeys.sort()
95  for k in eKeys:
96  print('%s=%s' % (k, os.environ[k]), file=report)
97  print('# Machine report', file=report)
98  print(pprint.pformat(machineReport().python(fast = fast)), file=report)
99 

Member Data Documentation

◆ _dataDictionary

python.trfReports.trfReport._dataDictionary
privateinherited

Definition at line 41 of file trfReports.py.

◆ _dbDataTotal

python.trfReports.trfJobReport._dbDataTotal
private

Definition at line 131 of file trfReports.py.

◆ _dbTimeTotal

python.trfReports.trfJobReport._dbTimeTotal
private

Definition at line 132 of file trfReports.py.

◆ _maxMsgLen

python.trfReports.trfJobReport._maxMsgLen
staticprivate

Definition at line 122 of file trfReports.py.

◆ _metadataKeyMap

python.trfReports.trfJobReport._metadataKeyMap
staticprivate

Definition at line 121 of file trfReports.py.

◆ _precisionDigits

python.trfReports.trfJobReport._precisionDigits
private

Definition at line 130 of file trfReports.py.

◆ _reportVersion

python.trfReports.trfJobReport._reportVersion
staticprivate

This is the version counter for transform job reports; any changes to the format must be reflected by incrementing this.

Definition at line 120 of file trfReports.py.

◆ _trf

python.trfReports.trfJobReport._trf
private

Definition at line 129 of file trfReports.py.

◆ _truncationMsg

python.trfReports.trfJobReport._truncationMsg
staticprivate

Definition at line 123 of file trfReports.py.


The documentation for this class was generated from the following file:
trfReports.py