ATLAS Offline Software
python.trfArgClasses.argFile Class Reference

File argument class.


Public Member Functions

def __init__ (self, value=list(), type=None, subtype=None, io='output', splitter=',', runarg=True, guid=None, multipleOK=None, name=None, executor=list(), mergeTargetSize=-1, auxiliaryFile=False)
 Initialise an argFile.

def value (self)
 Argument value getter.

def value (self, value)
 Argument value setter.

def multipleOK (self)
 multipleOK getter.

def multipleOK (self, value)
 multipleOK value setter.

def mergeTargetSize (self)
 mergeTargetSize value getter.

def mergeTargetSize (self, value)
 mergeTargetSize value setter.

def prodsysDescription (self)

def executor (self)
 Executor status getter.

def valueSetter (self, value)
 Set the argFile value, but allow parameters here.

def io (self)

def io (self, value)

def dataset (self)

def dataset (self, value)

def orignalName (self)

def originalName (self, value)

def type (self)

def type (self, value)

def subtype (self)

def subtype (self, value)

def name (self)
 Name getter.

def name (self, value)
 Name setter.

def auxiliaryFile (self)

def metadata (self)
 Returns the whole kit and caboodle...

def nentries (self)
 Return total number of events in all constituent files.

def getnentries (self, fast=False)
 Explicit getter, offering fast switch.

def getMetadata (self, files=None, metadataKeys=None, maskMetadataKeys=None, populate=True, flush=False)
 Return specific keys for specific files.

def getSingleMetadata (self, fname, metadataKey, populate=True, flush=False)
 Convenience function to extract a single metadata key for a single file.

def isCached (self, files=None, metadataKeys=None)
 Test if certain metadata elements are already cached.

def __str__ (self)
 String representation of a file argument.

def append (self, addme)
 Append a value to the list.

def __repr__ (self)
 Repr conversion.

def isRunarg (self)
 Return runarg status.

def __eq__ (self, other)
 Comparison is based on value attribute.

def __nq__ (self, other)

def __lt__ (self, other)

def __gt__ (self, other)


Public Attributes

 io
 
 dataset
 
 value
 

Private Member Functions

def _resetMetadata (self, files=[])
 Resets all metadata files in this instance.

def _readMetadata (self, files, metadataKeys)
 Check metadata is in the cache or generate it if it's missing.

def _setMetadata (self, files=None, metadataKeys={})
 Set metadata values into the cache.

def _getDatasetFromFilename (self, reset=False)
 Look for dataset name in dataset#filename Tier0 convention.

def _getSize (self, files)
 Determines the size of files.

def _getIntegrity (self, files)
 File integrity checker.

def _generateGUID (self, files)
 Generate a GUID on demand - no intrinsic for this file type.

def _exists (self, files)
 Try to determine if a file actually exists...

def _mergeArgs (self, argdict, copyArgs=None)
 Utility to strip arguments which should not be passed to the selfMerge methods of our child classes.
 

Private Attributes

 _dataset
 
 _urlType
 
 _type
 
 _subtype
 
 _guid
 
 _mergeTargetSize
 
 _auxiliaryFile
 
 _originalName
 
 _exe
 
 _metadataKeys
 
 _fileMetadata
 
 _io
 Input file globbing and expansion.
 
 _multipleOK
 
 _value
 
 _name
 
 _splitter
 
 _supressEmptyStrings
 
 _runarg
 

Detailed Description

File argument class.

Inherits from argList

Definition at line 522 of file trfArgClasses.py.

Constructor & Destructor Documentation

◆ __init__()

def python.trfArgClasses.argFile.__init__ (   self,
  value = list(),
  type = None,
  subtype = None,
  io = 'output',
  splitter = ',',
  runarg = True,
  guid = None,
  multipleOK = None,
  name = None,
  executor = list(),
  mergeTargetSize = -1,
  auxiliaryFile = False 
)

Initialise an argFile.

Parameters
io: input, output or temporary file; default is output.
splitter: changes the character a string list is split on; default is a comma (see argList).
type: Datatype in this instance - this should be the major datatype (ESD, AOD, etc).
subtype: The data subtype, which should match the DATATYPE portion of the corresponding argument name, e.g., outputDESD_SGLMUFile -> DESD_SGLMU
guid: This is a non-standard option and allows the GUID for files without an intrinsic GUID to be set explicitly at initialisation. The parameter should be a dictionary, keyed by filename, which contains the GUID string, e.g., {'file1' : '930de3de-de8d-4819-9129-beef3bb4fadb', 'file2' : ... }
multipleOK: Explicit declaration of whether multiple arguments are allowed; default is True for input, False for output and temporary
name: The corresponding key for this argument in the argdict of the transform (e.g., inputESDFile)
executor: List of execution substeps where this file type should be added explicitly (e.g., minbias HITS for digitisation)
mergeTargetSize: Target merge size if this instance supports a selfMerge method. Value is in bytes, with two special values: -1 means always merge to a single file, 0 means never merge these files
auxiliaryFile: If set to True then all validation for this file is disabled - used for non-primary input files, e.g., pileup inputs
Note
When used in argument parser, set nargs='+' to get auto-concatenation of multiple arguments (should be used when multipleOK is True)

Definition at line 544 of file trfArgClasses.py.

544  def __init__(self, value=list(), type=None, subtype=None, io = 'output', splitter=',', runarg=True, guid=None,
545               multipleOK = None, name=None, executor=list(), mergeTargetSize=-1, auxiliaryFile=False):
546      # Set these values before invoking super().__init__ to make sure they can be
547      # accessed in our setter
548      self._dataset = None
549      self._urlType = None
550      self._type = type
551      self._subtype = subtype
552      self._guid = guid
553      self._mergeTargetSize = mergeTargetSize
554      self._auxiliaryFile = auxiliaryFile
555      self._originalName = None
556 
557      # User setter to get valid value check
558      self.io = io
559 
560      self._exe = executor
561 
562 
569 
570      self._metadataKeys = {'file_size': self._getSize,
571                            'integrity': self._getIntegrity,
572                            'file_guid': self._generateGUID,
573                            '_exists': self._exists,
574                            }
575      self._fileMetadata = {}
576      if multipleOK is None:
577          if self._io == 'input':
578              self._multipleOK = True
579          else:
580              self._multipleOK = False
581      else:
582          self._multipleOK = multipleOK
583 
584 
585      super(argFile, self).__init__(value=value, splitter=splitter, runarg=runarg, name=name)
586 
587 
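
The constructor note about nargs='+' can be illustrated with plain argparse. This is only a sketch: the option name --inputESDFile and the helper parse_files are hypothetical, and the real transform builds its arguments through its own machinery rather than this function. It shows how nargs='+' gathers several tokens into one list, which is then split on the splitter character.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical option; nargs='+' collects every token after the flag into one list
parser.add_argument('--inputESDFile', nargs='+', default=[])


def parse_files(argv, splitter=','):
    """Parse argv, then split each token on the splitter and flatten the result."""
    ns = parser.parse_args(argv)
    files = []
    for token in ns.inputESDFile:
        # Drop empty strings produced by stray splitter characters
        files.extend(part for part in token.split(splitter) if part)
    return files
```

Calling parse_files(['--inputESDFile', 'a.root,b.root', 'c.root']) yields a single flat list of three names, which is the shape an argument with multipleOK set to True expects.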

Member Function Documentation

◆ __eq__()

def python.trfArgClasses.argument.__eq__ (   self,
  other 
)
inherited

Comparison is based on value attribute.

Definition at line 161 of file trfArgClasses.py.

161  def __eq__(self,other):
162      return self.value == other.value
163 

◆ __gt__()

def python.trfArgClasses.argument.__gt__ (   self,
  other 
)
inherited

Definition at line 170 of file trfArgClasses.py.

170  def __gt__(self, other):
171      return self.value > other.value
172 

◆ __lt__()

def python.trfArgClasses.argument.__lt__ (   self,
  other 
)
inherited

Definition at line 167 of file trfArgClasses.py.

167  def __lt__(self, other):
168      return self.value < other.value
169 

◆ __nq__()

def python.trfArgClasses.argument.__nq__ (   self,
  other 
)
inherited

Definition at line 164 of file trfArgClasses.py.

164  def __nq__(self, other):
165      return self.value != other.value
166 

◆ __repr__()

def python.trfArgClasses.argList.__repr__ (   self)
inherited

Repr conversion.

Return a python parsable string

Reimplemented from python.trfArgClasses.argument.

Definition at line 409 of file trfArgClasses.py.

409  def __repr__(self):
410      return '[' + ','.join([ repr(s) for s in self._value ]) + ']'
411 
412 

◆ __str__()

def python.trfArgClasses.argFile.__str__ (   self)

String representation of a file argument.

Reimplemented from python.trfArgClasses.argList.

Definition at line 1201 of file trfArgClasses.py.

1201  def __str__(self):
1202      return "{0}={1} (Type {2}, Dataset {3}, IO {4})".format(self.name, self.value, self.type, self.dataset, self.io)
1203 
1204 

◆ _exists()

def python.trfArgClasses.argFile._exists (   self,
  files 
)
private

Try to determine if a file actually exists...

For a POSIX file, just call stat; for anything else call TFile.Open. A small optimisation is to retrieve the file_size metadatum at the same time.


Definition at line 1176 of file trfArgClasses.py.

1176  def _exists(self, files):
1177      msg.debug('Testing existance for {0}'.format(files))
1178      for fname in files:
1179          if self._urlType == 'posix':
1180              try:
1181                  size = os.stat(fname).st_size
1182                  self._fileMetadata[fname]['file_size'] = size
1183                  self._fileMetadata[fname]['_exists'] = True
1184                  msg.debug('POSIX file {0} exists'.format(fname))
1185              except OSError as e:
1186                  msg.error('Got exception {0!s} raised while stating file {1} - probably it does not exist'.format(e, fname))
1187                  self._fileMetadata[fname]['_exists'] = False
1188          else:
1189              # OK, let's see if ROOT can do it...
1190              msg.debug('Calling ROOT TFile.GetSize({0})'.format(fname))
1191              size = ROOTGetSize(fname)
1192              if size is None:
1193                  self._fileMetadata[fname]['_exists'] = False
1194                  msg.error('Non-POSIX file {0} could not be opened - probably it does not exist'.format(fname))
1195              else:
1196                  msg.debug('Non-POSIX file {0} exists'.format(fname))
1197                  self._fileMetadata[fname]['file_size'] = size
1198                  self._fileMetadata[fname]['_exists'] = True
1199 
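
The POSIX branch of the existence test can be sketched in isolation. The helper posix_exists below is hypothetical and not part of trfArgClasses; it only shows how a single os.stat call answers both "does it exist" and "how big is it". The ROOT branch is left out because it depends on ROOTGetSize.

```python
import os


def posix_exists(fname):
    """Return (exists, size) for a local path using one stat call."""
    try:
        return True, os.stat(fname).st_size
    except OSError:
        # stat fails for missing or inaccessible paths
        return False, None
```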

◆ _generateGUID()

def python.trfArgClasses.argFile._generateGUID (   self,
  files 
)
private

Generate a GUID on demand - no intrinsic for this file type

Use uuid.uuid4() call to generate a GUID

Note
This generation method will be superseded in any file type which actually has an intrinsic GUID (e.g. BS or POOL files)

Definition at line 1165 of file trfArgClasses.py.

1165  def _generateGUID(self, files):
1166      for fname in files:
1167          msg.debug('Generating a GUID for file {0}'.format(fname))
1168          self._fileMetadata[fname]['file_guid'] = str(uuid.uuid4()).upper()
1169 
1170 
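
As a standalone illustration of the GUID format produced here, the following sketch (generate_guids is a hypothetical name) builds an upper-cased uuid4 string per file, the same expression used in the listing above.

```python
import uuid


def generate_guids(files):
    """Map each file name to a freshly generated, upper-cased random UUID string."""
    return {fname: str(uuid.uuid4()).upper() for fname in files}
```

Each value is a 36-character string of upper-case hex digits and hyphens; because uuid4 is random, repeated calls give different GUIDs for the same file.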

◆ _getDatasetFromFilename()

def python.trfArgClasses.argFile._getDatasetFromFilename (   self,
  reset = False 
)
private

Look for dataset name in dataset#filename Tier0 convention.

At the moment all files must be in the same dataset if it's specified. (To change this, dataset will need to become a per-file metadatum.)

Note
dsn#lfn notation must be used for all input values and all dsn values must be the same

Definition at line 1089 of file trfArgClasses.py.

1089  def _getDatasetFromFilename(self, reset = False):
1090      if reset:
1091          self._dataset = None
1092      newValue = []
1093      for filename in self._value:
1094          if filename.find('#') > -1:
1095              (dataset, fname) = filename.split('#', 1)
1096              newValue.append(fname)
1097              msg.debug('Current dataset: {0}; New dataset {1}'.format(self._dataset, dataset))
1098              if self._dataset and (self._dataset != dataset):
1099                  raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_ARG_DATASET'),
1100                                      'Found inconsistent dataset assignment in argFile setup: %s != %s' % (self._dataset, dataset))
1101              self._dataset = dataset
1102      if len(newValue) == 0:
1103          return
1104      elif len(newValue) != len (self._value):
1105          raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_ARG_DATASET'),
1106                              'Found partial dataset assignment in argFile setup from {0} (dsn#lfn notation must be uniform for all inputs)'.format(self._value))
1107      self._value = newValue
1108 
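
The dataset-prefix handling can be sketched as a free function. split_dataset is a hypothetical helper, and it raises ValueError where the real method raises TransformArgException; otherwise it follows the same rules: every value must carry the same dsn# prefix, or none of them may carry one.

```python
def split_dataset(values):
    """Split 'dsn#lfn' entries and return (dataset, filenames)."""
    dataset = None
    names = []
    for value in values:
        if '#' in value:
            dsn, lfn = value.split('#', 1)
            if dataset and dsn != dataset:
                raise ValueError('inconsistent dataset: {0} != {1}'.format(dataset, dsn))
            dataset = dsn
            names.append(lfn)
    if not names:
        # No entry used the dsn#lfn notation: leave the values untouched
        return None, list(values)
    if len(names) != len(values):
        raise ValueError('dsn#lfn notation must be uniform for all inputs')
    return dataset, names
```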

◆ _getIntegrity()

def python.trfArgClasses.argFile._getIntegrity (   self,
  files 
)
private

File integrity checker.

For a 'plain' file, integrity just checks that we can read it


Reimplemented in python.trfArgClasses.argBZ2File, python.trfArgClasses.argNTUPFile, python.trfArgClasses.argHISTFile, python.trfArgClasses.argPOOLFile, and python.trfArgClasses.argBSFile.

Definition at line 1131 of file trfArgClasses.py.

1131  def _getIntegrity(self, files):
1132      for fname in files:
1133          is_binary = False
1134          with open(fname) as f:
1135              try:
1136                  while True:
1137                      chunk = len(f.read(1024*1024))
1138                      msg.debug('Read {0} bytes from {1}'.format(chunk, fname))
1139                      if chunk == 0:
1140                          break
1141                  self._fileMetadata[fname]['integrity'] = True
1142              except OSError as e:
1143                  msg.error('Got exception {0!s} raised while checking integrity of file {1}'.format(e, fname))
1144                  self._fileMetadata[fname]['integrity'] = False
1145              except UnicodeDecodeError:
1146                  msg.debug('Problem reading file as unicode, attempting with binary')
1147                  is_binary = True
1148          if is_binary:
1149              with open(fname,'rb') as f:
1150                  try:
1151                      while True:
1152                          chunk = len(f.read(1024*1024))
1153                          msg.debug('Read {0} bytes from {1}'.format(chunk, fname))
1154                          if chunk == 0:
1155                              break
1156                      self._fileMetadata[fname]['integrity'] = True
1157                  except OSError as e:
1158                      msg.error('Got exception {0!s} raised while checking integrity of file {1}'.format(e, fname))
1159                      self._fileMetadata[fname]['integrity'] = False
1160 
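
The read-through check with a binary fallback can be condensed as follows. This is a simplified sketch, not the method itself: is_readable is a hypothetical name, and unlike the listing it also treats a failure to open the file as a failed check instead of letting the exception propagate.

```python
def is_readable(fname, chunk_size=1024 * 1024):
    """Return True if the whole file can be read in chunks, False on an OS error."""
    for mode in ('r', 'rb'):
        try:
            with open(fname, mode) as f:
                # Read until an empty chunk signals end of file
                while f.read(chunk_size):
                    pass
            return True
        except UnicodeDecodeError:
            # Not decodable as text: retry the same file in binary mode
            continue
        except OSError:
            return False
    return False
```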

◆ _getSize()

def python.trfArgClasses.argFile._getSize (   self,
  files 
)
private

Determines the size of files.

Currently only for statable files (posix fs). Caches the result in the internal self._fileMetadata dictionary.

Parameters
files: List of paths to the files for which the size is determined.
Returns
None (internal self._fileMetadata cache is updated)

Definition at line 1113 of file trfArgClasses.py.

1113  def _getSize(self, files):
1114      for fname in files:
1115          if self._urlType == 'posix':
1116              try:
1117                  self._fileMetadata[fname]['size'] = os.stat(fname).st_size
1118              except OSError as e:
1119                  msg.error('Got exception {0!s} raised while stating file {1}'.format(e, fname))
1120                  self._fileMetadata[fname]['size'] = None
1121          else:
1122              # OK, let's see if ROOT can do it...
1123              msg.debug('Calling ROOT TFile.GetSize({0})'.format(fname))
1124              self._fileMetadata[fname]['size'] = ROOTGetSize(fname)
1125 
1126 

◆ _mergeArgs()

def python.trfArgClasses.argFile._mergeArgs (   self,
  argdict,
  copyArgs = None 
)
private

Utility to strip arguments which should not be passed to the selfMerge methods of our child classes.

Parameters
copyArgs: If None copy all arguments by default, otherwise only copy the listed keys

Definition at line 1209 of file trfArgClasses.py.

1209  def _mergeArgs(self, argdict, copyArgs=None):
1210      if copyArgs:
1211          myargdict = {}
1212          for arg in copyArgs:
1213              if arg in argdict:
1214                  myargdict[arg] = copy.copy(argdict[arg])
1215 
1216      else:
1217          myargdict = copy.copy(argdict)
1218      # Never do event count checks for self merging
1219      myargdict['checkEventCount'] = argSubstepBool('False', runarg=False)
1220      newopts = []
1221      if 'athenaopts' in myargdict:
1222          # Need to ensure that "nprocs" is not passed to merger
1223          # and prevent multiple '--threads' options when there are multiple sub-steps in 'athenopts'
1224          for subStep in myargdict['athenaopts'].value:
1225              hasNprocs = False
1226              hasNthreads = False
1227              for opt in myargdict['athenaopts'].value[subStep]:
1228                  if opt.startswith('--nprocs'):
1229                      hasNprocs = True
1230                      continue
1231                  # Keep at least one '--threads'
1232                  elif opt.startswith('--threads'):
1233                      hasNthreads = True
1234                      if opt in newopts:
1235                          continue
1236                  newopts.append(opt)
1237              # If we have hybrid MP+MT job make sure --threads is not passed to merger
1238              if hasNprocs and hasNthreads:
1239                  tmpopts = []
1240                  for opt in newopts:
1241                      if opt.startswith('--threads'):
1242                          continue
1243                      tmpopts.append(opt)
1244                  newopts = tmpopts
1245          myargdict['athenaopts'] = argSubstepList(newopts, runarg=False)
1246      return myargdict
1247 
1248 
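
The athenaopts filtering is the least obvious part of this method, so here is a sketch of that step alone. filter_merge_opts is a hypothetical helper that takes a plain dict of substep name to option list instead of an argSubstepList, and it assumes the nesting shown in the listing above (the hybrid MP+MT check runs once per substep).

```python
def filter_merge_opts(substep_opts):
    """Flatten per-substep options, dropping --nprocs and duplicate --threads."""
    newopts = []
    for opts in substep_opts.values():
        has_nprocs = False
        has_threads = False
        for opt in opts:
            if opt.startswith('--nprocs'):
                has_nprocs = True
                continue  # never forward --nprocs to the merger
            if opt.startswith('--threads'):
                has_threads = True
                if opt in newopts:
                    continue  # keep only one copy of an identical --threads option
            newopts.append(opt)
        # Hybrid MP+MT substep: strip every --threads option collected so far
        if has_nprocs and has_threads:
            newopts = [o for o in newopts if not o.startswith('--threads')]
    return newopts
```

Note that options from all substeps end up in one flat list, so the merger sees a single set of options rather than per-substep ones.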

◆ _readMetadata()

def python.trfArgClasses.argFile._readMetadata (   self,
  files,
  metadataKeys 
)
private

Check metadata is in the cache or generate it if it's missing.

Returns
Dictionary of files with metadata; for any unknown keys 'UNDEFINED' is returned

Definition at line 996 of file trfArgClasses.py.

 996  def _readMetadata(self, files, metadataKeys):
 997      msg.debug('Retrieving metadata keys {1!s} for files {0!s}'.format(files, metadataKeys))
 998      for fname in files:
 999          if fname not in self._fileMetadata:
1000              self._fileMetadata[fname] = {}
1001      for fname in files:
1002          # Always try for a simple existence test first before producing misleading error messages
1003          # from metadata populator functions
1004          if '_exists' not in self._fileMetadata[fname]:
1005              self._metadataKeys['_exists'](files)
1006          if self._fileMetadata[fname]['_exists'] is False:
1007              # N.B. A log ERROR message has printed by the existence test, so do not repeat that news here
1008              for key in metadataKeys:
1009                  if key != '_exists':
1010                      self._fileMetadata[fname][key] = None
1011          else:
1012              # OK, file seems to exist at least...
1013              for key in metadataKeys:
1014                  if key not in self._metadataKeys:
1015                      msg.debug('Metadata key {0} is unknown for {1}'.format(key, self.__class__.__name__))
1016                      self._fileMetadata[fname][key] = 'UNDEFINED'
1017                  else:
1018                      if key in self._fileMetadata[fname]:
1019                          msg.debug('Found cached value for {0}:{1} = {2!s}'.format(fname, key, self._fileMetadata[fname][key]))
1020                      else:
1021                          msg.debug('No cached value for {0}:{1}. Calling generator function {2} ({3})'.format(fname, key, self._metadataKeys[key].__name__, self._metadataKeys[key]))
1022                          try:
1023                              # For efficiency call this routine with all files we have
1024                              msg.info("Metadata generator called to obtain {0} for {1}".format(key, files))
1025                              self._metadataKeys[key](files)
1026                          except trfExceptions.TransformMetadataException as e:
1027                              msg.error('Calling {0!s} raised an exception: {1!s}'.format(self._metadataKeys[key].__name__, e))
1028                          if key not in self._fileMetadata[fname]:
1029                              msg.warning('Call to function {0} for {1} file {2} failed to populate metadata key {3}'.format(self._metadataKeys[key].__name__, self.__class__.__name__, fname, key))
1030                              self._fileMetadata[fname][key] = None
1031                          msg.debug('Now have {0}:{1} = {2!s}'.format(fname, key, self._fileMetadata[fname][key]))
1032 
1033 

◆ _resetMetadata()

def python.trfArgClasses.argFile._resetMetadata (   self,
  files = [] 
)
private

Resets all metadata files in this instance.

Metadata dictionary entry is reset for any files given (default all files) and any files that are no longer in this instance have any metadata removed (useful for self merging).

Note
Metadata is set to {}, except for the case when an explicit GUID option was given

Definition at line 913 of file trfArgClasses.py.

913  def _resetMetadata(self, files=[]):
914      if files == [] or '_fileMetadata' not in dir(self):
915          self._fileMetadata = {}
916          for fname in self.value:
917              self._fileMetadata[fname] = {}
918      else:
919          for fname in files:
920              if fname in self.value:
921                  self._fileMetadata[fname] = {}
922              elif fname in self._fileMetadata:
923                  del self._fileMetadata[fname]
924      msg.debug('Metadata dictionary now {0}'.format(self._fileMetadata))
925 
926      # If we have the special guid option, then manually try to set GUIDs we find
927      if self._guid is not None:
928          msg.debug('Now trying to set file GUID metadata using {0}'.format(self._guid))
929          for fname, guid in self._guid.items():
930              if fname in self._value:
931                  self._fileMetadata[fname]['file_guid'] = guid
932              else:
933                  msg.warning('Explicit GUID {0} was passed for file {1}, but this file is not a member of this instance'.format(guid, fname))
934 

◆ _setMetadata()

def python.trfArgClasses.argFile._setMetadata (   self,
  files = None,
  metadataKeys = {} 
)
private

Set metadata values into the cache.

Manually sets the metadata cache values to the values given in the metadata key dictionary here. This is useful for setting values to make checks on file metadata handling.

Note
To really suppress any external function calls that gather metadata be careful to also set the _exists metadatum to True.
Warning
No checks are done on the values or keys given here, so you'd better know what you are doing.
Parameters
files: Files to set metadata for (None means "all")
metadataKeys: Dictionary with metadata keys and values

Definition at line 1044 of file trfArgClasses.py.

1044  def _setMetadata(self, files=None, metadataKeys={}):
1045      if files is None:
1046          files = self._value
1047      for fname in files:
1048          if fname not in self._fileMetadata:
1049              self._fileMetadata[fname] = {}
1050          for k, v in metadataKeys.items():
1051              msg.debug('Manualy setting {0} for file {1} to {2}'.format(k, fname, v))
1052              self._fileMetadata[fname][k] = v
1053 
1054 

◆ append()

def python.trfArgClasses.argList.append (   self,
  addme 
)
inherited

Append a value to the list.

Parameters
addme: Item to add

Definition at line 398 of file trfArgClasses.py.

398  def append(self, addme):
399      self._value.append(addme)
400 

◆ auxiliaryFile()

def python.trfArgClasses.argFile.auxiliaryFile (   self)

Definition at line 874 of file trfArgClasses.py.

874  def auxiliaryFile(self):
875      return self._auxiliaryFile
876 

◆ dataset() [1/2]

def python.trfArgClasses.argFile.dataset (   self)

Definition at line 814 of file trfArgClasses.py.

814  def dataset(self):
815      return self._dataset
816 

◆ dataset() [2/2]

def python.trfArgClasses.argFile.dataset (   self,
  value 
)

Definition at line 818 of file trfArgClasses.py.

818  def dataset(self, value):
819      self._dataset = value
820 

◆ executor()

def python.trfArgClasses.argFile.executor (   self)

Executor status getter.

Definition at line 638 of file trfArgClasses.py.

638  def executor(self):
639      return self._exe
640 

◆ getMetadata()

def python.trfArgClasses.argFile.getMetadata (   self,
  files = None,
  metadataKeys = None,
  maskMetadataKeys = None,
  populate = True,
  flush = False 
)

Return specific keys for specific files.

Parameters
files: List of files to return metadata for (default - all files in this instance)
metadataKeys: Keys to return (default - all keys valid for this class of files)
maskMetadataKeys: Keys to NOT return (useful when metadataKeys is left as default)
populate: If missing keys should be generated by calling the population subroutines
flush: If cached data should be flushed and the generators rerun

Definition at line 941 of file trfArgClasses.py.

941  def getMetadata(self, files = None, metadataKeys = None, maskMetadataKeys = None, populate = True, flush = False):
942      # Normalise the files and keys parameter
943      if files is None:
944          files = self._value
945      elif isinstance(files, str):
946          files = (files,)
947      msg.debug('getMetadata will examine these files: {0!s}'.format(files))
948 
949      if metadataKeys is None:
950          metadataKeys = list(self._metadataKeys)
951      elif isinstance(metadataKeys, str):
952          metadataKeys = [metadataKeys,]
953      if maskMetadataKeys is not None:
954          metadataKeys = [k for k in metadataKeys if k not in maskMetadataKeys]
955      msg.debug('getMetadata will retrieve these keys: {0!s}'.format(metadataKeys))
956 
957      if flush is True:
958          msg.debug('Flushing cached metadata values')
959          self._resetMetadata()
960 
961      if populate is True:
962          msg.debug('Checking metadata values')
963          self._readMetadata(files, metadataKeys)
964 
965      metadata = {}
966      for fname in files:
967          metadata[fname] = {}
968          for mdkey in metadataKeys:
969              try:
970                  metadata[fname][mdkey] = self._fileMetadata[fname][mdkey]
971              except KeyError:
972                  # This should not happen, unless we skipped populating
973                  if populate:
974                      msg.error('Did not find metadata key {0!s} for file {1!s} - setting to None'.format(mdkey, fname))
975                  metadata[fname][mdkey] = None
976      return metadata
977 

◆ getnentries()

def python.trfArgClasses.argFile.getnentries (   self,
  fast = False 
)

Explicit getter, offering fast switch.

Definition at line 890 of file trfArgClasses.py.

890  def getnentries(self, fast=False):
891      totalEvents = 0
892      for fname in self._value:
893          events = self.getSingleMetadata(fname=fname, metadataKey='nentries', populate = not fast)
894          if events is None:
895              msg.debug('Got events=None for file {0} - returning None for this instance'.format(fname))
896              return None
897          if events == 'UNDEFINED':
898              msg.debug('Got events=UNDEFINED for file {0} - returning UNDEFINED for this instance'.format(fname))
899              return 'UNDEFINED'
900          if not isinstance(events, int):
901              msg.warning('Got unexpected events metadata for file {0}: {1!s} - returning None for this instance'.format(fname, events))
902              return None
903          totalEvents += events
904 
905      return totalEvents
906 
907 
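
The summation rules are easy to misread, so the following sketch restates them over a plain list of per-file values. total_entries is a hypothetical helper; in the class the per-file values come from getSingleMetadata.

```python
def total_entries(per_file_counts):
    """Sum per-file event counts, giving up at the first unusable value."""
    total = 0
    for events in per_file_counts:
        if events is None:
            return None            # count unknown for one file: total unknown
        if events == 'UNDEFINED':
            return 'UNDEFINED'     # file type has no event count at all
        if not isinstance(events, int):
            return None            # unexpected metadata type
        total += events
    return total
```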

◆ getSingleMetadata()

def python.trfArgClasses.argFile.getSingleMetadata (   self,
  fname,
  metadataKey,
  populate = True,
  flush = False 
)

Convenience function to extract a single metadata key for a single file.

Retrieves a single metadata item for a single file, returning it directly

Returns
Single metadata value
Parameters
fname: File to return metadata for
metadataKey: Key to return
populate: If missing key should be generated by calling the population subroutines
flush: If cached data should be flushed and the generator rerun

Definition at line 985 of file trfArgClasses.py.

985  def getSingleMetadata(self, fname, metadataKey, populate = True, flush = False):
986      if not (isinstance(fname, str) and isinstance(metadataKey, str)):
987          raise trfExceptions.TransformInternalException(trfExit.nameToCode('TRF_INTERNAL'),
988                              'Illegal call to getSingleMetadata function: {0!s} {1!s}'.format(fname, metadataKey))
989      md = self.getMetadata(files = fname, metadataKeys = metadataKey, populate = populate, flush = flush)
990      return md[fname][metadataKey]
991 
992 

◆ io() [1/2]

def python.trfArgClasses.argFile.io (   self)

Definition at line 803 of file trfArgClasses.py.

803  def io(self):
804      return (self._io)
805 

◆ io() [2/2]

def python.trfArgClasses.argFile.io (   self,
  value 
)

Definition at line 807 of file trfArgClasses.py.

807  def io(self, value):
808      if value not in ('input', 'output', 'temporary'):
809          raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_RUNTIME_ERROR'),
810                              'File arguments must be specified as input, output or temporary - got {0}'.format(value))
811      self._io = value
812 

◆ isCached()

def python.trfArgClasses.argFile.isCached (   self,
  files = None,
  metadataKeys = None 
)

Test if certain metadata elements are already cached.

Will test for a cached value for all files and all keys given, aborting as soon as it finds a single uncached value.

Parameters
files: Files to check (defaults to all files)
metadataKeys: Keys to check (defaults to all keys)
Returns
True if all keys are cached for all files, False otherwise

Definition at line 1061 of file trfArgClasses.py.

1061  def isCached(self, files = None, metadataKeys = None):
1062      msg.debug('Testing for cached values for files {0} and keys {1}'.format(files, metadataKeys))
1063      if files is None:
1064          files = self._value
1065      elif isinstance(files, str):
1066          files = (files,)
1067      if metadataKeys is None:
1068          metadataKeys = list(self._metadataKeys)
1069      elif isinstance(metadataKeys, str):
1070          metadataKeys = (metadataKeys,)
1071 
1072      isCachedFlag = True
1073      for fname in files:
1074          for key in metadataKeys:
1075              if key not in self._fileMetadata[fname]:
1076                  isCachedFlag = False
1077                  break
1078          if isCachedFlag is False:
1079              break
1080 
1081      return isCachedFlag
1082 
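
The nested loop with early exit is equivalent to a single all() over every file and key pair. The sketch below (is_cached is a hypothetical free function taking the cache dictionary explicitly) shows that form; like the listing, it raises KeyError for a file that has no entry in the cache.

```python
def is_cached(file_metadata, files, keys):
    """True only if every key is present for every file; stops at the first miss."""
    return all(key in file_metadata[fname] for fname in files for key in keys)
```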

◆ isRunarg()

def python.trfArgClasses.argument.isRunarg (   self)
inherited

Return runarg status.

Definition at line 134 of file trfArgClasses.py.

134  def isRunarg(self):
135      return self._runarg
136 

◆ mergeTargetSize() [1/2]

def python.trfArgClasses.argFile.mergeTargetSize (   self)

mergeTargetSize value getter

Definition at line 613 of file trfArgClasses.py.

613  def mergeTargetSize(self):
614      return self._mergeTargetSize
615 

◆ mergeTargetSize() [2/2]

def python.trfArgClasses.argFile.mergeTargetSize (   self,
  value 
)

mergeTargetSize value setter

Definition at line 618 of file trfArgClasses.py.

618  def mergeTargetSize(self, value):
619      if value is None:
620          self._mergeTargetSize = 0
621      else:
622          self._mergeTargetSize = value
623 

◆ metadata()

def python.trfArgClasses.argFile.metadata (   self)

Returns the whole kit and caboodle...

Note
Populates the whole metadata dictionary for this instance

Definition at line 880 of file trfArgClasses.py.

880  def metadata(self):
881      self.getMetadata()
882      return self._fileMetadata
883 

◆ multipleOK() [1/2]

def python.trfArgClasses.argFile.multipleOK (   self)

multipleOK getter

Returns
Current value

Definition at line 603 of file trfArgClasses.py.

603  def multipleOK(self):
604      return self._multipleOK
605 

◆ multipleOK() [2/2]

def python.trfArgClasses.argFile.multipleOK (   self,
  value 
)

multipleOK value setter

Definition at line 608 of file trfArgClasses.py.

608  def multipleOK(self, value):
609      self._multipleOK = value
610 

◆ name() [1/2]

def python.trfArgClasses.argFile.name (   self)

Name getter.

Reimplemented from python.trfArgClasses.argument.

Definition at line 847 of file trfArgClasses.py.

847  def name(self):
848      return self._name
849 

◆ name() [2/2]

def python.trfArgClasses.argFile.name (   self,
  value 
)

Name setter.

Note
This property setter will also set the type and subtype of the argFile if they are not yet set. This means that for most arguments the type and subtype are automatically set correctly.

Reimplemented from python.trfArgClasses.argument.

Definition at line 855 of file trfArgClasses.py.

855  def name(self, value):
856      self._name = value
857      m = re.match(r'(input|output|tmp.)([A-Za-z0-9_]+?)(File)?$', value)
858      if m:
859          msg.debug("ArgFile name setter matched this: {0}".format(m.groups()))
860          if self._type is None:
861              dtype = m.group(2).split('_', 1)[0]
862              # But DRAW/DESD/DAOD are really just RAW, ESD, AOD in format
863              if re.match(r'D(RAW|ESD|AOD)', dtype):
864                  dtype = dtype[1:]
865              msg.debug("Autoset data type to {0}".format(dtype))
866              self._type = dtype
867          if self._subtype is None:
868              msg.debug("Autoset data subtype to {0}".format(m.group(2)))
869              self._subtype = m.group(2)
870      else:
871          msg.debug("ArgFile name setter did not match against '{0}'".format(value))
872 
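
To see what the name setter derives from a typical argument name, the matching logic can be lifted into a free function. types_from_name is a hypothetical helper that reuses the two regular expressions from the listing and returns the values instead of storing them on the instance.

```python
import re

# Same pattern as in the setter: prefix, data type portion, optional 'File' suffix
_NAME_RE = re.compile(r'(input|output|tmp.)([A-Za-z0-9_]+?)(File)?$')


def types_from_name(name):
    """Return (type, subtype) derived from an argument name, or (None, None)."""
    m = _NAME_RE.match(name)
    if not m:
        return None, None
    subtype = m.group(2)
    dtype = subtype.split('_', 1)[0]
    # DRAW/DESD/DAOD are treated as RAW/ESD/AOD for the major type
    if re.match(r'D(RAW|ESD|AOD)', dtype):
        dtype = dtype[1:]
    return dtype, subtype
```

For the example from the constructor documentation, outputDESD_SGLMUFile, this gives the type ESD and the subtype DESD_SGLMU.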

◆ nentries()

def python.trfArgClasses.argFile.nentries (   self)

Return total number of events in all constituent files.

Definition at line 886 of file trfArgClasses.py.

886  def nentries(self):
887      return self.getnentries()
888 

◆ originalName()

def python.trfArgClasses.argFile.originalName (   self,
  value 
)

Definition at line 826 of file trfArgClasses.py.

826  def originalName(self, value):
827      self._originalName = value
828 

◆ orignalName()

def python.trfArgClasses.argFile.orignalName (   self)

Definition at line 822 of file trfArgClasses.py.

822  def orignalName(self):
823      return self._originalName
824 

◆ prodsysDescription()

def python.trfArgClasses.argFile.prodsysDescription (   self)

Reimplemented from python.trfArgClasses.argList.

Reimplemented in python.trfArgClasses.argFTKIPFile, python.trfArgClasses.argBZ2File, python.trfArgClasses.argNTUPFile, python.trfArgClasses.argHISTFile, python.trfArgClasses.argPOOLFile, python.trfArgClasses.argBSFile, and python.trfArgClasses.argAthenaFile.

Definition at line 625 of file trfArgClasses.py.

625  def prodsysDescription(self):
626      if isinstance(self._type, dict):
627          if self._type=={}:
628              desc = {'type' : 'file', 'subtype' : "NONE" }
629          else:
630              desc = {'type' : 'file', 'subtype' : dict((str(k).upper(), str(v).upper()) for (k,v) in self._type.items())}
631      else:
632          desc = {'type' : 'file', 'subtype' : str(self._type).upper()}
633      desc['multiple'] = self._multipleOK
634      return desc
635 

◆ subtype() [1/2]

def python.trfArgClasses.argFile.subtype (   self)

Definition at line 838 of file trfArgClasses.py.

838      def subtype(self):
839          return self._subtype
840 

◆ subtype() [2/2]

def python.trfArgClasses.argFile.subtype (   self,
  value 
)

Definition at line 842 of file trfArgClasses.py.

842      def subtype(self, value):
843          self._subtype = value
844 

◆ type() [1/2]

def python.trfArgClasses.argFile.type (   self)

Definition at line 830 of file trfArgClasses.py.

830      def type(self):
831          return self._type
832 

◆ type() [2/2]

def python.trfArgClasses.argFile.type (   self,
  value 
)

Definition at line 834 of file trfArgClasses.py.

834      def type(self, value):
835          self._type = value
836 

◆ value() [1/2]

def python.trfArgClasses.argFile.value (   self)

Argument value getter.

Returns
Current value

Reimplemented from python.trfArgClasses.argList.

Definition at line 591 of file trfArgClasses.py.

591      def value(self):
592          return self._value
593 

◆ value() [2/2]

def python.trfArgClasses.argFile.value (   self,
  value 
)

Argument value setter.

Calls the valueSetter function with the standard options

Reimplemented from python.trfArgClasses.argList.

Definition at line 597 of file trfArgClasses.py.

597      def value(self, value):
598          self.valueSetter(value)
599 

◆ valueSetter()

def python.trfArgClasses.argFile.valueSetter (   self,
  value 
)

Set the argFile value, but allow parameters here.

Note
Normally Athena takes only a single value for an output file, but when AthenaMP runs it can produce multiple output files; this is allowed by setting multipleOK = True.
The setter protects against the same file being added multiple times.

Definition at line 645 of file trfArgClasses.py.

645      def valueSetter(self, value):
646 
647          if isinstance(value, (list, tuple)):
648              if len(value) > 0 and isinstance(value[0], dict): # Tier-0 style expanded argument with metadata
649                  self._value=[]
650                  for myfile in value:
651                      try:
652                          self._value.append(myfile['lfn'])
653                          self._resetMetadata(files = [myfile['lfn']])
654                      except KeyError:
655                          raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_ARG_CONV_FAIL'),
656                              'Filename (key "lfn") not found in Tier-0 file dictionary: {0}'.format(myfile))
657                      for k, v in myfile.items():
658                          if k == 'guid':
659                              self._setMetadata([myfile['lfn']], {'file_guid': v})
660                          elif k == 'events':
661                              self._setMetadata([myfile['lfn']], {'nentries': v})
662                          elif k == 'checksum':
663                              self._setMetadata([myfile['lfn']], {'checksum': v})
664                          elif k == 'dsn':
665                              if not self._dataset:
666                                  self.dataset = v
667                              elif self.dataset != v:
668                                  raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_ARG_DATASET'),
669                                      'Inconsistent dataset names in Tier-0 dictionary: {0} != {1}'.format(self.dataset, v))
670              else:
671                  self._value = list(value)
672                  self._getDatasetFromFilename(reset = False)
673                  self._resetMetadata()
674          elif value is None:
675              self._value = []
676              return
677          else:
678              try:
679                  if value.lower().startswith('lfn'):
680                      # Resolve physical filename using pool file catalog.
681                      from PyUtils.PoolFile import file_name
682                      protocol, pfn = file_name(value)
683                      self._value = [pfn]
684                      self._getDatasetFromFilename(reset = False)
685                      self._resetMetadata()
686                  else:
687                      self._value = value.split(self._splitter)
688                      self._getDatasetFromFilename(reset = False)
689                      self._resetMetadata()
690              except (AttributeError, TypeError):
691                  raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_ARG_CONV_FAIL'),
692                      'Failed to convert %s to a list' % str(value))
693 
694 
695          deDuplicatedValue = []
696          for fname in self._value:
697              if fname not in deDuplicatedValue:
698                  deDuplicatedValue.append(fname)
699              else:
700                  msg.warning("Removing duplicated file {0} from file list".format(fname))
701          if len(self._value) != len(deDuplicatedValue):
702              self._value = deDuplicatedValue
703              msg.warning('File list after duplicate removal: {0}'.format(self._value))
704 
705          # Find our URL type (if we actually have files!)
706          # At the moment this is assumed to be the same for all files in this instance
707          # although in principle one could mix different access methods in the one input file type
708          if len(self._value) > 0:
709              self._urlType = urlType(self._value[0])
710          else:
711              self._urlType = None
712 
713 
714          if self._io == 'input':
715 
719              if self._urlType == 'posix':
720                  msg.debug('Found POSIX filesystem input - activating globbing')
721                  newValue = []
722                  for filename in self._value:
723                      # Simple case
724                      globbedFiles = glob.glob(filename)
725                      if len(globbedFiles) == 0: # No files globbed for this 'filename' argument.
726                          raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_INPUT_FILE_ERROR'),
727                              'Input file argument {0} globbed to NO input files - probably the file(s) are missing'.format(filename))
728 
729                      globbedFiles.sort()
730                      newValue.extend(globbedFiles)
731 
732                  self._value = newValue
733                  msg.debug ('File input is globbed to %s' % self._value)
734 
735              elif self._urlType == 'root':
736                  msg.debug('Found root filesystem input - activating globbing')
737                  newValue = []
738                  for filename in self._value:
739                      if str(filename).startswith("root"):
740                          msg.debug('Found input file name starting with "root," setting XRD_RUNFORKHANDLER=1, which enables fork handlers for xrootd in direct I/O')
741                          os.environ["XRD_RUNFORKHANDLER"] = "1"
742                      if str(filename).startswith("https") or str(filename).startswith("davs") or not(str(filename).endswith('/')) and '*' not in filename and '?' not in filename:
743                          msg.debug('Seems that only one file was given: {0}'.format(filename))
744                          newValue.extend(([filename]))
745                      else:
746                          # Hopefully this recognised wildcards...
747                          path = filename
748                          fileMask = ''
749                          if '*' in filename or '?' in filename:
750                              msg.debug('Split input into path for listdir() and a filemask to select available files.')
751                              path = filename[0:filename.rfind('/')+1]
752                              msg.debug('path: {0}'.format(path))
753                              fileMask = filename[filename.rfind('/')+1:len(filename)]
754                              msg.debug('Will select according to: {0}'.format(fileMask))
755 
756                          cmd = ['/afs/cern.ch/project/eos/installation/atlas/bin/eos.select' ]
757                          if not os.access ('/afs/cern.ch/project/eos/installation/atlas/bin/eos.select', os.X_OK ):
758                              raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_INPUT_FILE_ERROR'),
759                                  'No execute access to "eos.select" - could not glob EOS input files.')
760 
761                          cmd.extend(['ls'])
762                          cmd.extend([path])
763 
764                          myFiles = []
765                          try:
766                              proc = subprocess.Popen(args = cmd,bufsize = 1, shell = False, stdout = subprocess.PIPE, stderr = subprocess.STDOUT)
767                              rc = proc.wait()
768                              output = proc.stdout.readlines()
769                              if rc!=0:
770                                  raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_INPUT_FILE_ERROR'),
771                                      'EOS list command ("{0!s}") failed: rc {1}, output {2}'.format(cmd, rc, output))
772                              msg.debug("eos returned: {0}".format(output))
773                              for line in output:
774                                  if "root" in line:
775                                      myFiles += [str(path)+str(line.rstrip('\n'))]
776 
777                              patt = re.compile(fileMask.replace('*','.*').replace('?','.'))
778                              for srmFile in myFiles:
779                                  if fileMask != '':
780                                      if(patt.search(srmFile)) is not None:
781                                      #if fnmatch.fnmatch(srmFile, fileMask):
782                                          msg.debug('match: {0}'.format(srmFile))
783                                          newValue.extend(([srmFile]))
784                                  else:
785                                      newValue.extend(([srmFile]))
786 
787                              msg.debug('Selected files: {0}'.format(newValue))
788                          except (AttributeError, TypeError, OSError):
789                              raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_RUNTIME_ERROR'),
790                                  'Failed to convert %s to a list' % str(value))
791                  if len(self._value) > 0 and len(newValue) == 0:
792                      # Woops - no files!
793                      raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_INPUT_FILE_ERROR'),
794                          'Input file argument(s) {0!s} globbed to NO input files - ls command failed'.format(self._value))
795                  self._value = newValue
796                  msg.debug ('File input is globbed to %s' % self._value)
797          # Check if multiple outputs are ok for this object
798          elif self._multipleOK is False and len(self._value) > 1:
799              raise trfExceptions.TransformArgException(trfExit.nameToCode('TRF_OUTPUT_FILE_ERROR'),
800                  'Multiple file arguments are not supported for {0} (was given: {1})'.format(self, self._value))
801 
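The wildcard handling in the root branch of valueSetter splits each argument into a directory to list and a file mask, then filters the listing with a regular expression built from the mask. The following is a minimal sketch of those two steps only, with no call to eos.select; the function names and the example URLs are hypothetical.

```python
import re


def split_path_and_mask(filename):
    """Hypothetical helper: split 'dir/pattern' into a directory and a wildcard mask."""
    if '*' in filename or '?' in filename:
        cut = filename.rfind('/') + 1
        return filename[:cut], filename[cut:]
    # No wildcard: the whole argument is the path and there is no mask.
    return filename, ''


def select_by_mask(candidates, file_mask):
    """Hypothetical helper: keep the candidates that match the mask."""
    if file_mask == '':
        return list(candidates)
    # Same translation as the listing: '*' becomes '.*' and '?' becomes '.'.
    patt = re.compile(file_mask.replace('*', '.*').replace('?', '.'))
    return [c for c in candidates if patt.search(c) is not None]


path, mask = split_path_and_mask("root://host//data/run1/AOD.*.pool.root")
listing = [path + "AOD.01.pool.root", path + "ESD.01.pool.root"]
print(select_by_mask(listing, mask))
```

Because the mask is translated with plain string replacement and applied with search(), a literal '.' in the mask matches any character and the match is not anchored to the start or end of the name. The commented-out fnmatch.fnmatch call in the listing would be a stricter alternative.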

Member Data Documentation

◆ _auxiliaryFile

python.trfArgClasses.argFile._auxiliaryFile
private

Definition at line 553 of file trfArgClasses.py.

◆ _dataset

python.trfArgClasses.argFile._dataset
private

Definition at line 547 of file trfArgClasses.py.

◆ _exe

python.trfArgClasses.argFile._exe
private

Definition at line 559 of file trfArgClasses.py.

◆ _fileMetadata

python.trfArgClasses.argFile._fileMetadata
private

Definition at line 574 of file trfArgClasses.py.

◆ _guid

python.trfArgClasses.argFile._guid
private

Definition at line 551 of file trfArgClasses.py.

◆ _io

python.trfArgClasses.argFile._io
private

Input file globbing and expansion.

Definition at line 576 of file trfArgClasses.py.

◆ _mergeTargetSize

python.trfArgClasses.argFile._mergeTargetSize
private

Definition at line 552 of file trfArgClasses.py.

◆ _metadataKeys

python.trfArgClasses.argFile._metadataKeys
private
Note
Variable listing the set of file metadata which corresponds to this class. The key is the metadata variable name; the value is the function to call to populate or refresh this metadata value. The function must take a single parameter, which is the list of files to get metadata for. It must return a metadata dictionary: {file1 : {key1: value1, key2: value2}, file2: ...}. Keys which start with _ are for transform internal use and should not appear in jobReports.

Definition at line 569 of file trfArgClasses.py.
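The contract described in the note for _metadataKeys (a key maps to a function that takes a list of files and returns a per-file dictionary) can be illustrated as follows. This is a minimal sketch under stated assumptions: the populator functions, the key names and the report_view helper are all hypothetical and do not appear in trfArgClasses.

```python
def get_name_length(files):
    """Hypothetical populator: returns {file: {key: value}} for every file."""
    return {fname: {'name_length': len(fname)} for fname in files}


def get_internal_flag(files):
    """Hypothetical populator for an internal key (leading underscore)."""
    return {fname: {'_checked': False} for fname in files}


# Key is the metadata variable name, value is the function that populates it.
metadata_keys = {
    'name_length': get_name_length,
    '_checked': get_internal_flag,
}


def collect_metadata(files, keys=None):
    """Call the populator for each requested key and merge the results per file."""
    result = {fname: {} for fname in files}
    for key in (keys if keys is not None else metadata_keys):
        for fname, values in metadata_keys[key](files).items():
            result[fname].update(values)
    return result


def report_view(metadata):
    """Drop keys starting with '_' so they would not reach a job report."""
    return {fname: {k: v for k, v in md.items() if not k.startswith('_')}
            for fname, md in metadata.items()}


print(report_view(collect_metadata(['a.root'])))
```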

◆ _multipleOK

python.trfArgClasses.argFile._multipleOK
private

Definition at line 577 of file trfArgClasses.py.

◆ _name

python.trfArgClasses.argFile._name
private

Definition at line 856 of file trfArgClasses.py.

◆ _originalName

python.trfArgClasses.argFile._originalName
private

Definition at line 554 of file trfArgClasses.py.

◆ _runarg

python.trfArgClasses.argument._runarg
privateinherited

Definition at line 110 of file trfArgClasses.py.

◆ _splitter

python.trfArgClasses.argList._splitter
privateinherited

Definition at line 357 of file trfArgClasses.py.

◆ _subtype

python.trfArgClasses.argFile._subtype
private

Definition at line 550 of file trfArgClasses.py.

◆ _supressEmptyStrings

python.trfArgClasses.argList._supressEmptyStrings
privateinherited

Definition at line 358 of file trfArgClasses.py.

◆ _type

python.trfArgClasses.argFile._type
private

Definition at line 549 of file trfArgClasses.py.

◆ _urlType

python.trfArgClasses.argFile._urlType
private
Note
TODO: Non-posix URLs. The problem is not so much the [] expansion as the invisible .N attempt number, which can only be dealt with using listdir() functionality. N.B. Current transforms only do globbing on posix filesystems too (see trfutil.expandStringToList()).

Definition at line 548 of file trfArgClasses.py.
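For the posix case mentioned in the note for _urlType, globbing amounts to expanding each argument with the standard glob module and failing if nothing matches. The following is a minimal sketch: expand_posix_inputs is a hypothetical name, and it raises FileNotFoundError where the transform code raises its own TransformArgException.

```python
import glob


def expand_posix_inputs(patterns):
    """Hypothetical helper: expand each glob pattern, keeping argument order."""
    expanded = []
    for pattern in patterns:
        # Sort the matches of each pattern so the result is reproducible.
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise FileNotFoundError(
                'Input file argument {0} globbed to no input files'.format(pattern))
        expanded.extend(matches)
    return expanded
```

Sorting is applied per pattern, so the order in which the arguments were given is preserved across patterns.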

◆ _value

python.trfArgClasses.argFile._value
private
Note
First do parsing of string vs. lists to get list of files
Check for duplicates (N.B. preserve the order, just remove the duplicates)

Definition at line 649 of file trfArgClasses.py.
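The two steps named in the note for _value, turning the argument into a list and then removing duplicates while preserving order, can be sketched as below. This is an illustration only: the function names are hypothetical, and the Tier-0 dictionary form and LFN resolution handled by valueSetter are deliberately left out.

```python
def normalise_file_list(value, splitter=','):
    """Hypothetical helper: turn None, a list/tuple or a string into a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    # Anything else is expected to be a string of names joined by the splitter.
    return value.split(splitter)


def remove_duplicates(files):
    """Drop repeated names, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for fname in files:
        if fname not in seen:
            seen.add(fname)
            unique.append(fname)
    return unique


print(remove_duplicates(normalise_file_list("a.root,b.root,a.root")))
```

The sketch tracks seen names in a set, whereas the listing tests membership in the output list directly; the resulting order is the same.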

◆ dataset

python.trfArgClasses.argFile.dataset

Definition at line 666 of file trfArgClasses.py.

◆ io

python.trfArgClasses.argFile.io

Definition at line 557 of file trfArgClasses.py.

◆ value

python.trfArgClasses.argument.value
inherited
Note
We have a default of None here, but all derived classes should definitely have their own value setter and translate this value to something sensible for their underlying value type. N.B. As most argument classes use this default constructor, it must call the @value.setter function!

Definition at line 118 of file trfArgClasses.py.
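The requirement in the note for value, that the shared constructor must go through the value setter, follows from how Python properties are resolved. The following is a minimal sketch with hypothetical class names, not the actual argument hierarchy: the base constructor assigns to self.value, so a derived class that redefines the property gets its own setter called.

```python
class Argument:
    """Hypothetical base class with a default of None."""

    def __init__(self, value=None):
        # Assign through the property so the setter of the actual class runs.
        self.value = value

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, value):
        self._value = value


class IntArgument(Argument):
    """Hypothetical derived class translating the value to its own type."""

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, value):
        # Translate the None default into something sensible for this type.
        self._value = 0 if value is None else int(value)


print(IntArgument("42").value, IntArgument().value)
```

Had the base constructor written to self._value directly, the derived setter would be bypassed and the None default would never be translated.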


The documentation for this class was generated from the following file:
trfArgClasses.py