ATLAS Offline Software
EL::DuplicateChecker Class Reference

#include <DuplicateChecker.h>


Public Member Functions

void testInvariant () const
 test the invariant of this object
 
 DuplicateChecker ()
 standard constructor
 
const std::string & eventInfoName () const
 the name of the EventInfo structure to use
 
void setEventInfoName (const std::string &val_eventInfoName)
 set the value of eventInfoName
 
const std::string & outputTreeName () const
 the name of the output tree to create, or the empty string if none is created
 
void setOutputTreeName (const std::string &val_outputTreeName)
 set the value of outputTreeName
 
void addKnownDuplicate (const std::string &sampleName, const std::string &fileName, Long64_t entry, number_type runNumber, number_type eventNumber)
 add a known duplicate event
 
void addKnownDuplicatesFile (const std::string &duplicatesFile)
 add a file with known duplicates
 
IWorker * wk () const
 description: the worker that is controlling us; guarantee: no-fail
 
void book (const TH1 &hist)
 book the given histogram
 
TH1 * hist (const std::string &name) const
 get the histogram with the given name
 
asg::SgTEvent * evtStore () const
 get the (main) event store for this algorithm
 
virtual const std::string & name () const
 

Static Public Member Functions

static bool processSummary (const std::string &submitdir, const std::string &treeName)
 process the summary tree from the given submission
 
static bool processSummary (const SH::SampleHandler &sh, const std::string &outputFile)
 process the summary tree from the given submission
 

Private Types

typedef uint32_t number_type
 the integer type to use for run and event numbers
 

Private Member Functions

virtual StatusCode setupJob (Job &job) override
 effects: give the algorithm a chance to initialize the job with anything this algorithm needs.
 
virtual StatusCode changeInput (bool firstFile) override
 effects: do all changes to work with a new input file, e.g. set new branch addresses.
 
virtual StatusCode initialize () override
 effects: do everything that needs to be done before running the algorithm, e.g. create output n-tuples and histograms.
 
virtual StatusCode execute () override
 effects: process the next event; guarantee: basic; failures: algorithm dependent
 
void read_run_event_number ()
 get the run and event number for the current event
 
 ClassDef (DuplicateChecker, 1)
 
virtual StatusCode fileExecute ()
 effects: do all the processing that needs to be done once per file
 
virtual StatusCode endOfFile ()
 effects: do the post-processing for each input file; guarantee: basic; failures: algorithm dependent; rationale: this is mainly used for specialized services that need to save partial results for each input file
 
virtual StatusCode histInitialize ()
 effects: this is a pre-initialization routine that is called before changeInput is called.
 
virtual StatusCode postExecute ()
 effects: do the post-processing for the event; guarantee: basic; failures: algorithm dependent; rationale: this is mainly used for specialized services that need to get input from subsequent algorithms before filling their event data
 
virtual StatusCode finalize ()
 effects: do everything that needs to be done after completing work on this worker; guarantee: basic; failures: algorithm dependent; rationale: currently there is no use foreseen, but this routine is provided regardless
 
virtual StatusCode histFinalize ()
 effects: this is a post-finalization routine that is called after finalize has been called.
 
virtual bool hasName (const std::string &name) const
 returns: whether this algorithm has the given name; guarantee: basic; failures: algorithm dependent; rationale: this is to allow an algorithm to be known by multiple names.
 
void sysSetupJob (Job &job)
 effects: give the algorithm a chance to initialize the job with anything this algorithm needs.
 

Private Attributes

std::string m_eventInfoName
 the value returned by eventInfoName
 
std::string m_outputTreeName
 the value returned by outputTreeName
 
std::map< std::pair< std::string, std::string >, std::map< Long64_t, std::pair< number_type, number_type > > > m_duplicates
 the list of known duplicates to skip
 
std::map< Long64_t, std::pair< number_type, number_type > > * m_currentDuplicates = nullptr
 the list of the duplicates in the current file to skip, or the null pointer if there are none
 
std::set< std::pair< number_type, number_type > > m_processed
 the list of run-event numbers already encountered
 
xAOD::TEvent * m_event = nullptr
 the event we are reading from
 
TTree * m_outputTree = nullptr
 the output tree, if we are creating one
 
std::string m_inputFileName
 the name of the input file (connected to m_outputTree, if present)
 
Long64_t m_inputFileIndex
 the index in the input file (connected to m_outputTree, if present)
 
number_type m_runNumber
 the run number of the current event (connected to m_outputTree, if present)
 
number_type m_eventNumber
 the event number of the current event (connected to m_outputTree, if present)
 
Bool_t m_processEvent
 whether the current event is/should be processed (connected to m_outputTree, if present)
 
IWorker * m_wk
 
asg::SgTEvent * m_evtStorePtr = nullptr
 the value of evtStore
 
asg::SgTEvent m_evtStore
 when configured, the object returned by evtStore
 
MsgStream * m_msg = nullptr
 the message stream, if it has been instantiated
 
std::string m_msgName
 the algorithm name for which the message stream has been instantiated
 
int m_msgLevel = 3
 the message level configured
 
std::string m_nameCache
 the cache for name
 

Detailed Description

Todo:
add documentation

Definition at line 30 of file DuplicateChecker.h.

Member Typedef Documentation

◆ number_type

typedef uint32_t EL::DuplicateChecker::number_type
private

the integer type to use for run and event numbers

Definition at line 38 of file DuplicateChecker.h.

Constructor & Destructor Documentation

◆ DuplicateChecker()

EL::DuplicateChecker::DuplicateChecker ( )

standard constructor

Guarantee
strong
Failures
out of memory I

Member Function Documentation

◆ addKnownDuplicate()

void EL::DuplicateChecker::addKnownDuplicate ( const std::string &  sampleName,
const std::string &  fileName,
Long64_t  entry,
number_type  runNumber,
number_type  eventNumber 
)

add a known duplicate event

Guarantee
strong
Failures
out of memory II
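
The arguments of addKnownDuplicate() line up with the nested map type documented for m_duplicates: a (sample name, file name) key, then an entry index, then a (run, event) pair. The following is a minimal standalone sketch of that bookkeeping, not the ATLAS implementation. The names KnownDuplicates, add and forFile are hypothetical, `long long` stands in for ROOT's Long64_t, and the non-negative entry check is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the types used by the class (not the ATLAS code).
using number_type = std::uint32_t;                      // run and event numbers
using RunEvent = std::pair<number_type, number_type>;   // (run, event)
using EntryMap = std::map<long long, RunEvent>;         // entry index -> (run, event)
using FileKey = std::pair<std::string, std::string>;    // (sample name, file name)

struct KnownDuplicates {
  std::map<FileKey, EntryMap> duplicates;

  // Record that `entry` of the given file in the given sample is a duplicate.
  void add(const std::string& sampleName, const std::string& fileName,
           long long entry, number_type run, number_type event) {
    assert(entry >= 0);  // assumed precondition: entry indices are non-negative
    duplicates[FileKey(sampleName, fileName)][entry] = RunEvent(run, event);
  }

  // Return the duplicates of one file, or nullptr if there are none
  // (the same convention documented for m_currentDuplicates).
  const EntryMap* forFile(const std::string& sampleName,
                          const std::string& fileName) const {
    auto it = duplicates.find(FileKey(sampleName, fileName));
    return it == duplicates.end() ? nullptr : &it->second;
  }
};
```

Keying the outer map by file means that a lookup is only needed once per input file; per event, only the inner map for the current file has to be consulted.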

◆ addKnownDuplicatesFile()

void EL::DuplicateChecker::addKnownDuplicatesFile ( const std::string &  duplicatesFile)

add a file with known duplicates

Guarantee
strong
Failures
i/o errors
out of memory III

◆ book()

void EL::Algorithm::book ( const TH1 &  hist)
inherited

book the given histogram

Guarantee
strong
Failures
histogram booking error

◆ changeInput()

virtual StatusCode EL::DuplicateChecker::changeInput ( bool  firstFile)
override private virtual

effects: do all changes to work with a new input file, e.g. set new branch addresses. If firstFile is set, this method is called just before initialize() is called.

Warning: If a file is split across multiple jobs, this will be called more than once. This only happens for specific batch drivers and/or if it is explicitly configured by the user. With PROOF it could even happen multiple times within the same job, and while PROOF is no longer supported, that behavior may come back if support for a similar framework is added in the future. As such, this method should not be used for accounting that relies on being called exactly once per file. Take a look at fileExecute() if you want something that is guaranteed to be executed exactly once per input file.

Warning: The execution order of changeInput and fileExecute is currently unspecified.

guarantee: basic
failures: algorithm dependent

Reimplemented from EL::Algorithm.
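
To make the warning about split files concrete, here is a small standalone simulation. It is only a sketch under stated assumptions: Range, Counters and simulate are hypothetical names, and the rule that the job starting at entry 0 is the one that runs fileExecute is invented for the illustration (the documentation only says that exactly one job does).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a job reads a list of entry ranges from input files.
struct Range {
  std::string file;
  long long begin;  // first entry read (inclusive)
  long long end;    // last entry read (exclusive)
};

struct Counters {
  int changeInputCalls = 0;
  int fileExecuteCalls = 0;
};

// Count how often each hook would fire across all jobs.
Counters simulate(const std::vector<std::vector<Range>>& jobs) {
  Counters c;
  for (const auto& job : jobs) {
    for (const Range& r : job) {
      assert(r.begin <= r.end);
      // Every job that opens the file sees a changeInput call.
      ++c.changeInputCalls;
      // Assumption for this sketch: the job holding entry 0 owns the file.
      if (r.begin == 0) ++c.fileExecuteCalls;
    }
  }
  return c;
}
```

With one file split across two jobs and a second file read whole, the sketch yields three changeInput calls but only two fileExecute calls, which is why per-file accounting belongs in fileExecute.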

◆ ClassDef()

EL::DuplicateChecker::ClassDef ( DuplicateChecker  , 1 )
private

◆ endOfFile()

virtual StatusCode EL::Algorithm::endOfFile ( )
private virtual inherited

effects: do the post-processing for each input file
guarantee: basic
failures: algorithm dependent
rationale: this is mainly used for specialized services that need to save partial results for each input file

Reimplemented in EL::MetricsSvc.

◆ eventInfoName()

const std::string& EL::DuplicateChecker::eventInfoName ( ) const

the name of the EventInfo structure to use

This is mostly meant so that in my unit test code I can point it to my own specially prepared EventInfo.

Guarantee
no-fail

◆ evtStore()

asg::SgTEvent* EL::Algorithm::evtStore ( ) const
inherited

get the (main) event store for this algorithm

This is mostly to mirror the method of the same name in AthAlgorithm, which allows the tutorial instructions to be more dual-use.

Guarantee
strong
Failures
out of memory I
job not configured for xAODs

◆ execute()

virtual StatusCode EL::DuplicateChecker::execute ( )
override private virtual

effects: process the next event
guarantee: basic
failures: algorithm dependent

Reimplemented from EL::Algorithm.

◆ fileExecute()

virtual StatusCode EL::Algorithm::fileExecute ( )
private virtual inherited

effects: do all the processing that needs to be done once per file

Warning: The user should not expect this to be called at any particular point in execution. If a file is split between multiple jobs this will be called in only one of these jobs, and not the others. It usually gets called before the first event in a file, but that is not guaranteed and relying on this is a bug. Take a look at changeInput if you want something that is guaranteed to be executed at the beginning of each input file.

Warning: The execution order of changeInput and fileExecute is currently unspecified.

guarantee: basic
failures: algorithm dependent
rationale: this is to read per-file accounting data, e.g. the list of lumi-blocks processed

Reimplemented in EL::UnitTestAlg1, EL::UnitTestAlg, EL::UnitTestAlgXAOD, and EL::MetricsSvc.

◆ finalize()

virtual StatusCode EL::Algorithm::finalize ( )
private virtual inherited

effects: do everything that needs to be done after completing work on this worker
guarantee: basic
failures: algorithm dependent
rationale: currently there is no use foreseen, but this routine is provided regardless

Reimplemented in EL::UnitTestAlg1, EL::UnitTestAlg, and EL::UnitTestAlgXAOD.

◆ hasName()

virtual bool EL::Algorithm::hasName ( const std::string &  name) const
private virtual inherited

returns: whether this algorithm has the given name
guarantee: basic
failures: algorithm dependent
rationale: this is to allow an algorithm to be known by multiple names. This is needed for NTupleSvc, so that it can be located with and without the output tree name.

Reimplemented in EL::NTupleSvc.

◆ hist()

TH1* EL::Algorithm::hist ( const std::string &  name) const
inherited

get the histogram with the given name

Guarantee
strong
Failures
histogram not found

◆ histFinalize()

virtual StatusCode EL::Algorithm::histFinalize ( )
private virtual inherited

effects: this is a post-finalization routine that is called after finalize has been called.
guarantee: basic
failures: algorithm dependent
rationale: unlike finalize(), this method is always called, even on empty input files.

Reimplemented in EL::UnitTestAlg1, EL::UnitTestAlg, EL::UnitTestAlgXAOD, and EL::MetricsSvc.

◆ histInitialize()

virtual StatusCode EL::Algorithm::histInitialize ( )
private virtual inherited

effects: this is a pre-initialization routine that is called before changeInput is called.
guarantee: basic
failures: algorithm dependent
rationale: unlike initialize(), this method is always called, even on empty input files, so you should create any histograms or n-tuples here that subsequent code expects.

Reimplemented in EL::UnitTestAlg1, EL::UnitTestAlg, EL::UnitTestAlgXAOD, EL::MetricsSvc, and EL::VomsProxySvc.

◆ initialize()

virtual StatusCode EL::DuplicateChecker::initialize ( )
override private virtual

effects: do everything that needs to be done before running the algorithm, e.g. create output n-tuples and histograms. This method is called only once, right after changeInput(true) is called.
guarantee: basic
failures: algorithm dependent
rationale: in principle all this work could be done in changeInput(true). However, providing this method should make it easier for users to set up all their outputs and to do so only once.

Reimplemented from EL::Algorithm.

◆ msg() [1/2]

MsgStream& EL::Algorithm::msg ( ) const
inherited

messaging interface

this is the interface to work with the standard messaging macros from AsgTools. Instead of enums I pass ints, so that I can avoid the include dependency (forward declarations are only allowed for enum classes AFAIK).

the standard message stream for this object

Guarantee
strong
Failures
code not compiled with AsgTools support

◆ msg() [2/2]

MsgStream& EL::Algorithm::msg ( int  level) const
inherited

the message stream for this object, configured for the given level

Guarantee
strong
Failures
code not compiled with AsgTools support

◆ msgLvl()

bool EL::Algorithm::msgLvl ( int  lvl) const
inherited

whether we are configured to print messages at the given level

Guarantee
no-fail
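
As an illustration of how an integer message level can act as a threshold, here is a minimal standalone sketch. It is not the EL::Algorithm implementation: LevelFilter and the MsgLevel constants are hypothetical, the numbering is assumed to follow the Gaudi/AsgTools MSG::Level convention (where 3 is INFO, matching the documented default of m_msgLevel), and the comparison rule is an assumption.

```cpp
#include <cassert>

// Level numbers assumed to follow the Gaudi/AsgTools MSG::Level convention.
namespace MsgLevel {
constexpr int kVerbose = 1;
constexpr int kDebug = 2;
constexpr int kInfo = 3;
constexpr int kWarning = 4;
constexpr int kError = 5;
}  // namespace MsgLevel

// Hypothetical stand-in for the message-level handling of an algorithm.
class LevelFilter {
 public:
  void setMsgLevel(int level) {
    assert(level >= 0);  // illustrative precondition, not from the source
    m_msgLevel = level;
  }

  // Assumed rule: a message is printed if it is at least as severe as the
  // configured threshold.
  bool msgLvl(int lvl) const { return m_msgLevel <= lvl; }

 private:
  int m_msgLevel = MsgLevel::kInfo;  // same default value as documented (3)
};
```

Passing plain ints instead of an enum keeps the sketch free of any dependency on the header that defines the levels.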

◆ name()

virtual const std::string& EL::Algorithm::name ( ) const
virtual inherited

◆ outputTreeName()

const std::string& EL::DuplicateChecker::outputTreeName ( ) const

the name of the output tree to create, or the empty string if none is created

The output tree contains a list of run and event numbers for all events, and whether they were processed by this job. This can be used to check whether duplicate events were processed (or whether we somehow eliminated events as duplicates that we shouldn't have). It can also be used to create a list of duplicate events for future processing rounds.

◆ postExecute()

virtual StatusCode EL::Algorithm::postExecute ( )
private virtual inherited

effects: do the post-processing for the event
guarantee: basic
failures: algorithm dependent
rationale: this is mainly used for specialized services that need to get input from subsequent algorithms before filling their event data

Reimplemented in EL::NTupleSvc.

◆ processSummary() [1/2]

static bool EL::DuplicateChecker::processSummary ( const SH::SampleHandler &  sh,
const std::string &  outputFile 
)
static

process the summary tree from the given submission

This will create a file "duplicates" inside the submission directory that contains the list of duplicates that can be fed into future submissions to filter them out.

Returns
whether the job was successful, i.e. each input event was read exactly once and all duplicates were skipped
Guarantee
basic
Failures
i/o errors

This version of the method provides a lower level interface, in which the list of inputs is given via a sample handler (with the tree name properly set), and the output file name freely choosable.

◆ processSummary() [2/2]

static bool EL::DuplicateChecker::processSummary ( const std::string &  submitdir,
const std::string &  treeName 
)
static

process the summary tree from the given submission

This will create a file "duplicates" inside the submission directory that contains the list of duplicates that can be fed into future submissions to filter them out.

Returns
whether the job was successful, i.e. each input event was read exactly once and all duplicates were skipped
Guarantee
basic
Failures
i/o errors
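
The success criterion (each input event read exactly once, all duplicates skipped) can be sketched on plain in-memory rows. This is a standalone illustration, not the ATLAS code: SummaryRow mirrors the five members documented as connected to m_outputTree, while SummaryResult and checkSummary are hypothetical names. Entries that were never read cannot be detected from the rows alone, so the sketch does not check for them.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

using number_type = std::uint32_t;

// One row of a hypothetical summary table, modeled on the documented members
// m_inputFileName, m_inputFileIndex, m_runNumber, m_eventNumber, m_processEvent.
struct SummaryRow {
  std::string fileName;
  long long fileIndex;
  number_type run;
  number_type event;
  bool processed;
};

struct SummaryResult {
  bool success = true;
  std::vector<SummaryRow> duplicates;  // rows repeating an earlier (run, event)
};

SummaryResult checkSummary(const std::vector<SummaryRow>& rows) {
  std::set<std::pair<std::string, long long>> entriesRead;
  std::set<std::pair<number_type, number_type>> eventsSeen;
  SummaryResult result;
  for (const SummaryRow& row : rows) {
    assert(row.fileIndex >= 0);  // assumed precondition
    // The same input entry must not be read twice.
    if (!entriesRead.emplace(row.fileName, row.fileIndex).second)
      result.success = false;
    // First occurrences must be processed, repeats must be skipped.
    const bool first = eventsSeen.emplace(row.run, row.event).second;
    if (!first) result.duplicates.push_back(row);
    if (row.processed != first) result.success = false;
  }
  return result;
}
```

The duplicates collected here correspond to the kind of list that could be fed back into a later submission via addKnownDuplicate().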

◆ read_run_event_number()

void EL::DuplicateChecker::read_run_event_number ( )
private

get the run and event number for the current event

◆ setEventInfoName()

void EL::DuplicateChecker::setEventInfoName ( const std::string &  val_eventInfoName)

set the value of eventInfoName

Guarantee
strong
Failures
out of memory II

◆ setMsgLevel()

void EL::Algorithm::setMsgLevel ( int  level)
inherited

set the message level for the message stream for this object

Guarantee
no-fail

◆ setOutputTreeName()

void EL::DuplicateChecker::setOutputTreeName ( const std::string &  val_outputTreeName)

set the value of outputTreeName

Guarantee
strong
Failures
out of memory II

◆ setupJob()

virtual StatusCode EL::DuplicateChecker::setupJob ( Job &  job)
override private virtual

effects: give the algorithm a chance to initialize the job with anything this algorithm needs. This method is automatically called before the algorithm is actually added to the job.
guarantee: basic
failures: algorithm dependent
rationale: this is currently used to give algorithms a chance to register their output datasets, but can also be used for other purposes.

Reimplemented from EL::Algorithm.

◆ sysSetupJob()

void EL::Algorithm::sysSetupJob ( Job &  job)
private inherited

effects: give the algorithm a chance to initialize the job with anything this algorithm needs. This method is automatically called before the algorithm is actually added to the job.
guarantee: basic
failures: algorithm dependent
rationale: this is currently used to give algorithms a chance to register their output datasets, but can also be used for other purposes.

◆ testInvariant()

void EL::DuplicateChecker::testInvariant ( ) const

test the invariant of this object

Guarantee
no-fail

◆ wk()

IWorker* EL::Algorithm::wk ( ) const
inherited

description: the worker that is controlling us
guarantee: no-fail

Member Data Documentation

◆ m_currentDuplicates

std::map<Long64_t,std::pair<number_type,number_type> >* EL::DuplicateChecker::m_currentDuplicates = nullptr
private

the list of the duplicates in the current file to skip, or the null pointer if there are none

Definition at line 183 of file DuplicateChecker.h.

◆ m_duplicates

std::map<std::pair<std::string,std::string>,std::map<Long64_t,std::pair<number_type,number_type> > > EL::DuplicateChecker::m_duplicates
private

the list of known duplicates to skip

Definition at line 178 of file DuplicateChecker.h.

◆ m_event

xAOD::TEvent* EL::DuplicateChecker::m_event = nullptr
private

the event we are reading from

Definition at line 192 of file DuplicateChecker.h.

◆ m_eventInfoName

std::string EL::DuplicateChecker::m_eventInfoName
private

the value returned by eventInfoName

Definition at line 170 of file DuplicateChecker.h.

◆ m_eventNumber

number_type EL::DuplicateChecker::m_eventNumber
private

the event number of the current event (connected to m_outputTree, if present)

Definition at line 217 of file DuplicateChecker.h.

◆ m_evtStore

asg::SgTEvent EL::Algorithm::m_evtStore
mutable private inherited

when configured, the object returned by evtStore

Definition at line 329 of file Algorithm.h.

◆ m_evtStorePtr

asg::SgTEvent* EL::Algorithm::m_evtStorePtr = nullptr
mutable private inherited

the value of evtStore

Definition at line 325 of file Algorithm.h.

◆ m_inputFileIndex

Long64_t EL::DuplicateChecker::m_inputFileIndex
private

the index in the input file (connected to m_outputTree, if present)

Definition at line 207 of file DuplicateChecker.h.

◆ m_inputFileName

std::string EL::DuplicateChecker::m_inputFileName
private

the name of the input file (connected to m_outputTree, if present)

Definition at line 202 of file DuplicateChecker.h.

◆ m_msg

MsgStream* EL::Algorithm::m_msg = nullptr
mutable private inherited

the message stream, if it has been instantiated

Definition at line 333 of file Algorithm.h.

◆ m_msgLevel

int EL::Algorithm::m_msgLevel = 3
private inherited

the message level configured

Definition at line 342 of file Algorithm.h.

◆ m_msgName

std::string EL::Algorithm::m_msgName
mutable private inherited

the algorithm name for which the message stream has been instantiated

Definition at line 338 of file Algorithm.h.

◆ m_nameCache

std::string EL::Algorithm::m_nameCache
mutable private inherited

the cache for name

Definition at line 346 of file Algorithm.h.

◆ m_outputTree

TTree* EL::DuplicateChecker::m_outputTree = nullptr
private

the output tree, if we are creating one

Definition at line 197 of file DuplicateChecker.h.

◆ m_outputTreeName

std::string EL::DuplicateChecker::m_outputTreeName
private

the value returned by outputTreeName

Definition at line 174 of file DuplicateChecker.h.

◆ m_processed

std::set<std::pair<number_type,number_type> > EL::DuplicateChecker::m_processed
private

the list of run-event numbers already encountered

Definition at line 188 of file DuplicateChecker.h.
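
A set of (run, event) pairs is enough to tell a first occurrence from a repeat, because inserting into a std::set reports whether the key was new. The following standalone sketch shows that idea; it is not the ATLAS implementation, and markFirstOccurrences is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using number_type = std::uint32_t;
using RunEvent = std::pair<number_type, number_type>;  // (run, event)

// For each event in reading order, decide whether it should be processed
// (first occurrence) or skipped (its run/event pair was already seen).
std::vector<bool> markFirstOccurrences(const std::vector<RunEvent>& events) {
  std::set<RunEvent> processed;
  std::vector<bool> process;
  process.reserve(events.size());
  for (const RunEvent& ev : events) {
    // insert() returns a pair whose .second is true only for a new key.
    process.push_back(processed.insert(ev).second);
  }
  assert(process.size() == events.size());
  return process;
}
```

Note that the set grows with the number of distinct events seen by the job, so memory use scales with the size of the input.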

◆ m_processEvent

Bool_t EL::DuplicateChecker::m_processEvent
private

whether the current event is/should be processed (connected to m_outputTree, if present)

Definition at line 222 of file DuplicateChecker.h.

◆ m_runNumber

number_type EL::DuplicateChecker::m_runNumber
private

the run number of the current event (connected to m_outputTree, if present)

Definition at line 212 of file DuplicateChecker.h.

◆ m_wk

IWorker* EL::Algorithm::m_wk
private inherited

Definition at line 321 of file Algorithm.h.


The documentation for this class was generated from the following file: