ATLAS Offline Software
TrigCostSvc Class Reference

AthenaMT service to collect trigger cost data from all threads and summarise it at the end of the event.

#include <TrigCostSvc.h>


Classes

class  ThreadHashCompare
 Static hash and equal members as required by tbb::concurrent_hash_map.

Public Member Functions

 TrigCostSvc (const std::string &name, ISvcLocator *pSvcLocator)
 Standard ATLAS Service constructor.
virtual ~TrigCostSvc ()
 Destructor.
virtual StatusCode initialize () override
 Initialise, allocating per-slot storage for m_eventSlots event slots.
virtual StatusCode finalize () override
 Finalize; if m_saveHashes is set, dump the HLT hash dictionary to disk.
virtual StatusCode startEvent (const EventContext &context, const bool enableMonitoring=true) override
 Implementation of ITrigCostSvc::startEvent.
virtual StatusCode processAlg (const EventContext &context, const std::string &caller, const AuditType type) override
 Implementation of ITrigCostSvc::processAlg.
virtual StatusCode endEvent (const EventContext &context, SG::WriteHandle< xAOD::TrigCompositeContainer > &costOutputHandle, SG::WriteHandle< xAOD::TrigCompositeContainer > &rosOutputHandle) override
 Implementation of ITrigCostSvc::endEvent.
virtual bool isMonitoredEvent (const EventContext &context, const bool includeMultiSlot=true) const override
virtual StatusCode monitorROS (const EventContext &context, robmonitor::ROBDataMonitorStruct payload) override
 Implementation of ITrigCostSvc::monitorROS.
virtual StatusCode generateTimeoutReport (const EventContext &context, std::string &report) override
virtual StatusCode discardEvent (const EventContext &context) override
 Discard a cost monitored event.

Private Member Functions

StatusCode monitor (const EventContext &context, const AlgorithmIdentifier &ai, const TrigTimeStamp &now, const AuditType type)
 Internal call to save monitoring data for a given AlgorithmIdentifier.
StatusCode checkSlot (const EventContext &context) const
 Sanity check that the job is respecting the number of slots which were declared at config time.
int32_t getROIID (const EventContext &context)
 Internal function to return the RoI ID from an extended event context.

Private Attributes

size_t m_eventSlots
 Number of concurrent processing slots.
std::unique_ptr< std::atomic< bool >[] > m_eventMonitored
 Used to cache if the event in a given slot is being monitored.
std::unique_ptr< std::shared_mutex[] > m_slotMutex
 Used to control and protect whole-table operations.
std::mutex m_globalMutex
 Used to protect all-slot modifications.
TrigCostDataStore< AlgorithmPayload > m_algStartInfo
 Thread-safe store of algorithm start payload.
TrigCostDataStore< TrigTimeStamp > m_algStopTime
 Thread-safe store of algorithm stop times.
TrigCostDataStore< std::vector< robmonitor::ROBDataMonitorStruct > > m_rosData
 Thread-safe store of ROS data.
tbb::concurrent_hash_map< std::thread::id, AlgorithmIdentifier, ThreadHashCompare > m_threadToAlgMap
 Keeps track of what is running right now in each thread.
std::unordered_map< uint32_t, uint32_t > m_threadToCounterMap
 Maps a thread's hash ID to a sequential counter.
size_t m_threadCounter
 Counts how many unique thread IDs have been seen.
Gaudi::Property< bool > m_monitorAllEvents {this, "MonitorAllEvents", false, "Monitor every HLT event, e.g. for offline validation."}
Gaudi::Property< bool > m_enableMultiSlot {this, "EnableMultiSlot", false, "Monitored events in the MasterSlot collect data from events running in other slots."}
Gaudi::Property< bool > m_saveHashes {this, "SaveHashes", false, "Store a copy of the hash dictionary for easier debugging"}
Gaudi::Property< size_t > m_masterSlot {this, "MasterSlot", 0, "The slot responsible for saving MultiSlot data"}
Gaudi::Property< std::string > m_costSupervisorAlgName {this, "CostSupervisorAlgName", "TrigCostSupervisorAlg", "The name of cost monitoring supervising algorithm, starting at the beginning of the event"}
Gaudi::Property< std::string > m_costFinalizeAlgName {this, "CostFinalizeAlgName", "TrigCostFinalizeAlg", "The name of cost monitoring finalize algorithm, starting at the end of the event"}
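
The m_threadToCounterMap and m_threadCounter attributes are used in endEvent to turn an opaque std::thread::id into a small, stable thread number for the output. A minimal sketch of that mapping, assuming a hypothetical stand-alone ThreadEnumerator class (in the service the same logic runs inline, guarded by m_globalMutex):

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-alone version of the thread numbering done in TrigCostSvc::endEvent.
// The std::thread::id hash is truncated to 32 bits, and each previously unseen hash
// receives the next value of a running counter.
class ThreadEnumerator {
public:
  std::uint32_t enumerate(std::thread::id id) {
    const auto threadHash = static_cast<std::uint32_t>(std::hash<std::thread::id>()(id));
    std::lock_guard<std::mutex> lock(m_mutex);  // several slots may get here at the same time
    auto [it, inserted] = m_hashToCounter.try_emplace(threadHash, m_counter);
    if (inserted) ++m_counter;                  // only a new hash consumes a number
    return it->second;
  }

private:
  std::mutex m_mutex;
  std::unordered_map<std::uint32_t, std::uint32_t> m_hashToCounter;
  std::uint32_t m_counter = 0;
};
```

Truncating the hash to 32 bits keeps the stored detail compact, but two threads could in principle share a number if their truncated hashes collide.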

Detailed Description

AthenaMT service to collect trigger cost data from all threads and summarise it at the end of the event.

The main hooks into this service are:
- HLTSeeding: to clear the internal storage and flag the event for monitoring.
- TrigCostAuditor: to inform the service when algorithms start and stop executing.
- HLTROBDataProviderSvc: to inform the service about requests for ROB data.
- HLTSummaryAlg: to inform the service when the HLT has finished, and to receive the persistent payload.

Definition at line 35 of file TrigCostSvc.h.

Constructor & Destructor Documentation

◆ TrigCostSvc()

TrigCostSvc::TrigCostSvc ( const std::string & name,
ISvcLocator * pSvcLocator )

Standard ATLAS Service constructor.

Parameters
[in] name  The service's name
[in] pSvcLocator  A pointer to the service locator

Definition at line 15 of file TrigCostSvc.cxx.

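:
  base_class(name, pSvcLocator), // base_class = AthService
  m_eventSlots(0),
  m_eventMonitored(),
  m_slotMutex(),
  m_globalMutex(),
  m_algStartInfo(),
  m_algStopTime(),
  m_threadToAlgMap(),
  m_threadToCounterMap(),
  m_threadCounter(0)
{
  ATH_MSG_DEBUG("TrigCostSvc regular constructor");
}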

◆ ~TrigCostSvc()

TrigCostSvc::~TrigCostSvc ( )
virtual

Destructor.

Currently nothing to delete.

Definition at line 32 of file TrigCostSvc.cxx.

{
  // delete[] m_eventMonitored;
  ATH_MSG_DEBUG("TrigCostSvc destructor()");
}

Member Function Documentation

◆ checkSlot()

StatusCode TrigCostSvc::checkSlot ( const EventContext & context) const
private

Sanity check that the job is respecting the number of slots which were declared at config time.

Parameters
[in] context  The event context
Returns
Success if the slot index lies within the m_eventMonitored array, Failure if the access would overflow it

Definition at line 511 of file TrigCostSvc.cxx.

{
  if (context.slot() >= m_eventSlots) {
    ATH_MSG_FATAL("Job is using event slot #" << context.slot() << ", but we only reserved space for: " << m_eventSlots);
    return StatusCode::FAILURE;
  }
  return StatusCode::SUCCESS;
}

◆ discardEvent()

StatusCode TrigCostSvc::discardEvent ( const EventContext & context)
override virtual

Discard a cost monitored event.

Parameters
[in] context  The event context

Definition at line 489 of file TrigCostSvc.cxx.

{
  if (m_monitorAllEvents) {
    ATH_MSG_DEBUG("All events are monitored - event will not be discarded");
    return StatusCode::SUCCESS;
  }

  ATH_MSG_DEBUG("Cost Event will be discarded");
  ATH_CHECK(checkSlot(context));
  {
    std::unique_lock lockUnique( m_slotMutex[ context.slot() ] );

    // Reset eventMonitored flags
    m_eventMonitored[ context.slot() ] = false;

    // tables are cleared at the start of the event
  }
  return StatusCode::SUCCESS;
}

◆ endEvent()

StatusCode TrigCostSvc::endEvent ( const EventContext & context,
SG::WriteHandle< xAOD::TrigCompositeContainer > & costOutputHandle,
SG::WriteHandle< xAOD::TrigCompositeContainer > & rosOutputHandle )
override virtual

Implementation of ITrigCostSvc::endEvent.

Parameters
[in] context  The event context
[out] costOutputHandle  Write handle to fill with the execution summary if the event was monitored
[out] rosOutputHandle  Write handle to fill with the ROS request summary if the event was monitored

Definition at line 208 of file TrigCostSvc.cxx.

{
  ATH_CHECK(checkSlot(context));
  if (m_eventMonitored[ context.slot() ] == false) {
    // This event was not monitored - nothing to do.
    ATH_MSG_DEBUG("Not a monitored event.");
    return StatusCode::SUCCESS;
  }

  // As we will miss the AuditType::After of the TrigCostFinalizeAlg (which is calling this TrigCostSvc::endEvent), let's add it now.
  // This will be our canonical final timestamp for measuring this event. Similar was done for the TrigCostSupervisorAlg in startEvent
  ATH_CHECK(processAlg(context, m_costFinalizeAlgName, AuditType::After));

  // Reset eventMonitored flags
  m_eventMonitored[ context.slot() ] = false;

  // Now that this atomic is set to FALSE, additional algs in this instance which trigger this service will
  // not be able to call TrigCostSvc::monitor

  // ... but processAlg might already be running in other threads...
  // Wait to obtain an exclusive lock.
  std::unique_lock lockUnique( m_slotMutex[ context.slot() ] );

  // we can now perform whole-map inspection of this event's TrigCostDataStores without the danger that it will be changed further

  // Let's start by getting the global STOP time we just wrote
  uint64_t eventStopTime = 0;
  {
    const AlgorithmIdentifier myAi = AlgorithmIdentifierMaker::make(context, m_costFinalizeAlgName, msg());
    ATH_CHECK( myAi.isValid() );
    tbb::concurrent_hash_map<AlgorithmIdentifier, TrigTimeStamp, AlgorithmIdentifierHashCompare>::const_accessor stopTimeAcessor;
    if (m_algStopTime.retrieve(myAi, stopTimeAcessor, msg()).isFailure()) {
      ATH_MSG_ERROR("No end time for '" << myAi.m_caller << "', '" << myAi.m_store << "'"); // Error as we JUST entered this info!
    } else { // retrieve was a success
      //coverity[FORWARD_NULL:FALSE]
      eventStopTime = stopTimeAcessor->second.microsecondsSinceEpoch();
    }
  }

  // And the global START time for the event
  uint64_t eventStartTime = 0;
  {
    const AlgorithmIdentifier hltSeedingAi = AlgorithmIdentifierMaker::make(context, m_costSupervisorAlgName, msg());
    ATH_CHECK( hltSeedingAi.isValid() );
    tbb::concurrent_hash_map<AlgorithmIdentifier, AlgorithmPayload, AlgorithmIdentifierHashCompare>::const_accessor startAcessor;
    if (m_algStartInfo.retrieve(hltSeedingAi, startAcessor, msg()).isFailure()) {
      ATH_MSG_ERROR("No alg info for '" << hltSeedingAi.m_caller << "', '" << hltSeedingAi.m_store << "'"); // Error as we know this info must be present
    } else { // retrieve was a success
      //coverity[FORWARD_NULL:FALSE]
      eventStartTime = startAcessor->second.m_algStartTime.microsecondsSinceEpoch();
    }
  }

  // Read payloads. Write to persistent format
  tbb::concurrent_hash_map< AlgorithmIdentifier, AlgorithmPayload, AlgorithmIdentifierHashCompare>::const_iterator beginIt;
  tbb::concurrent_hash_map< AlgorithmIdentifier, AlgorithmPayload, AlgorithmIdentifierHashCompare>::const_iterator endIt;
  tbb::concurrent_hash_map< AlgorithmIdentifier, AlgorithmPayload, AlgorithmIdentifierHashCompare>::const_iterator it;
  ATH_CHECK(m_algStartInfo.getIterators(context, msg(), beginIt, endIt));

  ATH_MSG_DEBUG("Monitored event with " << std::distance(beginIt, endIt) << " AlgorithmPayload objects.");

  std::map<size_t, size_t> aiToHandleIndex;
  for (it = beginIt; it != endIt; ++it) {
    const AlgorithmIdentifier& ai = it->first;
    const AlgorithmPayload& ap = it->second;
    uint64_t startTime = ap.m_algStartTime.microsecondsSinceEpoch();

    // Can we find the end time for this alg? If not, it is probably still running. Hence we use the event's stop time (effectively "now") as the default.
    uint64_t stopTime = eventStopTime;
    {
      tbb::concurrent_hash_map<AlgorithmIdentifier, TrigTimeStamp, AlgorithmIdentifierHashCompare>::const_accessor stopTimeAcessor;
      if (m_algStopTime.retrieve(ai, stopTimeAcessor, msg()).isFailure()) {
        ATH_MSG_DEBUG("No end time for '" << ai.m_caller << "', '" << ai.m_store << "'");
      } else { // retrieve was a success
        stopTime = stopTimeAcessor->second.microsecondsSinceEpoch();
      }
      // stopTimeAcessor goes out of scope - lock released
    }

    // It is possible (when in the master-slot) to catch just the END of an Alg's exec from another slot, and then the START of the same
    // alg executing in the next event in that same other-slot.
    // This gives us an end time which is before the start time. Disregard these entries.
    if (startTime > stopTime) {
      ATH_MSG_VERBOSE("Disregard start-time:" << startTime << " > stop-time:" << stopTime
        << " for " << TrigConf::HLTUtils::hash2string( ai.callerHash(msg()), "ALG") << " in slot " << ap.m_slot << ", this is slot " << context.slot());
      continue;
    }

    // Lock the start and stop times to be no later than eventStopTime.
    // E.g. it's possible for an alg in another slot to start or stop running after 'processAlg(context, m_costFinalizeAlgName, AuditType::After)'
    // but before 'lockUnique( m_slotMutex[ context.slot() ] )', creating a timestamp after the nominal end point for this event.
    // If the alg starts afterwards, we disregard it rather than setting it to have zero walltime.
    // If the alg stops afterwards, we truncate its stop time to be no later than eventStopTime
    if (startTime > eventStopTime) {
      ATH_MSG_VERBOSE("Disregard " << TrigConf::HLTUtils::hash2string( ai.callerHash(msg()), "ALG") << " as it started after endEvent() was finished being called" );
      continue;
    }
    if (stopTime > eventStopTime) {
      ATH_MSG_VERBOSE(TrigConf::HLTUtils::hash2string( ai.callerHash(msg()), "ALG") << " stopped after endEvent() was called, but before the cost container was locked,"
        << " truncating its ending time stamp from " << stopTime << " to " << eventStopTime);
      stopTime = eventStopTime;
    }

    // Do the same, locking the start and stop times to be no earlier than eventStartTime
    // If the alg stops before eventStartTime, we disregard it rather than setting it to have zero walltime
    // If the alg starts before eventStartTime, we truncate its start time to be no earlier than eventStartTime
    if (stopTime < eventStartTime) {
      ATH_MSG_VERBOSE("Disregard " << TrigConf::HLTUtils::hash2string( ai.callerHash(msg()), "ALG") << " as it stopped before startEvent() was finished being called" );
      continue;
    }
    if (startTime < eventStartTime) {
      ATH_MSG_VERBOSE(TrigConf::HLTUtils::hash2string( ai.callerHash(msg()), "ALG") << " started just after the cost container was unlocked, but before the HLTSeeding record was written."
        << " truncating its starting time stamp from " << startTime << " to " << eventStartTime);
      startTime = eventStartTime;
    }

    // Make a new TrigComposite to persist monitoring payload for this alg
    xAOD::TrigComposite* tc = new xAOD::TrigComposite();
    costOutputHandle->push_back( tc );
    // tc is now owned by storegate and has an aux store provided by the TrigCompositeCollection

    const uint32_t threadID = static_cast<uint32_t>( std::hash< std::thread::id >()(ap.m_algThreadID) );
    uint32_t threadEnumerator = 0;
    {
      // We can have multiple slots get here at the same time
      std::lock_guard<std::mutex> lock(m_globalMutex);
      const std::unordered_map<uint32_t, uint32_t>::const_iterator mapIt = m_threadToCounterMap.find(threadID);
      if (mapIt == m_threadToCounterMap.end()) {
        threadEnumerator = m_threadCounter;
        m_threadToCounterMap.insert( std::make_pair(threadID, m_threadCounter++) );
      } else {
        threadEnumerator = mapIt->second;
      }
    }

    bool result = true;
    result &= tc->setDetail("alg", ai.callerHash(msg()));
    result &= tc->setDetail("store", ai.storeHash(msg()));
    result &= tc->setDetail("view", ai.m_viewID);
    result &= tc->setDetail("thread", threadEnumerator);
    result &= tc->setDetail("thash", threadID);
    result &= tc->setDetail("slot", ap.m_slot);
    result &= tc->setDetail("roi", ap.m_algROIID);
    result &= tc->setDetail("start", startTime);
    result &= tc->setDetail("stop", stopTime);
    if (!result) ATH_MSG_WARNING("Failed to append one or more details to trigger cost TC");

    aiToHandleIndex[ai.m_hash] = costOutputHandle->size() - 1;
  }

  typedef tbb::concurrent_hash_map< AlgorithmIdentifier, std::vector<robmonitor::ROBDataMonitorStruct>, AlgorithmIdentifierHashCompare>::const_iterator ROBConstIt;
  ROBConstIt beginRob;
  ROBConstIt endRob;

  ATH_CHECK(m_rosData.getIterators(context, msg(), beginRob, endRob));

  for (ROBConstIt it = beginRob; it != endRob; ++it) {
    size_t aiHash = it->first.m_hash;

    if (aiToHandleIndex.count(aiHash) == 0) {
      ATH_MSG_WARNING("Algorithm with hash " << aiHash << " not found!");
    }

    // Save ROB data via TrigComposite
    for (const robmonitor::ROBDataMonitorStruct& robData : it->second) {
      xAOD::TrigComposite* tc = new xAOD::TrigComposite();
      rosOutputHandle->push_back(tc);

      // Retrieve ROB requests data into primitives vectors
      std::vector<uint32_t> robs_id;
      std::vector<uint32_t> robs_size;
      std::vector<unsigned> robs_history;
      std::vector<unsigned short> robs_status;

      robs_id.reserve(robData.requested_ROBs.size());
      robs_size.reserve(robData.requested_ROBs.size());
      robs_history.reserve(robData.requested_ROBs.size());
      robs_status.reserve(robData.requested_ROBs.size());

      for (const auto& rob : robData.requested_ROBs) {
        robs_id.push_back(rob.second.rob_id);
        robs_size.push_back(rob.second.rob_size);
        robs_history.push_back(rob.second.rob_history);
        robs_status.push_back(rob.second.isStatusOk());
      }

      bool result = true;
      result &= tc->setDetail("alg_idx", aiToHandleIndex[aiHash]);
      result &= tc->setDetail("lvl1ID", robData.lvl1ID);
      result &= tc->setDetail<std::vector<uint32_t>>("robs_id", robs_id);
      result &= tc->setDetail<std::vector<uint32_t>>("robs_size", robs_size);
      result &= tc->setDetail<std::vector<unsigned>>("robs_history", robs_history);
      result &= tc->setDetail<std::vector<unsigned short>>("robs_status", robs_status);
      result &= tc->setDetail("start", robData.start_time);
      result &= tc->setDetail("stop", robData.end_time);

      if (!result) ATH_MSG_WARNING("Failed to append one or more details to trigger cost ROS TC");
    }
  }

  if (msg().level() <= MSG::VERBOSE) {
    ATH_MSG_VERBOSE("--- Trig Cost Event Summary ---");
    for ( const xAOD::TrigComposite* tc : *costOutputHandle ) {
      ATH_MSG_VERBOSE("Algorithm:'" << TrigConf::HLTUtils::hash2string( tc->getDetail<TrigConf::HLTHash>("alg"), "ALG") << "'");
      ATH_MSG_VERBOSE(" Store:'" << TrigConf::HLTUtils::hash2string( tc->getDetail<TrigConf::HLTHash>("store"), "STORE") << "'");
      ATH_MSG_VERBOSE(" View ID:" << tc->getDetail<int16_t>("view"));
      ATH_MSG_VERBOSE(" Thread #:" << tc->getDetail<uint32_t>("thread") );
      ATH_MSG_VERBOSE(" Thread ID Hash:" << tc->getDetail<uint32_t>("thash") );
      ATH_MSG_VERBOSE(" Slot:" << tc->getDetail<uint32_t>("slot") );
      ATH_MSG_VERBOSE(" RoI ID Hash:" << tc->getDetail<int32_t>("roi") );
      ATH_MSG_VERBOSE(" Start Time:" << tc->getDetail<uint64_t>("start") << " mu s");
      ATH_MSG_VERBOSE(" Stop Time:" << tc->getDetail<uint64_t>("stop") << " mu s");
    }
  }

  return StatusCode::SUCCESS;
}
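
The boundary handling in the listing above can be isolated into a pure function. The sketch below is illustrative only: clampToEventWindow is a hypothetical helper, not part of TrigCostSvc, which applies the same checks in the same order to timestamps in microseconds and returns std::nullopt for records that endEvent would skip.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper mirroring the timestamp checks in TrigCostSvc::endEvent.
// All times are microseconds since the epoch. Returns std::nullopt if the record is discarded,
// otherwise the (possibly truncated) start and stop times.
std::optional<std::pair<std::uint64_t, std::uint64_t>>
clampToEventWindow(std::uint64_t startTime, std::uint64_t stopTime,
                   std::uint64_t eventStartTime, std::uint64_t eventStopTime) {
  // End of one execution paired with the start of a later one: inconsistent, skip.
  if (startTime > stopTime) return std::nullopt;
  // Started after the event's nominal end: skip.
  if (startTime > eventStopTime) return std::nullopt;
  // Stopped after the event's nominal end: truncate the stop time.
  if (stopTime > eventStopTime) stopTime = eventStopTime;
  // Stopped before the event's nominal start: skip.
  if (stopTime < eventStartTime) return std::nullopt;
  // Started before the event's nominal start: truncate the start time.
  if (startTime < eventStartTime) startTime = eventStartTime;
  return std::make_pair(startTime, stopTime);
}
```

Records that cannot overlap the event window at all are dropped, while partially overlapping records are trimmed to it, so every persisted interval lies inside [eventStartTime, eventStopTime].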

◆ finalize()

StatusCode TrigCostSvc::finalize ( )
override virtual

Finalize; if m_saveHashes is set, dump the HLT hash dictionary to disk.

Definition at line 66 of file TrigCostSvc.cxx.

{
  ATH_MSG_DEBUG("TrigCostSvc finalize()");
  if (m_saveHashes) {
    TrigConf::HLTUtils::hashes2file();
    ATH_MSG_INFO("Calling hashes2file, saving dump of job's HLT hashing dictionary to disk.");
  }
  return StatusCode::SUCCESS;
}

◆ generateTimeoutReport()

StatusCode TrigCostSvc::generateTimeoutReport ( const EventContext & context,
std::string & report )
override virtual
Returns
Generates a timeout report listing the (up to five) most time-consuming algorithms
Parameters
[in] context  The event context
[out] report  The generated report, listing algorithms and their times (in ms); empty if the event is not monitored

Definition at line 427 of file TrigCostSvc.cxx.

{
  ATH_CHECK(checkSlot(context));
  if (!m_eventMonitored[context.slot()]) {
    ATH_MSG_DEBUG("Not a monitored event.");
    report = "";
    return StatusCode::SUCCESS;
  }

  std::unique_lock lockUnique(m_slotMutex[context.slot()]);

  tbb::concurrent_hash_map< AlgorithmIdentifier, AlgorithmPayload, AlgorithmIdentifierHashCompare>::const_iterator beginIt;
  tbb::concurrent_hash_map< AlgorithmIdentifier, AlgorithmPayload, AlgorithmIdentifierHashCompare>::const_iterator endIt;
  tbb::concurrent_hash_map< AlgorithmIdentifier, AlgorithmPayload, AlgorithmIdentifierHashCompare>::const_iterator it;
  ATH_CHECK(m_algStartInfo.getIterators(context, msg(), beginIt, endIt));

  // Create map that sorts in descending order
  std::map<uint64_t, std::string, std::greater<uint64_t>> timeToAlgMap;

  for (it = beginIt; it != endIt; ++it) {
    const AlgorithmIdentifier& ai = it->first;
    const AlgorithmPayload& ap = it->second;

    // Don't look at any records from other slots
    if (ai.m_realSlot != context.slot()) continue;

    uint64_t startTime = ap.m_algStartTime.microsecondsSinceEpoch();
    uint64_t stopTime = 0;
    {
      tbb::concurrent_hash_map<AlgorithmIdentifier, TrigTimeStamp, AlgorithmIdentifierHashCompare>::const_accessor stopTimeAcessor;
      if (m_algStopTime.retrieve(ai, stopTimeAcessor, msg()).isFailure()) {
        ATH_MSG_DEBUG("No end time for '" << ai.m_caller << "', '" << ai.m_store << "'");
      } else { // retrieve was a success
        //coverity[FORWARD_NULL:FALSE]
        stopTime = stopTimeAcessor->second.microsecondsSinceEpoch();
      }
      // stopTimeAcessor goes out of scope - lock released
    }

    if (stopTime == 0) continue;

    timeToAlgMap[stopTime-startTime] = ai.m_caller;
  }

  // Save top 5 times to the report
  report = "Timeout detected with the following algorithms consuming the most time: ";
  int algCounter = 0;
  for(const std::pair<const uint64_t, std::string>& p : timeToAlgMap){
    // Save time in milliseconds instead of microseconds
    report += p.second + " (" + std::to_string(std::lround(p.first/1e3)) + " ms)";
    ++algCounter;
    if (algCounter >= 5){
      break;
    }
    report += ", ";
  }

  return StatusCode::SUCCESS;
}
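
generateTimeoutReport ranks algorithms by wall time using a std::map keyed on duration with std::greater, so two algorithms with exactly the same duration overwrite each other. The sketch below builds the same kind of report on std::multimap instead, which keeps ties; AlgTiming and makeTimeoutReport are hypothetical names, and the separator is only placed between entries.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical input record: one algorithm execution in the current slot.
struct AlgTiming {
  std::string name;
  std::uint64_t startUs;  // microseconds since epoch
  std::uint64_t stopUs;   // 0 if no stop time was recorded
};

// Sketch of a "slowest algorithms" report. A multimap sorted in descending
// order of duration keeps algorithms whose durations are identical.
std::string makeTimeoutReport(const std::vector<AlgTiming>& timings, std::size_t maxAlgs = 5) {
  std::multimap<std::uint64_t, std::string, std::greater<std::uint64_t>> byDuration;
  for (const AlgTiming& t : timings) {
    if (t.stopUs == 0) continue;  // still running: skipped, as in the service
    byDuration.emplace(t.stopUs - t.startUs, t.name);
  }
  std::string report = "Timeout detected with the following algorithms consuming the most time: ";
  std::size_t count = 0;
  for (const auto& [durationUs, name] : byDuration) {
    if (count == maxAlgs) break;
    if (count > 0) report += ", ";
    // Microseconds to milliseconds, rounded to the nearest integer.
    report += name + " (" + std::to_string(std::lround(durationUs / 1e3)) + " ms)";
    ++count;
  }
  return report;
}
```

Entries with equal durations keep their insertion order, because std::multimap inserts each new element at the end of its range of equal keys.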

◆ getROIID()

int32_t TrigCostSvc::getROIID ( const EventContext & context)
private

Internal function to return the RoI ID from an extended event context.

Parameters
[in] context  The event context
Returns
The RoI ID from the ATLAS extended event context, or AlgorithmIdentifier::s_noView (-1) if there is no RoI identifier

Definition at line 521 of file TrigCostSvc.cxx.

{
  if (Atlas::hasExtendedEventContext(context)) {
    const IRoiDescriptor* roi = Atlas::getExtendedEventContext(context).roiDescriptor();
    if (roi) return static_cast<int32_t>(roi->roiId());
  }
  return AlgorithmIdentifier::s_noView;
}

◆ initialize()

StatusCode TrigCostSvc::initialize ( )
override virtual

Initialise, allocating per-slot storage for m_eventSlots event slots.

Definition at line 40 of file TrigCostSvc.cxx.

{
  ATH_MSG_DEBUG("TrigCostSvc initialize()");
  m_eventSlots = SG::getNSlots();
  // TODO Remove this when the configuration is correctly propagated in config-then-run jobs
  if (!m_eventSlots) {
    ATH_MSG_WARNING("numConcurrentEvents() == 0. This is a misconfiguration, probably coming from running from pickle. "
      "Setting local m_eventSlots to a 'large' number until this is fixed to allow the job to proceed.");
    m_eventSlots = 100;
  }
  ATH_MSG_INFO("Initializing TrigCostSvc with " << m_eventSlots << " event slots");

  // We cannot have a vector here as atomics are not movable nor copyable. Unique heap arrays are supported by C++
  m_eventMonitored = std::make_unique< std::atomic<bool>[] >( m_eventSlots );
  m_slotMutex = std::make_unique< std::shared_mutex[] >( m_eventSlots );

  for (size_t i = 0; i < m_eventSlots; ++i) m_eventMonitored[i] = false;

  ATH_CHECK(m_algStartInfo.initialize(m_eventSlots));
  ATH_CHECK(m_algStopTime.initialize(m_eventSlots));
  ATH_CHECK(m_rosData.initialize(m_eventSlots));

  return StatusCode::SUCCESS;
}
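
The allocation above relies on std::unique_ptr<T[]> because std::atomic<bool> is neither copyable nor movable, which rules out resizing or copying a std::vector of them. A minimal sketch of the same pattern, assuming a hypothetical SlotState struct whose allocate() argument stands in for the slot count reported by the framework (the fallback of 100 mirrors the workaround in the listing):

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <shared_mutex>

// Hypothetical per-slot state, allocated once the number of slots is known.
struct SlotState {
  std::size_t nSlots = 0;
  std::unique_ptr<std::atomic<bool>[]> eventMonitored;  // one flag per slot
  std::unique_ptr<std::shared_mutex[]> slotMutex;       // one reader/writer lock per slot

  void allocate(std::size_t configuredSlots) {
    // Fall back to a "large" number if the framework reported zero slots.
    nSlots = (configuredSlots == 0) ? 100 : configuredSlots;
    // Fixed-size heap arrays: the elements are constructed in place and never moved.
    eventMonitored = std::make_unique<std::atomic<bool>[]>(nSlots);
    slotMutex = std::make_unique<std::shared_mutex[]>(nSlots);
    for (std::size_t i = 0; i < nSlots; ++i) eventMonitored[i] = false;
  }
};
```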

◆ isMonitoredEvent()

bool TrigCostSvc::isMonitoredEvent ( const EventContext & context,
const bool includeMultiSlot = true ) const
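override virtual
Returns
Whether the event in the current context is flagged as being monitored.
Parameters
[in] context  The event context
[in] includeMultiSlot  If true and MultiSlot mode is enabled, the event is also reported as monitored when the master slot's event is monitored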

Definition at line 531 of file TrigCostSvc.cxx.

{
  if (m_eventMonitored[ context.slot() ]) {
    return true;
  }
  if (includeMultiSlot && m_enableMultiSlot) {
    return m_eventMonitored[ m_masterSlot ];
  }
  return false;
}
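
The decision can be written as a pure function of the per-slot flags. A sketch, assuming that the multi-slot branch returns the master slot's monitored flag, and using a hypothetical isMonitored function with a std::vector<bool> in place of the atomic array:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical restatement of the TrigCostSvc::isMonitoredEvent decision.
bool isMonitored(const std::vector<bool>& eventMonitored, std::size_t slot,
                 bool includeMultiSlot, bool enableMultiSlot, std::size_t masterSlot) {
  if (eventMonitored.at(slot)) return true;  // this slot's own event is monitored
  if (includeMultiSlot && enableMultiSlot) {
    // In MultiSlot mode, work in any slot is recorded on behalf of the master slot.
    return eventMonitored.at(masterSlot);
  }
  return false;
}
```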

◆ monitor()

StatusCode TrigCostSvc::monitor ( const EventContext & context,
const AlgorithmIdentifier & ai,
const TrigTimeStamp & now,
const AuditType type )
private

Internal call to save monitoring data for a given AlgorithmIdentifier.

Parameters
[in] context  The event context
[in] ai  The AlgorithmIdentifier key to store
[in] now  The timestamp to store (among other values)
[in] type  The type of the audit event to store
Returns
Success if the data are saved

Definition at line 141 of file TrigCostSvc.cxx.

{
  if (type == AuditType::Before) {

    AlgorithmPayload ap {
      now,
      std::this_thread::get_id(),
      getROIID(context),
      static_cast<uint32_t>(context.slot())
    };
    ATH_CHECK( m_algStartInfo.insert(ai, ap, msg()) );

    // Cache the AlgorithmIdentifier which has just started executing on this thread
    if (ai.m_realSlot == ai.m_slotToSaveInto) {
      tbb::concurrent_hash_map<std::thread::id, AlgorithmIdentifier, ThreadHashCompare>::accessor acc;
      m_threadToAlgMap.insert(acc, ap.m_algThreadID);
      acc->second = ai;
    }

  } else if (type == AuditType::After) {

    ATH_CHECK( m_algStopTime.insert(ai, now, msg()) );

  } else {

    ATH_MSG_ERROR("Only expecting AuditType::Before or AuditType::After");
    return StatusCode::FAILURE;

  }

  return StatusCode::SUCCESS;
}
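
The Before/After dispatch can be illustrated without the ATLAS types. This is a simplified, hypothetical AlgRecorder: a single std::mutex stands in for the thread-safe TrigCostDataStore objects, algorithm names replace AlgorithmIdentifier keys, the AuditType enum (including its Other value) is defined locally, and the per-thread cache kept in m_threadToAlgMap is left out.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Local stand-in for the audit type; "Other" represents any unexpected value.
enum class AuditType { Before, After, Other };

class AlgRecorder {
public:
  // Returns false for any other audit type, mirroring the FAILURE branch of monitor().
  bool monitor(const std::string& alg, std::uint64_t nowUs, AuditType type) {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (type == AuditType::Before) {
      m_startUs[alg] = nowUs;  // start payload (here only the timestamp)
      return true;
    }
    if (type == AuditType::After) {
      m_stopUs[alg] = nowUs;   // stop time
      return true;
    }
    return false;
  }

  // Wall time in microseconds, if both a start and a stop time were recorded.
  std::optional<std::uint64_t> duration(const std::string& alg) {
    std::lock_guard<std::mutex> lock(m_mutex);
    const auto start = m_startUs.find(alg);
    const auto stop = m_stopUs.find(alg);
    if (start == m_startUs.end() || stop == m_stopUs.end()) return std::nullopt;
    return stop->second - start->second;
  }

private:
  std::mutex m_mutex;
  std::unordered_map<std::string, std::uint64_t> m_startUs;
  std::unordered_map<std::string, std::uint64_t> m_stopUs;
};
```

Keeping start and stop in separate stores means an algorithm that is still running simply has no stop entry, which is the case endEvent handles by defaulting to the event's stop time.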

◆ monitorROS()

StatusCode TrigCostSvc::monitorROS ( const EventContext & context,
robmonitor::ROBDataMonitorStruct payload )
override virtual

Implementation of ITrigCostSvc::monitorROS.

Parameters
[in] context  The event context
[in] payload  ROB data to be associated with the algorithm currently running on this thread

Definition at line 177 of file TrigCostSvc.cxx.

{
  ATH_CHECK(checkSlot(context));
  ATH_MSG_DEBUG( "Received ROB payload " << payload );

  // Associate payload with an algorithm
  AlgorithmIdentifier theAlg;
  {
    tbb::concurrent_hash_map<std::thread::id, AlgorithmIdentifier, ThreadHashCompare>::const_accessor acc;
    bool result = m_threadToAlgMap.find(acc, std::this_thread::get_id());
    // checking the return value 'result' is sufficient to know whether acc is bound
    if (!result){
      ATH_MSG_WARNING( "Cannot find algorithm on this thread (id=" << std::this_thread::get_id() << "). Request "<< payload <<" won't be monitored");
      return StatusCode::SUCCESS;
    }
    //coverity[FORWARD_NULL:FALSE]
    theAlg = acc->second;
  }

  // Record data in TrigCostDataStore
  ATH_MSG_DEBUG( "Adding ROBs from " << payload.requestor_name << " to " << theAlg.m_hash );
  {
    std::shared_lock lockShared( m_slotMutex[ context.slot() ] );
    ATH_CHECK( m_rosData.push_back(theAlg, std::move(payload), msg()) );
  }

  return StatusCode::SUCCESS;
}
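
monitorROS does not build an AlgorithmIdentifier from the payload; it reuses the identifier that monitor() cached for the calling thread, so the ROS data share the key of the requesting algorithm's own record, and a request on a thread with no cached algorithm is dropped. A sketch of that lookup with hypothetical names (RosAttributor, algorithmStarted, recordRequest), a plain mutex instead of tbb::concurrent_hash_map, and ROB requests reduced to unsigned IDs:

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of thread-based attribution of data requests.
class RosAttributor {
public:
  // Analogue of the Before branch of monitor(): remember what this thread is running.
  void algorithmStarted(const std::string& alg) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_threadToAlg[std::this_thread::get_id()] = alg;
  }

  // Returns false and records nothing when no algorithm is known for this thread,
  // analogous to the warning-and-SUCCESS path in monitorROS.
  bool recordRequest(unsigned robId) {
    std::lock_guard<std::mutex> lock(m_mutex);
    const auto it = m_threadToAlg.find(std::this_thread::get_id());
    if (it == m_threadToAlg.end()) return false;
    m_requests[it->second].push_back(robId);
    return true;
  }

  // Copy of the requests attributed to one algorithm (empty if none).
  std::vector<unsigned> requestsOf(const std::string& alg) {
    std::lock_guard<std::mutex> lock(m_mutex);
    const auto it = m_requests.find(alg);
    return it == m_requests.end() ? std::vector<unsigned>{} : it->second;
  }

private:
  std::mutex m_mutex;
  std::unordered_map<std::thread::id, std::string> m_threadToAlg;
  std::unordered_map<std::string, std::vector<unsigned>> m_requests;
};
```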

◆ processAlg()

StatusCode TrigCostSvc::processAlg ( const EventContext & context,
const std::string & caller,
const AuditType type )
override virtual

Implementation of ITrigCostSvc::processAlg.

Parameters
[in] context  The event context
[in] caller  Name of the algorithm being audited
[in] type  Whether this call is Before or After the algorithm's execution

Definition at line 106 of file TrigCostSvc.cxx.

106 {
107 ATH_CHECK(checkSlot(context));
108
109 TrigTimeStamp now;
110
111 // Do per-event within-slot monitoring
112 if (m_eventMonitored[ context.slot() ]) {
113 // Multiple simultaneous calls allowed here, adding their data to the concurrent map.
114 std::shared_lock lockShared( m_slotMutex[ context.slot() ] );
115
116 AlgorithmIdentifier ai = AlgorithmIdentifierMaker::make(context, caller, msg());
117 ATH_CHECK( ai.isValid() );
118
119 ATH_CHECK(monitor(context, ai, now, type));
120
121 ATH_MSG_VERBOSE("Caller '" << caller << "', '" << ai.m_store << "', slot:" << context.slot() << " "
122 << (type == AuditType::Before ? "BEGAN" : "ENDED") << " at " << now.microsecondsSinceEpoch());
123 }
124
125 // MultiSlot mode: do per-event monitoring of all slots, but saving the data within the master-slot
126 if (m_enableMultiSlot && context.slot() != m_masterSlot && m_eventMonitored[ m_masterSlot ]) {
127 std::shared_lock lockShared( m_slotMutex[ m_masterSlot ] );
128
129 // Note: we override the storage location of these data from all other slots to be saved in the MasterSlot
130 AlgorithmIdentifier ai = AlgorithmIdentifierMaker::make(context, caller, msg(), m_masterSlot);
131 ATH_CHECK( ai.isValid() );
132
133 ATH_CHECK(monitor(context, ai, now, type));
134 }
135
136 return StatusCode::SUCCESS;
137}
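The routing logic of processAlg can be isolated as a pure function. This sketch assumes the service's state is reduced to a vector of per-slot "monitored" flags (mirroring m_eventMonitored) plus the EnableMultiSlot and MasterSlot properties. The function name `destinationSlots` is hypothetical, not part of TrigCostSvc. It returns the slots whose stores would receive a record for one algorithm call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the routing decision in TrigCostSvc::processAlg.
// Returns the slots whose stores receive a record for an algorithm call
// executing in `slot`. `monitored[i]` stands in for m_eventMonitored[i].
std::vector<std::size_t> destinationSlots(std::size_t slot,
                                          const std::vector<bool>& monitored,
                                          bool enableMultiSlot,
                                          std::size_t masterSlot) {
  std::vector<std::size_t> out;
  // Per-event, within-slot monitoring: record in the caller's own slot.
  if (monitored.at(slot)) {
    out.push_back(slot);
  }
  // MultiSlot mode: calls from any other slot are also recorded in the
  // master slot, provided the master slot's current event is monitored.
  if (enableMultiSlot && slot != masterSlot && monitored.at(masterSlot)) {
    out.push_back(masterSlot);
  }
  return out;
}
```

With MultiSlot enabled, a call in a monitored non-master slot is therefore recorded twice: once in its own slot and once in the master slot.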

◆ startEvent()

StatusCode TrigCostSvc::startEvent ( const EventContext & context,
const bool enableMonitoring = true )
override virtual

Implementation of ITrigCostSvc::startEvent.

Parameters
[in] context  The event context
[in] enableMonitoring  Whether the event should be monitored. Not monitoring saves CPU.
Returns
Success, unless monitoring is enabled and the service's data stores cannot be cleared

Definition at line 77 of file TrigCostSvc.cxx.

77 {
78 const bool monitoredEvent = (enableMonitoring || m_monitorAllEvents);
79 ATH_CHECK(checkSlot(context));
80
81 m_eventMonitored[ context.slot() ] = false;
82
83 {
84 // "clear" is a whole table operation, we need it all to ourselves
85 std::unique_lock lockUnique( m_slotMutex[ context.slot() ] );
86 if (monitoredEvent) {
87 // Empty transient thread-safe stores in preparation for recording this event's cost data
88 ATH_CHECK(m_algStartInfo.clear(context, msg()));
89 ATH_CHECK(m_algStopTime.clear(context, msg()));
90 ATH_CHECK(m_rosData.clear(context, msg()));
91 }
92
93 // Enable collection of data in this slot for monitoredEvents
94 m_eventMonitored[ context.slot() ] = monitoredEvent;
95 }
96
97 // As we missed the AuditType::Before of the TrigCostSupervisorAlg (which is calling this TrigCostSvc::startEvent), let's add it now.
98 // This will be our canonical initial timestamp for measuring this event. The same will be done for DecisionSummaryMakerAlg at the end
99 ATH_CHECK(processAlg(context, m_costSupervisorAlgName, AuditType::Before));
100
101 return StatusCode::SUCCESS;
102}
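startEvent and processAlg use a reader/writer pattern on the per-slot m_slotMutex: many threads may add data to a slot under a shared lock, while clearing a slot's tables takes the lock exclusively. The sketch below reproduces that pattern with standard C++17 only. The class `SlotStore` is hypothetical; because std::vector is not a concurrent container (unlike the thread-safe TrigCostDataStore), an extra per-slot std::mutex is assumed here to serialise the actual insertions.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-slot store illustrating the locking scheme:
// inserts share the slot lock, clear() takes it exclusively.
class SlotStore {
 public:
  explicit SlotStore(std::size_t nSlots)
      : m_slotMutex(std::make_unique<std::shared_mutex[]>(nSlots)),
        m_insertMutex(std::make_unique<std::mutex[]>(nSlots)),
        m_data(nSlots) {}

  void record(std::size_t slot, std::string entry) {
    std::shared_lock lockShared(m_slotMutex[slot]);  // concurrent inserts allowed
    // Stand-in for the internal synchronisation of a concurrent container.
    std::lock_guard lockInsert(m_insertMutex[slot]);
    m_data[slot].push_back(std::move(entry));
  }

  void clear(std::size_t slot) {
    std::unique_lock lockUnique(m_slotMutex[slot]);  // whole-table operation
    m_data[slot].clear();
  }

  std::size_t size(std::size_t slot) {
    std::shared_lock lockShared(m_slotMutex[slot]);
    std::lock_guard lockInsert(m_insertMutex[slot]);
    return m_data[slot].size();
  }

 private:
  std::unique_ptr<std::shared_mutex[]> m_slotMutex;
  std::unique_ptr<std::mutex[]> m_insertMutex;
  std::vector<std::vector<std::string>> m_data;
};
```

Because clear() needs the exclusive lock, it waits until every in-flight record() on that slot has released its shared lock, so a slot is never emptied halfway through an insertion.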

Member Data Documentation

◆ m_algStartInfo

TrigCostDataStore<AlgorithmPayload> TrigCostSvc::m_algStartInfo
private

Thread-safe store of algorithm start payload.

Definition at line 150 of file TrigCostSvc.h.

◆ m_algStopTime

TrigCostDataStore<TrigTimeStamp> TrigCostSvc::m_algStopTime
private

Thread-safe store of algorithm stop times.

Definition at line 151 of file TrigCostSvc.h.
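Start and stop information live in two separate stores. A consumer presumably pairs them per algorithm to obtain a wall time; the sketch below shows that pairing under stated assumptions. Algorithms are keyed by a plain string instead of AlgorithmIdentifier, times are microseconds since epoch (as returned by TrigTimeStamp::microsecondsSinceEpoch()), and `wallTimes` is a hypothetical helper, not part of TrigCostSvc.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical: pair start and stop times (microseconds since epoch) by
// algorithm key and return the elapsed time for each completed algorithm.
std::map<std::string, std::uint64_t> wallTimes(
    const std::map<std::string, std::uint64_t>& startMicro,
    const std::map<std::string, std::uint64_t>& stopMicro) {
  std::map<std::string, std::uint64_t> out;
  for (const auto& [key, start] : startMicro) {
    const auto it = stopMicro.find(key);
    // Skip algorithms that never reported an end, or inconsistent timestamps.
    if (it == stopMicro.end() || it->second < start) {
      continue;
    }
    out[key] = it->second - start;
  }
  return out;
}
```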

◆ m_costFinalizeAlgName

Gaudi::Property<std::string> TrigCostSvc::m_costFinalizeAlgName {this, "CostFinalizeAlgName", "TrigCostFinalizeAlg", "The name of cost monitoring finalize algorithm, starting at the end of the event"}
private

Definition at line 165 of file TrigCostSvc.h.

◆ m_costSupervisorAlgName

Gaudi::Property<std::string> TrigCostSvc::m_costSupervisorAlgName {this, "CostSupervisorAlgName", "TrigCostSupervisorAlg", "The name of cost monitoring supervising algorithm, starting at the beginning of the event"}
private

Definition at line 164 of file TrigCostSvc.h.

◆ m_enableMultiSlot

Gaudi::Property<bool> TrigCostSvc::m_enableMultiSlot {this, "EnableMultiSlot", false, "Monitored events in the MasterSlot collect data from events running in other slots."}
private

Definition at line 161 of file TrigCostSvc.h.

◆ m_eventMonitored

std::unique_ptr< std::atomic<bool>[] > TrigCostSvc::m_eventMonitored
private

Caches whether the event in each slot is being monitored.

Definition at line 147 of file TrigCostSvc.h.
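m_eventMonitored holds one atomic flag per event slot. A minimal sketch of allocating such an array once the slot count is known (for example in initialize) follows; the class name `SlotFlags` is hypothetical. A plausible reason for the unique_ptr-to-array form is that std::atomic is neither copyable nor movable, so a std::vector of atomics cannot be resized after construction.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>

// Hypothetical helper: one atomic "is monitored" flag per event slot,
// declared in the same way as m_eventMonitored.
class SlotFlags {
 public:
  explicit SlotFlags(std::size_t nSlots)
      : m_n(nSlots), m_flags(std::make_unique<std::atomic<bool>[]>(nSlots)) {
    // Start with every slot unmonitored.
    for (std::size_t i = 0; i < m_n; ++i) {
      m_flags[i].store(false);
    }
  }
  void set(std::size_t slot, bool value) { m_flags[slot].store(value); }
  bool get(std::size_t slot) const { return m_flags[slot].load(); }
  std::size_t size() const { return m_n; }

 private:
  std::size_t m_n;
  std::unique_ptr<std::atomic<bool>[]> m_flags;
};
```

Reads from any thread (as in processAlg's `if (m_eventMonitored[ context.slot() ])`) then need no lock, while startEvent can flip a slot's flag with a single atomic store.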

◆ m_eventSlots

size_t TrigCostSvc::m_eventSlots
private

Number of concurrent processing slots, cached from Gaudi.

Definition at line 146 of file TrigCostSvc.h.

◆ m_globalMutex

std::mutex TrigCostSvc::m_globalMutex
private

Used to protect all-slot modifications.

Definition at line 149 of file TrigCostSvc.h.

◆ m_masterSlot

Gaudi::Property<size_t> TrigCostSvc::m_masterSlot {this, "MasterSlot", 0, "The slot responsible for saving MultiSlot data"}
private

Definition at line 163 of file TrigCostSvc.h.

◆ m_monitorAllEvents

Gaudi::Property<bool> TrigCostSvc::m_monitorAllEvents {this, "MonitorAllEvents", false, "Monitor every HLT event, e.g. for offline validation."}
private

Definition at line 160 of file TrigCostSvc.h.

◆ m_rosData

TrigCostDataStore<std::vector<robmonitor::ROBDataMonitorStruct> > TrigCostSvc::m_rosData
private

Thread-safe store of ROS data.

Definition at line 152 of file TrigCostSvc.h.

◆ m_saveHashes

Gaudi::Property<bool> TrigCostSvc::m_saveHashes {this, "SaveHashes", false, "Store a copy of the hash dictionary for easier debugging"}
private

Definition at line 162 of file TrigCostSvc.h.

◆ m_slotMutex

std::unique_ptr< std::shared_mutex[] > TrigCostSvc::m_slotMutex
private

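Per-slot mutexes. A shared lock is taken while adding data to a slot's stores, allowing concurrent insertions; an exclusive lock is taken for whole-table operations such as clearing a slot.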

Definition at line 148 of file TrigCostSvc.h.

◆ m_threadCounter

size_t TrigCostSvc::m_threadCounter
private

Counts how many unique thread IDs have been seen.

Definition at line 157 of file TrigCostSvc.h.

◆ m_threadToAlgMap

tbb::concurrent_hash_map<std::thread::id, AlgorithmIdentifier, ThreadHashCompare> TrigCostSvc::m_threadToAlgMap
private

Tracks which algorithm is currently running in each thread.

Definition at line 154 of file TrigCostSvc.h.
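tbb::concurrent_hash_map takes a HashCompare type that supplies `hash(key)` and `equal(key, key)`, which is what the nested ThreadHashCompare class provides. The sketch below shows one plausible shape for a std::thread::id key, written without TBB so it compiles standalone; the name `ThreadIdHashCompare` is hypothetical and this is not the actual ThreadHashCompare implementation.

```cpp
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical HashCompare for std::thread::id keys, in the form expected by
// tbb::concurrent_hash_map: a hash function and an equality predicate.
struct ThreadIdHashCompare {
  static std::size_t hash(const std::thread::id& id) {
    return std::hash<std::thread::id>{}(id);
  }
  static bool equal(const std::thread::id& a, const std::thread::id& b) {
    return a == b;
  }
};
```

With TBB available, such a type would be passed as the third template argument, e.g. `tbb::concurrent_hash_map<std::thread::id, AlgorithmIdentifier, ThreadIdHashCompare>`.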

◆ m_threadToCounterMap

std::unordered_map<uint32_t, uint32_t> TrigCostSvc::m_threadToCounterMap
private

Maps each thread's hash ID to a sequential counter value.

Definition at line 156 of file TrigCostSvc.h.
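m_threadToCounterMap and m_threadCounter together turn arbitrary thread hashes into small sequential numbers. A minimal sketch of that bookkeeping follows. The class `ThreadCounter` is hypothetical, and guarding it with a mutex (in the role m_globalMutex might play) is an assumption, not something this page documents.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical: assign a compact sequential number to each thread hash the
// first time it is seen; later lookups return the same number.
class ThreadCounter {
 public:
  std::uint32_t counterFor(std::uint32_t threadHash) {
    std::lock_guard<std::mutex> lock(m_mutex);
    // try_emplace inserts only if the hash is new.
    auto [it, inserted] =
        m_map.try_emplace(threadHash, static_cast<std::uint32_t>(m_count));
    if (inserted) {
      ++m_count;
    }
    return it->second;
  }
  std::size_t seen() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_count;
  }

 private:
  mutable std::mutex m_mutex;
  std::unordered_map<std::uint32_t, std::uint32_t> m_map;
  std::size_t m_count = 0;
};
```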


The documentation for this class was generated from the following files: