ATLAS Offline Software
PerfMonMTSvc Class Reference

#include <PerfMonMTSvc.h>

Public Member Functions

 PerfMonMTSvc (const std::string &name, ISvcLocator *pSvcLocator)
 Standard Gaudi Service constructor.
 
virtual ~PerfMonMTSvc ()=default
 
virtual void handle (const Incident &incident) override
 Incident service handle for post-finalize.
 
virtual StatusCode initialize () override
 Standard Gaudi Service initialization.
 
virtual StatusCode finalize () override
 Standard Gaudi Service finalization.
 
virtual void startAud (const std::string &stepName, const std::string &compName) override
 Start Auditing.
 
virtual void stopAud (const std::string &stepName, const std::string &compName) override
 Stop Auditing.
 
void startSnapshotAud (const std::string &stepName, const std::string &compName)
 Snapshot Auditing: Take snapshots at the beginning and at the end of each step.
 
void stopSnapshotAud (const std::string &stepName, const std::string &compName)
 
void startCompAud (const std::string &stepName, const std::string &compName, const EventContext &ctx)
 Component Level Auditing: Take measurements at the beginning and at the end of each component call.
 
void stopCompAud (const std::string &stepName, const std::string &compName, const EventContext &ctx)
 
void report ()
 Report the results.
 
void report2Log ()
 Report to log.
 
void report2Log_Description () const
 
void report2Log_ComponentLevel ()
 
void report2Log_EventLevel_instant () const
 
void report2Log_EventLevel ()
 
void report2Log_Summary ()
 
void report2Log_CpuInfo () const
 
void report2Log_EnvInfo () const
 
void report2JsonFile ()
 Report to the JSON File.
 
void report2JsonFile_Summary (nlohmann::json &j) const
 
void report2JsonFile_ComponentLevel (nlohmann::json &j) const
 
void report2JsonFile_EventLevel (nlohmann::json &j) const
 
void aggregateSlotData ()
 A few helper functions.
 
void divideData2Steps ()
 
std::string scaleTime (double timeMeas) const
 
std::string scaleMem (int64_t memMeas) const
 
std::string get_info_from_file (const std::string &fileName, const std::string &fieldName) const
 A few helper methods to get system information. These should be moved to PerfMonMTUtils at some point.
 
std::string get_cpu_model_info () const
 
int get_cpu_core_info () const
 
uint64_t get_memory_info () const
 
PMonMT::StepComp generate_state (const std::string &stepName, const std::string &compName) const
 

Private Types

enum  Snapshots {
  CONFIGURE, INITIALIZE, FIRSTEVENT, EXECUTE,
  FINALIZE, NSNAPSHOTS
}
 
typedef std::map< PMonMT::StepComp, PMonMT::ComponentData * > data_map_t
 
typedef std::map< PMonMT::StepComp, std::unique_ptr< PMonMT::ComponentData > > data_map_unique_t
 

Private Member Functions

int getCpuEfficiency () const
 

Private Attributes

PMonMT::SnapshotMeasurement m_measurementSnapshots
 Measurement to capture snapshots.
 
PMonMT::SnapshotMeasurement m_measurementEvents
 Measurement to capture events.
 
Gaudi::Property< bool > m_doEventLoopMonitoring
 Do event loop monitoring.
 
Gaudi::Property< bool > m_doComponentLevelMonitoring
 Do component level monitoring.
 
Gaudi::Property< bool > m_reportResultsToJSON {this, "reportResultsToJSON", true, "Report results into the json file."}
 Report results to JSON.
 
Gaudi::Property< std::string > m_jsonFileName
 Name of the JSON file.
 
Gaudi::Property< bool > m_printDetailedTables
 Print detailed tables.
 
Gaudi::Property< uint64_t > m_memFitLowerLimit
 Lower limit (in number of events) for the memory fit.
 
Gaudi::Property< uint64_t > m_checkPointThreshold
 Frequency of event level monitoring.
 
Gaudi::Property< double > m_wallTimeOffset {this, "wallTimeOffset", 0, "Job start wall time in miliseconds."}
 Offset for the wall-time, comes from configuration.
 
Gaudi::Property< int > m_printNComps
 Print the top N components.
 
Gaudi::Property< int > m_numberOfThreads {this, "numberOfThreads", 1, "Number of threads in the job."}
 Get the number of threads.
 
Gaudi::Property< int > m_numberOfSlots {this, "numberOfSlots", 1, "Number of slots in the job."}
 Get the number of slots.
 
Gaudi::Property< uint64_t > m_eventLoopMsgLimit {this, "eventLoopMsgLimit", 10, "Maximum number of event-level messages."}
 Set the number of messages for the event-level report.
 
const std::set< std::string > m_exclusionSet
 Exclude some common components from monitoring. In the future this might be converted to an inclusion set, which would allow users to monitor only a set of algorithms.
 
int m_motherPID
 Snapshots data.
 
std::vector< PMonMT::SnapshotData > m_snapshotData
 
std::vector< std::string > m_snapshotStepNames = {"Configure", "Initialize", "FirstEvent", "Execute", "Finalize"}
 
PMonMT::EventLevelData m_eventLevelData {}
 
std::mutex m_mutex_capture
 
std::atomic< bool > m_isFirstEvent
 
std::atomic< uint64_t > m_eventCounter
 
std::atomic< uint64_t > m_eventLoopMsgCounter
 
std::atomic< double > m_checkPointTime
 
std::atomic< bool > m_isEvtLoopStopped
 
data_map_t m_compLevelDataMap
 
std::vector< data_map_unique_t > m_compLevelDataMapVec
 
data_map_t m_compLevelDataMap_ini
 
data_map_t m_compLevelDataMap_1stevt
 
data_map_t m_compLevelDataMap_evt
 
data_map_t m_compLevelDataMap_fin
 
data_map_t m_compLevelDataMap_plp
 
data_map_t m_compLevelDataMap_cbk
 
std::vector< data_map_t > m_stdoutVec_serial
 
PerfMon::LinFitSglPass m_fit_vmem
 
PerfMon::LinFitSglPass m_fit_pss
 

Detailed Description

Definition at line 41 of file PerfMonMTSvc.h.

Member Typedef Documentation

◆ data_map_t

typedef std::map<PMonMT::StepComp, PMonMT::ComponentData*> PerfMonMTSvc::data_map_t
private

Definition at line 189 of file PerfMonMTSvc.h.

◆ data_map_unique_t

typedef std::map<PMonMT::StepComp, std::unique_ptr<PMonMT::ComponentData> > PerfMonMTSvc::data_map_unique_t
private

Definition at line 190 of file PerfMonMTSvc.h.

Member Enumeration Documentation

◆ Snapshots

Enumerator
CONFIGURE 
INITIALIZE 
FIRSTEVENT 
EXECUTE 
FINALIZE 
NSNAPSHOTS 

Definition at line 163 of file PerfMonMTSvc.h.

Constructor & Destructor Documentation

◆ PerfMonMTSvc()

PerfMonMTSvc::PerfMonMTSvc ( const std::string &  name,
ISvcLocator *  pSvcLocator 
)

Standard Gaudi Service constructor.

Definition at line 33 of file PerfMonMTSvc.cxx.

34  : base_class(name, pSvcLocator), m_isFirstEvent{false}, m_eventCounter{0}, m_eventLoopMsgCounter{0}, m_checkPointTime{0}, m_isEvtLoopStopped{false} {
35  // Five main snapshots : Configure, Initialize, FirstEvent, Execute, and Finalize
36  m_motherPID = getpid();
37  m_snapshotData.resize(NSNAPSHOTS); // Default construct
38 
39  // Initial capture upon construction
43 }

◆ ~PerfMonMTSvc()

virtual PerfMonMTSvc::~PerfMonMTSvc ( )
virtual default

Member Function Documentation

◆ aggregateSlotData()

void PerfMonMTSvc::aggregateSlotData ( )

A few helper functions.

Definition at line 782 of file PerfMonMTSvc.cxx.

782  {
783  // Loop over data from all slots
784  for (const auto& slotData : m_compLevelDataMapVec) {
785  for (const auto& it : slotData) {
786  // Copy the first slot data and sum the rest
787  if(m_compLevelDataMap.find(it.first) == m_compLevelDataMap.end()) {
788  m_compLevelDataMap.insert({it.first, it.second.get()});
789  } else {
790  m_compLevelDataMap[it.first]->add2CallCount(it.second->getCallCount());
791  m_compLevelDataMap[it.first]->add2DeltaCPU(it.second->getDeltaCPU());
792  m_compLevelDataMap[it.first]->add2DeltaWall(it.second->getDeltaWall());
793  m_compLevelDataMap[it.first]->add2DeltaVmem(it.second->getDeltaVmem());
794  m_compLevelDataMap[it.first]->add2DeltaMalloc(it.second->getDeltaMalloc());
795  }
796  // Do a quick consistency check here and print any suspicious measurements.
797  // Timing measurements should always be positive definite
798  if(it.second->getDeltaCPU() < 0) {
799  ATH_MSG_WARNING("Negative CPU-time measurement of " << it.second->getDeltaCPU() <<
800  " ms for component " << it.first.compName <<
801  " in step " << it.first.stepName);
802  }
803  if(it.second->getDeltaWall() < 0) {
804  ATH_MSG_WARNING("Negative Wall-time measurement of " << it.second->getDeltaWall() <<
805  " ms for component " << it.first.compName <<
806  " in step " << it.first.stepName);
807  }
808  }
809  }
810 }
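
The aggregation pattern above (seed an entry from the first slot that reports a key, then add the counters of every later slot) can be sketched in isolation. This is a minimal illustration, not the ATLAS implementation: `CompData`, `SlotMap`, and `aggregate` are hypothetical names, only two counters are modelled, and the key is reduced to a plain string.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for PMonMT::ComponentData; only two counters are modelled.
struct CompData {
  std::uint64_t callCount = 0;
  double deltaCpuMs = 0.0;
};

// Hypothetical per-slot map keyed by component name.
using SlotMap = std::map<std::string, CompData>;

// Sum per-slot measurements into one map.
// The first slot that reports a key seeds the entry; later slots add to it.
SlotMap aggregate(const std::vector<SlotMap>& slots) {
  SlotMap total;
  for (const auto& slot : slots) {
    for (const auto& [key, data] : slot) {
      auto [it, inserted] = total.try_emplace(key, data);
      if (!inserted) {
        it->second.callCount += data.callCount;
        it->second.deltaCpuMs += data.deltaCpuMs;
      }
    }
  }
  return total;
}
```

One difference worth noting: the sketch copies values into the result, whereas the listing stores a raw pointer to the first slot's object and accumulates into it, so the first slot's data is modified in place there.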

◆ divideData2Steps()

void PerfMonMTSvc::divideData2Steps ( )

Definition at line 815 of file PerfMonMTSvc.cxx.

815  {
816  for (const auto &it : m_compLevelDataMap) {
817  if (it.first.stepName == "Initialize")
818  m_compLevelDataMap_ini[it.first] = it.second;
819  else if (it.first.stepName == "FirstEvent")
820  m_compLevelDataMap_1stevt[it.first] = it.second;
821  else if (it.first.stepName == "Execute")
822  m_compLevelDataMap_evt[it.first] = it.second;
823  else if (it.first.stepName == "Finalize")
824  m_compLevelDataMap_fin[it.first] = it.second;
825  else if (it.first.stepName == "preLoadProxy")
826  m_compLevelDataMap_plp[it.first] = it.second;
827  else if (it.first.stepName == "Callback")
828  m_compLevelDataMap_cbk[it.first] = it.second;
829  }
836 }
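
Splitting a flat collection into per-step buckets, as the listing does with one map per step, can also be expressed with a single map keyed by step name. The following is a hedged sketch of that alternative; `Key` and `split_by_step` are hypothetical names, and the payload is reduced to the (step, component) pair.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key: (step name, component name).
using Key = std::pair<std::string, std::string>;

// Group keys by their step name; each bucket keeps the input order.
std::map<std::string, std::vector<Key>> split_by_step(const std::vector<Key>& keys) {
  std::map<std::string, std::vector<Key>> buckets;
  for (const auto& k : keys) {
    buckets[k.first].push_back(k);
  }
  return buckets;
}
```

Unlike the listing, which only recognises a fixed set of step names and silently skips anything else, this version creates a bucket for every step name it encounters.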

◆ finalize()

StatusCode PerfMonMTSvc::finalize ( )
override virtual

Standard Gaudi Service finalization.

Definition at line 94 of file PerfMonMTSvc.cxx.

94  {
95  // Print where we are
96  ATH_MSG_INFO("Finalizing " << name());
97 
98  return StatusCode::SUCCESS;
99 }

◆ generate_state()

PMonMT::StepComp PerfMonMTSvc::generate_state ( const std::string &  stepName,
const std::string &  compName 
) const

Definition at line 772 of file PerfMonMTSvc.cxx.

772  {
773  PMonMT::StepComp currentState;
774  currentState.stepName = (m_isFirstEvent && stepName == "Execute") ? "FirstEvent" : stepName;
775  currentState.compName = compName;
776  return currentState;
777 }

◆ get_cpu_core_info()

int PerfMonMTSvc::get_cpu_core_info ( ) const

Definition at line 925 of file PerfMonMTSvc.cxx.

925  {
926  std::string val = get_info_from_file("/proc/cpuinfo","processor");
927  if (val.empty()) return 0;
928  return std::stoi(val) + 1;
929 }

◆ get_cpu_model_info()

std::string PerfMonMTSvc::get_cpu_model_info ( ) const

Definition at line 920 of file PerfMonMTSvc.cxx.

920  {
921  return get_info_from_file("/proc/cpuinfo","model name") +
922  get_info_from_file("/proc/cpuinfo","cache size");
923 }

◆ get_info_from_file()

std::string PerfMonMTSvc::get_info_from_file ( const std::string &  fileName,
const std::string &  fieldName 
) const

A few helper methods to get system information. These should be moved to PerfMonMTUtils at some point.

Definition at line 892 of file PerfMonMTSvc.cxx.

893  {
894  // Helper function to read files of type Key : Value
895  // Returns the last instance if there are multiple matches
896  // This is because we use this method to get the processor count
897  std::string result{""};
898 
899  std::ifstream file{fileName};
900  std::string line{""};
901 
902  while (std::getline(file, line)) {
903  if (line.empty()) continue;
904  size_t splitIdx = line.find(':');
905  if (splitIdx != std::string::npos) {
906  std::string val = line.substr(splitIdx + 1);
907  if (val.empty()) continue;
908  if (line.size() >= fieldName.size() &&
909  line.compare(0, fieldName.size(), fieldName) == 0) {
910  result = std::move(val);
911  }
912  }
913  }
914 
915  file.close();
916 
917  return result;
918 }
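
The "last match wins" behaviour is what makes the processor count work: `/proc/cpuinfo` repeats the `processor` field once per logical CPU, so the last value is the highest index. A minimal sketch of the same idea, reading from any `std::istream` so that it can be exercised without a real `/proc` file, might look as follows. The function name `last_value` is hypothetical, and unlike the listing this version trims surrounding whitespace from the value.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Return the value of the last "field : value" line whose text starts with fieldName.
// Surrounding whitespace is trimmed here, which the listing above does not do.
std::string last_value(std::istream& in, const std::string& fieldName) {
  std::string result;
  std::string line;
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;                       // not a key/value line
    if (line.compare(0, fieldName.size(), fieldName) != 0) continue; // different field
    const std::string val = line.substr(colon + 1);
    const auto first = val.find_first_not_of(" \t\r");
    if (first == std::string::npos) continue;                       // empty value
    const auto last = val.find_last_not_of(" \t\r");
    result = val.substr(first, last - first + 1);                   // later matches overwrite
  }
  return result;
}
```

With an input containing `processor : 0` and `processor : 1`, the function returns `"1"`, and adding one gives the core count in the same way `get_cpu_core_info()` does.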

◆ get_memory_info()

uint64_t PerfMonMTSvc::get_memory_info ( ) const

Definition at line 931 of file PerfMonMTSvc.cxx.

931  {
932  std::string val = get_info_from_file("/proc/meminfo","MemTotal");
933  if (val.empty()) return 0;
934  val.resize(val.size() - 3); // strip the trailing kB
935  return std::stoull(val);
936 }
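
The `resize(val.size() - 3)` step assumes the value always ends in a three-character unit suffix. As a hedged alternative, `std::stoull` already skips leading whitespace and stops at the first non-digit character, so the suffix does not need to be removed by hand. The helper name `parse_kb` below is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <string>

// Parse the numeric part of a /proc/meminfo value such as "   16384256 kB".
// Returns 0 when no number can be read.
std::uint64_t parse_kb(const std::string& val) {
  try {
    // std::stoull skips leading whitespace and stops at the first non-digit,
    // so the trailing " kB" is ignored without manual trimming.
    return std::stoull(val);
  } catch (const std::exception&) {
    // std::invalid_argument (no digits) or std::out_of_range (too large)
    return 0;
  }
}
```

This variant also tolerates values shorter than three characters, where subtracting 3 from the size would wrap around.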

◆ getCpuEfficiency()

int PerfMonMTSvc::getCpuEfficiency ( ) const
private

Definition at line 352 of file PerfMonMTSvc.cxx.

352  {
353 
354  // In AthenaMT only the event-loop is executed concurrently
355  // In this metric, we scale the event-loop wall-time by
356  // the number of slots to take the concurrency into account
357  // Then we divide the total cpu-time by this number
358  // It's A metric not THE metric...
359 
360  const double totalCpuTime =
361  m_snapshotData[CONFIGURE].getDeltaCPU() +
362  m_snapshotData[INITIALIZE].getDeltaCPU() +
363  m_snapshotData[FIRSTEVENT].getDeltaCPU() +
364  m_snapshotData[EXECUTE].getDeltaCPU() +
365  m_snapshotData[FINALIZE].getDeltaCPU();
366 
367  const double scaledWallTime =
368  m_snapshotData[CONFIGURE].getDeltaWall() * 1. +
369  m_snapshotData[INITIALIZE].getDeltaWall() * 1. +
370  m_snapshotData[FIRSTEVENT].getDeltaWall() * 1. +
371  m_snapshotData[EXECUTE].getDeltaWall() * m_numberOfSlots +
372  m_snapshotData[FINALIZE].getDeltaWall() * 1.;
373 
374  return ( scaledWallTime > 0 ? totalCpuTime / scaledWallTime * 100. : 0 );
375 
376 }
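
The metric above is total CPU time divided by wall time, where only the Execute step's wall time is multiplied by the number of slots. A self-contained sketch of that arithmetic, using the hypothetical names `StepTimes` and `cpu_efficiency` and plain arrays instead of the service's snapshot data, is:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical container for one step's CPU and wall time in milliseconds.
struct StepTimes {
  double cpuMs;
  double wallMs;
};

// Steps are ordered Configure, Initialize, FirstEvent, Execute, Finalize.
constexpr std::size_t kExecute = 3;

// CPU efficiency in percent: total CPU time over wall time, with the
// Execute wall time scaled by the number of concurrent slots.
int cpu_efficiency(const std::array<StepTimes, 5>& steps, int nSlots) {
  double totalCpu = 0.0;
  double scaledWall = 0.0;
  for (std::size_t i = 0; i < steps.size(); ++i) {
    totalCpu += steps[i].cpuMs;
    scaledWall += steps[i].wallMs * (i == kExecute ? nSlots : 1);
  }
  // Truncates toward zero, like the implicit double-to-int conversion in the listing.
  return scaledWall > 0 ? static_cast<int>(totalCpu / scaledWall * 100.0) : 0;
}
```

For example, with 3600 ms of total CPU time, 400 ms of wall time outside the event loop, 1000 ms of Execute wall time, and 4 slots, the scaled wall time is 4400 ms and the result is 81.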

◆ handle()

void PerfMonMTSvc::handle ( const Incident &  incident)
override virtual

Incident service handle for post-finalize.

Definition at line 104 of file PerfMonMTSvc.cxx.

104  {
105  // Begin event processing
106  if (inc.type() == IncidentType::BeginEvent) {
107  // Lock for data integrity
108  std::lock_guard<std::mutex> lock(m_mutex_capture);
109 
110  // Increment the internal counter
111  m_eventCounter++;
112 
113  // Get current time in seconds
114  double currentTime = PMonMT::get_wall_time()*0.001;
115 
116  // Monitor
118  // Overwrite the last measurement time
119  m_checkPointTime = currentTime;
120 
121  // Capture
124  // Report instantly - no more than m_eventLoopMsgLimit times
128  }
129  }
130  }
131  // End event processing (as signaled by SG clean-up)
132  // By convention the first event is executed serially
133  // Therefore, we treat it a little bit differently
134  else if (m_eventCounter == 1 && inc.type() == "EndAlgorithms") {
135  // In AthenaMP w/ fork-after-initialize, the loop starts
136  // in the mother process but the first event is actually
137  // executed in the worker. Here, we try to work around this
138  // by resetting the first event measurement if we think
139  // we're in AthenaMP. This is not an ideal approach but
140  // gets the job done for the fork-after-initialize case.
141  if (m_motherPID != getpid()) {
142  m_snapshotData[FIRSTEVENT].m_tmp_cpu = 0;
143  m_snapshotData[FIRSTEVENT].m_memMonTmpMap["vmem"] = 0;
144  m_snapshotData[FIRSTEVENT].m_memMonTmpMap["pss"] = 0;
145  m_snapshotData[FIRSTEVENT].m_memMonTmpMap["rss"] = 0;
146  m_snapshotData[FIRSTEVENT].m_memMonTmpMap["swap"] = 0;
147  }
151  // Normally this flag is set in stopCompAud but we don't
152  // go in there unless m_doComponentLevelMonitoring is true.
153  // If it's false, we toggle it here but
154  // this is mostly for completeness since in that mode
155  // this flag is not really used at the moment.
157  m_isFirstEvent = false;
158  }
159  }
160  // This incident is fired by only some loop managers to signal the end of event processing
161  else if (inc.type() == "EndEvtLoop") {
162  m_isEvtLoopStopped = true;
165  }
166  // Finalize ourself and print the metrics in SvcPostFinalize
167  else if (inc.type() == IncidentType::SvcPostFinalize) {
168  // Final capture upon post-finalization
171 
172  // Report everything
173  report();
174  }
175  return;
176 }

◆ initialize()

StatusCode PerfMonMTSvc::initialize ( )
override virtual

Standard Gaudi Service initialization.

Configure the auditor

Definition at line 48 of file PerfMonMTSvc.cxx.

48  {
49  // Print where we are
50  ATH_MSG_INFO("Initializing " << name());
51 
52  // Set to be listener to SvcPostFinalize
53  ServiceHandle<IIncidentSvc> incSvc("IncidentSvc/IncidentSvc", name());
54  ATH_CHECK(incSvc.retrieve());
55  const long highestPriority = static_cast<long>(-1);
56  const long lowestPriority = 0;
57  incSvc->addListener(this, IncidentType::BeginEvent, highestPriority);
58  incSvc->addListener(this, "EndAlgorithms", lowestPriority);
59  incSvc->addListener(this, "EndEvtLoop", highestPriority);
60  incSvc->addListener(this, IncidentType::SvcPostFinalize);
61 
62  // Check if /proc exists, if not memory statistics are not available
63  const bool procExists = PMonMT::doesDirectoryExist("/proc");
64  if(!procExists) {
65  ATH_MSG_INFO("The system doesn't support /proc. Therefore, memory measurements are not available");
66  }
67 
68  // Print some minimal information about our configuration
69  ATH_MSG_INFO("Service is configured for [" << m_numberOfThreads.toString() << "] threads " <<
70  "analyzing [" << m_numberOfSlots.toString() << "] events concurrently");
71  ATH_MSG_INFO("Component-level measurements are [" << (m_doComponentLevelMonitoring ? "Enabled" : "Disabled") << "]");
73  ATH_MSG_INFO(" >> Component-level memory monitoring in the event-loop is disabled in jobs with more than 1 thread");
74  }
75 
76  // Thread specific component-level data map
77  m_compLevelDataMapVec.resize(m_numberOfThreads+1); // Default construct
78 
79  // Set wall time offset
81  if (m_wallTimeOffset > 0) {
82  m_snapshotData[CONFIGURE].add2DeltaWall(-m_wallTimeOffset);
83  }
84 
86  ATH_CHECK(auditorSvc()->addAuditor("PerfMonMTAuditor"));
87 
88  return StatusCode::SUCCESS;
89 }

◆ report()

void PerfMonMTSvc::report ( )

Report the results.

Definition at line 381 of file PerfMonMTSvc.cxx.

381  {
382  // Write into log file
383  report2Log();
384 
385  // Write into JSON
386  if (m_reportResultsToJSON) {
387  report2JsonFile();
388  }
389 }

◆ report2JsonFile()

void PerfMonMTSvc::report2JsonFile ( )

Report to the JSON File.

Definition at line 613 of file PerfMonMTSvc.cxx.

613  {
614  nlohmann::json j;
615 
616  // CPU and Wall-time
617  report2JsonFile_Summary(j); // Snapshots
618 
619  // Memory
621  report2JsonFile_ComponentLevel(j); // Component-level
622  }
624  report2JsonFile_EventLevel(j); // Event-level
625  }
626 
627  // Write and close the JSON file
628  std::ofstream o(m_jsonFileName);
629  o << std::setw(4) << j << std::endl;
630  o.close();
631 
632  // Compress the JSON file into tar.gz
633  auto cmd = "tar -czf " + m_jsonFileName + ".tar.gz " + m_jsonFileName + ";";
634  int rc = std::system(cmd.c_str());
635  if(rc!=0) {
636  ATH_MSG_WARNING("Couldn't compress the JSON file...");
637  return;
638  }
639 
640  // Remove the uncompressed JSON file to save disk-space
641  rc = std::remove(m_jsonFileName.toString().c_str());
642  if(rc!=0) {
643  ATH_MSG_WARNING("Couldn't remove the uncompressed JSON file...");
644  return;
645  }
646 }

◆ report2JsonFile_ComponentLevel()

void PerfMonMTSvc::report2JsonFile_ComponentLevel ( nlohmann::json &  j) const

Definition at line 721 of file PerfMonMTSvc.cxx.

721  {
722 
723  for (const auto& dataMapPerStep : m_stdoutVec_serial) {
724 
725  for(const auto& meas : dataMapPerStep){
726 
727  const std::string step = meas.first.stepName;
728  const std::string component = meas.first.compName;
729  const uint64_t count = meas.second->getCallCount();
730  const double cpuTime = meas.second->getDeltaCPU();
731  const double wallTime = meas.second->getDeltaWall();
732  const int64_t vmem = meas.second->getDeltaVmem();
733  const int64_t mall = meas.second->getDeltaMalloc();
734 
735  j["componentLevel"][step][component] = {{"count", count},
736  {"cpuTime", cpuTime},
737  {"wallTime", wallTime},
738  {"vmem", vmem},
739  {"malloc", mall}};
740  }
741 
742  }
743 
744 }

◆ report2JsonFile_EventLevel()

void PerfMonMTSvc::report2JsonFile_EventLevel ( nlohmann::json &  j) const

Definition at line 746 of file PerfMonMTSvc.cxx.

746  {
747 
748  for (const auto& it : m_eventLevelData.getEventLevelData()) {
749 
750  const uint64_t event = it.first;
751  const double cpuTime = it.second.cpu_time;
752  const double wallTime = it.second.wall_time;
753  const int64_t vmem = it.second.mem_stats.at("vmem");
754  const int64_t rss = it.second.mem_stats.at("rss");
755  const int64_t pss = it.second.mem_stats.at("pss");
756  const int64_t swap = it.second.mem_stats.at("swap");
757 
758  j["eventLevel"][std::to_string(event)] = {{"cpuTime", cpuTime},
759  {"wallTime", wallTime},
760  {"vmem", vmem},
761  {"rss", rss},
762  {"pss", pss},
763  {"swap", swap}};
764 
765 
766  }
767 }

◆ report2JsonFile_Summary()

void PerfMonMTSvc::report2JsonFile_Summary ( nlohmann::json &  j) const

Definition at line 651 of file PerfMonMTSvc.cxx.

651  {
652 
653  // Report snapshot level results
654  for(int i=0; i < NSNAPSHOTS; i++){
655 
656  const std::string step = m_snapshotStepNames[i];
657  const double dCPU = m_snapshotData[i].getDeltaCPU();
658  const double dWall = m_snapshotData[i].getDeltaWall();
659  const double cpuUtil = dCPU / dWall;
660  const int64_t dVmem = m_snapshotData[i].getMemMonDeltaMap("vmem");
661  const int64_t dRss = m_snapshotData[i].getMemMonDeltaMap("rss");
662  const int64_t dPss = m_snapshotData[i].getMemMonDeltaMap("pss");
663  const int64_t dSwap = m_snapshotData[i].getMemMonDeltaMap("swap");
664 
665  j["summary"]["snapshotLevel"][step] = {{"dCPU", dCPU},
666  {"dWall", dWall},
667  {"cpuUtil", cpuUtil},
668  {"dVmem", dVmem},
669  {"dRss", dRss},
670  {"dPss", dPss},
671  {"dSwap", dSwap}};
672 
673  }
674 
675  // Report the total number of events
676  const int64_t nEvents = m_eventCounter;
677  j["summary"]["nEvents"] = nEvents;
678 
679  // Report Peaks
680  const int64_t vmemPeak = m_eventLevelData.getEventLevelMemoryMax("vmem");
681  const int64_t rssPeak = m_eventLevelData.getEventLevelMemoryMax("rss");
682  const int64_t pssPeak = m_eventLevelData.getEventLevelMemoryMax("pss");
683  const int64_t swapPeak = m_eventLevelData.getEventLevelMemoryMax("swap");
684 
685  j["summary"]["peaks"] = {{"vmemPeak", vmemPeak},
686  {"rssPeak", rssPeak},
687  {"pssPeak", pssPeak},
688  {"swapPeak", swapPeak}};
689 
690  // Report leak estimates
691  const int64_t vmemLeak = m_fit_vmem.slope();
692  const int64_t pssLeak = m_fit_pss.slope();
693  const int64_t nPoints = m_fit_vmem.nPoints();
694 
695  j["summary"]["leakEstimates"] = {{"vmemLeak", vmemLeak},
696  {"pssLeak", pssLeak},
697  {"nPoints", nPoints}};
698 
699  // Report Sys info
700  const std::string cpuModel = get_cpu_model_info();
701  const int coreNum = get_cpu_core_info();
702  const int64_t totMem = get_memory_info();
703 
704  j["summary"]["sysInfo"] = {{"cpuModel", cpuModel},
705  {"coreNum", coreNum},
706  {"totMem", totMem}};
707 
708  // Report Environment info
709  const std::string mallocLib = std::filesystem::path(PMonMT::symb2lib("malloc")).filename().string();
710  const std::string mathLib = std::filesystem::path(PMonMT::symb2lib("atan2")).filename().string();
711 
712  j["summary"]["envInfo"] = {{"mallocLib", mallocLib},
713  {"mathLib", mathLib}};
714 
715  // Report CPU utilization efficiency;
716  const int cpuUtilEff = getCpuEfficiency();
717  j["summary"]["misc"] = {{"cpuUtilEff", cpuUtilEff}};
718 
719 }

◆ report2Log()

void PerfMonMTSvc::report2Log ( )

Report to log.

Definition at line 394 of file PerfMonMTSvc.cxx.

394  {
395  // Header
397 
398  // Component-level
401  }
402 
403  // Event-level
406  }
407 
408  // Summary and system information
412 }

◆ report2Log_ComponentLevel()

void PerfMonMTSvc::report2Log_ComponentLevel ( )

Definition at line 434 of file PerfMonMTSvc.cxx.

434  {
435 
436  ATH_MSG_INFO("=======================================================================================");
437  ATH_MSG_INFO(" Component Level Monitoring ");
438  ATH_MSG_INFO("=======================================================================================");
439 
440  ATH_MSG_INFO(std::format("{:<10} {:<15} {:<25} {:<40} {:<55} {:<75}","Step", "Count", "CPU Time [ms]",
441  "Vmem [kB]", "Malloc [kB]", "Component"));
442 
443  ATH_MSG_INFO("---------------------------------------------------------------------------------------");
444 
445  aggregateSlotData(); // aggregate data from slots
446  divideData2Steps(); // divide data into steps for ordered printing
447 
448  for (auto vec_itr : m_stdoutVec_serial) {
449  // Sort the results by CPU time for the time being
450  std::vector<std::pair<PMonMT::StepComp, PMonMT::ComponentData*>> pairs;
451  for (auto itr = vec_itr.begin(); itr != vec_itr.end(); ++itr) pairs.push_back(*itr);
452 
453  sort(pairs.begin(), pairs.end(),
454  [=](std::pair<PMonMT::StepComp, PMonMT::ComponentData*>& a,
455  std::pair<PMonMT::StepComp, PMonMT::ComponentData*>& b) {
456  return a.second->getDeltaCPU() > b.second->getDeltaCPU();
457  });
458 
459  int counter = 0;
460  for (auto it : pairs) {
461  // Only write out a certain number of components
462  if (counter >= m_printNComps) {
463  break;
464  }
465  counter++;
466 
467  ATH_MSG_INFO(std::format("{:<10} {:<15} {:<25.2f} {:<40.0f} {:<55.0f} {:<75}",it.first.stepName,
468  it.second->getCallCount(),it.second->getDeltaCPU(),it.second->getDeltaVmem(),
469  it.second->getDeltaMalloc(),it.first.compName));
470  }
471  if(counter>0) {
472  ATH_MSG_INFO("=======================================================================================");
473  }
474  }
475 }
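
The listing sorts every entry by CPU time and then stops printing after `m_printNComps` rows. When only the top N rows are needed, `std::partial_sort` gives the same leading rows without ordering the remainder. The sketch below is illustrative only; `Row` and `top_n_by_cpu` are hypothetical names, and each row is reduced to a component name and a CPU time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row: (component name, CPU time in ms).
using Row = std::pair<std::string, double>;

// Return the n most CPU-expensive rows, largest first.
std::vector<Row> top_n_by_cpu(std::vector<Row> rows, std::size_t n) {
  n = std::min(n, rows.size());
  std::partial_sort(rows.begin(), rows.begin() + static_cast<std::ptrdiff_t>(n), rows.end(),
                    [](const Row& a, const Row& b) { return a.second > b.second; });
  rows.resize(n);  // drop everything after the first n rows
  return rows;
}
```

The comparator here takes its arguments by `const` reference and captures nothing, which is sufficient because it only reads the two rows being compared.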

◆ report2Log_CpuInfo()

void PerfMonMTSvc::report2Log_CpuInfo ( ) const

Definition at line 584 of file PerfMonMTSvc.cxx.

584  {
585 
586  ATH_MSG_INFO(" System Information ");
587  ATH_MSG_INFO("=======================================================================================");
588 
589  ATH_MSG_INFO(std::format("{:<34} {}", "CPU Model:", get_cpu_model_info()));
590  ATH_MSG_INFO(std::format("{:<35} {}", "Number of Available Cores:", get_cpu_core_info()));
591  ATH_MSG_INFO(std::format("{:<35} {}", "Total Memory:", scaleMem(get_memory_info())));
592  ATH_MSG_INFO("=======================================================================================");
593 }

◆ report2Log_Description()

void PerfMonMTSvc::report2Log_Description ( ) const

Definition at line 417 of file PerfMonMTSvc.cxx.

417  {
418  ATH_MSG_INFO("=======================================================================================");
419  ATH_MSG_INFO(" PerfMonMTSvc Report ");
420  ATH_MSG_INFO("=======================================================================================");
421  if (m_reportResultsToJSON) {
422  ATH_MSG_INFO("*** Full set of information can also be found in: " << m_jsonFileName.toString());
423  ATH_MSG_INFO("*** In order to make plots using the results run the following commands:");
424  ATH_MSG_INFO("*** $ perfmonmt-plotter.py -i " << m_jsonFileName.toString());
425  ATH_MSG_INFO("*** In order to print tables using the results run the following commands:");
426  ATH_MSG_INFO("*** $ perfmonmt-printer.py -i " << m_jsonFileName.toString());
427  ATH_MSG_INFO("=======================================================================================");
428  }
429 }

◆ report2Log_EnvInfo()

void PerfMonMTSvc::report2Log_EnvInfo ( ) const

Definition at line 598 of file PerfMonMTSvc.cxx.

598  {
599 
600  ATH_MSG_INFO(" Environment Information ");
601  ATH_MSG_INFO("=======================================================================================");
602 
603  ATH_MSG_INFO(std::format("{:<35} {}","Malloc Library:", std::filesystem::path(PMonMT::symb2lib("malloc")).filename().string()));
604  ATH_MSG_INFO(std::format("{:<35} {}","Math Library:", std::filesystem::path(PMonMT::symb2lib("atan2")).filename().string()));
605 
606  ATH_MSG_INFO("=======================================================================================");
607 
608 }

◆ report2Log_EventLevel()

void PerfMonMTSvc::report2Log_EventLevel ( )

Definition at line 497 of file PerfMonMTSvc.cxx.

497  {
498 
499  ATH_MSG_INFO(" Event Level Monitoring ");
500  ATH_MSG_INFO(" (Only the first " << m_eventLoopMsgLimit.toString() <<
501  " and the last measurements are explicitly printed)");
502  ATH_MSG_INFO("=======================================================================================");
503 
504  ATH_MSG_INFO(std::format("{:<16} {:<12} {:<12} {:<12} {:<12} {:<12} {:<12}","Event", "CPU [s]",
505  "Wall [s]", "Vmem [kB]", "Rss [kB]", "Pss [kB]", "Swap [kB]"));
506 
507  ATH_MSG_INFO("---------------------------------------------------------------------------------------");
508 
509  m_eventLoopMsgCounter = 0; // reset counter
510  uint64_t nMeasurements = m_eventLevelData.getNMeasurements();
511 
512  for (const auto& it : m_eventLevelData.getEventLevelData()) {
513  // Print
514  if(m_eventLoopMsgCounter < m_eventLoopMsgLimit || m_eventLoopMsgCounter == nMeasurements - 1) {
516  ATH_MSG_INFO(std::format("{:=<87}", "..."));
517  }
518  ATH_MSG_INFO(std::format("{:<16} {:>12.2f} {:>12.2f} {:>12} {:>12} {:>12} {:>12}", it.first,
519  it.second.cpu_time * 0.001,it.second.wall_time * 0.001,it.second.mem_stats.at("vmem"),
520  it.second.mem_stats.at("rss"),it.second.mem_stats.at("pss"),it.second.mem_stats.at("swap")));
521  }
523  // Add to leak estimate
524  if (it.first >= m_memFitLowerLimit) {
525  m_fit_vmem.addPoint(it.first, it.second.mem_stats.at("vmem"));
526  m_fit_pss.addPoint(it.first, it.second.mem_stats.at("pss"));
527  }
528  }
529  ATH_MSG_INFO("=======================================================================================");
530 }
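
The leak estimate is the slope of memory versus event number, fed one point at a time through `addPoint`. The internals of `PerfMon::LinFitSglPass` are not shown on this page, so the following is only a sketch of how a single-pass least-squares fit can be built from running sums, under the assumption that an ordinary least-squares slope is what is wanted. The class name `LinearFit` is hypothetical; the method names mirror the calls seen in the listing.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Single-pass least-squares fit: keeps running sums, never stores the points.
class LinearFit {
 public:
  void addPoint(double x, double y) {
    ++m_n;
    m_sx += x;
    m_sy += y;
    m_sxx += x * x;
    m_sxy += x * y;
  }

  std::uint64_t nPoints() const { return m_n; }

  // slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); 0 when the fit is undefined.
  double slope() const {
    const double n = static_cast<double>(m_n);
    const double denom = n * m_sxx - m_sx * m_sx;
    if (m_n < 2 || denom == 0.0) return 0.0;
    return (n * m_sxy - m_sx * m_sy) / denom;
  }

 private:
  std::uint64_t m_n = 0;
  double m_sx = 0.0, m_sy = 0.0, m_sxx = 0.0, m_sxy = 0.0;
};
```

Feeding the points (25, 1000), (26, 1004), and (27, 1008) yields a slope of 4, that is, a growth of 4 units of memory per event.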

◆ report2Log_EventLevel_instant()

void PerfMonMTSvc::report2Log_EventLevel_instant ( ) const

Definition at line 480 of file PerfMonMTSvc.cxx.

480  {
483 
484  int64_t vmem = m_eventLevelData.getEventLevelMemory(m_eventCounter, "vmem");
488 
489  ATH_MSG_INFO("Event [" << std::setw(5) << m_eventCounter << "] CPU Time: " << scaleTime(cpu_time) <<
490  ", Wall Time: " << scaleTime(wall_time) << ", Vmem: " << scaleMem(vmem) <<
491  ", Rss: " << scaleMem(rss) << ", Pss: " << scaleMem(pss) << ", Swap: " << scaleMem(swap));
492 }

◆ report2Log_Summary()

void PerfMonMTSvc::report2Log_Summary ( )

Definition at line 535 of file PerfMonMTSvc.cxx.

535  {
536 
537  ATH_MSG_INFO(" Snapshots Summary ");
538  ATH_MSG_INFO("=======================================================================================");
539 
540  ATH_MSG_INFO(std::format("{:<13} {:<12} {:<12} {:<7} {:<11} {:<11} {:<11} {:<11}","Step",
541  "dCPU [s]","dWall [s]","<CPU>","dVmem [kB]","dRss [kB]","dPss [kB]","dSwap [kB]"));
542 
543  ATH_MSG_INFO("---------------------------------------------------------------------------------------");
544 
545  for (unsigned int idx = 0; idx < NSNAPSHOTS; idx++) {
546  ATH_MSG_INFO(std::format("{:<13} {:<12.2f} {:<12.2f} {:<7.2f} {:<11} {:<11} {:<11} {:<11}",
547  m_snapshotStepNames[idx], m_snapshotData[idx].getDeltaCPU() * 0.001,
548  m_snapshotData[idx].getDeltaWall() * 0.001,
549  m_snapshotData[idx].getDeltaCPU() / m_snapshotData[idx].getDeltaWall(),
550  m_snapshotData[idx].getMemMonDeltaMap("vmem"),m_snapshotData[idx].getMemMonDeltaMap("rss"),
551  m_snapshotData[idx].getMemMonDeltaMap("pss"),m_snapshotData[idx].getMemMonDeltaMap("swap")));
552  }
553 
554  ATH_MSG_INFO("***************************************************************************************");
555  const double cpu_exec_total = m_snapshotData[FIRSTEVENT].getDeltaCPU() + m_snapshotData[EXECUTE].getDeltaCPU();
556  const double wall_exec_total = m_snapshotData[FIRSTEVENT].getDeltaWall() + m_snapshotData[EXECUTE].getDeltaWall();
557 
558  ATH_MSG_INFO(std::format("{:<35} {}", "Number of events processed:", static_cast<int>(m_eventCounter)));
559  ATH_MSG_INFO(std::format("{:<35} {:.0f}", "CPU usage per event [ms]:",
560  (m_eventCounter > 0 ? cpu_exec_total / m_eventCounter : 0)));
561  ATH_MSG_INFO(std::format("{:<35} {:.3f}", "Events per second:",
562  (wall_exec_total > 0 ? m_eventCounter / wall_exec_total * 1000. : 0)));
563  ATH_MSG_INFO(std::format("{:<35} {}", "CPU utilization efficiency [%]:", getCpuEfficiency()));
565  ATH_MSG_INFO("***************************************************************************************");
566  ATH_MSG_INFO(std::format("{:<35} {}", "Max Vmem:", scaleMem(m_eventLevelData.getEventLevelMemoryMax("vmem"))));
567  ATH_MSG_INFO(std::format("{:<35} {}", "Max Rss:", scaleMem(m_eventLevelData.getEventLevelMemoryMax("rss"))));
568  ATH_MSG_INFO(std::format("{:<35} {}", "Max Pss:", scaleMem(m_eventLevelData.getEventLevelMemoryMax("pss"))));
569  ATH_MSG_INFO(std::format("{:<35} {}", "Max Swap:", scaleMem(m_eventLevelData.getEventLevelMemoryMax("swap"))));
570  ATH_MSG_INFO("***************************************************************************************");
571  ATH_MSG_INFO(std::format("{:<35} {}", "Leak estimate per event Vmem:", scaleMem(m_fit_vmem.slope())));
572  ATH_MSG_INFO(std::format("{:<35} {}", "Leak estimate per event Pss:", scaleMem(m_fit_pss.slope())));
573  ATH_MSG_INFO(" >> Estimated using the last " << m_fit_vmem.nPoints()
574  << " measurements from the Event Level Monitoring");
575  ATH_MSG_INFO(" >> Events prior to the first " << m_memFitLowerLimit.toString() << " are omitted...");
576  }
577 
578  ATH_MSG_INFO("=======================================================================================");
579 }

◆ scaleMem()

std::string PerfMonMTSvc::scaleMem ( int64_t  memMeas) const

Definition at line 862 of file PerfMonMTSvc.cxx.

862  {
863 
864  // Check if there is anything to be done
865  if (memMeas == 0) {
866  return "0.00 KB" ;
867  }
868 
869  // Prepare for the result
870  std::ostringstream ss;
871  ss << std::fixed;
872  ss << std::setprecision(2);
873 
874  // The input is in KB
875  std::vector<std::string> significance = {"KB", "MB", "GB", "TB"};
876 
877  // Get the absolute value
878  int64_t absMemMeas = std::abs(memMeas);
879  // Find the order, note that this is an int operation
880  int64_t order = std::log(absMemMeas)/std::log(1024);
881  // Compute the final value preserving the sign
882  double value = memMeas/std::pow(1024, order);
883  // Convert the result to a string
884  ss << value;
885 
886  return ss.str() + " " + significance[order];
887 }
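
The same scaling can be sketched without floating-point logarithms. The hypothetical `scaleMemSketch` below is not the PerfMonMTSvc implementation; it assumes the input is in KB, picks the unit by repeated integer division, and clamps the unit index to the last entry so very large inputs stay in range:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical standalone helper; input is a signed measurement in KB.
std::string scaleMemSketch(int64_t memKB) {
  static const std::array<const char*, 4> units = {"KB", "MB", "GB", "TB"};

  // Magnitude as unsigned so that the most negative value is also handled.
  uint64_t magnitude = memKB < 0 ? 0 - static_cast<uint64_t>(memKB)
                                 : static_cast<uint64_t>(memKB);

  // Integer division avoids rounding issues at exact powers of 1024,
  // and the bound on order keeps the unit index valid.
  std::size_t order = 0;
  double divisor = 1.0;
  while (magnitude >= 1024 && order + 1 < units.size()) {
    magnitude /= 1024;
    divisor *= 1024.0;
    ++order;
  }

  // Divide the signed input so the sign is preserved in the output.
  std::ostringstream ss;
  ss << std::fixed << std::setprecision(2)
     << (static_cast<double>(memKB) / divisor) << " " << units[order];
  return ss.str();
}
```

For example, an input of 1536 yields "1.50 MB" and an input of -2048 yields "-2.00 MB".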

◆ scaleTime()

std::string PerfMonMTSvc::scaleTime ( double  timeMeas) const

Definition at line 838 of file PerfMonMTSvc.cxx.

838  {
839  // Not a huge fan of this, we should eventually unify the types
840  // Just want to be explicit about what's happening
841  auto ms = static_cast<int64_t>(timeMeas);
842 
843  // Compute hrs and offset
844  auto hrs = ms / 3600000;
845  ms -= hrs * 3600000;
846  // Compute mins and offset
847  auto mins = ms / 60000;
848  ms -= mins * 60000;
849  // Compute secs and offset
850  auto secs = ms / 1000;
851  ms -= secs * 1000;
852 
853  // Primarily care about H:M:S
854  std::stringstream ss;
855  ss.fill('0');
856  ss << std::setw(2) << hrs << "h" <<
857  std::setw(2) << mins << "m" <<
858  std::setw(2) << secs << "s";
859  return ss.str();
860 }
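
A standalone sketch of the same hours/minutes/seconds breakdown, assuming the input is already an integer number of milliseconds. The function name `formatHms` is hypothetical and not part of PerfMonMTSvc:

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a duration in milliseconds as "HHhMMmSSs".
std::string formatHms(int64_t ms) {
  const int64_t hrs = ms / 3600000;
  ms %= 3600000;
  const int64_t mins = ms / 60000;
  ms %= 60000;
  const int64_t secs = ms / 1000;  // leftover milliseconds are dropped

  std::ostringstream ss;
  ss << std::setfill('0')
     << std::setw(2) << hrs << "h"
     << std::setw(2) << mins << "m"
     << std::setw(2) << secs << "s";
  return ss.str();
}
```

Because `std::setw(2)` is only a minimum width, durations of 100 hours or more still print in full, for example "100h00m00s".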

◆ startAud()

void PerfMonMTSvc::startAud ( const std::string &  stepName,
const std::string &  compName 
)
override virtual

Start Auditing.

Definition at line 180 of file PerfMonMTSvc.cxx.

180  {
181  // Snapshots, i.e. Initialize, Event Loop, etc.
182  startSnapshotAud(stepName, compName);
183 
184  /*
185  * Perform component monitoring only if the user asked for it.
186  * By default we don't monitor a set of common components.
187  * Once we adopt C++20, we can switch this from count to contains.
188  */
190  // Start component auditing
191  auto const &ctx = Gaudi::Hive::currentContext();
192  startCompAud(stepName, compName, ctx);
193  }
194 }

◆ startCompAud()

void PerfMonMTSvc::startCompAud ( const std::string &  stepName,
const std::string &  compName,
const EventContext &  ctx 
)

Component Level Auditing: Take measurements at the beginning and at the end of each component call.

Definition at line 251 of file PerfMonMTSvc.cxx.

251  {
252  // Get the thread index
253  const unsigned int ithread = (ctx.valid() && tbb::this_task_arena::current_thread_index() > -1) ? tbb::this_task_arena::current_thread_index() : 0;
254 
255  // Memory measurement is only done outside the loop except when there is only a single thread
256  const bool doMem = !ctx.valid() || (m_numberOfThreads == 1);
257 
258  // Generate State
259  PMonMT::StepComp currentState = generate_state(stepName, compName);
260 
261  // Check if this is the first call; if so, create the measurement data, otherwise reuse the existing entry.
262  // Metrics are collected per thread then aggregated before reporting
263  data_map_unique_t& compLevelDataMap = m_compLevelDataMapVec[ithread];
264  if(compLevelDataMap.find(currentState) == compLevelDataMap.end()) {
265  compLevelDataMap.insert({currentState, std::make_unique<PMonMT::ComponentData>()});
266  }
267 
268  // Capture and store
270  meas.capture(); // No memory in the event-loop
271  if (doMem) {
272  // we made sure this is only run outside event loop or single-threaded
273  [[maybe_unused]] bool dummy ATLAS_THREAD_SAFE = meas.capture_memory();
274  }
275 
276  compLevelDataMap[currentState]->addPointStart(meas, doMem);
277 
278  // Debug
279  ATH_MSG_DEBUG("Start Audit - Component " << compName << " , "
280  "Step " << stepName << " , "
281  "Event " << ctx.evt() << " , "
282  "Slot " << ctx.slot() << " , "
283  "Context " << ctx.valid() << " , "
284  "Thread " << ithread << " , "
285  "Cpu " << meas.cpu_time << " ms, "
286  "Wall " << meas.wall_time << " ms, "
287  "Vmem " << meas.vmem << " kb, "
288  "Malloc " << meas.malloc << " kb");
289 }
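
The start/stop bookkeeping above keeps one map per thread and aggregates before reporting. The following is a simplified sketch of that pattern with injected timestamps. `StepCompKey`, `ComponentDataSketch` and `PerThreadStore` are hypothetical stand-ins, and the accumulation rule in `addPointStop` is an assumption rather than the actual PMonMT::ComponentData logic:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key: (stepName, compName).
using StepCompKey = std::pair<std::string, std::string>;

// Hypothetical, reduced version of the per-component record (CPU time only).
struct ComponentDataSketch {
  double tmpCpu = 0.0;    // CPU time at the most recent start call [ms]
  double deltaCpu = 0.0;  // CPU time accumulated over start/stop pairs [ms]
  void addPointStart(double cpuMs) { tmpCpu = cpuMs; }
  void addPointStop(double cpuMs) { deltaCpu += cpuMs - tmpCpu; }
};

using DataMap = std::map<StepCompKey, std::unique_ptr<ComponentDataSketch>>;

class PerThreadStore {
public:
  explicit PerThreadStore(std::size_t nThreads) : m_maps(nThreads) {}

  // Create the record on first use, then store the start measurement.
  void start(std::size_t ithread, const StepCompKey& key, double cpuMs) {
    DataMap& map = m_maps.at(ithread);
    auto it = map.find(key);
    if (it == map.end()) {
      it = map.emplace(key, std::make_unique<ComponentDataSketch>()).first;
    }
    it->second->addPointStart(cpuMs);
  }

  // Assumes start() was called earlier for the same thread and key.
  void stop(std::size_t ithread, const StepCompKey& key, double cpuMs) {
    m_maps.at(ithread).at(key)->addPointStop(cpuMs);
  }

  // Sum the per-thread deltas for one key.
  double totalCpu(const StepCompKey& key) const {
    double total = 0.0;
    for (const DataMap& map : m_maps) {
      auto it = map.find(key);
      if (it != map.end()) total += it->second->deltaCpu;
    }
    return total;
  }

private:
  std::vector<DataMap> m_maps;  // one map per thread, so no locking per call
};
```

Each thread only touches its own map, which is why the sketch needs no mutex on the start/stop path; the aggregation is expected to run once the threads have finished.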

◆ startSnapshotAud()

void PerfMonMTSvc::startSnapshotAud ( const std::string &  stepName,
const std::string &  compName 
)

Snapshot Auditing: Take snapshots at the beginning and at the end of each step.

Definition at line 214 of file PerfMonMTSvc.cxx.

214  {
215  // Last thing to be called before the event loop begins
216  if (compName == "AthOutSeq" && stepName == "Start") {
219  m_isFirstEvent = true;
220  }
221 
222  // Last thing to be called before finalize step begins
223  if (compName == "AthMasterSeq" && stepName == "Finalize") {
226  }
227 }

◆ stopAud()

void PerfMonMTSvc::stopAud ( const std::string &  stepName,
const std::string &  compName 
)
override virtual

Stop Auditing.

Definition at line 199 of file PerfMonMTSvc.cxx.

199  {
200  // Snapshots, i.e. Initialize, Event Loop, etc.
201  stopSnapshotAud(stepName, compName);
202 
203  // Check if we should monitor this component
205  // Stop component auditing
206  auto const &ctx = Gaudi::Hive::currentContext();
207  stopCompAud(stepName, compName, ctx);
208  }
209 }

◆ stopCompAud()

void PerfMonMTSvc::stopCompAud ( const std::string &  stepName,
const std::string &  compName,
const EventContext &  ctx 
)

Definition at line 294 of file PerfMonMTSvc.cxx.

294  {
295  // Get the thread index
296  const unsigned int ithread = (ctx.valid() && tbb::this_task_arena::current_thread_index() > -1) ? tbb::this_task_arena::current_thread_index() : 0;
297 
298  // Memory measurement is only done outside the loop except when there is only a single thread
299  const bool doMem = !ctx.valid() || (m_numberOfThreads == 1);
300 
301  // Capture
303  meas.capture(); // No memory in the event-loop
304  if (doMem) {
305  // we made sure this is only run outside event loop or single-threaded
306  [[maybe_unused]] bool dummy ATLAS_THREAD_SAFE = meas.capture_memory();
307  }
308 
309  // Generate State
310  PMonMT::StepComp currentState = generate_state(stepName, compName);
311 
312  // Store
313  data_map_unique_t& compLevelDataMap = m_compLevelDataMapVec[ithread];
314  compLevelDataMap[currentState]->addPointStop(meas, doMem);
315 
316  // Once IncidentProcAlg3 has executed for the first time, toggle m_isFirstEvent to false.
317  // Doing it this way, instead of at EndAlgorithms incident, makes sure there is no
318  // mismatch in start-stop calls to IncidentProcAlg3.
319  // It's a little ad-hoc but I don't think this workflow will change much anytime soon.
320  if ( m_isFirstEvent && compName == "IncidentProcAlg3" && stepName == "Execute") {
321  m_isFirstEvent = false;
322  }
323 
324  // Debug
325  ATH_MSG_DEBUG("Stop Audit - Component " << compName << " , "
326  "Step " << stepName << " , "
327  "Event " << ctx.evt() << " , "
328  "Slot " << ctx.slot() << " , "
329  "Context " << ctx.valid() << " , "
330  "Thread " << ithread << " , "
331  "Cpu (" << compLevelDataMap[currentState]->m_tmp_cpu << ":"
332  << meas.cpu_time << ":"
333  << (meas.cpu_time - compLevelDataMap[currentState]->m_tmp_cpu) << ":"
334  << compLevelDataMap[currentState]->m_delta_cpu << ") ms, "
335  "Wall (" << compLevelDataMap[currentState]->m_tmp_wall << ":"
336  << meas.wall_time << ":"
337  << (meas.wall_time - compLevelDataMap[currentState]->m_tmp_wall) << ":"
338  << compLevelDataMap[currentState]->m_delta_wall << ") ms, "
339  "Vmem (" << compLevelDataMap[currentState]->m_tmp_vmem << ":"
340  << meas.vmem << ":"
341  << (meas.vmem - compLevelDataMap[currentState]->m_tmp_vmem) << ":"
342  << compLevelDataMap[currentState]->m_delta_vmem << ") kb, "
343  "Malloc (" << compLevelDataMap[currentState]->m_tmp_malloc << ":"
344  << meas.malloc << ":"
345  << (meas.malloc - compLevelDataMap[currentState]->m_tmp_malloc) << ":"
346  << compLevelDataMap[currentState]->m_delta_malloc << ") kb");
347 }

◆ stopSnapshotAud()

void PerfMonMTSvc::stopSnapshotAud ( const std::string &  stepName,
const std::string &  compName 
)

Definition at line 232 of file PerfMonMTSvc.cxx.

232  {
233  // First thing to be called after the initialize step ends
234  if (compName == "AthMasterSeq" && stepName == "Initialize") {
237  }
238 
239  // First thing to be called after the event loop ends
240  // Some loop managers fire a dedicated incident to signal the end of the event loop
241  // That precedes the AthMasterSeq Stop, and if it is already handled we don't do anything here
242  if (compName == "AthMasterSeq" && stepName == "Stop" && m_eventCounter > 0 && !m_isEvtLoopStopped) {
245  }
246 }

Member Data Documentation

◆ m_checkPointThreshold

Gaudi::Property<uint64_t> PerfMonMTSvc::m_checkPointThreshold
private
Initial value:
{
this, "checkPointThreshold", 30,
"Least amount of time (in seconds) between event-level checks."}

Frequency of event level monitoring.

Definition at line 137 of file PerfMonMTSvc.h.
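
One way a minimum interval between event-level checks can be enforced from several threads is a compare-and-swap on the last checkpoint time. The sketch below is an assumption about how such throttling might look, not the PerfMonMTSvc logic; `CheckpointThrottle` and `shouldCheck` are hypothetical names, and timestamps are passed in as milliseconds:

```cpp
#include <atomic>

// Hypothetical throttle: allows a check only if at least
// thresholdSeconds have passed since the last allowed check.
class CheckpointThrottle {
public:
  explicit CheckpointThrottle(double thresholdSeconds)
      : m_thresholdMs(thresholdSeconds * 1000.0) {}

  bool shouldCheck(double nowMs) {
    double last = m_lastMs.load();
    // Retry while the interval has elapsed; on failure the CAS refreshes
    // 'last', so a competing thread that won the race makes us return false.
    while (nowMs - last >= m_thresholdMs) {
      if (m_lastMs.compare_exchange_weak(last, nowMs)) {
        return true;
      }
    }
    return false;
  }

private:
  const double m_thresholdMs;
  std::atomic<double> m_lastMs{0.0};
};
```

With the default threshold of 30 seconds, a call at 10 s is rejected, a call at 30 s is accepted, and the next accepted call is at 60 s or later.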

◆ m_checkPointTime

std::atomic<double> PerfMonMTSvc::m_checkPointTime
private

Definition at line 181 of file PerfMonMTSvc.h.

◆ m_compLevelDataMap

data_map_t PerfMonMTSvc::m_compLevelDataMap
private

Definition at line 194 of file PerfMonMTSvc.h.

◆ m_compLevelDataMap_1stevt

data_map_t PerfMonMTSvc::m_compLevelDataMap_1stevt
private

Definition at line 200 of file PerfMonMTSvc.h.

◆ m_compLevelDataMap_cbk

data_map_t PerfMonMTSvc::m_compLevelDataMap_cbk
private

Definition at line 204 of file PerfMonMTSvc.h.

◆ m_compLevelDataMap_evt

data_map_t PerfMonMTSvc::m_compLevelDataMap_evt
private

Definition at line 201 of file PerfMonMTSvc.h.

◆ m_compLevelDataMap_fin

data_map_t PerfMonMTSvc::m_compLevelDataMap_fin
private

Definition at line 202 of file PerfMonMTSvc.h.

◆ m_compLevelDataMap_ini

data_map_t PerfMonMTSvc::m_compLevelDataMap_ini
private

Definition at line 199 of file PerfMonMTSvc.h.

◆ m_compLevelDataMap_plp

data_map_t PerfMonMTSvc::m_compLevelDataMap_plp
private

Definition at line 203 of file PerfMonMTSvc.h.

◆ m_compLevelDataMapVec

std::vector<data_map_unique_t> PerfMonMTSvc::m_compLevelDataMapVec
private

Definition at line 198 of file PerfMonMTSvc.h.

◆ m_doComponentLevelMonitoring

Gaudi::Property<bool> PerfMonMTSvc::m_doComponentLevelMonitoring
private
Initial value:
{
this, "doComponentLevelMonitoring", false,
"True if component level monitoring is enabled, false o/w. Component monitoring may cause a decrease in the "
"performance due to the usage of locks."}

Do component level monitoring.

Definition at line 120 of file PerfMonMTSvc.h.

◆ m_doEventLoopMonitoring

Gaudi::Property<bool> PerfMonMTSvc::m_doEventLoopMonitoring
private
Initial value:
{
this, "doEventLoopMonitoring", true,
"True if event loop monitoring is enabled, false o/w. Event loop monitoring may cause a decrease in the "
"performance due to the usage of locks."}

Do event loop monitoring.

Definition at line 115 of file PerfMonMTSvc.h.

◆ m_eventCounter

std::atomic<uint64_t> PerfMonMTSvc::m_eventCounter
private

Definition at line 175 of file PerfMonMTSvc.h.

◆ m_eventLevelData

PMonMT::EventLevelData PerfMonMTSvc::m_eventLevelData {}
private

Definition at line 166 of file PerfMonMTSvc.h.

◆ m_eventLoopMsgCounter

std::atomic<uint64_t> PerfMonMTSvc::m_eventLoopMsgCounter
private

Definition at line 178 of file PerfMonMTSvc.h.

◆ m_eventLoopMsgLimit

Gaudi::Property<uint64_t> PerfMonMTSvc::m_eventLoopMsgLimit {this, "eventLoopMsgLimit", 10, "Maximum number of event-level messages."}
private

Set the number of messages for the event-level report.

Definition at line 150 of file PerfMonMTSvc.h.

◆ m_exclusionSet

const std::set<std::string> PerfMonMTSvc::m_exclusionSet
private
Initial value:
= {"AthMasterSeq", "AthAlgEvtSeq", "AthAllAlgSeq", "AthAlgSeq", "AthOutSeq",
"AthCondSeq", "AthBeginSeq", "AthEndSeq", "AthenaEventLoopMgr", "AthenaHiveEventLoopMgr", "AthMpEvtLoopMgr", "AthenaMtesEventLoopMgr",
"PerfMonMTSvc"}

Exclude some common components from monitoring. In the future this might be converted to an inclusion set, which would allow the user to monitor only a selected set of algorithms.

Definition at line 155 of file PerfMonMTSvc.h.
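
A minimal sketch of how an exclusion set like this can gate component-level monitoring. The free function `shouldMonitor` is hypothetical. The component names are copied from the initial value above, and the `count`-based lookup follows the comment in `startAud()` about switching to `contains` under C++20:

```cpp
#include <set>
#include <string>

// Hypothetical helper: true if the component should be audited.
bool shouldMonitor(const std::string& compName, bool doComponentLevelMonitoring) {
  static const std::set<std::string> exclusionSet = {
      "AthMasterSeq", "AthAlgEvtSeq", "AthAllAlgSeq", "AthAlgSeq", "AthOutSeq",
      "AthCondSeq", "AthBeginSeq", "AthEndSeq", "AthenaEventLoopMgr",
      "AthenaHiveEventLoopMgr", "AthMpEvtLoopMgr", "AthenaMtesEventLoopMgr",
      "PerfMonMTSvc"};

  // count() works before C++20; with C++20 this could be written as
  // !exclusionSet.contains(compName).
  return doComponentLevelMonitoring && exclusionSet.count(compName) == 0;
}
```

An inclusion set, as suggested in the description, would invert the membership test and keep only the listed components.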

◆ m_fit_pss

PerfMon::LinFitSglPass PerfMonMTSvc::m_fit_pss
private

Definition at line 210 of file PerfMonMTSvc.h.

◆ m_fit_vmem

PerfMon::LinFitSglPass PerfMonMTSvc::m_fit_vmem
private

Definition at line 209 of file PerfMonMTSvc.h.

◆ m_isEvtLoopStopped

std::atomic<bool> PerfMonMTSvc::m_isEvtLoopStopped
private

Definition at line 184 of file PerfMonMTSvc.h.

◆ m_isFirstEvent

std::atomic<bool> PerfMonMTSvc::m_isFirstEvent
private

Definition at line 172 of file PerfMonMTSvc.h.

◆ m_jsonFileName

Gaudi::Property<std::string> PerfMonMTSvc::m_jsonFileName
private
Initial value:
{this, "jsonFileName", "PerfMonMTSvc_result.json",
"Name of the JSON file that contains the results."}

Name of the JSON file.

Definition at line 127 of file PerfMonMTSvc.h.

◆ m_measurementEvents

PMonMT::SnapshotMeasurement PerfMonMTSvc::m_measurementEvents
private

Measurement to capture events.

Definition at line 112 of file PerfMonMTSvc.h.

◆ m_measurementSnapshots

PMonMT::SnapshotMeasurement PerfMonMTSvc::m_measurementSnapshots
private

Measurement to capture snapshots.

Definition at line 109 of file PerfMonMTSvc.h.

◆ m_memFitLowerLimit

Gaudi::Property<uint64_t> PerfMonMTSvc::m_memFitLowerLimit
private
Initial value:
{
this, "memFitLowerLimit", 25,
"Lower limit (in number of events) for the memory fit."}

Lower limit (in number of events) for the memory fit.

Definition at line 133 of file PerfMonMTSvc.h.

◆ m_motherPID

int PerfMonMTSvc::m_motherPID
private

Snapshots data.

Definition at line 160 of file PerfMonMTSvc.h.

◆ m_mutex_capture

std::mutex PerfMonMTSvc::m_mutex_capture
private

Definition at line 169 of file PerfMonMTSvc.h.

◆ m_numberOfSlots

Gaudi::Property<int> PerfMonMTSvc::m_numberOfSlots {this, "numberOfSlots", 1, "Number of slots in the job."}
private

Get the number of slots.

Definition at line 148 of file PerfMonMTSvc.h.

◆ m_numberOfThreads

Gaudi::Property<int> PerfMonMTSvc::m_numberOfThreads {this, "numberOfThreads", 1, "Number of threads in the job."}
private

Get the number of threads.

Definition at line 146 of file PerfMonMTSvc.h.

◆ m_printDetailedTables

Gaudi::Property<bool> PerfMonMTSvc::m_printDetailedTables
private
Initial value:
{this, "printDetailedTables", true,
"Print detailed component-level metrics."}

Print detailed tables.

Definition at line 130 of file PerfMonMTSvc.h.

◆ m_printNComps

Gaudi::Property<int> PerfMonMTSvc::m_printNComps
private
Initial value:
{
this, "printNComps", 50, "Maximum number of components to be printed."}

Print the top N components.

Definition at line 143 of file PerfMonMTSvc.h.

◆ m_reportResultsToJSON

Gaudi::Property<bool> PerfMonMTSvc::m_reportResultsToJSON {this, "reportResultsToJSON", true, "Report results into the json file."}
private

Report results to JSON.

Definition at line 125 of file PerfMonMTSvc.h.

◆ m_snapshotData

std::vector<PMonMT::SnapshotData> PerfMonMTSvc::m_snapshotData
private

Definition at line 161 of file PerfMonMTSvc.h.

◆ m_snapshotStepNames

std::vector<std::string> PerfMonMTSvc::m_snapshotStepNames = {"Configure", "Initialize", "FirstEvent", "Execute", "Finalize"}
private

Definition at line 162 of file PerfMonMTSvc.h.

◆ m_stdoutVec_serial

std::vector<data_map_t> PerfMonMTSvc::m_stdoutVec_serial
private

Definition at line 206 of file PerfMonMTSvc.h.

◆ m_wallTimeOffset

Gaudi::Property<double> PerfMonMTSvc::m_wallTimeOffset {this, "wallTimeOffset", 0, "Job start wall time in milliseconds."}
private

Offset for the wall-time, comes from configuration.

Definition at line 141 of file PerfMonMTSvc.h.


The documentation for this class was generated from the following files: