ATLAS Offline Software
CoreDumpSvc Class Reference

Service to print additional information before a crash.

#include <CoreDumpSvc.h>


Classes

struct  sysDumpRec
 

Public Member Functions

 CoreDumpSvc (const std::string &name, ISvcLocator *pSvcLocator) ATLAS_CTORDTOR_NOT_THREAD_SAFE
 Constructor with parameters.
 
virtual ~CoreDumpSvc () ATLAS_CTORDTOR_NOT_THREAD_SAFE
 Destructor.
 
ICoreDumpSvc implementation
virtual void setCoreDumpInfo (const std::string &name, const std::string &value) override
 Set a name/value pair in the core dump record.
 
virtual void setCoreDumpInfo (const EventContext &ctx, const std::string &name, const std::string &value) override
 Set a name/value pair in the core dump record for given EventContext.
 
virtual std::string dump () const override
 Print all core dump records.
 

Protected Member Functions

 CoreDumpSvc ()
 Default constructor (do not use).
 

Private Member Functions

void propertyHandler ATLAS_NOT_THREAD_SAFE (Gaudi::Details::PropertyBase &p)
 Property handler.
 
void print ATLAS_NOT_THREAD_SAFE ()
 Print core dump records to configured stream.
 
void setSigInfo (siginfo_t *info)
 Set pointer to siginfo_t struct.
 
StatusCode installSignalHandler ATLAS_NOT_THREAD_SAFE ()
 Install signal handlers.
 
StatusCode uninstallSignalHandler ATLAS_NOT_THREAD_SAFE ()
 Uninstall signal handlers.
 
void setAltStack ()
 Set up an alternate stack for the current thread.
 

Friends

void CoreDumpSvcHandler::action (int sig, siginfo_t *info, void *extra)
 

Gaudi implementation

typedef tbb::concurrent_unordered_map< std::string, std::string > UserCore_t
 
std::vector< UserCore_t > m_usrCoreDumps
 User defined core dump info.
 
std::vector< sysDumpRec > m_sysCoreDumps
 Core dump info collected by this service.
 
siginfo_t * m_siginfo {nullptr}
 Pointer to siginfo_t struct (set by signal handler).
 
std::atomic< EventID::event_number_t > m_eventCounter {0}
 Event counter.
 
Gaudi::Property< std::vector< int > > m_signals
 List of signals to catch.
 
Gaudi::Property< bool > m_callOldHandler
 Call previous signal handler.
 
Gaudi::Property< bool > m_dumpCoreFile
 Produce a core dump file if resource limits (ulimit -c) allow.
 
Gaudi::Property< bool > m_stackTrace
 Produce (gdb) stack trace on crash.
 
Gaudi::Property< bool > m_fastStackTrace
 Produce fast stack trace of current thread.
 
Gaudi::Property< std::string > m_coreDumpStream
 Stream to use for core dump [stdout,stderr].
 
Gaudi::Property< int > m_fatalHandlerFlags
 Flags given to the fatal handler this service installs (zero: no additional fatal handler).
 
Gaudi::Property< double > m_timeout
 Wall-clock time-out, in nanoseconds.
 
Gaudi::Property< bool > m_killOnSigInt {this, "KillOnSigInt",true, "Terminate job on SIGINT (aka Ctrl-C)"}
 
static thread_local std::vector< uint8_t > s_stack
 Alternate stack for signal handler.
 
virtual StatusCode initialize ATLAS_NOT_THREAD_SAFE () override
 
virtual StatusCode start () override
 
virtual StatusCode finalize ATLAS_NOT_THREAD_SAFE () override
 
virtual void handle (const Incident &incident) override
 Incident listener.
 

Detailed Description

Service to print additional information before a crash.

Author
Frank Winklmeier
Sami Kama

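This service catches fatal signals and prints its internal core dump record. During event processing it records, for each event slot, the last incident seen and the corresponding event ID, and it counts events (one per BeginEvent incident). Additional name/value information can be added via setCoreDumpInfo().

For the list of job option properties, see the Gaudi::Property data members under Member Data Documentation (Signals, CallOldHandler, DumpCoreFile, StackTrace, FastStackTrace, CoreDumpStream, FatalHandler, TimeOut, KillOnSigInt).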

Definition at line 44 of file CoreDumpSvc.h.

Member Typedef Documentation

◆ UserCore_t

typedef tbb::concurrent_unordered_map<std::string,std::string > CoreDumpSvc::UserCore_t
private

Definition at line 91 of file CoreDumpSvc.h.

Constructor & Destructor Documentation

◆ CoreDumpSvc() [1/2]

CoreDumpSvc::CoreDumpSvc ( )
protected

Default constructor (do not use)

◆ CoreDumpSvc() [2/2]

CoreDumpSvc::CoreDumpSvc ( const std::string &  name,
ISvcLocator *  pSvcLocator 
)

Constructor with parameters.

Definition at line 236 of file CoreDumpSvc.cxx.

:
  base_class( name, pSvcLocator )
{
  // Set us as the current instance
  CoreDumpSvcHandler::coreDumpSvc = this;

  m_callOldHandler.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_dumpCoreFile.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_stackTrace.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_fastStackTrace.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_coreDumpStream.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_fatalHandlerFlags.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_killOnSigInt.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  // Allocate for 2 slots just for now.
  m_usrCoreDumps.resize(2);
  m_sysCoreDumps.resize(2);
}
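The constructor registers CoreDumpSvc::propertyHandler as the update handler for most properties; note that m_signals and m_timeout are not among them. The body of propertyHandler is not shown on this page, so the following is only a minimal, self-contained sketch of the callback pattern itself. SketchProperty and SketchService are hypothetical names and are not the Gaudi::Property API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a Gaudi::Property (NOT the Gaudi API): it stores a
// value and calls an optional update handler each time a new value is assigned.
template <typename T>
class SketchProperty {
public:
  SketchProperty(std::string name, T value)
      : m_name(std::move(name)), m_value(std::move(value)) {}

  // Register the callback that is told about every update.
  void declareUpdateHandler(std::function<void(SketchProperty&)> fn) {
    m_handler = std::move(fn);
  }

  // Assign a new value and notify the registered handler, if any.
  SketchProperty& operator=(const T& v) {
    m_value = v;
    if (m_handler) m_handler(*this);
    return *this;
  }

  const T& value() const { return m_value; }
  const std::string& name() const { return m_name; }

private:
  std::string m_name;
  T m_value;
  std::function<void(SketchProperty&)> m_handler;
};

// Hypothetical owner: routes several property updates to one member handler,
// in the spirit of the constructor above.
class SketchService {
public:
  SketchProperty<bool> callOldHandler{"CallOldHandler", true};
  SketchProperty<bool> dumpCoreFile{"DumpCoreFile", false};
  int updates = 0;
  std::string lastChanged;

  SketchService() {
    callOldHandler.declareUpdateHandler([this](SketchProperty<bool>& p) { propertyHandler(p); });
    dumpCoreFile.declareUpdateHandler([this](SketchProperty<bool>& p) { propertyHandler(p); });
  }
  // The lambdas capture this, so copies would notify the wrong object.
  SketchService(const SketchService&) = delete;
  SketchService& operator=(const SketchService&) = delete;

private:
  void propertyHandler(SketchProperty<bool>& p) {
    ++updates;
    lastChanged = p.name();
  }
};
```

In this sketch the initial values set at construction do not trigger the handler; only later assignments do.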

◆ ~CoreDumpSvc()

CoreDumpSvc::~CoreDumpSvc ( )
virtual

Destructor.

Definition at line 254 of file CoreDumpSvc.cxx.

255 {
257 }

Member Function Documentation

◆ print()

void CoreDumpSvc::print ATLAS_NOT_THREAD_SAFE ( )
private

Print core dump records to configured stream.

◆ installSignalHandler()

StatusCode CoreDumpSvc::installSignalHandler ATLAS_NOT_THREAD_SAFE ( )
private

Install signal handlers.

◆ uninstallSignalHandler()

StatusCode CoreDumpSvc::uninstallSignalHandler ATLAS_NOT_THREAD_SAFE ( )
private

Uninstall signal handlers.

◆ initialize()

virtual StatusCode CoreDumpSvc::initialize ATLAS_NOT_THREAD_SAFE ( )
override, virtual

◆ finalize()

virtual StatusCode CoreDumpSvc::finalize ATLAS_NOT_THREAD_SAFE ( )
override, virtual

◆ propertyHandler()

void CoreDumpSvc::propertyHandler ATLAS_NOT_THREAD_SAFE ( Gaudi::Details::PropertyBase & p )
private

Property handler.
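The bodies of installSignalHandler and uninstallSignalHandler are not shown on this page. As an illustration only, here is a minimal POSIX sketch that assumes sigaction() with SA_SIGINFO, so the handler receives a siginfo_t (the kind of pointer setSigInfo() stores). It saves each previous disposition so it can be restored later; the CallOldHandler property suggests the service also chains to those saved handlers, which this sketch does not attempt. Everything in namespace sketch is hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <map>
#include <signal.h>
#include <vector>

namespace sketch {

// Last signal seen by the handler; sig_atomic_t keeps the write signal-safe.
volatile std::sig_atomic_t g_lastSignal = 0;

// Previous disposition of every signal we took over.
std::map<int, struct sigaction> g_oldActions;

// SA_SIGINFO-style handler: the kernel passes a siginfo_t describing the signal.
void handler(int sig, siginfo_t* info, void* /*extra*/) {
  g_lastSignal = (info != nullptr) ? info->si_signo : sig;
}

// Install handler for each signal, saving the old action for a later restore.
bool installSignalHandlers(const std::vector<int>& signals) {
  for (int sig : signals) {
    struct sigaction act {};
    act.sa_sigaction = handler;
    sigemptyset(&act.sa_mask);
    // SA_ONSTACK: run on the alternate stack if one was set up via sigaltstack().
    act.sa_flags = SA_SIGINFO | SA_ONSTACK;
    struct sigaction old {};
    if (sigaction(sig, &act, &old) != 0) return false;
    g_oldActions.emplace(sig, old);  // keeps the first (original) disposition
  }
  return true;
}

// Restore every saved disposition and forget them.
bool uninstallSignalHandlers() {
  bool ok = true;
  for (const auto& entry : g_oldActions) {
    if (sigaction(entry.first, &entry.second, nullptr) != 0) ok = false;
  }
  g_oldActions.clear();
  return ok;
}

}  // namespace sketch
```

Usage: install for a harmless signal such as SIGUSR1, raise it, and uninstall; the recorded signal number matches and the default disposition is restored afterwards.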

◆ dump()

std::string CoreDumpSvc::dump ( ) const
override, virtual

Print all core dump records.

Definition at line 392 of file CoreDumpSvc.cxx.

{
  std::ostringstream os;
  char buf[26];
  const time_t now = time(nullptr);

  os << "-------------------------------------------------------------------------------------" << "\n";
  os << "Core dump from " << name() << " on " << System::hostName()
     << " at " << ctime_r(&now, buf) /*<< "\n"*/; // ctime adds "\n"
  os << "\n";

  // Print additional information if available
  if (m_siginfo) {
    int signo = m_siginfo->si_signo; // shorthand

    os << "Caught signal " << signo
       << "(" << strsignal(signo) << "). Details: "
       << "\n";

    os << " errno = " << m_siginfo->si_errno
       << ", code = " << m_siginfo->si_code
       << " (" << Athena::Signal::describe(signo, m_siginfo->si_code) << ")"
       << "\n";

    os << " pid = " << m_siginfo->si_pid
       << ", uid = " << m_siginfo->si_uid
       << "\n";

#ifndef __APPLE__
    // These are set if the POSIX signal sender passed them.
    os << " value = (" << m_siginfo->si_int << ", "
       << std::hex << m_siginfo->si_ptr << ")" << std::dec << "\n";
#endif

    // memory usage information
    athena_statm s = read_athena_statm();

    const long pagesz = sysconf(_SC_PAGESIZE);
    os << " vmem = " << s.vm_pages*pagesz/1024./1024. << " MB\n"
       << " rss = " << s.rss_pages*pagesz/1024./1024. << " MB\n";

#ifndef __APPLE__
    // more memory usage information (system wide stuff)
    // see sysinfo(2)

    {
      struct sysinfo sys;
      if ( 0 == sysinfo(&sys) ) {
        // all sizes are reported in sys.mem_unit bytes
        const float mem_units = sys.mem_unit/(1024.*1024.);
        os << " total-ram = " << sys.totalram * mem_units << " MB\n"
           << " free-ram = " << sys.freeram * mem_units << " MB\n"
           << " buffer-ram= " << sys.bufferram* mem_units << " MB\n"
           << " total-swap= " << sys.totalswap* mem_units << " MB\n"
           << " free-swap = " << sys.freeswap * mem_units << " MB\n";
      }
    }
#endif

    // This is the interesting address for memory faults.
    if (signo == SIGILL || signo == SIGFPE || signo == SIGSEGV || signo == SIGBUS)
      os << " addr = " << std::hex << m_siginfo->si_addr << std::dec << "\n";

    os << "\n";
  }

  os << "Event counter: " << m_eventCounter << "\n";

  IAlgExecStateSvc* algExecStateSvc(nullptr);
  IAlgContextSvc* algContextSvc(nullptr);

  // Use AlgExecStateSvc in MT, otherwise AlgContextSvc
  if (Gaudi::Concurrency::ConcurrencyFlags::numConcurrentEvents() > 0) {
    service("AlgExecStateSvc", algExecStateSvc, /*createIf=*/ false).ignore();
  }
  else {
    service("AlgContextSvc", algContextSvc, /*createIf=*/ false).ignore();
  }

  // Loop over all slots
  for (size_t t=0; t < m_sysCoreDumps.size(); ++t){

    // Currently executing algorithm(s)
    std::string currentAlg;
    if (algExecStateSvc) {
      ATH_MSG_DEBUG("Using AlgExecStateSvc to determine current algorithm(s)");
      try {
        // We copy on purpose to avoid modification while we examine it
        auto states = algExecStateSvc->algExecStates(EventContext(0,t));
        for (const auto& kv : states) {
          if (kv.second.state()==AlgExecState::State::Executing)
            currentAlg += (kv.first + " ");
        }
      }
      catch (const GaudiException&) { // can happen if we get called before any algo execution
        ATH_MSG_INFO("No information from AlgExecStateSvc because no algorithm was executed yet.");
      }
    }
    else if (algContextSvc) {
      ATH_MSG_DEBUG("Using AlgContextSvc to determine current algorithm");
      IAlgorithm* alg = algContextSvc->currentAlg();
      if (alg) currentAlg = alg->name();
    }
    else {
      ATH_MSG_WARNING("AlgExecStateSvc or AlgContextSvc not available. Cannot determine current algorithm.");
    }

    if (currentAlg.empty()) currentAlg = "<NONE>";
    os << "Slot " << std::setw(3) << t << " : Current algorithm = " << currentAlg << std::endl;

    // System core dump
    auto &sys = m_sysCoreDumps.at(t);
    if (!sys.LastInc.empty()) {
      os << " : Last Incident = " << sys.LastInc << std::endl
         << " : Event ID = " << sys.EvId << std::endl;
    }

    // User core dump
    auto &usr = m_usrCoreDumps.at(t);
    if (!usr.empty()) {
      for (auto &s : usr) {
        os << " : (usr) " << s.first << " = " << s.second << std::endl;
      }
    }
  }

  if (algContextSvc) {
    os << "Algorithm stack: ";
    if ( algContextSvc->algorithms().empty() ) os << "<EMPTY>" << "\n";
    else {
      os << "\n";
      for (auto alg : algContextSvc->algorithms()) {
        if (alg) os << " " << alg->name() << "\n";
      }
    }
  }

  os << horizLine;
  os << "| AtlasBaseDir : " << std::setw(66) << getenv("AtlasBaseDir") << " |\n";
  os << "| AtlasVersion : " << std::setw(66) << getenv("AtlasVersion") << " |\n";
  os << "| BINARY_TAG : " << std::setw(66) << getenv("BINARY_TAG") << " |\n";
  os << horizLine;
  os << " Note: to see line numbers in below stacktrace you might consider running following :\n";
  os << " atlasAddress2Line --file <logfile>\n";

  IAthenaSummarySvc *iass(nullptr);
  if (service("AthenaSummarySvc",iass,false).isSuccess() && iass) {
    iass->addSummary("CoreDumpSvc",os.str());
    iass->setStatus(1);
    iass->createSummary().ignore();
  }

  return os.str();
}
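The vmem and rss lines convert page counts to MB with the system page size. Below is a minimal, Linux-only sketch of the same arithmetic. It assumes, without confirmation from this page, that read_athena_statm() reads the first two fields of /proc/self/statm, so StatmSketch and readStatmSketch are hypothetical stand-ins rather than the athena_statm API.

```cpp
#include <cassert>
#include <fstream>
#include <unistd.h>

// Hypothetical stand-in for athena_statm: first two fields of /proc/self/statm
// (Linux), i.e. total program size and resident set size, both in pages.
struct StatmSketch {
  unsigned long vm_pages = 0;
  unsigned long rss_pages = 0;
};

// Read /proc/self/statm; the fields stay 0 if the file cannot be read.
StatmSketch readStatmSketch() {
  StatmSketch s;
  std::ifstream in("/proc/self/statm");
  in >> s.vm_pages >> s.rss_pages;
  return s;
}

// Same arithmetic as in dump(): pages * page size, divided by 1024 twice.
double pagesToMB(unsigned long pages, long pageSize) {
  return static_cast<double>(pages) * pageSize / 1024. / 1024.;
}
```

For example, 256 pages of 4096 bytes give exactly 1 MB.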

◆ handle()

void CoreDumpSvc::handle ( const Incident &  incident)
override, virtual

Incident listener.

Definition at line 552 of file CoreDumpSvc.cxx.

{
  //handle is single threaded in context;
  auto slot = incident.context().valid() ? incident.context().slot() : 0;
  auto &currRec = m_sysCoreDumps.at(slot);

  currRec.LastInc = incident.source() + ":" + incident.type();

  std::ostringstream oss;
  oss << incident.context().eventID();
  currRec.EvId = oss.str();

  if (incident.type()==IncidentType::BeginEvent) {
    // Set up an alternate stack for this thread, if not already done.
    setAltStack();
    ++m_eventCounter;
  } else if (incident.type() == "StoreCleared") {
    // Try to force reallocation.
    auto newstr = currRec.EvId;
    // Intentional:
    // cppcheck-suppress selfAssignment
    newstr[0] = newstr[0];
    currRec.EvId = newstr;
  }

}
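handle() books each incident on the event slot taken from the incident's context, falls back to slot 0 when the context is invalid, and increments the event counter on BeginEvent. The sketch below reproduces only that bookkeeping, using simplified hypothetical types (IncidentSketch, SysDumpRecSketch, RecorderSketch) instead of Gaudi's Incident and EventContext, and it assumes IncidentType::BeginEvent compares equal to the string "BeginEvent".

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for sysDumpRec.
struct SysDumpRecSketch {
  std::string LastInc;
  std::string EvId;
};

// Hypothetical stand-in for the parts of an Incident that handle() reads.
struct IncidentSketch {
  std::string source;
  std::string type;
  bool contextValid = false;
  std::size_t slot = 0;
  std::string eventID;  // already formatted, as handle() does via ostringstream
};

struct RecorderSketch {
  std::vector<SysDumpRecSketch> records = std::vector<SysDumpRecSketch>(2);
  std::atomic<unsigned long long> eventCounter{0};

  void handle(const IncidentSketch& inc) {
    // Incidents without a valid context are booked on slot 0.
    const std::size_t slot = inc.contextValid ? inc.slot : 0;
    auto& rec = records.at(slot);  // throws std::out_of_range for unknown slots
    rec.LastInc = inc.source + ":" + inc.type;
    rec.EvId = inc.eventID;
    if (inc.type == "BeginEvent") ++eventCounter;  // assumed BeginEvent spelling
  }
};
```

Because each slot is only written by the thread processing that slot, the per-slot records need no locking; only the shared counter is atomic.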

◆ setAltStack()

void CoreDumpSvc::setAltStack ( )
private

Set up an alternate stack for the current thread.

Definition at line 647 of file CoreDumpSvc.cxx.

{
  std::vector<uint8_t>& stack = s_stack;
  if (stack.empty()) {
    stack.resize (std::max (SIGSTKSZ, MINSIGSTKSZ) + 2*1024*1024);
    stack_t ss;
    ss.ss_sp = stack.data();
    ss.ss_flags = 0;
    ss.ss_size = stack.size();
    sigaltstack (&ss, nullptr);
  }
}
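setAltStack() gives each thread its own alternate signal stack, presumably so that a handler can still run after a stack overflow (SIGSEGV) has exhausted the normal stack. The standalone sketch below uses the same sizing rule; t_altStack and setAltStackSketch are hypothetical names. It converts both constants to std::size_t because on recent glibc SIGSTKSZ and MINSIGSTKSZ may expand to sysconf() calls instead of integer constants. A handler only runs on this stack if it was installed with SA_ONSTACK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <vector>

// One buffer per thread, like the service's static thread_local s_stack.
thread_local std::vector<std::uint8_t> t_altStack;

// Register an alternate signal stack for the calling thread once;
// returns the start of this thread's buffer.
void* setAltStackSketch() {
  std::vector<std::uint8_t>& stack = t_altStack;
  if (stack.empty()) {
    // Larger of the two system minimums, plus 2 MB extra.
    stack.resize(std::max<std::size_t>(SIGSTKSZ, MINSIGSTKSZ) + 2 * 1024 * 1024);
    stack_t ss{};
    ss.ss_sp = stack.data();
    ss.ss_flags = 0;
    ss.ss_size = stack.size();
    sigaltstack(&ss, nullptr);
  }
  return stack.data();
}
```

Calling it again on the same thread is a no-op, which is why handle() can call setAltStack() on every BeginEvent.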

◆ setCoreDumpInfo() [1/2]

void CoreDumpSvc::setCoreDumpInfo ( const EventContext &  ctx,
const std::string &  name,
const std::string &  value 
)
override, virtual

Set a name/value pair in the core dump record for given EventContext.

Definition at line 369 of file CoreDumpSvc.cxx.

{
  auto slot = ctx.valid() ? ctx.slot() : 0;
  m_usrCoreDumps.at(slot)[name] = value;
}
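Both setCoreDumpInfo() overloads end up writing into the map of one event slot, with slot 0 used when the context is invalid. Because the lookup uses std::vector::at(), a slot index beyond the number of allocated slots throws std::out_of_range. Below is a single-threaded model; UserRecordSketch is hypothetical, and std::unordered_map stands in for tbb::concurrent_unordered_map, which unlike std::unordered_map tolerates concurrent insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, single-threaded model of m_usrCoreDumps: one map per slot.
using UserCoreSketch = std::unordered_map<std::string, std::string>;

struct UserRecordSketch {
  std::vector<UserCoreSketch> slots = std::vector<UserCoreSketch>(2);

  // ctxValid/slot stand in for EventContext::valid() and EventContext::slot().
  void setCoreDumpInfo(bool ctxValid, std::size_t slot,
                       const std::string& name, const std::string& value) {
    const std::size_t s = ctxValid ? slot : 0;
    slots.at(s)[name] = value;  // a repeated name overwrites the earlier value
  }

  // Mimics the per-slot "(usr)" lines written by dump().
  std::string dump() const {
    std::ostringstream os;
    for (std::size_t t = 0; t < slots.size(); ++t) {
      os << "Slot " << std::setw(3) << t << "\n";
      for (const auto& kv : slots[t]) {
        os << " : (usr) " << kv.first << " = " << kv.second << "\n";
      }
    }
    return os.str();
  }
};
```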

◆ setCoreDumpInfo() [2/2]

void CoreDumpSvc::setCoreDumpInfo ( const std::string &  name,
const std::string &  value 
)
override, virtual

Set a name/value pair in the core dump record of the current event context (Gaudi::Hive::currentContext()).

Definition at line 364 of file CoreDumpSvc.cxx.

{
  setCoreDumpInfo(Gaudi::Hive::currentContext(), name, value);
}

◆ setSigInfo()

void CoreDumpSvc::setSigInfo ( siginfo_t * info )
inline, private

Set pointer to siginfo_t struct.

Definition at line 139 of file CoreDumpSvc.h.

{ m_siginfo = info; }

◆ start()

StatusCode CoreDumpSvc::start ( )
override, virtual

Resize the per-slot core dump records to the number of concurrent events (at least one).

Definition at line 337 of file CoreDumpSvc.cxx.

{
  auto numSlots = std::max<size_t>(1, Gaudi::Concurrency::ConcurrencyFlags::numConcurrentEvents());
  m_usrCoreDumps.resize(numSlots);
  m_sysCoreDumps.resize(numSlots);
  return StatusCode::SUCCESS;
}
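start() replaces the two provisional slots allocated in the constructor with one slot per concurrent event, and a single slot in a serial job (where, as dump() also assumes, numConcurrentEvents() is 0). Since std::vector::resize keeps existing elements up to the new size, information recorded before start() survives only in the slots that remain. A small sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-slot record type.
using RecordSketch = std::map<std::string, std::string>;

// Resize the per-slot records the way start() does: at least one slot.
// Existing entries in surviving slots are kept; dropped slots are discarded.
void resizeForSlots(std::vector<RecordSketch>& records,
                    std::size_t numConcurrentEvents) {
  records.resize(std::max<std::size_t>(1, numConcurrentEvents));
}
```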

Friends And Related Function Documentation

◆ CoreDumpSvcHandler::action

void CoreDumpSvcHandler::action ( int  sig,
siginfo_t * info,
void *  extra 
)
friend

Member Data Documentation

◆ m_callOldHandler

Gaudi::Property<bool> CoreDumpSvc::m_callOldHandler
private
Initial value:
{this, "CallOldHandler", true,
"Call previous signal handler"}

Definition at line 104 of file CoreDumpSvc.h.

◆ m_coreDumpStream

Gaudi::Property<std::string> CoreDumpSvc::m_coreDumpStream
private
Initial value:
{this, "CoreDumpStream", "stdout",
"Stream to use for core dump [stdout,stderr]"}

Definition at line 116 of file CoreDumpSvc.h.

◆ m_dumpCoreFile

Gaudi::Property<bool> CoreDumpSvc::m_dumpCoreFile
private
Initial value:
{this, "DumpCoreFile", false,
"Produce a core dump file if resource limits (ulimit -c) allow"}

Definition at line 107 of file CoreDumpSvc.h.

◆ m_eventCounter

std::atomic<EventID::event_number_t> CoreDumpSvc::m_eventCounter {0}
private

Event counter.

Definition at line 95 of file CoreDumpSvc.h.

◆ m_fastStackTrace

Gaudi::Property<bool> CoreDumpSvc::m_fastStackTrace
private
Initial value:
{this, "FastStackTrace", false,
"Produce fast stack trace of current thread"}

Definition at line 113 of file CoreDumpSvc.h.

◆ m_fatalHandlerFlags

Gaudi::Property<int> CoreDumpSvc::m_fatalHandlerFlags
private
Initial value:
{this, "FatalHandler", 0,
"Flags given to the fatal handler this service installs\n"
"if the flag is zero, no additional fatal handler is installed."}

Definition at line 119 of file CoreDumpSvc.h.

◆ m_killOnSigInt

Gaudi::Property<bool> CoreDumpSvc::m_killOnSigInt {this, "KillOnSigInt",true, "Terminate job on SIGINT (aka Ctrl-C)"}
private

Definition at line 127 of file CoreDumpSvc.h.

◆ m_siginfo

siginfo_t* CoreDumpSvc::m_siginfo {nullptr}
private

Pointer to siginfo_t struct (set by signal handler)

Definition at line 94 of file CoreDumpSvc.h.

◆ m_signals

Gaudi::Property<std::vector<int> > CoreDumpSvc::m_signals
private
Initial value:
{this, "Signals", {SIGSEGV,SIGBUS,SIGILL,SIGFPE,SIGALRM},
"List of signals to catch"}

List of signals to catch (first member of the Properties group).

Definition at line 101 of file CoreDumpSvc.h.
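Of the default signals, dump() prints the si_addr field only for SIGILL, SIGFPE, SIGSEGV and SIGBUS, the signals for which it holds the faulting address; SIGALRM carries no meaningful address. A small sketch of that classification follows; kDefaultSignals, reportsFaultAddress and describeSignals are hypothetical helpers, and the strsignal() text depends on the platform and locale.

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <string.h>
#include <vector>

// Default value of the "Signals" property, as listed above.
const std::vector<int> kDefaultSignals = {SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGALRM};

// Mirrors the condition dump() uses before printing " addr = ...".
bool reportsFaultAddress(int signo) {
  return signo == SIGILL || signo == SIGFPE || signo == SIGSEGV || signo == SIGBUS;
}

// Human-readable descriptions as returned by strsignal().
std::vector<std::string> describeSignals(const std::vector<int>& signals) {
  std::vector<std::string> out;
  for (int s : signals) out.emplace_back(strsignal(s));
  return out;
}
```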

◆ m_stackTrace

Gaudi::Property<bool> CoreDumpSvc::m_stackTrace
private
Initial value:
{this, "StackTrace", false,
"Produce (gdb) stack trace on crash. Useful if no other signal handler is used"}

Definition at line 110 of file CoreDumpSvc.h.

◆ m_sysCoreDumps

std::vector<sysDumpRec> CoreDumpSvc::m_sysCoreDumps
private

Core dump info collected by this service

Definition at line 93 of file CoreDumpSvc.h.

◆ m_timeout

Gaudi::Property<double> CoreDumpSvc::m_timeout
private
Initial value:
{this, "TimeOut", 30.0*60*1e9,
"Terminate job when it reaches the time out in Wallclock time, "
"usually due to hanging during stack unwinding. Timeout given in nanoseconds despite seconds precision"}

Definition at line 123 of file CoreDumpSvc.h.
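The TimeOut value is a plain double in nanoseconds, so the default 30.0*60*1e9 corresponds to 30 minutes. How the service enforces the time-out is not shown on this page. The sketch below only converts the property value to a std::chrono duration; the helper names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Default "TimeOut" value: 30 minutes expressed in nanoseconds.
constexpr double kDefaultTimeoutNs = 30.0 * 60 * 1e9;

// Convert the nanosecond property value to a typed duration.
std::chrono::nanoseconds timeoutFromProperty(double ns) {
  return std::chrono::nanoseconds(static_cast<std::int64_t>(ns));
}
```

Working with a typed duration avoids mixing up seconds and nanoseconds when the value is compared against elapsed wall-clock time.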

◆ m_usrCoreDumps

std::vector<UserCore_t> CoreDumpSvc::m_usrCoreDumps
private

User defined core dump info.

Definition at line 92 of file CoreDumpSvc.h.

◆ s_stack

thread_local std::vector< uint8_t > CoreDumpSvc::s_stack
static, private

Alternate stack for signal handler (one per thread, allocated by setAltStack()).

Definition at line 97 of file CoreDumpSvc.h.


The documentation for this class was generated from the following files:

CoreDumpSvc.h
CoreDumpSvc.cxx