ATLAS Offline Software
CoreDumpSvc Class Reference

Service to print additional information before a crash.

#include <CoreDumpSvc.h>


Classes

struct  sysDumpRec

Public Member Functions

 CoreDumpSvc (const std::string &name, ISvcLocator *pSvcLocator) ATLAS_CTORDTOR_NOT_THREAD_SAFE
 Constructor with parameters.
virtual ~CoreDumpSvc () ATLAS_CTORDTOR_NOT_THREAD_SAFE
 Destructor.
ICoreDumpSvc implementation
virtual void setCoreDumpInfo (const std::string &name, const std::string &value) override
 Set a name/value pair in the core dump record.
virtual void setCoreDumpInfo (const EventContext &ctx, const std::string &name, const std::string &value) override
 Set a name/value pair in the core dump record for given EventContext.
virtual std::string dump () const override
 Print all core dump records.

Protected Member Functions

 CoreDumpSvc ()
 Default constructor (do not use)

Private Member Functions

void propertyHandler ATLAS_NOT_THREAD_SAFE (Gaudi::Details::PropertyBase &p)
 Property handler.
void print ATLAS_NOT_THREAD_SAFE ()
 Print core dump records to configured stream.
void setSigInfo (siginfo_t *info)
 Set pointer to siginfo_t struct.
StatusCode installSignalHandler ATLAS_NOT_THREAD_SAFE ()
 Install signal handlers.
StatusCode uninstallSignalHandler ATLAS_NOT_THREAD_SAFE ()
 Uninstall signal handlers.
void setAltStack ()
 Set up an alternate stack for the current thread.

Friends

void CoreDumpSvcHandler::action (int sig, siginfo_t *info, void *extra)

Gaudi implementation

typedef tbb::concurrent_unordered_map< std::string, std::string > UserCore_t
std::vector< UserCore_t > m_usrCoreDumps
 User defined core dump info.
std::vector< sysDumpRec > m_sysCoreDumps
 Core dump info collected by this service.
siginfo_t * m_siginfo {nullptr}
 Pointer to siginfo_t struct (set by signal handler)
std::atomic< EventID::event_number_t > m_eventCounter {0}
 Event counter.
Gaudi::Property< std::vector< int > > m_signals
 List of signals to catch.
Gaudi::Property< bool > m_callOldHandler
Gaudi::Property< bool > m_dumpCoreFile
Gaudi::Property< bool > m_stackTrace
Gaudi::Property< bool > m_fastStackTrace
Gaudi::Property< std::string > m_coreDumpStream
Gaudi::Property< int > m_fatalHandlerFlags
Gaudi::Property< double > m_timeout
Gaudi::Property< bool > m_killOnSigInt {this, "KillOnSigInt",true, "Terminate job on SIGINT (aka Ctrl-C)"}
static thread_local std::vector< uint8_t > s_stack
virtual StatusCode initialize ATLAS_NOT_THREAD_SAFE () override
virtual StatusCode start () override
virtual StatusCode finalize ATLAS_NOT_THREAD_SAFE () override
virtual void handle (const Incident &incident) override
 Incident listener.

Detailed Description

Service to print additional information before a crash.

Author
Frank Winklmeier
Sami Kama

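This service catches fatal signals and prints its internal core dump record. During event processing it records, for each event slot, the last incident it received and the corresponding event ID; dump() also reports the algorithm(s) executing at the time of the crash. Additional name/value pairs can be added via setCoreDumpInfo().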

For a list of job option properties see CoreDumpSvc::CoreDumpSvc().

Definition at line 41 of file CoreDumpSvc.h.

Member Typedef Documentation

◆ UserCore_t

typedef tbb::concurrent_unordered_map<std::string,std::string > CoreDumpSvc::UserCore_t
private

Definition at line 88 of file CoreDumpSvc.h.

Constructor & Destructor Documentation

◆ CoreDumpSvc() [1/2]

CoreDumpSvc::CoreDumpSvc ( )
protected

Default constructor (do not use)

◆ CoreDumpSvc() [2/2]

CoreDumpSvc::CoreDumpSvc ( const std::string & name,
ISvcLocator * pSvcLocator )

Constructor with parameters.

Definition at line 237 of file CoreDumpSvc.cxx.

CoreDumpSvc::CoreDumpSvc( const std::string& name, ISvcLocator* pSvcLocator ) :
  base_class( name, pSvcLocator )
{
  // Set us as the current instance

  m_callOldHandler.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_dumpCoreFile.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_stackTrace.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_fastStackTrace.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_coreDumpStream.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_fatalHandlerFlags.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  m_killOnSigInt.declareUpdateHandler(&CoreDumpSvc::propertyHandler, this);
  // Allocate for 2 slots just for now.
  m_usrCoreDumps.resize(2);
  m_sysCoreDumps.resize(2);
}
CoreDumpSvc * coreDumpSvc(nullptr)
pointer to CoreDumpSvc

◆ ~CoreDumpSvc()

CoreDumpSvc::~CoreDumpSvc ( )
virtual

Destructor.

Definition at line 255 of file CoreDumpSvc.cxx.

{
}

Member Function Documentation

◆ print()

void CoreDumpSvc::print ATLAS_NOT_THREAD_SAFE ( )
private

Print core dump records to configured stream.

◆ installSignalHandler()

StatusCode CoreDumpSvc::installSignalHandler ATLAS_NOT_THREAD_SAFE ( )
private

Install signal handlers.

◆ uninstallSignalHandler()

StatusCode CoreDumpSvc::uninstallSignalHandler ATLAS_NOT_THREAD_SAFE ( )
private

Uninstall signal handlers.

◆ initialize()

virtual StatusCode CoreDumpSvc::initialize ATLAS_NOT_THREAD_SAFE ( )
override virtual

◆ finalize()

virtual StatusCode CoreDumpSvc::finalize ATLAS_NOT_THREAD_SAFE ( )
override virtual

◆ propertyHandler()

void CoreDumpSvc::propertyHandler ATLAS_NOT_THREAD_SAFE ( Gaudi::Details::PropertyBase & p)
private

Property handler.

◆ dump()

std::string CoreDumpSvc::dump ( ) const
override virtual

Print all core dump records.

Definition at line 393 of file CoreDumpSvc.cxx.

{
  std::ostringstream os;
  char buf[26];
  const time_t now = time(nullptr);

  os << "-------------------------------------------------------------------------------------" << "\n";
  os << "Core dump from " << name() << " on " << System::hostName()
     << " at " << ctime_r(&now, buf) /*<< "\n"*/; // ctime adds "\n"
  os << "\n";

  // Print additional information if available
  if (m_siginfo) {
    int signo = m_siginfo->si_signo; // shorthand

    os << "Caught signal " << signo
       << "(" << strsignal(signo) << "). Details: "
       << "\n";

    os << " errno = " << m_siginfo->si_errno
       << ", code = " << m_siginfo->si_code
       << " (" << Athena::Signal::describe(signo, m_siginfo->si_code) << ")"
       << "\n";

    os << " pid = " << m_siginfo->si_pid
       << ", uid = " << m_siginfo->si_uid
       << "\n";

#ifndef __APPLE__
    // These are set if the POSIX signal sender passed them.
    os << " value = (" << m_siginfo->si_int << ", "
       << std::hex << m_siginfo->si_ptr << ")" << std::dec << "\n";
#endif

    // memory usage informations
    athena_statm s = read_athena_statm();

    const long pagesz = sysconf(_SC_PAGESIZE);
    os << " vmem = " << s.vm_pages*pagesz/1024./1024. << " MB\n"
       << " rss = " << s.rss_pages*pagesz/1024./1024. << " MB\n";

#ifndef __APPLE__
    // more memory usage informations (system wide stuff)
    // see sysinfo(2)

    {
      struct sysinfo sys;
      if ( 0 == sysinfo(&sys) ) {
        // all sizes are reported in sys.mem_unit bytes
        const float mem_units = sys.mem_unit/(1024.*1024.);
        os << " total-ram = " << sys.totalram * mem_units << " MB\n"
           << " free-ram = " << sys.freeram * mem_units << " MB\n"
           << " buffer-ram= " << sys.bufferram* mem_units << " MB\n"
           << " total-swap= " << sys.totalswap* mem_units << " MB\n"
           << " free-swap = " << sys.freeswap * mem_units << " MB\n";
      }
    }
#endif

    // This is the interesting address for memory faults.
    if (signo == SIGILL || signo == SIGFPE || signo == SIGSEGV || signo == SIGBUS)
      os << " addr = " << std::hex << m_siginfo->si_addr << std::dec << "\n";

    os << "\n";
  }

  os << "Event counter: " << m_eventCounter << "\n";

  SmartIF<IAlgManager> algMgr{serviceLocator()->as<IAlgManager>()};
  SmartIF<IAlgContextSvc> algContextSvc;


  // For serial, retrieve AlgContextSvc
  if (Gaudi::Concurrency::ConcurrencyFlags::numConcurrentEvents() == 0) {
    algContextSvc = service("AlgContextSvc", /*createIf=*/ false);
  }

  // Loop over all slots
  for (size_t t=0; t < m_sysCoreDumps.size(); ++t){

    // Currently executing algorithm(s)
    std::string currentAlg;

    // Use AlgExecStateSvc in MT, otherwise AlgContextSvc
    if (Gaudi::Concurrency::ConcurrencyFlags::numConcurrentEvents() > 0) {
      const EventContext ctx(0,t);
      ATH_MSG_DEBUG("Using AlgExecStateSvc to determine current algorithm(s)");
      try {
        for (const IAlgorithm* alg : algMgr->getAlgorithms()) {
          auto aes = alg->execState(ctx);
          if (aes.state()==AlgExecState::State::Executing)
            currentAlg += (alg->name() + " ");
        }
      }
      catch (const GaudiException&) { // can happen if we get called before any algo execution
        ATH_MSG_INFO("No information from AlgExecStateSvc because no algorithm was executed yet.");
      }
    }
    else if (algContextSvc) {
      ATH_MSG_DEBUG("Using AlgContextSvc to determine current algorithm");
      IAlgorithm* alg = algContextSvc->currentAlg();
      if (alg) currentAlg = alg->name();
    }
    else {
      ATH_MSG_WARNING("AlgExecStateSvc or AlgContextSvc not available. Cannot determine current algorithm.");
    }

    if (currentAlg.empty()) currentAlg = "<NONE>";
    os << "Slot " << std::setw(3) << t << " : Current algorithm = " << currentAlg << std::endl;

    // System core dump
    auto &sys = m_sysCoreDumps.at(t);
    if (!sys.LastInc.empty()) {
      os << " : Last Incident = " << sys.LastInc << std::endl
         << " : Event ID = " << sys.EvId << std::endl;
    }

    // User core dump
    auto &usr = m_usrCoreDumps.at(t);
    if (!usr.empty()) {
      for (auto &s : usr) {
        os << " : (usr) " << s.first << " = " << s.second << std::endl;
      }
    }
  }

  if (algContextSvc) {
    os << "Algorithm stack: ";
    if ( algContextSvc->algorithms().empty() ) os << "<EMPTY>" << "\n";
    else {
      os << "\n";
      for (auto alg : algContextSvc->algorithms()) {
        if (alg) os << " " << alg->name() << "\n";
      }
    }
  }

  os << horizLine;
  os << "| AtlasBaseDir : " << std::setw(66) << getenv("AtlasBaseDir") << " |\n";
  os << "| AtlasVersion : " << std::setw(66) << getenv("AtlasVersion") << " |\n";
  os << "| BINARY_TAG : " << std::setw(66) << getenv("BINARY_TAG") << " |\n";
  os << horizLine;
  os << " Note: to see line numbers in below stacktrace you might consider running following :\n";
  os << " atlasAddress2Line --file <logfile>\n";

  SmartIF<IAthenaSummarySvc> iass{service("AthenaSummarySvc", /*createIf*/false)};
  if (iass) {
    iass->addSummary("CoreDumpSvc",os.str());
    iass->setStatus(1);
    iass->createSummary().ignore();
  }

  return os.str();
}
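
The vmem and rss lines in dump() turn page counts into megabytes by multiplying by the page size and dividing by 1024 twice. The following is a minimal standalone sketch of that arithmetic only. `pagesToMB` is a hypothetical helper name that does not exist in CoreDumpSvc; in the service the page counts come from read_athena_statm().

```cpp
#include <unistd.h>

// Hypothetical helper (not part of CoreDumpSvc): convert a number of memory
// pages into megabytes (1 MB = 1024*1024 bytes), given the page size in bytes.
double pagesToMB(unsigned long pages, long pageSize) {
  return pages * static_cast<double>(pageSize) / 1024. / 1024.;
}

// Convenience overload that asks the system for its page size,
// as dump() does via sysconf(_SC_PAGESIZE).
double pagesToMB(unsigned long pages) {
  return pagesToMB(pages, sysconf(_SC_PAGESIZE));
}
```

For example, 256 pages of 4096 bytes each amount to exactly 1 MB.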
static const char * describe(int sig, int code)
Return the description for signal info code code for signal number sig.
std::string getenv(const std::string &variableName)
get an environment variable
struct athena_statm read_athena_statm()

◆ handle()

void CoreDumpSvc::handle ( const Incident & incident)
override virtual

Incident listener.

Definition at line 552 of file CoreDumpSvc.cxx.

{
  //handle is single threaded in context;
  auto slot = incident.context().valid() ? incident.context().slot() : 0;
  auto &currRec = m_sysCoreDumps.at(slot);

  currRec.LastInc = incident.source() + ":" + incident.type();

  std::ostringstream oss;
  oss << incident.context().eventID();
  currRec.EvId = oss.str();

  if (incident.type()==IncidentType::BeginEvent) {
    // Set up an alternate stack for this thread, if not already done.
    setAltStack();
  } else if (incident.type() == "StoreCleared") {
    // Try to force reallocation.
    auto newstr = currRec.EvId;
    // Intentional:
    // cppcheck-suppress selfAssignment
    newstr[0] = newstr[0];
    currRec.EvId = std::move(newstr);
  }

}

◆ setAltStack()

void CoreDumpSvc::setAltStack ( )
private

Set up an alternate stack for the current thread.

Definition at line 647 of file CoreDumpSvc.cxx.

{
  std::vector<uint8_t>& stack = s_stack;
  if (stack.empty()) {
    stack.resize (std::max (SIGSTKSZ, MINSIGSTKSZ) + 2*1024*1024);
    stack_t ss;
    ss.ss_sp = stack.data();
    ss.ss_flags = 0;
    ss.ss_size = stack.size();
    int ret = sigaltstack (&ss, nullptr);
    if ( ret!=0 ) {
      ATH_MSG_WARNING("Error on setting alternative stack! ");
    }
  }
}
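
The same pattern can be tried outside Athena. The sketch below assumes a POSIX system and registers a per-thread alternate signal stack in the same way, then reads the registration back to confirm it took effect. `ensureAltStack` and `altStackSize` are hypothetical names used only for illustration; they are not part of CoreDumpSvc.

```cpp
#include <algorithm>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: register an alternate signal stack for the calling
// thread, once. sigaltstack() applies per thread, hence the thread_local buffer.
bool ensureAltStack() {
  static thread_local std::vector<std::uint8_t> stack;
  if (!stack.empty()) return true;  // already set up for this thread

  // Minimum required size plus 2 MB of headroom, mirroring setAltStack().
  const std::size_t base = std::max<std::size_t>(SIGSTKSZ, MINSIGSTKSZ);
  stack.resize(base + 2 * 1024 * 1024);

  stack_t ss{};
  ss.ss_sp = stack.data();
  ss.ss_size = stack.size();
  ss.ss_flags = 0;
  if (sigaltstack(&ss, nullptr) != 0) {
    stack.clear();  // allow a retry on the next call
    return false;
  }
  return true;
}

// Hypothetical helper: size of the alternate stack currently registered for
// the calling thread, or 0 if none is enabled.
std::size_t altStackSize() {
  stack_t cur{};
  if (sigaltstack(nullptr, &cur) != 0 || (cur.ss_flags & SS_DISABLE)) return 0;
  return cur.ss_size;
}
```

Note that, per POSIX, a handler runs on the alternate stack only if it was installed with the SA_ONSTACK flag; registering the stack alone does not redirect any handler.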

◆ setCoreDumpInfo() [1/2]

void CoreDumpSvc::setCoreDumpInfo ( const EventContext & ctx,
const std::string & name,
const std::string & value )
override virtual

Set a name/value pair in the core dump record for given EventContext.

Definition at line 370 of file CoreDumpSvc.cxx.

{
  auto slot = ctx.valid() ? ctx.slot() : 0;
  m_usrCoreDumps.at(slot)[name] = value;
}
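
The per-slot bookkeeping can be illustrated without Gaudi. The sketch below is an assumption-laden stand-in: `SlotContext` is a hypothetical replacement for EventContext, `SlotRecords` is a hypothetical container, and a plain std::unordered_map replaces the tbb::concurrent_unordered_map used by the service, so this version is not safe for concurrent writers.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for EventContext: carries a slot number that may be unset.
struct SlotContext {
  static constexpr std::size_t kInvalid = static_cast<std::size_t>(-1);
  std::size_t slotId = kInvalid;
  bool valid() const { return slotId != kInvalid; }
  std::size_t slot() const { return slotId; }
};

// Hypothetical container: one name/value map per event slot.
class SlotRecords {
public:
  using Map = std::unordered_map<std::string, std::string>;

  // Always keep at least one slot, as start() does with std::max<size_t>(1, ...).
  explicit SlotRecords(std::size_t nSlots) : m_records(nSlots == 0 ? 1 : nSlots) {}

  // An invalid context falls back to slot 0; a repeated name overwrites its value.
  void set(const SlotContext& ctx, const std::string& name, const std::string& value) {
    const std::size_t slot = ctx.valid() ? ctx.slot() : 0;
    m_records.at(slot)[name] = value;  // at() throws std::out_of_range for a bad slot
  }

  const Map& records(std::size_t slot) const { return m_records.at(slot); }

private:
  std::vector<Map> m_records;
};
```

Using at() rather than operator[] on the vector means a slot number beyond the configured range raises an exception instead of writing out of bounds.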

◆ setCoreDumpInfo() [2/2]

void CoreDumpSvc::setCoreDumpInfo ( const std::string & name,
const std::string & value )
override virtual

Set a name/value pair in the core dump record.

Definition at line 365 of file CoreDumpSvc.cxx.

{
  setCoreDumpInfo(Gaudi::Hive::currentContext(), name, value);
}

◆ setSigInfo()

void CoreDumpSvc::setSigInfo ( siginfo_t * info)
inline private

Set pointer to siginfo_t struct.

Definition at line 136 of file CoreDumpSvc.h.

◆ start()

StatusCode CoreDumpSvc::start ( )
override virtual

Definition at line 338 of file CoreDumpSvc.cxx.

{
  auto numSlots = std::max<size_t>(1, Gaudi::Concurrency::ConcurrencyFlags::numConcurrentEvents());
  m_usrCoreDumps.resize(numSlots);
  m_sysCoreDumps.resize(numSlots);
  return StatusCode::SUCCESS;
}

◆ CoreDumpSvcHandler::action

void CoreDumpSvcHandler::action ( int sig,
siginfo_t * info,
void * extra )
friend
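
The three-argument signature of this friend function matches what POSIX sigaction() expects for a handler installed with SA_SIGINFO. The body of CoreDumpSvcHandler::action is not shown on this page, so the following is only a generic sketch of that mechanism: a handler that stores the siginfo_t it receives, and a formatter loosely modelled on the "Caught signal" line in dump(). All names (`g_info`, `recordSignal`, `installHandler`, `describeLastSignal`) are hypothetical.

```cpp
#include <csignal>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical storage for the last signal seen (plays the role of m_siginfo).
static volatile sig_atomic_t g_haveInfo = 0;
static siginfo_t g_info;

// Handler with the SA_SIGINFO signature: (signal number, details, ucontext).
static void recordSignal(int /*sig*/, siginfo_t* info, void* /*extra*/) {
  g_info = *info;  // plain struct copy, no allocation inside the handler
  g_haveInfo = 1;
}

// Install recordSignal for one signal; returns true on success.
bool installHandler(int signo) {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = recordSignal;
  sa.sa_flags = SA_SIGINFO;  // request the three-argument handler form
  sigemptyset(&sa.sa_mask);
  return sigaction(signo, &sa, nullptr) == 0;
}

// Format the recorded information outside the handler.
std::string describeLastSignal() {
  if (!g_haveInfo) return "No signal caught";
  std::ostringstream os;
  os << "Caught signal " << g_info.si_signo
     << " (" << strsignal(g_info.si_signo) << ")"
     << ", code = " << g_info.si_code;
  return os.str();
}
```

The handler itself only copies data; string formatting is deferred to describeLastSignal() because stream operations are not async-signal-safe.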

Member Data Documentation

◆ m_callOldHandler

Gaudi::Property<bool> CoreDumpSvc::m_callOldHandler
private
Initial value:
{this, "CallOldHandler", true,
"Call previous signal handler"}

Definition at line 101 of file CoreDumpSvc.h.

◆ m_coreDumpStream

Gaudi::Property<std::string> CoreDumpSvc::m_coreDumpStream
private
Initial value:
{this, "CoreDumpStream", "stdout",
"Stream to use for core dump [stdout,stderr]"}

Definition at line 113 of file CoreDumpSvc.h.

◆ m_dumpCoreFile

Gaudi::Property<bool> CoreDumpSvc::m_dumpCoreFile
private
Initial value:
{this, "DumpCoreFile", false,
"Produce a core dump file if resource limits (ulimit -c) allow"}

Definition at line 104 of file CoreDumpSvc.h.

◆ m_eventCounter

std::atomic<EventID::event_number_t> CoreDumpSvc::m_eventCounter {0}
private

Event counter.

Definition at line 92 of file CoreDumpSvc.h.

◆ m_fastStackTrace

Gaudi::Property<bool> CoreDumpSvc::m_fastStackTrace
private
Initial value:
{this, "FastStackTrace", false,
"Produce fast stack trace of current thread"}

Definition at line 110 of file CoreDumpSvc.h.

◆ m_fatalHandlerFlags

Gaudi::Property<int> CoreDumpSvc::m_fatalHandlerFlags
private
Initial value:
{this, "FatalHandler", 0,
"Flags given to the fatal handler this service installs\n"
"if the flag is zero, no additional fatal handler is installed."}

Definition at line 116 of file CoreDumpSvc.h.

◆ m_killOnSigInt

Gaudi::Property<bool> CoreDumpSvc::m_killOnSigInt {this, "KillOnSigInt",true, "Terminate job on SIGINT (aka Ctrl-C)"}
private

Definition at line 124 of file CoreDumpSvc.h.

◆ m_siginfo

siginfo_t* CoreDumpSvc::m_siginfo {nullptr}
private

Pointer to siginfo_t struct (set by signal handler)

Definition at line 91 of file CoreDumpSvc.h.

◆ m_signals

Gaudi::Property<std::vector<int> > CoreDumpSvc::m_signals
private
Initial value:
{this, "Signals", {SIGSEGV,SIGBUS,SIGILL,SIGABRT,SIGFPE,SIGALRM},
"List of signals to catch"}

List of signals to catch.

Definition at line 98 of file CoreDumpSvc.h.

◆ m_stackTrace

Gaudi::Property<bool> CoreDumpSvc::m_stackTrace
private
Initial value:
{this, "StackTrace", false,
"Produce (gdb) stack trace on crash. Useful if no other signal handler is used"}

Definition at line 107 of file CoreDumpSvc.h.

◆ m_sysCoreDumps

std::vector<sysDumpRec> CoreDumpSvc::m_sysCoreDumps
private

Core dump info collected by this service.

Definition at line 90 of file CoreDumpSvc.h.

◆ m_timeout

Gaudi::Property<double> CoreDumpSvc::m_timeout
private
Initial value:
{this, "TimeOut", 30.0*60*1e9,
"Terminate job after it this reaches the time out in Wallclock time, "
"usually due to hanging during stack unwinding. Timeout given in nanoseconds despite seconds precision"}

Definition at line 120 of file CoreDumpSvc.h.
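
The default value is easier to read once converted: 30.0*60*1e9 ns is 1800 s, i.e. 30 minutes. The sketch below shows only that unit conversion with std::chrono; `kDefaultTimeoutNs` and `timeoutSeconds` are hypothetical names, and how the service consumes the property internally is not shown on this page.

```cpp
#include <chrono>

// The default "TimeOut" value from the property declaration, in nanoseconds.
constexpr double kDefaultTimeoutNs = 30.0 * 60 * 1e9;

// Hypothetical helper: convert a nanosecond count held in a double
// into whole seconds (the fractional part is truncated).
constexpr std::chrono::seconds timeoutSeconds(double ns) {
  return std::chrono::duration_cast<std::chrono::seconds>(
      std::chrono::duration<double, std::nano>(ns));
}
```

With this helper, `timeoutSeconds(kDefaultTimeoutNs).count()` evaluates to 1800.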

◆ m_usrCoreDumps

std::vector<UserCore_t> CoreDumpSvc::m_usrCoreDumps
private

User defined core dump info.

Definition at line 89 of file CoreDumpSvc.h.

◆ s_stack

thread_local std::vector< uint8_t > CoreDumpSvc::s_stack
static private

Alternate stack for signal handler.

Definition at line 94 of file CoreDumpSvc.h.


The documentation for this class was generated from the following files: