ATLAS Offline Software
TimeoutAlg Class Reference

Algorithm to monitor event timeouts.

#include <TimeoutAlg.h>


Public Member Functions

virtual StatusCode initialize () override
 
virtual StatusCode execute (const EventContext &ctx) const override
 
virtual StatusCode stop () override
 
virtual void handle (const Incident &inc) override
 

Protected Member Functions

void setTimeout (Timeout &instance)
 Set timeout.
 
void resetTimeout (Timeout &instance)
 Reset timeout.
 

Private Types

using clock_t = std::chrono::steady_clock
 

Private Member Functions

void timeoutThread ()
 Watchdog thread.
 
void handleTimeout (EventContext::ContextID_t slot)
 Handle timeout.
 

Private Attributes

std::chrono::nanoseconds m_timeout
 Timeout property as duration.
 
SG::SlotSpecificObj< clock_t::time_point > m_eventStartTime ATLAS_THREAD_SAFE
 Start time of each event per slot.
 
std::thread m_thread ATLAS_THREAD_SAFE
 Watchdog thread.
 
std::promise< void > m_stop_thread
 Signal to stop watchdog thread.
 
std::atomic< bool > m_stopped {false}
 Has watchdog thread already been stopped? (to avoid setting future twice)
 
std::mutex m_handleMutex
 Mutex for handleTimeout.
 
Properties
Gaudi::Property< unsigned long long > m_timeoutProp
 
Gaudi::Property< unsigned long long > m_checkInterval
 
Gaudi::Property< bool > m_dumpState
 
Gaudi::Property< bool > m_abort
 

Detailed Description

Algorithm to monitor event timeouts.

Author
Frank Winklmeier
Date
Sep, 2025

Algorithm providing a watchdog thread for event timeouts.

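This algorithm should run early (ideally first) in the event sequence. On each event it records the start time for the current slot. On the first event it also launches a watchdog thread (if a non-zero Timeout is set) that periodically checks whether the event in any slot has exceeded the timeout.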

See the algorithm properties for possible actions on an event timeout.

Definition at line 35 of file TimeoutAlg.h.

Member Typedef Documentation

◆ clock_t

using TimeoutAlg::clock_t = std::chrono::steady_clock
private

Definition at line 48 of file TimeoutAlg.h.

Member Function Documentation

◆ execute()

StatusCode TimeoutAlg::execute ( const EventContext & ctx ) const
override virtual

Definition at line 33 of file TimeoutAlg.cxx.

{
  // Timeout thread is started on first event to make sure this also works
  // in athenaMP (threads usually don't survive forking).
  [[maybe_unused]] static const bool initThread = [&](){
    if (m_timeoutProp > 0) {
      const auto nc_this ATLAS_THREAD_SAFE = const_cast<TimeoutAlg*>(this);
      m_thread = std::thread(&TimeoutAlg::timeoutThread, nc_this);
    }
    return true;
  }();

  // Set event start time for current slot
  *m_eventStartTime.get(ctx) = clock_t::now();

  return StatusCode::SUCCESS;
}
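
The one-time thread start above relies on a function-local static that is initialized by an immediately invoked lambda. The following is a minimal sketch of that idiom outside of Athena; startOnce and g_startCount are hypothetical names used only for illustration.

```cpp
#include <atomic>

// Hypothetical counter standing in for "launch the watchdog thread".
inline std::atomic<int> g_startCount{0};

// Called once per event. The lambda body runs only on the first call:
// since C++11 the initialization of a function-local static is thread-safe,
// so concurrent first calls still execute the body exactly once.
inline void startOnce(bool enabled)
{
  [[maybe_unused]] static const bool initialized = [&]() {
    if (enabled) {
      ++g_startCount;   // in TimeoutAlg the thread is created at this point
    }
    return true;
  }();
}
```

Note that the static belongs to the function, not to an object, and that the captured condition is evaluated on the first call only.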

◆ handle()

void TimeoutAlg::handle ( const Incident & inc )
override virtual

Definition at line 52 of file TimeoutAlg.cxx.

{
  if (inc.type() == "EndAlgorithms") {
    ATH_MSG_DEBUG("Resetting event timeout for slot " << inc.context().slot());
    // Reset start time for slot to zero
    *m_eventStartTime.get(inc.context()) = {};
  }
}
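
Assigning {} value-initializes the time_point, so its time_since_epoch() becomes zero; the watchdog loop treats that value as "no event running in this slot". A small sketch of the convention, with isActive and hasTimedOut as hypothetical helper names:

```cpp
#include <chrono>

using clock_type = std::chrono::steady_clock;

// A value-initialized time_point sits at the clock's epoch, i.e. its
// time_since_epoch() is zero. Any positive value marks a running event.
inline bool isActive(clock_type::time_point start)
{
  return start.time_since_epoch().count() > 0;
}

// Same condition as in the watchdog loop: the slot is active and the
// deadline (start + timeout) lies in the past.
inline bool hasTimedOut(clock_type::time_point start,
                        clock_type::time_point now,
                        std::chrono::nanoseconds timeout)
{
  return isActive(start) && now > start + timeout;
}
```

With this convention an idle slot can never trigger a timeout, regardless of how long it stays idle.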

◆ handleTimeout()

void TimeoutAlg::handleTimeout ( EventContext::ContextID_t  slot)
private

Handle timeout.

Definition at line 103 of file TimeoutAlg.cxx.

{
  // To avoid getting another timeout while handling this one
  std::scoped_lock lock(m_handleMutex);

  // Create minimal context with slot number
  const EventContext ctx(0, slot);

  // Don't duplicate the actions if the timeout was already reached for this slot
  if (Athena::Timeout::instance(ctx).reached()) return;

  // Print ERROR message
  const std::string msg = std::format("Event timeout ({}) in slot {} reached",
                                      std::chrono::duration<double>(m_timeout), slot);
  ATH_MSG_ERROR(msg);

  // Set timeout flag
  setTimeout(Athena::Timeout::instance(ctx));

  // Dump scheduler state if requested
  if (m_dumpState) {
    ServiceHandle<IScheduler> schedulerSvc("AvalancheSchedulerSvc", name());
    if (schedulerSvc.retrieve().isSuccess()) {
      schedulerSvc->dumpState();
    }
  }

  // Abort job if requested
  if (m_abort) {
    // Stop the timeout thread to avoid additional triggers
    stop().ignore();

    // Tell CoreDumpSvc about the reason for the abort
    ServiceHandle<ICoreDumpSvc> coreDumpSvc("CoreDumpSvc", name());
    if ( coreDumpSvc.retrieve().isSuccess() ) {
      coreDumpSvc->setCoreDumpInfo(ctx, "Reason", msg);
    }
    else {
      std::cerr << msg << std::endl;
    }
    // Abort job (and let CoreDumpSvc handle SIGABRT)
    std::abort();
  }
}
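
The combination of a mutex and an "already reached" check ensures that the timeout actions run at most once per slot, even if the watchdog fires again before the flag is cleared. The sketch below mimics that logic with a plain per-slot flag; TimeoutHandler and its members are hypothetical, and the flag merely stands in for Athena::Timeout::instance(ctx).reached().

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for handleTimeout().
class TimeoutHandler {
public:
  explicit TimeoutHandler(std::size_t nSlots) : m_reached(nSlots, false) {}

  // Returns true if the timeout actions were executed by this call.
  bool handle(std::size_t slot)
  {
    std::scoped_lock lock(m_mutex);        // serialize concurrent handlers
    if (m_reached.at(slot)) return false;  // already handled for this slot
    m_reached.at(slot) = true;             // "set timeout flag"
    ++m_actions;                           // stands in for log, dump, abort
    return true;
  }

  // Illustrative only: when the real flag is cleared is not shown on this page.
  void reset(std::size_t slot)
  {
    std::scoped_lock lock(m_mutex);
    m_reached.at(slot) = false;
  }

  int actions() const
  {
    std::scoped_lock lock(m_mutex);
    return m_actions;
  }

private:
  mutable std::mutex m_mutex;
  std::vector<bool> m_reached;
  int m_actions = 0;
};
```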

◆ initialize()

StatusCode TimeoutAlg::initialize ( )
override virtual

Definition at line 20 of file TimeoutAlg.cxx.

{
  m_timeout = std::chrono::nanoseconds(m_timeoutProp);

  // Subscribe to EndAlgorithms (includes output sequence)
  ServiceHandle<IIncidentSvc> incSvc("IncidentSvc/IncidentSvc", name());
  ATH_CHECK(incSvc.retrieve());
  incSvc->addListener(this, "EndAlgorithms", /*priority*/ 0);

  return StatusCode::SUCCESS;
}
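
Because the Timeout property is a plain integer number of nanoseconds, it can help to do the unit conversion with std::chrono rather than by hand. A minimal sketch, assuming hypothetical helpers toNanoseconds and toSeconds that are not part of TimeoutAlg:

```cpp
#include <chrono>

// Convert a timeout given in seconds to the nanosecond count expected by
// an integer "Timeout"-style property (hypothetical helper).
constexpr unsigned long long toNanoseconds(std::chrono::seconds s)
{
  return static_cast<unsigned long long>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(s).count());
}

// Reverse direction, as used for log messages: a nanosecond duration
// expressed as floating-point seconds.
constexpr double toSeconds(std::chrono::nanoseconds ns)
{
  return std::chrono::duration<double>(ns).count();
}
```

For example, toNanoseconds(std::chrono::minutes(2)) gives the value to set for a two-minute timeout, and std::chrono::nanoseconds(value) turns it back into a duration as done in initialize().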

◆ resetTimeout()

void Athena::TimeoutMaster::resetTimeout ( Timeout & instance )
inline protected inherited

Reset timeout.

Definition at line 83 of file Timeout.h.

{ instance.reset(); }

◆ setTimeout()

void Athena::TimeoutMaster::setTimeout ( Timeout & instance )
inline protected inherited

Set timeout.

Definition at line 80 of file Timeout.h.

{ instance.set(); }

◆ stop()

StatusCode TimeoutAlg::stop ( )
override virtual

Definition at line 62 of file TimeoutAlg.cxx.

{
  if (m_thread.joinable() && !m_stopped.exchange(true)) {
    // Signal timeout thread to stop
    ATH_MSG_DEBUG("Stopping timeout thread");
    m_stop_thread.set_value();
    m_thread.join();
  }

  return StatusCode::SUCCESS;
}
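
The atomic exchange matters because set_value() may be called only once on a std::promise; a second call throws std::future_error. A minimal sketch of the guard, with StopSignal as a hypothetical class that is not part of Athena:

```cpp
#include <atomic>
#include <future>

// Hypothetical one-shot stop signal built from a promise and an atomic flag.
class StopSignal {
public:
  // Hand out the future the worker waits on (call once).
  std::future<void> getFuture() { return m_promise.get_future(); }

  // Returns true if this call delivered the signal and false if the
  // signal had already been sent by an earlier call.
  bool stop()
  {
    // exchange() returns the previous value, so only the first caller
    // sees 'false' and goes on to fulfil the promise.
    if (m_stopped.exchange(true)) return false;
    m_promise.set_value();
    return true;
  }

private:
  std::promise<void> m_promise;
  std::atomic<bool> m_stopped{false};
};
```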

◆ timeoutThread()

void TimeoutAlg::timeoutThread ( )
private

Watchdog thread.

Definition at line 75 of file TimeoutAlg.cxx.

{
  ATH_MSG_INFO(std::format("Setting per-event timeout of {}",
                           std::chrono::duration<double>(m_timeout)));

  // Wakeup at regular intervals (with a minimum frequency, useful for long timeouts)
  const std::chrono::nanoseconds wakeup_interval =
    std::min(m_timeout, std::chrono::nanoseconds(m_checkInterval));

  // Loop until we have received stop signal
  auto stop_signal = m_stop_thread.get_future();
  while ( stop_signal.wait_for(wakeup_interval) == std::future_status::timeout ) {

    // Loop over all slots and check if event has reached timeout
    const auto now = clock_t::now();
    for (EventContext::ContextID_t slot = 0;
         const auto& startTime : m_eventStartTime) {

      if (startTime.time_since_epoch().count() > 0 && now > startTime + m_timeout) {
        handleTimeout(slot);
      }

      ++slot;
    }
  }
}
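
In the loop above, wait_for() on the future doubles as an interruptible sleep: it returns std::future_status::timeout after each interval and std::future_status::ready as soon as the promise is fulfilled. The sketch below reduces the pattern to a self-contained class; Watchdog and its members are hypothetical, and the wake-up counter stands in for the per-slot checks.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical minimal watchdog: wakes up every 'interval' until stopped.
class Watchdog {
public:
  explicit Watchdog(std::chrono::nanoseconds interval)
  {
    // Obtain the future before the thread starts and move it into the lambda.
    m_thread = std::thread(
      [this, interval, stopSignal = m_stop.get_future()] {
        while (stopSignal.wait_for(interval) == std::future_status::timeout) {
          ++m_checks;   // a real watchdog would inspect all slots here
        }
      });
  }

  ~Watchdog() { stop(); }

  // Safe to call more than once; only the first call signals and joins.
  void stop()
  {
    if (m_thread.joinable() && !m_stopped.exchange(true)) {
      m_stop.set_value();
      m_thread.join();
    }
  }

  int checks() const { return m_checks.load(); }

private:
  std::promise<void> m_stop;
  std::atomic<bool> m_stopped{false};
  std::atomic<int> m_checks{0};
  std::thread m_thread;
};
```

Compared with a plain sleep, this design lets stop() return promptly even when the wake-up interval is long.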

Member Data Documentation

◆ m_eventStartTime

SG::SlotSpecificObj<clock_t::time_point> TimeoutAlg::m_eventStartTime ATLAS_THREAD_SAFE
mutable private

Start time of each event per slot.

Definition at line 76 of file TimeoutAlg.h.

◆ m_thread

std::thread TimeoutAlg::m_thread ATLAS_THREAD_SAFE
mutable private

Watchdog thread.

Definition at line 79 of file TimeoutAlg.h.

◆ m_abort

Gaudi::Property<bool> TimeoutAlg::m_abort
private
Initial value:
{
this, "AbortJob", false, "Abort job on timeout"
}

Definition at line 67 of file TimeoutAlg.h.

◆ m_checkInterval

Gaudi::Property<unsigned long long> TimeoutAlg::m_checkInterval
private
Initial value:
{
this, "MaxCheckInterval", 10*1e9, "Maximum time (ns) between timeout checks"
}

Definition at line 61 of file TimeoutAlg.h.
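
MaxCheckInterval only caps the time between checks; the watchdog thread uses the smaller of the timeout and this value as its wake-up interval. A one-function sketch of that rule, with wakeupInterval as a hypothetical helper name:

```cpp
#include <algorithm>
#include <chrono>

// Wake-up interval of the watchdog: never longer than the timeout itself
// and never longer than the configured maximum check interval.
constexpr std::chrono::nanoseconds
wakeupInterval(std::chrono::nanoseconds timeout,
               std::chrono::nanoseconds maxCheckInterval)
{
  return std::min(timeout, maxCheckInterval);
}
```

With a one-hour timeout and the default of 10 s, the thread wakes every 10 s, so a timeout should be noticed within roughly one interval after the deadline. For timeouts shorter than the maximum, the interval equals the timeout, and detection can lag by up to about one timeout period.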

◆ m_dumpState

Gaudi::Property<bool> TimeoutAlg::m_dumpState
private
Initial value:
{
this, "DumpSchedulerState", false, "Print scheduler state on timeout"
}

Definition at line 64 of file TimeoutAlg.h.

◆ m_handleMutex

std::mutex TimeoutAlg::m_handleMutex
private

Mutex for handleTimeout.

Definition at line 88 of file TimeoutAlg.h.

◆ m_stop_thread

std::promise<void> TimeoutAlg::m_stop_thread
private

Signal to stop watchdog thread.

Definition at line 82 of file TimeoutAlg.h.

◆ m_stopped

std::atomic<bool> TimeoutAlg::m_stopped {false}
private

Has watchdog thread already been stopped? (to avoid setting future twice)

Definition at line 85 of file TimeoutAlg.h.

◆ m_timeout

std::chrono::nanoseconds TimeoutAlg::m_timeout
private

Timeout property as duration.

Definition at line 73 of file TimeoutAlg.h.

◆ m_timeoutProp

Gaudi::Property<unsigned long long> TimeoutAlg::m_timeoutProp
private
Initial value:
{
this, "Timeout", 0, "Timeout in nanoseconds (0 means disabled)"
}

Definition at line 58 of file TimeoutAlg.h.


The documentation for this class was generated from the following files:
TimeoutAlg.h
TimeoutAlg.cxx