ATLAS Offline Software
TimeoutAlg Class Reference

Algorithm to monitor event timeouts.

#include <TimeoutAlg.h>


Public Member Functions

virtual StatusCode initialize () override
virtual StatusCode execute (const EventContext &ctx) const override
virtual StatusCode stop () override
virtual void handle (const Incident &inc) override

Protected Member Functions inherited from Athena::TimeoutMaster

void setTimeout (Timeout &instance)
 Set timeout.
void resetTimeout (Timeout &instance)
 Reset timeout.

Private Types

using clock_t = std::chrono::steady_clock

Private Member Functions

void timeoutThread ()
 Watchdog thread.
void handleTimeout (EventContext::ContextID_t slot)
 Handle timeout.

Private Attributes

std::chrono::nanoseconds m_timeout
 Timeout property as duration.
SG::SlotSpecificObj< clock_t::time_point > m_eventStartTime ATLAS_THREAD_SAFE
 Start time of each event per slot.
std::thread m_thread ATLAS_THREAD_SAFE
 Watchdog thread.
std::promise< void > m_stop_thread
 Signal to stop watchdog thread.
std::atomic< bool > m_stopped {false}
 Has watchdog thread already been stopped? (to avoid setting future twice)
std::mutex m_handleMutex
 Mutex for handleTimeout.
Properties
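Gaudi::Property< unsigned long long > m_timeoutProp
 Timeout in nanoseconds (0 means disabled).
Gaudi::Property< unsigned long long > m_checkInterval
 Maximum time (ns) between timeout checks.
Gaudi::Property< bool > m_dumpState
 Print scheduler state on timeout.
Gaudi::Property< bool > m_abort
 Abort job on timeout.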

Detailed Description

Algorithm to monitor event timeouts.

Author
Frank Winklmeier
Date
Sep, 2025

Algorithm providing a watchdog thread for event timeouts.

This algorithm should run early (ideally first) in the event sequence. It records the start time of each event and launches a watchdog thread that periodically checks whether any event has exceeded the timeout.

The actions taken on an event timeout are configured via the algorithm properties (DumpSchedulerState, AbortJob).
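
The check the watchdog performs can be illustrated with a small, self-contained sketch. It assumes, as TimeoutAlg does, that a default-constructed time point marks a slot with no event in flight; the function name `timedOutSlots` is hypothetical and not part of the ATLAS code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: return the slots whose event has run longer than
// `timeout`. A default-constructed time point (count() == 0) marks an idle
// slot, mirroring the reset to {} that TimeoutAlg performs at the end of an event.
std::vector<std::size_t> timedOutSlots(const std::vector<Clock::time_point>& startTimes,
                                       Clock::time_point now,
                                       std::chrono::nanoseconds timeout)
{
  std::vector<std::size_t> result;
  for (std::size_t slot = 0; slot < startTimes.size(); ++slot) {
    const Clock::time_point start = startTimes[slot];
    if (start.time_since_epoch().count() == 0) continue;  // no event in flight
    if (now > start + timeout) result.push_back(slot);    // event exceeded the limit
  }
  return result;
}
```

For example, with a 30 s timeout, a slot whose event started 60 s ago is reported, while a slot whose event started 10 s ago, or an idle slot, is not.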

Definition at line 35 of file TimeoutAlg.h.

Member Typedef Documentation

◆ clock_t

using TimeoutAlg::clock_t = std::chrono::steady_clock
private

Definition at line 48 of file TimeoutAlg.h.
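
A steady clock is the natural choice for a watchdog because it is guaranteed to be monotonic, whereas `std::chrono::system_clock` can jump when the wall clock is adjusted. The sketch below (hypothetical helpers `elapsedSeconds` and `isIdle`, not part of TimeoutAlg) shows this property and the zero-valued default time point that TimeoutAlg uses as an "idle slot" marker.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// steady_clock is required by the standard to be monotonic
static_assert(Clock::is_steady, "steady_clock must never go backwards");

// Hypothetical helper: elapsed time between two time points, in seconds
double elapsedSeconds(Clock::time_point start, Clock::time_point now)
{
  return std::chrono::duration<double>(now - start).count();
}

// A value-initialized time point sits at the clock's epoch, so its count is 0;
// this is the "no event in flight" marker checked by the watchdog.
inline bool isIdle(Clock::time_point t) { return t.time_since_epoch().count() == 0; }
```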

Member Function Documentation

◆ execute()

StatusCode TimeoutAlg::execute ( const EventContext & ctx) const
override virtual

Definition at line 39 of file TimeoutAlg.cxx.

40{
41 // Timeout thread is started on first event to make sure this also works
42 // in athenaMP (threads usually don't survive forking).
43 [[maybe_unused]] static const bool initThread = [&](){
44 if (m_timeoutProp > 0) {
45 const auto nc_this ATLAS_THREAD_SAFE = const_cast<TimeoutAlg*>(this);
46 m_thread = std::thread(&TimeoutAlg::timeoutThread, nc_this);
47 }
48 return true;
49 }();
50
51 // Set event start time for current slot
52 *m_eventStartTime.get(ctx) = clock_t::now();
53
54 return StatusCode::SUCCESS;
55}
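
The `static const bool initThread = [&](){ ... }();` construct relies on the C++11 guarantee that a function-local static is initialized exactly once, with concurrent callers waiting until the initializer has finished. A minimal sketch of the idiom, with hypothetical names and a counter standing in for the thread launch:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how often the one-time initializer runs (stands in for starting
// the watchdog thread)
std::atomic<int> g_initCount{0};

void onEvent()
{
  // Initialized exactly once, even if many threads call onEvent() at once
  [[maybe_unused]] static const bool started = []() {
    ++g_initCount;
    return true;
  }();
  // ... per-event work (e.g. recording the start time) would follow here
}
```

Because the static belongs to the function rather than to an object, the initializer runs once per process and is shared by all instances of the class.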

◆ handle()

void TimeoutAlg::handle ( const Incident & inc)
override virtual

Definition at line 58 of file TimeoutAlg.cxx.

59{
60 if (inc.type() == "EndAlgorithms") {
61 ATH_MSG_DEBUG("Resetting event timeout for slot " << inc.context().slot());
62 // Reset start time for slot to zero
63 *m_eventStartTime.get(inc.context()) = {};
64 }
65}
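
In TimeoutAlg the per-slot start times live in an `SG::SlotSpecificObj` marked `ATLAS_THREAD_SAFE`, and they are read by the watchdog thread while event threads write them. As an illustration of one way to make such a store free of data races (an alternative, not what TimeoutAlg does), the sketch below keeps each slot's start time as an atomic tick count; the class `SlotStartTimes` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <optional>

class SlotStartTimes {
public:
  using Clock = std::chrono::steady_clock;

  // All slots start at 0 ("no event in flight"); make_unique<T[]> value-initializes
  explicit SlotStartTimes(std::size_t nSlots)
    : m_ticks(std::make_unique<std::atomic<Clock::rep>[]>(nSlots)) {}

  // Called at the start of an event (cf. execute())
  void begin(std::size_t slot, Clock::time_point t) {
    m_ticks[slot].store(t.time_since_epoch().count(), std::memory_order_relaxed);
  }

  // Called at the end of an event (cf. handle() on "EndAlgorithms")
  void end(std::size_t slot) { m_ticks[slot].store(0, std::memory_order_relaxed); }

  // Safe to call from the watchdog thread concurrently with begin()/end()
  std::optional<Clock::time_point> startTime(std::size_t slot) const {
    const Clock::rep ticks = m_ticks[slot].load(std::memory_order_relaxed);
    if (ticks == 0) return std::nullopt;  // no event in flight
    return Clock::time_point{Clock::duration{ticks}};
  }

private:
  std::unique_ptr<std::atomic<Clock::rep>[]> m_ticks;
};
```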

◆ handleTimeout()

void TimeoutAlg::handleTimeout ( EventContext::ContextID_t slot)
private

Handle timeout.

Definition at line 109 of file TimeoutAlg.cxx.

110{
111 // To avoid getting another timeout while handling this one
112 std::scoped_lock lock(m_handleMutex);
113
114 // Create minimal context with slot number
115 const EventContext ctx(0, slot);
116
117 // Don't duplicate the actions if the timeout was already reached for this slot
118 if (Athena::Timeout::instance(ctx).reached()) return;
119
120 // Print ERROR message
121 const std::string msg = std::format("Event timeout ({}) in slot {} reached",
122 std::chrono::duration<double>(m_timeout), slot);
124
125 // Set timeout flag
127
128 // Dump scheduler state if requested
129 if (m_dumpState) {
130 ServiceHandle<IScheduler> schedulerSvc("AvalancheSchedulerSvc", name());
131 if (schedulerSvc.retrieve().isSuccess()) {
132 schedulerSvc->dumpState();
133 }
134 }
135
136 // Abort job if requested
137 if (m_abort) {
138 // Stop the timeout thread to avoid additional triggers
139 stop().ignore();
140
141 // Tell CoreDumpSvc about the reason for the abort
142 ServiceHandle<ICoreDumpSvc> coreDumpSvc("CoreDumpSvc", name());
143 if ( coreDumpSvc.retrieve().isSuccess() ) {
144 coreDumpSvc->setCoreDumpInfo(ctx, "Reason", msg);
145 }
146 else {
147 std::cerr << msg << std::endl;
148 }
149 // Abort job (and let CoreDumpSvc handle SIGABRT)
150 std::abort();
151 }
152
153}
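
The combination of `m_handleMutex` and the `reached()` check means the timeout actions run once per slot, even though the watchdog keeps seeing the same expired event on every wakeup. A simplified sketch of that guard, with a per-slot flag standing in for `Athena::Timeout` and a callback standing in for the actions; `TimeoutHandler` and its `reset()` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class TimeoutHandler {
public:
  TimeoutHandler(std::size_t nSlots, std::function<void(std::size_t)> action)
    : m_reached(nSlots, false), m_action(std::move(action)) {}

  void handleTimeout(std::size_t slot) {
    std::scoped_lock lock(m_mutex);  // serialize concurrent timeout handling
    if (m_reached[slot]) return;     // already handled for this slot
    m_reached[slot] = true;          // set the timeout flag
    m_action(slot);                  // e.g. print ERROR, dump state, abort
  }

  // Hypothetical: clear the flag so the slot can time out again
  void reset(std::size_t slot) {
    std::scoped_lock lock(m_mutex);
    m_reached[slot] = false;
  }

private:
  std::mutex m_mutex;
  std::vector<bool> m_reached;
  std::function<void(std::size_t)> m_action;
};
```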

◆ initialize()

StatusCode TimeoutAlg::initialize ( )
override virtual

Definition at line 21 of file TimeoutAlg.cxx.

22{
23 if (RUNNING_ON_VALGRIND) {
24 ATH_MSG_INFO("Detected running on valgrind. Disabling algorithm timeout");
26 return StatusCode::SUCCESS;
27 }
28 m_timeout = std::chrono::nanoseconds(m_timeoutProp);
29
30 // Subscribe to EndAlgorithms (includes output sequence)
31 ServiceHandle<IIncidentSvc> incSvc("IncidentSvc/IncidentSvc", name());
32 ATH_CHECK(incSvc.retrieve());
33 incSvc->addListener(this, "EndAlgorithms", /*priority*/ 0);
34
35 return StatusCode::SUCCESS;
36}
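
The `Timeout` property is a plain nanosecond count in which 0 means disabled; `initialize()` converts it to `std::chrono::nanoseconds`, and `execute()` only starts the watchdog when it is positive. A hedged sketch of the same conversion, using `std::optional` to make the disabled case explicit (the helper names are hypothetical):

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical helper: 0 ns means "timeout disabled"
std::optional<std::chrono::nanoseconds> timeoutFromProperty(unsigned long long ns)
{
  if (ns == 0) return std::nullopt;
  return std::chrono::nanoseconds(ns);
}

// Seconds as a floating-point value, as used when printing the timeout
double toSeconds(std::chrono::nanoseconds d)
{
  return std::chrono::duration<double>(d).count();
}
```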

◆ resetTimeout()

void Athena::TimeoutMaster::resetTimeout ( Timeout & instance)
inline protected inherited

Reset timeout.

Definition at line 83 of file Timeout.h.

83{ instance.reset(); }

◆ setTimeout()

void Athena::TimeoutMaster::setTimeout ( Timeout & instance)
inline protected inherited

Set timeout.

Definition at line 80 of file Timeout.h.

80{ instance.set(); }
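
Both helpers are protected members of `Athena::TimeoutMaster`, so only classes derived from it, such as TimeoutAlg, can change the timeout flag, while any code can query it via `Athena::Timeout::instance(ctx).reached()`. The sketch below re-creates that access pattern in a hypothetical `demo` namespace; it assumes the timeout class keeps its `set()`/`reset()` private and grants access to the master via friendship, which is a common way to write such a pattern but is not confirmed by this page.

```cpp
#include <atomic>
#include <cassert>

namespace demo {

class TimeoutMaster;

// Anyone may query the flag; only the master may modify it (assumption)
class Timeout {
public:
  bool reached() const { return m_reached.load(); }
private:
  friend class TimeoutMaster;
  void set() { m_reached.store(true); }
  void reset() { m_reached.store(false); }
  std::atomic<bool> m_reached{false};
};

class TimeoutMaster {
protected:
  void setTimeout(Timeout& instance) { instance.set(); }
  void resetTimeout(Timeout& instance) { instance.reset(); }
};

// A derived "algorithm" reaches the protected helpers through inheritance
class WatchdogLike : public TimeoutMaster {
public:
  void trigger(Timeout& t) { setTimeout(t); }
  void clear(Timeout& t) { resetTimeout(t); }
};

}  // namespace demo
```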

◆ stop()

StatusCode TimeoutAlg::stop ( )
override virtual

Definition at line 68 of file TimeoutAlg.cxx.

69{
70 if (m_thread.joinable() && !m_stopped.exchange(true)) {
71 // Signal timeout thread to stop
72 ATH_MSG_DEBUG("Stopping timeout thread");
73 m_stop_thread.set_value();
74 m_thread.join();
75 }
76
77 return StatusCode::SUCCESS;
78}
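
`std::promise::set_value()` may be called only once; a second call throws `std::future_error`. The atomic `exchange(true)` therefore makes `stop()` idempotent: only the first caller signals and joins the thread, which matters because `stop()` can be reached both from the framework and from `handleTimeout()`. A self-contained sketch of the pattern (the class `StoppableWorker` is hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>

class StoppableWorker {
public:
  // Must be called at most once: get_future() throws if called twice
  void start() {
    m_thread = std::thread([f = m_stop.get_future()]() {
      f.wait();  // block until the stop signal arrives
    });
  }

  // Returns true only for the call that actually performed the shutdown
  bool stop() {
    if (m_thread.joinable() && !m_stopped.exchange(true)) {
      m_stop.set_value();  // executed at most once
      m_thread.join();
      return true;
    }
    return false;  // already stopped, or never started
  }

  ~StoppableWorker() { stop(); }

private:
  std::promise<void> m_stop;
  std::atomic<bool> m_stopped{false};
  std::thread m_thread;
};
```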

◆ timeoutThread()

void TimeoutAlg::timeoutThread ( )
private

Watchdog thread.

Definition at line 81 of file TimeoutAlg.cxx.

82{
83 ATH_MSG_INFO(std::format("Setting per-event timeout of {}",
84 std::chrono::duration<double>(m_timeout)));
85
86 // Wakeup at regular intervals (with a minimum frequency, useful for long timeouts)
87 const std::chrono::nanoseconds wakeup_interval =
88 std::min(m_timeout, std::chrono::nanoseconds(m_checkInterval));
89
90 // Loop until we have received stop signal
91 auto stop_signal = m_stop_thread.get_future();
92 while ( stop_signal.wait_for(wakeup_interval) == std::future_status::timeout ) {
93
94 // Loop over all slots and check if event has reached timeout
95 const auto now = clock_t::now();
96 for (EventContext::ContextID_t slot = 0;
97 const auto& startTime : m_eventStartTime) {
98
99 if (startTime.time_since_epoch().count() > 0 && now > startTime + m_timeout) {
100 handleTimeout(slot);
101 }
102
103 ++slot;
104 }
105 }
106}
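
The loop waits on the stop future with `wait_for()`, which returns `std::future_status::timeout` when the wakeup interval elapses (run a check) and `std::future_status::ready` once the promise is fulfilled (exit), so the thread never busy-waits and leaves promptly on `stop()`. A reduced sketch with hypothetical names, where a counter replaces the scan over slots:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Wake up once per timeout period, but at least every maxCheckInterval
std::chrono::nanoseconds wakeupInterval(std::chrono::nanoseconds timeout,
                                        std::chrono::nanoseconds maxCheckInterval)
{
  return std::min(timeout, maxCheckInterval);
}

void watchdogLoop(std::future<void> stopSignal, std::chrono::nanoseconds interval,
                  std::atomic<int>& checks)
{
  // Runs until the promise behind stopSignal is fulfilled
  while (stopSignal.wait_for(interval) == std::future_status::timeout) {
    ++checks;  // the real loop scans all slots here
  }
}
```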

Member Data Documentation

◆ m_eventStartTime

SG::SlotSpecificObj<clock_t::time_point> TimeoutAlg::m_eventStartTime ATLAS_THREAD_SAFE
mutable private

Start time of each event per slot.

Definition at line 76 of file TimeoutAlg.h.

◆ m_thread

std::thread TimeoutAlg::m_thread ATLAS_THREAD_SAFE
mutable private

Watchdog thread.

Definition at line 79 of file TimeoutAlg.h.

◆ m_abort

Gaudi::Property<bool> TimeoutAlg::m_abort
private
Initial value:
{
this, "AbortJob", false, "Abort job on timeout"
}

Definition at line 67 of file TimeoutAlg.h.

67 {
68 this, "AbortJob", false, "Abort job on timeout"
69 };

◆ m_checkInterval

Gaudi::Property<unsigned long long> TimeoutAlg::m_checkInterval
private
Initial value:
{
this, "MaxCheckInterval", 10*1e9, "Maximum time (ns) between timeout checks"
}

Definition at line 61 of file TimeoutAlg.h.

61 {
62 this, "MaxCheckInterval", 10*1e9, "Maximum time (ns) between timeout checks"
63 };
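
The default `10*1e9` is a floating-point expression equal to 1e10, which is exactly representable as a double and converts to the integer 10000000000 ns, i.e. 10 s. The worked arithmetic, as a standalone check (the constant names are hypothetical):

```cpp
#include <cassert>
#include <chrono>

// Same conversion the property initializer performs: double 1e10 -> unsigned long long
constexpr unsigned long long kDefaultCheckIntervalNs =
    static_cast<unsigned long long>(10 * 1e9);

constexpr std::chrono::nanoseconds kDefaultCheckInterval{kDefaultCheckIntervalNs};
```

Since the watchdog wakes every min(Timeout, MaxCheckInterval), with the default setting an event that exceeds a longer timeout may go unnoticed for up to about 10 s before the next check.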

◆ m_dumpState

Gaudi::Property<bool> TimeoutAlg::m_dumpState
private
Initial value:
{
this, "DumpSchedulerState", false, "Print scheduler state on timeout"
}

Definition at line 64 of file TimeoutAlg.h.

64 {
65 this, "DumpSchedulerState", false, "Print scheduler state on timeout"
66 };

◆ m_handleMutex

std::mutex TimeoutAlg::m_handleMutex
private

Mutex for handleTimeout.

Definition at line 88 of file TimeoutAlg.h.

◆ m_stop_thread

std::promise<void> TimeoutAlg::m_stop_thread
private

Signal to stop watchdog thread.

Definition at line 82 of file TimeoutAlg.h.

◆ m_stopped

std::atomic<bool> TimeoutAlg::m_stopped {false}
private

Has watchdog thread already been stopped? (to avoid setting future twice)

Definition at line 85 of file TimeoutAlg.h.

85{false};

◆ m_timeout

std::chrono::nanoseconds TimeoutAlg::m_timeout
private

Timeout property as duration.

Definition at line 73 of file TimeoutAlg.h.

◆ m_timeoutProp

Gaudi::Property<unsigned long long> TimeoutAlg::m_timeoutProp
private
Initial value:
{
this, "Timeout", 0, "Timeout in nanoseconds (0 means disabled)"
}

Definition at line 58 of file TimeoutAlg.h.

58 {
59 this, "Timeout", 0, "Timeout in nanoseconds (0 means disabled)"
60 };
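
Because the `Timeout` property is given in nanoseconds, large values are easy to mistype. A hypothetical helper that derives the property value from a `std::chrono` duration:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical: convert any chrono duration to the nanosecond count expected by "Timeout"
template <class Rep, class Period>
constexpr unsigned long long toTimeoutProperty(std::chrono::duration<Rep, Period> d)
{
  return static_cast<unsigned long long>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(d).count());
}
```

For example, a 5-minute limit corresponds to `Timeout = 300000000000`, and a value of 0 keeps the watchdog disabled.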

The documentation for this class was generated from the following files: