ATLAS Offline Software
TimeoutAlg Class Reference

Algorithm to monitor event timeouts.

#include <TimeoutAlg.h>


Public Member Functions

virtual StatusCode initialize () override
 Algorithm to monitor event timeouts.
virtual StatusCode execute (const EventContext &ctx) const override
virtual StatusCode stop () override
virtual void handle (const Incident &inc) override

Protected Member Functions

void setTimeout (Timeout &instance)
 Set timeout.
void resetTimeout (Timeout &instance)
 Reset timeout.

Private Types

using clock_t = std::chrono::steady_clock

Private Member Functions

void timeoutThread ()
 Watchdog thread.
void handleTimeout (EventContext::ContextID_t slot)
 Handle timeout.

Private Attributes

std::chrono::nanoseconds m_timeout
 Timeout property as duration.
SG::SlotSpecificObj< clock_t::time_point > m_eventStartTime ATLAS_THREAD_SAFE
 Start time of each event per slot.
std::thread m_thread ATLAS_THREAD_SAFE
 Watchdog thread.
std::promise< void > m_stop_thread
 Signal to stop watchdog thread.
std::atomic< bool > m_stopped {false}
 Has watchdog thread already been stopped? (to avoid setting future twice)
std::mutex m_handleMutex
 Mutex for handleTimeout.
Properties
Gaudi::Property< unsigned long long > m_timeoutProp
Gaudi::Property< unsigned long long > m_checkInterval
Gaudi::Property< bool > m_dumpState
Gaudi::Property< bool > m_abort

Detailed Description

Algorithm to monitor event timeouts.

Author: Frank Winklmeier
Date: Sep, 2025

Algorithm providing a watchdog thread for event timeouts.

This algorithm should run early (ideally first) in the event sequence. For each event it records the start time of the current slot. On the first event it also launches a watchdog thread that periodically checks whether any event in flight has exceeded the timeout.

See the algorithm properties for possible actions on an event timeout.

Definition at line 35 of file TimeoutAlg.h.

Member Typedef Documentation

◆ clock_t

using TimeoutAlg::clock_t = std::chrono::steady_clock
private

Definition at line 48 of file TimeoutAlg.h.

Member Function Documentation

◆ execute()

StatusCode TimeoutAlg::execute ( const EventContext & ctx) const
overridevirtual

Definition at line 33 of file TimeoutAlg.cxx.

{
  // Timeout thread is started on first event to make sure this also works
  // in athenaMP (threads usually don't survive forking).
  [[maybe_unused]] static const bool initThread = [&](){
    if (m_timeoutProp > 0) {
      const auto nc_this ATLAS_THREAD_SAFE = const_cast<TimeoutAlg*>(this);
      m_thread = std::thread(&TimeoutAlg::timeoutThread, nc_this);
    }
    return true;
  }();

  // Set event start time for current slot
  *m_eventStartTime.get(ctx) = clock_t::now();

  return StatusCode::SUCCESS;
}
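
The lazy thread start in execute() relies on a function-local static whose initialiser runs exactly once, even if several threads enter the function at the same time (guaranteed since C++11). A minimal sketch of that idiom follows. The `Worker` class and its counters are hypothetical stand-ins for illustration and are not part of TimeoutAlg.

```cpp
#include <atomic>

// Hypothetical stand-in for an algorithm whose execute() is const
// and may be called concurrently from several threads.
class Worker {
public:
  void execute() const {
    // The initialiser lambda runs only the first time control passes
    // through this declaration; later calls skip it.
    static const bool started = [this]() {
      ++m_starts;  // stands in for launching the watchdog thread
      return true;
    }();
    (void)started;  // silence unused-variable warnings

    ++m_events;  // per-event work happens on every call
  }

  int starts() const { return m_starts.load(); }
  int events() const { return m_events.load(); }

private:
  mutable std::atomic<int> m_starts{0};
  mutable std::atomic<int> m_events{0};
};
```

Note that the static belongs to the function, not to the object: a second `Worker` instance calling `execute()` would not run the initialiser again.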

◆ handle()

void TimeoutAlg::handle ( const Incident & inc)
overridevirtual

Definition at line 52 of file TimeoutAlg.cxx.

{
  if (inc.type() == "EndAlgorithms") {
    ATH_MSG_DEBUG("Resetting event timeout for slot " << inc.context().slot());
    // Reset start time for slot to zero
    *m_eventStartTime.get(inc.context()) = {};
  }
}
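
Resetting the start time to a value-initialised time point makes "zero" act as a sentinel for "no event running in this slot". The sketch below illustrates the idea with a hypothetical `SlotTimes` table. It uses a plain `std::vector` instead of `SG::SlotSpecificObj`, and all names are assumptions made for the example.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical per-slot table of event start times.
class SlotTimes {
public:
  explicit SlotTimes(std::size_t nSlots) : m_start(nSlots) {}

  // Record the start of an event in the given slot.
  void begin(std::size_t slot, Clock::time_point now) { m_start.at(slot) = now; }

  // Mark the slot as idle by resetting to the zero time point.
  void end(std::size_t slot) { m_start.at(slot) = {}; }

  // A slot is active if its start time differs from the zero sentinel.
  bool active(std::size_t slot) const {
    return m_start.at(slot).time_since_epoch().count() > 0;
  }

private:
  std::vector<Clock::time_point> m_start;  // value-initialised to zero
};
```

The sentinel works as long as the clock never reports its own epoch as the current time, which is the practical case for a steady clock.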

◆ handleTimeout()

void TimeoutAlg::handleTimeout ( EventContext::ContextID_t slot)
private

Handle timeout.

Definition at line 103 of file TimeoutAlg.cxx.

{
  // To avoid getting another timeout while handling this one
  std::scoped_lock lock(m_handleMutex);

  // Create minimal context with slot number
  const EventContext ctx(0, slot);

  // Don't duplicate the actions if the timeout was already reached for this slot
  if (Athena::Timeout::instance(ctx).reached()) return;

  // Print ERROR message
  const std::string msg = std::format("Event timeout ({}) in slot {} reached",
                                      std::chrono::duration<double>(m_timeout), slot);
  ATH_MSG_ERROR(msg);

  // Set timeout flag
  setTimeout(Athena::Timeout::instance(ctx));

  // Dump scheduler state if requested
  if (m_dumpState) {
    ServiceHandle<IScheduler> schedulerSvc("AvalancheSchedulerSvc", name());
    if (schedulerSvc.retrieve().isSuccess()) {
      schedulerSvc->dumpState();
    }
  }

  // Abort job if requested
  if (m_abort) {
    // Stop the timeout thread to avoid additional triggers
    stop().ignore();

    // Tell CoreDumpSvc about the reason for the abort
    ServiceHandle<ICoreDumpSvc> coreDumpSvc("CoreDumpSvc", name());
    if ( coreDumpSvc.retrieve().isSuccess() ) {
      coreDumpSvc->setCoreDumpInfo(ctx, "Reason", msg);
    }
    else {
      std::cerr << msg << std::endl;
    }
    // Abort job (and let CoreDumpSvc handle SIGABRT)
    std::abort();
  }

}
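
The first part of handleTimeout() combines a mutex with an "already reached" check so that the timeout actions run at most once per slot. A minimal sketch of that pattern, using a hypothetical `TimeoutHandler` with a simple flag vector in place of `Athena::Timeout`, could look like this:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical handler; the flag vector stands in for the per-slot
// timeout flag and the counter for logging, state dumps and so on.
class TimeoutHandler {
public:
  explicit TimeoutHandler(std::size_t nSlots) : m_reached(nSlots, false) {}

  // Returns true if this call performed the timeout actions,
  // false if the slot had already been flagged.
  bool handle(std::size_t slot) {
    std::lock_guard<std::mutex> lock(m_mutex);  // one handler at a time
    if (m_reached.at(slot)) return false;       // do not repeat the actions
    m_reached.at(slot) = true;                  // set the timeout flag
    ++m_actions;                                // perform the actions once
    return true;
  }

  int actions() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_actions;
  }

private:
  mutable std::mutex m_mutex;
  std::vector<bool> m_reached;
  int m_actions = 0;
};
```

Checking the flag while holding the lock means a second timeout for the same slot, detected on a later wake-up, returns without side effects.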

◆ initialize()

StatusCode TimeoutAlg::initialize ( )
overridevirtual

Definition at line 20 of file TimeoutAlg.cxx.

{
  m_timeout = std::chrono::nanoseconds(m_timeoutProp);

  // Subscribe to EndAlgorithms (includes output sequence)
  ServiceHandle<IIncidentSvc> incSvc("IncidentSvc/IncidentSvc", name());
  ATH_CHECK(incSvc.retrieve());
  incSvc->addListener(this, "EndAlgorithms", /*priority*/ 0);

  return StatusCode::SUCCESS;
}
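
initialize() turns the integer nanosecond property into a `std::chrono` duration, and the log messages elsewhere print that duration in seconds. The conversions can be sketched in isolation as follows. The helper names `isEnabled`, `toDuration` and `toSeconds` are hypothetical and exist only for this example.

```cpp
#include <chrono>

// A timeout of 0 ns means the feature is disabled.
inline bool isEnabled(unsigned long long timeoutNs) {
  return timeoutNs > 0;
}

// Wrap an integer nanosecond count in a typed duration.
inline std::chrono::nanoseconds toDuration(unsigned long long timeoutNs) {
  return std::chrono::nanoseconds(
      static_cast<std::chrono::nanoseconds::rep>(timeoutNs));
}

// Express a duration as floating-point seconds, e.g. for log output.
inline double toSeconds(std::chrono::nanoseconds d) {
  return std::chrono::duration<double>(d).count();
}
```

Keeping the value as a duration rather than a raw integer lets it be added directly to a `time_point`, as the watchdog loop does.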

◆ resetTimeout()

void Athena::TimeoutMaster::resetTimeout ( Timeout & instance)
inlineprotectedinherited

Reset timeout.

Definition at line 83 of file Timeout.h.

{ instance.reset(); }

◆ setTimeout()

void Athena::TimeoutMaster::setTimeout ( Timeout & instance)
inlineprotectedinherited

Set timeout.

Definition at line 80 of file Timeout.h.

{ instance.set(); }

◆ stop()

StatusCode TimeoutAlg::stop ( )
overridevirtual

Definition at line 62 of file TimeoutAlg.cxx.

{
  if (m_thread.joinable() && !m_stopped.exchange(true)) {
    // Signal timeout thread to stop
    ATH_MSG_DEBUG("Stopping timeout thread");
    m_stop_thread.set_value();
    m_thread.join();
  }

  return StatusCode::SUCCESS;
}
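
stop() uses a `std::promise<void>` as a one-shot stop signal and an atomic flag so that the promise is set only once. The following self-contained sketch shows the same mechanism in a hypothetical `Watchdog` class. Unlike the listing above, it obtains the future before the thread starts and hands it to the thread, which is a simplification made for the example.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical watchdog skeleton; all names are illustrative.
class Watchdog {
public:
  ~Watchdog() { stop(); }

  // Launch the thread. Call at most once, because a promise hands
  // out its future only once.
  void start(std::chrono::milliseconds interval) {
    m_thread = std::thread(&Watchdog::run, this,
                           m_stopPromise.get_future(), interval);
  }

  // Safe to call repeatedly: only the first successful call signals
  // and joins the thread. Returns true if this call stopped it.
  bool stop() {
    if (m_thread.joinable() && !m_stopped.exchange(true)) {
      m_stopPromise.set_value();
      m_thread.join();
      return true;
    }
    return false;
  }

  int wakeups() const { return m_wakeups.load(); }

private:
  void run(std::future<void> stopSignal, std::chrono::milliseconds interval) {
    // wait_for acts as an interruptible sleep: it returns early with
    // future_status::ready as soon as the promise is set.
    while (stopSignal.wait_for(interval) == std::future_status::timeout) {
      ++m_wakeups;  // periodic work would go here
    }
  }

  std::thread m_thread;
  std::promise<void> m_stopPromise;
  std::atomic<bool> m_stopped{false};
  std::atomic<int> m_wakeups{0};
};
```

Using the future as the sleep primitive means a stop request does not have to wait for the current check interval to elapse.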

◆ timeoutThread()

void TimeoutAlg::timeoutThread ( )
private

Watchdog thread.

Definition at line 75 of file TimeoutAlg.cxx.

{
  ATH_MSG_INFO(std::format("Setting per-event timeout of {}",
                           std::chrono::duration<double>(m_timeout)));

  // Wakeup at regular intervals (with a minimum frequency, useful for long timeouts)
  const std::chrono::nanoseconds wakeup_interval =
    std::min(m_timeout, std::chrono::nanoseconds(m_checkInterval));

  // Loop until we have received stop signal
  auto stop_signal = m_stop_thread.get_future();
  while ( stop_signal.wait_for(wakeup_interval) == std::future_status::timeout ) {

    // Loop over all slots and check if event has reached timeout
    const auto now = clock_t::now();
    for (EventContext::ContextID_t slot = 0;
         const auto& startTime : m_eventStartTime) {

      if (startTime.time_since_epoch().count() > 0 && now > startTime + m_timeout) {
        handleTimeout(slot);
      }

      ++slot;
    }
  }
}
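
Two pieces of logic in the watchdog loop can be isolated as pure functions: the choice of wake-up interval and the scan over all slots. The sketch below does that with hypothetical helpers `wakeupInterval` and `timedOutSlots`. It takes the current time as a parameter so the behaviour can be checked without real waiting.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Sleep no longer than the timeout itself and no longer than the
// configured maximum check interval.
inline std::chrono::nanoseconds wakeupInterval(
    std::chrono::nanoseconds timeout,
    std::chrono::nanoseconds maxCheckInterval) {
  return std::min(timeout, maxCheckInterval);
}

// Return the indices of all slots whose event has been running for
// longer than `timeout` at time `now`. Slots holding the zero time
// point are idle and are skipped.
inline std::vector<std::size_t> timedOutSlots(
    const std::vector<Clock::time_point>& startTimes,
    Clock::time_point now,
    std::chrono::nanoseconds timeout) {
  std::vector<std::size_t> result;
  for (std::size_t slot = 0; slot < startTimes.size(); ++slot) {
    const Clock::time_point start = startTimes[slot];
    if (start.time_since_epoch().count() > 0 && now > start + timeout) {
      result.push_back(slot);
    }
  }
  return result;
}
```

Because checks happen only on wake-up, a timeout is detected up to one wake-up interval after it actually expires.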

Member Data Documentation

◆ ATLAS_THREAD_SAFE [1/2]

SG::SlotSpecificObj<clock_t::time_point> m_eventStartTime TimeoutAlg::ATLAS_THREAD_SAFE
mutableprivate

Start time of each event per slot.

Definition at line 76 of file TimeoutAlg.h.

◆ ATLAS_THREAD_SAFE [2/2]

std::thread m_thread TimeoutAlg::ATLAS_THREAD_SAFE
mutableprivate

Watchdog thread.

Definition at line 79 of file TimeoutAlg.h.

◆ m_abort

Gaudi::Property<bool> TimeoutAlg::m_abort
private
Initial value:
{
this, "AbortJob", false, "Abort job on timeout"
}

Definition at line 67 of file TimeoutAlg.h.


◆ m_checkInterval

Gaudi::Property<unsigned long long> TimeoutAlg::m_checkInterval
private
Initial value:
{
this, "MaxCheckInterval", 10*1e9, "Maximum time (ns) between timeout checks"
}

Definition at line 61 of file TimeoutAlg.h.

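
The default `10*1e9` is a floating-point expression, while the property holds an unsigned integer number of nanoseconds. As a worked check of the arithmetic, and assuming a plain value conversion from double to integer, the default corresponds to 10 seconds. The constant names below are hypothetical.

```cpp
#include <chrono>

// 10*1e9 evaluates to the double 1e10; converting it to the integer
// property type gives the nanosecond count. Assumption: the conversion
// behaves like the static_cast used here.
constexpr unsigned long long kDefaultCheckIntervalNs =
    static_cast<unsigned long long>(10 * 1e9);

static_assert(kDefaultCheckIntervalNs == 10000000000ULL,
              "10*1e9 ns should be 10'000'000'000 ns");

// The same value as a typed duration.
constexpr std::chrono::nanoseconds kDefaultCheckInterval{
    static_cast<std::chrono::nanoseconds::rep>(kDefaultCheckIntervalNs)};
```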

◆ m_dumpState

Gaudi::Property<bool> TimeoutAlg::m_dumpState
private
Initial value:
{
this, "DumpSchedulerState", false, "Print scheduler state on timeout"
}

Definition at line 64 of file TimeoutAlg.h.


◆ m_handleMutex

std::mutex TimeoutAlg::m_handleMutex
private

Mutex for handleTimeout.

Definition at line 88 of file TimeoutAlg.h.

◆ m_stop_thread

std::promise<void> TimeoutAlg::m_stop_thread
private

Signal to stop watchdog thread.

Definition at line 82 of file TimeoutAlg.h.

◆ m_stopped

std::atomic<bool> TimeoutAlg::m_stopped {false}
private

Has watchdog thread already been stopped? (to avoid setting future twice)

Definition at line 85 of file TimeoutAlg.h.


◆ m_timeout

std::chrono::nanoseconds TimeoutAlg::m_timeout
private

Timeout property as duration.

Definition at line 73 of file TimeoutAlg.h.

◆ m_timeoutProp

Gaudi::Property<unsigned long long> TimeoutAlg::m_timeoutProp
private
Initial value:
{
this, "Timeout", 0, "Timeout in nanoseconds (0 means disabled)"
}

Definition at line 58 of file TimeoutAlg.h.


The documentation for this class was generated from the following files: