ATLAS Offline Software
PyJobTransforms.trfGraph Namespace Reference

Detailed Description

Transform graph utilities.

Graph which represents transform executors (nodes) connected via data types (edges).
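Here is a minimal sketch of that representation. It assumes a hypothetical `Node` class with flat `in_data`/`out_data` sets; this is not the actual trfGraph class, and tuple-bound inputs are ignored. Edges are derived by matching each node's outputs against every other node's inputs. The substep names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical executor node: a substep name plus the data types it reads and writes."""
    name: str
    in_data: set = field(default_factory=set)
    out_data: set = field(default_factory=set)


def build_edges(nodes):
    """Return (producer, consumer, data_type) triples, one per data type
    that one node writes and another node reads."""
    edges = []
    for producer in nodes:
        for consumer in nodes:
            if producer is consumer:
                continue
            for data in sorted(producer.out_data & consumer.in_data):
                edges.append((producer.name, consumer.name, data))
    return edges


example_nodes = [
    Node("RAWtoESD", {"RAW"}, {"ESD", "HIST_ESD"}),
    Node("ESDtoAOD", {"ESD"}, {"AOD", "HIST_AOD"}),
    Node("DQHistMerge", {"HIST_ESD", "HIST_AOD"}, {"HIST"}),
]

for edge in build_edges(example_nodes):
    print(edge)
# ('RAWtoESD', 'ESDtoAOD', 'ESD')
# ('RAWtoESD', 'DQHistMerge', 'HIST_ESD')
# ('ESDtoAOD', 'DQHistMerge', 'HIST_AOD')
```

Note that `DQHistMerge` ends up with two incoming edges. A job-as-node graph handles this without difficulty.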

Author
atlas-comp-transforms-dev@cern.ch
Note
There are a few well-established Python graph implementations (NetworkX, igraph), but none of them seems to be in the ATLAS release. Our needs are so basic that we can probably just take a few well-known routines and include them in this module. See, e.g., http://www.python.org/doc/essays/graphs.html
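The following is one example of such a routine. It is a Python 3 adaptation of the recursive `find_path` function from that essay, using a `None` default for `path` instead of a mutable list. It runs on a hypothetical adjacency dict of substep names and returns the first path found by depth-first search. That path is not necessarily the shortest or the cheapest.

```python
def find_path(graph, start, end, path=None):
    """Depth-first search for any path from start to end.

    graph maps each node name to a list of successor names.
    Returns a list of node names, or None if end is unreachable.
    """
    path = (path or []) + [start]
    if start == end:
        return path
    for successor in graph.get(start, []):
        if successor not in path:  # skip nodes already on this path (avoids cycles)
            new_path = find_path(graph, successor, end, path)
            if new_path:
                return new_path
    return None


graph = {
    "RAWtoESD": ["ESDtoAOD", "DQHistMerge"],
    "ESDtoAOD": ["DQHistMerge"],
    "DQHistMerge": [],
}

print(find_path(graph, "RAWtoESD", "DQHistMerge"))  # ['RAWtoESD', 'ESDtoAOD', 'DQHistMerge']
print(find_path(graph, "ESDtoAOD", "RAWtoESD"))     # None
```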
The basic idea is to have nodes represent athena jobs (substeps) and edges represent data. It turns out that this representation works. The reverse one (data as nodes, jobs as edges) does not work when a job requires multiple inputs. For example, DQHist merging needs both the HIST_ESD and the HIST_AOD inputs, but an edge can only connect two nodes, whereas a node can have an arbitrary number of edge connections. A node can accept multiple input data types. Having multiple input data types generally means that the node can execute if any one of them is present (e.g., RDO or RAW). However, when several inputs are all needed for the node to execute, they are bound together into a tuple.
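The any-of versus all-of rule can be sketched as follows. The sketch assumes that a node's inputs are stored as a hypothetical list of alternatives, where a plain string needs only that data type and a tuple needs every member. This is an illustrative encoding, not necessarily the one trfGraph uses.

```python
def can_execute(inputs, available):
    """Return True if at least one input alternative is fully available.

    inputs: list of alternatives; a str needs just that data type,
    a tuple needs every data type it contains.
    """
    available = set(available)
    return any(
        set(alt if isinstance(alt, tuple) else (alt,)) <= available
        for alt in inputs
    )


reco_inputs = ["RDO", "RAW"]                  # either one is enough
dq_merge_inputs = [("HIST_ESD", "HIST_AOD")]  # both are needed together

print(can_execute(reco_inputs, {"RAW"}))                       # True
print(can_execute(dq_merge_inputs, {"HIST_ESD"}))              # False
print(can_execute(dq_merge_inputs, {"HIST_ESD", "HIST_AOD"}))  # True
```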
We do not track a single path through the graph. Instead, we track one path for each data type we need to produce and record which nodes get hit. Each hit node is then executed in order. We also need to record which data objects are going to be produced ephemerally. One of the most important problems is minimising the total cost of all the paths we need to take. To do this, we start with the data sorted in topological order (i.e., data produced earliest in the process comes first). Each path is traced back to the starting node and the cheapest one is taken. Once nodes have been switched on for one data type, they are assumed to be free for other data types.
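One way to sketch this cost minimisation is with Dijkstra's shortest-path algorithm over data types. The text does not say which search trfGraph actually uses; Dijkstra is a choice made here. The sketch assumes per-node integer costs and any-of inputs only, with tuple bindings ignored. Nodes that are already switched on cost nothing, each route is traced back from the target to the start data, and all names and costs are hypothetical.

```python
import heapq


def cheapest_chain(nodes, start_data, target, active):
    """Dijkstra over data types. Producing data through node N costs N's cost,
    or 0 if N is already active. Returns node names on the cheapest route to
    target, in execution order, or None if target is unreachable."""
    best = {data: 0 for data in start_data}  # cheapest known cost per data type
    via = {}  # data type -> (producing node, input data type it was made from)
    heap = [(0, data) for data in start_data]
    heapq.heapify(heap)
    while heap:
        cost, data = heapq.heappop(heap)
        if cost > best[data]:
            continue  # stale heap entry
        for name, (inputs, outputs, node_cost) in nodes.items():
            if data not in inputs:
                continue
            new_cost = cost + (0 if name in active else node_cost)
            for out in outputs:
                if new_cost < best.get(out, float("inf")):
                    best[out] = new_cost
                    via[out] = (name, data)
                    heapq.heappush(heap, (new_cost, out))
    if target not in best:
        return None
    chain, data = [], target
    while data in via:  # trace the route back towards the start data
        name, data = via[data]
        chain.append(name)
    return chain[::-1]


def plan(nodes, start_data, targets):
    """Switch on nodes for each target in turn (targets must already be in
    topological order). Returns node names in the order they were switched on."""
    active = []
    for target in targets:
        chain = cheapest_chain(nodes, start_data, target, set(active))
        if chain is None:
            raise ValueError(f"no route produces {target}")
        for name in chain:
            if name not in active:
                active.append(name)  # now free for later targets
    return active


costed_nodes = {  # name -> (input types, any one suffices; output types; cost)
    "RAWtoESD": ({"RAW"}, {"ESD", "HIST_ESD"}, 10),
    "ESDtoAOD": ({"ESD"}, {"AOD", "HIST_AOD"}, 3),
    "RAWtoAOD": ({"RAW"}, {"AOD"}, 12),  # hypothetical direct route
}

print(plan(costed_nodes, {"RAW"}, ["AOD"]))         # ['RAWtoAOD']
print(plan(costed_nodes, {"RAW"}, ["ESD", "AOD"]))  # ['RAWtoESD', 'ESDtoAOD']
```

With these costs, asking for `AOD` alone picks the direct `RAWtoAOD` route, since 12 < 10 + 3. Asking for `ESD` first switches `RAWtoESD` on, after which `ESDtoAOD` (cost 3) becomes the cheaper way to `AOD`.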
We represent a dataless edge with the fake input and output data types inNULL and outNULL. These are used by the graph tracer but are then removed from the inputs and outputs of the actual substeps.
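The final clean-up step can be sketched as below. The function name is hypothetical, and the sketch does not show how the tracer wires the placeholders into the graph. Placeholder types are filtered out before a substep sees its data.

```python
NULL_TYPES = {"inNULL", "outNULL"}  # placeholders seen only by the graph tracer


def substep_data(tracer_data):
    """Return the data types a real substep should see, with placeholders removed."""
    return set(tracer_data) - NULL_TYPES


# A hypothetical node that the tracer sees with placeholder input and output
print(substep_data({"inNULL"}))          # set()
print(substep_data({"HIST", "outNULL"}))  # {'HIST'}
```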