ATLAS Offline Software
yodamerge_tmp.py
# Copyright (C) 2002-2022 CERN for the benefit of the ATLAS collaboration

"""\
%(prog)s [-o outfile] <yodafile1>[:<scale1>] <yodafile2>[:<scale2>] ...
  e.g. %(prog)s run1.yoda run2.yoda run3.yoda (unweighted merging of three runs)
  or %(prog)s run1.yoda:2.0 run2.yoda:3.142 (weighted merging of two runs)

Merge analysis objects from multiple YODA files, combining the statistics of
objects whose names are found in multiple files. May be used either to merge
disjoint collections of data objects, or to combine multiple statistically
independent runs of the same data objects into one high-statistics run. Optional
scaling parameters may be given to rescale the weights of the objects on a
per-file basis before merging.
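
A per-file scale factor is given by appending ':scale' to the filename. A
minimal sketch of how such an argument is split (mirroring the parsing done in
the merge loop further down: rsplit on the last colon, so only the trailing
field is read as a float):

```python
# Split a "file:scale" command-line argument, as the merge loop below does
fa = "run1.yoda:2.0"
filename, scale = fa, 1.0
if ":" in fa:
    filename, scale = fa.rsplit(":", 1)
    scale = float(scale)
print(filename, scale)  # run1.yoda 2.0
```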

By default the output is written to stdout since we can't guess what would be
a good automatic filename choice! Use the -o option to provide an output filename.


IMPORTANT!
  This script is not meant to handle all run merging situations or data objects:
  there are limitations to what can be inferred from data objects alone. If you
  need to do something more complex than the common cases handled by this script,
  please write your own script / program to load and process the data objects.

SCATTERS (E.G. HISTOGRAM RATIOS) CAN'T BE MERGED

  Note that 'scatter' data objects, as opposed to histograms, cannot truly be
  merged by this tool, since they do not preserve sufficient statistical
  information. The canonical example of this is a ratio plot: there are
  infinitely many combinations of numerator and denominator which could give the
  same ratio, and the result does not indicate anything about which of those
  infinite inputs is right (or the effects of correlations in the division).

  If you need to merge Scatter2D objects properly, you can write your own Python
  script or C++ program using the YODA interface, and apply whatever
  case-specific treatment is appropriate. The --s1d-mode, --s2d-mode and
  --s3d-mode options control what this script does instead: the default
  'assume_mean' strategy averages the point values assuming equal-size runs,
  while 'first' returns the first copy encountered with no actual merging done.
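
When scatters are combined anyway, the 'assume_mean' strategy used further down
takes a scale-weighted mean of the point values and adds the point errors in
quadrature before dividing by the summed scales. A minimal sketch with assumed
numbers (two equal-size runs, unit user scales):

```python
import math

scales = [1.0, 1.0]  # per-file user scales (assumed)
vals = [10.0, 12.0]  # point values from each input scatter (assumed)
errs = [1.0, 1.0]    # symmetric point errors from each input (assumed)

scalesum = sum(scales)
# Scale-weighted mean of the values
val = sum(s * v for s, v in zip(scales, vals)) / scalesum
# Errors added in quadrature, then divided by the summed scales
err = math.sqrt(sum((s * e) ** 2 for s, e in zip(scales, errs))) / scalesum
print(val, err)  # 11.0 0.7071067811865476
```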

NORMALIZED, UNNORMALIZED, OR A MIX?

  An important detail in histogram merging is whether a statistical treatment
  for normalized or unnormalized histograms should be used: in the former case
  the normalization scaling must be undone *before* the histograms are added
  together, and then re-applied afterwards. This script examines the ScaledBy
  attribute of each histogram to determine whether it has been normalized. We
  assume that if ScaledBy exists (i.e. h.scaleW has been called) then the
  histogram is normalized, and we normalize the resulting merged histogram to
  the weighted average of the input norms; if there is no ScaledBy, we assume
  that the histogram is not normalized.

  This is not an infallible approach, but we believe it is more robust than
  heuristics to determine whether norms are sufficiently close to be considered
  equal. In complicated situations you will again be better off writing your own
  script or program to do the merging: the merging machinery of this script is
  available directly in the yoda Python module.
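
As a concrete illustration of the re-normalization step (a sketch with assumed
numbers, mirroring the rescale factor computed in the merge loop below): each
input i carries a user scale s_i and a ScaledBy factor f_i, and the merged
histogram, after the inputs are unscaled and added, is rescaled by
1 / sum(s_i / f_i):

```python
user_scales = [1.0, 1.0]  # yodamerge_scale per file (assumed)
scaledby = [0.5, 0.25]    # ScaledBy factors recorded by each input (assumed)

# Rescale factor applied to the merged histogram after the inputs
# have been unscaled by their ScaledBy factors and added together
rescale = 1.0 / sum(s / f for s, f in zip(user_scales, scaledby))
print(rescale)  # 0.16666666666666666
```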

MORE NOTES

  If all the input histograms with a particular path are found to have the same
  normalization, and they have ScaledBy attributes indicating that a histogram
  weight scaling has been applied in producing the input histograms, each
  histogram in that group will first be unscaled by its appropriate factor, then
  merged, and then re-normalized to the target value. Otherwise the weights from
  each histogram copy will be directly added together with no attempt to guess an
  appropriate normalization. The normalization guesses (and they are guesses --
  see below) are made *before* application of the per-file scaling arguments.

  IMPORTANT: note from the above that this script can't work out what to do
  re. scaling and normalization of output histograms from the input data files
  alone. It may be possible (although unlikely) that input histograms have the
  same normalization but are meant to be added directly. It may also be the case
  (and much more likely) that histograms which should be normalized to a common
  value will not trigger the appropriate treatment due to e.g. statistical
  fluctuations in each run's calculation of a cross-section used in the
  normalization. And anything more complex than a global scaling (e.g. calculation
  of a ratio or asymmetry) cannot be handled at all with a post-hoc scaling
  treatment. The --assume-normalized command-line option used to force all
  histograms to be treated as if they were normalized in the input, but it is
  now deprecated and has no effect.

  Please use this script as a template if you need to do something more specific.

  NOTE: there are many possible desired behaviours when merging runs, depending on
  the factors above as well as whether the files being merged are of homogeneous
  type, heterogeneous type, or a combination of both. It is tempting, therefore,
  to add a large number of optional command-line parameters to this script, to
  handle these cases. Experience from Rivet 1.x suggests that this is a bad idea:
  if a problem is of programmatic complexity then a command-line interface which
  attempts to solve it in general is doomed to both failure and unusability. Hence
  we will NOT add extra arguments for applying different merging weights or
  strategies based on analysis object path regexes, auto-identifying 'types' of
  run, etc., etc.: if you need to merge data files in such complex ways, please
  use this script as a template around which to write logic that satisfies your
  particular requirements.
"""

import yoda
import argparse
import sys
import math

parser = argparse.ArgumentParser(usage=__doc__)
parser.add_argument("INFILES", nargs="+", help="datafile1 datafile2 [...]")
parser.add_argument("-o", "--output", default="-", dest="OUTPUT_FILE", metavar="PATH",
                    help="write output to specified path")
parser.add_argument("--s1d-mode", "--s1dmode", default="assume_mean", dest="S1D_MODE", metavar="MODE",
                    help="choose strategy for combining Scatter1D objects: one of 'first', 'combine', 'assume_mean', 'add'")
parser.add_argument("--s2d-mode", "--s2dmode", default="assume_mean", dest="S2D_MODE", metavar="MODE",
                    help="choose strategy for combining Scatter2D objects: one of 'first', 'combine', 'assume_mean', 'add'")
parser.add_argument("--s3d-mode", "--s3dmode", default="assume_mean", dest="S3D_MODE", metavar="MODE",
                    help="choose strategy for combining Scatter3D objects: one of 'first', 'combine', 'assume_mean', 'add'")
parser.add_argument("--type-mismatch-mode", default="scatter", dest="TYPE_MISMATCH_MODE", metavar="MODE",
                    help="choose strategy for combining objects whose types mismatch: one of 'first', 'scatter'")
parser.add_argument("--add", "--stack", action="store_true", default=False, dest="STACK",
                    help="force simple stacking (also forces all scatter modes to 'add')")
parser.add_argument("--no-veto-empty", action="store_false", default=True, dest="VETO_EMPTY",
                    help="disable the removal of empty (sumW=0) data objects _before_ applying merge heuristics. You probably want the default!")
parser.add_argument("--assume-normalized", action="store_true", default=False, dest="ASSUME_NORMALIZED",
                    help="DEPRECATED, AND DOES NOTHING. This option _used_ to bypass the detection heuristic for unnormalized histograms")
parser.add_argument("-m", "--match", action="append", dest="PATHPATTERNS", default=[],
                    help="only write out histograms whose $path/$name string matches any of these regexes")
parser.add_argument("-M", "--unmatch", action="append", dest="PATHUNPATTERNS", default=[],
                    help="exclude histograms whose $path/$name string matches any of these regexes")
args = parser.parse_args()


# Include scatters in "add" mode
if args.STACK:
    args.S1D_MODE = "add"
    args.S2D_MODE = "add"
    args.S3D_MODE = "add"

# Put the incoming objects into a dict from each path to a list of histos and scalings
analysisobjects_in = {}
for fa in args.INFILES:
    filename, scale = fa, 1.0
    if ":" in fa:
        try:
            filename, scale = fa.rsplit(":", 1)
            scale = float(scale)
        except ValueError:
            # Not a valid file:scale pair after all: treat the whole argument as a filename
            filename, scale = fa, 1.0
            sys.stderr.write("Error processing arg '%s' with file:scale format\n" % fa)
    aos = yoda.read(filename, patterns=args.PATHPATTERNS, unpatterns=args.PATHUNPATTERNS)
    for aopath, ao in aos.items():
        ao.setAnnotation("yodamerge_scale", scale)
        analysisobjects_in.setdefault(aopath, []).append(ao)

analysisobjects_out = {}
for p, aos in analysisobjects_in.items():

    # Identify the canonical aotype being handled from the type of the first entry in aos
    aotype = type(aos[0])

    # Check that types match, and just output the first one if they don't
    if not all(type(ao) is aotype for ao in aos):
        msg = "WARNING: cannot merge mismatched analysis object types for path %s: " % p
        scatter_fail = False
        if args.TYPE_MISMATCH_MODE == "scatter":
            saos = []
            for ao in aos:
                sao = ao.mkScatter()
                sao.setAnnotation("yodamerge_scale", ao.annotation("yodamerge_scale"))
                saos.append(sao)
            saotype = type(saos[0])
            msg += "converting to %ss" % saotype.__name__
            if all(type(sao) is saotype for sao in saos):
                sys.stderr.write(msg + "\n")
                aos = saos
                aotype = saotype
            else:
                msg += "... failed, "
                scatter_fail = True
        if args.TYPE_MISMATCH_MODE == "first" or scatter_fail:
            sys.stderr.write(msg + "returning first object\n")
            analysisobjects_out[p] = aos[0]
            continue

    # Remove empty fillable data objects, to avoid gotchas where e.g. histos are normalised and hence
    # ScaledBy should be set... but isn't because the emptiness blocked rescaling to finite area
    if args.VETO_EMPTY:
        # TODO: Add a Fillable interface/ABC and use that for the type matching
        if aotype in (yoda.Counter, yoda.Histo1D, yoda.Histo2D, yoda.Profile1D, yoda.Profile2D):
            aos_nonzero = [ao for ao in aos if ao.sumW() != 0]  # < possible that this doesn't mean no fills :-/
            # Just output the first histo if they are all empty
            if not aos_nonzero:
                analysisobjects_out[p] = aos[0]
                continue
            # Reset aos to only contain non-empty ones
            aos = aos_nonzero

    # Counter, Histo and Profile (i.e. Fillable) merging
    # TODO: Add a Fillable interface/ABC and use that for the type matching
    if aotype in (yoda.Counter, yoda.Histo1D, yoda.Histo2D, yoda.Profile1D, yoda.Profile2D):

        # Identify a target rescaling factor from the 1/scalefactor-weighted norms of each run
        rescale = None
        if len(aos) > 1 and args.STACK:
            pass  # we're in dumb stacking mode
        elif all("ScaledBy" in ao.annotations() for ao in aos):
            try:
                rescale = 1.0 / sum(float(ao.annotation("yodamerge_scale")) / float(ao.annotation("ScaledBy")) for ao in aos)
            except ZeroDivisionError:
                sys.stderr.write("WARNING: Abandoning normalized merge of path %s because ScaledBy attributes are zero.\n" % p)
        elif all("ScaledBy" not in ao.annotations() for ao in aos):
            pass
        else:
            sys.stderr.write("WARNING: Abandoning normalized merge of path %s because some but not all inputs have ScaledBy attributes\n" % p)

        # Now that the normalization-identifying heuristic is done, apply user scalings and undo the normalization scaling if appropriate
        for ao in aos:
            if rescale:
                ao.scaleW(1.0 / float(ao.annotation("ScaledBy")))
            ao.scaleW(float(ao.annotation("yodamerge_scale")))

        # Make a copy of the (scaled & unnormalized) first object as the basis for the output
        # and merge for histograms (including weights, normalization, and user scaling)
        ao_out = aos[0].clone()
        ao_out.rmAnnotation("yodamerge_scale")
        for ao in aos[1:]:
            ao_out += ao
        if rescale:
            ao_out.scaleW(rescale)

    # Merge for Scatters, assuming equal run sizes, and applying user scaling
    else:

        # Make a copy of the first object as the basis for merging (suitable for all Scatter types)
        ao_out = aos[0].clone()
        ao_out.rmAnnotation("yodamerge_scale")
        # If there's only one object, there's no need to do any combining
        if len(aos) == 1:
            pass

        elif aotype in (yoda.Scatter1D, yoda.Scatter2D, yoda.Scatter3D):

            # Retrieve dimensionality of the Scatter*D object
            dim = ao_out.dim()
            SND_MODE = getattr(args, "S%dD_MODE" % dim)
            axis = ['', 'x', 'y', 'z']

            # Use asymptotic mean+stderr convergence statistics
            if SND_MODE in ("assume_mean", "add"):

                msg = "WARNING: Scatter%dD %s merge assumes asymptotic statistics and equal run sizes" % (dim, p)
                if any(float(ao.annotation("yodamerge_scale")) != 1.0 for ao in aos):
                    msg += " (+ user scaling)"
                sys.stderr.write(msg + "\n")
                npoints = len(ao_out.points())
                for i in range(npoints):
                    val_i = scalesum = 0.0
                    ep_i = {}  # will hold the values of the multiple error sources
                    em_i = {}  # will hold the values of the multiple error sources
                    for ao in aos:
                        scale = float(ao.annotation("yodamerge_scale"))
                        variations = ao.variations()
                        scalesum += scale
                        val_i += scale * ao.point(i).val(dim)
                        for var in variations:
                            if var not in ep_i.keys():
                                ep_i[var] = 0.
                                em_i[var] = 0.
                            ep_i[var] += (scale * ao.point(i).errs(dim, var)[0]) ** 2
                            em_i[var] += (scale * ao.point(i).errs(dim, var)[1]) ** 2
                    for var in ep_i.keys():
                        ep_i[var] = math.sqrt(ep_i[var])
                        em_i[var] = math.sqrt(em_i[var])
                    if SND_MODE == "assume_mean":
                        val_i /= scalesum
                        for var in ep_i.keys():
                            ep_i[var] /= scalesum
                            em_i[var] /= scalesum
                    ao_out.point(i).setVal(dim, val_i)
                    for var in ep_i.keys():
                        # setattr(ao_out.point(i), 'set%sErrs' % axis[dim].upper(), ((ep_i[var], em_i[var]), var))
                        ao_out.point(i).setErrs(dim, (ep_i[var], em_i[var]), var)

            # Add more points to the output scatter
            elif SND_MODE == "combine":
                for ao in aos[1:]:
                    ao_out.combineWith(ao)

            # Just return the first AO unmodified & unmerged
            elif SND_MODE == "first":
                pass

            else:
                raise Exception("Unknown Scatter%dD merging mode: %s" % (dim, SND_MODE))

        # Other data types (just warn, and write out the first object)
        else:
            sys.stderr.write("WARNING: Analysis object %s of type %s cannot be merged\n" % (p, str(aotype)))

    # Put the output AO into the output dict
    analysisobjects_out[p] = ao_out

# Write output
yoda.writeYODA(analysisobjects_out, args.OUTPUT_FILE)