ATLAS Offline Software
H5Utils Namespace Reference

HDF5 Tuple Writer.

Namespaces

 defaults
 
 hist
 
 internal
 classes to add type traits for H5
 
 Print
 

Classes

struct  AppOpts
 
class  Consumers
 
class  DefaultMerger
 
class  IH5Merger
 
struct  IOOpts
 
struct  TreeCopyOpts
 
class  VariableFillers
 Variable filler arrays.
 
class  Writer
 Writer.
 
struct  WriterConfiguration
 
class  WriterXd
 WriterXd.
 

Typedefs

template<typename T >
using CRefConsumer = Consumers< const T & >
 CRefConsumer.
 
template<size_t N, typename T >
using CRefWriter = Writer< N, const T & >
 CRefWriter.
 
template<typename T >
using SimpleWriter = Writer< 0, const T & >
 SimpleWriter.
 

Enumerations

enum  Compression { Compression::STANDARD, Compression::HALF_PRECISION, Compression::HALF_PRECISION_LARGE }
 

Functions

bool checkDatasetsToMerge (const H5::DataSet &target, const H5::DataSet &source, hsize_t mergeAxis)
 Make sure that two datasets can be merged.
 
bool checkDatasetsToMerge (const H5::DataSet &target, const H5::DataSet &source, hsize_t mergeAxis, std::string &errMsg)
 Make sure that two datasets can be merged.
 
void mergeDatasets (H5::DataSet &target, const H5::DataSet &source, hsize_t mergeAxis, std::size_t bufferSize=-1)
 Merge two datasets.
 
H5::DataSet createDataSet (H5::H5Location &targetLocation, const H5::DataSet &source, hsize_t mergeAxis, int chunkSize=-1, int mergeExtent=-1)
 Make a new dataset using the properties of another.
 
std::size_t getRowSize (const H5::DataSet &ds, hsize_t axis)
 Calculate the size of a row of a dataset in bytes.
 
template<size_t N, class I >
Writer< N, I > makeWriter (H5::Group &group, const std::string &name, const Consumers< I > &consumers, const std::array< hsize_t, N > &extent=internal::uniform< N >(5), hsize_t batch_size=defaults::batch_size)
 makeWriter
 
void copyRootTree (TTree &tt, H5::Group &fg, const TreeCopyOpts &opts)
 
std::string getTree (const std::string &file_name)
 
AppOpts getTreeCopyOpts (int argc, char *argv[])
 

Variables

const size_t CHUNK_SIZE = 128
 

Detailed Description

HDF5 Tuple Writer.

This is a tool to write N-dimensional arrays of compound data types to HDF5 files.

Skip down to the Writer and Consumers classes (or the WriterXd and VariableFillers classes) below to see the parts that you'll have to interact with.

Typedef Documentation

◆ CRefConsumer

template<typename T >
using H5Utils::CRefConsumer = Consumers<const T&>

CRefConsumer.

Convenience wrapper, CRefConsumer<T> is equivalent to H5Utils::Consumers<const T&>.

Definition at line 549 of file Writer.h.

◆ CRefWriter

template<size_t N, typename T >
using H5Utils::CRefWriter = Writer<N, const T&>

CRefWriter.

Convenience wrapper, CRefWriter<N,T> is equivalent to H5Utils::Writer<N, const T&>.

Definition at line 559 of file Writer.h.

◆ SharedConsumer

template<typename I >
using H5Utils::SharedConsumer = std::shared_ptr<internal::IDataConsumer<I> >

Consumer Class.

The elements added to this container each specify one element in the output HDF5 DataSet. You need to give each variable a name and a function that fills the variable.

Definition at line 126 of file Writer.h.
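
The "name plus filling function" idea can be pictured with a stand-alone sketch. It does not use the H5Utils API: `MiniConsumers` and `Jet` are hypothetical names invented for illustration, and every variable is assumed to be a `double` to keep things short.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the idea behind a consumer list: each entry
// pairs an output variable name with a function that extracts that
// variable from an input object of type I.
template <typename I>
class MiniConsumers {
public:
  using Filler = std::function<double(I)>;

  // Register one output variable.
  void add(const std::string& name, Filler filler) {
    m_entries.emplace_back(name, std::move(filler));
  }

  // Evaluate every filler on one input, giving one "row" of output.
  std::vector<std::pair<std::string, double>> fill(I input) const {
    std::vector<std::pair<std::string, double>> row;
    for (const auto& entry : m_entries) {
      row.emplace_back(entry.first, entry.second(input));
    }
    return row;
  }

private:
  std::vector<std::pair<std::string, Filler>> m_entries;
};

// Hypothetical input type used only for this illustration.
struct Jet {
  double pt;
  double eta;
};
```

With `MiniConsumers<const Jet&>`, calling `add("pt", [](const Jet& j) { return j.pt; })` registers one output variable, and `fill(jet)` evaluates all registered functions on a single input.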

◆ SimpleWriter

template<typename T >
using H5Utils::SimpleWriter = Writer<0, const T&>

SimpleWriter.

Convenience wrapper, SimpleWriter<T> is equivalent to H5Utils::Writer<0, const T&>.

Definition at line 569 of file Writer.h.

Enumeration Type Documentation

◆ Compression

enum class H5Utils::Compression

Enumerator
    STANDARD
    HALF_PRECISION
    HALF_PRECISION_LARGE

Definition at line 11 of file CompressionEnums.h.

Function Documentation

◆ checkDatasetsToMerge() [1/2]

bool H5Utils::checkDatasetsToMerge ( const H5::DataSet &  target,
const H5::DataSet &  source,
hsize_t  mergeAxis 
)

Make sure that two datasets can be merged.

Parameters
    target     The dataset to merge into
    source     The dataset to merge from
    mergeAxis  The axis to merge along
Returns
    True if the datasets can be merged, false otherwise

Definition at line 53 of file MergeUtils.cxx.

57  {
58  std::string sink;
59  return checkDatasetsToMerge(target, source, mergeAxis, sink);
60  }

◆ checkDatasetsToMerge() [2/2]

bool H5Utils::checkDatasetsToMerge ( const H5::DataSet &  target,
const H5::DataSet &  source,
hsize_t  mergeAxis,
std::string &  errMsg 
)

Make sure that two datasets can be merged.

Parameters
    target        The dataset to merge into
    source        The dataset to merge from
    mergeAxis     The axis to merge along
    errMsg [out]  If the datasets cannot be merged, this string is filled with an explanation
Returns
    True if the datasets can be merged, false otherwise

Definition at line 62 of file MergeUtils.cxx.

67  {
68  // Check that the datasets hold the same types
69  // Note that H5 *can* do type comparisons but this function assumes that we
70  // should only merge the same types
71  if (target.getDataType() != source.getDataType() ) {
72  errMsg = "Target and source datasets hold different types.";
73  return false;
74  }
75 
76  // Get the dataspaces
77  H5::DataSpace targetSpace = target.getSpace();
78  H5::DataSpace sourceSpace = source.getSpace();
79  if (!targetSpace.isSimple() || !sourceSpace.isSimple() ) {
80  errMsg = "Only simple dataspaces are understood.";
81  return false;
82  }
83 
84  // Make sure that the dataspaces have the same dimensions
85  int nDims = targetSpace.getSimpleExtentNdims();
86  if (nDims != sourceSpace.getSimpleExtentNdims() ) {
87  errMsg = "Target and source dataspaces have different dimensions, " +
88  std::to_string(nDims) + " and " +
89  std::to_string(sourceSpace.getSimpleExtentNdims() ) + " respectively";
90  return false;
91  }
92 
93  // Make sure that the merge axis fits in the dimension
94  if (nDims <= static_cast<int>(mergeAxis)) {
95  errMsg = "Dataset dimension " + std::to_string(nDims) +
96  " is not compatible with the merge axis " +
97  std::to_string(mergeAxis);
98  return false;
99  }
100 
101  // Now make sure that the extent matches
102  std::vector<hsize_t> targetDims(nDims, 0);
103  std::vector<hsize_t> maxTargetDims(nDims, 0);
104  targetSpace.getSimpleExtentDims(targetDims.data(), maxTargetDims.data() );
105  std::vector<hsize_t> sourceDims(nDims, 0);
106  sourceSpace.getSimpleExtentDims(sourceDims.data() );
107 
108  for (int ii = 0; ii < nDims; ++ii) {
109  // Skip the merge axis in this check
110  if (ii == static_cast<int>(mergeAxis) )
111  continue;
112  if (targetDims.at(ii) != sourceDims.at(ii) ) {
113  errMsg = "Target and source databases dimensions differ on axis " +
114  std::to_string(ii) + ", " + std::to_string(targetDims.at(ii) ) +
115  " and " + std::to_string(sourceDims.at(ii) ) + " respectively";
116  return false;
117  }
118  }
119 
120  // Check the maximum extent is sufficient
121  if (maxTargetDims.at(mergeAxis) < (
122  targetDims.at(mergeAxis) + sourceDims.at(mergeAxis) ) ) {
123  errMsg = "Merged dataset will not fit into target dataset";
124  return false;
125  }
126 
127  return true;
128  } //> end function checkDatasetsToMerge
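
The extent checks in the listing above can be tried without HDF5. The following is a minimal sketch that covers only the shape logic (the type and simple-dataspace checks are left out); `Shape` and `canMergeShapes` are hypothetical names, and plain vectors stand in for the dataspace extents.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a dataspace extent.
using Shape = std::vector<unsigned long long>;

// Shape-only version of the merge check: same rank, valid merge axis,
// equal extents on all other axes, and enough room in the target.
bool canMergeShapes(const Shape& target, const Shape& maxTarget,
                    const Shape& source, std::size_t mergeAxis,
                    std::string& errMsg) {
  if (target.size() != source.size() || target.size() != maxTarget.size()) {
    errMsg = "different number of dimensions";
    return false;
  }
  if (mergeAxis >= target.size()) {
    errMsg = "merge axis out of range";
    return false;
  }
  for (std::size_t i = 0; i < target.size(); ++i) {
    if (i == mergeAxis) continue;  // extents may differ along the merge axis
    if (target[i] != source[i]) {
      errMsg = "extent differs on axis " + std::to_string(i);
      return false;
    }
  }
  // Overflow of the sum is ignored in this sketch.
  if (maxTarget[mergeAxis] < target[mergeAxis] + source[mergeAxis]) {
    errMsg = "merged dataset will not fit into target";
    return false;
  }
  return true;
}
```

For example, a target of shape {10, 3} with maximum {100, 3} accepts a source of shape {5, 3} along axis 0, but rejects {5, 4} because axis 1 differs.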

◆ copyRootTree()

void H5Utils::copyRootTree ( TTree &  tt,
H5::Group &  fg,
const TreeCopyOpts &  opts 
)

Definition at line 125 of file copyRootTree.cxx.

125  {
126 
127  // define the buffers for root to read into
128  std::vector<std::unique_ptr<IBuffer> > buffers;
129 
130  // this keeps track of the things we couldn't read
131  std::set<std::string> skipped;
132 
133 
134  // Each `VariableFiller` must be constructed with a "filler"
135  // function (or callable object), which takes no arguments and
136  // returns the variable we want to write out. In this case they are
137  // implemented as closures over the buffers that ROOT is reading
138  // into.
139 
140  // This is the 1d variables
141  VariableFillers vars;
142  std::vector<size_t> idx_dummy;
143 
144  // These are 2d variables (i.e. vector<T> in the root file)
145  //
146  // We also need an index which the HDF5 writer increments as it
147  // fills. This is shared with the ROOT buffers to index entries in
148  // std::vectors
149  VariableFillers vars2d;
150  std::vector<size_t> idx(1,0);
151 
152  // 3d variables (index is now 2d)
153  VariableFillers vars3d;
154  std::vector<size_t> idx2(2,0);
155 
156  // Iterate over all the leaf names. There are some duplicates in the
157  // list of keys, so we have to build the set ourselves.
158  std::regex branch_filter(opts.branch_regex);
159  TIter next(tt.GetListOfLeaves());
160  TLeaf* leaf;
161  std::set<std::string> leaf_names;
162  while ((leaf = dynamic_cast<TLeaf*>(next()))) {
163  leaf_names.insert(leaf->GetName());
164  }
165  if (leaf_names.size() == 0) throw std::logic_error("no branches found");
166 
167  // Loop over all the leafs, assign buffers to each
168  //
169  // These `Buffer` classes are defined above. The buffers turn the
170  // branchs on, so we can set them all off to start.
171  tt.SetBranchStatus("*", false);
172  for (const auto& lname: leaf_names) {
173  bool keep = true;
174  if (opts.branch_regex.size() > 0) {
175  keep = std::regex_search(lname, branch_filter);
176  }
177  if (opts.verbose) {
178  std::cout << (keep ? "found " : "rejecting ") << lname << std::endl;
179  }
180  if (!keep) continue;
181 
182  leaf = tt.GetLeaf(lname.c_str());
183  std::string branchName = leaf->GetBranch()->GetName();
184  std::string leaf_type = leaf->GetTypeName();
185  if (leaf_type == "Int_t") {
186  buffers.emplace_back(new Buffer<int>(vars, tt, branchName));
187  } else if (leaf_type == "Float_t") {
188  buffers.emplace_back(new Buffer<float>(vars, tt, branchName));
189  } else if (leaf_type == "Double_t") {
190  buffers.emplace_back(new Buffer<double>(vars, tt, branchName));
191  } else if (leaf_type == "Bool_t") {
192  buffers.emplace_back(new Buffer<bool>(vars, tt, branchName));
193  } else if (leaf_type == "Long64_t") {
194  buffers.emplace_back(new Buffer<long long>(vars, tt, branchName));
195  } else if (leaf_type == "ULong64_t") {
196  buffers.emplace_back(new Buffer<unsigned long long>(vars, tt, branchName));
197  } else if (leaf_type == "UInt_t") {
198  buffers.emplace_back(new Buffer<unsigned int>(vars, tt, branchName));
199  } else if (leaf_type == "UChar_t") {
200  buffers.emplace_back(new Buffer<unsigned char>(vars, tt, branchName));
201  } else if (leaf_type == "vector<float>") {
202  buffers.emplace_back(new VBuf<float>(vars2d, idx, tt, branchName, NAN));
203  } else if (leaf_type == "vector<double>") {
204  buffers.emplace_back(new VBuf<double>(vars2d, idx, tt, branchName, NAN));
205  } else if (leaf_type == "vector<int>") {
206  buffers.emplace_back(new VBuf<int>(vars2d, idx, tt, branchName, 0));
207  } else if (leaf_type == "vector<unsigned int>") {
208  buffers.emplace_back(new VBuf<unsigned int>(vars2d, idx, tt, branchName, 0));
209  } else if (leaf_type == "vector<unsigned char>") {
210  buffers.emplace_back(new VBuf<unsigned char>(vars2d, idx, tt, branchName, 0));
211  } else if (leaf_type == "vector<bool>") {
212  buffers.emplace_back(new VBuf<bool>(vars2d, idx, tt, branchName, 0));
213  } else if (leaf_type == "vector<vector<int> >") {
214  buffers.emplace_back(new VVBuf<int>(vars3d, idx2, tt, branchName, 0));
215  } else if (leaf_type == "vector<vector<unsigned int> >") {
216  buffers.emplace_back(new VVBuf<unsigned int>(vars3d, idx2, tt, branchName, 0));
217  } else if (leaf_type == "vector<vector<unsigned char> >") {
218  buffers.emplace_back(new VVBuf<unsigned char>(vars3d, idx2, tt, branchName, 0));
219  } else if (leaf_type == "vector<vector<float> >") {
220  buffers.emplace_back(new VVBuf<float>(vars3d, idx2, tt, branchName, NAN));
221  } else if (leaf_type == "vector<vector<double> >") {
222  buffers.emplace_back(new VVBuf<double>(vars3d, idx2, tt, branchName, NAN));
223  } else if (leaf_type == "vector<vector<bool> >") {
224  buffers.emplace_back(new VVBuf<bool>(vars3d, idx2, tt, branchName, 0));
225  } else {
226  skipped.insert(leaf_type);
227  }
228  }
229 
230  // Build HDF5 Outputs
231  //
232  // In the simple case where we're not reading vectors, we store one
233  // dataset with the same name as the tree. If there are vectors, we
234  // instead create a group with the same name as the tree, and name
235  // the datasets 1d, 2d, etc.
236  //
237  const std::string tree_name = tt.GetName();
238 
239  std::unique_ptr<WriterXd> writer1d;
240  std::unique_ptr<WriterXd> writer2d;
241  std::unique_ptr<WriterXd> writer3d;
242  std::unique_ptr<H5::Group> top_group;
243  if (opts.vector_lengths.size() > 0) {
244  if (opts.vector_lengths.size() > 2) throw std::logic_error(
245  "we don't support outputs with rank > 3");
246  size_t length = opts.vector_lengths.at(0);
247  top_group.reset(new H5::Group(fg.createGroup(tree_name)));
248  if (opts.vector_lengths.size() > 1) {
249  size_t length2 = opts.vector_lengths.at(1);
250  if (vars3d.size() > 0) {
251  writer3d.reset(new WriterXd(*top_group, "3d", vars3d,
252  {length, length2}, opts.chunk_size));
253  }
254  }
255  if (vars2d.size() > 0) {
256  writer2d.reset(new WriterXd(*top_group, "2d", vars2d,
257  {length}, opts.chunk_size));
258  }
259  if (vars.size() > 0) {
260  writer1d.reset(new WriterXd(*top_group, "1d",
261  vars, {}, opts.chunk_size));
262  }
263  } else {
264  if (vars.size() > 0) {
265  writer1d.reset(new WriterXd(fg, tree_name, vars, {}, opts.chunk_size));
266  }
267  }
268 
269  // Main event loop
270  //
271  // Very little actually happens here since the buffers are already
272  // defined, as are the HDF5 reader functions.
273  //
274 
275  // Get the selection string and build a new TTreeFormula
276  std::string cut_string = opts.selection;
277  const char * cut_char = cut_string.c_str();
278  TTreeFormula *cut =0;
279  if(!cut_string.empty()){
280  // This is so a cut can be applied without requiring the
281  // branch to be output to the hdf5 file.
282  tt.SetBranchStatus("*", true);
283  cut = new TTreeFormula("selection", cut_char, &tt);
284  }
285 
286  size_t n_entries = tt.GetEntries();
287  if (opts.n_entries) n_entries = std::min(n_entries, opts.n_entries);
288  int print_interval = opts.print_interval;
289  if (print_interval == -1) {
290  print_interval = std::max(1UL, n_entries / 100);
291  }
292 
293  for (size_t iii = 0; iii < n_entries; iii++) {
294  if (print_interval && (iii % print_interval == 0)) {
295  std::cout << "events processed: " << iii
296  << " (" << std::round(iii*1e2 / n_entries) << "% of "
297  << n_entries << ")" << std::endl;
298  }
299  tt.GetEntry(iii);
300  if(cut) cut->UpdateFormulaLeaves();
301  if (!passTTreeCut(cut)) continue;
302  if (writer1d) writer1d->fillWhileIncrementing(idx_dummy);
303  if (writer2d) writer2d->fillWhileIncrementing(idx);
304  if (writer3d) writer3d->fillWhileIncrementing(idx2);
305  }
306 
307  // Flush the memory buffers on the HDF5 side. (This is done by the
308  // destructor automatically, but we do it here to make any errors
309  // more explicit.)
310  if (writer1d) writer1d->flush();
311  if (writer2d) writer2d->flush();
312  if (writer3d) writer3d->flush();
313 
314  // Print the names of any classes that we were't able to read from
315  // the root file.
316  if (opts.verbose) {
317  for (const auto& name: skipped) {
318  std::cerr << "could not read branch of type " << name << std::endl;
319  }
320  }
321  } // end copyRootTree
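
The branch filtering step in the listing above relies on `std::regex_search`, which matches anywhere in the leaf name rather than requiring a full match. A minimal stand-alone sketch of that step follows; `selectLeaves` is a hypothetical helper, not part of H5Utils.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper mirroring the filtering step: an empty pattern
// keeps every leaf, otherwise std::regex_search decides.
std::vector<std::string> selectLeaves(const std::set<std::string>& leafNames,
                                      const std::string& branchRegex) {
  std::vector<std::string> kept;
  const std::regex filter(branchRegex);
  for (const auto& name : leafNames) {
    if (branchRegex.empty() || std::regex_search(name, filter)) {
      kept.push_back(name);
    }
  }
  return kept;
}
```

Because the match is a search, a pattern such as `pt` keeps both `jet_pt` and `muon_pt`; anchoring with `^jet_` restricts the selection to names that start with `jet_`.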

◆ createDataSet()

H5::DataSet H5Utils::createDataSet ( H5::H5Location &  targetLocation,
const H5::DataSet &  source,
hsize_t  mergeAxis,
int  chunkSize = -1,
int  mergeExtent = -1 
)

Make a new dataset using the properties of another.

Parameters
    targetLocation  The location to place the new dataset
    source          The dataset to create from
    mergeAxis       The axis to merge along
    chunkSize       The chunk size to use along the merge axis. If not positive, the chunking from the source is used.
    mergeExtent     The maximum extent to allow along the merge axis. -1 means unlimited.

This will not merge the source dataset into the new one!

Definition at line 222 of file MergeUtils.cxx.

228  {
229  H5::DataSpace sourceSpace = source.getSpace();
230  // Get the new extent
231  std::vector<hsize_t> DSExtent(sourceSpace.getSimpleExtentNdims(), 0);
232  sourceSpace.getSimpleExtentDims(DSExtent.data() );
233  // Set the merge axis to be 0 length to begin with
234  DSExtent.at(mergeAxis) = 0;
235  std::vector<hsize_t> maxDSExtent = DSExtent;
236  maxDSExtent.at(mergeAxis) = mergeExtent;
237 
238  // Get the existing dataset creation properties
239  H5::DSetCreatPropList cList = source.getCreatePlist();
240  if (chunkSize > 0) {
241  std::vector<hsize_t> chunks = DSExtent;
242  chunks.at(mergeAxis) = chunkSize;
243  cList.setChunk(chunks.size(), chunks.data() );
244  }
245 
246  // Create the new space
247  H5::DataSpace space(DSExtent.size(), DSExtent.data(), maxDSExtent.data());
248  // This does nothing with the acc property list because I don't know
249  // what it is
250  return targetLocation.createDataSet(
251  source.getObjName(), source.getDataType(), space, cList);
252  }
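
The way the listing above derives the initial and maximum extents can be sketched with plain vectors. `Extent`, `kUnlimited`, and `initialExtents` are hypothetical names. The sketch assumes that the "unlimited" marker is the largest unsigned value, which is what converting -1 to an unsigned type yields.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dataspace extent.
using Extent = std::vector<unsigned long long>;

// Stand-in for the "unlimited" marker (assumption: the largest value).
constexpr unsigned long long kUnlimited =
    std::numeric_limits<unsigned long long>::max();

// Returns {initial extent, maximum extent} for a new, empty dataset
// that copies the source shape on every axis except the merge axis.
std::pair<Extent, Extent> initialExtents(const Extent& source,
                                         std::size_t mergeAxis,
                                         long long mergeExtent = -1) {
  Extent current = source;
  current.at(mergeAxis) = 0;  // start empty along the merge axis
  Extent maximum = current;
  maximum.at(mergeAxis) = mergeExtent < 0
      ? kUnlimited
      : static_cast<unsigned long long>(mergeExtent);
  return {current, maximum};
}
```

A source of shape {50, 4, 3} merged along axis 0 therefore starts as {0, 4, 3} and may grow without limit along axis 0.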

◆ getRowSize()

std::size_t H5Utils::getRowSize ( const H5::DataSet &  ds,
hsize_t  axis 
)

Calculate the size of a row of a dataset in bytes.

Parameters
    ds    The dataset to use
    axis  The axis that the row is orthogonal to

A row is the hyperplane orthogonal to the axis. This will throw an overflow error if the row size overflows a std::size_t. This is rather unlikely because that means that there wouldn't be enough memory addresses to hold a single row in memory!

Definition at line 254 of file MergeUtils.cxx.

254  {
255  // The size of one element
256  std::size_t eleSize = ds.getDataType().getSize();
257 
258  // The dimensions of the space
259  H5::DataSpace space = ds.getSpace();
260  std::vector<hsize_t> spaceDims(space.getSimpleExtentNdims(), 0);
261  space.getSimpleExtentDims(spaceDims.data() );
262 
263  std::size_t nRowElements = 1;
264  for (std::size_t ii = 0; ii < spaceDims.size(); ++ii)
265  if (ii != axis)
266  nRowElements *= spaceDims.at(ii);
267 
268  // Double check that this fits. This is probably over cautious but fine...
269  if (std::size_t(-1) / nRowElements < eleSize)
270  throw std::overflow_error("The size of one row would overflow the register!");
271 
272  return eleSize * nRowElements;
273  }
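
The row-size arithmetic can be reproduced without HDF5. The sketch below is a hypothetical stand-alone variant, not the library function: `rowSizeBytes` takes the dimensions and element size directly. It checks for overflow before each multiplication, and it treats a zero-length dimension as a zero-size row. The listing above instead divides by the element count, which would need a separate guard in that case.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical stand-alone version of the row-size calculation:
// element size times the product of all extents except `axis`.
std::size_t rowSizeBytes(const std::vector<std::size_t>& dims,
                         std::size_t axis, std::size_t elementSize) {
  constexpr std::size_t maxSize = std::numeric_limits<std::size_t>::max();
  std::size_t total = elementSize;
  for (std::size_t i = 0; i < dims.size(); ++i) {
    if (i == axis) continue;
    // Check before multiplying so the product never wraps around;
    // a zero-length axis simply gives a zero-size row.
    if (dims[i] != 0 && total > maxSize / dims[i]) {
      throw std::overflow_error("row size would overflow std::size_t");
    }
    total *= dims[i];
  }
  return total;
}
```

For a dataset of shape {100, 4, 3} holding 8-byte elements, a row orthogonal to axis 0 occupies 4 * 3 * 8 = 96 bytes.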

◆ getTree()

std::string H5Utils::getTree ( const std::string &  file_name)

Definition at line 36 of file getTree.cxx.

36  {
37  if (!exists(file_name) && !is_remote(file_name)) {
38  throw std::logic_error(file_name + " doesn't exist");
39  }
40  std::unique_ptr<TFile> file(TFile::Open(file_name.c_str()));
41  if (!file || !file->IsOpen() || file->IsZombie()) {
42  throw std::logic_error("can't open " + file_name);
43  }
44  std::set<std::string> keys;
45  int n_keys = file->GetListOfKeys()->GetSize();
46  if (n_keys == 0) {
47  throw std::logic_error("no keys found in file");
48  }
49  for (int keyn = 0; keyn < n_keys; keyn++) {
50  keys.insert(file->GetListOfKeys()->At(keyn)->GetName());
51  }
52  size_t n_unique = keys.size();
53  if (n_unique > 1) {
54  std::string prob = "Can't decide which tree to use, choose one of {";
55  size_t uniq_n = 0;
56  for (const auto& key: keys) {
57  prob.append(key);
58  uniq_n++;
59  if (uniq_n < n_unique) prob.append(", ");
60  }
61  prob.append("} with the --tree-name option");
62  throw std::logic_error(prob);
63  }
64  auto* key = dynamic_cast<TKey*>(file->GetListOfKeys()->At(0));
65  std::string name = key->GetName();
66  file->Close();
67  return name;
68  }
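
The key-selection logic in the listing above (collect the distinct key names, then accept only an unambiguous result) can be sketched on a plain list of names. `pickUniqueName` is a hypothetical helper that reuses the error messages from the listing; no ROOT classes are involved.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the only distinct name, or throw with a
// message that lists the candidates.
std::string pickUniqueName(const std::vector<std::string>& keyNames) {
  if (keyNames.empty()) throw std::logic_error("no keys found in file");
  // The set removes repeated names and sorts the remainder.
  const std::set<std::string> unique(keyNames.begin(), keyNames.end());
  if (unique.size() > 1) {
    std::string prob = "Can't decide which tree to use, choose one of {";
    std::size_t n = 0;
    for (const auto& key : unique) {
      prob += key;
      if (++n < unique.size()) prob += ", ";
    }
    prob += "} with the --tree-name option";
    throw std::logic_error(prob);
  }
  return *unique.begin();
}
```

Repeated entries with the same name collapse to one candidate, so a list such as {"tree", "tree"} resolves to "tree", while {"a", "b"} raises the ambiguity error.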

◆ getTreeCopyOpts()

AppOpts H5Utils::getTreeCopyOpts ( int  argc,
char *  argv[] 
)

Definition at line 12 of file treeCopyOpts.cxx.

13  {
14  namespace po = boost::program_options;
15  AppOpts app;
16  std::string usage = "usage: " + std::string(argv[0]) + " <files>..."
17  + " -o <output> [-h] [opts...]\n";
18  po::options_description opt(usage + "\nConvert a root tree to HDF5");
19  opt.add_options()
20  ("in-file",
21  po::value(&app.file.in)->required()->multitoken(),
22  "input file name")
23  ("out-file,o",
24  po::value(&app.file.out)->required(),
25  "output file name")
26  ("tree-name,t",
27  po::value(&app.file.tree)->default_value("", "found"),
28  "tree to use, use whatever is there by default (or crash if multiple)")
29  ("help,h", "Print help messages")
30  ("branch-regex,r",
31  po::value(&app.tree.branch_regex)->default_value(""),
32  "regex to filter branches")
33  ("vector-lengths,l",
34  po::value(&app.tree.vector_lengths)->multitoken()->value_name("args..."),
35  "max size of vectors to write")
36  ("verbose,v",
37  po::bool_switch(&app.tree.verbose),
38  "print branches copied")
39  ("n-entries,n",
40  po::value(&app.tree.n_entries)->default_value(0, "all")->implicit_value(1),
41  "number of entries to copy")
42  ("chunk-size,c",
43  po::value(&app.tree.chunk_size)->default_value(CHUNK_SIZE),
44  "chunk size in HDF5 file")
45  ("selection,s",
46  po::value(&app.tree.selection)->default_value(""),
47  "selection string applied to ntuples")
48  ("print-interval,p",
49  po::value(&app.tree.print_interval)->default_value(0, "never")->implicit_value(-1, "1%"),
50  "print progress")
51 
52  ;
53  po::positional_options_description pos_opts;
54  pos_opts.add("in-file", -1);
55 
56  po::variables_map vm;
57  try {
58  po::store(po::command_line_parser(argc, argv).options(opt)
59  .positional(pos_opts).run(), vm);
60  if ( vm.count("help") ) {
61  std::cout << opt << std::endl;
62  app.exit_code = 1;
63  }
64  po::notify(vm);
65  } catch (po::error& err) {
66  std::cerr << usage << "ERROR: " << err.what() << std::endl;
67  app.exit_code = 1;
68  }
69  return app;
70  }

◆ makeWriter()

template<size_t N, class I >
Writer<N,I> H5Utils::makeWriter ( H5::Group &  group,
const std::string &  name,
const Consumers< I > &  consumers,
const std::array< hsize_t, N > &  extent = internal::uniform<N>(5),
hsize_t  batch_size = defaults::batch_size 
)

makeWriter

Convenience function to make a writer from an existing list of Consumers. Allows you to deduce the input type from consumers.

To be used like

auto writer = H5Utils::makeWriter<2>(group, name, consumers);

Definition at line 534 of file Writer.h.

538  {
539  return Writer<N,I>(group, name, consumers, extent, batch_size);
540  }

◆ mergeDatasets()

void H5Utils::mergeDatasets ( H5::DataSet &  target,
const H5::DataSet &  source,
hsize_t  mergeAxis,
std::size_t  bufferSize = -1 
)

Merge two datasets.

Parameters
    target      The dataset to merge into
    source      The dataset to merge from
    mergeAxis   The axis to merge along
    bufferSize  The maximum size of the buffer to use, in bytes. Take care when setting this: if it is too large, the job may run into memory issues.

Note that this does nothing to dataset attributes. This function ignores the chunking of the source and target datasets, only splitting up the source dataset along the merge axis.

Definition at line 130 of file MergeUtils.cxx.

135  {
136  std::string errMsg;
137  if (!checkDatasetsToMerge(target, source, mergeAxis, errMsg) )
138  throw std::invalid_argument(errMsg);
139 
140  // Get information about the target and source datasets
141  H5::DataSpace targetSpace = target.getSpace();
142  H5::DataSpace sourceSpace = source.getSpace();
143  int nDims = targetSpace.getSimpleExtentNdims();
144 
145  // Now make sure that the extent matches
146  std::vector<hsize_t> targetDims(nDims, 0);
147  targetSpace.getSimpleExtentDims(targetDims.data() );
148  std::vector<hsize_t> sourceDims(nDims, 0);
149  sourceSpace.getSimpleExtentDims(sourceDims.data() );
150 
151  // Start by extending the target dataset
152  std::vector<hsize_t> newDims = targetDims;
153  newDims.at(mergeAxis) += sourceDims.at(mergeAxis);
154  target.extend(newDims.data() );
155  targetSpace.setExtentSimple(newDims.size(), newDims.data() );
156 
157  // Now we need to work out how far we need to subdivide the source dataset
158  // to fit it inside the buffer.
159  std::size_t rowSize = getRowSize(source, mergeAxis);
160  // How many rows can we fit into one buffer
161  std::size_t nRowsBuffer = bufferSize / rowSize;
162  if (nRowsBuffer == 0)
163  throw std::invalid_argument(
164  "Allocated buffer is smaller than a single row! Merging is impossible.");
165 
166  // We have to allocate an area in memory for the buffer. Unlike normally in
167  // C++ we aren't allocating a space for an object but a specific size. This
168  // means that we have to use malloc.
169  // Smart pointers require some annoying syntax to use with malloc, but we
170  // can implement the same pattern with a simple struct.
171  SmartMalloc buffer;
172 
173  // Keep track of the offset from the target dataset
174  std::vector<hsize_t> targetOffset(nDims, 0);
175  // Start it from its end point before we extended it
176  targetOffset.at(mergeAxis) = targetDims.at(mergeAxis);
177 
178  // Step through the source dataset in increments equal to the number of
179  // source rows that can fit into the buffer.
180  std::size_t nSourceRows = sourceDims.at(mergeAxis);
181  for (std::size_t iRow = 0; iRow < nSourceRows; iRow += nRowsBuffer) {
182  // Construct the size and offset of the source slab
183  std::vector<hsize_t> sourceOffset(nDims, 0);
184  sourceOffset.at(mergeAxis) = iRow;
185  // The number of rows to write
186  std::size_t nRowsToWrite = std::min(nSourceRows-iRow, nRowsBuffer);
187  std::vector<hsize_t> sourceSize(sourceDims);
188  sourceSize.at(mergeAxis) = nRowsToWrite;
189  // Create the source hyperslab
190  sourceSpace.selectNone();
191  sourceSpace.selectHyperslab(
192  H5S_SELECT_SET,
193  sourceSize.data(),
194  sourceOffset.data() );
195 
196  // Create the target hyperslab
197  targetSpace.selectNone();
198  targetSpace.selectHyperslab(
199  H5S_SELECT_SET,
200  sourceSize.data(),
201  targetOffset.data() );
202 
203  H5::DataSpace memorySpace(sourceSize.size(), sourceSize.data() );
204  memorySpace.selectAll();
205 
206  // Prepare the buffer
207  buffer.allocate(nRowsToWrite*rowSize);
208  // Read into it
209  source.read(buffer.data, source.getDataType(), memorySpace, sourceSpace);
210  // Write from it
211  target.write(buffer.data, target.getDataType(), memorySpace, targetSpace);
212  // Increment the target offset
213  targetOffset.at(mergeAxis) += nRowsToWrite;
214  }
215  // Sanity check - make sure that the final targetOffset is where we think it
216  // should be
217  if (targetOffset.at(mergeAxis) != newDims.at(mergeAxis) )
218  throw std::logic_error(
219  "Target dataset was not filled! This indicates a logic error in the code!");
220  }
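
The buffered copy loop in the listing above splits the source along the merge axis into blocks that fit in the buffer. The sketch below shows only that planning step. `planBlocks` is a hypothetical helper that returns (offset, row count) pairs; it performs no reads or writes.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: split nSourceRows rows of rowSize bytes into
// (offset, count) blocks that each fit in a buffer of bufferSize bytes.
std::vector<std::pair<std::size_t, std::size_t>>
planBlocks(std::size_t nSourceRows, std::size_t rowSize,
           std::size_t bufferSize) {
  if (rowSize == 0) throw std::invalid_argument("row size must be non-zero");
  const std::size_t rowsPerBuffer = bufferSize / rowSize;
  if (rowsPerBuffer == 0) {
    throw std::invalid_argument("buffer is smaller than a single row");
  }
  std::vector<std::pair<std::size_t, std::size_t>> blocks;
  std::size_t row = 0;
  while (row < nSourceRows) {
    // The last block may be shorter than a full buffer.
    const std::size_t count = std::min(nSourceRows - row, rowsPerBuffer);
    blocks.emplace_back(row, count);
    row += count;
  }
  return blocks;
}
```

With 10 rows of 8 bytes and a 32-byte buffer, the plan is three blocks of 4, 4, and 2 rows. The default `bufferSize` of -1 in the signature converts to the largest `std::size_t`, so by default the whole source is copied in a single block.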

Variable Documentation

◆ CHUNK_SIZE

const size_t H5Utils::CHUNK_SIZE = 128

Definition at line 15 of file treeCopyOpts.h.
