ATLAS Offline Software
H5Utils Namespace Reference

HDF5 Tuple Writer. More...

Namespaces

namespace  defaults
namespace  hist
namespace  internal
 Classes to add type traits for H5.
namespace  Print

Classes

struct  AppOpts
class  Consumers
class  DefaultMerger
class  IH5Merger
struct  IOOpts
struct  TreeCopyOpts
class  VariableFillers
 Variable filler arrays. More...
class  Writer
 Writer. More...
struct  WriterConfiguration
class  WriterXd
 WriterXd. More...

Typedefs

template<typename T>
using CRefConsumer = Consumers<const T&>
 CRefConsumer.
template<size_t N, typename T>
using CRefWriter = Writer<N, const T&>
 CRefWriter.
template<typename T>
using SimpleWriter = Writer<0, const T&>
 SimpleWriter.
template<typename I>
using SharedConsumer = std::shared_ptr<internal::IDataConsumer<I> >
 Consumer Class.

Enumerations

enum class  Compression { STANDARD , HALF_PRECISION , HALF_PRECISION_LARGE }

Functions

bool checkDatasetsToMerge (const H5::DataSet &target, const H5::DataSet &source, hsize_t mergeAxis)
 Make sure that two datasets can be merged.
bool checkDatasetsToMerge (const H5::DataSet &target, const H5::DataSet &source, hsize_t mergeAxis, std::string &errMsg)
 Make sure that two datasets can be merged.
void mergeDatasets (H5::DataSet &target, const H5::DataSet &source, hsize_t mergeAxis, std::size_t bufferSize=-1)
 Merge two datasets.
H5::DataSet createDataSet (H5::H5Location &targetLocation, const H5::DataSet &source, hsize_t mergeAxis, int chunkSize=-1, int mergeExtent=-1)
 Make a new dataset using the properties of another.
std::size_t getRowSize (const H5::DataSet &ds, hsize_t axis)
 Calculate the size of a row of a dataset in bytes.
template<size_t N, class I>
Writer< N, I > makeWriter (H5::Group &group, const std::string &name, const Consumers< I > &consumers, const std::array< hsize_t, N > &extent=internal::uniform< N >(5), hsize_t batch_size=defaults::batch_size)
 makeWriter
void copyRootTree (TTree &tt, H5::Group &fg, const TreeCopyOpts &opts)
std::string getTree (const std::string &file_name)
AppOpts getTreeCopyOpts (int argc, char *argv[])

Variables

const size_t CHUNK_SIZE = 128

Detailed Description

HDF5 Tuple Writer.

This is a tool to write N-dimensional arrays of compound data types to HDF5 files.

Skip down to the Writer and Consumers classes below, or to WriterXd and VariableFillers, to see the classes you'll have to interact with.

Typedef Documentation

◆ CRefConsumer

template<typename T>
using H5Utils::CRefConsumer = Consumers<const T&>

CRefConsumer.

Convenience wrapper: CRefConsumer<T> is equivalent to H5Utils::Consumers<const T&>.

Definition at line 550 of file Writer.h.

◆ CRefWriter

template<size_t N, typename T>
using H5Utils::CRefWriter = Writer<N, const T&>

CRefWriter.

Convenience wrapper: CRefWriter<N,T> is equivalent to H5Utils::Writer<N, const T&>.

Definition at line 560 of file Writer.h.

◆ SharedConsumer

template<typename I>
using H5Utils::SharedConsumer = std::shared_ptr<internal::IDataConsumer<I> >

Consumer Class.

The elements added to this container each specify one element in the output HDF5 DataSet. You need to give each variable a name and a function that fills the variable.

Definition at line 126 of file Writer.h.

◆ SimpleWriter

template<typename T>
using H5Utils::SimpleWriter = Writer<0, const T&>

SimpleWriter.

Convenience wrapper: SimpleWriter<T> is equivalent to H5Utils::Writer<0, const T&>.

Definition at line 570 of file Writer.h.

Enumeration Type Documentation

◆ Compression

enum class H5Utils::Compression
strong
Enumerator
STANDARD 
HALF_PRECISION 
HALF_PRECISION_LARGE 

Definition at line 11 of file CompressionEnums.h.

Function Documentation

◆ checkDatasetsToMerge() [1/2]

bool H5Utils::checkDatasetsToMerge ( const H5::DataSet & target,
const H5::DataSet & source,
hsize_t mergeAxis )

Make sure that two datasets can be merged.

Parameters
    target      The dataset to merge into
    source      The dataset to merge from
    mergeAxis   The axis to merge along
Returns
    False if the datasets cannot be merged

Definition at line 53 of file MergeUtils.cxx.

57 {
58 std::string sink;
59 return checkDatasetsToMerge(target, source, mergeAxis, sink);
60 }

◆ checkDatasetsToMerge() [2/2]

bool H5Utils::checkDatasetsToMerge ( const H5::DataSet & target,
const H5::DataSet & source,
hsize_t mergeAxis,
std::string & errMsg )

Make sure that two datasets can be merged.

Parameters
    target        The dataset to merge into
    source        The dataset to merge from
    mergeAxis     The axis to merge along
    [out] errMsg  If the datasets cannot be merged, fill this string with an explanation
Returns
    False if the datasets cannot be merged

Definition at line 62 of file MergeUtils.cxx.

67 {
68 // Check that the datasets hold the same types
69 // Note that H5 *can* do type comparisons but this function assumes that we
70 // should only merge the same types
71 if (target.getDataType() != source.getDataType() ) {
72 errMsg = "Target and source datasets hold different types.";
73 return false;
74 }
75
76 // Get the dataspaces
77 H5::DataSpace targetSpace = target.getSpace();
78 H5::DataSpace sourceSpace = source.getSpace();
79 if (!targetSpace.isSimple() || !sourceSpace.isSimple() ) {
80 errMsg = "Only simple dataspaces are understood.";
81 return false;
82 }
83
84 // Make sure that the dataspaces have the same dimensions
85 int nDims = targetSpace.getSimpleExtentNdims();
86 if (nDims != sourceSpace.getSimpleExtentNdims() ) {
87 errMsg = "Target and source dataspaces have different dimensions, " +
88 std::to_string(nDims) + " and " +
89 std::to_string(sourceSpace.getSimpleExtentNdims() ) + " respectively";
90 return false;
91 }
92
93 // Make sure that the merge axis fits in the dimension
94 if (nDims <= static_cast<int>(mergeAxis)) {
95 errMsg = "Dataset dimension " + std::to_string(nDims) +
96 " is not compatible with the merge axis " +
97 std::to_string(mergeAxis);
98 return false;
99 }
100
101 // Now make sure that the extent matches
102 std::vector<hsize_t> targetDims(nDims, 0);
103 std::vector<hsize_t> maxTargetDims(nDims, 0);
104 targetSpace.getSimpleExtentDims(targetDims.data(), maxTargetDims.data() );
105 std::vector<hsize_t> sourceDims(nDims, 0);
106 sourceSpace.getSimpleExtentDims(sourceDims.data() );
107
108 for (int ii = 0; ii < nDims; ++ii) {
109 // Skip the merge axis in this check
110 if (ii == static_cast<int>(mergeAxis) )
111 continue;
112 if (targetDims.at(ii) != sourceDims.at(ii) ) {
 113 errMsg = "Target and source dataset dimensions differ on axis " +
114 std::to_string(ii) + ", " + std::to_string(targetDims.at(ii) ) +
115 " and " + std::to_string(sourceDims.at(ii) ) + " respectively";
116 return false;
117 }
118 }
119
120 // Check the maximum extent is sufficient
121 if (maxTargetDims.at(mergeAxis) < (
122 targetDims.at(mergeAxis) + sourceDims.at(mergeAxis) ) ) {
123 errMsg = "Merged dataset will not fit into target dataset";
124 return false;
125 }
126
127 return true;
128 } //> end function checkDatasetsToMerge

◆ copyRootTree()

void H5Utils::copyRootTree ( TTree & tt,
H5::Group & fg,
const TreeCopyOpts & opts )

Definition at line 126 of file copyRootTree.cxx.

126 {
127
128 // define the buffers for root to read into
129 std::vector<std::unique_ptr<IBuffer> > buffers;
130
131 // this keeps track of the things we couldn't read
132 std::set<std::string> skipped;
133
134
135 // Each `VariableFiller` must be constructed with a "filler"
136 // function (or callable object), which takes no arguments and
137 // returns the variable we want to write out. In this case they are
138 // implemented as closures over the buffers that ROOT is reading
139 // into.
140
141 // This is the 1d variables
142 VariableFillers vars;
143 std::vector<size_t> idx_dummy;
144
145 // These are 2d variables (i.e. vector<T> in the root file)
146 //
147 // We also need an index which the HDF5 writer increments as it
148 // fills. This is shared with the ROOT buffers to index entries in
149 // std::vectors
150 VariableFillers vars2d;
151 std::vector<size_t> idx(1,0);
152
153 // 3d variables (index is now 2d)
154 VariableFillers vars3d;
155 std::vector<size_t> idx2(2,0);
156
157 // Iterate over all the leaf names. There are some duplicates in the
158 // list of keys, so we have to build the set ourselves.
159 std::regex branch_filter(opts.branch_regex);
160 TIter next(tt.GetListOfLeaves());
161 TLeaf* leaf;
162 std::set<std::string> leaf_names;
163 while ((leaf = dynamic_cast<TLeaf*>(next()))) {
164 leaf_names.insert(leaf->GetName());
165 }
166 if (leaf_names.size() == 0) throw std::logic_error("no branches found");
167
168 // Loop over all the leafs, assign buffers to each
169 //
170 // These `Buffer` classes are defined above. The buffers turn the
 171 // branches on, so we can set them all off to start.
172 tt.SetBranchStatus("*", false);
173 for (const auto& lname: leaf_names) {
174 bool keep = true;
175 if (opts.branch_regex.size() > 0) {
176 keep = std::regex_search(lname, branch_filter);
177 }
178 if (opts.verbose) {
179 std::cout << (keep ? "found " : "rejecting ") << lname << std::endl;
180 }
181 if (!keep) continue;
182
183 leaf = tt.GetLeaf(lname.c_str());
184 std::string branchName = leaf->GetBranch()->GetName();
185 std::string leaf_type = leaf->GetTypeName();
186 if (leaf_type == "Int_t") {
187 buffers.emplace_back(new Buffer<int>(vars, tt, branchName));
188 } else if (leaf_type == "Float_t") {
189 buffers.emplace_back(new Buffer<float>(vars, tt, branchName));
190 } else if (leaf_type == "Double_t") {
191 buffers.emplace_back(new Buffer<double>(vars, tt, branchName));
192 } else if (leaf_type == "Bool_t") {
193 buffers.emplace_back(new Buffer<bool>(vars, tt, branchName));
194 } else if (leaf_type == "Long64_t") {
195 buffers.emplace_back(new Buffer<long long>(vars, tt, branchName));
196 } else if (leaf_type == "ULong64_t") {
197 buffers.emplace_back(new Buffer<unsigned long long>(vars, tt, branchName));
198 } else if (leaf_type == "UInt_t") {
199 buffers.emplace_back(new Buffer<unsigned int>(vars, tt, branchName));
200 } else if (leaf_type == "UChar_t") {
201 buffers.emplace_back(new Buffer<unsigned char>(vars, tt, branchName));
202 } else if (leaf_type == "vector<float>") {
203 buffers.emplace_back(new VBuf<float>(vars2d, idx, tt, branchName, NAN));
204 } else if (leaf_type == "vector<double>") {
205 buffers.emplace_back(new VBuf<double>(vars2d, idx, tt, branchName, NAN));
206 } else if (leaf_type == "vector<int>") {
207 buffers.emplace_back(new VBuf<int>(vars2d, idx, tt, branchName, 0));
208 } else if (leaf_type == "vector<unsigned int>") {
209 buffers.emplace_back(new VBuf<unsigned int>(vars2d, idx, tt, branchName, 0));
210 } else if (leaf_type == "vector<unsigned char>") {
211 buffers.emplace_back(new VBuf<unsigned char>(vars2d, idx, tt, branchName, 0));
212 } else if (leaf_type == "vector<bool>") {
213 buffers.emplace_back(new VBuf<bool>(vars2d, idx, tt, branchName, 0));
214 } else if (leaf_type == "vector<vector<int> >") {
215 buffers.emplace_back(new VVBuf<int>(vars3d, idx2, tt, branchName, 0));
216 } else if (leaf_type == "vector<vector<unsigned int> >") {
217 buffers.emplace_back(new VVBuf<unsigned int>(vars3d, idx2, tt, branchName, 0));
218 } else if (leaf_type == "vector<vector<unsigned char> >") {
219 buffers.emplace_back(new VVBuf<unsigned char>(vars3d, idx2, tt, branchName, 0));
220 } else if (leaf_type == "vector<vector<float> >") {
221 buffers.emplace_back(new VVBuf<float>(vars3d, idx2, tt, branchName, NAN));
222 } else if (leaf_type == "vector<vector<double> >") {
223 buffers.emplace_back(new VVBuf<double>(vars3d, idx2, tt, branchName, NAN));
224 } else if (leaf_type == "vector<vector<bool> >") {
225 buffers.emplace_back(new VVBuf<bool>(vars3d, idx2, tt, branchName, 0));
226 } else {
227 skipped.insert(std::move(leaf_type));
228 }
229 }
230
231 // Build HDF5 Outputs
232 //
233 // In the simple case where we're not reading vectors, we store one
234 // dataset with the same name as the tree. If there are vectors, we
235 // instead create a group with the same name as the tree, and name
236 // the datasets 1d, 2d, etc.
237 //
238 const std::string tree_name = tt.GetName();
239
240 std::unique_ptr<WriterXd> writer1d;
241 std::unique_ptr<WriterXd> writer2d;
242 std::unique_ptr<WriterXd> writer3d;
243 std::unique_ptr<H5::Group> top_group;
244 if (opts.vector_lengths.size() > 0) {
245 if (opts.vector_lengths.size() > 2) throw std::logic_error(
246 "we don't support outputs with rank > 3");
247 size_t length = opts.vector_lengths.at(0);
248 top_group.reset(new H5::Group(fg.createGroup(tree_name)));
249 if (opts.vector_lengths.size() > 1) {
250 size_t length2 = opts.vector_lengths.at(1);
251 if (vars3d.size() > 0) {
252 writer3d.reset(new WriterXd(*top_group, "3d", std::move(vars3d),
253 {length, length2}, opts.chunk_size));
254 }
255 }
256 if (vars2d.size() > 0) {
257 writer2d.reset(new WriterXd(*top_group, "2d", std::move(vars2d),
258 {length}, opts.chunk_size));
259 }
260 if (vars.size() > 0) {
261 writer1d.reset(new WriterXd(*top_group, "1d",
262 std::move(vars), {}, opts.chunk_size));
263 }
264 } else {
265 if (vars.size() > 0) {
266 writer1d.reset(new WriterXd(fg, tree_name, std::move(vars), {}, opts.chunk_size));
267 }
268 }
269
270 // Main event loop
271 //
272 // Very little actually happens here since the buffers are already
273 // defined, as are the HDF5 reader functions.
274 //
275
276 // Get the selection string and build a new TTreeFormula
277 std::string cut_string = opts.selection;
278 const char * cut_char = cut_string.c_str();
279 TTreeFormula *cut =0;
280 if(!cut_string.empty()){
281 // This is so a cut can be applied without requiring the
282 // branch to be output to the hdf5 file.
283 tt.SetBranchStatus("*", true);
284 cut = new TTreeFormula("selection", cut_char, &tt);
285 }
286
287 size_t n_entries = tt.GetEntries();
288 if (opts.n_entries) n_entries = std::min(n_entries, opts.n_entries);
289 int print_interval = opts.print_interval;
290 if (print_interval == -1) {
291 print_interval = std::max(1UL, n_entries / 100);
292 }
293
294 for (size_t iii = 0; iii < n_entries; iii++) {
295 if (print_interval && (iii % print_interval == 0)) {
296 std::cout << "events processed: " << iii
297 << " (" << std::round(iii*1e2 / n_entries) << "% of "
298 << n_entries << ")" << std::endl;
299 }
300 tt.GetEntry(iii);
301 if(cut) cut->UpdateFormulaLeaves();
302 if (!passTTreeCut(cut)) continue;
303 if (writer1d) writer1d->fillWhileIncrementing(idx_dummy);
304 if (writer2d) writer2d->fillWhileIncrementing(idx);
305 if (writer3d) writer3d->fillWhileIncrementing(idx2);
306 }
307
308 // Flush the memory buffers on the HDF5 side. (This is done by the
309 // destructor automatically, but we do it here to make any errors
310 // more explicit.)
311 if (writer1d) writer1d->flush();
312 if (writer2d) writer2d->flush();
313 if (writer3d) writer3d->flush();
314
 315 // Print the names of any classes that we weren't able to read from
316 // the root file.
317 if (opts.verbose) {
318 for (const auto& name: skipped) {
319 std::cerr << "could not read branch of type " << name << std::endl;
320 }
321 }
322 } // end copyRootTree

◆ createDataSet()

H5::DataSet H5Utils::createDataSet ( H5::H5Location & targetLocation,
const H5::DataSet & source,
hsize_t mergeAxis,
int chunkSize = -1,
int mergeExtent = -1 )

Make a new dataset using the properties of another.

Parameters
    targetLocation  The location to place the new dataset
    source          The dataset to create from
    mergeAxis       The axis to merge along
    chunkSize       The chunk size to use. If negative, the chunk size from the source is used
    mergeExtent     The maximum extent to allow along the merge axis; -1 means unlimited

This will not merge the source dataset into the new one!

Definition at line 222 of file MergeUtils.cxx.

228 {
229 H5::DataSpace sourceSpace = source.getSpace();
230 // Get the new extent
231 std::vector<hsize_t> DSExtent(sourceSpace.getSimpleExtentNdims(), 0);
232 sourceSpace.getSimpleExtentDims(DSExtent.data() );
233 // Set the merge axis to be 0 length to begin with
234 DSExtent.at(mergeAxis) = 0;
235 std::vector<hsize_t> maxDSExtent = DSExtent;
236 maxDSExtent.at(mergeAxis) = mergeExtent;
237
238 // Get the existing dataset creation properties
239 H5::DSetCreatPropList cList = source.getCreatePlist();
240 if (chunkSize > 0) {
241 std::vector<hsize_t> chunks = DSExtent;
242 chunks.at(mergeAxis) = chunkSize;
243 cList.setChunk(chunks.size(), chunks.data() );
244 }
245
246 // Create the new space
247 H5::DataSpace space(DSExtent.size(), DSExtent.data(), maxDSExtent.data());
248 // This does nothing with the acc property list because I don't know
249 // what it is
250 return targetLocation.createDataSet(
251 source.getObjName(), source.getDataType(), space, cList);
252 }

◆ getRowSize()

std::size_t H5Utils::getRowSize ( const H5::DataSet & ds,
hsize_t axis )

Calculate the size of a row of a dataset in bytes.

Parameters
    ds    The dataset to use
    axis  The axis that the row is orthogonal to

A row is the hyperplane orthogonal to the axis. This will throw an overflow error if the row size overflows a std::size_t, which is very unlikely: it would mean there aren't enough memory addresses to hold even a single row.

Definition at line 254 of file MergeUtils.cxx.

254 {
255 // The size of one element
256 std::size_t eleSize = ds.getDataType().getSize();
257
258 // The dimensions of the space
259 H5::DataSpace space = ds.getSpace();
260 std::vector<hsize_t> spaceDims(space.getSimpleExtentNdims(), 0);
261 space.getSimpleExtentDims(spaceDims.data() );
262
263 std::size_t nRowElements = 1;
264 for (std::size_t ii = 0; ii < spaceDims.size(); ++ii)
265 if (ii != axis)
266 nRowElements *= spaceDims.at(ii);
267
268 // Double check that this fits. This is probably over cautious but fine...
269 if (std::size_t(-1) / nRowElements < eleSize)
270 throw std::overflow_error("The size of one row would overflow the register!");
271
272 return eleSize * nRowElements;
273 }

◆ getTree()

std::string H5Utils::getTree ( const std::string & file_name)

Definition at line 36 of file getTree.cxx.

36 {
37 if (!exists(file_name) && !is_remote(file_name)) {
38 throw std::logic_error(file_name + " doesn't exist");
39 }
40 std::unique_ptr<TFile> file(TFile::Open(file_name.c_str()));
41 if (!file || !file->IsOpen() || file->IsZombie()) {
42 throw std::logic_error("can't open " + file_name);
43 }
44 std::set<std::string> keys;
45 int n_keys = file->GetListOfKeys()->GetSize();
46 if (n_keys == 0) {
47 throw std::logic_error("no keys found in file");
48 }
49 for (int keyn = 0; keyn < n_keys; keyn++) {
50 keys.insert(file->GetListOfKeys()->At(keyn)->GetName());
51 }
52 size_t n_unique = keys.size();
53 if (n_unique > 1) {
54 std::string prob = "Can't decide which tree to use, choose one of {";
55 size_t uniq_n = 0;
56 for (const auto& key: keys) {
57 prob.append(key);
58 uniq_n++;
59 if (uniq_n < n_unique) prob.append(", ");
60 }
61 prob.append("} with the --tree-name option");
62 throw std::logic_error(prob);
63 }
64 auto* key = dynamic_cast<TKey*>(file->GetListOfKeys()->At(0));
65 std::string name = key->GetName();
66 file->Close();
67 return name;
68 }

◆ getTreeCopyOpts()

AppOpts H5Utils::getTreeCopyOpts ( int argc,
char * argv[] )

Definition at line 12 of file treeCopyOpts.cxx.

13 {
14 namespace po = boost::program_options;
15 AppOpts app;
16 std::string usage = "usage: " + std::string(argv[0]) + " <files>..."
17 + " -o <output> [-h] [opts...]\n";
18 po::options_description opt(usage + "\nConvert a root tree to HDF5");
19 opt.add_options()
20 ("in-file",
21 po::value(&app.file.in)->required()->multitoken(),
22 "input file name")
23 ("out-file,o",
24 po::value(&app.file.out)->required(),
25 "output file name")
26 ("tree-name,t",
27 po::value(&app.file.tree)->default_value("", "found"),
28 "tree to use, use whatever is there by default (or crash if multiple)")
29 ("help,h", "Print help messages")
30 ("branch-regex,r",
31 po::value(&app.tree.branch_regex)->default_value(""),
32 "regex to filter branches")
33 ("vector-lengths,l",
34 po::value(&app.tree.vector_lengths)->multitoken()->value_name("args..."),
35 "max size of vectors to write")
36 ("verbose,v",
37 po::bool_switch(&app.tree.verbose),
38 "print branches copied")
39 ("n-entries,n",
40 po::value(&app.tree.n_entries)->default_value(0, "all")->implicit_value(1),
41 "number of entries to copy")
42 ("chunk-size,c",
43 po::value(&app.tree.chunk_size)->default_value(CHUNK_SIZE),
44 "chunk size in HDF5 file")
45 ("selection,s",
46 po::value(&app.tree.selection)->default_value(""),
47 "selection string applied to ntuples")
48 ("print-interval,p",
49 po::value(&app.tree.print_interval)->default_value(0, "never")->implicit_value(-1, "1%"),
50 "print progress")
51
52 ;
53 po::positional_options_description pos_opts;
54 pos_opts.add("in-file", -1);
55
56 po::variables_map vm;
57 try {
58 po::store(po::command_line_parser(argc, argv).options(opt)
59 .positional(pos_opts).run(), vm);
60 if ( vm.count("help") ) {
61 std::cout << opt << std::endl;
62 app.exit_code = 1;
63 }
64 po::notify(vm);
65 } catch (po::error& err) {
66 std::cerr << usage << "ERROR: " << err.what() << std::endl;
67 app.exit_code = 1;
68 }
69 return app;
70 }

◆ makeWriter()

template<size_t N, class I>
Writer< N, I > H5Utils::makeWriter ( H5::Group & group,
const std::string & name,
const Consumers< I > & consumers,
const std::array< hsize_t, N > & extent = internal::uniform<N>(5),
hsize_t batch_size = defaults::batch_size )

makeWriter

Convenience function to make a writer from an existing list of Consumers. It lets the input type be deduced from the consumers argument.

To be used like

auto writer = H5Utils::makeWriter<2>(group, name, consumers);

Definition at line 535 of file Writer.h.

539 {
540 return Writer<N,I>(group, name, consumers, extent, batch_size);
541 }

◆ mergeDatasets()

void H5Utils::mergeDatasets ( H5::DataSet & target,
const H5::DataSet & source,
hsize_t mergeAxis,
std::size_t bufferSize = -1 )

Merge two datasets.

Parameters
    target      The dataset to merge into
    source      The dataset to merge from
    mergeAxis   The axis to merge along
    bufferSize  The maximum size of the buffer to use, in bytes. Take care when setting this: if it is too large the job may run into memory issues

Note that this does nothing to dataset attributes. This function ignores the chunking of the source and target datasets, only splitting up the source dataset along the merge axis.

Definition at line 130 of file MergeUtils.cxx.

135 {
136 std::string errMsg;
137 if (!checkDatasetsToMerge(target, source, mergeAxis, errMsg) )
138 throw std::invalid_argument(errMsg);
139
140 // Get information about the target and source datasets
141 H5::DataSpace targetSpace = target.getSpace();
142 H5::DataSpace sourceSpace = source.getSpace();
143 int nDims = targetSpace.getSimpleExtentNdims();
144
145 // Now make sure that the extent matches
146 std::vector<hsize_t> targetDims(nDims, 0);
147 targetSpace.getSimpleExtentDims(targetDims.data() );
148 std::vector<hsize_t> sourceDims(nDims, 0);
149 sourceSpace.getSimpleExtentDims(sourceDims.data() );
150
151 // Start by extending the target dataset
152 std::vector<hsize_t> newDims = targetDims;
153 newDims.at(mergeAxis) += sourceDims.at(mergeAxis);
154 target.extend(newDims.data() );
155 targetSpace.setExtentSimple(newDims.size(), newDims.data() );
156
157 // Now we need to work out how far we need to subdivide the source dataset
158 // to fit it inside the buffer.
159 std::size_t rowSize = getRowSize(source, mergeAxis);
160 // How many rows can we fit into one buffer
161 std::size_t nRowsBuffer = bufferSize / rowSize;
162 if (nRowsBuffer == 0)
163 throw std::invalid_argument(
164 "Allocated buffer is smaller than a single row! Merging is impossible.");
165
166 // We have to allocate an area in memory for the buffer. Unlike normally in
167 // C++ we aren't allocating a space for an object but a specific size. This
168 // means that we have to use malloc.
169 // Smart pointers require some annoying syntax to use with malloc, but we
170 // can implement the same pattern with a simple struct.
171 SmartMalloc buffer;
172
173 // Keep track of the offset from the target dataset
174 std::vector<hsize_t> targetOffset(nDims, 0);
175 // Start it from its end point before we extended it
176 targetOffset.at(mergeAxis) = targetDims.at(mergeAxis);
177
178 // Step through the source dataset in increments equal to the number of
179 // source rows that can fit into the buffer.
180 std::size_t nSourceRows = sourceDims.at(mergeAxis);
181 for (std::size_t iRow = 0; iRow < nSourceRows; iRow += nRowsBuffer) {
182 // Construct the size and offset of the source slab
183 std::vector<hsize_t> sourceOffset(nDims, 0);
184 sourceOffset.at(mergeAxis) = iRow;
185 // The number of rows to write
186 std::size_t nRowsToWrite = std::min(nSourceRows-iRow, nRowsBuffer);
187 std::vector<hsize_t> sourceSize(sourceDims);
188 sourceSize.at(mergeAxis) = nRowsToWrite;
189 // Create the source hyperslab
190 sourceSpace.selectNone();
191 sourceSpace.selectHyperslab(
192 H5S_SELECT_SET,
193 sourceSize.data(),
194 sourceOffset.data() );
195
196 // Create the target hyperslab
197 targetSpace.selectNone();
198 targetSpace.selectHyperslab(
199 H5S_SELECT_SET,
200 sourceSize.data(),
201 targetOffset.data() );
202
203 H5::DataSpace memorySpace(sourceSize.size(), sourceSize.data() );
204 memorySpace.selectAll();
205
206 // Prepare the buffer
207 buffer.allocate(nRowsToWrite*rowSize);
208 // Read into it
209 source.read(buffer.data, source.getDataType(), memorySpace, sourceSpace);
210 // Write from it
211 target.write(buffer.data, target.getDataType(), memorySpace, targetSpace);
212 // Increment the target offset
213 targetOffset.at(mergeAxis) += nRowsToWrite;
214 }
215 // Sanity check - make sure that the final targetOffset is where we think it
216 // should be
217 if (targetOffset.at(mergeAxis) != newDims.at(mergeAxis) )
218 throw std::logic_error(
219 "Target dataset was not filled! This indicates a logic error in the code!");
220 }

Variable Documentation

◆ CHUNK_SIZE

const size_t H5Utils::CHUNK_SIZE = 128

Definition at line 15 of file treeCopyOpts.h.