ATLAS Offline Software
SH::ScanDir Struct Reference

the class used for scanning local directories and file servers for samples

#include <ScanDir.h>

Public Member Functions

 ScanDir ()
 standard constructor
const ScanDir & scan (SampleHandler &sh, const std::string &dir) const
 scan the given directory and put the created samples into the sample handler
const ScanDir & scanEOS (SampleHandler &sh, const std::string &eosDir) const
 scan the given directory in EOS and put the created samples into the sample handler
const ScanDir & scan (SampleHandler &sh, DiskList &list) const
 scan the given directory and put the created samples into the sample handler
ScanDir & sampleDepth (int val_sampleDepth)
 the index of the file hierarchy at which we gather the sample name.
ScanDir & absSampleDepth (int val_absSampleDepth)
 the index of the file hierarchy at which we gather the sample name.
ScanDir & sampleName (const std::string &val_sampleName)
 a single sample name into which all found files should be placed.
ScanDir & minDepth (std::size_t val_minDepth)
 the minimum depth for files to make it into the sample
ScanDir & maxDepth (std::size_t val_maxDepth)
 the maximum depth for files to make it into the sample
ScanDir & filePattern (const std::string &val_filePattern)
 the pattern for files to be accepted
ScanDir & fileRegex (const std::string &val_fileRegex)
 the regular expression for files to be accepted
ScanDir & directoryPattern (const std::string &val_directoryPattern)
 the pattern for directories to be visited
ScanDir & directoryRegex (const std::string &val_directoryRegex)
 the regular expression for directories to be visited
ScanDir & samplePattern (const std::string &val_samplePattern)
 the pattern for samples to be accepted
ScanDir & samplePostfix (const std::string &val_samplePostfix)
 the pattern for the postfix to be stripped from the sampleName
ScanDir & sampleRename (const std::string &pattern, const std::string &name)
 rename any sample matching pattern to name
ScanDir & extraNameComponent (int val_relSampleDepth)
 attach an extra name component to the sample based on a second component of the path

Private Types

typedef std::vector< std::pair< std::regex, std::string > >::const_iterator SampleRenameIter
 iterator over the list of entries from sampleRename

Private Member Functions

void recurse (std::map< std::string, std::shared_ptr< Sample > > &samples, DiskList &list, const std::vector< std::string > &hierarchy) const
 perform the recursive scanning of the directory tree
void addSampleFile (std::map< std::string, std::shared_ptr< Sample > > &samples, const std::vector< std::string > &hierarchy, const std::string &path) const
 add the given file to the sample based on the hierarchy, creating the sample if necessary
std::string findPathComponent (const std::vector< std::string > &hierarchy, int absSampleDepth, int relSampleDepth) const
 find the path component at the given depth

Private Attributes

int m_relSampleDepth
 if this is negative it is the depth at which we take the sample name, counting from the end
int m_absSampleDepth
 if m_relSampleDepth is not negative, it is the depth at which we take the sample name, counting from the first directory scanned
std::string m_sampleName
 the value set by sampleName
std::size_t m_minDepth
 the value set by minDepth
std::size_t m_maxDepth
 the value set by maxDepth
std::regex m_filePattern
 the value set by filePattern, converted to a regular expression
std::regex m_directoryPattern
 the value set by directoryPattern, converted to a regular expression
std::regex m_samplePattern
 the value set by samplePattern, converted to a regular expression
std::regex m_samplePostfix
 the value set by samplePostfix, converted to a regular expression
bool m_samplePostfixEmpty
 whether samplePostfix has been set to the empty string
std::vector< std::pair< std::regex, std::string > > m_sampleRename
int m_extraNameComponent
 the depth set with extraNameComponent, or 0 otherwise

Detailed Description

the class used for scanning local directories and file servers for samples

Originally this was a series of stand-alone function calls, but people kept asking for more and more options, making it unwieldy to call and to maintain. Instead we now have a single class containing all the possible parameters, which makes it easier to configure and extend.

The member functions all return *this, so that usage like this is possible:

ScanDir()
  .filePattern ("*.root*")
  .scan (sh, "/data");
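
The chaining works because every setter returns a reference to the object it was called on. The following is a minimal stand-alone sketch of that pattern only. `ScanOptions`, its two data members and `makeOptions` are hypothetical names invented for illustration and are not part of SH::ScanDir.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for SH::ScanDir; only the chaining pattern is shown.
struct ScanOptions
{
  std::string filePattern_ = "*";
  std::size_t minDepth_ = 0;

  // Each setter stores its value and returns *this so calls can be chained.
  ScanOptions& filePattern (const std::string& val)
  {
    filePattern_ = val;
    return *this;
  }

  ScanOptions& minDepth (std::size_t val)
  {
    minDepth_ = val;
    return *this;
  }
};

// Builds a fully configured object in a single expression.
inline ScanOptions makeOptions ()
{
  return ScanOptions().filePattern ("*.root*").minDepth (1);
}
```

Returning a reference rather than a copy means every call in the chain modifies the same object, which is why the final scan call in the usage above sees all previously set options.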

Definition at line 38 of file ScanDir.h.

Member Typedef Documentation

◆ SampleRenameIter

typedef std::vector<std::pair<std::regex,std::string>>::const_iterator SH::ScanDir::SampleRenameIter
private

iterator over the list of entries from sampleRename

Definition at line 211 of file ScanDir.h.

Constructor & Destructor Documentation

◆ ScanDir()

SH::ScanDir::ScanDir ( )

standard constructor

Guarantee
strong
Failures
out of memory I

Definition at line 32 of file ScanDir.cxx.

SH::ScanDir::ScanDir ()
  ...
  m_minDepth (0), m_maxDepth (-1),
  ...
{}

Member Function Documentation

◆ absSampleDepth()

ScanDir & SH::ScanDir::absSampleDepth ( int val_absSampleDepth)

the index of the file hierarchy at which we gather the sample name.

this differs from sampleDepth in that negative numbers count up in the directory hierarchy from the top of where we scan, while sampleDepth starts counting from the back if the number is negative.

Definition at line 56 of file ScanDir.cxx.

ScanDir & SH::ScanDir::absSampleDepth (int val_absSampleDepth)
{
  ...
  m_absSampleDepth = val_absSampleDepth;
  return *this;
}

◆ addSampleFile()

void SH::ScanDir::addSampleFile ( std::map< std::string, std::shared_ptr< Sample > > & samples,
const std::vector< std::string > & hierarchy,
const std::string & path ) const
private

add the given file to the sample based on the hierarchy, creating the sample if necessary

Guarantee
basic
Failures
out of memory II

Definition at line 254 of file ScanDir.cxx.

void SH::ScanDir::addSampleFile (std::map<std::string,std::shared_ptr<Sample>>& samples,
                                 const std::vector<std::string>& hierarchy,
                                 const std::string& path) const
{
  std::string sampleName;

  if (!m_sampleName.empty())
  {
    ...
  } else
  {
    ...
    if (sampleName.empty())
      return;

    ...
    {
      bool done = false;
      for (std::size_t iter = 0, end = sampleName.size();
           iter != end && !done; ++ iter)
      {
        if (RCU::match_expr (m_samplePostfix, sampleName.substr (iter)))
        {
          if (iter == 0)
            RCU_THROW_MSG ("sample name matches entire postfix pattern: \"" + sampleName + "\"");
          sampleName.resize (iter);
          done = true;
        }
      }
    }

    if (m_extraNameComponent != 0)
    {
      std::string component = findPathComponent
        ...
      if (component.empty())
        return;
      sampleName += "_" + component;
    }

    ...
      return;

    {
      bool done = false;
      for (SampleRenameIter iter = m_sampleRename.begin(),
             end = m_sampleRename.end(); !done && iter != end; ++ iter)
      {
        if (RCU::match_expr (iter->first, sampleName))
        {
          sampleName = iter->second;
          done = true;
        }
      }
    }
  }

  auto iter = samples.find (sampleName);
  if (iter == samples.end())
  {
    auto sample = std::make_shared<SampleLocal> (sampleName);
    samples[sampleName] = sample;
    iter = samples.find (sampleName);
  }
  SampleLocal *sample = dynamic_cast<SampleLocal*>(iter->second.get());
  RCU_ASSERT (sample != 0);
  sample->add (path);
}

◆ directoryPattern()

ScanDir & SH::ScanDir::directoryPattern ( const std::string & val_directoryPattern)

the pattern for directories to be visited

See also
directoryRegex

Definition at line 111 of file ScanDir.cxx.

ScanDir & SH::ScanDir::directoryPattern (const std::string & val_directoryPattern)
{
  m_directoryPattern = RCU::glob_to_regexp (val_directoryPattern);
  return *this;
}

◆ directoryRegex()

ScanDir & SH::ScanDir::directoryRegex ( const std::string & val_directoryRegex)

the regular expression for directories to be visited

See also
directoryPattern

Definition at line 120 of file ScanDir.cxx.

ScanDir & SH::ScanDir::directoryRegex (const std::string & val_directoryRegex)
{
  m_directoryPattern = val_directoryRegex;
  return *this;
}

◆ extraNameComponent()

ScanDir & SH::ScanDir::extraNameComponent ( int val_relSampleDepth)

attach an extra name component to the sample based on a second component of the path

Precondition
val_relSampleDepth != 0

Definition at line 157 of file ScanDir.cxx.

ScanDir & SH::ScanDir::extraNameComponent (int val_relSampleDepth)
{
  RCU_REQUIRE (val_relSampleDepth != 0);
  m_extraNameComponent = val_relSampleDepth;
  return *this;
}

◆ filePattern()

ScanDir & SH::ScanDir::filePattern ( const std::string & val_filePattern)

the pattern for files to be accepted

See also
fileRegex

Definition at line 93 of file ScanDir.cxx.

ScanDir & SH::ScanDir::filePattern (const std::string & val_filePattern)
{
  m_filePattern = RCU::glob_to_regexp (val_filePattern);
  return *this;
}
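
To illustrate how a glob such as "*.root*" can end up as a regular expression, here is a rough stand-alone sketch. It is not the implementation of RCU::glob_to_regexp, whose exact translation rules are not shown on this page. The helper names `globToRegexSketch` and `matchesGlob` are hypothetical, and the sketch assumes that `*` matches any run of characters, `?` matches a single character, everything else is literal, and the resulting expression must match the whole name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, NOT RCU::glob_to_regexp: assumes '*' matches any run of
// characters, '?' matches one character, and everything else is literal.
inline std::string globToRegexSketch (const std::string& glob)
{
  const std::string special = "\\^$.|+()[]{}";
  std::string result;
  for (char ch : glob)
  {
    if (ch == '*')
      result += ".*";
    else if (ch == '?')
      result += '.';
    else
    {
      if (special.find (ch) != std::string::npos)
        result += '\\';   // escape regex metacharacters
      result += ch;
    }
  }
  return result;
}

// Whole-string match via std::regex_match (an assumption of this sketch).
inline bool matchesGlob (const std::string& glob, const std::string& name)
{
  return std::regex_match (name, std::regex (globToRegexSketch (glob)));
}
```

Under these assumptions `matchesGlob ("*.root*", "data.root.1")` accepts the file, while "data.txt" is rejected because the literal ".root" is missing.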

◆ fileRegex()

ScanDir & SH::ScanDir::fileRegex ( const std::string & val_fileRegex)

the regular expression for files to be accepted

See also
filePattern

Definition at line 102 of file ScanDir.cxx.

ScanDir & SH::ScanDir::fileRegex (const std::string & val_fileRegex)
{
  m_filePattern = val_fileRegex;
  return *this;
}

◆ findPathComponent()

std::string SH::ScanDir::findPathComponent ( const std::vector< std::string > & hierarchy,
int absSampleDepth,
int relSampleDepth ) const
private

find the path component at the given depth

Returns
the path component, or an empty string if it doesn't exist
Guarantee
strong
Failures
out of memory II

Definition at line 327 of file ScanDir.cxx.

std::string SH::ScanDir::findPathComponent (const std::vector<std::string>& hierarchy,
                                            int absSampleDepth,
                                            int relSampleDepth) const
{
  std::string sampleName;

  int myindex = absSampleDepth+1;
  if (relSampleDepth < 0)
    myindex = relSampleDepth + hierarchy.size();
  if (std::size_t (myindex) >= hierarchy.size())
    return sampleName;
  if (myindex > 0)
  {
    sampleName = hierarchy[myindex];
  } else
  {
    sampleName = hierarchy[0];
    while (sampleName.empty() ||
           sampleName[sampleName.size()-1] == '/' ||
           myindex < 0)
    {
      while (!sampleName.empty() && sampleName[sampleName.size()-1] == '/')
        sampleName.pop_back();
      if (sampleName.empty())
        return sampleName;
      if (myindex < 0)
      {
        std::string::size_type split = sampleName.rfind ('/');
        if (split == std::string::npos)
        {
          sampleName.clear ();
          return sampleName;
        }
        sampleName.resize (split);
        ++ myindex;
      }
      if (sampleName.empty())
        return sampleName;
    }
    std::string::size_type split = sampleName.rfind ('/');
    if (split != std::string::npos)
      sampleName = sampleName.substr (split + 1);
  }
  return sampleName;
}
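
The index arithmetic at the start of findPathComponent can be tried out in isolation. The sketch below is a simplified stand-alone version with the hypothetical name `pickComponent`. It covers only the selection of an entry from the hierarchy vector and deliberately leaves out the branch that splits hierarchy[0] into its own path components, so for index 0 it returns hierarchy[0] unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified sketch of the index selection in findPathComponent. It leaves out
// the branch that splits hierarchy[0] into its own path components.
inline std::string pickComponent (const std::vector<std::string>& hierarchy,
                                  int absSampleDepth, int relSampleDepth)
{
  // hierarchy[0] is the scanned directory itself, so absolute depth 0 refers
  // to the entry directly below it, stored at hierarchy[1].
  int index = absSampleDepth + 1;
  // A negative relative depth counts from the end: -1 is the file name.
  if (relSampleDepth < 0)
    index = relSampleDepth + static_cast<int> (hierarchy.size());
  if (index < 0 || static_cast<std::size_t> (index) >= hierarchy.size())
    return std::string();   // no such component
  return hierarchy[index];
}
```

For a hierarchy of "/data", "mc16", "sampleA", "file.root", passing 0 for both depths (as sampleDepth(0) would) selects "mc16", while passing -1 for both selects "file.root".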

◆ maxDepth()

ScanDir & SH::ScanDir::maxDepth ( std::size_t val_maxDepth)

the maximum depth for files to make it into the sample

Definition at line 84 of file ScanDir.cxx.

ScanDir & SH::ScanDir::maxDepth (std::size_t val_maxDepth)
{
  m_maxDepth = val_maxDepth;
  return *this;
}

◆ minDepth()

ScanDir & SH::ScanDir::minDepth ( std::size_t val_minDepth)

the minimum depth for files to make it into the sample

Definition at line 75 of file ScanDir.cxx.

ScanDir & SH::ScanDir::minDepth (std::size_t val_minDepth)
{
  m_minDepth = val_minDepth;
  return *this;
}

◆ recurse()

void SH::ScanDir::recurse ( std::map< std::string, std::shared_ptr< Sample > > & samples,
DiskList & list,
const std::vector< std::string > & hierarchy ) const
private

perform the recursive scanning of the directory tree

Guarantee
basic
Failures
out of memory III
i/o errors

Definition at line 208 of file ScanDir.cxx.

void SH::ScanDir::recurse (std::map<std::string,std::shared_ptr<Sample>>& samples,
                           DiskList& list,
                           const std::vector<std::string>& hierarchy) const
{
  using namespace msgScanDir;

  ANA_MSG_DEBUG ("scanning directory: " << list.dirname());
  while (list.next())
  {
    std::unique_ptr<DiskList> sublist (list.openDir());

    if (sublist.get() != 0)
    {
      if (!RCU::match_expr (m_directoryPattern, list.fileName()))
      {
        ANA_MSG_DEBUG ("directory does not match pattern, skipping directory " << list.path());
      } else if (hierarchy.size() > m_maxDepth)
      {
        ANA_MSG_DEBUG ("maxDepth exceeded, skipping directory " << list.path());
      } else
      {
        ANA_MSG_DEBUG ("descending into directory " << list.path());
        std::vector<std::string> subhierarchy = hierarchy;
        subhierarchy.push_back (list.fileName());
        recurse (samples, *sublist, subhierarchy);
      }
    } else
    {
      if (hierarchy.size() > m_minDepth &&
          RCU::match_expr (m_filePattern, list.fileName()))
      {
        ANA_MSG_DEBUG ("adding file " << list.path());
        std::vector<std::string> subhierarchy = hierarchy;
        subhierarchy.push_back (list.fileName());
        addSampleFile (samples, subhierarchy, list.path());
      } else
      {
        ANA_MSG_DEBUG ("skipping file " << list.path());
      }
    }
  }
}
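
The effect of minDepth and maxDepth in recurse() is easiest to see on a small in-memory tree. The following stand-alone sketch applies the same two depth comparisons as the listing above. `Entry` and `collect` are hypothetical names, the tree replaces SH::DiskList, and the file and directory pattern checks are omitted to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a directory listing (not SH::DiskList).
struct Entry
{
  std::string name;
  bool isDir;
  std::vector<Entry> children;
};

// Collects the paths of accepted files, applying the same depth tests as
// recurse(); pattern matching is omitted to keep the sketch short.
inline void collect (const std::vector<Entry>& entries,
                     const std::vector<std::string>& hierarchy,
                     std::size_t minDepth, std::size_t maxDepth,
                     std::vector<std::string>& accepted)
{
  for (const Entry& entry : entries)
  {
    if (entry.isDir)
    {
      if (hierarchy.size() > maxDepth)
        continue;   // too deep, do not descend
      std::vector<std::string> sub = hierarchy;
      sub.push_back (entry.name);
      collect (entry.children, sub, minDepth, maxDepth, accepted);
    } else if (hierarchy.size() > minDepth)
    {
      std::string path;
      for (const std::string& part : hierarchy)
        path += part + "/";
      accepted.push_back (path + entry.name);
    }
  }
}
```

Because the hierarchy already contains the scanned directory when the first level is visited, a minDepth of 0 accepts files directly inside that directory, and a minDepth of 1 accepts only files at least one sub-directory down.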

◆ sampleDepth()

ScanDir & SH::ScanDir::sampleDepth ( int val_sampleDepth)

the index of the file hierarchy at which we gather the sample name.

this is non-negative when counting from the top and negative when counting from the back, i.e. -1 uses the file name and 0 denotes the entry directly inside the top-level directory

Definition at line 46 of file ScanDir.cxx.

ScanDir & SH::ScanDir::sampleDepth (int val_sampleDepth)
{
  m_relSampleDepth = val_sampleDepth;
  m_absSampleDepth = val_sampleDepth;
  return *this;
}

◆ sampleName()

ScanDir & SH::ScanDir::sampleName ( const std::string & val_sampleName)

a single sample name into which all found files should be placed.

if set, this overrides all other naming methods.

Definition at line 66 of file ScanDir.cxx.

ScanDir & SH::ScanDir::sampleName (const std::string & val_sampleName)
{
  m_sampleName = val_sampleName;
  return *this;
}

◆ samplePattern()

ScanDir & SH::ScanDir::samplePattern ( const std::string & val_samplePattern)

the pattern for samples to be accepted

Definition at line 129 of file ScanDir.cxx.

ScanDir & SH::ScanDir::samplePattern (const std::string & val_samplePattern)
{
  m_samplePattern = RCU::glob_to_regexp (val_samplePattern);
  return *this;
}

◆ samplePostfix()

ScanDir & SH::ScanDir::samplePostfix ( const std::string & val_samplePostfix)

the pattern for the postfix to be stripped from the sampleName

Definition at line 138 of file ScanDir.cxx.

ScanDir & SH::ScanDir::samplePostfix (const std::string & val_samplePostfix)
{
  m_samplePostfix = RCU::glob_to_regexp (val_samplePostfix);
  m_samplePostfixEmpty = val_samplePostfix.empty();
  return *this;
}
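
How the stored postfix is later applied can be sketched separately from the class. The stand-alone function below follows the loop shown in addSampleFile: it looks for the first position from which the rest of the name matches the postfix expression as a whole, and cuts the name there. `stripPostfix` is a hypothetical name, std::regex_match stands in for RCU::match_expr, and std::runtime_error stands in for RCU_THROW_MSG. All three are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Strips the longest suffix that matches `postfix` as a whole, following the
// loop in addSampleFile. Throws if the entire name would be stripped.
inline std::string stripPostfix (const std::string& name, const std::regex& postfix)
{
  for (std::size_t pos = 0; pos != name.size(); ++pos)
  {
    if (std::regex_match (name.substr (pos), postfix))
    {
      if (pos == 0)
        throw std::runtime_error ("sample name matches entire postfix pattern");
      return name.substr (0, pos);
    }
  }
  return name;   // no suffix matched
}
```

Scanning from the front means the earliest matching position wins, so the longest matching suffix is removed. The empty suffix is never tested, so a name that does not match is returned unchanged.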

◆ sampleRename()

ScanDir & SH::ScanDir::sampleRename ( const std::string & pattern,
const std::string & name )

rename any sample matching pattern to name

Definition at line 148 of file ScanDir.cxx.

ScanDir & SH::ScanDir::sampleRename (const std::string & pattern, const std::string & name)
{
  m_sampleRename.push_back (std::pair<std::regex,std::string> (std::regex (RCU::glob_to_regexp (pattern)), name));
  return *this;
}
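
Rename rules are appended in call order, and the loop in addSampleFile stops at the first rule that matches, so earlier calls to sampleRename take precedence over later ones. A minimal stand-alone sketch of that lookup follows. `RenameList` and `applyRename` are hypothetical names, the patterns are given directly as regular expressions instead of globs, and std::regex_match stands in for RCU::match_expr.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using RenameList = std::vector<std::pair<std::regex, std::string>>;

// Returns the replacement of the first rule whose pattern matches the whole
// name, or the name unchanged when no rule matches.
inline std::string applyRename (const RenameList& rules, const std::string& name)
{
  for (const auto& rule : rules)
  {
    if (std::regex_match (name, rule.first))
      return rule.second;
  }
  return name;
}
```

With a rule for "data.*" registered before a rule for "data15.*", a sample called "data15_A" is renamed by the first rule, so more specific patterns should be registered first.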

◆ scan() [1/2]

const ScanDir & SH::ScanDir::scan ( SampleHandler & sh,
const std::string & dir ) const

scan the given directory and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 167 of file ScanDir.cxx.

const ScanDir & SH::ScanDir::scan (SampleHandler & sh, const std::string & dir) const
{
  DiskListLocal list (dir);
  scan (sh, list);
  return *this;
}

◆ scan() [2/2]

const ScanDir & SH::ScanDir::scan ( SampleHandler & sh,
DiskList & list ) const

scan the given directory and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 187 of file ScanDir.cxx.

const ScanDir & SH::ScanDir::scan (SampleHandler & sh, DiskList & list) const
{
  std::vector<std::string> hierarchy;
  hierarchy.push_back (list.dirname());

  std::map<std::string,std::shared_ptr<Sample>> samples;
  recurse (samples, list, hierarchy);
  for (auto sample = samples.begin(), end = samples.end();
       sample != end; ++ sample)
  {
    if (sample->second != nullptr)
    {
      sh.add (sample->second);
    }
  }
  return *this;
}

◆ scanEOS()

const ScanDir & SH::ScanDir::scanEOS ( SampleHandler & sh,
const std::string & eosDir ) const

scan the given directory in EOS and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 177 of file ScanDir.cxx.

const ScanDir & SH::ScanDir::scanEOS (SampleHandler & sh, const std::string & eosDir) const
{
  DiskListEOS list (eosDir);
  scan (sh, list);
  return *this;
}

Member Data Documentation

◆ m_absSampleDepth

int SH::ScanDir::m_absSampleDepth
private

if m_relSampleDepth is not negative, it is the depth at which we take the sample name, counting from the first directory scanned

Definition at line 170 of file ScanDir.h.

◆ m_directoryPattern

std::regex SH::ScanDir::m_directoryPattern
private

the value set by directoryPattern, converted to a regular expression

Definition at line 192 of file ScanDir.h.

◆ m_extraNameComponent

int SH::ScanDir::m_extraNameComponent
private

the depth set with extraNameComponent, or 0 otherwise

Definition at line 216 of file ScanDir.h.

◆ m_filePattern

std::regex SH::ScanDir::m_filePattern
private

the value set by filePattern, converted to a regular expression

Definition at line 187 of file ScanDir.h.

◆ m_maxDepth

std::size_t SH::ScanDir::m_maxDepth
private

the value set by maxDepth

Definition at line 182 of file ScanDir.h.

◆ m_minDepth

std::size_t SH::ScanDir::m_minDepth
private

the value set by minDepth

Definition at line 178 of file ScanDir.h.

◆ m_relSampleDepth

int SH::ScanDir::m_relSampleDepth
private

if this is negative it is the depth at which we take the sample name, counting from the end

Definition at line 164 of file ScanDir.h.

◆ m_sampleName

std::string SH::ScanDir::m_sampleName
private

the value set by sampleName

Definition at line 174 of file ScanDir.h.

◆ m_samplePattern

std::regex SH::ScanDir::m_samplePattern
private

the value set by samplePattern, converted to a regular expression

Definition at line 197 of file ScanDir.h.

◆ m_samplePostfix

std::regex SH::ScanDir::m_samplePostfix
private

the value set by samplePostfix, converted to a regular expression

Definition at line 202 of file ScanDir.h.

◆ m_samplePostfixEmpty

bool SH::ScanDir::m_samplePostfixEmpty
private

whether samplePostfix has been set to the empty string

Definition at line 207 of file ScanDir.h.

◆ m_sampleRename

std::vector<std::pair<std::regex,std::string> > SH::ScanDir::m_sampleRename
private

Definition at line 212 of file ScanDir.h.


The documentation for this struct was generated from the following files: