ATLAS Offline Software
SH::ScanDir Struct Reference

the class used for scanning local directories and file servers for samples

#include <ScanDir.h>

Public Member Functions

 ScanDir ()
 standard constructor
const ScanDir & scan (SampleHandler &sh, const std::string &dir) const
 scan the given directory and put the created samples into the sample handler
const ScanDir & scanEOS (SampleHandler &sh, const std::string &eosDir) const
 scan the given directory in EOS and put the created samples into the sample handler
const ScanDir & scan (SampleHandler &sh, DiskList &list) const
 scan the given directory and put the created samples into the sample handler
ScanDir & sampleDepth (int val_sampleDepth)
 the index of the file hierarchy at which we gather the sample name.
ScanDir & absSampleDepth (int val_absSampleDepth)
 the index of the file hierarchy at which we gather the sample name.
ScanDir & sampleName (const std::string &val_sampleName)
 a single sample name into which all found files should be placed.
ScanDir & minDepth (std::size_t val_minDepth)
 the minimum depth for files to make it into the sample
ScanDir & maxDepth (std::size_t val_maxDepth)
 the maximum depth for files to make it into the sample
ScanDir & filePattern (const std::string &val_filePattern)
 the pattern for files to be accepted
ScanDir & fileRegex (const std::string &val_fileRegex)
 the regular expression for files to be accepted
ScanDir & directoryPattern (const std::string &val_directoryPattern)
 the pattern for directories to be visited
ScanDir & directoryRegex (const std::string &val_directoryRegex)
 the regular expression for directories to be visited
ScanDir & samplePattern (const std::string &val_samplePattern)
 the pattern for samples to be accepted
ScanDir & samplePostfix (const std::string &val_samplePostfix)
 the pattern for the postfix to be stripped from the sampleName
ScanDir & sampleRename (const std::string &pattern, const std::string &name)
 rename any sample matching pattern to name
ScanDir & extraNameComponent (int val_relSampleDepth)
 attach an extra name component to the sample based on a second component of the path

Private Types

typedef std::vector< std::pair< std::regex, std::string > >::const_iterator SampleRenameIter
 the list of entries from sampleRename

Private Member Functions

void recurse (std::map< std::string, SamplePtr > &samples, DiskList &list, const std::vector< std::string > &hierarchy) const
 perform the recursive scanning of the directory tree
void addSampleFile (std::map< std::string, SamplePtr > &samples, const std::vector< std::string > &hierarchy, const std::string &path) const
 add the given file to the sample based on the hierarchy, creating the sample if necessary
std::string findPathComponent (const std::vector< std::string > &hierarchy, int absSampleDepth, int relSampleDepth) const
 find the path component at the given depth

Private Attributes

int m_relSampleDepth
 if this is negative it is the depth at which we take the sample name, counting from the end
int m_absSampleDepth
 if m_relSampleDepth is not negative, it is the depth at which we take the sample name, counting from the first directory scanned
std::string m_sampleName
 the value set by sampleName
std::size_t m_minDepth
 the value set by minDepth
std::size_t m_maxDepth
 the value set by maxDepth
std::regex m_filePattern
 the value set by filePattern, converted to a regular expression
std::regex m_directoryPattern
 the value set by directoryPattern, converted to a regular expression
std::regex m_samplePattern
 the value set by samplePattern, converted to a regular expression
std::regex m_samplePostfix
 the value set by samplePostfix, converted to a regular expression
bool m_samplePostfixEmpty
 whether samplePostfix has been set to the empty string
std::vector< std::pair< std::regex, std::string > > m_sampleRename
int m_extraNameComponent
 the depth set with extraNameComponent, or 0 otherwise

Detailed Description

the class used for scanning local directories and file servers for samples

Originally this was a series of stand-alone function calls, but people kept asking for more and more options, making them unwieldy to call and to maintain. Instead we now have a single class containing all the possible parameters, which makes it easier to configure and extend.

The member functions all return *this, so that usage like this is possible:

ScanDir()
  .filePattern ("*.root*")
  .scan (sh, "/data");

Definition at line 37 of file ScanDir.h.
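
The chaining style described above relies on every setter returning a reference to the object it was called on. The following is a minimal, self-contained sketch of that pattern. `MiniScanConfig` and its setter names are hypothetical stand-ins invented for illustration; they are not part of SampleHandler, and the precondition in `setMaxDepth` is an illustrative check only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a chainable configuration object.
// This is NOT SH::ScanDir; it only illustrates the "return *this" pattern.
struct MiniScanConfig
{
  std::string filePattern = "*";
  std::size_t minDepth = 0;
  std::size_t maxDepth = static_cast<std::size_t> (-1); // "no limit"

  // Each setter stores its argument and returns *this so calls can be chained.
  MiniScanConfig& setFilePattern (const std::string& val)
  {
    filePattern = val;
    return *this;
  }

  MiniScanConfig& setMinDepth (std::size_t val)
  {
    minDepth = val;
    return *this;
  }

  MiniScanConfig& setMaxDepth (std::size_t val)
  {
    assert (val >= minDepth); // illustrative check, not taken from ScanDir
    maxDepth = val;
    return *this;
  }
};
```

With this sketch, `MiniScanConfig().setFilePattern ("*.root*").setMaxDepth (2)` configures a temporary object in a single expression, which is the same shape as the ScanDir usage shown above.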

Member Typedef Documentation

◆ SampleRenameIter

typedef std::vector<std::pair<std::regex,std::string>>::const_iterator SH::ScanDir::SampleRenameIter
private

the list of entries from sampleRename

Definition at line 210 of file ScanDir.h.

Constructor & Destructor Documentation

◆ ScanDir()

SH::ScanDir::ScanDir ( )

standard constructor

Guarantee
strong
Failures
out of memory I

Definition at line 32 of file ScanDir.cxx.

35 m_minDepth (0), m_maxDepth (-1),
42 {}

Member Function Documentation

◆ absSampleDepth()

ScanDir & SH::ScanDir::absSampleDepth ( int val_absSampleDepth)

the index of the file hierarchy at which we gather the sample name.

this differs from sampleDepth in that negative numbers count up in the directory hierarchy from the top of where we scan, while sampleDepth starts counting from the back if the number is negative.

Definition at line 56 of file ScanDir.cxx.

58 {
60 m_absSampleDepth = val_absSampleDepth;
61 return *this;
62 }

◆ addSampleFile()

void SH::ScanDir::addSampleFile ( std::map< std::string, SamplePtr > & samples,
const std::vector< std::string > & hierarchy,
const std::string & path ) const
private

add the given file to the sample based on the hierarchy, creating the sample if necessary

Guarantee
basic
Failures
out of memory II

Definition at line 254 of file ScanDir.cxx.

258 {
259 std::string sampleName;
260
261 if (!m_sampleName.empty())
262 {
264 } else
265 {
268 if (sampleName.empty())
269 return;
270
272 {
273 bool done = false;
274 for (std::size_t iter = 0, end = sampleName.size();
275 iter != end && !done; ++ iter)
276 {
277 if (RCU::match_expr (m_samplePostfix, sampleName.substr (iter)))
278 {
279 if (iter == 0)
280 RCU_THROW_MSG ("sample name matches entire postfix pattern: \"" + sampleName + "\"");
281 sampleName.resize (iter);
282 done = true;
283 }
284 }
285 }
286
287 if (m_extraNameComponent != 0)
288 {
289 std::string component = findPathComponent
291 if (component.empty())
292 return;
293 sampleName += "_" + component;
294 }
295
297 return;
298
299 {
300 bool done = false;
301 for (SampleRenameIter iter = m_sampleRename.begin(),
302 end = m_sampleRename.end(); !done && iter != end; ++ iter)
303 {
304 if (RCU::match_expr (iter->first, sampleName))
305 {
306 sampleName = iter->second;
307 done = true;
308 }
309 }
310 }
311 }
312
313 std::map<std::string,SamplePtr>::iterator iter
314 = samples.find (sampleName);
315 if (iter == samples.end())
316 {
317 SamplePtr sample (new SampleLocal (sampleName));
318 samples[sampleName] = sample;
319 iter = samples.find (sampleName);
320 }
321 SampleLocal *sample = dynamic_cast<SampleLocal*>(iter->second.get());
322 RCU_ASSERT (sample != 0);
323 sample->add (path);
324 }

◆ directoryPattern()

ScanDir & SH::ScanDir::directoryPattern ( const std::string & val_directoryPattern)

the pattern for directories to be visited

See also
directoryRegex

Definition at line 111 of file ScanDir.cxx.

113 {
114 m_directoryPattern = RCU::glob_to_regexp (val_directoryPattern);
115 return *this;
116 }

◆ directoryRegex()

ScanDir & SH::ScanDir::directoryRegex ( const std::string & val_directoryRegex)

the regular expression for directories to be visited

See also
directoryPattern

Definition at line 120 of file ScanDir.cxx.

122 {
123 m_directoryPattern = val_directoryRegex;
124 return *this;
125 }

◆ extraNameComponent()

ScanDir & SH::ScanDir::extraNameComponent ( int val_relSampleDepth)

attach an extra name component to the sample based on a second component of the path

Precondition
val_relSampleDepth != 0

Definition at line 157 of file ScanDir.cxx.

159 {
160 RCU_REQUIRE (val_relSampleDepth != 0);
161 m_extraNameComponent = val_relSampleDepth;
162 return *this;
163 }

◆ filePattern()

ScanDir & SH::ScanDir::filePattern ( const std::string & val_filePattern)

the pattern for files to be accepted

See also
fileRegex

Definition at line 93 of file ScanDir.cxx.

95 {
96 m_filePattern = RCU::glob_to_regexp (val_filePattern);
97 return *this;
98 }

◆ fileRegex()

ScanDir & SH::ScanDir::fileRegex ( const std::string & val_fileRegex)

the regular expression for files to be accepted

See also
filePattern

Definition at line 102 of file ScanDir.cxx.

104 {
105 m_filePattern = val_fileRegex;
106 return *this;
107 }
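
filePattern takes a glob and converts it to a regular expression, while fileRegex takes the regular expression directly. The sketch below shows one plausible way such a conversion could work. `globToRegex` and `matchesGlob` are hypothetical helpers written for this example; the actual behaviour of RCU::glob_to_regexp is not shown on this page and may differ (for instance in how it treats character classes).

```cpp
#include <cassert> // assert is used in the usage note below
#include <regex>
#include <string>

// Hypothetical glob-to-regex conversion: '*' matches any run of characters,
// '?' matches exactly one character, everything else is matched literally.
std::string globToRegex (const std::string& glob)
{
  static const std::string special = ".^$|()[]{}+\\";
  std::string result;
  for (char c : glob)
  {
    switch (c)
    {
    case '*':
      result += ".*";
      break;
    case '?':
      result += '.';
      break;
    default:
      // escape characters that have a special meaning in a regex
      if (special.find (c) != std::string::npos)
        result += '\\';
      result += c;
    }
  }
  return result;
}

// Whole-string match of a name against a glob, via the conversion above.
bool matchesGlob (const std::string& glob, const std::string& name)
{
  return std::regex_match (name, std::regex (globToRegex (glob)));
}
```

Under these assumptions, `assert (matchesGlob ("*.root*", "data.root.1"))` holds, while `matchesGlob ("*.root*", "data.txt")` returns false.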

◆ findPathComponent()

std::string SH::ScanDir::findPathComponent ( const std::vector< std::string > & hierarchy,
int absSampleDepth,
int relSampleDepth ) const
private

find the path component at the given depth

Returns
the path component, or an empty string if it doesn't exist
Guarantee
strong
Failures
out of memory II

Definition at line 328 of file ScanDir.cxx.

332 {
333 std::string sampleName;
334
335 int myindex = absSampleDepth+1;
336 if (relSampleDepth < 0)
337 myindex = relSampleDepth + hierarchy.size();
338 if (std::size_t (myindex) >= hierarchy.size())
339 return sampleName;
340 if (myindex > 0)
341 {
342 sampleName = hierarchy[myindex];
343 } else
344 {
345 sampleName = hierarchy[0];
346 while (sampleName.empty() ||
347 sampleName[sampleName.size()-1] == '/' ||
348 myindex < 0)
349 {
350 while (!sampleName.empty() && sampleName[sampleName.size()-1] == '/')
351 sampleName.pop_back();
352 if (sampleName.empty())
353 return sampleName;
354 if (myindex < 0)
355 {
356 std::string::size_type split = sampleName.rfind ('/');
357 if (split == std::string::npos)
358 {
359 sampleName.clear ();
360 return sampleName;
361 }
362 sampleName.resize (split);
363 ++ myindex;
364 }
365 if (sampleName.empty())
366 return sampleName;
367 }
368 std::string::size_type split = sampleName.rfind ('/');
369 if (split != std::string::npos)
370 sampleName = sampleName.substr (split + 1);
371 }
372 return sampleName;
373 }
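
When the resolved index is zero or negative, the listing above takes the name from the top-level directory path itself, walking up one path component for each step below zero. The following is a simplified sketch of that idea only. `ancestorName` and its `levelsUp` parameter are hypothetical names chosen for this example (`levelsUp` plays the role of the negated index), and the loop structure is deliberately simpler than the original.

```cpp
#include <cassert>
#include <string>

// Remove any trailing '/' characters from the path.
static void stripTrailingSlashes (std::string& path)
{
  while (!path.empty() && path.back() == '/')
    path.pop_back();
}

// Hypothetical helper: return the last component of `path` after
// removing `levelsUp` trailing components, or "" if none is left.
std::string ancestorName (std::string path, int levelsUp)
{
  assert (levelsUp >= 0);
  stripTrailingSlashes (path);
  for (int level = 0; level < levelsUp; ++level)
  {
    const std::string::size_type split = path.rfind ('/');
    if (split == std::string::npos)
      return std::string(); // nothing above this component
    path.resize (split);
    stripTrailingSlashes (path);
  }
  if (path.empty())
    return std::string();
  const std::string::size_type split = path.rfind ('/');
  return split == std::string::npos ? path : path.substr (split + 1);
}
```

For a top-level directory of "/data/user/run1/", this sketch yields "run1" for `levelsUp == 0` and "user" for `levelsUp == 1`.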
std::vector< std::string > split(const std::string &s, const std::string &t=":")
Definition hcg.cxx:177
ScanDir & absSampleDepth(int val_absSampleDepth)
the index of the file hierarchy at which we gather the sample name.
Definition ScanDir.cxx:57

◆ maxDepth()

ScanDir & SH::ScanDir::maxDepth ( std::size_t val_maxDepth)

the maximum depth for files to make it into the sample

Definition at line 84 of file ScanDir.cxx.

86 {
87 m_maxDepth = val_maxDepth;
88 return *this;
89 }

◆ minDepth()

ScanDir & SH::ScanDir::minDepth ( std::size_t val_minDepth)

the minimum depth for files to make it into the sample

Definition at line 75 of file ScanDir.cxx.

77 {
78 m_minDepth = val_minDepth;
79 return *this;
80 }

◆ recurse()

void SH::ScanDir::recurse ( std::map< std::string, SamplePtr > & samples,
DiskList & list,
const std::vector< std::string > & hierarchy ) const
private

perform the recursive scanning of the directory tree

Guarantee
basic
Failures
out of memory III
i/o errors

Definition at line 209 of file ScanDir.cxx.

212 {
213 using namespace msgScanDir;
214
215 ANA_MSG_DEBUG ("scanning directory: " << list.dirname());
216 while (list.next())
217 {
218 std::unique_ptr<DiskList> sublist (list.openDir());
219
220 if (sublist.get() != 0)
221 {
222 if (!RCU::match_expr (m_directoryPattern, list.fileName()))
223 {
224 ANA_MSG_DEBUG ("directory does not match pattern, skipping directory " << list.path());
225 } else if (hierarchy.size() > m_maxDepth)
226 {
227 ANA_MSG_DEBUG ("maxDepth exceeded, skipping directory " << list.path());
228 } else
229 {
230 ANA_MSG_DEBUG ("descending into directory " << list.path());
231 std::vector<std::string> subhierarchy = hierarchy;
232 subhierarchy.push_back (list.fileName());
233 recurse (samples, *sublist, subhierarchy);
234 }
235 } else
236 {
237 if (hierarchy.size() > m_minDepth &&
238 RCU::match_expr (m_filePattern, list.fileName()))
239 {
240 ANA_MSG_DEBUG ("adding file " << list.path());
241 std::vector<std::string> subhierarchy = hierarchy;
242 subhierarchy.push_back (list.fileName());
243 addSampleFile (samples, subhierarchy, list.path());
244 } else
245 {
246 ANA_MSG_DEBUG ("skipping file " << list.path());
247 }
248 }
249 }
250 }
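
The depth checks in the listing above compare the size of the current hierarchy (which already contains the top-level directory) against the minDepth and maxDepth settings. The sketch below reproduces only those two comparisons on an in-memory tree. `Node` and `walk` are hypothetical names for this example; directory and file pattern matching, logging, and the DiskList interface are all left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a directory tree.
struct Node
{
  std::string name;
  bool isDir;
  std::vector<Node> children;
};

// Collect the names of files whose depth passes the min/max checks.
// `hierarchy` holds the path components from the top-level directory
// down to `dir`, so it has size 1 when called on the top-level directory.
void walk (const Node& dir, const std::vector<std::string>& hierarchy,
           std::size_t minDepth, std::size_t maxDepth,
           std::vector<std::string>& accepted)
{
  assert (dir.isDir);
  for (const Node& entry : dir.children)
  {
    if (entry.isDir)
    {
      // same comparison as the maxDepth check in the listing
      if (hierarchy.size() > maxDepth)
        continue;
      std::vector<std::string> subhierarchy = hierarchy;
      subhierarchy.push_back (entry.name);
      walk (entry, subhierarchy, minDepth, maxDepth, accepted);
    }
    else if (hierarchy.size() > minDepth)
    {
      // same comparison as the minDepth check in the listing
      accepted.push_back (entry.name);
    }
  }
}
```

With the defaults shown in the constructor (minDepth 0 and maxDepth -1, i.e. the largest std::size_t), every file passes the depth checks. Setting minDepth to 1 skips files sitting directly in the top-level directory, and setting maxDepth to 1 stops the walk from descending more than one directory level below the top.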

◆ sampleDepth()

ScanDir & SH::ScanDir::sampleDepth ( int val_sampleDepth)

the index of the file hierarchy at which we gather the sample name.

this is non-negative when counting from the top and negative when counting from the back, i.e. -1 uses the file name, while 0 denotes the directory inside the top level directory

Definition at line 46 of file ScanDir.cxx.

48 {
49 m_relSampleDepth = val_sampleDepth;
50 m_absSampleDepth = val_sampleDepth;
51 return *this;
52 }
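
To make the counting convention concrete, the sketch below resolves a depth value against a list of path components in the way the description suggests: non-negative values count from the entry below the top-level directory, negative values count from the end. `componentAt` is a hypothetical helper for illustration. It covers only the case where the resolved index lands inside the hierarchy below the top-level directory, and returns an empty string otherwise, which is a simplification compared with findPathComponent.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: pick the component of `hierarchy` selected by `depth`.
// hierarchy[0] is the top-level directory, the last entry is the file name.
std::string componentAt (const std::vector<std::string>& hierarchy, int depth)
{
  assert (!hierarchy.empty());
  const int size = static_cast<int> (hierarchy.size());
  // negative: count from the back; non-negative: skip the top-level directory
  const int index = depth < 0 ? size + depth : depth + 1;
  if (index <= 0 || index >= size)
    return std::string(); // simplification: not handled in this sketch
  return hierarchy[index];
}
```

For the hierarchy {"/data", "mc", "sampleA", "file.root"}, a depth of 0 selects "mc", -1 selects "file.root", and -2 selects "sampleA".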

◆ sampleName()

ScanDir & SH::ScanDir::sampleName ( const std::string & val_sampleName)

a single sample name into which all found files should be placed.

if set, this overrides all other naming methods.

Definition at line 66 of file ScanDir.cxx.

68 {
69 m_sampleName = val_sampleName;
70 return *this;
71 }

◆ samplePattern()

ScanDir & SH::ScanDir::samplePattern ( const std::string & val_samplePattern)

the pattern for samples to be accepted

Definition at line 129 of file ScanDir.cxx.

131 {
132 m_samplePattern = RCU::glob_to_regexp (val_samplePattern);
133 return *this;
134 }

◆ samplePostfix()

ScanDir & SH::ScanDir::samplePostfix ( const std::string & val_samplePostfix)

the pattern for the postfix to be stripped from the sampleName

Definition at line 138 of file ScanDir.cxx.

140 {
141 m_samplePostfix = RCU::glob_to_regexp (val_samplePostfix);
142 m_samplePostfixEmpty = val_samplePostfix.empty();
143 return *this;
144 }
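
The addSampleFile listing strips the postfix by looking for the earliest position at which the remainder of the sample name matches the postfix pattern in full, and it raises an error if the whole name matches. A minimal sketch of that loop, using plain std::regex_match and std::runtime_error in place of RCU::match_expr and RCU_THROW_MSG, could look as follows. `stripPostfix` is a hypothetical name, and the pattern is passed as a ready-made regular expression rather than a glob.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: cut off the longest suffix of `name` that matches
// `postfix` in full. Throws if the entire name matches the pattern.
std::string stripPostfix (std::string name, const std::regex& postfix)
{
  assert (!name.empty());
  for (std::size_t pos = 0; pos != name.size(); ++pos)
  {
    if (std::regex_match (name.begin() + pos, name.end(), postfix))
    {
      if (pos == 0)
        throw std::runtime_error
          ("sample name matches entire postfix pattern: \"" + name + "\"");
      name.resize (pos);
      break;
    }
  }
  return name;
}
```

For example, with the regular expression `\.part[0-9]+`, the name "sampleA.part1" is reduced to "sampleA", while a name without a matching suffix is returned unchanged.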

◆ sampleRename()

ScanDir & SH::ScanDir::sampleRename ( const std::string & pattern,
const std::string & name )

rename any sample matching pattern to name

Definition at line 148 of file ScanDir.cxx.

150 {
151 m_sampleRename.push_back (std::pair<std::regex,std::string> (std::regex (RCU::glob_to_regexp (pattern)), name));
152 return *this;
153 }
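
Because each call appends a (pattern, name) pair and the loop in addSampleFile stops at the first match, rename rules are applied in the order they were registered. The sketch below illustrates that first-match-wins behaviour with plain std::regex. `applyRename` is a hypothetical helper, and the rules are given as regular expressions directly instead of globs.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: return the replacement of the first rule whose
// pattern matches the whole name, or the name itself if no rule matches.
std::string applyRename
  (const std::vector<std::pair<std::regex,std::string>>& rules,
   const std::string& name)
{
  assert (!name.empty());
  for (const auto& rule : rules)
  {
    if (std::regex_match (name, rule.first))
      return rule.second;
  }
  return name;
}
```

With the rules (`mc.*` to "simulation") followed by (`.*` to "other"), the name "mc16a" becomes "simulation" because the first rule already matches, and "data18" becomes "other".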

◆ scan() [1/2]

const ScanDir & SH::ScanDir::scan ( SampleHandler & sh,
const std::string & dir ) const

scan the given directory and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 167 of file ScanDir.cxx.

169 {
170 DiskListLocal list (dir);
171 scan (sh, list);
172 return *this;
173 }

◆ scan() [2/2]

const ScanDir & SH::ScanDir::scan ( SampleHandler & sh,
DiskList & list ) const

scan the given directory and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 187 of file ScanDir.cxx.

189 {
190 std::vector<std::string> hierarchy;
191 hierarchy.push_back (list.dirname());
192
193 std::map<std::string,SamplePtr> samples;
194 typedef std::map<std::string,SamplePtr>::iterator samplesIter;
195 recurse (samples, list, hierarchy);
196 for (samplesIter sample = samples.begin(), end = samples.end();
197 sample != end; ++ sample)
198 {
199 if (sample->second.get() != 0)
200 {
201 sh.add (sample->second);
202 }
203 }
204 return *this;
205 }

◆ scanEOS()

const ScanDir & SH::ScanDir::scanEOS ( SampleHandler & sh,
const std::string & eosDir ) const

scan the given directory in EOS and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 177 of file ScanDir.cxx.

179 {
180 DiskListEOS list (eosDir);
181 scan (sh, list);
182 return *this;
183 }

Member Data Documentation

◆ m_absSampleDepth

int SH::ScanDir::m_absSampleDepth
private

if m_relSampleDepth is not negative, it is the depth at which we take the sample name, counting from the first directory scanned

Definition at line 169 of file ScanDir.h.

◆ m_directoryPattern

std::regex SH::ScanDir::m_directoryPattern
private

the value set by directoryPattern, converted to a regular expression

Definition at line 191 of file ScanDir.h.

◆ m_extraNameComponent

int SH::ScanDir::m_extraNameComponent
private

the depth set with extraNameComponent, or 0 otherwise

Definition at line 215 of file ScanDir.h.

◆ m_filePattern

std::regex SH::ScanDir::m_filePattern
private

the value set by filePattern, converted to a regular expression

Definition at line 186 of file ScanDir.h.

◆ m_maxDepth

std::size_t SH::ScanDir::m_maxDepth
private

the value set by maxDepth

Definition at line 181 of file ScanDir.h.

◆ m_minDepth

std::size_t SH::ScanDir::m_minDepth
private

the value set by minDepth

Definition at line 177 of file ScanDir.h.

◆ m_relSampleDepth

int SH::ScanDir::m_relSampleDepth
private

if this is negative it is the depth at which we take the sample name, counting from the end

Definition at line 163 of file ScanDir.h.

◆ m_sampleName

std::string SH::ScanDir::m_sampleName
private

the value set by sampleName

Definition at line 173 of file ScanDir.h.

◆ m_samplePattern

std::regex SH::ScanDir::m_samplePattern
private

the value set by samplePattern, converted to a regular expression

Definition at line 196 of file ScanDir.h.

◆ m_samplePostfix

std::regex SH::ScanDir::m_samplePostfix
private

the value set by samplePostfix, converted to a regular expression

Definition at line 201 of file ScanDir.h.

◆ m_samplePostfixEmpty

bool SH::ScanDir::m_samplePostfixEmpty
private

whether samplePostfix has been set to the empty string

Definition at line 206 of file ScanDir.h.

◆ m_sampleRename

std::vector<std::pair<std::regex,std::string> > SH::ScanDir::m_sampleRename
private

Definition at line 211 of file ScanDir.h.


The documentation for this struct was generated from the following files: