ATLAS Offline Software
SH::ScanDir Struct Reference

the class used for scanning local directories and file servers for samples

#include <ScanDir.h>

Public Member Functions

 ScanDir ()
 standard constructor

const ScanDir & scan (SampleHandler &sh, const std::string &dir) const
 scan the given directory and put the created samples into the sample handler

const ScanDir & scanEOS (SampleHandler &sh, const std::string &eosDir) const
 scan the given directory in EOS and put the created samples into the sample handler

const ScanDir & scan (SampleHandler &sh, DiskList &list) const
 scan the given directory and put the created samples into the sample handler

ScanDir & sampleDepth (int val_sampleDepth)
 the index of the file hierarchy at which we gather the sample name.

ScanDir & absSampleDepth (int val_absSampleDepth)
 the index of the file hierarchy at which we gather the sample name.

ScanDir & sampleName (const std::string &val_sampleName)
 a single sample name into which all found files should be placed.

ScanDir & minDepth (std::size_t val_minDepth)
 the minimum depth for files to make it into the sample

ScanDir & maxDepth (std::size_t val_maxDepth)
 the maximum depth for files to make it into the sample

ScanDir & filePattern (const std::string &val_filePattern)
 the pattern for files to be accepted

ScanDir & fileRegex (const std::string &val_fileRegex)
 the regular expression for files to be accepted

ScanDir & directoryPattern (const std::string &val_directoryPattern)
 the pattern for directories to be visited

ScanDir & directoryRegex (const std::string &val_directoryRegex)
 the regular expression for directories to be visited

ScanDir & samplePattern (const std::string &val_samplePattern)
 the pattern for samples to be accepted

ScanDir & samplePostfix (const std::string &val_samplePostfix)
 the pattern for the postfix to be stripped from the sampleName

ScanDir & sampleRename (const std::string &pattern, const std::string &name)
 rename any sample matching pattern to name

ScanDir & extraNameComponent (int val_relSampleDepth)
 attach an extra name component to the sample based on a second component of the path
 

Private Types

typedef std::vector< std::pair< boost::regex, std::string > >::const_iterator SampleRenameIter
 const iterator over the list of entries from sampleRename
 

Private Member Functions

void recurse (std::map< std::string, SamplePtr > &samples, DiskList &list, const std::vector< std::string > &hierarchy) const
 perform the recursive scanning of the directory tree

void addSampleFile (std::map< std::string, SamplePtr > &samples, const std::vector< std::string > &hierarchy, const std::string &path) const
 add the given file to the sample based on the hierarchy, creating the sample if necessary

std::string findPathComponent (const std::vector< std::string > &hierarchy, int absSampleDepth, int relSampleDepth) const
 find the path component at the given depth
 

Private Attributes

int m_relSampleDepth
 if this is negative it is the depth at which we take the sample name, counting from the end

int m_absSampleDepth
 if m_relSampleDepth is not negative, it is the depth at which we take the sample name, counting from the first directory scanned

std::string m_sampleName
 the value set by sampleName

std::size_t m_minDepth
 the value set by minDepth

std::size_t m_maxDepth
 the value set by maxDepth

boost::regex m_filePattern
 the value set by filePattern, converted to a regular expression

boost::regex m_directoryPattern
 the value set by directoryPattern, converted to a regular expression

boost::regex m_samplePattern
 the value set by samplePattern, converted to a regular expression

boost::regex m_samplePostfix
 the value set by samplePostfix, converted to a regular expression

bool m_samplePostfixEmpty
 whether samplePostfix has been set to the empty string
 
std::vector< std::pair< boost::regex, std::string > > m_sampleRename
 the list of entries from sampleRename

int m_extraNameComponent
 the depth set with extraNameComponent, or 0 otherwise
 

Detailed Description

the class used for scanning local directories and file servers for samples

Originally this was a series of stand-alone function calls, but people kept asking for more and more options, making it unwieldy to call and to maintain. Instead we now have a single class containing all the possible parameters, which makes it easier to configure and extend.

The member functions all return *this, so that usage like this is possible:

ScanDir()
  .filePattern ("*.root*")
  .scan (sh, "/data");
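
The chaining in the example above relies on each setter returning a reference to the object it was called on. The following is a minimal standalone sketch of that idiom, not the ScanDir implementation: `ScanOptions`, its data members, and `describe` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a fluent configuration object. It only
// demonstrates the "return *this" idiom and performs no scanning.
struct ScanOptions
{
  std::string filePatternValue = "*";
  std::size_t maxDepthValue = 0;
  int extraComponentValue = 0;

  // Each setter stores its argument and returns the object itself,
  // so further calls can be appended to the same expression.
  ScanOptions& filePattern (const std::string& val)
  {
    filePatternValue = val;
    return *this;
  }

  ScanOptions& maxDepth (std::size_t val)
  {
    maxDepthValue = val;
    return *this;
  }

  // A setter with a precondition, checked before the value is stored.
  ScanOptions& extraNameComponent (int val)
  {
    assert (val != 0);
    extraComponentValue = val;
    return *this;
  }

  // A const "terminal" call: it reads the configuration without
  // changing it and returns a const reference.
  const ScanOptions& describe (std::string& out) const
  {
    out = filePatternValue + " up to depth " + std::to_string (maxDepthValue);
    return *this;
  }
};
```

With this sketch, `ScanOptions().filePattern ("*.root*").maxDepth (2).describe (out)` configures a temporary object and uses it in one expression, which is the same shape as the ScanDir usage shown above.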

Definition at line 37 of file ScanDir.h.

Member Typedef Documentation

◆ SampleRenameIter

typedef std::vector<std::pair<boost::regex,std::string> >::const_iterator SH::ScanDir::SampleRenameIter
private

const iterator over the list of entries from sampleRename

Definition at line 210 of file ScanDir.h.

Constructor & Destructor Documentation

◆ ScanDir()

SH::ScanDir::ScanDir ( )

standard constructor

Guarantee
strong
Failures
out of memory I

Definition at line 32 of file ScanDir.cxx.

Member Function Documentation

◆ absSampleDepth()

ScanDir & SH::ScanDir::absSampleDepth ( int  val_absSampleDepth)

the index of the file hierarchy at which we gather the sample name.

this differs from sampleDepth in that negative numbers count up in the directory hierarchy from the top of where we scan, while sampleDepth starts counting from the back if the number is negative.

Definition at line 56 of file ScanDir.cxx.

58  {
59  m_relSampleDepth = 0;
60  m_absSampleDepth = val_absSampleDepth;
61  return *this;
62  }

◆ addSampleFile()

void SH::ScanDir::addSampleFile ( std::map< std::string, SamplePtr > &  samples,
const std::vector< std::string > &  hierarchy,
const std::string &  path 
) const
private

add the given file to the sample based on the hierarchy, creating the sample if necessary

Guarantee
basic
Failures
out of memory II

Definition at line 254 of file ScanDir.cxx.

258  {
259  std::string sampleName;
260 
261  if (!m_sampleName.empty())
262  {
264  } else
265  {
268  if (sampleName.empty())
269  return;
270 
272  {
273  bool done = false;
274  for (std::size_t iter = 0, end = sampleName.size();
275  iter != end && !done; ++ iter)
276  {
277  if (RCU::match_expr (m_samplePostfix, sampleName.substr (iter)))
278  {
279  if (iter == 0)
280  RCU_THROW_MSG ("sample name matches entire postfix pattern: \"" + sampleName + "\"");
281  sampleName.resize (iter);
282  done = true;
283  }
284  }
285  }
286 
287  if (m_extraNameComponent != 0)
288  {
289  std::string component = findPathComponent
291  if (component.empty())
292  return;
293  sampleName += "_" + component;
294  }
295 
297  return;
298 
299  {
300  bool done = false;
301  for (SampleRenameIter iter = m_sampleRename.begin(),
302  end = m_sampleRename.end(); !done && iter != end; ++ iter)
303  {
304  if (RCU::match_expr (iter->first, sampleName))
305  {
306  sampleName = iter->second;
307  done = true;
308  }
309  }
310  }
311  }
312 
314  = samples.find (sampleName);
315  if (iter == samples.end())
316  {
317  SamplePtr sample (new SampleLocal (sampleName));
318  samples[sampleName] = sample;
319  iter = samples.find (sampleName);
320  }
321  SampleLocal *sample = dynamic_cast<SampleLocal*>(iter->second.get());
322  RCU_ASSERT (sample != 0);
323  sample->add (path);
324  }
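
The postfix loop in the listing above looks for the earliest position at which the rest of the sample name matches the postfix pattern as a whole, and cuts the name there. A minimal standalone sketch of that step follows. It uses `std::regex` instead of `boost::regex`, a plain `std::runtime_error` instead of `RCU_THROW_MSG`, and a hypothetical free function `stripPostfix`; these substitutions are assumptions made to keep the example self-contained.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: remove the longest trailing part of `name`
// that matches `postfix` in full. Names without such a part are
// returned unchanged.
std::string stripPostfix (const std::string& name, const std::regex& postfix)
{
  for (std::size_t pos = 0; pos != name.size (); ++pos)
  {
    // regex_match requires the whole substring to match the pattern
    if (std::regex_match (name.substr (pos), postfix))
    {
      // a match at position 0 would leave an empty sample name
      if (pos == 0)
        throw std::runtime_error
          ("sample name matches entire postfix pattern: \"" + name + "\"");
      return name.substr (0, pos);
    }
  }
  return name;
}
```

Because the search starts at the front, the first hit is also the longest matching tail, so a name such as `a.part1.part2` with the pattern `\.part.*` is cut down to `a`.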

◆ directoryPattern()

ScanDir & SH::ScanDir::directoryPattern ( const std::string &  val_directoryPattern)

the pattern for directories to be visited

See also
directoryRegex

Definition at line 111 of file ScanDir.cxx.

113  {
114  m_directoryPattern = RCU::glob_to_regexp (val_directoryPattern);
115  return *this;
116  }

◆ directoryRegex()

ScanDir & SH::ScanDir::directoryRegex ( const std::string &  val_directoryRegex)

the regular expression for directories to be visited

See also
directoryPattern

Definition at line 120 of file ScanDir.cxx.

122  {
123  m_directoryPattern = val_directoryRegex;
124  return *this;
125  }

◆ extraNameComponent()

ScanDir & SH::ScanDir::extraNameComponent ( int  val_relSampleDepth)

attach an extra name component to the sample based on a second component of the path

Precondition
val_relSampleDepth != 0

Definition at line 157 of file ScanDir.cxx.

159  {
160  RCU_REQUIRE (val_relSampleDepth != 0);
161  m_extraNameComponent = val_relSampleDepth;
162  return *this;
163  }

◆ filePattern()

ScanDir & SH::ScanDir::filePattern ( const std::string &  val_filePattern)

the pattern for files to be accepted

See also
fileRegex

Definition at line 93 of file ScanDir.cxx.

95  {
96  m_filePattern = RCU::glob_to_regexp (val_filePattern);
97  return *this;
98  }

◆ fileRegex()

ScanDir & SH::ScanDir::fileRegex ( const std::string &  val_fileRegex)

the regular expression for files to be accepted

See also
filePattern

Definition at line 102 of file ScanDir.cxx.

104  {
105  m_filePattern = val_fileRegex;
106  return *this;
107  }
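
filePattern takes a glob and converts it to a regular expression before storing it, while fileRegex stores the regular expression directly. The conversion rules of `RCU::glob_to_regexp` are not shown on this page, so the sketch below is only an illustration of the general idea: `globToRegex` and `matchesGlob` are hypothetical helpers, they use `std::regex` rather than `boost::regex`, and they handle only the `*` and `?` wildcards.

```cpp
#include <regex>
#include <string>

// Hypothetical, simplified glob-to-regex conversion:
//   '*' becomes ".*", '?' becomes ".", and characters that are
//   special in a regular expression are escaped.
std::string globToRegex (const std::string& glob)
{
  std::string result;
  for (char c : glob)
  {
    switch (c)
    {
    case '*':
      result += ".*";
      break;
    case '?':
      result += '.';
      break;
    case '.': case '+': case '(': case ')': case '[': case ']':
    case '{': case '}': case '^': case '$': case '|': case '\\':
      result += '\\';
      result += c;
      break;
    default:
      result += c;
    }
  }
  return result;
}

// Whole-string match of a name against a glob, via the converted regex.
bool matchesGlob (const std::string& glob, const std::string& name)
{
  return std::regex_match (name, std::regex (globToRegex (glob)));
}
```

Under these assumptions the glob `*.root*` becomes the expression `.*\.root.*`, which accepts `file.root.1` but not `file.txt`.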

◆ findPathComponent()

std::string SH::ScanDir::findPathComponent ( const std::vector< std::string > &  hierarchy,
int  absSampleDepth,
int  relSampleDepth 
) const
private

find the path component at the given depth

Returns
the path component, or an empty string if it doesn't exist
Guarantee
strong
Failures
out of memory II

Definition at line 328 of file ScanDir.cxx.

332  {
333  std::string sampleName;
334 
335  int myindex = absSampleDepth+1;
336  if (relSampleDepth < 0)
337  myindex = relSampleDepth + hierarchy.size();
338  if (std::size_t (myindex) >= hierarchy.size())
339  return sampleName;
340  if (myindex > 0)
341  {
342  sampleName = hierarchy[myindex];
343  } else
344  {
345  sampleName = hierarchy[0];
346  while (sampleName.empty() ||
347  sampleName[sampleName.size()-1] == '/' ||
348  myindex < 0)
349  {
350  while (!sampleName.empty() && sampleName[sampleName.size()-1] == '/')
351  sampleName.pop_back();
352  if (sampleName.empty())
353  return sampleName;
354  if (myindex < 0)
355  {
356  std::string::size_type split = sampleName.rfind ('/');
357  if (split == std::string::npos)
358  {
359  sampleName.clear ();
360  return sampleName;
361  }
362  sampleName.resize (split);
363  ++ myindex;
364  }
365  if (sampleName.empty())
366  return sampleName;
367  }
368  std::string::size_type split = sampleName.rfind ('/');
369  if (split != std::string::npos)
370  sampleName = sampleName.substr (split + 1);
371  }
372  return sampleName;
373  }
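
The index arithmetic in the listing above can be tried out in isolation. The following sketch is a simplified reading of it, under stated assumptions: `pickComponent` is a hypothetical free function, entry 0 of the hierarchy is the scanned directory (as set up by scan), and indices that fall before entry 0 simply yield an empty string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: pick the hierarchy entry used as a name.
//   relDepth < 0  : count from the back (-1 is the file name)
//   otherwise     : absDepth + 1, i.e. 0 is the first entry below
//                   the scanned directory
std::string pickComponent (const std::vector<std::string>& hierarchy,
                           int absDepth, int relDepth)
{
  // the hierarchy is expected to start with the scanned directory
  assert (!hierarchy.empty ());

  const int size = static_cast<int> (hierarchy.size ());
  int index = absDepth + 1;
  if (relDepth < 0)
    index = relDepth + size;

  // out of range: no such component
  if (index < 0 || index >= size)
    return std::string ();

  if (index > 0)
    return hierarchy[index];

  // index 0: use the last path element of the scanned directory,
  // ignoring trailing slashes
  std::string root = hierarchy[0];
  while (!root.empty () && root.back () == '/')
    root.pop_back ();
  const std::string::size_type split = root.rfind ('/');
  return split == std::string::npos ? root : root.substr (split + 1);
}
```

For the hierarchy `{"/data/", "mc", "sampleA", "file.root"}` this gives `mc` for the pair (0, 0), `file.root` for (-1, -1), and `data` for (-1, 0), matching the three cases described for sampleDepth and absSampleDepth.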

◆ maxDepth()

ScanDir & SH::ScanDir::maxDepth ( std::size_t  val_maxDepth)

the maximum depth for files to make it into the sample

Definition at line 84 of file ScanDir.cxx.

86  {
87  m_maxDepth = val_maxDepth;
88  return *this;
89  }

◆ minDepth()

ScanDir & SH::ScanDir::minDepth ( std::size_t  val_minDepth)

the minimum depth for files to make it into the sample

Definition at line 75 of file ScanDir.cxx.

77  {
78  m_minDepth = val_minDepth;
79  return *this;
80  }

◆ recurse()

void SH::ScanDir::recurse ( std::map< std::string, SamplePtr > &  samples,
DiskList &  list,
const std::vector< std::string > &  hierarchy 
) const
private

perform the recursive scanning of the directory tree

Guarantee
basic
Failures
out of memory III
i/o errors

Definition at line 209 of file ScanDir.cxx.

212  {
213  using namespace msgScanDir;
214 
215  ANA_MSG_DEBUG ("scanning directory: " << list.dirname());
216  while (list.next())
217  {
218  std::unique_ptr<DiskList> sublist (list.openDir());
219 
220  if (sublist.get() != 0)
221  {
222  if (!RCU::match_expr (m_directoryPattern, list.fileName()))
223  {
224  ANA_MSG_DEBUG ("directory does not match pattern, skipping directory " << list.path());
225  } else if (hierarchy.size() > m_maxDepth)
226  {
227  ANA_MSG_DEBUG ("maxDepth exceeded, skipping directory " << list.path());
228  } else
229  {
230  ANA_MSG_DEBUG ("descending into directory " << list.path());
231  std::vector<std::string> subhierarchy = hierarchy;
232  subhierarchy.push_back (list.fileName());
233  recurse (samples, *sublist, subhierarchy);
234  }
235  } else
236  {
237  if (hierarchy.size() > m_minDepth &&
238  RCU::match_expr (m_filePattern, list.fileName()))
239  {
240  ANA_MSG_DEBUG ("adding file " << list.path());
241  std::vector<std::string> subhierarchy = hierarchy;
242  subhierarchy.push_back (list.fileName());
243  addSampleFile (samples, subhierarchy, list.path());
244  } else
245  {
246  ANA_MSG_DEBUG ("skipping file " << list.path());
247  }
248  }
249  }
250  }
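
The two depth checks in the listing above compare the current hierarchy size against m_maxDepth before descending into a directory, and against m_minDepth before accepting a file. The sketch below replays that logic on an in-memory tree so it can be run without a file system. `Entry` and `collectFiles` are hypothetical names, and the pattern checks and sample bookkeeping of the real function are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory directory entry.
struct Entry
{
  std::string name;
  bool isDirectory;
  std::vector<Entry> children;
};

// `depth` plays the role of hierarchy.size(): it is 1 while listing
// the scanned directory itself and grows by one per level.
void collectFiles (const Entry& dir, std::size_t depth,
                   std::size_t minDepth, std::size_t maxDepth,
                   const std::string& path,
                   std::vector<std::string>& accepted)
{
  assert (dir.isDirectory);
  for (const Entry& entry : dir.children)
  {
    const std::string entryPath = path + "/" + entry.name;
    if (entry.isDirectory)
    {
      // directories are skipped once depth exceeds maxDepth
      if (depth <= maxDepth)
        collectFiles (entry, depth + 1, minDepth, maxDepth,
                      entryPath, accepted);
    } else if (depth > minDepth)
    {
      // files are accepted only below minDepth levels
      accepted.push_back (entryPath);
    }
  }
}
```

Starting with depth 1, a minDepth of 0 accepts files directly inside the scanned directory, while a minDepth of 1 accepts only files that sit at least one subdirectory down.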

◆ sampleDepth()

ScanDir & SH::ScanDir::sampleDepth ( int  val_sampleDepth)

the index of the file hierarchy at which we gather the sample name.

non-negative values count from the top of the scanned tree and negative values count from the back, i.e. 0 denotes the directory inside the top level directory and -1 uses the file name

Definition at line 46 of file ScanDir.cxx.

48  {
49  m_relSampleDepth = val_sampleDepth;
50  m_absSampleDepth = val_sampleDepth;
51  return *this;
52  }

◆ sampleName()

ScanDir & SH::ScanDir::sampleName ( const std::string &  val_sampleName)

a single sample name into which all found files should be placed.

if set, this overrides all other naming methods.

Definition at line 66 of file ScanDir.cxx.

68  {
69  m_sampleName = val_sampleName;
70  return *this;
71  }

◆ samplePattern()

ScanDir & SH::ScanDir::samplePattern ( const std::string &  val_samplePattern)

the pattern for samples to be accepted

Definition at line 129 of file ScanDir.cxx.

131  {
132  m_samplePattern = RCU::glob_to_regexp (val_samplePattern);
133  return *this;
134  }

◆ samplePostfix()

ScanDir & SH::ScanDir::samplePostfix ( const std::string &  val_samplePostfix)

the pattern for the postfix to be stripped from the sampleName

Definition at line 138 of file ScanDir.cxx.

140  {
141  m_samplePostfix = RCU::glob_to_regexp (val_samplePostfix);
142  m_samplePostfixEmpty = val_samplePostfix.empty();
143  return *this;
144  }

◆ sampleRename()

ScanDir & SH::ScanDir::sampleRename ( const std::string &  pattern,
const std::string &  name 
)

rename any sample matching pattern to name

Definition at line 148 of file ScanDir.cxx.

150  {
151  m_sampleRename.push_back (std::pair<boost::regex,std::string> (boost::regex (RCU::glob_to_regexp (pattern)), name));
152  return *this;
153  }
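
Each call to sampleRename appends one (pattern, name) pair, and the loop in addSampleFile stops at the first pair whose pattern matches the whole sample name, so earlier rules take precedence. A minimal standalone sketch of that lookup follows. `applyRenames` is a hypothetical free function, and the rules hold `std::regex` objects directly rather than globs converted to `boost::regex`.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::regex, std::string> > RenameRules;

// Hypothetical helper: return the replacement of the first rule whose
// pattern matches the entire name, or the name itself if none does.
std::string applyRenames (const RenameRules& rules, const std::string& name)
{
  for (const auto& rule : rules)
  {
    if (std::regex_match (name, rule.first))
      return rule.second;
  }
  return name;
}
```

With the rules `mc.*` to `simulation` followed by `.*` to `other`, the name `mc16a` becomes `simulation` because the first rule wins, even though the second rule would also match.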

◆ scan() [1/2]

const ScanDir & SH::ScanDir::scan ( SampleHandler &  sh,
const std::string &  dir 
) const

scan the given directory and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 167 of file ScanDir.cxx.

169  {
170  DiskListLocal list (dir);
171  scan (sh, list);
172  return *this;
173  }

◆ scan() [2/2]

const ScanDir & SH::ScanDir::scan ( SampleHandler &  sh,
DiskList &  list 
) const

scan the given directory and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 187 of file ScanDir.cxx.

189  {
190  std::vector<std::string> hierarchy;
191  hierarchy.push_back (list.dirname());
192 
193  std::map<std::string,SamplePtr> samples;
194  typedef std::map<std::string,SamplePtr>::iterator samplesIter;
195  recurse (samples, list, hierarchy);
196  for (samplesIter sample = samples.begin(), end = samples.end();
197  sample != end; ++ sample)
198  {
199  if (sample->second.get() != 0)
200  {
201  sh.add (sample->second);
202  }
203  }
204  return *this;
205  }

◆ scanEOS()

const ScanDir & SH::ScanDir::scanEOS ( SampleHandler &  sh,
const std::string &  eosDir 
) const

scan the given directory in EOS and put the created samples into the sample handler

Returns
*this
Guarantee
basic
Failures
out of memory III
i/o errors
duplicate samples

Definition at line 177 of file ScanDir.cxx.

179  {
180  DiskListEOS list (eosDir);
181  scan (sh, list);
182  return *this;
183  }

Member Data Documentation

◆ m_absSampleDepth

int SH::ScanDir::m_absSampleDepth
private

if m_relSampleDepth is not negative, it is the depth at which we take the sample name, counting from the first directory scanned

Definition at line 169 of file ScanDir.h.

◆ m_directoryPattern

boost::regex SH::ScanDir::m_directoryPattern
private

the value set by directoryPattern, converted to a regular expression

Definition at line 191 of file ScanDir.h.

◆ m_extraNameComponent

int SH::ScanDir::m_extraNameComponent
private

the depth set with extraNameComponent, or 0 otherwise

Definition at line 215 of file ScanDir.h.

◆ m_filePattern

boost::regex SH::ScanDir::m_filePattern
private

the value set by filePattern, converted to a regular expression

Definition at line 186 of file ScanDir.h.

◆ m_maxDepth

std::size_t SH::ScanDir::m_maxDepth
private

the value set by maxDepth

Definition at line 181 of file ScanDir.h.

◆ m_minDepth

std::size_t SH::ScanDir::m_minDepth
private

the value set by minDepth

Definition at line 177 of file ScanDir.h.

◆ m_relSampleDepth

int SH::ScanDir::m_relSampleDepth
private

if this is negative it is the depth at which we take the sample name, counting from the end

Definition at line 163 of file ScanDir.h.

◆ m_sampleName

std::string SH::ScanDir::m_sampleName
private

the value set by sampleName

Definition at line 173 of file ScanDir.h.

◆ m_samplePattern

boost::regex SH::ScanDir::m_samplePattern
private

the value set by samplePattern, converted to a regular expression

Definition at line 196 of file ScanDir.h.

◆ m_samplePostfix

boost::regex SH::ScanDir::m_samplePostfix
private

the value set by samplePostfix, converted to a regular expression

Definition at line 201 of file ScanDir.h.

◆ m_samplePostfixEmpty

bool SH::ScanDir::m_samplePostfixEmpty
private

whether samplePostfix has been set to the empty string

Definition at line 206 of file ScanDir.h.

◆ m_sampleRename

std::vector<std::pair<boost::regex,std::string> > SH::ScanDir::m_sampleRename
private

the list of entries from sampleRename

Definition at line 211 of file ScanDir.h.


The documentation for this struct was generated from the following files: ScanDir.h and ScanDir.cxx